[darcs-users] Linux in darcs: Repository not updated?

David Roundy droundy at abridgegame.org
Sat Feb 5 13:26:11 UTC 2005


On Fri, Feb 04, 2005 at 03:58:21PM +0100, Jan-Benedict Glaw wrote:
> On Fri, 2005-02-04 09:00:40 -0500, David Roundy <droundy at abridgegame.org>
> wrote in message <20050204140033.GF5279 at abridgegame.org>:
> > I'm imagining the output as something like (this is just a sketch)
> > 
> > struct darcs_stat {
> >   u32 objtype; // file/directory/symlink/doesn't exist
> >   u64 ctime;
> >   u64 mtime;
> >   u32 file_length; // or is this not returned by stat?
> >   // I can't think what else at the moment.
> > };
> > 
> > Perhaps just ints would be good, but we'd certainly have to be careful on
> > these integer types to make sure I can line them up with the haskell code.
> 
> What should the u64 for times contain? Something like VMS? ...or a
> time_t in the MSBs and usec/nsec in the LSBs?

The u64 (and I was just guessing that a u64 would work) just needs to store
whatever time information the OS stores, in any format.  All darcs does is
check whether two files have identical modification times.  When it writes
to _darcs/current/, if the file is identical to the one in the working
directory, it sets its mtime to match that of the one in the working
directory (to save time when running whatsnew later).  So as long as we can
read, write and compare mtimes losslessly, we're good.  We don't really
need the ctime, but I think you can't write the mtime without also
updating the ctime.

> > I'd also need a settimes call, and eventually would like to have a call to
> > read a symlink's contents.
> 
> mtime, ctime and atime?
> 
> Reading a symlink's contents is reading the file name it points to?
> Also, how do we want to handle this on Win32? IIRC, it'll create .lnk
> files for its "symlinks". Shall we internally just use the name without
> the trailing .lnk? Or keep the suffix? From my memories, the Windows VFS
> API is a huge mess...  Recording this suffix (and possibly re-creating
> it) will be a huge mess if you replay the patch on a Un*x system as well
> as on Windows...

Yes, I mean reading the file name it points to.  We could just omit symlink
support on Windows if necessary, although I think we need to know that they
aren't real files--that's why I put "doesn't exist" as an option in the
enumeration of filesystem object types--it's where we'd stick things that
we can't handle.  Support for reading symlinks is definitely a future
feature.

> > I think ideally I might want these calls to accept the file name as a char
> > * to a non-null-terminated string together with a string length, since this
> > would work best with darcs' internal format for strings (which isn't null
> > terminated, so we can take substrings cheaply).
> 
> No problem with that. We'll then have to re-allocate for all file names
> within the C part to get something \0 terminated.

I was thinking that perhaps we could just have a static buffer (and check
that it's big enough).  If we're going to have to call malloc every time,
perhaps we may as well do that on the haskell side.  On the other hand, I'm
not really sure how expensive a malloc is compared with a stat--perhaps
it's negligible.  On the third hand, if malloc is negligible, then it's
probably also negligible (although perhaps less so) when done on the
haskell side.

> > I'd also need a function to read the contents of a directory.  I'm not
> > quite sure how best to return the results from this.  I'm thinking you
> > could malloc an array of char, and then return the filenames as a
> > null-terminated list? I'm not sure.  Returning variable-size arguments from
> > C to haskell isn't as easy as fixed-size arguments.
> 
> Maybe something like a "directory handle" could be created to which the
> Haskell code could refer to?
> 
> DIRHANDLE get_dir_contents (char *dirname, size_t dirnamelen);
> 
> ...to be accessed by:
> int get_number_of_dir_entries (DIRHANDLE handle);
> struct darcs_stat * get_dir_entry (DIRHANDLE handle, int i);
> 
> (returning NULL for i outside bounds) and finished with
> 
> int put_dir_contents (DIRHANDLE handle);
> 
> With C code, this could be nicely used in a loop. Maybe Haskell can
> access it like this, too, with DIRHANDLE basically being a pointer? In
> Haskell, you'd be able to fill an own list with multiple calls to
> get_dir_entry(), right?

This would be reasonable, but I'm not sure... would the darcs_stat include
the filename of the entry?

I guess this way we're mallocing the darcs_stat each time we call
get_dir_entry, and it'll need to be freed explicitly on the haskell side?
That's not bad.

I'm not clear what the put_dir_contents would do.  Would it basically be
like a free_dir_contents?

It just occurred to me that when reading the filenames in a directory, we
may as well malloc them and then free them from the haskell code.  The
PackedStrings hold what are basically C strings, so this would eliminate
any copying of the filename, and it would only get malloced once and freed
once, which is as efficient as we can get.  So I'm imagining something like

char *get_dir_entry (DIRHANDLE handle);

which would return a malloced string that the caller is responsible for
freeing.

> > A "readfile" function might be nice, that would read a file into a
> > buffer.
> 
> You don't want to do that :-)  Even right now, the memory footprint is
> something that could be proudly presented. (My understanding is that
> this is more because of the way the compiler works, not generally
> because the code is poorly designed.)

Actually, it's what we currently do.  If mmap is available (and the file
isn't user-modifiable) we mmap it, but otherwise we just read it in.  Most
of the memory usage isn't the file contents themselves, but the lists of
line endings.

> > This could come later, though, when I'm moving away from haskell IO.
> > Also, eventually, simple wrappers to output IO would be helpful, again
> > when moving away from haskell IO.  These would allow me to avoid the
> > cost of translations between haskell Strings (which are linked list of
> > 32 bit unicode characters) and arrays of char.  Also in this category
> > would be a
> 
> A linked list?! ...and in full-blown 32bit instead of something as sexy
> as UTF-8? Wow, that's a design decision 8^O

Well, that's what a haskell String is, and when doing IO using the haskell
standard libraries, there's no other choice.

> Is it 32bit unicode even on Windows? IIRC, all (newer) windows internal
> APIs will use wchar_t, which is only 16bit on Windows.

Yeah, it's 32 bits everywhere.  But only 8 bits of those are non-zero,
since there aren't any implementations of haskell that support unicode...
:) Perhaps you can see why I'd like to move away from the haskell IO
libraries... the language is great, but its standard libraries leave a lot
to be desired.

> However, in the long term, there needs to be some decision about how to
> use and store file names. IIRC, Windows offers a plain open() as well as
> one that accepts 16bit wchar_t. Also, on e.g. Linux, you can use UTF-8
> encoded file names (or other encodings, as long as they don't contain \0
> or '/').

This certainly is an issue that'll need to be discussed.  My leaning is to
always deal with eight bit char * filenames.  On posixy systems filenames
are always simple byte sequences, and since any unicode translation may
fail (either one direction or the other), I'd rather just avoid it.

On the other hand, this is an area where the opinions of people who
actually use non-ASCII characters in their filenames matter more than mine.

> On Linux, you may have '\\' in a filename, whereas on Windows you'll
> face some trouble...

Yeah, but that's like having filenames that differ only in case--just don't
do it if you want your repository to be portable.  We *could* add a check
that prints a warning if you try to add a file with a '\\' in its name...

> > new interface to zlib, which would eliminate the whole stupid thread and
> > pipe business, and just run the threaded IO synchronously--and wouldn't try
> > to treat the thread as a haskell Handle.  But again, this would only be
> > used after I've got the haskell code reworked to avoid using Handles.
> 
> ACK :)  But I think the zlib code can be quite enhanced even without
> doing much Haskell hacking.

But I wouldn't want to focus too much on fixing up the current thread/pipe
trick.  I just did that because I couldn't figure out any other way to
write to a compressed file as if it were a normal file.  But it's ugly, and
I'd love to see it go.

One quick(ish) improvement would be to eliminate the thread/pipe business
when opening a file for read.  We always read the entire file when reading
from a compressed file (it's always a patch, and we're about to parse it
anyway), so there's no need for all the trickery.  We can just write a
function

char *read_possibly_compressed_file(char *fname);

which returns a newly malloced C string containing the file contents.  This
would make trivial the function

gzReadFilePS :: FilePath -> IO PackedString

which is the only function that calls gzopen for read access, and the only
user of gzread.

The only catch is that the current gzReadFilePS uses a mmapped PackedString
rather than a malloced one if the file isn't actually compressed.  So
perhaps we'd want (with a better name)

struct foo {
  char *mmapped_contents;
  char *malloced_contents;
};

struct foo read_possibly_compressed_file(char *fname);

where only one of the two pointers is not null, and thus the haskell code
knows how to free it.  Alternatively, we could output a flag telling
haskell how to do the free.  Or we could even give a function pointer, but
that starts getting scary.  Passing functions around in haskell is one
thing, but passing functions from C to haskell is quite another.
-- 
David Roundy
http://www.darcs.net