[darcs-devel] nolinks argument
Kevin Quick
quick at sparq.org
Tue Jul 17 00:05:48 PDT 2007
Summary: after looking at things I still prefer making the copy/link
discrimination in copyLocal (i.e. at the bottom level) and ensuring
that [DarcsFlag] is passed on down through all paths even though this
has the "splatter" effect on many functions.
If after reading my comments below anyone would still like to see an
actual implementation of Eric's alternative proposal then let me know,
but if the comments below suffice then I don't need to spend the time...
> If I'm reading the code right, the hard-link code is used for at least
> two things:
>
> * fetchFile/gzFetchFile - links/copy/downloads a file into a temporary
> directory
>
> * copy_repo_patches - copying the patch files
>
> Now, would you agree that only the latter is important? (Anyway,
> linking here seems unlikely for the former, as /tmp is usually on a
> separate partition). If that is the case, maybe we could get the same
> benefits by having two versions of copyFileOrUrl, one of which supports
> NoLinks.
As it turns out, fetchFile and gzFetchFile are both used to read the
specified file into local memory, and if their argument is a file, they
will do so directly. They only call copyFileOrUrl if they know it
isn't a file, meaning these will never end up invoking copyLocal.
Thus, these paths never involve a link v.s. copy decision. I *could*
reduce the splatter a bit by not requiring [DarcsFlag] to be an
argument to fetchFile/gzFetchFile (internally supplying a null array to
copyFileOrUrl), but I that (a) assumes that no other [DarcsFlag] might
eventually be useful down below, and (b) silently replaces actual
[DarcsFlag] values with something else, which sounds like a bug hunt
nightmare at some future point. So as long as there's splatter, I'd
include these functions.
>
> The patch Kevin made is pretty straightforward stuff, just modify
> copyLocal to track [DarcsFlag]. The only annoying thing is that as a
> consequence, a lot of unrelated functions (see below) are getting
> splattered with the extra argument by their association with copyLocal
>
> Perhaps having *two* copyFileOrUrl functions would be a lesser evil than
> splattering all those poor little functions (*) What do you (all) think?
** There are several paths that end up at copyFileOrUrl/copyLocal. Not
a lot, but having the determination made at upper levels seems to add
clutter and decisions about low-level issues that don't really need to
belong at those levels. And future darcs development would have to
remember to re-implement this stuff as appropriate, with insidious bugs
(i.e. non-obvious, corner case) resulting if it didn't.
For example, HashedRepo.lhs calls copyLocal in write_either_inventory,
which has a couple of paths that lead to there; write_either_inventory
would need the copy/link logic and unfortunately it doesn't have
[DarcsFlag] info so it would need some splatter as well. Since the
copyLocal seems to work just fine, there's nothing to indicate that
anything is wrong until ultimately a user notices that their inventory
file(s) are links even though the --nolinks flag was used.
** Many of the functions in External.hs (which is where copyLocal
exists) already have [DarcsFlag] arguments, including--ironically
enough--copyLocals. Thus needing/applying [DarcsFlag] information at
that level isn't unusual.
** The [DarcsFlag] is really an environment (ReaderT monad anyone?) that
describes how darcs should work overall. Anything which might be
affected by an environment setting will need access to to that
environment. Sorry, this is sounding pedantic, but allowing the
information to flow through to where it's used seems most appropriate.
As an aside, it might be interesting to replace the IO monad usage
within darcs with a more abstract monad (e.g. "DarcsIO") which would
allow monad stacking like adding a ReaderT to hold [DarcsFlag] and
maybe other things like a StateT for the main repo or WriterT for
accumulating output changes. I'm not sure if this would improve things
or not: it might reduce the number of arguments passed around, simplify
existing structures (e.g. no need to embed [DarcsFlag] in the Repository
objects), and remove the potential for future splatter, but on the
other hand it would require familiarity with those monadic styles which
might be more obscure instead. Comments anyone?
> Moreover, I believe copyFileOrUrl has a dangerously misleading name
> because it does not make clear the fact that file could potentially
> be linked. ...
> ... I was thinking that we could mirror the copy/clone distinction and
> have a function cloneFileOrUrl. ...
Unfortunately, cloneX is used quite a bit in External.hs and it has
nothing to do with links, so at a minimum some other name would need to
be used.
Personally, I'd rather have one function whose name indicated its
overall function and look in that function for the details. Having
multiple functions with slightly different names and slightly different
functionality but which all appear to act the same from a high-level is
more confusing to me.
> > * Eric, I think you are also correct in that only darcs get is actually
> > affected by --nolinks. This seems to be because push/pull will read
> > the patches into memory, perform various commutation analysis and
> > whatnot, and then write them as new files to the output repository.
>
> I'm not sure what to think of this.
>
> I would be inclined to taking the arguments out, and adding back should
> the performance enhancements be made. I appreciate the need for
> consistency. I just worry that it might mislead the user in building
> the wrong mental model of darcs (albeit not fundmentally wrong as darcs
> could do it in principle).
I'd prefer to have the user build a mental model that darcs is really
good at things and then try to make the implementation match that
model. :-) It turns out that even darcs get will only create links
when doing an "old-style" copy; if you have a HashedRepo it will
actually read every remote patch into memory and write it as a separate
operation (assuming I'm reading the code correctly). My hunch is that
this is a slow way to do things and that we should at least be using
System.Directory.copyFile if we can't link files. This lets the
implementation of System do the most optimal file copying method
available and frees up darcs memory for more important stuff like
patch commutation and merging and whatnot (and avoiding GC cycles).
As always, I appreciate the review and feedback, and the above is
simply the results of additional investigation and opinionation on my
part. Let me know what does and doesn't sound good and which way you'd
like me to proceed.
--
Kevin Quick
quick at org after sparq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20070717/c4d72927/attachment.pgp
More information about the darcs-devel
mailing list