[darcs-devel] Re: patch for lazy partial repos
David Roundy
droundy at darcs.net
Fri Apr 13 15:22:15 PDT 2007
On Thu, Apr 12, 2007 at 03:23:59PM +0100, Simon Marlow wrote:
> Max Battcher wrote:
> >This may be completely off the mark, I'm entirely ignorant of how
> >partial repositories work (and I've not had the reason to work with
> >one thus far), but does darcs need the "url symlink" patches anyway?
> >
> >Why can't darcs just take on the lazy behavior whenever a patch just
> >doesn't exist or is perhaps an empty file or some other similar
> >thing... ie, in that changes -v when it needs the information in a
> >patch and the patch isn't available couldn't it just attempt an
> >auto-pull of that patch from _darcs/prefs/defaultrepo, unless some
> >alternative is specified in an argument?
>
> This sounds like much more reasonable behaviour than auto-downloading from
> the original location that was specified with 'darcs get'.
I'm pretty well convinced. The code will be a tad uglier, but it won't be
bad. My leaning is to have a new file _darcs/prefs/patch_sources (or
something like that), which will have a list of urls from which we will try
to grab patch files. I might keep the --lazy flag to get, but make it
just add the source repo to this file and otherwise act like --partial.
Another option would be to simply make --partial do this addition.
We could also add a --full-repo=url flag that either has the same effect,
or that also adds that url to the patch_sources file.
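A minimal sketch of the patch_sources idea, in Python for brevity rather than darcs's own Haskell. It assumes the file holds one location per line and that sources are tried in the order listed. The helper names (read_patch_sources, fetch_patch) and the fetch callback are hypothetical, not existing darcs code.

```python
from pathlib import Path
from typing import Callable, List, Optional


def read_patch_sources(prefs_dir: Path) -> List[str]:
    """Return the locations listed in prefs_dir/patch_sources, one per line.

    Assumption: blank lines are ignored and a missing file means no sources.
    """
    sources_file = prefs_dir / "patch_sources"
    if not sources_file.exists():
        return []
    lines = sources_file.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_patch(
    patch_name: str,
    sources: List[str],
    fetch: Callable[[str, str], Optional[bytes]],
) -> Optional[bytes]:
    """Try each source in order and return the first patch body found.

    `fetch(source, patch_name)` is a caller-supplied function that returns
    the patch bytes, or None if that source does not have the patch.
    """
    for source in sources:
        data = fetch(source, patch_name)
        if data is not None:
            return data
    return None  # caller can then report the missing patch to the user
```

With this shape, --lazy (or --partial) would only need to append the source repo's URL to the file at get time.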
As a related feature, it'd be nice to have a "file cache", which would hold
patch and inventory files. It could either have a fixed location (e.g.
~/.darcs/cache/) or be configurable. Then this would be the first place to
look for a missing patch file, and if the user specifies --cache-files (or
some similar flag), then they'd benefit in disk usage (on posix systems)
from file sharing, since we could just hard link from the file cache into
repositories, and they'd also avoid redundant downloads (caused by running
pull twice from the same or similar sources). In fact, it'd also make
sense to treat the current directory (the repository you're pulling into)
as a cache for the one you're pulling from, so you didn't have to download
patches that you already have, if the ones you've got are already in the
proper context (i.e. their hash matches). Hmmmm. This hashed inventory
business is cool... if only I had more time and energy to work on it!
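The file cache described above could look roughly like the following Python sketch. It assumes files are named by the SHA-256 hex digest of their contents (the actual hash darcs uses is not stated here), and obtain_patch, link_or_copy and the download callback are hypothetical names for illustration only.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Callable


def link_or_copy(src: Path, dst: Path) -> None:
    """Hard link src to dst, falling back to a copy where linking fails."""
    try:
        os.link(src, dst)
    except OSError:
        shutil.copyfile(src, dst)


def obtain_patch(
    file_hash: str,
    repo_dir: Path,
    cache_dir: Path,
    download: Callable[[str], bytes],
) -> Path:
    """Make the file named file_hash available in repo_dir.

    Lookup order: the repository itself, then the shared cache, then the
    caller-supplied download function. Downloaded data is checked against
    its hash (assumed to be SHA-256) before it is stored.
    """
    target = repo_dir / file_hash
    if target.exists():
        return target  # already have it, nothing to fetch

    cached = cache_dir / file_hash
    if not cached.exists():
        data = download(file_hash)
        if hashlib.sha256(data).hexdigest() != file_hash:
            raise ValueError("hash mismatch for " + file_hash)
        cache_dir.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)

    repo_dir.mkdir(parents=True, exist_ok=True)
    link_or_copy(cached, target)
    return target
```

Because the name is the hash, a file found locally or in the cache can be trusted without contacting the remote repository, which is what avoids the redundant downloads.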
Now I'm wondering about avoiding reparsing patch files by using weak
references and a hash table (indexed by the hashes themselves), so we don't
need to even reparse files that are already in memory, and we could
potentially also benefit from a fast compare that checks pointer equality,
although both of these would require trickier people than myself to
implement (or alternatively, more patient people).
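As a rough illustration of the weak-reference idea, here is a Python sketch using the standard weakref.WeakValueDictionary as the hash table. ParsedPatch, load_patch, same_patch and the parse callback are hypothetical stand-ins, and a Haskell implementation would look quite different.

```python
import weakref
from typing import Callable


class ParsedPatch:
    """Stand-in for a parsed patch; only the hash matters for this sketch."""

    def __init__(self, file_hash: str, contents: str) -> None:
        self.file_hash = file_hash
        self.contents = contents


# Table indexed by patch hash. Entries are held weakly, so a parsed patch
# is dropped from the table once nothing else in the program refers to it.
_parsed = weakref.WeakValueDictionary()


def load_patch(file_hash: str, parse: Callable[[str], ParsedPatch]) -> ParsedPatch:
    """Return the in-memory patch for file_hash, parsing only on a miss."""
    patch = _parsed.get(file_hash)
    if patch is None:
        patch = parse(file_hash)
        _parsed[file_hash] = patch
    return patch


def same_patch(a: ParsedPatch, b: ParsedPatch) -> bool:
    """Fast path: identical objects are equal; otherwise compare hashes."""
    return a is b or a.file_hash == b.file_hash
```

Since every live patch with a given hash is the same object, the identity check in same_patch usually succeeds without looking at the contents.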
> There are other things that could go wrong. For example, I'm used to
> pulling from a --partial repo to get a new --partial repo (e.g. the ghc-6.6
> repo used to be partial, it isn't any more).
Pulling from a --lazy repo will get you a new --lazy repo, but if either
the first --lazy repo disappears or the repo that it was gotten from
disappears you'd be hosed... in the sense that your --lazy repo would revert
to being a --partial repo.
> Personally I'd be happy if darcs said something like:
>
> cannot complete this operation because the following patch is not
> available:
> "blah blah ..."
> please use --full-repo P to specify where to fetch the patch from.
Yes, this does sound *very* good.
--
David Roundy
Department of Physics
Oregon State University