[darcs-users] Applying formal descriptions to files

Max Battcher me at worldmaker.net
Thu Feb 5 19:29:44 UTC 2009


Maurício wrote:
> I'm not familiar with this, but you guys who did
> computer science surely are, since I've seen it used
> a lot (like, say, in Haskell98 report). If I
> understand properly, a 'metasyntax' allows one to
> describe unambiguously the structure of a file
> (please correct me if I'm wrong). I see this, for
> instance, among many I found in wikipedia:
> 
> http://en.wikipedia.org/wiki/Wirth_syntax_notation
> 
> (I think it looks like the one used in Haskell98,
> although with small differences.)
> 
> What if we had the option to attach such kind of
> descriptions (can I call them "descriptions"?) to
> files under darcs control? If there exists a
> consolidated and popular such metasyntax (is
> there?), darcs could use it to "understand" the
> file, and remember changes in a structured way.
> 
> Does that make sense for someone who, unlike me,
> does understand computer science?

It's been suggested before, but there are a lot of pitfalls and the whole
thing is much harder than it seems.  Let me try my best to break it down
into sub-problems:

1) Metasyntaxes such as BNF/WSN are notations for writing down formal
grammars, specifically "Context-Free Grammars" (CFGs).  A CFG can describe
the syntax of only a subset of languages, the context-free ones: each rule
says how a single symbol may be expanded, and that expansion is the same no
matter what surrounds the symbol.

- Not all languages can be described by a CFG.  In fact, most modern
programming languages are not truly context-free; they are usually "CFG with
exceptions".  The exceptions vary from language to language, and there is no
agreed-upon standard for describing them.

- There are no real repositories of well-defined grammars. In most cases the
definitive grammar for a language is the one embedded into that language's
definitive compiler, and often intertwined with its code.  Even the
metasyntax of choice often varies from language to language.

2) A grammar written in a metasyntax can only parse "well-formed" documents.

- Not all programming results in a well-formed document: programmers often
work on "scraps" (work-in-progress code or pseudo-code fragments) that don't
compile/parse at various points in time.

3) Converting a parse tree back to text is a whole other ball game.

- Now that you've parsed all of the whitespace out, how do you plan to put
it back in when it comes time to edit your file in a text editor?  No
metasyntax standard that I know of has any sort of generated whitespace
hinting.
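
To make points 2 and 3 concrete, here is a small sketch that uses Python's
own parser as a stand-in for a grammar-driven one.  Python is only an
illustration here, not something darcs uses; `ast.unparse` needs Python 3.9
or later, and the helper names `parses` and `round_trip` are made up for this
example.  A half-written scrap is rejected outright, and a round trip through
the tree loses the comment and the original spacing:

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is a well-formed Python module."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def round_trip(source: str) -> str:
    """Parse `source` into a tree, then print the tree back out as text."""
    return ast.unparse(ast.parse(source))


# Point 2: a work-in-progress scrap is not a well-formed document.
scrap = "def total(xs):\n    return sum(xs) +\n"
finished = "def total(xs):\n    return sum(xs) + 1\n"

# Point 3: the tree keeps no record of spacing or comments.
original = "x   =   [1,2,\n         3]   # keep aligned\n"
regenerated = round_trip(original)

if __name__ == "__main__":
    print(parses(scrap))       # False
    print(parses(finished))    # True
    print(repr(regenerated))   # 'x = [1, 2, 3]'
```

The regenerated text means the same thing to the parser, but it is not the
file the programmer wrote, which is exactly the problem for a tool that has
to hand the file back to a text editor.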

There are other kinds of problems too, but I think the above covers the big
ones.  For the most part I don't think you will find it easy to define good
grammars for most of the languages you might want to work with, and most of
the languages that you _can_ define easily with a CFG already have
better/easier domain models (structured formats such as XML, JSON, what have
you).
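
As a rough illustration of what a domain model buys you for a format like
JSON: instead of writing a grammar, you let a ready-made parser hand you
plain dictionaries and lists, and a structural comparison falls out almost
for free.  The sketch below uses Python's standard `json` module; the
`changed_paths` helper and its path format are invented for this example and
have nothing to do with darcs's patch format.

```python
import json


def changed_paths(old, new, path=""):
    """Yield the paths at which two decoded JSON values differ."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}/{key}"
            if key not in old or key not in new:
                yield child  # key was added or removed
            else:
                yield from changed_paths(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (a, b) in enumerate(zip(old, new)):
            yield from changed_paths(a, b, f"{path}/{index}")
    elif old != new:
        # Differing scalars, differing types, or lists of unequal length.
        yield path or "/"


# Same data, different layout and key order: no structural change.
compact = json.loads('{"name":"darcs","tags":["vcs","haskell"]}')
pretty = json.loads('{\n  "tags": ["vcs", "haskell"],\n  "name": "darcs"\n}')

# One value really changed.
edited = json.loads('{"name":"darcs","tags":["vcs","patches"]}')

if __name__ == "__main__":
    print(list(changed_paths(compact, pretty)))   # []
    print(list(changed_paths(compact, edited)))   # ['/tags/1']
```

Note that this only reports where two documents differ; it does not solve
the whitespace problem from point 3, since writing the data back out still
means choosing a layout.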

Not to say that the idea is without merit; just that it is a harder problem
than it might at first appear.  I'll save my own ideas on how it might be
done for another time...

--
--Max Battcher--
http://worldmaker.net


