[darcs-users] Sphinx or reST (for user manual)?

Tue Oct 28 13:36:44 UTC 2008

I'm unilaterally moving this thread onto the mailing list because I
want to give any pro-Sphinx lurkers a fair chance to respond, and
because I want my rationale on record.

For those who have just joined us: Mark, Max and I have been working
on overhauling the user manual infrastructure as a prelude to doing
some serious copy editing, with the intention of increasing its
accessibility (more output formats), readability (that's the copy
editing) and maintainability (by switching from TeX to a lightweight
markup language).

So far, I've done some bulk (read: buggy) translation of src/*.tex to
reStructuredText (reST).  I posted the rendered output to the list
earlier this week.

    darcs get http://code.haskell.org/darcs/user-manual

Max then made some minor changes to demonstrate Sphinx, an extension
of reST that addresses some large-document issues with plain reST.

    darcs get http://repos.worldmaker.net/darcs/sphinx/

I've now had a chance to evaluate Sphinx for our user manual, and
below I make a case for sticking with plain reST, at least in the
short term.

On Tue, Oct 28, 2008 at 03:55:00AM -0400, Max Battcher wrote:
> * .. include becomes .. toctree
>
> * $Darcs_Version$ was replaced with |version| (an actual reST
>   substitution), which is pulled from the configuration file.

This was my intention all along, though I'd like to make it explicit
with this in GNUmakefile

    doc/manual/version.txt:
    	echo ".. |version| replace:: $(DARCS_VERSION)" >$@
    clean:
    	rm -f doc/manual/version.txt

and then simply have darcs.txt do ".. include:: version.txt".

> Anyway, I prefer the Sphinx output of a full documentation website
> over the single, giant HTML page

While I value the split-HTML output Sphinx can provide, I think the
single-file HTML output of rst2html is far from useless.  For example,
I prefer using grep(1) to non-standard search forms provided by
websites, and it's much easier to just do M-x occur in my browser, or
w3m -dump | grep if the document does not span multiple pages.

> I'd be interested in an up/down opinion on merging my changes...

Right now I don't want to apply this, for reasons I outline below.

Basically, I was interested in Sphinx for one feature: the ability to
emit multi-file HTML output, with <link rel=next/prev/toc> tags to
link the pages.

I previously assumed Sphinx did this like docbook: you handle include
directives normally, build a single intermediary XML document, then
emit an HTML page for each <chapter> in the intermediary (where the
file name is based on the chapter title).

It appears that Sphinx does not do this; instead it generates an HTML
file for each of the source documents, using a non-standard "toctree"
directive to add links between them.  Actually it's slightly worse
than that; sphinx-build will simply render ALL files (that have the
appropriate extension) in the source tree, even if they aren't listed
in a toctree.  This has the undesirable consequence of creating
HACKING.html in the output tree, though HACKING.txt is only relevant
to the technical writers (not readers).

More importantly, this non-standard directive is not understood by
tools that expect reST format.  In particular, this means that if we
use toctree we cannot easily produce

- single-file HTML via rst2html.

- an ODF word processor document via rst2odt.

- PDFs via rst2pdf.

  rst2pdf can use standard PDF fonts (Times, Helvetica) and create
  compressed PDFs, which currently HALVES the file size compared to
  rst2latex + pdflatex.

  rst2latex sometimes emits invalid LaTeX from valid reST, resulting
  in confusing bugs in the pdflatex stage.

  IMO rst2pdf output looks better out of the box, and it is arguably
  easier to customize with declarative JSON stylesheets than
  rst2latex's raw LaTeX "stylesheet".  (I'm assuming Sphinx doesn't
  improve on rst2latex's stylesheet support.)

  rst2pdf has few dependencies: itself and reportlab (both pure
  Python, thus portable), and optionally pyhyphen/wordaxe for
  hyphenation (which includes a C module).

  By comparison, to make a PDF via sphinx-build requires sphinx,
  texlive and -- due to the huge number of (mostly unused) .sty files
  included in the .tex sphinx produces -- many "extra" TeX packages.

  pdflatex also generates a heap of useless and confusing output
  unless you use a wrapper like rubber -- increasing the number of
  build dependencies.

This problem is probably surmountable by writing a small sed script to
convert toctree directives into ".. include::" directives.  It may even
be possible to simply do something like

    .. when:: sphinx

       .. toctree:: ...

    .. unless:: sphinx

       .. include:: ...
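
For the toctree-to-include conversion mentioned above, here is a
minimal sketch in Python rather than sed (multi-line directive bodies
are awkward in sed).  The function name toctree_to_include and the
".txt" suffix are my own assumptions for illustration; it also assumes
toctree entries are plain document names or "Title <docname>" entries
relative to the current file, and it simply drops options such as
:maxdepth:.

```python
import re

# Assumption: manual sources use a ".txt" suffix (hypothetical default).
SUFFIX = ".txt"


def toctree_to_include(text, suffix=SUFFIX):
    """Replace each Sphinx toctree directive with plain reST includes."""
    out = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = re.match(r"^(\s*)\.\. toctree::\s*$", lines[i])
        if not match:
            out.append(lines[i])
            i += 1
            continue
        indent = match.group(1)
        i += 1
        # The directive body is every following line that is blank or
        # indented more deeply than the directive itself.
        while i < len(lines):
            body = lines[i]
            if body.strip() and not body.startswith(indent + " "):
                break
            entry = body.strip()
            i += 1
            if not entry or entry.startswith(":"):
                continue  # skip blank lines and options like :maxdepth:
            # "Title <docname>" entries keep only the docname.
            titled = re.match(r"^.*<(.+)>$", entry)
            docname = titled.group(1) if titled else entry
            out.append(f"{indent}.. include:: {docname}{suffix}")
        out.append("")  # one blank line closes the converted block
    return "\n".join(out) + "\n"
```

The converted text could then be fed to rst2html, rst2odt or rst2pdf
while the toctree version stays in the tree for sphinx-build.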

However I'd like to take a different approach.  In short, rst2html is
adequate for our needs except for one thing: it can't produce
docbook-style split-HTML output.  Let's simply extend the stock HTML
writer class (or possibly write a second HTML writer class) with this
feature.  This means that

    1. other docutils users benefit;
    2. the aforementioned toctree issues go away; and
    3. Sphinx's API-oriented features don't get in our way.

IMO split-HTML is an important feature, but not a critical one.
Therefore, we don't have to deliver this feature right away; all we
have to do immediately is produce PDF and single-HTML output that is
no worse than what we have with pure LaTeX -- and I think we can
achieve that with docutils alone.  Supporting split-HTML can be a
mid-term goal.

--------------------------------------------------

Here's some other grumbling that isn't really relevant to my
arguments above.

- Having a second makefile is just wrong, see Peter Miller's article
  "Recursive Make Considered Harmful".  This isn't a problem with
  sphinx itself, just sphinx-quickstart.

  Similarly, I don't like having non-documentation files in the
  doc/manual source tree (particularly conf.py and _foo).  I have
  determined that this can be obviated by a make rule like this:

  clean:
      rm -rf doc/manual/html doc/manual/conf.py
  doc/manual/html: doc/manual/*.txt
      mkdir -p $@
      echo >$(<D)/conf.py  "# This stub keeps sphinx happy."
      echo >>$(<D)/conf.py "# See GNUmakefile for actual config."
      sphinx-build \
        -Dversion=$(DARCS_VERSION) \
        -Dsource_suffix=.txt \
        -Dmaster_doc=darcs \
        $(<D) $@

- sphinx-build ALWAYS emits SGR (ANSI) escape sequences.  Since I
  often use dumb terminals and pure (non-tty) buffers, this means I
  have to go out of my way to make sphinx-build's output readable.

- I don't like tweaking stylesheets and templates; it takes time both
  up-front and for ongoing maintenance (so it works with newer
  versions of sphinx), it tends to introduce accessibility issues,
  and it tends to introduce rendering issues in the buggy (i.e. all)
  browsers you forget to test with.

  And because Sphinx's default templates mention "modules", we'd have
  to make and maintain at least minimal changes to make it "right" for
  a user manual.
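
On the escape-sequence complaint above: the kind of workaround I mean
is a filter along these lines.  This is only a sketch; the names
strip_sgr and filter_lines are hypothetical, and it removes only SGR
sequences (ESC [ parameters m), not other terminal control codes.

```python
import re

# SGR (Select Graphic Rendition) sequences look like ESC [ 1 ; 32 m.
SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_sgr(text):
    """Return text with all SGR escape sequences removed."""
    return SGR.sub("", text)


def filter_lines(lines):
    """Yield each line from an iterable with SGR sequences stripped."""
    for line in lines:
        yield strip_sgr(line)
```

Used as a pipe filter, this amounts to calling
sys.stdout.writelines(filter_lines(sys.stdin)) and piping
sphinx-build's output through the script.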