scd2html - generate HTML from scdoc source files
As hinted at in the last status update, I felt compelled to create scd2html. There are obviously other options for creating HTML versions of man pages, so I figured it would be worth writing down how I ended up here. There are a few things to be learned along the way, about roff, about scdoc, and - in a plot twist that I bet you did not see coming - about WebAssembly.
The problem space
The problem to be solved can be stated as follows. Find tools that allow me to write man pages for my projects with the following constraints:
- Have a single source file in an easy syntax
- Generate a decent roff output to be viewed with man
- Generate a decent HTML output to be viewed in browsers
Note that these constraints are not well-defined: words like easy and decent leave plenty of room for interpretation. But that’s what I started with.
There are two obvious approaches here. Either use a source format that can be converted to both roff and HTML, or convert to roff first and convert the result to HTML.
The latter approach is often used, but has issues. They stem from the fact that roff is a presentation layer, so the HTML loses most of the document structure. Take lists, for example: in roff, list items are essentially indented text preceded by a bullet point that’s slightly less indented. That is why mandoc, one of the most popular converters from roff to HTML, does not generate any actual HTML lists (you can inspect, for example, the source of the lists section in scdoc(5)). Not only is that an accessibility issue, it even breaks the presentation if a list item spans multiple lines (you can see it by looking at the same section while making the window very narrow).
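To make the difference concrete, here is a minimal Python sketch that renders the same two items once the roff way and once as a semantic HTML list. The function names are made up for illustration, and the roff side uses the common `.IP \(bu 4` idiom from the man macros (an indented paragraph tagged with a bullet glyph); it is not meant to reproduce what any particular tool emits.

```python
from html import escape


def render_roff_list(items):
    """Presentational: each item is an indented paragraph tagged with a bullet."""
    lines = []
    for item in items:
        # .IP <tag> <indent>: indented paragraph; \(bu is the bullet glyph.
        lines.append(r".IP \(bu 4")
        # Note: text starting with "." or "'" would need escaping in real roff.
        lines.append(item)
    return "\n".join(lines)


def render_html_list(items):
    """Structural: the markup itself says "this is a list of items"."""
    lines = ["<ul>"]
    for item in items:
        lines.append(f"<li>{escape(item)}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    items = ["first item", "second item"]
    print(render_roff_list(items))
    print(render_html_list(items))
```

Nothing in the roff output says "list": a converter working from it only sees paragraphs with a tag and an indent, which is why going from roff to proper `<ul>`/`<li>` markup is guesswork.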
At some point I was almost willing to accept this, as I really liked the HTML that Pandoc generates from roff. However, I started looking into using Pandoc’s autolink_bare_uris extension to generate clickable links in the HTML output. Unfortunately, I discovered that Pandoc takes a very “liberal” approach to parsing email addresses. That approach essentially rules out all lists.sr.ht mailing lists, so I decided that I did not want to work with Pandoc any more.
So I would prefer to generate the HTML from something that is not roff. I have tried AsciiDoc, but it does a bit too much. I like neither the syntax nor the default HTML output (which, for example, insists on including JS). Apparently AsciiDoc has even added some Markdown compatibility, but in my opinion that just makes it even more confusing.
In praise of: scdoc
I had already started using scdoc for some projects. In my opinion, it’s amazing. Both the language and the tool are laser-focused on a single task: provide a simple syntax and generate man pages from it. The syntax takes cues from Markdown, making it easy to remember the basics. Where it deviates (e.g. tables), it can be explained in a few paragraphs.
Writing something that generates HTML from scdoc files was an obvious candidate. The only “problem” was that the code has the same laser-focus on its single job. I recommend that you read it, really. Even though some of the syntax constructs are not trivial to parse, the code looks very simple. When trying to change anything, though, you realize the complex machinery that it adds up to. Nothing can just be removed, and adding something is more difficult than it initially looks.
Enter scd2html
As you already know, I ended up doing it anyway. That it worked is mostly owed to the clarity of the original code, my acceptance of the fact that the new code is much uglier, and to some extent a compromise on features to not mess it up even more.
The features that scd2html brings are:
- scdoc as input format - allows me to stick to scdoc (see above)
- Generate terse HTML - the output uses semantically meaningful tags (<header>, <section>, lists), and very little inline styling. Even unstyled, the output represents a reasonable result, except for tables.
- Automatic links - a re2c-based detection of links and emails was added, automatically generating proper hyperlinks.
- Section anchors - you can link to individual sections of a man page. A feature quite necessary for larger man pages.
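To illustrate the last two features, here is a rough Python sketch of link detection and section anchors. It is not how scd2html does it: the real tool uses a re2c-generated scanner in C, while this uses Python's `re` module with deliberately simple patterns. The function names, the slug scheme, and the example address are all assumptions made up for this sketch.

```python
import re
from html import escape

# Simplified patterns (assumption): real URL and email grammars are more involved.
# The email local part allows "~" and "/" so that list addresses of the form
# ~user/list-name@lists.sr.ht are matched as a whole.
LINK_RE = re.compile(
    r"(?P<url>https?://[^\s<>]+)"
    r"|(?P<email>[\w.+~/-]+@[\w-]+(?:\.[\w-]+)+)"
)


def autolink(text):
    """Escape text for HTML and wrap bare URLs and email addresses in <a> tags."""
    out = []
    pos = 0
    for match in LINK_RE.finditer(text):
        out.append(escape(text[pos:match.start()]))
        target = match.group(0)
        href = target if match.group("url") else "mailto:" + target
        out.append(f'<a href="{escape(href)}">{escape(target)}</a>')
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


def section_slug(title):
    """Derive a hypothetical id attribute value from a section title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


if __name__ == "__main__":
    print(autolink("Send patches to ~user/example-devel@lists.sr.ht"))
    print(section_slug("The problem space"))
```

With an id like this on each section heading, a link ending in `#the-problem-space` jumps straight to that section.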
The styling is “take it or leave it”. Passing the -f flag generates a fragment, which can then be embedded into an HTML page with custom styling. Examples of the built-in style (inspired by Pandoc) can be seen here: scdoc(1), scdoc(5), vsync(1).
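Embedding such a fragment could look roughly like the following Python sketch. It assumes the fragment has already been generated and is available as a string; the template, the function name, and the stylesheet path are hypothetical.

```python
from html import escape

# Hypothetical page skeleton; only the fragment comes from scd2html.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<link rel="stylesheet" href="{stylesheet}">
</head>
<body>
{fragment}
</body>
</html>
"""


def embed_fragment(fragment, title, stylesheet):
    """Wrap an already generated HTML fragment in a page with custom styling."""
    return PAGE_TEMPLATE.format(
        title=escape(title),
        stylesheet=escape(stylesheet),
        fragment=fragment.strip(),  # the fragment is trusted HTML, so not escaped
    )


if __name__ == "__main__":
    print(embed_fragment("<section>...</section>", "scdoc(1)", "style.css"))
```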
And now for something completely different
After building this, I thought: wouldn’t it be nice if folks could give this a try on their own files before going forth and compiling this? And I took this as an excuse to play with something I had long since been wanting to play with: emscripten.
I am not exactly a fan of JavaScript or WebAssembly, but the idea of running C code in the browser was interesting enough that I wanted to give it a try to understand how it works. Turns out, it’s fairly straightforward. I added a few commits to a wasm branch if you are interested. You can find the result here: bitfehler.net/scd2html. You can simply open a scdoc file from your computer and it will display the HTML generated by scd2html.
I don’t think this is a suitable means of distributing software, but it certainly does make for an interesting toy…
That’s it. As always, I’d love to see your feedback in my public inbox, or find me in the #sr.ht.watercooler IRC channel!