Su Tech Ennui: Semi-literate programming

Friday, April 11, 2008

Semi-literate programming

Don Knuth had a problem. He wanted to write a book on programming algorithms, but wasn't happy with the tools available. So he wrote TeX, a major piece of typesetting software that not only has a huge following among people who appreciate good typesetting practices, but which has also had its core algorithms ripped off by just about all the GUI typesetting packages on the market. Or at least, the better ones.

Don Knuth had another problem. He wanted to write a large typesetting package, but wasn't happy with the tools available. So he wrote WEB, a tool that basically presents your source code in a more readable way - suitable indeed for publishing as a book. His argument was (I hope I'm not mischaracterising him) that we read programs more than we write them. So his WEB system made reading programs really easy, though in my humble opinion the WEB sources were still damned ugly to write.


Meanwhile, on the other side of the pond, Inmos was building transputers, and they created a parallel programming language called OCCAM which, among other things, was innovative in its layout style: it made the depth of indentation significant to the compiler rather than just to the reader (a trick that Python has since copied).

This feature allowed them to add a tweak to their text editor, which was the ability to fold a section of code so that it wasn't all displayed at once; but if you went into a fold (I was going to say 'clicked on it', but this was before the days of rabid clicking) it would expand in situ.


Many years later, and we have the World Wide Web (as opposed to Knuth's WEB) and we're still struggling to publish software in a way that's easy to read online. Knuth's stuff is great for books, but not so hot for web pages.

And we have various tools that take source code and display it as HTML, though they are not all of equal quality. Some of them seem to take the approach that if you can distinguish a syntactic feature, you should highlight it, in as bright a colour as possible!

And finally we get to the meat of today's post. Borrowing a little from Knuth, and a little from Inmos, I've modified my C to HTML filter so that it adds folds to the HTML view of a program. Although it's technically possible to do so automatically as Inmos did with OCCAM, it's not desirable because doing so would fold at syntactic units rather than logical groups. So I have taken the old school approach of inserting markup into the source file for the benefit of the folding code.

I've tried to do so fairly unobtrusively, with markers that don't grate on the sensibilities too much when you're editing the raw text, and they're hidden completely in the HTML view. After I wrote this I discovered that there is a sort of a standard for folding markup ("{{{" and "}}}"), but rather than go back and change my code to the standard, I'm going to stick with my first idea for now. I may make that change later if I want to use a folding editor on my sources, but for now I'd rather have the aesthetics of my own scheme, which I think is less intrusive.

Here's my current programming project: it's about 2000 lines of C, but displays as a single page. My rule of thumb in selecting what to mark is to fold about a screen's worth at a time - pretty much the same rule we used to use when writing procedures with a regular text editor, except now our screens are significantly larger.

I have to say I like the way it has come out. It doesn't have all of Knuth's tricks, such as the ability to reorder program fragments, but I think the one thing it does, it does well, and gives most of the benefit that came from Knuth's WEB - the ability to read the overall structure and a textual description of an algorithm, and only push down into the code if you really need to.

This was an easy hack to add to a C to HTML filter, and I've already done the same trick for PL/SQL. Duplicating it for other languages should be quite easy too.

1 comment:

G said...

There is now a webified interface in the style of Knuth's WEB.

Having played around with it a little, I'm not sure yet if I like it. In fact I'm beginning to think that the key feature that distinguishes Knuth's WEB from pale imitators, and which is its major strength, is also its major weakness... and that is the reordering of source code. If you think that code peppered with GOTOs is hard to follow, how much harder to follow is it when the source code itself is randomly ordered?

I'm also having doubts about Knuth's premise that we read code more than we edit it. That's certainly not the case for me, and having tried a little WEB (although to be fair, possibly not enough to have really appreciated it) I'm finding that I'd much rather edit something that looks like regular source code and derive the printable/documented version from that, rather than deriving the source code from the documentation.

I was going to say 'time will tell' but I think that given the number of years since WEB came out, and its tiny following, time has already told.