Su Tech Ennui: Quick hack #17 in a series of 42: inlining LaTeX "\newcommand" macros

Sunday, November 4, 2007

A poster on the TeXhax mailing list asked how he could pre-process his LaTeX file to remove his own macros (created using \newcommand) because his typesetter for some reason does not allow user-defined macros. It turns out that TeX/LaTeX doesn't have the ability to output the de-macroed source, so I hacked up a macro processor in C that's compatible with LaTeX's syntax and leaves anything it doesn't recognise untouched.

I was quite surprised how hard it was to get right. It took me about 3 hours to write rather than the 30 minutes I expected, and here's why. In previous code where I've needed to do macro expansion, I've done it as a filter on reading the source stream. As a result, if a macro definition included a call to another macro, that call was trivially expanded on the fly before being fed into the new definition. This behaviour falls out in the wash with no explicit effort, and for the sort of things I've needed it for in the past (such as an alias mechanism for a command-line shell, or a cpp-like source-code preprocessor) it's actually what you want, because aliases chain correctly when redefined (e.g. with alias ls = ls -l, you don't need to worry whether ls is already an alias or not).
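To make the "expand at definition time" behaviour concrete, here is a minimal sketch in C. It is not taken from any of the code mentioned above: the table layout and the names alias_lookup, alias_define and expand_words are all made up for illustration, and it assumes words are separated by single spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ALIASES 16
#define MAX_TEXT    128

/* Hypothetical alias table: names and their already-expanded bodies. */
static char alias_name[MAX_ALIASES][MAX_TEXT];
static char alias_body[MAX_ALIASES][MAX_TEXT];
static int  alias_count = 0;

/* Return the stored body for name, or NULL if name is not an alias. */
const char *alias_lookup(const char *name)
{
    for (int i = 0; i < alias_count; i++)
        if (strcmp(alias_name[i], name) == 0)
            return alias_body[i];
    return NULL;
}

/* Expand each space-separated word of src once against the current table.
   Stored bodies are already fully expanded, so nothing is rescanned. */
static void expand_words(const char *src, char *dst, size_t dstsize)
{
    char copy[MAX_TEXT];
    snprintf(copy, sizeof copy, "%s", src);
    dst[0] = '\0';
    for (char *w = strtok(copy, " "); w != NULL; w = strtok(NULL, " ")) {
        const char *body = alias_lookup(w);
        size_t used = strlen(dst);
        snprintf(dst + used, dstsize - used, "%s%s",
                 used ? " " : "", body ? body : w);
    }
}

/* Define or redefine an alias; the body is expanded BEFORE it is stored,
   so a reference to the old definition of the same name is resolved now. */
void alias_define(const char *name, const char *body)
{
    char expanded[MAX_TEXT];
    expand_words(body, expanded, sizeof expanded);

    int i;
    for (i = 0; i < alias_count; i++)
        if (strcmp(alias_name[i], name) == 0)
            break;
    if (i == alias_count) {                 /* new name: claim a slot */
        assert(alias_count < MAX_ALIASES);
        snprintf(alias_name[alias_count++], MAX_TEXT, "%s", name);
    }
    snprintf(alias_body[i], MAX_TEXT, "%s", expanded);
}
```

Calling alias_define("ls", "ls -l") and then alias_define("ls", "ls -F") leaves "ls -l -F" in the table: the second definition picks up the first one automatically, and the alias never refers to itself, so there is nothing to loop on.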

However, TeX's mechanism works differently: a macro body is stored unexpanded, and the text generated by a call is then reprocessed for recursive macro expansion. This is useful in TeX because it lets you use a macro to generate other macros, but it does complicate the business of writing the macro expansion code.
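A rough sketch of that rescanning behaviour in C follows. It is only an illustration, not the code from the actual program: the two-entry macro table, the names next_char, push_back, push_char and expand, and the pushback stack are all assumptions, and it ignores macro arguments and TeX's rule about skipping spaces after a control word. It also has no guard against a macro that expands to itself, which would loop forever.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PUSH_MAX 1024

/* Hypothetical macro table; the body of \title calls \name. */
static const char *macro_name[] = { "\\name", "\\title" };
static const char *macro_body[] = { "Graham", "Dr. \\name" };

static char pushed[PUSH_MAX];   /* pushback stack, top is pushed[npushed-1] */
static size_t npushed = 0;
static const char *source;      /* unread part of the original input */

/* Next character: pushed-back text first, then the original input. */
static int next_char(void)
{
    if (npushed > 0) return (unsigned char)pushed[--npushed];
    return *source ? (unsigned char)*source++ : EOF;
}

/* Push text back so that its first character is the next one read. */
static void push_back(const char *text)
{
    size_t len = strlen(text);
    assert(npushed + len <= PUSH_MAX);
    while (len > 0) pushed[npushed++] = text[--len];
}

/* Un-read a single character (EOF is silently ignored). */
static void push_char(int c)
{
    if (c == EOF) return;
    assert(npushed < PUSH_MAX);
    pushed[npushed++] = (char)c;
}

/* Copy input to out, replacing known macros; each body is pushed back
   onto the input so that macro calls inside it are expanded in turn. */
void expand(const char *input, char *out, size_t outsize)
{
    size_t n = 0;
    int c;
    source = input;
    npushed = 0;
    while ((c = next_char()) != EOF && n + 1 < outsize) {
        if (c != '\\') { out[n++] = (char)c; continue; }

        char name[64] = "\\";
        size_t len = 1;
        while ((c = next_char()) != EOF && isalpha(c) && len + 1 < sizeof name)
            name[len++] = (char)c;
        name[len] = '\0';
        push_char(c);               /* un-read whatever ended the name */

        const char *body = NULL;
        for (size_t i = 0; i < sizeof macro_name / sizeof macro_name[0]; i++)
            if (strcmp(macro_name[i], name) == 0) body = macro_body[i];

        if (body != NULL)
            push_back(body);        /* rescan the generated text */
        else                        /* unknown macro: leave it untouched */
            for (size_t i = 0; i < len && n + 1 < outsize; i++)
                out[n++] = name[i];
    }
    out[n] = '\0';
}
```

The point to notice is that expand() never writes a macro body straight to the output. The body goes back onto the front of the input, so the \name inside \title is found by the same loop that found \title.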

The solution I used was one that I had first come across in 1977 in an implementation of the POP2 language, called "Wonderpop". Wonderpop treated its input stream as a list like any other list. The last element of the list was a function call to fetch data from the input file (or console), and if that was all that was left in the list, then it was just like normal I/O ... but if you joined some other elements onto the front of the list, the next time you read from it, you'd get those elements before you'd get the remaining data in the source file. In C terms it was something like ungetc(), but far more powerful because the list could be manipulated. As well as pushing text back on the head of the input stream, you could pull out the Nth item from the input stream, and if that item had not been read yet, the next N items from the input stream would be fetched and converted to list cells, and the object at the end of the list would still be the lazy evaluation function which would read the rest of the stream only when needed.

In fact, I believe you could append items to the end of these lists (though I hadn't ever done that myself) - in which case, once the function to fetch new data had exhausted the real file, it would return items from the appended list elements.

Wonderpop made great use of this modifiable input stream facility, by allowing the user to modify PROGLIST, which was a stream containing the source of your program itself. By using syntax-directed macros, you could add new keywords to the language which were implemented by manipulating your source code as a list. For example, if the base language only supported LOOP/IF ... BREAK/ENDIF, you could build your own implementation of WHILE or FOR on top of it. This made Wonderpop into a user-extensible language for very little added complexity in the compiler. Very cool.

So a simple version of the POP2 model was what I used for this TeX-like preprocessor: it has a large buffer in front of the actual stream object, to which you can push data back and have it be re-read before you get to the following data from stdin. And of course, to avoid repeated buffer copies (shunting the data up after each read so that the array doesn't grow indefinitely), it uses a cyclic buffer. One advantage of a cyclic buffer that I hadn't thought of before I started implementing the code was that you can push items back onto either end of the buffer, so they are read either immediately or after the current expansion has finished. That turned out to be both useful and necessary.
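For readers who want to see the shape of such a buffer, here is a minimal sketch in C. It is not copied from newcommand.c, which may well do it differently: the Input type, the RING_SIZE capacity and the input_* function names are all assumptions made for this example, and overflow is handled with a bare assert rather than anything graceful.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 4096          /* capacity of the pushback buffer */

typedef struct {
    char   data[RING_SIZE];
    size_t head;                /* index of the next character to read */
    size_t count;               /* number of buffered characters */
    FILE  *in;                  /* real stream, used when the buffer is empty */
} Input;

void input_init(Input *s, FILE *in)
{
    s->head = 0;
    s->count = 0;
    s->in = in;
}

/* Push text at the front: it is read before anything already buffered. */
void input_push_front(Input *s, const char *text)
{
    size_t len = strlen(text);
    assert(s->count + len <= RING_SIZE);
    while (len > 0) {
        s->head = (s->head + RING_SIZE - 1) % RING_SIZE;  /* step back, wrapping */
        s->data[s->head] = text[--len];
        s->count++;
    }
}

/* Push text at the back: it is read after the text already buffered,
   but still before anything unread in the underlying stream. */
void input_push_back(Input *s, const char *text)
{
    size_t len = strlen(text);
    assert(s->count + len <= RING_SIZE);
    for (size_t i = 0; i < len; i++) {
        s->data[(s->head + s->count) % RING_SIZE] = text[i];
        s->count++;
    }
}

/* Read one character: buffered text first, then the underlying stream. */
int input_getc(Input *s)
{
    if (s->count == 0)
        return getc(s->in);
    int c = (unsigned char)s->data[s->head];
    s->head = (s->head + 1) % RING_SIZE;
    s->count--;
    return c;
}
```

Because head simply wraps around the array, neither kind of push ever moves the characters that are already buffered. After input_init(&s, stdin), a macro body would go in with input_push_front(), and text that should wait its turn with input_push_back().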

As this was a quick hack so that the guy could get his document printed, I haven't made it ultra robust yet, but with the exception of a couple of deliberate shortcuts that I already know about, I think the code turned out pretty nicely. Here it is: http://www.gtoal.com/src/newcommand/newcommand.c.html

G
