Su Tech Ennui: 2008

Friday, July 4, 2008

Still crazy after all these years...

Today I found - if not a bug, then at least a problem - with an old piece of software that is so stable it hasn't been modified in several years.

I bought some music from the online service Rhapsody last night, which turned out to be a 650MB download. Rather carelessly, I had let the charge on my laptop get low and it lost power about 150MB into the download. When I reconnected and restarted the download, Rhapsody was apparently smart enough to know what I'd already received, and the second time I fetched the zip file (by hitting 'reload' on the download page), it was missing all the tracks I was supposed to have already downloaded.

Well, Rhapsody's customer support was non-existent and my only option was to repair the truncated zip file. I knew from years ago that there was a 'zipfix' utility and a little Googling showed that there are many more utilities like that now, and zip itself (the info-zip version I mean) even has a "-F" flag which would try to fix the file.

Well, to cut a long story short, four hours later I had exhausted every option and zip repair program in the world. The typical behaviour was that they would create a new file with all the right file names but zero length for each. It was unbelievable to me that the most common problem with a zip file - truncation - was not repairable with the standard tools.

The problem was that the catalogue info (the central directory) is written at the end of the file, and so was never written to this truncated file. Most zip code seeks to the end of the file to get this info, and from there jumps to the files you want to extract. However, the zip format was defined such that you can also process a zip file sequentially, e.g. to unpack data being read from a network stream or pipe on the fly, without having to buffer the whole file first: each file's data is preceded by its own local header. Well, at some point in the past, people seem to have forgotten about this and none of the code out there takes advantage of it.
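To make the sequential layout concrete, here is a minimal C sketch (not taken from info-zip) that decodes the fixed 30-byte local file header that sits in front of each file's data in a zip archive. The struct and function names are made up for illustration; the field offsets follow the published zip format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIP_LOCAL_SIG   0x04034b50u  /* the bytes "PK\003\004" on disk */
#define ZIP_LOCAL_FIXED 30           /* size of the fixed part of the header */

/* Hypothetical struct: only the fields a sequential reader needs. */
struct local_header {
    uint16_t flags;        /* general-purpose bit flags */
    uint16_t method;       /* 0 = stored, 8 = deflated */
    uint32_t comp_size;    /* compressed size in bytes */
    uint32_t uncomp_size;  /* uncompressed size in bytes */
    uint16_t name_len;     /* length of the file name that follows */
    uint16_t extra_len;    /* length of the extra field after the name */
};

/* Zip stores all multi-byte integers little-endian. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is too short or is not a local header. */
int parse_local_header(const unsigned char *buf, size_t len,
                       struct local_header *h)
{
    assert(buf != NULL && h != NULL);
    if (len < ZIP_LOCAL_FIXED || rd32(buf) != ZIP_LOCAL_SIG)
        return -1;
    h->flags       = rd16(buf + 6);
    h->method      = rd16(buf + 8);
    h->comp_size   = rd32(buf + 18);
    h->uncomp_size = rd32(buf + 22);
    h->name_len    = rd16(buf + 26);
    h->extra_len   = rd16(buf + 28);
    return 0;
}
```

The file name, the extra field and then the compressed data follow immediately after these 30 bytes, which is why a reader never has to look at the end of the archive to extract a file.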

I Googled for stream-based unzip code and did find a couple of implementations, but after major hassles compiling them, none of them would read this file.

What I eventually did was this: there is a utility with info-zip called "funzip" which extracts the first (and only the first) file from a zip archive, by reading it sequentially. (I think funzip may have been the starting point for the gzip program.) This was almost what I wanted and indeed it did spit out the first track to stdout when I ran it.

So I got into the code and found that at the point where it has read the first file, the file pointer is sitting at the start of the next file. So ... one "goto" later, and we're now going through the extraction process again! All that needed to be added at that point was changing the output stream to write to a different file each time instead of stdout, and bingo, 21 tracks of The Doors were recovered.
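The "loop back and do it again" idea can be sketched without funzip itself. The hypothetical function below walks a truncated archive held in memory and counts how many entries are complete. It is a simplification: it trusts the compressed size recorded in each local header, whereas funzip finds the end of an entry by actually decompressing it, and it simply stops at entries whose sizes are deferred to a trailing data descriptor (general-purpose flag bit 3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers for the zip header fields. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: count the entries that are wholly present in a
 * (possibly truncated) zip image, reading it front to back only. */
size_t count_complete_entries(const unsigned char *buf, size_t len)
{
    size_t pos = 0, count = 0;

    assert(buf != NULL);
    /* Each entry starts with a 30-byte local header, signature PK\003\004. */
    while (len - pos >= 30 && le32(buf + pos) == 0x04034b50u) {
        uint16_t flags = le16(buf + pos + 6);
        size_t comp = le32(buf + pos + 18);
        size_t skip = 30u + le16(buf + pos + 26) + le16(buf + pos + 28);

        if (flags & 0x0008)
            break;  /* sizes not in the header; would need to inflate */
        if (skip > len - pos || comp > len - pos - skip)
            break;  /* this entry was cut off by the truncation */
        pos += skip + comp;  /* now sitting at the start of the next entry */
        count++;
    }
    return count;
}
```

At the point where `count++` runs, a real recovery tool would write the entry out to its own file before going round the loop again.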

It was literally 5 minutes' work including download and compile time.

There was a time when my first response was to write code to fix a problem. I've been too lazy lately and too dependent on other people's code. By the way, some of those programs that didn't work cost $30.

If anyone is in the same boat, here's the fix to funzip.c:
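char next_file[256];
static int filenum = 0;
/* Name each recovered track file01.mp3, file02.mp3, ... */
snprintf(next_file, sizeof next_file, "file%02d.mp3", ++filenum);
/* "wb": the tracks are binary data, so avoid newline translation on Windows */
if ((out = fopen(next_file, "wb")) == (FILE *)NULL)
goto trynext;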

(Where exactly to insert these lines is left as an exercise for the reader ;-))

Friday, April 11, 2008

Semi-literate programming

Don Knuth had a problem. He wanted to write a book on programming algorithms, but wasn't happy with the tools available. So he wrote TeX - a major piece of typesetting software that not only has a huge following among people who appreciate good typesetting practices, but which also has had its core algorithms ripped off by just about all the GUI typesetting packages on the market. Or at least, the better ones.

Don Knuth had another problem. He wanted to write a large typesetting package, but wasn't happy with the tools available. So he wrote WEB, a tool that basically presents your source code in a more readable way - suitable indeed for publishing as a book. His argument was (I hope I'm not mischaracterising him) that we read programs more than we write them. So his WEB system made reading programs really easy, though in my humble opinion they were still damned ugly to write.

Meanwhile, on the other side of the pond, Inmos was building transputers, and they created a parallel programming language called OCCAM which, among other things, was innovative in its layout style - it made the depth of indentation significant to the compiler rather than just the reader. (A trick that has been copied by Python in more recent years.)

This feature allowed them to add a tweak to their text editor, which was the ability to fold a section of code so that it wasn't all displayed at once; but if you went into a fold (I was going to say 'clicked on it', but this was before the days of rabid clicking) it would expand in situ.

Many years later, and we have the World Wide Web (as opposed to Knuth's WEB) and we're still struggling to publish software in a way that's easy to read online. Knuth's stuff is great for books, but not so hot for web pages.

And we have various tools that take source code and display it as HTML, though not all are of equal quality. (Some of them seem to take the approach that if you can distinguish a syntactic feature, you should do so, in as bright a colour as possible!)

And finally we get to the meat of today's post. Borrowing a little from Knuth, and a little from Inmos, I've modified my C to HTML filter so that it adds folds to the HTML view of a program. Although it's technically possible to do so automatically as Inmos did with OCCAM, it's not desirable because doing so would fold at syntactic units rather than logical groups. So I have taken the old school approach of inserting markup into the source file for the benefit of the folding code.

I've tried to do so fairly unobtrusively, with markers that don't grate on the sensibilities too much when you're editing the raw text; and they're hidden completely in the HTML view. After I wrote this I discovered that there is a sort of a standard for folding markup ("{{{" and "}}}") but rather than go back and change my code to the standard, I'm going to stick with my first idea for now. I may make that change later if I want to use a folding editor on my sources, but for now I'd rather have the aesthetics of my own scheme which I think is less intrusive.
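For readers who want to try the idea, here is a rough C sketch of the per-line step such a filter might perform. It is not the filter described in this post: it assumes the "{{{" and "}}}" markers rather than my own scheme, and it assumes the fold is rendered with the HTML `<details>` and `<summary>` elements. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to the NUL-terminated string in out, optionally escaping
 * the characters that are special in HTML. Returns 0, or -1 if full. */
static int append(char *out, size_t outsz, const char *src, int escape)
{
    size_t n = strlen(out);

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep = one;
        size_t r;

        if (escape && *src == '<') rep = "&lt;";
        else if (escape && *src == '>') rep = "&gt;";
        else if (escape && *src == '&') rep = "&amp;";
        r = strlen(rep);
        if (n + r + 1 > outsz)
            return -1;
        memcpy(out + n, rep, r + 1);  /* copies the terminating NUL too */
        n += r;
    }
    return 0;
}

/* Convert one source line (without its newline) to HTML.
 * Assumed markup: a comment containing "{{{ title" opens a fold and a
 * line containing "}}}" closes it; anything else is escaped verbatim. */
int fold_line(const char *line, char *out, size_t outsz)
{
    const char *open;

    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    if (strstr(line, "}}}") != NULL)
        return append(out, outsz, "</pre></details>", 0);
    open = strstr(line, "{{{");
    if (open != NULL) {
        char title[128];
        const char *t = open + 3;
        const char *stop = strstr(t, "*/");
        size_t tlen = stop ? (size_t)(stop - t) : strlen(t);

        /* Trim spaces around the fold title. */
        while (tlen > 0 && *t == ' ') { t++; tlen--; }
        while (tlen > 0 && t[tlen - 1] == ' ') tlen--;
        if (tlen >= sizeof title) tlen = sizeof title - 1;
        memcpy(title, t, tlen);
        title[tlen] = '\0';
        if (append(out, outsz, "<details><summary>", 0) != 0 ||
            append(out, outsz, title, 1) != 0 ||
            append(out, outsz, "</summary><pre>", 0) != 0)
            return -1;
        return 0;
    }
    return append(out, outsz, line, 1);
}
```

A caller would feed the source file through `fold_line` one line at a time and write each result followed by a newline; the marker comments themselves never reach the HTML view.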

Here's my current programming project: it's about 2000 lines of C, but displays as a single page. My rule of thumb in selecting what to mark is to fold about a screen's worth at a time - pretty much the same rule we used to use when writing procedures with a regular text editor, except now our screens are significantly larger.

I have to say I like the way it has come out. It doesn't have all of Knuth's tricks, such as the ability to reorder program fragments, but I think the one thing it does, it does well, and gives most of the benefit that came from Knuth's WEB - the ability to read the overall structure and a textual description of an algorithm, and only push down into the code if you really need to.

This was an easy hack to add to a C to HTML filter, and I've already done the same trick for PL/SQL. Duplicating it for other languages should be quite easy too.

Wednesday, February 6, 2008

Formatting documents for the Amazon Kindle

Today's Blog post is going to be a little different. Instead of preparing an article in advance, I'm going to treat this entry like a Wiki page and update it continuously as I make new discoveries…

(For the moment, I've still to add some links to example pages and a couple of screen shots.)

We were lucky enough to get our hands on a Kindle quite early, but as I mentioned in another post, we're paying the Early Adopter Tax because the facilities in the Kindle are quite limited. (Bottom line if you're interested in getting one: as an e-book reader it's merely adequate, but as an always-on internet device with no usage charges, it's a winner. Although I qualify that latter statement by adding that you'd need to get about 2 years of usage out of it to make it financially viable against Sprint service on a Palm Treo 650… With the Internet access on the Kindle being flagged as 'experimental' it's not clear that it will remain free indefinitely, if they get cold feet over people actually using it for internet and not just for downloading books…)

First, some basic facts. The screen is 600 x 800 pixels at a physical size of 3.6 x 4.8 inches (no hardware border - the pixels go right up to the edge), giving a resolution of 166.67 dpi (600 / 3.6 = 800 / 4.8). It only supports 2-bit grey scale (white, light grey, dark grey and black) even though the browser tells the web server that it supports 32-bit color! Images are transcoded on the fly by Amazon's proxy, which all traffic goes through (meaning that they could later shut down free internet access if it starts costing them too much). There's no JavaScript, so no way of asking the browser what the screen resolution is.

Now, it has to be said that text on the Kindle looks OK, but with only one font supported in the Browser, it's not going to allow much creativity in page design, especially if you want to offer free eBooks via HTML. The only supported font is Caecilia, which is a slab-serif face. You might be aware of a font called Rockwell — it's in that style. The font comes with only the ISO Latin1 character set glyphs, with bold and/or italic supported as would be expected.

There are a limited number of sizes supported, although interestingly there appears to be a bug in the HTML rendering. CSS defines 7 font size keywords: xx-small, x-small, small, medium, large, x-large and xx-large. These are supposed to map to the font sizes 1 through 7 (as in <font size=7>). However it seems that x-small and small are the same font size but with marginally different letter spacing, and that all the following sizes are 1 less than they should be. This means that xx-large is really size 6, with size 7 not being accessible using CSS - only with <span ...>. This means in effect that 8 sizes of type are supported.

So... the first discovery about the Kindle is that ebooks look a lot nicer than web pages. For example, ebooks are usually justified and often have hyphenation enabled, but the web browser doesn't support justification and of course HTML doesn't have any automatic hyphenation.

What's worse is that the HTML browser is very limited by default, although there's an option that the user can manually turn on to enable a few extra features. Justification, however, is not one of them.

So, at this point, I started examining what was possible under the 'advanced' version of the browser. One critical feature that was enabled was the use of <div> to position text at absolute coordinates. With this turned on, it's possible to simulate full justification by microspacing between words.
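The microspacing calculation itself is simple enough to sketch in C. This is only an illustration under assumptions of my own: word widths are already known in pixels (from the font metrics), a line has been chosen, and the leftover space is spread as evenly as whole pixels allow. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the left x coordinate (in pixels) of each word so that the
 * last word ends exactly at line_width. Returns 0 on success, or -1 if
 * there are no words or they do not fit on the line. */
int justify_line(const int *word_widths, size_t n, int line_width,
                 int *x_out)
{
    int total = 0, slack, x = 0;
    size_t i, gaps;

    assert(word_widths != NULL && x_out != NULL);
    if (n == 0)
        return -1;
    for (i = 0; i < n; i++)
        total += word_widths[i];
    slack = line_width - total;  /* pixels left over for the gaps */
    if (slack < 0)
        return -1;
    gaps = n - 1;
    for (i = 0; i < n; i++) {
        x_out[i] = x;
        x += word_widths[i];
        if (i < gaps) {
            /* Share the slack evenly; the first (slack % gaps) gaps
             * take one extra pixel so nothing is lost to rounding. */
            x += slack / (int)gaps + ((int)i < slack % (int)gaps ? 1 : 0);
        }
    }
    return 0;
}
```

Each resulting x value would then become the horizontal coordinate of one absolutely positioned piece of text. A single-word line is left flush left, and the last line of a paragraph would normally not be passed through this function at all.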

By the way, if you don't need to do justi­fi­cation, there's an easy hack to do hyph­en­ation that doesn't req­uire pre-form­atting the text: just pre-process your HTML to in­sert the soft-hy­phen char­acters at hy­phen­ation points in every word. (Hope­fully this para­graph will con­tain at least one hy­phen that came from in­sert­ing a prof­lig­acy of soft hy­phen char­acters and long­ish words...)
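The pre-processing step might look something like the C sketch below. Finding the hyphenation points is the hard part and is not shown; the sketch assumes some other code (a dictionary or a hyphenation algorithm) has already produced the break offsets, and it emits the `&shy;` entity rather than the raw character. The function name and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy word into out, inserting "&shy;" before each byte offset listed
 * in breaks[] (which must be in ascending order). Returns 0 on success,
 * or -1 if out is too small to hold the result. */
int insert_soft_hyphens(const char *word, const size_t *breaks,
                        size_t nbreaks, char *out, size_t outsz)
{
    static const char shy[] = "&shy;";
    size_t i, b = 0, n = 0, len;

    assert(word != NULL && out != NULL);
    len = strlen(word);
    for (i = 0; i < len; i++) {
        if (b < nbreaks && breaks[b] == i) {
            if (n + sizeof shy - 1 >= outsz)
                return -1;
            memcpy(out + n, shy, sizeof shy - 1);
            n += sizeof shy - 1;
            b++;
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = word[i];
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return 0;
}
```

For example, passing "hyphenation" with break offsets 2 and 6 yields "hy&amp;shy;phen&amp;shy;ation" in the HTML source, which a browser shows unbroken unless the word lands at the end of a line.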

One downside of pre-formatting text nicely in TeX is that it's not scalable. If the Kindle didn't allow you to change the font size, that wouldn't be a problem, but it does. There's a hidden menu where you can change the default font size. So if you do change the default size, any documents created using absolute positioning will fall apart. So documents will need an intro page to detect if the font size has been changed. It's on my to-do list. (There's no JavaScript and no way of detecting it except by asking the user; similarly you can't find out the screen size in pixels - you just have to know it, and hope that later Kindles don't change it.)

I've been looking at using TeX to do this justification, which requires writing a dvi to html driver that positions text absolutely. That meant I had to find font metric files for the Caecilia fonts that the Kindle's HTML browser uses. I believe they're actually on the Kindle, but I don't have a cable to download them, so I poked around the net until I found something close enough, although not quite the exact same font. Then I spent an evening of trial and error tweaking the TeX font sizes until the spacing exactly matched the text in HTML.
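One small piece of such a driver is the unit conversion from TeX's positions to screen pixels. The C sketch below assumes the DVI file uses TeX's usual units, scaled points (65536 sp to the point, 72.27 points to the inch), and takes the device resolution as a parameter; for the Kindle screen described above that would be 600 / 3.6 pixels per inch. The function name is made up for illustration.

```c
#include <assert.h>

#define SP_PER_PT   65536.0  /* TeX scaled points per TeX point */
#define PT_PER_INCH 72.27    /* TeX points per inch */

/* Convert a DVI position or dimension in scaled points to pixels. */
double sp_to_pixels(long sp, double dpi)
{
    assert(dpi > 0.0);
    return (double)sp / SP_PER_PT / PT_PER_INCH * dpi;
}
```

The result is fractional, so a driver still has to decide how to round each coordinate; rounding word positions independently rather than accumulating rounded widths avoids drift across a line.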

What remains to be done is to build a TeX base file that uses only the characters available in Kindle HTML.

Incidentally the subset of ISO Latin1 that HTML supports is smaller than the subset that's supported in ebooks, even though the ebook format is basically just HTML too.

Since this doesn't work in the basic browser with the extensions not turned on, we need a way to detect the browser style so that we can advise the user to change mode. Since the browser doesn't support Javascript, this was quite a challenge. I eventually solved it by designing a page that uses divs to overlay some extra text only when the browser is in advanced mode. The extra text contains the link that takes you to the nicely formatted document; otherwise you're taken to a page with instructions on how to enable advanced mode. (By the way that link will only work on a Kindle. I need to add even more HTML hi-jinks to provide a version that will work on a regular web browser as well. It'll make sense when I include the screenshot later this week...)

Here's a screenshot of the intro page to a test book in 'dumb' mode

[insert scan here]

(by the way I do know that the Kindle can save screenshots to SD memory — I just put it under the scanner for fun, since you can't scan traditional displays!)

and here is the same page in 'advanced' mode

[insert scan here]

And finally, here's some tech info that will come in useful for anyone working on this device:

[insert table of pixel and font sizes]

Also to add: page showing HTML features that work in browser (distinct from table of entity references).

Mention the lack of left and right single and double quotes, and ways to hack around that; also em-dashes, and the various tricks around that (superscripted underlines, or rules, or overlapping minuses).

Point out the trick used to typeset the word TeX here, and show how it falls apart when you use view/text size/larger to increase the font size - same problem as on the Kindle. So you have to do the font size switching *outside* of the browser, at the server, and always leave the browser in the default font size.

Also point out that justification in the Kindle (and most web browsers) is poor, and that TeX does a far better job, although in regular browsers you can combine the justification and the soft hyphenation trick to get quite a readable page.

(And add a comment about the appalling physical layout and the accidental pressing of all the side buttons — leaving one awkward area where it can be held safely, which rapidly leads to dirty fingerprints in that area)

Monday, January 21, 2008

Good customer service experience with Netflix

The overall theme of this blog is that there's really no excuse for crap software and the needlessly wasted hours we spend working around it. So I felt a blog post coming on this morning when I discovered that Netflix "Watch Now" wasn't working on my laptop, and I had spent several hours researching the problem on the net, and of course on the Netflix web site.

Getting nowhere, I finally gave up and tried the last resort of all techies - calling the support line (something you never expect to work...) and of course, what should happen but an hour and a quarter later I hung up the phone, tired of waiting for someone to pick up.

Well, while researching the problem, I had accidentally come across the blog of Netflix's CIO, and I'd noticed that a couple of people had posted requests there for help with Netflix problems and appeared to have received satisfactory help. So somewhat apologetically I did the same because I'd pretty much spent the day on this problem (my day off work - MLK) and was getting somewhat frustrated.

I was much surprised and delighted when later that afternoon Netflix called me and a very competent young man, Matthew, worked out exactly what my problem was and pointed me at the solution.

Good work, and credit due to Netflix.

Now, for anyone else who bought the Walmart "Black Friday" Acer portable (model 5315-2153) who finds that they get a blank screen when watching Netflix Watch Instantly (and for the benefit of other Googlers looking for a solution, 'black screen' when watching 'Netflix Watch Now'), here's the cause and the solution:

At some point you've updated your Intel 965 Chipset driver, and picked up the October 2007 one. This breaks Watch Instantly and apparently some other things such as Second Life!

A later driver (10th Jan 2008) can be found as follows:

Start by Googling for 'upgrade intel 965 driver acer aspire 5315', which will take you to an Intel support page.

Select the drop-downs: graphics/laptop graphics controllers/965 express chipset family/advanced windows users

This takes you to a form for email requests - but we're only interested in the link to "download" which is above the form.

Select "Windows vista home basic 32 bit", and download "Intel graphics media accelerator driver for windows vista 32 (exe)".

Running this exe (as Administrator) installs smoothly, and on reboot, Netflix Watch Instantly is now working just fine.

Kudos to Netflix for knowing the problem; minus several thousand brownie points to Intel (and Acer) for releasing a driver update that breaks stuff.

PS Matthew explained that the delays in answering the support line were uncharacteristic and due to a huge influx of new folks all trying Watch Instantly for the first time.