Su Tech Ennui: February 2008

Wednesday, February 6, 2008

Formatting documents for the Amazon Kindle

Today's blog post is going to be a little different. Instead of preparing an article in advance, I'm going to treat this entry like a wiki page and update it continuously as I make new discoveries…

(For the moment, I've still to add some links to example pages and a couple of screen shots.)

We were lucky enough to get our hands on a Kindle quite early, but as I mentioned in another post, we're paying the Early Adopter Tax because the facilities in the Kindle are quite limited. (Bottom line, if you're interested in getting one: as an e-book reader it's merely adequate, but as an always-on internet device with no usage charges, it's a winner. I'd qualify that latter statement by adding that you'd need to get about 2 years of usage out of it to make it financially viable against Sprint service on a Palm Treo 650… With the Internet access on the Kindle being flagged as 'experimental', it's not clear that it will remain free indefinitely; Amazon may get cold feet over people actually using it for internet and not just for downloading books…)

First, some basic facts. The screen is 600 x 800 pixels at a physical size of 3.6 x 4.8 inches (no hardware border; the pixels go right up to the edge), giving a resolution of about 166.67 dpi. It only supports 2-bit grey scale (white, light grey, dark grey and black) even though the browser tells the web server that it supports 32-bit color! Images are transcoded on the fly by Amazon's proxy, which all traffic goes through (meaning that they could later shut down free internet access if it starts costing them too much). There's no JavaScript, so no way of asking the browser what the screen resolution is.
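The resolution figure is just pixels divided by inches. As a quick check, here is a small Python sketch using only the numbers quoted above (the function name is mine, purely for illustration):

```python
# Screen figures as quoted in the post.
WIDTH_PX, HEIGHT_PX = 600, 800
WIDTH_IN, HEIGHT_IN = 3.6, 4.8


def dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one axis."""
    return pixels / inches


horizontal = dpi(WIDTH_PX, WIDTH_IN)
vertical = dpi(HEIGHT_PX, HEIGHT_IN)

# 2 bits per pixel gives 2**2 distinct grey levels.
grey_levels = 2 ** 2

print(f"{horizontal:.2f} x {vertical:.2f} dpi, {grey_levels} grey levels")
```

Both axes come out the same, so the pixels are square.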

Now, it has to be said that text on the Kindle looks OK, but with only one font supported in the browser, it's not going to allow much creativity in page design, especially if you want to offer free eBooks via HTML. The only supported font is Caecilia, which is a slab-serif face. You might be aware of a font called Rockwell; it's in that style. The font comes with only the ISO Latin-1 character set glyphs, with bold and/or italic supported as would be expected.

There are a limited number of sizes supported, although interestingly there appears to be a bug in the HTML rendering. CSS defines 7 font size keywords: xx-small, x-small, small, medium, large, x-large and xx-large. These are supposed to map to the font sizes 1 through 7 (as in <font size=7>). However, it seems that x-small and small are the same font size but with marginally different letter spacing, and that all the following sizes are 1 less than they should be. This means that xx-large is really size 6, with size 7 not being accessible using CSS, only with <span ...>. This means in effect that 8 sizes of type are supported.

So... the first discovery about the Kindle is that ebooks look a lot nicer than web pages. For example, ebooks are usually justified and often have hyphenation enabled, but the web browser doesn't support justification and of course HTML doesn't have any automatic hyphenation.

What's worse is that the HTML browser is very limited by default, although there's an option that the user can manually turn on to enable a few extra features. Justification, however, is not one of them.

So, at this point, I started examining what was possible under the 'advanced' version of the browser. One critical feature that was enabled was the use of <div> to position text at absolute coordinates. With this turned on, it's possible to simulate full justification by microspacing between words.
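To make the microspacing idea concrete, here is a minimal Python sketch of how one justified line could be emitted as absolutely positioned divs. It is not my actual driver: the function name, the inline `style` syntax and the word widths in the example are all assumptions for illustration, and real widths would have to come from the font metrics.

```python
from html import escape


def justify_line(words, word_widths, line_width, left=0, top=0):
    """Return one absolutely positioned <div> per word, with the spare
    horizontal space shared equally between the inter-word gaps.

    word_widths are pixel widths supplied by the caller (hypothetical
    here; in practice they would come from font metric files).
    """
    if len(words) != len(word_widths):
        raise ValueError("need exactly one width per word")
    gaps = len(words) - 1
    spare = line_width - sum(word_widths)
    # A single word has no gaps to stretch, so it just sits at the left.
    gap = spare / gaps if gaps else 0.0

    divs = []
    x = float(left)
    for word, width in zip(words, word_widths):
        divs.append(
            f'<div style="position:absolute; left:{round(x)}px; '
            f'top:{top}px">{escape(word)}</div>'
        )
        x += width + gap
    return divs


# Example with made-up widths: three words spread over a 200-pixel line.
example = justify_line(["The", "quick", "fox"], [30, 50, 30], 200)
print("\n".join(example))
```

Rounding each position independently (rather than accumulating rounded gaps) keeps the last word ending on the right margin.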

By the way, if you don't need to do justi­fi­cation, there's an easy hack to do hyph­en­ation that doesn't req­uire pre-form­atting the text: just pre-process your HTML to in­sert the soft-hy­phen char­acters at hy­phen­ation points in every word. (Hope­fully this para­graph will con­tain at least one hy­phen that came from in­sert­ing a prof­lig­acy of soft hy­phen char­acters and long­ish words...)
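A rough Python sketch of that pre-processing step follows. The tiny `BREAKS` table is a hypothetical stand-in for a real hyphenation algorithm or dictionary, and the regex-based tag splitting is a simplification that assumes reasonably well-formed HTML.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, the same character as the &shy; entity

# Hypothetical break table; a real pre-processor would use a proper
# hyphenation dictionary or pattern set instead.
BREAKS = {
    "hyphenation": "hy-phen-ation",
    "justification": "jus-ti-fi-ca-tion",
    "characters": "char-ac-ters",
}


def hyphenate_word(word):
    """Insert soft hyphens at the known break points, keeping the
    original capitalisation of the word."""
    pattern = BREAKS.get(word.lower())
    if pattern is None:
        return word
    out = []
    i = 0
    for ch in pattern:
        if ch == "-":
            out.append(SOFT_HYPHEN)
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def add_soft_hyphens(html):
    """Hyphenate text between tags; tags and entities are left alone."""
    parts = re.split(r"(<[^>]*>)", html)
    for idx, part in enumerate(parts):
        if part.startswith("<"):
            continue  # a tag: do not touch attribute names or values
        parts[idx] = re.sub(
            r"&#?\w+;|[A-Za-z]+",
            lambda m: m.group(0)
            if m.group(0).startswith("&")
            else hyphenate_word(m.group(0)),
            part,
        )
    return "".join(parts)


print(repr(add_soft_hyphens("<p>Hyphenation &amp; justification</p>")))
```

Skipping tags matters because a soft hyphen inside an attribute value or an entity name would break the markup.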

One downside of pre-formatting text nicely in TeX is that it's not scalable. If the Kindle didn't allow you to change the font size, that wouldn't be a problem, but it does: there's a hidden menu where you can change the default font size. If you do change the default size, any documents created using absolute positioning will fall apart, so documents will need an intro page to detect whether the font size has been changed. It's on my to-do list. (There's no JavaScript and no way of detecting it except by asking the user; similarly, you can't find out the screen size in pixels. You just have to know it, and hope that later Kindles don't change it.)

I've been looking at using TeX to do this justification, which requires writing a DVI-to-HTML driver that positions text absolutely. That meant I had to find font metric files for the Caecilia fonts that HTML uses. I believe they're actually on the Kindle, but I don't have a cable to download them, so I poked around the net until I found something close enough, although not quite the exact same font. Then I spent an evening of trial and error tweaking the TeX font sizes until the spacing exactly matched the text in HTML.

What remains to be done is to build a TeX base file that uses the characters available in Kindle HTML only.

Incidentally, the subset of ISO Latin-1 that HTML supports is smaller than the subset that's supported in ebooks, even though the ebook format is basically just HTML too.

Since this doesn't work in the basic browser with the extensions not turned on, we need a way to detect the browser style so that we can advise the user to change mode. Since the browser doesn't support JavaScript, this was quite a challenge. I eventually solved it by designing a page that uses divs to overlay some extra text only when the browser is in advanced mode. The extra text contains the link that takes you to the nicely formatted document; otherwise you're taken to a page with instructions on how to enable advanced mode. (By the way, that link will only work on a Kindle. I need to add even more HTML hi-jinks to provide a version that will work on a regular web browser as well. It'll make sense when I include the screenshot later this week...)

Here's a screenshot of the intro page to a test book in 'dumb' mode

[insert scan here]

(by the way I do know that the Kindle can save screenshots to SD memory — I just put it under the scanner for fun, since you can't scan traditional displays!)

and here is the same page in 'advanced' mode

[insert scan here]

And finally, here's some tech info that will come in useful for anyone working on this device:

[insert table of pixel and font sizes]

Also to add: page showing HTML features that work in browser (distinct from table of entity references).

Mention the lack of left and right single and double quotes, and ways to hack around that; also em-dashes, and the various tricks around that (superscripted underlines, or rules, or overlapping minuses).

Point out the trick used to typeset the word TeX here, and show how it falls apart when you use view/text size/larger to increase the font size (the same problem as on the Kindle). So you have to do the font size switching *outside* of the browser, at the server, and always leave the browser in the default font size.

Also point out that justification in the Kindle (and most web browsers) is poor, and that TeX does a far better job, although in regular browsers you can combine the justification and the soft hyphenation trick to get quite a readable page.

(And add a comment about the appalling physical layout and the accidental pressing of all the side buttons — leaving one awkward area where it can be held safely, which rapidly leads to dirty fingerprints in that area)