Su Tech Ennui: Still crazy after all these years...

Friday, July 4, 2008

Today I found, if not a bug, then at least a problem in an old piece of software that is so stable it hasn't been modified in several years.

I bought some music from the online service Rhapsody last night, which turned out to be a 650MB download. Rather carelessly, I had let the charge on my laptop get low, and it lost power after about 150MB of the download. When I reconnected and restarted the download (by hitting 'reload' on the download page), Rhapsody was apparently smart enough to know what I'd already received: the zip file I fetched the second time was missing all the tracks I was supposed to have already downloaded.

Well, Rhapsody's customer support was non-existent and my only option was to repair the truncated zip file. I knew from years ago that there was a 'zipfix' utility and a little Googling showed that there are many more utilities like that now, and zip itself (the info-zip version I mean) even has a "-F" flag which would try to fix the file.

Well, to cut a long story short, four hours later I had exhausted every option and zip repair program in the world. The typical behaviour was that they would create a new file with all the right file names but zero length for each. It was unbelievable to me that the most common problem with a zip file - truncation - was not repairable with the standard tools.

The problem was that the catalogue info (the zip "central directory") is written at the end of the file, so it was never written to this truncated file. Most zip code seeks to the end of the file to get this info, and from there jumps to the files you want to extract. However, the zip format was defined such that you can also process a zip file sequentially, e.g. to unpack data being read from a network stream or pipe on the fly, without having to buffer the whole file first. Well, at some point in the past, people seem to have forgotten about this, and hardly any of the code out there takes advantage of it.
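To make the sequential layout concrete, here is a minimal sketch in C of decoding the fixed 30-byte local file header that precedes each entry's data. The field offsets and the "PK\3\4" signature come from the published zip format. The names local_header, parse_local_header and entry_span are made up for this illustration and are not taken from Info-ZIP. The sketch assumes the sizes are stored in the header, which is not the case when bit 3 of the flags is set.

```c
#include <stdint.h>

#define LOCAL_HEADER_SIZE 30            /* fixed part, before name and extra */
#define LOCAL_HEADER_SIG  0x04034b50u   /* bytes "PK\3\4", read little-endian */

/* Hypothetical struct for this sketch; not an Info-ZIP type. */
struct local_header {
    uint16_t flags;        /* bit 3 set: sizes follow the data instead */
    uint16_t method;       /* 0 = stored, 8 = deflated */
    uint32_t comp_size;    /* bytes of entry data following the header */
    uint32_t uncomp_size;
    uint16_t name_len;
    uint16_t extra_len;
};

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the fixed 30 bytes at buf. Returns 1 if the signature
   matches a local file header, 0 otherwise. */
static int parse_local_header(const unsigned char *buf, struct local_header *h)
{
    if (le32(buf) != LOCAL_HEADER_SIG)
        return 0;
    h->flags       = le16(buf + 6);
    h->method      = le16(buf + 8);
    h->comp_size   = le32(buf + 18);
    h->uncomp_size = le32(buf + 22);
    h->name_len    = le16(buf + 26);
    h->extra_len   = le16(buf + 28);
    return 1;
}

/* Bytes from the start of this header to the start of the next one.
   Only meaningful when bit 3 of flags is clear. */
static uint64_t entry_span(const struct local_header *h)
{
    return (uint64_t)LOCAL_HEADER_SIZE + h->name_len + h->extra_len
           + h->comp_size;
}
```

Because each header says how long its own entry is, a reader can hop from one entry to the next without ever consulting the catalogue at the end of the file.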

I Googled for stream-based unzip code and did find a couple of implementations, but after major hassles compiling them, none of them would read this file.

What I eventually did was this: there is a utility with info-zip called "funzip" which extracts the first (and only the first) file from a zip archive, by reading it sequentially. (I think funzip may have been the starting point for the gzip program.) This was almost what I wanted and indeed it did spit out the first track to stdout when I ran it.

So I got into the code and found that at the point where it has read the first file, the file pointer is sitting at the start of the next file. So ... one "goto" later, and we're now going through the extraction process again! All that needed to be added at that point was changing the output stream to write to a different file each time instead of stdout, and bingo, 21 tracks of The Doors were recovered.

It was literally 5 minutes' work including download and compile time.

There was a time when my first response was to write code to fix a problem. I've been too lazy lately and too dependent on other people's code. By the way, some of those programs that didn't work cost $30.

If anyone is in the same boat, here's the fix to funzip.c:
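    char next_file[256];
    static int filenum = 0;

    /* Write each extracted entry to its own numbered file instead of stdout. */
    sprintf(next_file, "file%02d.mp3", ++filenum);
    /* "wb": the tracks are binary data, so avoid text-mode translation. */
    if ((out = fopen(next_file, "wb")) == (FILE *)NULL)
        goto trynext;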

(Where exactly to insert these lines is left as an exercise for the reader ;-))
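The control flow described above can also be sketched as a standalone loop. This is not the funzip patch itself. It is a simplified illustration that handles only stored (uncompressed) entries, so it needs no inflate code, and it stops at the first entry it cannot handle. The names recover_stored_entries and copy_bytes and the prefix parameter are hypothetical.

```c
#include <stdio.h>

/* Little-endian field readers for the 30-byte local file header. */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)(p[0] | (p[1] << 8));
}

static unsigned long rd32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Copies exactly n bytes from in to out (out may be NULL to discard).
   Returns 1 on success, 0 if the input ends early or a write fails. */
static int copy_bytes(FILE *in, FILE *out, unsigned long n)
{
    unsigned char buf[4096];
    while (n > 0) {
        size_t want = n < sizeof buf ? (size_t)n : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            return 0;
        if (out != NULL && fwrite(buf, 1, got, out) != got)
            return 0;
        n -= got;
    }
    return 1;
}

/* Hypothetical recovery loop: reads entries front to back and writes each
   one to <prefix>NN.mp3. Returns the number of complete entries written. */
static int recover_stored_entries(FILE *in, const char *prefix)
{
    unsigned char h[30];
    char name[256];
    int filenum = 0;

    for (;;) {   /* plays the role of the "goto" back to the next entry */
        FILE *out;
        int ok;

        if (fread(h, 1, sizeof h, in) != sizeof h)
            break;                            /* input ended */
        if (rd32(h) != 0x04034b50UL)
            break;                            /* not a local file header */
        if (rd16(h + 8) != 0 || (rd16(h + 6) & 8))
            break;                            /* compressed, or sizes deferred */
        if (!copy_bytes(in, NULL, rd16(h + 26) + rd16(h + 28)))
            break;                            /* skip file name and extra field */

        snprintf(name, sizeof name, "%s%02d.mp3", prefix, filenum + 1);
        if ((out = fopen(name, "wb")) == NULL)
            break;
        ok = copy_bytes(in, out, rd32(h + 18));
        fclose(out);
        if (!ok)
            break;                            /* truncated entry: partial file stays */
        ++filenum;
    }
    return filenum;
}
```

Called as recover_stored_entries(stdin, "file"), it would write file01.mp3, file02.mp3 and so on until the input runs out. A truncated final entry is left on disk as a partial file but is not counted in the return value.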

1 comment:

ian said...

goto statements?!?!?! you ARE crazy!