Saturday, December 29, 2007

The Art of Text Digitization


Recently I completed the digitization of Eugene Oneguine [Onegin]: A Romance of Russian Life in Verse by Aleksandr Pushkin, translated into English by Henry Spalding, published in 1881. This text version of Spalding’s translation is now available on the Project Gutenberg site.

Spalding’s translation of Pushkin’s most famous work, Eugene Onegin, was the first published in English, preceding all other attempts by about half a century. It’s not considered a great translation, but then neither are most of the others. Pushkin’s poetry presents many problems for translators.

Spalding apparently learned Russian while stationed at the British Embassy in St. Petersburg. His translations include Suvoroff, Khiva and Turkestan, On the Island of Saghalin, and The Tale of Frithiof (translated from Swedish). Spalding retired from the 104th Foot Regiment (Bengal Fusiliers) in 1880 with the rank of Major.

In Vladimir Nabokov’s commentary to his own English translation of Onegin (1964), he calls Spalding “bluff Spalding” and “Matter-of-fact Lt.-Col. Spalding.” It was the controversy surrounding Nabokov’s translation that first interested me in the poem many years ago. Nabokov decided that absolute literalness could not be retained alongside Pushkin’s rhyme and melody, so he translated Onegin with a rigorous literal exactness and jettisoned the rhyme. This led to Nabokov’s friend Edmund Wilson writing a scathing review of the translation in the New York Review of Books, although given his limited knowledge of Russian, Wilson was in no position to criticize Nabokov. The two exchanged rebuttals and were never friends again. Many believe Wilson became jealous over the success of Nabokov’s book Lolita, while his own Memoirs of Hecate County went virtually unnoticed.

I began this project a couple of years ago. At that time, Spalding’s book was available in just a couple dozen libraries. The only way to read it was to find a copy for sale, probably at a high price due to its scarcity, or to borrow it through interlibrary loan, and a lot of libraries won’t loan such an old book. Spalding’s translation was all but lost to the world outside the handful of libraries that owned it. I realized that I was in a position to bring Spalding’s translation back from the dead.

But my interest in the project waned as I involved myself in other writings. Within the past few months, I noticed that Spalding’s book had been digitized as part of the Google Books Library Project, and the entire scanned book was freely available to anyone. Sadly, but predictably, not all the pages were scanned accurately, so some of the text is missing in the Google edition. But it did make me wonder about the future of Project Gutenberg.

The digitization of the world’s literature has begun in earnest. Here are the current numbers for the bigger players, as best as I can guess:

Google Books 2,000,000
Microsoft Live Search Books 1,500,000
Amazon 400,000
Internet Archive 315,000
Project Gutenberg 20,000

In the past year or so, we have seen the introduction of e-book readers by two major players, Sony and Amazon. Sony’s bookstore offers about 25,000 books for sale that can be read by its Reader; Amazon offers about 90,000. More books in the Sony format can be had elsewhere on the internet, and Kindle-formatted books are now following. The wave of the future? Yes, but the only question is whether the future is now or at some point down the road. Will this generation embrace the digital book, or will it have to wait before any real romance develops between readers and e-books? So far, there hasn’t been a generation raised on e-books.

Digitization initiatives such as Project Gutenberg, Internet Archive, Google Books, and Microsoft’s Live Search Books have brought back to life many thousands of books hidden away in the catacombs of a few libraries. Once digitized, these books become available to the entire world. Spalding’s Onegin, basically lost to the greater world of scholars for over a century, is now available to all on the internet in a digitized version. It’s something like the way ancient works have been lost and then found again after many years, the Archimedes Palimpsest for example, although these books were never truly lost; they simply were not widely available until now.