Saturday, June 17, 2006

Google Botches Shakespeare eBooks

Google has launched a new Shakespeare site with digitizations of his works. The Google ebooks I examined all appear to be based on print books from the Harvard library.

Many of these Google ebooks have serious problems due to the conversion from print to digital. I'm a bit surprised that Google didn't check and edit the results for some of the more prominent titles before going public. Didn't they have anyone click through the pages to find obvious problems? Apparently not. It's very shoddy work in some cases. At the bottom of each page is the proud motto, "Digitized by Google." If I owned the company, I wouldn't want my name associated with some of this work.

Let's start with Hamlet. Here is a screenshot of one of the problem pages:

[Screenshot: Google Hamlet]

On the right are smeared words, and a couple of lines of type at the bottom are distorted (very common among the books I checked that have problems). You can still read the distorted text in many cases, but legibility isn't the only issue: Google includes a "search inside the book" feature, and if you search for phrases from much of the distorted text, the search feature won't find them!

Let's move on to Julius Caesar. Here is a screenshot:

[Screenshot: Google Julius Caesar]

The words along the right margin are cut off! The reason, of course, is that this edge of the page sits next to the binding, and the mechanism used to scan the pages didn't or couldn't flatten the page enough to capture all of the type. This is a right-hand page; the same problem exists for left-hand pages. Needless to say, this interferes with the search feature too.

I'll finish with the Merry Wives of Windsor. Here is a screenshot:

[Screenshot: Google Merry Wives of Windsor]

Would you want your company's name on this product? The type is blurry and unreadable, and it can't be found with the search engine. Who would want to try to read it anyway?

To be fair, not all the books I checked had problems (not that I looked at every page). Problem books include:
  • Othello
  • Macbeth
  • Comedy of Errors
  • Antony and Cleopatra
  • Love's Labor's Lost

Books in which I didn't see anything seriously wrong include:
  • Romeo & Juliet
  • King Lear
  • The Tempest
  • Richard II
  • The Merchant of Venice
  • Timon of Athens
  • The Taming of the Shrew

A little quality control, or simply having a Google employee click through the books to find obvious problems, would have gone a long way. But Google apparently wants you, the reader, to do the job for them free of charge. At the bottom of every page is a link to notify Google of any problems, such as the ones I've described above.

I guess they're saying it's your job to notify them of bad pages; they're not going to take the initiative themselves.

I'm really starting to worry about Google.

2 comments:

victor@bookyards.com said...

You sum up the problems of Google Books perfectly. I have been involved in producing ebooks on the web for a number of years (see http://www.bookyards.com). When Google announced their digital project, I felt that there was no longer any purpose in doing our digital project. This perception was wrong. Google's project is too ambitious, resulting in the confusion and mess in which it presently finds itself.

If you want Shakespeare, just go to
http://www.bookyards.com/search_results.html?type=books&author_id=596&author_name=Shakespeare%2C%20William

We have also compiled a good collection of other digital libraries with books available for downloading. Just go to Bookyards “Library Collections - E Books” at http://www.bookyards.com/links.html?type=links&category_id=1780
There are approximately 350 digital libraries, separated alphabetically and by category, with over 200,000 ebooks.

Jill Hurst-Wahl said...

Thanks for this detailed post. I also chronicled some of the problems with Google's scanning activities last fall here.