Saturday, April 28, 2007

Google Books' Plain Label Books

Several years ago I uploaded a plain text book to Project Gutenberg called "Chess History & Reminiscences" by H. E. Bird. Recently I wondered if the book had been digitized and was available through Google Book Search, and I found that it was. The first page of the full-text copy was titled "Plain Label Books."

I wondered what this was and couldn't find anything in a google.com search. The second page with the copyright notice caught my attention:

[Image: the second page, with the copyright notice]

Many questions come to mind: Who are Plain Label Books? They are responsible for a lot of texts in Google Books. Who is the Editor, "Chumley P. Grumley"? And is he related to "Chumley P. Crumley," another Editor with the good folks at Plain Label Books? Someone has assigned ISBNs to this text as well (but they aren't in WorldCat). I see the words "Not copyrighted in the United States," but at the bottom right of every page is a watermark that reads "Copyrighted material." Who is copyrighting what, and where?

Looking through the text itself, I have the overwhelming suspicion that it is the exact same text that I uploaded to Gutenberg five years ago, except the introductory Gutenberg-related text has been stripped off. Nowhere do I find any explanation of the relationship between this text and the one I sent to Gutenberg. Why not? And why the bogus "Grumley" character?

A search of google.com shed no light on the identity of Grumley or his doppelganger, Crumley. A pity, because surely they deserve at least a biographical Wikipedia page for their work in Google Books. I'd like to know something about them and their editorial philosophy.

Google's mission, as I think we have all memorized by now, is to organize the world's information and make it universally accessible and useful. But a goal such as this demands transparency, and I can't see through the first two pages of this digital book.

Looking through other books, I was distressed to see the full-text copy of This Side of Paradise. Many of the words along the inner margin were cut off during the scanning process, making the book of limited value. Once scanned, aren't the books returned to their library? But if the scanning was unsuccessful, as I think we can say in this case as well as many others, wouldn't the book need to be rescanned at some point? It seems to be an inefficient process. Someone should be on hand to review the scans and determine if changes need to be made. Obviously, that isn't happening.

The Google Book project is a wonderful thing. I hope the bugs can eventually be ironed out. But if the goal is to make information accessible, the Google scanning process has a long way to go to claim success. And so the question remains whether that really is the goal.

UPDATE: See my "The Mystery of Plain Label Books Solved."

UPDATE: See my "Google Responds to Our Plain Label Books Post."

UPDATE 092908: Daniel Oldis of Plain Label Books recently sent me a note describing his fascination with Google Books. He posted every book he could find, and included some of his own published writings as well. He could download, format, and convert a book to PDF in 5 minutes! Like me, he was unimpressed with the quality of Google's own scanned facsimiles. I well understand his obsession with digital books, but the work is draining, and we might not see any new Plain Label Books.
