Monday, May 15, 2006

After Scanning Comes Translation

An article in Sunday's New York Times ("Scan This Book!") discusses book digitization projects such as Google's and the building of a universal library containing all books in all languages.

Superstar, an entrepreneurial company based in Beijing, has scanned every book from 900 university libraries in China. It has already digitized 1.3 million unique titles in Chinese, which it estimates is about half of all the books published in the Chinese language since 1949.

The Million Book Project is scanning books in Chinese, Indian languages, Arabic and French.

Recall the Library of Babel paradigm: that library faced many of the same problems we face today. On the positive side, it housed all human knowledge--and these digitization initiatives bring us closer to that goal. But the denizens of Babel were stumped when confronted with books in unknown languages:

He showed his find to a wandering decoder who told him the lines were written in Portuguese; others said they were Yiddish. Within a century, the language was established: a Samoyedic Lithuanian dialect of Guarani, with classical Arabian inflections.

Language was one of the biggest roadblocks in Babel. As more and more books in foreign languages are digitized, the problem of adequate translation will need to be addressed. What do we do when millions of books written in Chinese and Indian languages are placed on the internet but few people in the Western world can read them? The more foreign-language books become available, the greater the pressure to create translation applications that can give the English-speaking world a good idea of what these books are saying. Right now, Babelfish and other translation services are inadequate to the task.

Translation is the next big step after digitization. Without it, the digitization of books in Chinese, Indian languages, Arabic, Japanese and other languages is useless to those of us in the West, because few of us can read those texts. As in Babel, language is a roadblock that today's digital library needs to overcome on the way to the Perfect Library.

Funny that we're even talking about this, isn't it?
