Google Translate and Google Voice: a mass experiment

The New York Times has what strikes me as an important article on Google’s translation service, and while it emphasizes processing power and data, there’s more here. (The section on their original data is interesting too.)

Using Computing Might, Google Improves Translation Tool – NYTimes.com

Google’s quick rise to the top echelons of the translation business is a reminder of what can happen when Google unleashes its brute-force computing power on complex problems. […]

Creating a translation machine has long been seen as one of the toughest challenges in artificial intelligence. For decades, computer scientists tried using a rules-based approach — teaching the computer the linguistic rules of two languages and giving it the necessary dictionaries.

But in the mid-1990s, researchers began favoring a so-called statistical approach. They found that if they fed the computer thousands or millions of passages and their human-generated translations, it could learn to make accurate guesses about how to translate new texts.

It turns out that this technique, which requires huge amounts of data and lots of computing horsepower, is right up Google’s alley.
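The statistical approach described above can be illustrated with a deliberately tiny sketch: count which target-language words co-occur with which source-language words across a handful of human-translated sentence pairs, then translate new text word by word from those counts. The corpus and word pairs below are invented for illustration; real systems like Google’s train far more sophisticated models on millions of sentence pairs.

```python
# Toy sketch of statistical translation from a parallel corpus.
# Hypothetical data; a crude stand-in for the real thing.
from collections import Counter, defaultdict

# A miniature English-Spanish "parallel corpus" (invented examples).
corpus = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("the book", "el libro"),
    ("a green book", "un libro verde"),
]

# Count how often each source word appears alongside each target word.
cooc = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def translate_word(word):
    """Pick the target word most often seen with this source word."""
    if word not in cooc:
        return word  # unknown words pass through untranslated
    return cooc[word].most_common(1)[0][0]

def translate(sentence):
    return " ".join(translate_word(w) for w in sentence.split())

print(translate("green book"))
```

Even this toy version shows why the approach is data-hungry: with only four sentence pairs, common words like “the” co-occur with nearly everything, and word order is ignored entirely; more parallel text sharpens the statistics.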

Like its rivals in the field, most notably Microsoft and I.B.M., Google has fed its translation engine with transcripts of United Nations proceedings, which are translated by humans into six languages, and those of the European Parliament, which are translated into 23. This raw material is used to train systems for the most common languages.
But Google has scoured the text of the Web, as well as data from its book scanning project and other sources, to move beyond those languages. For more obscure languages, it has released a “tool kit” that helps users with translations and then adds those texts to its database.

And here I wonder whether we have allowed Google to use us for massive data mining of… voices:

Google has used a similar approach — immense computing power, heaps of data and statistics — to tackle other complex problems. In 2007, for example, it began offering 800-GOOG-411, a free directory assistance service that interprets spoken requests. It allowed Google to collect the voices of millions of people so it could get better at recognizing spoken English.

A year later, Google released a search-by-voice system that was as good as those that took other companies years to build.

The Times also links to a sidebar comparing translations of famous works. Kafka, of course, offers the vermin/bug/waterbug/cockroach problem to stump computers. Worth a look.

By Paul Romaine

Paul Romaine is a grant writer and independent curator in New York City.
