A new translation system unveiled by Google, the Neural Machine Translation (GNMT) framework comes close to human translators in it’s proficiency.
Not knowing the local language can be hell — but Google’s new translation software might prove to be the bilingual travel partner you’re always wanted. A recently released paper notes that Google’s Neural Machine Translation system (GNMT) reduces translation errors by an average of 60% compared to the familiar phrase-based approach. The framework is based on unsupervised deep learning technology.
Deep learning simulates the way our brains form connections and process information inside a computer. Virtual neurons are mapped out by a program, and the connections between them receive a numerical value, a “weight”. The weight determines how each of these virtual neurons treats data imputed to it — low-weight neurons recognize the basic features of data, which they feed to the heavier neurons for further processing, and so on. The end goal is to create a software that can learn to recognize patterns in data and respond to each one accordingly.
Programmers train these frameworks by feeding them data, such as digitized images or sound waves. They rely on big sets of training data and powerful computers to work effectively, which are becoming increasingly available. Deep learning has proven its worth in image and speech recognition in the past, and adapting it to translation seems like the logical next step.
And it works like a charm
GNMT draws on 16 processors to transform words into a value called “vector.” This represents how closely it relates to other words in its training database — 2.5 billion sentence pairs for English and French, and 500 million for English and Chinese. “Leaf” is more related to “tree” than to “car”, for example, and the name “George Washington” is more related to “Roosevelt” than to “Himalaya”, for example. Using the vectors of the input words, the system chooses a list of possible translations, ranked based on their probability of occurrence. Cross-checking helps improve overall accuracy.
The increased accuracy in translation happened because Google let their neural network do without much of the previous supervision from programmers. They fed the initial data, but let the computer take over from there, training itself. This approach is called unsupervised learning, and has proven to be more efficient than previous supervised learning techniques, where humans held a large measure of control on the learning process.
In a series of tests pitting the system against human translators, it came close to matching their fluency for some languages. Bilingually fluent people rated the system between 64 and 87 percent better than the previous one. While some things still slip through GNMT’s fingers, such as slang or colloquialisms, those are some solid results.
Google is already using the new system for Chinese to English translation, and plans to completely replace it’s current translation software with GNMT.