In a way, it’s shocking that machine translation–the computer-aided translation of human languages–hasn’t been perfected yet. Philosophers and mathematicians have been proposing its possibility since before linguistics was a formal discipline, and, as Spencer Ackerman at Danger Room notes, researchers were making bold and specific claims about the imminence of effective machine translation in the mid-1950s, when computers still depended on punch cards.
Processing power has since reached comparatively stratospheric levels. And yet machine translation, to be blunt, is still pretty terrible. Services like Google Translate are impressive mainly because their predecessors were so ineffective. Asking Google to translate a webpage or a block of text will usually net you something good enough to provide a gist, but that’s it.
It’s no surprise, then, that universal language translation is something that the Pentagon is hungry for. It is a surprise that the DoD, via DARPA, seems to think that it’s an attainable goal in the near term. DARPA’s 2012 budget request (PDF) contains the following initiative, for which $15 million will be earmarked in 2012:
The Boundless Operational Language Translation (BOLT) program will enable communication regardless of medium (voice or text), and genre (conversation, chat, or messaging) through expansion of language translation capabilities, human-machine multimodal dialogue, and language generation. The BOLT program will enable warfighters and military/government personnel to readily communicate with coalition partners and local populations and will enhance intelligence through better exploitation of all language sources including messaging and conversations. The program will also enable sophisticated search of stored language information and analysis of the information by increasing the capability of machines for deep language comprehension.
Bureaucratic language has a tendency to flatten even the most ambitious claims, so let’s unpack this a little. DARPA is planning a universal translation technology that does the following:
- Translates text
- Translates voice messages
- Understands colloquial errors as well as incorrect and incomplete syntax
- Interprets poor pronunciation
As is so often the case with DARPA’s plans, this sounds impossible–or at the very least, implausible–in the near future. There’s also the issue of redundancy: private companies have been hard at work on machine translation for years, and have invested millions of dollars with varying degrees of success.
Google recently released a smartphone app that bridges its voice recognition and translation services, allowing people to do something like what is described in this proposal, albeit not to the standard laid out for BOLT. I find it hard to imagine that a team of engineers could even replicate Google’s present successes for $15 million, much less something significantly more capable.
To dismiss BOLT as a pipe dream, though, would be a mistake. A year of research and a chunk of change won’t summon perfect battlefield translation devices into existence, but it may yield valuable insights and translation techniques that bring the dream of universal, instant, high-quality translation closer to reality.
Google Translate has been built on an effective but narrow set of techniques. Rather than attempting to construct a translation ruleset from scratch, Google has allowed its computers to deduce the rules on their own. According to the company:
[The computers learn rules] by analyzing millions and millions of documents that have already been translated by human translators. These translated texts come from books, organizations like the UN and websites from all around the world. Our computers scan these texts looking for statistically significant patterns — that is to say, patterns between the translation and the original text that are unlikely to occur by chance. Once the computer finds a pattern, it can use this pattern to translate similar texts in the future.
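The core of that pattern-finding idea can be shown with a toy sketch. The snippet below is not Google’s actual system (which operates on millions of documents and phrase-level statistics); it is a minimal, hypothetical illustration of learning word correspondences from a handful of aligned sentence pairs, scoring each candidate pairing by how strongly the words co-occur relative to chance:

```python
from collections import Counter, defaultdict

# Tiny hypothetical parallel corpus, standing in for the millions of
# human-translated documents that a real statistical system would use.
corpus = [
    ("the house", "la casa"),
    ("the green house", "la casa verde"),
    ("a house", "una casa"),
    ("the book", "el libro"),
    ("a book", "un libro"),
]

# Count word frequencies and source/target co-occurrences per sentence pair.
src_freq, tgt_freq = Counter(), Counter()
cooc = defaultdict(Counter)
for src, tgt in corpus:
    s_words, t_words = src.split(), tgt.split()
    src_freq.update(s_words)
    tgt_freq.update(t_words)
    for s in s_words:
        for t in t_words:
            cooc[s][t] += 1

def best_translation(word):
    """Pick the target word with the strongest association (Dice score):
    high co-occurrence that is unlikely if the pairing were mere chance."""
    return max(cooc[word],
               key=lambda t: 2 * cooc[word][t] / (src_freq[word] + tgt_freq[t]))

print(best_translation("house"))  # -> casa
print(best_translation("green"))  # -> verde
print(best_translation("book"))   # -> libro
```

Even this crude association score separates content words from the articles that happen to surround them, which hints at why the approach scales so well with data: more translated text means sharper statistics, with no hand-written grammar rules required.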
This has worked out pretty well! The Google Translate service has reached levels of accuracy far beyond earlier services like AltaVista’s (now Yahoo!’s) Babel Fish. It’s moderately effective at producing results from flat, syntactically conventional text, like the content of a news article or an email–the kind of material that people most commonly want Google to translate, and against which the company can sell the most ads. The translations are still conspicuously broken, but they’re good enough for plenty of uses.
BOLT seems to be less focused on creating a massive database of words and rules, and more focused on the thornier problems of translation; a day when you can enter a news article into Google Translate and be returned a perfect translation is easily conceivable and likely to occur soon, with or without DARPA’s help. But a day when you can hold up a device to a panicked villager’s mouth in a war-torn area of the world, and convert his colloquial, unusually inflected and regionally pronounced speech into usable data is still over the horizon.
New research into techniques for deep comprehension, inference and subtle contextual clues–the truly hard problems of machine translation–won’t net usable results immediately. Or maybe even soon! But even as a complement to existing technologies rather than a replacement, a fresh and novel attempt at a true universal translator can’t hurt. This thing is long overdue.
Top image depicts the output of the MIT-designed Whirlwind translation computer, as printed in the January 1956 issue of Scientific American. Accessed on the fantastic ModernMechanix.com.