<oo><dh><translate> Characterizing Linguistic Divergences in Machine Translation
Harry Potter in Turkish: the title of Book 1 translates as Harry Potter ve Felsefe Taşı.

How Google Translates

Key aspects of machine translation

Click here for a crash-course introduction to how Google Translate operates. The main points pertinent to our project and its goals are summarized below.
Statistical Approach: no grammar rules
Traditionally, applied linguists create and rely on grammars that describe how a language operates: its syntax, what is morphologically permissible, and so on. Google does not use grammar rules; instead, it translates using a statistical approach. Franz Josef Och criticizes grammar-based rules here.
Corpora: big data
Google stores vast amounts of data in many different languages. The larger the corpus available for a particular language, the more effective the statistical predictions of word combinations can be. Adam Tanner published the article "Google seeks world of instant translations" on how Google Translate acquired its huge amount of data from the United Nations. A past analysis, conducted in part by Ethan Shen, showed English to/from French to be highly accurate compared to other pathways. Perhaps the prominence of French in the United Nations and other government entities contributed to a better corpus, and thus to more accurate statistical algorithms.
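To make the idea of corpus-driven prediction concrete, here is a minimal sketch of how word correspondences can be guessed from aligned sentence pairs without any grammar rules. It is a toy illustration only, not Google's actual algorithm: the four-sentence English/French corpus, the function names train and best_translation, and the choice of the Dice coefficient as the association score are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy parallel corpus (English, French), invented for illustration.
TOY_CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("blue flower", "fleur bleue"),
]

def train(pairs):
    """Count in how many sentence pairs each word, and each word pairing, occurs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)
    return src_count, tgt_count, pair_count

def best_translation(word, model):
    """Return the target word with the highest Dice association score."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

model = train(TOY_CORPUS)
for english_word in ("house", "blue", "flower"):
    print(english_word, "->", best_translation(english_word, model))
```

With only the first sentence pair, every English word would score equally against every French word, so no pairing could be preferred. The additional pairs are what allow the counts to separate "maison" from "la" and "bleue", which is the sense in which more data gives better statistical predictions.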
Intermediary Languages: connections
If you were translating, for example, Marathi to Hebrew, the translation would not be direct. English is predominantly used as an intermediate language. For less widely spoken languages, further intermediate languages may be used in a translation. For instance, Slovak is reportedly translated into the closely related Czech language before being translated into anything else. Forbes Magazine published the article "Does Google Translate Go From/To English Before Translating From/To The Language I've Chosen?"
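The routing idea can be sketched as a search over a graph of directly supported language pairs. The table below is hypothetical and simply encodes the examples mentioned here (most languages connect to English, and Slovak connects only to Czech). The names DIRECT_PAIRS, build_graph, and translation_route are assumptions for illustration and do not reflect Google's real configuration.

```python
from collections import deque

# Hypothetical set of language pairs with a direct translation model.
DIRECT_PAIRS = {
    ("marathi", "english"),
    ("hebrew", "english"),
    ("czech", "english"),
    ("slovak", "czech"),
}

def build_graph(pairs):
    """Turn the pair list into an undirected adjacency map."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def translation_route(source, target, graph):
    """Breadth-first search for the shortest chain of direct translations."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of direct pairs connects the two languages

graph = build_graph(DIRECT_PAIRS)
print(translation_route("marathi", "hebrew", graph))
print(translation_route("slovak", "hebrew", graph))
```

Under these assumed pairs, Marathi to Hebrew passes through English, while Slovak to Hebrew passes through Czech and then English. Each extra hop is another opportunity for translation errors to accumulate.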