Click here for a crash
course introduction to how Google Translate operates. A few
main points are pertinent to our project and its goals;
they are summarized below.
Statistical Approach: no grammar rules
Traditionally, in applied linguistics, linguists
often create and rely on grammars that describe how a language
operates: its syntax, what is morphologically permissible,
etc. Google does not use grammar rules; instead, it
translates using a statistical approach.
Franz Josef Och criticizes grammar-based rules
here.
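To make the contrast with grammar rules concrete, here is a minimal sketch of a purely statistical choice between candidate translations. It is a toy illustration and not Google's actual method: the function names `build_phrase_table` and `translate_phrase` and the tiny list of aligned pairs are assumptions made up for this example. The idea is only that the translation seen most often in aligned text wins, with no grammar consulted.

```python
from collections import Counter, defaultdict

def build_phrase_table(aligned_pairs):
    """Count how often each source phrase is paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table

def translate_phrase(table, source):
    """Return (most frequent target, its relative frequency), or None if unseen."""
    candidates = table.get(source)
    if not candidates:
        return None
    target, count = candidates.most_common(1)[0]
    return target, count / sum(candidates.values())

# Hypothetical aligned English-French pairs (illustrative data only).
toy_pairs = [
    ("bank", "banque"),
    ("bank", "banque"),
    ("bank", "banque"),
    ("bank", "rive"),
]

table = build_phrase_table(toy_pairs)
print(translate_phrase(table, "bank"))   # ('banque', 0.75)
print(translate_phrase(table, "river"))  # None: never seen in the toy data
```

With only four pairs the estimate is crude. The second call shows the other side of the approach: a phrase that never appears in the data cannot be translated at all.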
Corpora: big data
Google stores vast amounts of data in many different languages.
The larger the corpus for a particular language, the more
effective statistical predictions of word combinations can be.
Adam Tanner published the article
"Google seeks world of instant translations"
on how Google Translate acquired its huge amount of data
from the United Nations. A past analysis, conducted in part by Ethan Shen, showed English to/from French to be highly accurate compared to other pathways. Perhaps the prominence of the French language in the United Nations and other government entities contributed to a better corpus, and thus to more accurate statistical algorithms.
Intermediary Languages: connections
If you were translating, for example, Marathi to Hebrew, the translation would not be direct. English is predominantly used as an intermediate language. For less widely spoken languages, further intermediate languages may be used in a translation. For instance, Slovak is reportedly translated into the closely related Czech language before any further step. Forbes Magazine published the article "Does Google Translate Go From/To English Before Translating From/To The Language I've Chosen?" on this question.
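The routing idea described above can be sketched as a search over the language pairs that have a direct translation. This is only an illustration under stated assumptions: the `DIRECT` table, the placeholder tokens such as `greeting@mr`, and the functions `find_route` and `pivot_translate` are hypothetical and do not reflect Google's real internals. The language codes (`mr` Marathi, `he` Hebrew, `sk` Slovak, `cs` Czech, `en` English) follow ISO 639-1.

```python
from collections import deque

# Hypothetical direct pairs, each with a toy lookup table of placeholder tokens.
DIRECT = {
    ("mr", "en"): {"greeting@mr": "greeting@en"},
    ("sk", "cs"): {"greeting@sk": "greeting@cs"},
    ("cs", "en"): {"greeting@cs": "greeting@en"},
    ("en", "he"): {"greeting@en": "greeting@he"},
}

def find_route(source, target, pairs):
    """Breadth-first search for the shortest chain of direct language pairs."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        route = queue.popleft()
        if route[-1] == target:
            return route
        for a, b in pairs:  # iterate over the (from, to) keys
            if a == route[-1] and b not in seen:
                seen.add(b)
                queue.append(route + [b])
    return None

def pivot_translate(token, source, target, pairs=DIRECT):
    """Translate a token hop by hop along the route; None if any hop fails."""
    route = find_route(source, target, pairs)
    if route is None:
        return None
    for a, b in zip(route, route[1:]):
        token = pairs[(a, b)].get(token)
        if token is None:
            return None
    return token

print(find_route("mr", "he", DIRECT))  # ['mr', 'en', 'he']
print(find_route("sk", "he", DIRECT))  # ['sk', 'cs', 'en', 'he']
print(pivot_translate("greeting@sk", "sk", "he"))  # greeting@he
```

Each extra hop is another chance for an error to carry forward, which is one reason pathways with more intermediate steps may be less accurate.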