Characterizing Linguistic Divergences in Machine Translation
Dumbledore's Dark Secrets Revealed!
In on a government cover-up by the corrupt Ministry of Magic, the wizarding press falsely attacks Albus Dumbledore!


Error ratios, graphs, and tables

[Figure: Divergence Count (German, Slovak, French); Transposition Errors (percentage of sentences); Errors by Part of Speech (adj, adv, conj, det, noun, prep, pro, verb).]
[Above] Divergence Count shows the raw number of divergences found in each text. Transposition Errors gives the percentage of sentences that contain word order errors. Errors by Part of Speech shows, for each language, the number of errors that occurred in each part of speech.
In the first chapter of Harry Potter, German had 315 errors, Slovak had 1052, and French had 147. The German translation had 373 sentences, 23% of which contained transposition errors, compared with 35% of Slovak's 346 sentences and 18% of French's 283 sentences. This was not surprising, since German and Slovak syntax differs from English much more than French syntax does. In German, indirect and direct objects often appear in a different order than in English, and in dependent clauses the verb appears at the end of the clause. In French, the main difference in word order is the placement of adjectives, which usually follow the noun.
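Because the three translations differ in length, the raw totals are not directly comparable. As a rough sketch (plain arithmetic on the figures reported above, not part of the original analysis), dividing each language's error count by its sentence count gives errors per sentence, and multiplying each transposition percentage by the sentence count gives the approximate number of sentences with word order errors. The dictionary layout and the `summarize` function are illustrative assumptions.

```python
# First-chapter figures as reported in the text.
# The dictionary layout is a hypothetical format, not the study's own data file.
CHAPTER_ONE = {
    # language: (total errors, sentences, share of sentences with transpositions)
    "German": (315, 373, 0.23),
    "Slovak": (1052, 346, 0.35),
    "French": (147, 283, 0.18),
}


def summarize(stats):
    """Return errors per sentence and the approximate number of sentences
    with transposition errors for each language."""
    summary = {}
    for language, (errors, sentences, transposition_share) in stats.items():
        summary[language] = {
            "errors_per_sentence": round(errors / sentences, 2),
            # The percentages in the text are rounded, so this count is approximate.
            "transposed_sentences": round(sentences * transposition_share),
        }
    return summary


if __name__ == "__main__":
    for language, row in summarize(CHAPTER_ONE).items():
        print(f"{language}: {row['errors_per_sentence']} errors/sentence, "
              f"~{row['transposed_sentences']} sentences with transpositions")
```

Run as a script, this gives roughly 0.84, 3.04, and 0.52 errors per sentence for German, Slovak, and French, so Slovak still has by far the most errors once the totals are normalized for length.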
For each lexical category, Slovak consistently had the most errors, while French almost always had the fewest. All three languages had a high number of verb errors and relatively few adjective, adverb, and conjunction errors. This was not surprising, since English adjectives and adverbs do not inflect and conjunctions translate fairly easily into all three languages.
[Below] Errors by Type shows the number of each type of error for each language.
The distribution of error types varied more across the languages: French had the most gender errors, while German had a higher percentage of person errors and of semantic errors, which fall under the mistranslation category. Slovak had 355 deletions, more than a third of its total errors.
[Figure: Errors by Type, error counts per language for the categories asp, cas, del, gen, ins, moo, mst, num, per, pos, ten.]
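The statement that deletions make up more than a third of Slovak's errors can be checked directly from the two counts given in the text (355 deletions out of 1052 total errors). A minimal sketch; the `share` helper is a hypothetical name, and exact fractions are used so the comparison with one third is not affected by floating-point rounding.

```python
from fractions import Fraction


def share(part, total):
    """Return part/total as an exact fraction."""
    return Fraction(part, total)


# Counts reported in the text for the Slovak translation.
slovak_deletions = share(355, 1052)

print(f"Deletions: {float(slovak_deletions):.1%} of Slovak errors")
# Exact comparison: 355 * 3 = 1065, which exceeds 1052.
print("More than a third:", slovak_deletions > Fraction(1, 3))
```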
Unsurprisingly, our results consistently show that Slovak is the most difficult of the three languages for Google Translate. Given that Slovak is the furthest from English and is translated by way of Czech, it is understandable that Google Translate would have a hard time with it. We had initially expected fewer errors in German than in French, since English and German are both Germanic languages and share many grammatical features. However, our research on Google Translate's methods brought to our attention that Google Translate has a larger French database, which allows for better French translation.