Future research we are interested in includes extending the same analysis to additional languages. We are also interested in a longitudinal analysis. For instance, Marathi was only recently added to Google Translate's inventory of languages. Since Google Translate is expected to improve over time based on user feedback, we could analyze divergences in Marathi-to-English output at regular intervals and measure the degree of improvement.
As for improving our analysis, we could develop a schema that captures finer details. For instance, we could identify clauses and the cases in which they are transposed (do predicates appear before their subjects? Are relative clauses not properly joined to the noun they describe?). Another possibility is a more coherent strategy for handling verb phrases: should each be identified as a single phrase, or would it be better to segment them into individual words and identify errors at the word level?
Finally, although this aspect was set aside for the present project for the reasons given in the methodology section, we could develop a method for identifying multiple errors within a single word and examine how that approach would affect our results.
It would also be interesting to characterize the grammatical environments in which specific errors occur; this might shed light on what triggers them.
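One hedged way to approach this would be to tally the part-of-speech context around each annotated error. The annotation format assumed below, a list of (sentence, error token index, error type) tuples, is hypothetical and only meant to suggest the shape of such an analysis.

# Hypothetical sketch: profile the POS environment of each error type,
# again assuming spaCy's en_core_web_sm model for tagging.
from collections import Counter, defaultdict
import spacy

nlp = spacy.load("en_core_web_sm")

def environment_profile(annotations, window=1):
    """annotations: iterable of (sentence, error_token_index, error_type)."""
    profiles = defaultdict(Counter)
    for sentence, idx, error_type in annotations:
        doc = nlp(sentence)
        for offset in range(-window, window + 1):
            j = idx + offset
            if offset != 0 and 0 <= j < len(doc):
                # Count the POS tags of the tokens surrounding the error.
                profiles[error_type][doc[j].pos_] += 1
    return profiles

sample = [("He go to the market yesterday.", 1, "agreement")]
print(environment_profile(sample))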
Overall, we believe our analysis thoroughly answered our initial research questions and can serve as a sound starting point for more exploratory questions regarding machine translation and machine learning.