November 25, 2016
One of the primary targets of the SSIX project is sentiment analysis in the financial domain across multiple languages. Work is progressing on the English front, where a three-way validated sentiment gold standard has been developed and is currently being used to train the sentiment classifier. The English work can draw on several available resources, such as text normalization tools, polarity lexicons and distributed word representations, so the English sentiment classifier can be grounded in pre-existing resources. This should make development faster and the resulting tool more powerful.
Addressing other languages, however, is a more complex issue that raises a number of questions. Resources for other languages may be neither as readily available nor as good in quality. This raises the question of whether it is possible, or even sufficient, to rely on the resources we have for English to address sentiment classification in other languages. Suppose, as is in fact the case, that we want to develop a sentiment classifier for German when we already have a working version for English. Is there a way to capitalise on the resources developed for English to create a classifier for German?
At least the following three approaches seem available:
- Create a Gold Standard for German from scratch, manually annotate and cross review it, then train the new classifier on it. We call this the “Native” approach.
- Take the English Gold Standard, translate it (either manually or automatically) to German, and train the German classifier on it. We can call this the “Foreign” approach.
- Use machine translation to convert the German input to English and feed the English translation to the English classifier. We call this the “Direct Translation” approach.
The three approaches differ in quality, efficiency and costs. We discuss them in more detail below.
The Native Approach
Building a new Gold Standard corpus from scratch is expensive, but potentially very rewarding. The most prominent benefit is that no translation takes place: native experts make their judgments on first-hand data. Creating such a gold standard is both costly and time-consuming, as we need several annotators (at least 3) to agree on the sentiment of each piece of text in order to ensure good quality data. Considering that the sample should contain several thousand tweets and that a domain like finance needs judgments made by specialists, the cost may quickly skyrocket.
On the flip side, the only variable in the Native Approach is the agreement of the annotators, provided that their individual domain knowledge and familiarity with the exchange medium (tweets) do not lead to vastly different sentiment scores for the same data. Given the conditions of its design and construction, we can assume that, once available, such a gold standard would be the standard against which any other approach should be benchmarked.
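The core of the Native Approach, three annotators per item with agreement as the quality signal, can be sketched as follows. The data, the `majority_label` and `pairwise_agreement` helpers, and the three-way labels are all invented for illustration; the actual SSIX annotation pipeline may differ.

```python
# Toy sketch of gold-standard construction from three annotators.
from collections import Counter
from itertools import combinations

def majority_label(labels):
    """Return the majority sentiment label, or None if no strict majority exists."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def pairwise_agreement(annotations):
    """Fraction of annotator pairs that agree, pooled over all items."""
    total, agreements = 0, 0
    for labels in annotations:
        for a, b in combinations(labels, 2):
            total += 1
            agreements += (a == b)
    return agreements / total

# Three annotators judge each tweet as positive / neutral / negative.
annotations = [
    ("positive", "positive", "neutral"),
    ("negative", "negative", "negative"),
    ("positive", "neutral", "negative"),  # no majority: discard or adjudicate
]
gold = [majority_label(labels) for labels in annotations]
```

Items without a strict majority (the third tweet above) are exactly the costly cases: they must either be dropped or sent back for expert adjudication, which is one reason the price of a native gold standard climbs quickly.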
The Foreign Approach
In this approach, instead of building a new corpus and annotating it manually, we take the existing English gold standard and translate it to German. This presupposes that a statement with positive sentiment in English remains positive in German, and vice versa for negative judgments. Several translation methods are available: translation can be done manually, by machine translation, or in a hybrid way, using computer-aided translation tools or post-translation review by human translators. We can also take advantage of the fact that only some words are sentiment-bearing, targeting those words in context for optimal translation and ignoring the rest.
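The idea of targeting only sentiment-bearing words can be sketched with a polarity lexicon lookup. The lexicon below is a six-word toy; a real financial polarity lexicon, as mentioned among the English resources above, would be far larger.

```python
# Toy polarity lexicon: maps sentiment-bearing terms to their polarity.
POLARITY = {
    "surge": "positive", "rally": "positive", "beat": "positive",
    "plunge": "negative", "miss": "negative", "downgrade": "negative",
}

def sentiment_bearing(tokens):
    """Return the tokens found in the polarity lexicon, with their polarity.

    Only these tokens would be routed to careful (human or domain-tuned)
    translation; the rest of the text can be translated with less scrutiny.
    """
    hits = []
    for token in tokens:
        polarity = POLARITY.get(token.lower())
        if polarity:
            hits.append((token, polarity))
    return hits

tweet = "Shares plunge after earnings miss , analysts downgrade $ACME".split()
```

Here `sentiment_bearing(tweet)` flags `plunge`, `miss` and `downgrade`; those three words carry the negative judgment, so mistranslating any of them would flip or dilute the label attached to the German version.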
If we use human translation, creating a translated gold standard will be cheaper than creating a native one, in that a single domain expert will suffice where previously three were needed. Cost and time decrease drastically with machine translation, but the resulting data, especially in a technical domain such as finance, may be of lower quality. Machine translation could, for instance, systematically map an English term to a German term that is synonymous in some other domain but not relevant to finance.
Human-reviewed machine translation is arguably the safest approach if one wants to speed up the process while keeping costs limited. The review may also reveal error patterns in the translation that can be fixed in post-processing.
The Direct Translation Approach
Instead of training a new classifier on German data, we translate the German input text to English and feed it to the English classifier. Here, translation can only mean machine translation, as we will be dealing with large amounts of input data to be processed in real time. This approach also carries ongoing costs, since machine translation of large volumes of data is not free.
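The pipeline is simple to state: the German classifier is just translation composed with the English classifier. The sketch below uses a stub dictionary in place of the MT service and a toy lexicon model in place of the trained English classifier; both are stand-ins, not SSIX components.

```python
# Direct Translation approach: classify German text via the English classifier.
STUB_MT = {  # stand-in for a real machine-translation service
    "aktie stürzt ab": "stock plunges",
    "gewinn steigt": "profit rises",
}
POSITIVE = {"rises", "gains", "beats"}
NEGATIVE = {"plunges", "falls", "misses"}

def translate(text_de):
    """Hypothetical MT call; unknown input is passed through unchanged."""
    return STUB_MT.get(text_de.lower(), text_de)

def classify_en(text_en):
    """Toy stand-in for the trained English sentiment classifier."""
    words = set(text_en.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def classify_de(text_de):
    # The "German classifier" is translation followed by the English one.
    return classify_en(translate(text_de))
```

The appeal is that `classify_de` requires no German training data at all; the risk is that every MT error lands directly on the classifier's input at prediction time, with no human review in the loop.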
The two translation-based approaches, Foreign and Direct Translation, face a number of issues related to the domain and the specificity of the texts involved. Spelling errors, uncommon abbreviations and rhetorical language are all extra challenges that need to be tackled. Input normalization and output optimization are strategies that can be pursued to improve the quality and accuracy of the translation. First, we may remove elements such as repeated characters or delete unknown strings. Then, during post-analysis of the translated material, we can map common MT mistakes to the desired output, for instance terms that need a specific translation in the domain of reference. There is a wide range of operations that can be performed, some language-specific, some more general. In this respect, GeoFluent is designed not only to support automatic translation but also to prepare the input and correct the output of the translation process (pre- and post-processing of the data).
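The two normalization steps just described, cleaning the input before translation and correcting the output after it, can be sketched as below. The regular expressions and the correction table are illustrative assumptions, not GeoFluent's actual rules.

```python
# Pre- and post-processing around machine translation (illustrative rules).
import re

def normalize_input(text):
    """Pre-processing: strip noisy strings and collapse repeated characters."""
    text = re.sub(r"https?://\S+", "", text)    # drop URLs (unknown strings)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "soooo" -> "soo", "!!!" -> "!!"
    return re.sub(r"\s+", " ", text).strip()

# Post-processing: map known MT mistakes to the domain's preferred term,
# e.g. English "stock" wrongly rendered as German "Lager" (inventory)
# instead of "Aktie" (share) in the financial domain.
CORRECTIONS = {"Lager": "Aktie"}

def fix_output(text):
    """Apply the domain correction table word by word."""
    return " ".join(CORRECTIONS.get(word, word) for word in text.split())
```

In practice the correction table would be populated from the post-analysis of translated material mentioned above, i.e. from error patterns observed on real output rather than hand-picked pairs.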
We are currently experimenting with the Foreign Approach, planning to benchmark it against a classifier trained on a Native gold standard for German. While exploring these strategies, however, another possibility has emerged as not only viable but promising. The results of Balahur and Turchi show that using multiple languages at the same time may improve the overall classification of sentiment in the data. In particular: “a multilingual system can simply employ joint training data from different languages in a single classifier, thus making the sentiment classification straightforward, not needing any language detection software or training different classifiers.” This is a strong claim that the authors back with empirical results. The data we are collecting for the various languages will allow us to run further tests and eventually confirm or reject the claim.
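The joint-training idea from the quote can be made concrete with a toy model: pool labelled examples from both languages into one training set and fit a single classifier. The four training examples and the word-count scoring below are invented; the point is only the structure, one model and no language detection.

```python
# Joint multilingual training: one classifier over pooled EN + DE examples.
from collections import Counter, defaultdict

TRAIN = [  # invented labelled examples in two languages
    ("profit beats expectations", "positive"),              # English
    ("shares plunge on weak outlook", "negative"),          # English
    ("gewinn übertrifft erwartungen", "positive"),          # German
    ("aktie stürzt nach schwachem ausblick ab", "negative"),  # German
]

def train(examples):
    """Count word occurrences per label over the pooled multilingual data."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    """Score each label by training-word matches; language is never detected."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)

model = train(TRAIN)
```

Note that even a mixed-language input such as "gewinn beats erwartungen" is handled by the same model, which is precisely what makes the joint approach attractive: no language detection software and no per-language classifiers.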
A. Balahur and M. Turchi, “Multilingual sentiment analysis using machine translation?”, Proceedings of the 3rd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2012).