May 3, 2016
Why Machine Learning?
Can computers learn how to perform complex tasks such as driving autonomous cars, spotting cancer cells or analysing sentiment without being specifically instructed? The answer is Yes – and Machine Learning (ML) provides the tools! Rather than relying on explicit instructions (rules), ML systems learn from examples. This allows them to discover relationships in the data and to tackle cases which go beyond what can be captured by rules. With the vast amounts of data available today and ever-increasing processing capacities, ML is gaining in power and efficiency. The quality of any ML system, however, depends on the quality of the examples it is provided with.
In the SSIX project, we want to leverage the power of ML to assign accurate sentiment to financial microblogging messages.
Annotated data are central to supervised Machine Learning approaches. A supervised ML algorithm learns from a provided set of examples, each of the form input – expected output (e.g. person crossing the street – slow down, “The new iPhone has a shitty battery!” – negative sentiment, etc.). By analysing a large quantity of such examples, the ML algorithm creates a function linking the input to the output. In the context of Natural Language Processing (NLP), an annotated dataset with ground truth output labels is often referred to as a “Gold Standard”.
There are two main ways to obtain manually annotated data:
- Crowdsourcing – (e.g. Amazon Mechanical Turk, CrowdFlower) harnesses the “wisdom of the crowd” with hundreds or thousands of anonymous workers providing annotations. With this method, large quantities of annotated data can be obtained in a short period of time. The often large number of annotators gives a complete picture by including many different views (as annotation often concerns problems which are not straightforward). Most Crowdsourcing platforms provide quality assurance mechanisms (e.g. test questions) to make sure that the task is correctly understood by the workers and carried out carefully. However, for specialist tasks, Crowdsourcing may not always be suitable as it is difficult to verify that workers have the required background knowledge and annotation quality may suffer as a result.
- Smaller-scale manual annotation – The alternative to Crowdsourcing is manual annotation by a small group. Due to the specific nature of our task and the extensive domain knowledge required, we rely on a smaller set of qualified domain experts to provide high-quality sentiment annotations for SSIX. Our annotators all have a background in finance as well as hands-on experience in trading, which helps them to correctly interpret the stock-related sentiment expressed in our data. Annotations are collected using a custom Web Interface (see the demo video below).
The SSIX Financial Microblog Sentiment Annotation Interface (FiMSAI)
In the context of the SSIX project we are interested in sentiment about stocks, that is, whether the author of a microblogging message thinks that the price of a given stock is going to increase (positive/bullish) or decrease (negative/bearish). In our gold standard, we want to annotate this sentiment for each stock that is mentioned in a message. Stocks can be identified using so-called “cashtags”, that is, the stock’s ticker symbol prefixed with a dollar sign “$”, such as “$AAPL” for the company Apple Inc. For each such cashtag detected in a message, we ask annotators to provide the sentiment that is expressed about it. For ease of use, annotators can simply place a slider on a continuous scale. Colouring additionally supports this process.
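As a rough illustration of the cashtag detection described above, the following Python sketch finds ticker symbols in a message with a regular expression. The pattern (a dollar sign followed by one to six letters) and the function name `find_cashtags` are assumptions made for this example, not the rules used in the SSIX pipeline; real ticker symbols can take other forms.

```python
import re

# Assumed pattern: "$" followed by 1-6 letters, not preceded by a word character.
# Real ticker symbols can be more varied (e.g. suffixes such as "BRK.B").
CASHTAG_RE = re.compile(r"(?<![A-Za-z0-9_])\$([A-Za-z]{1,6})\b")


def find_cashtags(message: str) -> list[str]:
    """Return the distinct ticker symbols in a message, in order of first appearance."""
    seen = []
    for match in CASHTAG_RE.finditer(message):
        ticker = match.group(1).upper()
        if ticker not in seen:
            seen.append(ticker)
    return seen


# Invented example message; "$12" is a price, not a cashtag, and is skipped.
example = "Holding $AAPL, trimming $tsla. Lunch was $12 today, still like $AAPL."
print(find_cashtags(example))
```

Each ticker returned this way would then correspond to one sentiment slider shown to the annotator.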
If a message is not relevant to the financial context, appears to be spam, or leaves the annotator unsure what sentiment to assign, the annotator can indicate this by clicking on the corresponding button.
In addition to annotating the sentiment expressed about a cashtag, our annotators also indicate the span of text in which the sentiment is expressed. This allows us to capture the domain-specific language used to express bullish/bearish intent, which can be quite difficult to understand for non-experts. Consider the following Stocktwits message:
The message says “holding puts til 3:50pm” and “Long next week”, meaning the author is anticipating a very short-term decline in the price of Apple, with it increasing soon after. Collecting such positively/negatively loaded words or phrases will further enrich our automated sentiment analysis.
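To make the shape of one annotation more concrete, here is a minimal Python sketch of a record holding a sentiment score, a text span and an optional flag for a single cashtag. The class name, the field names, the flag values and the score range of -1.0 (bearish) to +1.0 (bullish) are all assumptions for illustration; the article only states that the scale is continuous. The messages are invented and are not taken from the SSIX data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CashtagAnnotation:
    """One annotator judgement about one cashtag in one message (hypothetical schema)."""
    message: str
    cashtag: str
    sentiment: Optional[float] = None   # assumed range: -1.0 (bearish) .. +1.0 (bullish)
    span_start: Optional[int] = None    # character offsets of the sentiment-bearing text
    span_end: Optional[int] = None
    flag: Optional[str] = None          # assumed values: "irrelevant", "spam", "unsure"

    def span_text(self) -> Optional[str]:
        """Return the marked text span, or None if no span was marked."""
        if self.span_start is None or self.span_end is None:
            return None
        return self.message[self.span_start:self.span_end]

    def is_usable(self) -> bool:
        """An annotation counts as training data only if it is unflagged and scored."""
        return self.flag is None and self.sentiment is not None


# Invented message with a marked span expressing bearish intent.
message = "$AAPL looks weak into the close, buying puts"
phrase = "buying puts"
start = message.index(phrase)
annotation = CashtagAnnotation(
    message=message,
    cashtag="AAPL",
    sentiment=-0.6,
    span_start=start,
    span_end=start + len(phrase),
)
print(annotation.span_text())

# A flagged message carries no sentiment and is excluded from training.
spam = CashtagAnnotation(message="Free signals, join now $AAPL", cashtag="AAPL", flag="spam")
print(spam.is_usable())
```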
Learning Algorithms & Features
A Machine Learning algorithm that assigns labels to instances of data is also called a “classifier” (e.g. assigning “dangerous situation – need to stop” to a tricky situation an autonomous car is faced with, or “negative sentiment” to a derogatory remark about a new technical gadget). There are different types of classifiers, that is, different ways in which the function mapping input to output values can be learnt. Typical classifier techniques applied in Natural Language Processing (NLP) are Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes (NB). The classifier will typically be chosen after experimenting with a few alternatives, as each NLP problem has distinct requirements. To learn more about the many types of classifiers out there, consult the article “A Tour of Machine Learning Algorithms”, which gives a concise overview.
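To show what learning a mapping from examples can look like in practice, the sketch below implements a small multinomial Naive Bayes text classifier with add-one smoothing in plain Python. It is a toy illustration rather than the classifier used in SSIX: the class name `NaiveBayesText`, the whitespace tokenisation and the four training messages are all invented for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # number of training messages per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training set in the form input - expected output.
texts = [
    "breaking out to new highs buying more",
    "strong earnings going long",
    "weak guidance selling my shares",
    "breaking down buying puts",
]
labels = ["bullish", "bullish", "bearish", "bearish"]

model = NaiveBayesText().fit(texts, labels)
print(model.predict("going long after strong earnings"))
print(model.predict("weak guidance selling"))
```

With only four training messages the model does little more than memorise vocabulary; a realistic system needs a far larger gold standard and richer features than raw word counts.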
Selecting the best classifier for a problem is important – but the best classifier is helpless without high-quality features. Features are the characteristics of the input data that the classifier can use to assist the classification decision. Features can be numerical or textual. In NLP, as the instance to be classified consists of text, features often address linguistic characteristics such as word frequencies, polarity words (positively or negatively loaded expressions – often used in sentiment analysis), part-of-speech patterns, sentence structure, and knowledge from external databases such as lists of Named Entities. The features chosen depend on the problem to be addressed, and iterative cycles of feature creation, testing and evaluation are needed to establish the optimum feature set. The article “Discover Feature Engineering, How to Engineer Features and How to Get Good at It” gives an introduction to feature engineering in general terms.
In the context of SSIX, we are currently investigating various linguistic features such as:
- Part-of-speech tags
- Morphological information (e.g. base forms of words)
- Emotion words, both from the general domain (anger, sadness, joy, …) and the stocks domain (bullish/bearish)
- Emotion expressed
- Distance between the mention of a stock ticker and an emotion word
- Syntactic relation between stock ticker mention and emotion word (e.g. from dependency parsing)
- Domain-specific semantic models (e.g. Distributional Semantic Models trained on Finance texts)
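One of the features listed above, the distance between a stock ticker mention and an emotion word, can be sketched in a few lines of Python. The mini-lexicon `EMOTION_WORDS`, the simple tokeniser and the function `ticker_features` are hypothetical stand-ins for illustration only; they are not the resources or the GATE components used in SSIX.

```python
# Hypothetical mini-lexicon: +1 for bullish terms, -1 for bearish terms.
EMOTION_WORDS = {"bullish": 1, "long": 1, "calls": 1,
                 "bearish": -1, "short": -1, "puts": -1}


def tokenize(message):
    """Split on whitespace, strip surrounding punctuation and lowercase."""
    return [tok.strip(".,!?;:").lower() for tok in message.split()]


def ticker_features(message, ticker):
    """Features for one ticker: token distance to the nearest emotion word and its polarity."""
    tokens = tokenize(message)
    target = "$" + ticker.lower()
    if target not in tokens:
        return None  # the ticker is not mentioned in this message
    ticker_pos = tokens.index(target)
    hits = [(abs(i - ticker_pos), tok)
            for i, tok in enumerate(tokens) if tok in EMOTION_WORDS]
    if not hits:
        return {"has_emotion_word": False, "distance": None,
                "nearest_word": None, "polarity": 0}
    distance, word = min(hits)  # nearest emotion word (ties broken alphabetically)
    return {"has_emotion_word": True, "distance": distance,
            "nearest_word": word, "polarity": EMOTION_WORDS[word]}


# Invented message mentioning two tickers with opposite sentiment.
msg = "Bearish on $TSLA today, but long $AAPL into earnings."
print(ticker_features(msg, "AAPL"))
print(ticker_features(msg, "TSLA"))
```

Because the features are computed per ticker, the same message can yield a bullish signal for one stock and a bearish one for another, which matches the per-cashtag annotation scheme.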
Figure 1: Mention of a Stock Ticker with associated features in the SSIX GATE Pipeline.
Manually labelled data provides the best quality, which is why we focus on this approach to obtain gold standard data. Class labels can, however, also be obtained using Distant Supervision (https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf) or bootstrapping methods.