May 30, 2017
Every day, investors receive news and opinions about stocks and firms through specialised platforms. These platforms convey information that leads investors to make decisions. Those decisions move the market, which aggregates the information investors receive and reflects it in the prices of the firms or stocks. Given an algorithm that classifies text by sentiment, we can ask whether it helps us predict future returns.
Sentiment analysis has benefited from recent advances in natural language processing and machine learning. Most of these approaches have been applied to posts on microblogs or social media, as recent publications show. The domain of finance has unique linguistic and semantic features, and interpreting them depends on models that reflect the economic and mathematical tools experts use to assess financial information.
In the NLP state of the art, deep learning algorithms have been used in classification tasks over text corpora. Recurrent neural network (RNN) architectures such as LSTM, for instance, have been used in many text classification problems.
Combining sentiment analysis approaches and Deep Learning models to predict finance statements is a big and exciting challenge. A generic finance predictor architecture can be seen in Figure 1.
The SSIX project targets sentiment analysis in the financial domain. One of its primary objectives is to build a financial sentiment platform capable of scoring finance statements with continuous values between 1 (positive) and -1 (negative). To this end, we analysed a large collection of finance statements extracted from StockTwits to develop a sentiment classification model. StockTwits is a micro-blogging platform focused on financial events. In many messages, StockTwits users add the hashtags #bullish and #bearish to signal the nature of the reported event. Further information about StockTwits and their tags can be found here.
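To make the idea of hashtag-based signals concrete, here is a minimal sketch of how #bullish and #bearish tags could be turned into scores on the 1 to -1 scale. It is an illustration under stated assumptions, not the SSIX pipeline: the function names `hashtag_label` and `mean_score`, the rule of discarding messages that carry both tags, and the sample messages are all hypothetical.

```python
import re

# Case-insensitive matchers for the two StockTwits hashtags.
BULLISH = re.compile(r"#bullish\b", re.IGNORECASE)
BEARISH = re.compile(r"#bearish\b", re.IGNORECASE)


def hashtag_label(message):
    """Return 1.0 for a bullish message, -1.0 for a bearish one.

    Returns None when the message has no tag or carries both tags
    (an assumption: such messages are treated as unlabelled).
    """
    has_bull = bool(BULLISH.search(message))
    has_bear = bool(BEARISH.search(message))
    if has_bull == has_bear:
        return None
    return 1.0 if has_bull else -1.0


def mean_score(messages):
    """Average the labels of the tagged messages; 0.0 if none are tagged."""
    labels = [hashtag_label(m) for m in messages]
    labels = [x for x in labels if x is not None]
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical messages, written only for this example.
sample = [
    "$AAPL breaking out #bullish",
    "Adding more here #Bullish",
    "Weak guidance, staying away #bearish",
    "Earnings call at 5pm",
]
score = mean_score(sample)
```

Averaging discrete tags is only one simple way to obtain a continuous value; a trained model would normally produce the score directly from the text.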
The results of this analysis were a new English finance Gold Standard (GS) corpus and an English sentiment classification model trained on it. One problem when analysing opinion statements is how to normalise the data and remove expected noise so that useful information can be extracted. For this reason, pre-processing each sample before training the model is an important task. The sentiment classification model takes into account features that capture the sentiment and semantic meaning of the words that make up a statement, and then detects sentiment patterns in finance statements. For example, a substantial effort has been invested in developing sophisticated financial polarity lexicons, containing common finance terms, that can be used to investigate how financial sentiment relates to future company performance.
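The two ideas above, normalising noisy messages and looking words up in a polarity lexicon, can be sketched in a few lines. This is a toy illustration, not the SSIX model: the cleaning rules, the `_ticker_` placeholder, the function names and the tiny `TOY_LEXICON` with its weights are all assumptions made for the example.

```python
import re

URL = re.compile(r"https?://\S+")
CASHTAG = re.compile(r"\$[A-Za-z]{1,6}\b")   # e.g. $AAPL
MENTION = re.compile(r"@\w+")
NON_WORD = re.compile(r"[^a-z_\s]")          # applied after lowercasing


def preprocess(message):
    """Strip URLs and mentions, mask cashtags, lowercase, and tokenise."""
    text = URL.sub(" ", message)
    text = CASHTAG.sub(" _ticker_ ", text)
    text = MENTION.sub(" ", text)
    text = text.lower()
    text = NON_WORD.sub(" ", text)
    return text.split()


# Hypothetical polarity lexicon; real financial lexicons are far larger.
TOY_LEXICON = {
    "beats": 1.0,
    "upgrade": 1.0,
    "rally": 0.5,
    "miss": -1.0,
    "downgrade": -1.0,
    "lawsuit": -0.5,
}


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Mean polarity of the tokens found in the lexicon, clipped to [-1, 1]."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0
    return max(-1.0, min(1.0, sum(hits) / len(hits)))


tokens = preprocess("$AAPL beats estimates!! http://t.co/abc @bob #Bullish")
score = lexicon_score(tokens)
```

In practice a lexicon score like this would be one input feature among several rather than the final prediction.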