March 10, 2017
The main source of data for the SSIX project is undoubtedly Twitter. In fact, it is the only platform still providing free and continuous access to live streams of content, through its official Streaming APIs (for more information, read this article on how to collect data from Twitter).
Although all data retrieved from Twitter can be considered publicly accessible (primarily because users have automatically consented to the distribution of their data to third parties outside the Twitter platform), the SSIX project pays particular attention to the privacy of the collected data.
To achieve this, the data ingestion infrastructure of the architecture, provided by 3rdPLACE, applies an anonymization process to incoming data before storing it. The process removes from the original Twitter object (received in JSON format) all fields that have been identified as potentially sensitive. These fields are not used by the SSIX platform for sentiment analysis or for the generation of the X-Scores.
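As a rough illustration of this kind of field stripping, here is a minimal Python sketch. It is not the SSIX or 3rdPLACE implementation: the function names (`remove_path`, `anonymize`), the dotted-path convention and, above all, the field names in `SENSITIVE_PATHS` are assumptions chosen for the example, and do not reproduce the project's actual list of discarded fields.

```python
import copy
import json

# Hypothetical examples only: these are NOT the project's actual discarded
# fields, just plausible-looking keys from a Twitter JSON object.
SENSITIVE_PATHS = ("user.screen_name", "user.location", "geo")


def remove_path(obj, path):
    """Delete the dotted `path` from the nested dict `obj`, if it exists."""
    keys = path.split(".")
    for key in keys[:-1]:
        obj = obj.get(key)
        if not isinstance(obj, dict):
            return  # parent is missing or not an object: nothing to remove
    obj.pop(keys[-1], None)


def anonymize(tweet, paths=SENSITIVE_PATHS):
    """Return a copy of `tweet` without the listed fields."""
    cleaned = copy.deepcopy(tweet)  # leave the incoming object untouched
    for path in paths:
        remove_path(cleaned, path)
    return cleaned


# A made-up tweet, parsed from JSON as it would arrive from a stream.
raw = json.loads(
    '{"id": 1, "text": "$AAPL looks strong",'
    ' "user": {"id": 42, "screen_name": "alice", "location": "Dublin"},'
    ' "geo": null}'
)
clean = anonymize(raw)
print(json.dumps(clean))
```

Working on a deep copy keeps the received object intact, so only the cleaned version needs to be passed on to storage.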
Here is a list of the fields that are currently discarded:
All the information kept after this process is stored in a secure repository on the Google Cloud Platform, as explained in this previous article.