SSIX - Social Sentiment Indices powered by X-Scores

The creation of real-world Artificial Intelligence (AI) applications depends on leveraging a large volume of commonsense knowledge. Simple semantic interpretation tasks, such as understanding that if ‘A is married to B’ then ‘A is the spouse of B’, or that ‘car’, ‘vehicle’ and ‘auto’ have very similar meanings, are examples of the semantic approximation inferences present in practically all AI applications that interpret natural language.

Many AI applications depend on being semantically flexible, i.e. on coping with the large vocabulary variation permitted by natural language. Sentiment Analysis, Question Answering, Information Extraction, Semantic Search and Classification are examples of tasks in which the ability to perform semantic approximation is a central requirement.

Distributional Semantics Models and Word Vector models emerged as successful approaches for supporting semantic approximation, due to their ability to build comprehensive semantic approximation models and their simplicity of representation.

One very useful tool for this purpose is Indra, a distributional semantics engine which facilitates the deployment of robust distributional semantic models for industry-level applications.

Its key features are:

  • Supports multiple distributional semantic models and distance measures.
  • No strings attached: permissive license for commercial and academic use.
  • Access to the semantic models as a service.
  • High performance vector computation.
  • Easy deployment: set up the infrastructure in 3 steps.
  • Intrinsically multi-lingual.
  • Pre-built models for different languages.

It also allows experimentation with multiple distributional models, including:

  • Latent Semantic Analysis (LSA)
  • Explicit Semantic Analysis (ESA)
  • Word2Vec (W2V)
  • Global Vectors (GloVe)

All models are exposed through a JSON over HTTP API (REST-like).

You can easily integrate Indra as a service into your application. The payload below is consumed by Indra to compute semantic similarity between word or phrase pairs.

Request data model
{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "scoreFunction": "COSINE",
  "pairs": [{
    "t2": "love",
    "t1": "mother"
  },
  {
    "t2": "love",
    "t1": "father"
  }]
}

The request takes the following parameters:

model: The distributional model.

  • W2V
  • GLOVE
  • LSA
  • ESA

language: Two-letter language code (ISO 639-1).

  • EN – English
  • DE – German
  • ES – Spanish
  • FR – French
  • PT – Portuguese
  • IT – Italian
  • SV – Swedish
  • ZH – Chinese
  • NL – Dutch
  • RU – Russian
  • KO – Korean
  • JP – Japanese
  • AR – Arabic
  • FA – Persian

corpus: The name of the corpus used to build the models.

  • wiki-2014 (all languages except JP and KO)
  • wiki-2016 (JP and KO only)

scoreFunction: The function used to compute the relatedness between the distributional vectors.

  • COSINE
  • ALPHASKEW
  • CHEBYSHEV
  • CITYBLOCK
  • SPEARMAN
  • PEARSON
  • DICE
  • EUCLIDEAN
  • JACCARD
  • JACCARD2
  • JENSENSHANNON
Response model

This is the response for the request above.

{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "pairs": [
    {
      "t1": "mother",
      "t2": "love",
      "score": 0.45996829519139865
    },
    {
      "t1": "father",
      "t2": "love",
      "score": 0.32337835808129745
    }
  ],
  "scoreFunction": "COSINE"
}
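Consuming this response is straightforward. The Python sketch below parses the example response and ranks the pairs by relatedness; the JSON is embedded as a string here for illustration, whereas in practice it would come from the HTTP response body:

```python
import json

# The example response from above, embedded as a string for
# illustration; in practice this is the HTTP response body.
response_body = """
{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "pairs": [
    {"t1": "mother", "t2": "love", "score": 0.45996829519139865},
    {"t1": "father", "t2": "love", "score": 0.32337835808129745}
  ],
  "scoreFunction": "COSINE"
}
"""

result = json.loads(response_body)

# Rank the pairs by relatedness score, highest first.
ranked = sorted(result["pairs"], key=lambda p: p["score"], reverse=True)
for pair in ranked:
    print(f'{pair["t1"]} ~ {pair["t2"]}: {pair["score"]:.3f}')
```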

Public Endpoint
Indra has a public endpoint for demonstration purposes only, so you can try it right now with cURL on the command line:

curl -X POST -H "Content-Type: application/json" -d '{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "scoreFunction": "COSINE",
  "pairs": [{
    "t2": "love",
    "t1": "mother"
  },
  {
    "t2": "love",
    "t1": "father"
  }]
}' "http://indra.lambda3.org/relatedness"

For further information go to: https://github.com/Lambda-3/indra
 
This blog post was written by SSIX partner André Freitas at Passau University.
For the latest updates, like us on Facebook, follow us on Twitter and join us on LinkedIn.

 
