Isanette: A Common and Common Sense Knowledge Base for Opinion Mining
Cambria, E. et al., 2011. Isanette: A Common and Common Sense Knowledge Base for Opinion Mining. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on. pp. 315-322.
This paper demonstrates how different knowledge sources can be combined to bring common sense and common knowledge into opinion mining: sentiment is propagated from affect words such as "happy" or "sad" to general concepts (e.g. "birthday gift", "school graduation", ...).
The authors build upon two knowledge sources:
- ConceptNet - a natural language-based semantic network of common sense knowledge and
- Probase - the largest existing taxonomy of common knowledge that was created by Microsoft through Open Information Extraction.
Methods and Concepts
- Building the knowledge base: the authors create a 2,715,218 x 1,331,231 matrix of concepts (e.g. artists) and their instances (e.g. Picasso, Mozart, ...) based on Probase's hyponym-hypernym relations.
- afterwards they reduce the matrix's sparseness by performing the following steps:
- apply NLP to merge multiple word forms into one concept/instance
- discard nodes with a low connectivity (hapax legomena and nodes with a connectivity <=10)
- exploit dimensionality reduction techniques to infer negative evidence (e.g. "alitalia" is not a "country", ...)
- integrate common sense knowledge from ConceptNet using blending (compare: "Digital Intuition - Applying Common Sense Using Dimensionality Reduction")
- Creation of the Knowledge Base:
- build a Vector Space Representation of the Knowledge Base
- apply truncated singular value decomposition (SVD)
- represent common and common sense instances as vectors of d = 500 coordinates, which describe each instance in terms of 'eigenconcepts'.
- semantic clustering - cluster the space into 5,000 different categories that are represented by Isanette's hub concepts (= the 5,000 concepts with the highest in-degree in Isanette).
- Opinion Mining
- Interpretation of affective valence indicators (special punctuation, complete upper-case words, onomatopoeic repetitions (e.g. "grrr", "arg", ...), exclamation words, negations, degree adverbs and emoticons)
- Semantic parsing deconstructs the text into concepts, e.g. "I bought a lot of very nice Christmas presents" -> "buy christmas present"
- Calculate the polarity from the four Hourglass dimensions (Pleasantness, Attention, Sensitivity and Aptitude) of the N concepts c_i extracted from the text: $p = \sum_{i=1}^{N} \frac{Pleasantness(c_i) + |Attention(c_i)| - |Sensitivity(c_i)| + Aptitude(c_i)}{9N}$ (9 is a normalization factor: the Hourglass dimensions are defined on [-3, +3], so each numerator lies in [-9, +9] and p lies in [-1, +1])
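The polarity formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' implementation: the `HourglassScores` container, the `polarity` function, the choice to return 0.0 when no concepts are given, and the example scores are all assumptions made for this sketch; the actual scores would come from the knowledge base.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HourglassScores:
    """Hypothetical container for one concept's Hourglass scores, each in [-3, +3]."""
    pleasantness: float
    attention: float
    sensitivity: float
    aptitude: float


def polarity(concepts: Iterable[HourglassScores]) -> float:
    """Average the per-concept Hourglass terms and normalize by 9 to get p in [-1, +1]."""
    scores = list(concepts)
    n = len(scores)
    if n == 0:
        # Assumption of this sketch: text without recognized concepts is neutral.
        return 0.0
    total = sum(
        c.pleasantness + abs(c.attention) - abs(c.sensitivity) + c.aptitude
        for c in scores
    )
    return total / (9 * n)


# Made-up scores for the concept "buy christmas present" (illustration only).
example = [HourglassScores(pleasantness=2.0, attention=1.0, sensitivity=0.0, aptitude=1.5)]
print(polarity(example))
```

With these made-up scores the numerator is 2.0 + 1.0 - 0.0 + 1.5 = 4.5, so the polarity is 4.5 / 9 = 0.5.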
Sentiment Analysis
- early work: classify entire documents
- later: paragraph level opinion analysis
- recent work:
- sentence level sentiment analysis
- detection of subjective sentences
- use of semantic frames
- identification of sentiment topics (targets)
Interesting Resources
- Open Mind Common Sense (OMCS) - a collection of common sense statements over a number of concepts
- Hourglass of Emotions: Cambria, E. et al., 2010. SenticSpace: Visualizing Opinions and Sentiments in a Multi-dimensional Vector Space. In R. Setchi et al., eds. Knowledge-Based and Intelligent Information and Engineering Systems. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pp. 385-393.