Natural Language Processing Resources

Sentiment Analysis

Lexicons

  1. Opinion lexicons and datasets
  2. MPQA Subjectivity Lexicon
  3. Emotion lexicon
  4. Loughran and McDonald Financial Sentiment Dictionaries
  5. FrameNet - a lexical database of semantic frames for selected words.
    Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet Project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1 (pp. 86–90). Stroudsburg, PA, USA: Association for Computational Linguistics.
  6. VerbNet - maps verbs to their corresponding Levin verb classes
    Schuler, K. K. (2006). VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. University of Pennsylvania.

Datasets

  1. Stanford Twitter Sentiment Dataset - Hu et al. 2013
  2. Obama-McCain Debate (OMD) Twitter Dataset - Hu et al. 2013
  3. SemEval-2013 Twitter Dataset
  4. Sander Analytics Twitter Sentiment Dataset
  5. Blitzer Sentiment Dataset
  6. SFU Review Corpus - 17,263 sentences annotated with negation and speculation cues and their scopes; domains: reviews of books, cars, computers, cookware, hotels, movies, music, and phones.
  7. Multi-Domain Sentiment Dataset (version 2.0) - Amazon product reviews for books, DVDs, electronics, and kitchen appliances.

Evaluations & Challenges

  1. SemEval 2014 - Aspect Based Sentiment Analysis
  2. ESWC-14 Challenge on Concept-Level Sentiment Analysis
  3. GermEval 2014 Named Entity Recognition Shared Task

Text Corpora

  1. RateMDs - 50,000 doctor reviews.
  2. Drugs-Forum.com - discussions of illicit drugs; used with text summarization to reveal information on drug use.

Controlled Vocabulary

  1. Consumer Health Vocabulary | Query Interface - translates between consumer and expert jargon