Conference papers

The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs

Abstract : This paper introduces the three SSIX corpora for sentiment analysis. These corpora address the need for annotated data for supervised learning methods. They focus on stock-market-related messages extracted from two financial microblog platforms, StockTwits and Twitter. In total, they include 2,886 messages with opinion targets. Each message is annotated for polarity on a continuous scale by three or four experts in each language. The annotations identify each target and assign it a sentiment score. The annotation process consists of manual annotation that is verified and consolidated by financial experts. To maximize data quality, the creation of the annotated corpora took into account principled sampling strategies as well as inter-annotator agreement before consolidation.

https://hal.univ-rennes2.fr/hal-02280345
Contributor : Laurence Leroux
Submitted on : Tuesday, September 10, 2019 - 3:11:21 PM
Last modification on : Monday, January 20, 2020 - 3:24:05 PM
Long-term archiving on : Friday, February 7, 2020 - 10:28:28 PM

File

L18-1423.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-02280345, version 1

Citation

Thomas Gaillat, Manel Zarrouk, André Freitas, Brian Davis. The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs. LREC: Language Resources and Evaluation Conference, May 2018, Miyazaki, Japan. pp. 2671-2675. ⟨hal-02280345⟩
