Sentiment analysis on streams of Twitter data

Bibliographic Details
Main Author: Μπαλτάς, Αλέξανδρος
Other Authors: Τσακαλίδης, Αθανάσιος
Format: Thesis
Language: English
Published: 2017
Subjects:
Online Access: http://hdl.handle.net/10889/10365
Description
Summary: Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity and volume of the data. In this work we implement a system on Apache Spark, an open-source framework for programming with Big Data. The sentiment analysis tool is based on machine learning methodologies and natural language processing techniques and utilises Apache Spark's machine learning library, MLlib. To address the nature of Big Data, we introduce preprocessing steps on the input in order to achieve better results in sentiment analysis. The classification algorithms are used for both binary and ternary classification, and we examine the effect of the dataset size, as well as of the input features, on the quality of the results.
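
The summary mentions preprocessing steps without listing them. As an illustration only, the sketch below shows what tweet preprocessing of this kind might look like in plain Python. The specific steps (lowercasing, removing URLs and user mentions, keeping the word of a hashtag, shortening stretched letters, dropping non-letters) and the function name `preprocess_tweet` are assumptions made for this example and are not taken from the thesis.

```python
import re
from typing import List

# Patterns for common tweet noise (all assumed for illustration).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")      # a character repeated 3+ times
NON_LETTER_RE = re.compile(r"[^a-z\s]")   # anything except a-z and whitespace


def preprocess_tweet(text: str) -> List[str]:
    """Hypothetical cleaning step: turn a raw tweet into lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = HASHTAG_RE.sub(r"\1", text)    # "#spark" -> "spark"
    text = REPEAT_RE.sub(r"\1\1", text)   # "looove" -> "loove"
    text = NON_LETTER_RE.sub(" ", text)   # drop digits and punctuation
    return text.split()


print(preprocess_tweet("@bob I LOOOVE #Spark!!! http://t.co/abc"))
```

In a Spark pipeline, tokens like these could then be passed to a feature extractor such as MLlib's `HashingTF` before training a classifier; whether the thesis does so is not stated in this record.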