Abstract
Sentiment analysis is defined as the computational study of people’s beliefs and opinions regarding entities and events, and their attributes, as expressed in a text. The two main approaches to sentiment analysis are machine learning and lexicon-based. The machine learning approach builds a model by learning from observed data to analyse the sentiment of a text, whereas the lexicon-based approach associates sentiment scores with individual words and calculates an overall sentiment score for the document.
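As a rough illustration of the lexicon-based idea described above, the following minimal sketch sums per-word scores into a document score. The lexicon entries, their values and the function name `lexicon_score` are assumptions made for this example; they are not taken from the Galadriel lexicon, and the sketch ignores context entirely.

```python
# Hypothetical toy lexicon: the words and scores are illustrative assumptions,
# not entries from the Galadriel system.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the sentiment scores of the words in `text`.

    Words missing from the lexicon contribute 0. Tokenisation is a naive
    whitespace split with surrounding punctuation stripped.
    """
    tokens = text.lower().split()
    return sum(lexicon.get(tok.strip(".,!?;:"), 0.0) for tok in tokens)


score = lexicon_score("The food was great, but the service was bad.")
print(score)
```

A scorer of this kind treats each word in isolation, which is the limitation that context-aware approaches aim to address.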
Each approach has strengths and weaknesses: the lexicon-based approach can capture specific lexical sentiment behaviour very precisely, but relies on expert development of lexicons, which is expensive and does not scale easily; the machine learning approach can exploit broader linguistic context and so is more robust, but it is less precise and requires large-scale training data. This project introduced a novel ‘extended’ lexical approach which uses inheritance-based techniques to represent both lexical behaviour and broader linguistic context derived from corpus-based learning. This approach used lexical items not just in isolation, but in context, which allowed the study to take into account more complex linguistic constructions. The corpus-based learning technique was then used to refine this model with examples derived from corpus data. This was done by using a non-monotonic, inheritance-based architecture to represent both the lexical algorithmic component and the example-based refinements. This thesis introduced a sentiment modelling system called Galadriel, based on the inheritance mechanisms of the lexical knowledge representation language DATR. The Galadriel system handles sentiment phrases and supports exceptions to general rules using a corpus-based learning methodology. However, I did not aim to explore automatic acquisition for sentiment analysis using machine learning methods in this thesis.
More specifically, this project developed a final system (Galadriel) to address the different levels of sentiment analysis in the current research area: document-level, sentence-level and aspect-level. The main properties of the Galadriel system involve the calculation of the sentiment magnitude and polarity of a text. A calibration method was introduced to assign cut-off values on the Galadriel score for each sentiment category, such as positive, negative and neutral, or for scales with more than three categories, using corpus-based learning evaluation techniques. The sensitivity and stability of the numerical magnitudes of individual lexical entries were then tested. This project also explored the neutral behaviour of sentiment and proposed a method to define the neutral category in sentiment analysis; the neutral class is not often addressed in the existing literature. Finally, the performance of the system was measured using precision, recall and f-score values. The evaluation results show that the Galadriel system yields comparable results across the different levels of sentiment task. The final evaluation shows that the f-score of the Galadriel system is 0.8284 at sentence-level, 0.78 (three class) / 0.75 (four class) at document-level, and 0.8079 (Restaurant) / 0.7464 (Laptop) at aspect-level.
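The calibration and evaluation steps mentioned above can be sketched, under stated assumptions, as mapping a numeric score to a category through cut-off values and then computing precision, recall and f-score for one category. The cut-offs of -0.5 and 0.5, the function names and the sample data are hypothetical; the abstract does not state the actual cut-off values or how the reported f-scores were averaged across categories.

```python
def categorise(score, neg_cutoff=-0.5, pos_cutoff=0.5):
    """Map a numeric sentiment score to a three-way category.

    The cut-off values are illustrative assumptions, not the calibrated
    values from the thesis.
    """
    if score <= neg_cutoff:
        return "negative"
    if score >= pos_cutoff:
        return "positive"
    return "neutral"


def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall and f-score for a single category."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical scores and gold labels, invented for this example only.
scores = [1.2, -0.9, 0.1, 0.7]
gold = ["positive", "negative", "neutral", "neutral"]
predicted = [categorise(s) for s in scores]
print(precision_recall_f1(gold, predicted, "positive"))
```

Moving the two cut-offs apart widens the neutral band, which is one simple way to think about how a neutral category could be delimited on a continuous score.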
Date of Award | Oct 2018 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Roger Evans (Supervisor) & Gulden Uchyigit (Supervisor) |