Feature Extraction for Sentiment Analysis in Indonesian Twitter


Eka Dyar Wahyuni
Amalia Anjani Arifiyanti
Mohamad Irwan Afandi

Abstract

Sentiment analysis of Twitter data has recently become one of the most interesting fields of research. It combines natural language processing techniques with data mining. Many algorithms have been proposed to better understand sentiment from text. Proposed methods may focus on the preprocessing step, the dataset splitting method (training and testing), the dataset balancing method (when the data is unbalanced), or improvements to existing algorithms. The main focus of this paper, however, is feature extraction from tweets using TF-IDF. The features obtained from this process are expected to improve the accuracy of the classification process. The dataset used in this research is in Indonesian, which differs considerably in form from English. It consists of 1068 manually labeled tweets related to the "school from home" policy prompted by the COVID-19 outbreak, collected from March to July. All steps required to process this data are implemented in Python. To validate the utility of the extracted features, the performance of the classification algorithms is compared. Finally, the results are summarized by reflecting on the impact of including the proposed features on each classification algorithm for sentiment detection.
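
As a rough illustration of the TF-IDF feature extraction mentioned in the abstract, the following is a minimal Python sketch using only the standard library. It is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or library was used, so the sketch assumes the plain textbook form (term frequency as count divided by tweet length, inverse document frequency as ln(N / df)). The three sample tweets and the names `tokenize` and `tf_idf` are made up for the example and are not taken from the paper's dataset.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and keep alphanumeric runs only.
    # The paper's actual preprocessing is not described in the abstract.
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(docs):
    """Return (vocab, vectors) using tf = count/len and idf = ln(N/df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        )
    return vocab, vectors

# Hypothetical tweets, invented for illustration only.
tweets = [
    "belajar dari rumah itu menyenangkan",
    "belajar dari rumah bikin bosan",
    "sekolah dari rumah susah sinyal",
]
vocab, vectors = tf_idf(tweets)
```

In this sketch, words that occur in every tweet (here "dari" and "rumah") receive a weight of zero, while words unique to one tweet receive the largest weights. Library implementations often differ in detail; for example, scikit-learn's `TfidfVectorizer` smooths the IDF term and L2-normalizes each row by default, so its values would not match these.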
