Imbalanced Dataset Problem in Sentiment Analysis
Abstract
This study discusses the problems that imbalanced datasets cause in sentiment analysis. It analyzes how methods based on sample increase and sample reduction balance the datasets in order to reach more reliable classification results, and it reveals the positive and negative effects of these methods. For this purpose, the effects of the ROS, SMOTE, RUS, and NM algorithms on a logistic regression classifier were analyzed using word-based N-gram structures on three different sentiment analysis datasets containing both Turkish and English texts. The results, which are explained in detail, show that the sample increase methods (ROS and SMOTE) raised the classifier's performance values on the datasets, whereas the sample reduction methods (RUS and NM) lowered them.
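The two simplest balancing strategies named in the abstract, random over-sampling (ROS) and random under-sampling (RUS), can be illustrated with a minimal sketch using only the Python standard library. This is not the authors' implementation: the toy reviews, the labels, and the helper names (`word_ngrams`, `random_over_sample`, `random_under_sample`) are hypothetical and chosen for illustration. SMOTE, NM, and the logistic regression classifier are not reproduced here.

```python
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return word-level n-grams (sizes 1..n_max) of a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_over_sample(samples, labels, seed=0):
    """ROS: duplicate randomly chosen samples until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((sample, label) for sample in items + extra)
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [y for _, y in pairs]


def random_under_sample(samples, labels, seed=0):
    """RUS: keep a random subset of each class, the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        pairs.extend((sample, label) for sample in rng.sample(items, target))
    rng.shuffle(pairs)
    return [s for s, _ in pairs], [y for _, y in pairs]


# Hypothetical imbalanced toy corpus: 6 positive and 2 negative reviews.
texts = [
    "great movie", "loved the acting", "wonderful story", "really enjoyable",
    "best film this year", "highly recommended", "terrible plot", "very boring",
]
labels = ["pos"] * 6 + ["neg"] * 2

ros_texts, ros_labels = random_over_sample(texts, labels)
rus_texts, rus_labels = random_under_sample(texts, labels)

print(sorted(Counter(labels).items()))      # [('neg', 2), ('pos', 6)]
print(sorted(Counter(ros_labels).items()))  # [('neg', 6), ('pos', 6)]
print(sorted(Counter(rus_labels).items()))  # [('neg', 2), ('pos', 2)]
print(word_ngrams("Great movie"))           # ['great', 'movie', 'great movie']
```

The sketch shows the trade-off the abstract reports on: over-sampling keeps every original sample and adds duplicates, while under-sampling discards majority-class samples and so shrinks the training data.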
Source
4th International Conference on Computer Science and Engineering (UBMK), September 11-15, 2019, Samsun, Turkey