Imbalanced Dataset Problem in Sentiment Analysis

Publisher

IEEE

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In this study, the problems that imbalanced datasets cause in sentiment analysis are discussed. To obtain more reliable classification results, methods that balance a dataset by increasing the number of samples (oversampling) or reducing it (undersampling) are analysed, and their positive and negative effects are revealed. For this purpose, the effects of the ROS (random oversampling), SMOTE (Synthetic Minority Over-sampling Technique), RUS (random undersampling) and NM (NearMiss) algorithms on a logistic regression classifier were analysed using word-based N-gram features on three sentiment analysis datasets containing both Turkish and English texts. The results show that the oversampling methods (ROS, SMOTE) increased the classifier's performance values on these datasets, whereas the undersampling methods (RUS, NM) decreased them; the results are explained in detail.
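The abstract names the four resampling methods without describing them. The following is a minimal pure-Python sketch of the textbook form of each, assuming a binary problem and numeric feature vectors (for example, word n-gram counts). It is not the authors' code: the function names are hypothetical, `k=3` is an illustrative choice, and NearMiss is shown in its version-1 form because the abstract does not say which variant was used. In practice, the imbalanced-learn library provides `RandomOverSampler`, `SMOTE`, `RandomUnderSampler` and `NearMiss` for the same purpose.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority, k=3, seed=0):
    """SMOTE (binary case): create synthetic minority vectors on the line
    segment between a minority sample and one of its k nearest minority
    neighbours. Assumes at least two minority samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    needed = max(counts.values()) - counts[minority]
    mino = [list(x) for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(needed):
        b = rng.randrange(len(mino))
        others = [j for j in range(len(mino)) if j != b]
        others.sort(key=lambda j: math.dist(mino[b], mino[j]))
        nb = mino[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        X_out.append([p + gap * (q - p) for p, q in zip(mino[b], nb)])
        y_out.append(minority)
    return X_out, y_out


def near_miss_1(X, y, minority, k=3):
    """NearMiss-1 (binary case): keep the majority samples whose mean distance
    to their k nearest minority samples is smallest, one per minority sample."""
    counts = Counter(y)
    mino = [x for x, lab in zip(X, y) if lab == minority]

    def mean_knn_dist(x):
        d = sorted(math.dist(x, m) for m in mino)[:k]
        return sum(d) / len(d)

    X_out, y_out = list(mino), [minority] * len(mino)
    for label in counts:
        if label == minority:
            continue
        majority = [x for x, lab in zip(X, y) if lab == label]
        kept = sorted(majority, key=mean_knn_dist)[: counts[minority]]
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


if __name__ == "__main__":
    # Toy 2-D vectors standing in for n-gram feature vectors (hypothetical data).
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 3], [8, 8], [9, 9]]
    y = ["pos"] * 6 + ["neg"] * 2
    for name, (Xr, yr) in {
        "ROS": random_oversample(X, y),
        "SMOTE": smote(X, y, "neg"),
        "RUS": random_undersample(X, y),
        "NM-1": near_miss_1(X, y, "neg"),
    }.items():
        print(name, dict(Counter(yr)))
```

The oversamplers grow the minority class to the majority size, while the undersamplers shrink the majority class to the minority size and therefore discard data. Resampling of this kind is normally applied to the training split only, so that duplicated or synthetic samples do not leak into the evaluation data.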

Keywords

imbalanced dataset, sentiment analysis, ROS, RUS, SMOTE, NM

Source

2019 4th International Conference on Computer Science and Engineering (UBMK), 11-15 September 2019, Samsun, Turkey