Semi-supervised Turkish Text Categorization with Word2Vec, Doc2Vec and FastText Algorithms
Publisher
IEEE
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
This study compares the performance of the Word2Vec, Doc2Vec and FastText algorithms on the text categorization problem using a semi-supervised learning technique. The impact of several preprocessing techniques is also analyzed on a corpus of approximately 5 million Turkish news documents, comprising both labeled and unlabeled documents. Naive Bayes, Support Vector Machines, Artificial Neural Networks, Decision Trees and Logistic Regression are used in the classification phase, and the results obtained are reported.
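The general semi-supervised pattern described in the abstract (word representations learned from all documents, a classifier fitted only on the labeled ones) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: neighbour co-occurrence counts stand in for Word2Vec, a nearest-centroid rule stands in for the classifiers listed in the abstract, the toy sentences are invented, and the function names (`cooccurrence_vectors`, `doc_vector`, `fit_centroids`, `predict`) are hypothetical. It is not the authors' implementation.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Whitespace tokenizer; real Turkish preprocessing would be more involved.
    return text.lower().split()


def cooccurrence_vectors(docs, window=2):
    # Stand-in for Word2Vec (assumption): represent each word by the counts
    # of the words that appear within `window` positions of it.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for d in docs:
        toks = tokenize(d)
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[toks[j]]] += 1.0
    return vectors


def doc_vector(text, vectors):
    # Document vector = average of the vectors of its known words.
    dim = len(next(iter(vectors.values())))
    toks = [t for t in tokenize(text) if t in vectors]
    if not toks:
        return [0.0] * dim
    summed = [0.0] * dim
    for t in toks:
        for k, value in enumerate(vectors[t]):
            summed[k] += value
    return [s / len(toks) for s in summed]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(labeled, vectors):
    # Stand-in classifier (assumption): one mean document vector per class.
    groups = defaultdict(list)
    for text, label in labeled:
        groups[label].append(doc_vector(text, vectors))
    return {
        label: [sum(col) / len(vs) for col in zip(*vs)]
        for label, vs in groups.items()
    }


def predict(text, centroids, vectors):
    v = doc_vector(text, vectors)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


# Invented toy data: two labeled and four unlabeled "news" snippets.
labeled = [("takım maçı kazandı", "spor"), ("borsa bugün yükseldi", "ekonomi")]
unlabeled = [
    "takım gol attı",
    "futbolcu gol attı maçı kazandı",
    "dolar yükseldi borsa düştü",
    "merkez bankası faiz dolar",
]

# Representations use ALL text; the classifier sees only the labeled part.
vectors = cooccurrence_vectors([t for t, _ in labeled] + unlabeled)
centroids = fit_centroids(labeled, vectors)

print(predict("futbolcu gol attı", centroids, vectors))
print(predict("dolar düştü", centroids, vectors))
```

Neither query shares a word with the labeled snippets; any signal comes from how those words co-occur in the unlabeled text, which is the benefit a semi-supervised setup aims for.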
Keywords
word2vec, doc2vec, fasttext, text categorization
Source
2019 27th Signal Processing and Communications Applications Conference (SIU)
27th Signal Processing and Communications Applications Conference (SIU), April 24-26, 2019, Sivas Cumhuriyet University, Sivas, Turkey