Unsupervised and supervised term weigthing methods for character n-gram based author categorization
Citation
Naiboğlu, H. S., Kaptıkaçtı, O., Sardal, E. C., Güran, A., & Uysal, M. (2014). Unsupervised and supervised term weigthing methods for character n-gram based author categorization. In CIE 2014 - 44th International Conference on Computers and Industrial Engineering (pp. 1798-1807). İstanbul: Computers and Industrial Engineering.
Abstract
Author categorization addresses the problem of identifying the author of an anonymous article. The goal of this work is to identify the authors of articles using different character n-gram based document representations. Character n-gram models are conceptually simple, yet they have proven quite effective in many applications. The central question in n-gram based methods is how the documents are represented. In this study, several widely used unsupervised and supervised n-gram weighting methods are investigated on a Turkish corpus in combination with different classification algorithms. In addition, the character n-gram features are compared with a set of stylistic markers, and the evaluation results are reported in detail.
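The abstract does not name the specific weighting schemes, so the following Python sketch is illustrative only and is not the authors' implementation. It assumes overlapping character n-grams without padding. It uses TF-IDF as a representative unsupervised scheme and tf·rf (relevance frequency, computed one-vs-rest for a single target author) as a representative supervised scheme. The function names `char_ngrams`, `tf_idf` and `tf_rf` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text` (no padding)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(docs, n=3):
    """Unsupervised weighting (assumed example): tf * ln(N / df).

    Uses only the documents themselves, never the author labels.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    num_docs = len(docs)
    df = Counter()  # number of documents containing each n-gram
    for cnt in counts:
        df.update(cnt.keys())
    return [{g: tf * math.log(num_docs / df[g]) for g, tf in cnt.items()}
            for cnt in counts]


def tf_rf(docs, labels, target, n=3):
    """Supervised weighting (assumed example): tf * log2(2 + a / max(1, c)).

    a = number of `target`-author documents containing the n-gram,
    c = number of other-author documents containing it.
    N-grams concentrated in the target author's documents get larger weights.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    a, c = Counter(), Counter()
    for cnt, label in zip(counts, labels):
        # Count each n-gram at most once per document (document frequency).
        (a if label == target else c).update(cnt.keys())
    return [{g: tf * math.log2(2 + a[g] / max(1, c[g])) for g, tf in cnt.items()}
            for cnt in counts]


if __name__ == "__main__":
    docs = ["kitap okudu", "kitap yazdı", "masa aldı"]  # toy Turkish snippets
    labels = ["A", "A", "B"]                            # hypothetical authors
    print(tf_idf(docs)[0])
    print(tf_rf(docs, labels, target="A")[0])
```

In a full pipeline, the weighted n-gram vectors would be passed to a classifier. The abstract does not specify which classification algorithms were used, so that step is omitted here. A supervised scheme such as tf·rf needs the training labels, so in practice its statistics would be computed on the training split only.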