A novel semantic smoothing kernel for text classification with class-based weighting
Citation: Altınel, B., Diri, B., Ganiz, M.C. (2014). A novel semantic smoothing kernel for text classification with class-based weighting. Knowledge-Based Systems, 1-13. https://dx.doi.org/10.1016/j.knosys.2015.07.008
In this study, we propose a novel methodology for building a semantic smoothing kernel for use with Support Vector Machines (SVM) in text classification. The suggested approach is based on two key concepts: class-based term weighting and changing the orthogonality of the vector space. A class-based term weighting methodology is used to transform documents from the original space to the feature space. This class-based weighting groups terms according to their importance for each class and consequently smooths the representation of documents. This is accomplished by changing the orthogonality of the Vector Space Model (VSM) by introducing class-based dependencies between terms. As a result, in the extreme case, two documents can be seen as similar even if they do not share any terms, provided that their terms are similarly weighted for a particular class. The resulting semantic kernel makes direct use of class information when extracting semantic information between terms; it can therefore be considered a supervised kernel. For our experimental evaluation, we analyze the performance of the suggested kernel in a large number of experiments on benchmark textual datasets and present results under varying experimental conditions. To the best of our knowledge, this is the first study to use class-based term weighting to build a supervised semantic kernel for SVM. We compare our results with kernels that are commonly used in SVM, such as the linear kernel, the polynomial kernel, and the Radial Basis Function (RBF) kernel, and with several corpus-based semantic kernels. According to our experimental results, the proposed method improves on the linear kernel and several corpus-based semantic kernels in both classification accuracy and speed.
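The idea of smoothing documents through class-based term weights can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the code below assumes a simple stand-in: each term is weighted by the share of its total count that falls in each class. The function names `class_term_weights` and `semantic_kernel`, the toy corpus, and the kernel form K(d1, d2) = d1 (W Wᵀ) d2ᵀ are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def class_term_weights(X, y, n_classes):
    """Build a (terms x classes) matrix W.

    ASSUMPTION: W[t, c] is the fraction of term t's total count that occurs
    in documents of class c. The paper's actual weighting may differ.
    """
    n_terms = X.shape[1]
    W = np.zeros((n_terms, n_classes))
    for c in range(n_classes):
        W[:, c] = X[y == c].sum(axis=0)
    totals = W.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave unseen terms as all-zero rows
    return W / totals


def semantic_kernel(A, B, W):
    """Kernel values A S B^T with S = W W^T (term-by-term class similarity)."""
    S = W @ W.T
    return A @ S @ B.T


# Toy term-count matrix: 4 documents, 4 terms, 2 classes (hypothetical data).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)
y = np.array([0, 0, 1, 1])

W = class_term_weights(X, y, n_classes=2)

# d1 and d2 share no terms, but both of their terms are typical of class 0.
d1 = np.array([[1.0, 0.0, 0.0, 0.0]])
d2 = np.array([[0.0, 1.0, 0.0, 0.0]])
# d3 contains only a term typical of class 1.
d3 = np.array([[0.0, 0.0, 1.0, 0.0]])

linear_sim = (d1 @ d2.T)[0, 0]                  # plain VSM dot product
smooth_sim = semantic_kernel(d1, d2, W)[0, 0]   # class-smoothed similarity
cross_sim = semantic_kernel(d1, d3, W)[0, 0]    # terms from different classes

# Gram matrix over the training documents.
K_train = semantic_kernel(X, X, W)
```

In this toy setting the linear kernel treats d1 and d2 as unrelated, while the smoothed kernel gives them a positive similarity because their terms are weighted alike for class 0. Since S = W Wᵀ is positive semidefinite, a Gram matrix such as `K_train` could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.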