A new hybrid semi-supervised algorithm for text classification with class-based semantics

Altinel Berna; Ganiz Murat Can

Abstract

Vector Space Models (VSM) are commonly used in language processing to represent certain aspects of natural language semantics. The semantics of VSM come from the distributional hypothesis, which states that words that occur in similar contexts usually have similar meanings. In our previous work, we proposed novel semantic smoothing kernels based on class-specific transformations. These kernels use class-term matrices, which can be considered a new type of VSM. By using the class as the context, these methods can extract class-specific semantics by making use of word distributions both in documents and in different classes. In this study, we adapt two of these semantic classification approaches to build a novel, high-performance semi-supervised text classification algorithm. These approaches are a Helmholtz-principle-based calculation of term meanings in the context of classes for the initial classification, and a supervised term-weighting-based semantic kernel with Support Vector Machines (SVM) for the final classification model. The approach used in the first phase is especially good at learning from very small datasets, while the approach in the second phase is especially good at eliminating noise in relatively large and noisy training sets when building a classification model. Overall, as a semantic semi-supervised learning algorithm, our approach can effectively use an abundant source of unlabeled instances to improve classification accuracy significantly, especially when the number of labeled instances is limited. (C) 2016 Elsevier B.V. All rights reserved.
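The two-phase flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`class_term_weights`, `predict`, `two_phase_fit`) and the toy documents are hypothetical, the smoothed log-ratio weighting is a simple stand-in for the Helmholtz-principle meaning calculation, and the second phase reuses the same scorer where the paper uses an SVM with a semantic kernel. What it shows is only the general pattern: pseudo-label the unlabeled documents with a class-based term model, then retrain on the enlarged set.

```python
import math
from collections import Counter, defaultdict


def class_term_weights(docs, labels):
    """Build a class-by-term weight table from tokenized documents.

    Stand-in weighting (an assumption, not the paper's Helmholtz formula):
    the log of a term's smoothed rate inside a class divided by its
    smoothed rate in the whole corpus.
    """
    class_counts = defaultdict(Counter)
    total = Counter()
    for tokens, label in zip(docs, labels):
        class_counts[label].update(tokens)
        total.update(tokens)
    n_all = sum(total.values())
    vocab_size = len(total)
    weights = {}
    for label, counts in class_counts.items():
        n_class = sum(counts.values())
        weights[label] = {
            term: math.log(
                ((counts[term] + 1) / (n_class + vocab_size))
                / ((total[term] + 1) / (n_all + vocab_size))
            )
            for term in total
        }
    return weights


def predict(weights, tokens):
    """Return the class whose term weights sum highest over the document."""
    scores = {
        label: sum(w.get(t, 0.0) for t in tokens)
        for label, w in weights.items()
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)


def two_phase_fit(labeled_docs, labels, unlabeled_docs):
    # Phase 1: learn class-based term weights from the small labeled set
    # and use them to pseudo-label the unlabeled documents.
    initial = class_term_weights(labeled_docs, labels)
    pseudo = [predict(initial, d) for d in unlabeled_docs]
    # Phase 2: retrain on labeled plus pseudo-labeled documents.
    # (The paper trains an SVM with a semantic kernel at this step;
    # this sketch reuses the simple scorer to stay self-contained.)
    final = class_term_weights(labeled_docs + unlabeled_docs, labels + pseudo)
    return final, pseudo


# Toy data, invented for illustration only.
labeled = [["goal", "match", "team"], ["vote", "election", "party"]]
y = ["sport", "politics"]
unlabeled = [["team", "coach", "match"], ["party", "senate", "vote"]]

model, pseudo = two_phase_fit(labeled, y, unlabeled)
print(pseudo)
print(predict(model, ["coach", "goal"]))
```

In this toy run the term "coach" appears only in an unlabeled document, so the final model can only score it because of the pseudo-labels from the first phase. That is the kind of benefit from unlabeled data the abstract refers to, shown here at a scale far too small to say anything about accuracy.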

Bibliographic details

Source: Knowledge-Based Systems, 2016, No. 15, pp. 50-64 (15 pages)
Authors: Altinel Berna; Ganiz Murat Can
Affiliation: Marmara Univ, Dept Comp Engn, Istanbul, Turkey
Original format: PDF
Language: English
Keywords: Semantics; Semi-supervised classification; Text classification; Semantic smoothing kernel; Class-based transformations

Similar literature

1. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications. 2012, No. 3a4
2. A novel semantic smoothing kernel for text classification with class-based weighting [J]. Altinel Berna, Diri Banu, Ganiz Murat Can. Knowledge-Based Systems. 2015, No. NOVa
3. Rough set and ensemble learning based semi-supervised algorithm for text classification [J]. Lei Shi, Xinming Ma, Lei Xi. Expert Systems with Applications. 2011, No. 5
4. Semantic Features for Multi-view Semi-supervised and Active Learning of Text Classification [C]. Shiliang Sun. IEEE International Conference on Data Mining Workshops. 2008
5. Theoretical analysis of classification under CCC-Noise and its application to semi-supervised text mining [D]. Bi, Yingtao. 2008
6. Semi-Supervised Text Classification Framework: An Overview of Dengue Landscape Factors and Satellite Earth Observation [O]. Zhichao Li, Helen Gurgel, Nadine Dessay. 2020
7. Learning a Deep Hybrid Model for Semi-Supervised Text Classification [O]. Er G. Ororbia Ii, C. Lee Giles, David Reitter. 2015
