Ternary encoding based feature extraction for binary text classification

Hakan Altınçay; Zafer Erenel


Abstract

A novel framework for termset based feature extraction is proposed for binary text classification. The proposed approach is based on the encoding of the terms within a termset. The ternary codes '+1' and '−1' are used to represent the class that the term supports, whereas '0' denotes no support for either class. Four different encoding schemes are proposed, in which the term weights and the term occurrence probabilities in the positive and negative documents are used to define the ternary code of a given term. The ternary patterns are utilized to define novel features by splitting them into positive and negative codes, where each code is treated as a different feature extractor. The use of the derived features, both individually and together with the bag of words representation, is investigated. The histograms of the resultant features are also employed to study the improvements that can be achieved using a small number of additional features to augment the bag of words representation. Experiments conducted on four benchmark datasets with different characteristics have shown that the proposed feature extraction framework provides significant improvements compared to the bag of words representation.
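
The splitting of a ternary pattern into positive and negative codes can be illustrated with a minimal Python sketch. The abstract does not spell out the four encoding schemes, so the rule used here (a thresholded difference of term occurrence probabilities, with a hypothetical `margin` parameter) is an assumption for illustration only, as are all function names, the toy probabilities, and the choice to assign '0' to terms absent from the document.

```python
from typing import Dict, List, Sequence, Set, Tuple


def ternary_code(p_pos: float, p_neg: float, margin: float = 0.1) -> int:
    """Assumed encoding rule (not taken from the paper): +1 if the term is
    clearly more probable in positive documents, -1 if clearly more probable
    in negative documents, 0 otherwise."""
    diff = p_pos - p_neg
    if diff > margin:
        return 1
    if diff < -margin:
        return -1
    return 0


def termset_pattern(termset: Sequence[str], doc_terms: Set[str],
                    codes: Dict[str, int]) -> List[int]:
    """Ternary pattern of a termset in one document. Terms absent from the
    document are given code 0 (an assumption of this sketch)."""
    return [codes[t] if t in doc_terms else 0 for t in termset]


def split_pattern(pattern: Sequence[int]) -> Tuple[int, int]:
    """Split a ternary pattern into two binary codes, returned as integers:
    one marking the +1 positions and one marking the -1 positions."""
    pos, neg = 0, 0
    for c in pattern:
        pos = (pos << 1) | (1 if c == 1 else 0)
        neg = (neg << 1) | (1 if c == -1 else 0)
    return pos, neg


# Toy (p_pos, p_neg) occurrence probabilities, invented for illustration.
probs = {"goal": (0.60, 0.10), "match": (0.40, 0.35), "election": (0.05, 0.50)}
codes = {t: ternary_code(pp, pn) for t, (pp, pn) in probs.items()}

termset = ("goal", "match", "election")
doc_terms = {"goal", "election", "vote"}

pattern = termset_pattern(termset, doc_terms, codes)
print(pattern)                 # [1, 0, -1]
print(split_pattern(pattern))  # (4, 1)
```

In this sketch each of the two integers indexes a separate feature, mirroring the abstract's statement that each code is treated as a different feature extractor; how the paper turns the codes into feature values is not stated in the abstract.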


Bibliographic details

Source
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies | 2014, Issue 1 | 17 pages
Authors
Hakan Altınçay; Zafer Erenel
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Keywords
Local ternary patterns; Feature extraction; Termsets; n-grams; Termset weighting; Text classification


