PMI Based Clustering Algorithm for Feature Reduction in Text Classification

P.Jeyadurga; Prof. P. R. Vijaya Lakshmi; J.S.Kanchana


Abstract

Feature clustering is a feature reduction method that reduces the dimensionality of feature vectors for text classification. This paper proposes an incremental feature clustering approach that uses semantic similarity to cluster the features. Pointwise Mutual Information (PMI) is a widely used word similarity measure that estimates the semantic similarity between two words and serves as an alternative to distributional similarity. Computing PMI requires only simple statistics about the two words: the number of co-occurrences, or correlations, between the two concepts within a fixed-size context. Once the words from the preprocessed documents are fed in, clusters are formed and one feature (the head word) is identified for each cluster; these head words are used to index the document. PMI assumes that a word has a single sense, but clustering can be optimized further if the polysemy of words is considered. Hence PMI may be combined with PMImax, which also estimates the correlation between the closest senses of two words, thereby achieving better feature reduction and execution time than other approaches.

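The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the whole document serves as the co-occurrence context, that the first word placed in a cluster acts as its head word, and that a fixed PMI threshold decides membership. The names `pmi_table`, `incremental_clusters` and `threshold` are hypothetical, and PMImax (sense-aware similarity) is not covered.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build pmi(w1, w2) from document-level co-occurrence counts.

    Assumption: two words co-occur when they appear in the same document.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        vocab = set(doc)
        word_counts.update(vocab)
        pair_counts.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w1, w2):
        joint = pair_counts[frozenset((w1, w2))]
        if joint == 0:
            return float("-inf")  # the words never co-occur
        # PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) )
        return math.log2(joint * n_docs / (word_counts[w1] * word_counts[w2]))

    return pmi


def incremental_clusters(words, pmi, threshold):
    """Feed words one at a time; join the first cluster whose head word
    is similar enough, otherwise start a new cluster.

    Assumption: the head word is simply the first word of each cluster.
    """
    clusters = []
    for word in words:
        for cluster in clusters:
            if pmi(word, cluster[0]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Toy, hand-made corpus of preprocessed documents (illustrative only).
docs = [
    ["stock", "market", "price"],
    ["stock", "market", "trade"],
    ["goal", "match", "team"],
    ["goal", "team", "league"],
]
pmi = pmi_table(docs)
vocabulary = sorted({w for d in docs for w in d})
clusters = incremental_clusters(vocabulary, pmi, threshold=0.5)
head_words = [c[0] for c in clusters]  # the reduced feature set
print(clusters)
print(head_words)
```

On this toy corpus the eight vocabulary words collapse into two clusters, so each document would be indexed by two head-word features instead of eight word features. The threshold and the head-word rule are the parts most likely to differ from the paper.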

Bibliographic details

Source
International Journal of Innovative Research in Science, Engineering and Technology | 2014, Issue 3 | 4 pages
Authors
P.Jeyadurga; Prof. P. R. Vijaya Lakshmi; J.S.Kanchana
Original format: PDF
Chinese Library Classification: General industrial technology

Similar literature


1. The Feature Selection Method based on Genetic Algorithm for Efficient of Text Clustering and Text Classification [J]. Sung-Sam Hong, Wanhee Lee, Myung-Mook Han. International Journal of Advances in Soft Computing and Its Applications. 2015, Issue 1a (Special)
2. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering [J]. Abualigah Laith Mohammad, Khader Ahamad Tajudin. Journal of Supercomputing. 2017, Issue 11
3. A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification [J]. Jiang Jung-Yi, Liou Ren-Jia, Lee Shie-Jue. IEEE Transactions on Knowledge and Data Engineering. 2011, Issue 3
4. A Confidence-based Hierarchical Feature Clustering Algorithm for Text Classification [C]. Jung-Yi Jiang, Kai-Tai Yin, Shie-Jue Lee. International Conference on Intelligent Pervasive Computing. 2007
5. Information retrieval: A framework for recommending text-based classification algorithms [D]. Saleeb, Hany. 2002
6. Improved support vector machine classification algorithm based on adaptive feature weight updating in the Hadoop cluster environment [O]. Jianfang Cao, Min Wang, Yanfei Li. 2012
7. Method of Feature Reduction in Short Text Classification Based on Feature Clustering [O]. 2019
8. Rough Set Feature Selection Algorithms for Textual Case-Based Classification [R]. Gupta, K. M., Aha, D. W., Moore, P. 2006
