An Improved K-Nearest Neighbors Approach Using Modified Term Weighting And Similarity Coefficient For Text Classification

Abstract

Automatic text classification is important because the growing number of digital documents creates a need to organize them. Current state-of-the-art statistical modeling approaches do not provide enough useful information about the topics for each feature and category. Furthermore, feature extraction using traditional term frequency-inverse document frequency (TF-IDF) assigns too many categories to a given document. For classification, existing k-nearest neighbors (k-NN) approaches based on Euclidean distance and the cosine similarity score show wide variance in performance. To address these issues, this study classifies topics for short and long texts using new approaches for the main stages of text classification (i.e., feature extraction and text classification). The study introduces TF-IDF with logarithm for feature extraction and k-NN with a new cosine similarity score for classification. In addition, the factors that affect the performance of supervised machine learning are identified.
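The abstract names the two building blocks but does not give their formulas. As a rough illustration only, the sketch below pairs the textbook log-scaled TF-IDF weight, (1 + ln tf) · ln(N / df), with the standard cosine similarity inside a majority-vote k-NN classifier. These are the conventional baselines, not the modified weighting or the new similarity score proposed in the study; the function names (`log_tfidf`, `weigh`, `cosine`, `knn_predict`) and the toy corpus are assumptions made for this example.

```python
import math
from collections import Counter


def weigh(tokens, idf):
    """Log-scaled TF-IDF for one tokenized document (sparse dict).

    Assumption: weight = (1 + ln tf) * idf. Terms unseen in training are dropped.
    """
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}


def log_tfidf(docs):
    """Fit idf = ln(N / df) on tokenized docs and return (vectors, idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / c) for t, c in df.items()}
    return [weigh(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, vectors, labels, k=3):
    """Majority vote over the k training vectors most similar to the query."""
    ranked = sorted(
        zip(vectors, labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
train = [
    ("the match ended with a late goal", "sport"),
    ("the striker scored a goal", "sport"),
    ("the bank raised interest rates", "finance"),
    ("interest rates hit the stock market", "finance"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]
vectors, idf = log_tfidf(docs)

query_vec = weigh("goal scored in the match".split(), idf)
print(knn_predict(query_vec, vectors, labels, k=3))
```

Swapping in a different term weight or similarity coefficient only requires replacing `weigh` or `cosine`, which is where a modified scheme such as the one the abstract describes would plug in.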

Bibliographic details

Author: Kadhim Ammar Ismael
Year: 2016
Original format: PDF
