Conference: ACM Annual Symposium on Applied Computing

An Optimized Approach for KNN Text Categorization using P-trees

Abstract

Text mining matters because huge text databases hold a wealth of valuable information waiting to be mined. Text categorization is the process of assigning categories or labels to documents based entirely on their contents. Formally, it can be viewed as a mapping from the document space into a set of predefined class labels (also known as subjects or categories): F: D → {C1, C2, ..., Cn}, where F is the mapping function, D is the document space, and {C1, C2, ..., Cn} is the set of class labels. Given an unlabeled document d, we need to find its class label Ci using the mapping function F, where F(d) = Ci. This paper proposes an optimized k-Nearest Neighbors (KNN) classifier that uses intervalization and P-tree technology to achieve a high degree of accuracy, space utilization, and time efficiency. As new samples arrive, the classifier finds the k nearest neighbors of each new sample in the training space without a single database scan.
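To make the mapping F(d) = Ci concrete, the following is a minimal sketch of a plain KNN text classifier. It is not the P-tree method the abstract proposes: it scans every training document for each query, which is exactly the cost the paper aims to avoid, and it uses no intervalization. The function names (`term_freq`, `cosine`, `knn_classify`), the cosine similarity measure, and the toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_freq(text):
    """Represent a document as a bag of lowercase whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(doc, training, k=3):
    """Return the majority label among the k training documents most similar to doc.

    This is a brute-force baseline: it compares doc against every training
    document, i.e. one full scan of the training set per query.
    """
    query = term_freq(doc)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, term_freq(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (document text, class label) pairs.
TRAINING = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the goalkeeper saved the penalty", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as the market closed", "finance"),
    ("investors bought shares in the bank", "finance"),
]

if __name__ == "__main__":
    print(knn_classify("the striker scored in the football match", TRAINING))
    print(knn_classify("the bank raised rates for investors", TRAINING))
```

Here D is the set of all strings, the class labels are {"sports", "finance"}, and `knn_classify` plays the role of F. The per-query scan over `TRAINING` is the step an index structure such as the P-tree described in the abstract is meant to eliminate.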