Asian Journal of Information Technology

Text Document Clustering with Flocking Algorithm using Specific Crimes Judgment Corpus

Abstract

Text document clustering is a fundamental technique for mining massive amounts of textual data. The problem is high-dimensional, and most machine learning algorithms do not perform well when all the terms in the corpus are used. In this study, the researchers propose an application of the flocking algorithm to text document clustering using two document representation methods: Unigram and Noun. The high-dimensionality problem is addressed by representing documents as a Bag of Nouns (BoN) and a Bag of Unigrams (BoU). Because the documents contain thousands of words, finding the unigrams requires connecting to WordNet and verifying that the selected features are unigrams; the same process is repeated for nouns. In the clustering algorithm, boids follow four simple local rules (alignment, separation, cohesion, and similarity) to calculate their flocking velocity. Experiments were conducted on documents from the 20 Newsgroups and Reuters real-world datasets and from the Specific Crime Judgment corpus to study the advantages of the system. The flocking algorithm for text document clustering is compared under the unigram-based and noun-based document representations. It is observed that the flocking algorithm with the Bag of Nouns works more efficiently than with the Bag of Unigrams or the Bag of Words.
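The abstract names the four local rules but not their formulas, weights, or how clusters are read off at the end. The following is a minimal sketch of one plausible way to combine them, assuming each boid carries one document as a sparse term-count vector (for example a bag of nouns) and that document similarity is cosine similarity. The weights, radius, similarity threshold, the distance-based grouping step, and the names `cosine`, `Boid`, `step`, and `clusters` are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Boid:
    doc: Counter  # bag-of-terms vector for one document (e.g. nouns or unigrams)
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(boids, radius=10.0, sim_threshold=0.3, min_dist=1.0, w_align=0.3,
         w_sep=1.0, w_coh=0.3, w_sim=0.5, max_speed=1.0):
    """Advance every boid by one tick using the four local rules (hypothetical weights)."""
    new_velocities = []
    for b in boids:
        ax = ay = 0.0  # steering accumulated from all rules
        similar = []
        for o in boids:
            dx, dy = o.x - b.x, o.y - b.y
            dist = math.hypot(dx, dy)
            # Skip itself, coincident boids (avoids division by zero) and boids out of view.
            if o is b or dist == 0.0 or dist > radius:
                continue
            s = cosine(b.doc, o.doc)
            # Similarity rule: attract when s > threshold, repel when s < threshold.
            ax += w_sim * (s - sim_threshold) * dx / dist
            ay += w_sim * (s - sim_threshold) * dy / dist
            # Separation rule: push away from any neighbour that is too close.
            if dist < min_dist:
                ax -= w_sep * dx / dist
                ay -= w_sep * dy / dist
            if s >= sim_threshold:
                similar.append(o)
        if similar:
            n = len(similar)
            # Alignment rule: steer toward the mean velocity of similar neighbours.
            ax += w_align * (sum(o.vx for o in similar) / n - b.vx)
            ay += w_align * (sum(o.vy for o in similar) / n - b.vy)
            # Cohesion rule: steer toward the centroid of similar neighbours.
            ax += w_coh * (sum(o.x for o in similar) / n - b.x)
            ay += w_coh * (sum(o.y for o in similar) / n - b.y)
        vx, vy = b.vx + ax, b.vy + ay
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # cap the speed so the flock stays stable
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_velocities.append((vx, vy))
    # Update all boids together so every boid reacted to the same snapshot.
    for b, (vx, vy) in zip(boids, new_velocities):
        b.vx, b.vy = vx, vy
        b.x += vx
        b.y += vy


def clusters(boids, link_dist=2.0):
    """Label boids by connected components of the 'within link_dist' graph."""
    labels = [-1] * len(boids)
    current = 0
    for i in range(len(boids)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k, o in enumerate(boids):
                near = math.hypot(o.x - boids[j].x, o.y - boids[j].y) <= link_dist
                if labels[k] == -1 and near:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels


if __name__ == "__main__":
    random.seed(0)
    # Toy documents from two hypothetical topics, already reduced to bags of nouns.
    texts = ["court judge sentence theft", "judge court theft appeal",
             "court sentence appeal judge", "hockey team goal season",
             "team season player goal", "goal player hockey team"]
    boids = [Boid(Counter(t.split()), random.uniform(0, 8), random.uniform(0, 8)) for t in texts]
    for _ in range(200):
        step(boids)
    print(clusters(boids))
```

Because the similarity rule pulls together boids whose documents score above the threshold and pushes the rest apart, documents that share many terms tend to drift into the same flock, while alignment and cohesion act only among similar neighbours. Reducing each document to its nouns, as the abstract proposes, shrinks the vectors that `cosine` compares at every step; the grouping by distance at the end is one illustrative way to turn final positions into cluster labels.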