Journal: International Journal of Computer Trends and Technology

Hybrid Combination of Error Back Propagation and Genetic Algorithm for Text Document Clustering


Abstract

High-dimensional text data need clustering, which is an important and difficult task when automation is required. Many researchers work in this field to reduce manual operation and the need to pass background information. This paper proposes a model for document clustering that requires no background information. Term features are extracted from the documents and collected in a matrix according to their term-frequency values. A genetic algorithm is then applied to assign each term to a cluster according to content similarity, with term-frequency distance serving as the measure of chromosome fitness. The genetic algorithm yields cluster centers that represent document terms, and its output is used as the training vector for identifying the cluster class of each document. The experiment was carried out on a real dataset of research articles from various fields of engineering. The results show that the proposed model improves the precision, recall, and accuracy of document clustering.
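
The first two stages described in the abstract (building a term-frequency matrix, then searching for cluster centers with a genetic algorithm) can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the authors' implementation. The chromosome encoding (a list of document indices acting as centers), the Manhattan distance, the elitist selection, the crossover and mutation operators, and every function name and parameter value below are assumptions chosen for illustration. The later step in which the genetic algorithm's output trains a classifier is not shown.

```python
import random
from collections import Counter


def term_frequency_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def tf_distance(a, b):
    """Manhattan distance between two term-frequency rows (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cost(matrix, centers):
    """Total distance from each document to its nearest center; lower is fitter."""
    return sum(min(tf_distance(row, matrix[c]) for c in centers) for row in matrix)


def assign(matrix, centers):
    """Label each document with the position of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: tf_distance(row, matrix[centers[c]]))
        for row in matrix
    ]


def genetic_centers(matrix, k, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Search for k center documents; all hyperparameters here are illustrative."""
    rng = random.Random(seed)
    n = len(matrix)
    # Each chromosome is a list of k distinct document indices used as centers.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ch: cost(matrix, ch))
        survivors = population[: pop_size // 2]  # elitist selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            pool = list(dict.fromkeys(p1 + p2))  # crossover: merge both parents' centers
            child = rng.sample(pool, k)
            if rng.random() < mutation_rate:
                # Mutation: swap one center for a document not already in the child.
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ch: cost(matrix, ch))


# Toy corpus (invented for this sketch) with two obvious topics.
docs = [
    "genetic algorithm chromosome fitness crossover",
    "chromosome mutation genetic fitness",
    "neural network back propagation training",
    "training neural network weights propagation",
]
vocab, tf = term_frequency_matrix(docs)
centers = genetic_centers(tf, k=2)
labels = assign(tf, centers)
```

Using document indices as genes keeps every candidate center a real term-frequency row, which keeps the sketch short; a real-valued encoding of centroid coordinates would be an equally plausible design.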
