Journal: Computer and Information Science

Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study



Abstract

In this article we present a 2D cellular automaton (Class_AC) for a text mining problem: unsupervised classification (clustering). Before experimenting with the cellular automaton, we vectorized our data by indexing textual documents from the Reuters-21578 collection in two ways: with the WordNet approach and with the n-gram method. Our work is a comparative study of these two representations, the conceptual approach (WordNet) and n-grams. Section 1 introduces biomimicry and text mining, Section 2 presents the representation of texts based on the WordNet approach and on n-grams, Section 3 describes the cellular automaton for clustering, Section 4 reports the experiments and comparison results, and Section 5 gives conclusions and perspectives.
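
To make the n-gram representation concrete, here is a minimal sketch in Python. It assumes character n-grams with n = 3 and a cosine similarity between count vectors; the abstract does not state the n-gram type, the value of n, or the similarity measure, so these choices and the helper names `char_ngrams` and `cosine_similarity` are illustrative assumptions, not the authors' method.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumed variant; n is a free choice)."""
    # Lowercase and collapse runs of whitespace so formatting does not affect counts.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical snippets, not taken from the Reuters-21578 collection.
    doc_a = char_ngrams("Oil prices rose sharply on Monday")
    doc_b = char_ngrams("Crude oil prices climbed on Monday")
    doc_c = char_ngrams("The football team won the final")
    print(cosine_similarity(doc_a, doc_b))
    print(cosine_similarity(doc_a, doc_c))
```

A representation like this needs no dictionary or stemming, which is one common reason to compare it against a concept-based representation such as WordNet.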
