Journal: International Journal of Information Retrieval Research

Text Clustering using Distances Combination by Social Bees: Towards 3D Visualisation Aspect


Abstract

Researchers have recently reported that 90% of the information on the web is in an unstructured (free-text) format. Automatic text classification (clustering) has therefore become a crucial challenge for the computer science community, and most classical techniques suffer from problems of execution time, the diversity of data (marketing, biology, economics), and the initialization of the number of clusters. Meanwhile, the bio-inspired paradigm has achieved genuine success in several fields, particularly in data mining. This work presents a novel approach to text clustering called distances combination by social bees (DC-SB), composed of four steps. (1) Pre-processing: texts are represented with different methods (bag of words and character n-grams) and weighted with TF-IDF to construct the vectors. (2) Artificial bee life: the authors imitate the functioning of social bees using three artificial worker bees (cleaner, guardian, and forager), each characterized by its own distance measure, computed with respect to the artificial queen (centroid) of the cluster (hive). (3) Clustering: based on the concept of filtering, each filter is controlled by an artificial worker, and a document must pass three different obstacles to be added to the cluster. (4) Visualization: a 3D navigation of the results is provided through global and detailed views of the hive and the apiary, with zooming and rotation. For the experiments, the authors use the Reuters-21578 benchmark and several validation measures (execution time, F-measure, and entropy) while varying the parameters (threshold, combination of distance measures, and text representation). They compare their results with those of other methods from the literature (2D Cellular Automata, Artificial Immune System (AIS), and Artificial Social Spiders (ASS)) and conclude that the approach can solve the text clustering problem.
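The filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which distance measures the three workers use, what the thresholds are, or which TF-IDF variant is applied, so everything below is an assumption made for illustration: the choice of cosine, Euclidean, and Manhattan distances, the threshold values, the smoothed IDF, and all function names (`char_ngrams`, `tfidf_vectors`, `passes_hive_filters`) are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build sparse TF-IDF vectors (dicts) over character n-grams.

    Assumption: smoothed IDF log(1 + N/df); the abstract gives no variant.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    total = len(docs)
    return [
        {t: tf * math.log(1 + total / doc_freq[t]) for t, tf in c.items()}
        for c in counts
    ]


def cosine_distance(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def manhattan_distance(a, b):
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


# Hypothetical worker-to-distance mapping and thresholds (not from the paper).
WORKERS = [
    ("cleaner", cosine_distance, 0.1),
    ("guardian", euclidean_distance, 0.5),
    ("forager", manhattan_distance, 0.5),
]


def passes_hive_filters(doc, queen, workers=WORKERS):
    """A document joins the hive only if every worker's filter accepts it."""
    return all(dist(doc, queen) <= limit for _, dist, limit in workers)


# Toy vectors: the queen is the cluster centroid.
queen = {"x": 1.0, "y": 1.0}
near = {"x": 1.0, "y": 0.9}      # close under all three measures
far = {"z": 1.0}                 # shares no term with the queen
scaled = {"x": 3.0, "y": 3.0}    # same direction as the queen, larger magnitude
```

In this toy setup, `scaled` points in the same direction as the queen, so the cosine filter accepts it, but the Euclidean filter rejects it. This is one way to see why combining several distance measures can be stricter than relying on any single one.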
