Semi-structured Tibetan Text Clustering Algorithm Based on Swarm Intelligence

Published in: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)

Abstract

To apply swarm intelligence techniques to the clustering of semi-structured Tibetan Web texts, a semi-structured Tibetan text clustering algorithm based on swarm intelligence (SCAST) is proposed. Taking full account of how swarm intelligence affects both the accuracy and the time efficiency of Tibetan text clustering, SCAST first represents Tibetan texts with a vector space model and places the texts, together with a colony of intelligent ants, at random positions in a two-dimensional text space. Each intelligent ant then randomly selects a Tibetan text, computes the similarity between that text and the other texts in its local neighborhood, and from this derives the probability of picking up or dropping the text, which determines whether the text is picked up, moved, or dropped. Finally, after multiple training iterations, the Tibetan texts are gathered according to their similarity, yielding the final clustering result. Experimental results on a large collection of real Tibetan Web texts show that, compared with the traditional k-means clustering algorithm, the proposed algorithm improves clustering accuracy by about 8.0% on average.
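The pick-up, move, and drop loop described in the abstract follows the general pattern of ant-based clustering. The following is a minimal Python sketch under stated assumptions, not the SCAST implementation: texts are assumed to be already tokenized (Tibetan word segmentation is out of scope), plain term-frequency vectors stand in for the paper's unspecified vector space weighting, and the pick-up and drop probabilities use the classic Lumer-Faieta rule, P_pick = (k1 / (k1 + f))^2 and P_drop = 2f if f < k2 else 1, where f is the local similarity density over a 3x3 neighborhood with distance 1 - cosine similarity. The helper names (`tf_vector`, `cosine`, `density`, `ant_cluster`, `clusters`), the grid size, the number of ants, and the values of `k1`, `k2`, and `alpha` are hypothetical; reading off final clusters as connected groups of occupied cells is likewise an assumed post-processing step.

```python
import math
import random
from collections import Counter


def tf_vector(tokens):
    """Vector space model (assumed TF weighting): sparse term-frequency vector."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def density(i, cell, grid, vecs, size, alpha):
    """Lumer-Faieta local density f of text i at `cell` (3x3 toroidal neighborhood)."""
    x, y = cell
    total = 0.0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            j = grid.get(((x + dx) % size, (y + dy) % size))
            if j is not None and j != i:
                # distance d = 1 - cosine similarity; each neighbor adds 1 - d / alpha
                total += 1 - (1 - cosine(vecs[i], vecs[j])) / alpha
    return max(0.0, total / 9)  # divide by s^2 with s = 3, clip at 0


def ant_cluster(docs, size=10, n_ants=5, iters=2000,
                k1=0.1, k2=0.15, alpha=0.5, seed=0):
    """Hypothetical ant-based clustering; returns {cell: text index}."""
    rng = random.Random(seed)
    vecs = [tf_vector(d) for d in docs]
    cells = [(x, y) for x in range(size) for y in range(size)]
    # Scatter texts on distinct random cells; ants start unloaded at random cells.
    grid = dict(zip(rng.sample(cells, len(docs)), range(len(docs))))
    ants = [[rng.choice(cells), None] for _ in range(n_ants)]
    for _ in range(iters):
        for ant in ants:
            pos, load = ant
            if load is None and pos in grid:
                f = density(grid[pos], pos, grid, vecs, size, alpha)
                if rng.random() < (k1 / (k1 + f)) ** 2:  # pick-up probability
                    ant[1] = grid.pop(pos)
            elif load is not None and pos not in grid:
                f = density(load, pos, grid, vecs, size, alpha)
                if rng.random() < (2 * f if f < k2 else 1.0):  # drop probability
                    grid[pos] = load
                    ant[1] = None
            # Move one step in a random direction on the toroidal grid.
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            ant[0] = ((pos[0] + dx) % size, (pos[1] + dy) % size)
    # Drop any still-carried text on a free cell so every text ends on the grid.
    for ant in ants:
        if ant[1] is not None:
            free = [c for c in cells if c not in grid]
            grid[rng.choice(free)] = ant[1]
            ant[1] = None
    return grid


def clusters(grid, size):
    """Assumed post-processing: label texts by 8-connected groups of occupied cells."""
    labels, next_label = {}, 0
    for start in grid:
        if grid[start] in labels:
            continue
        labels[grid[start]] = next_label
        stack = [start]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    c = ((x + dx) % size, (y + dy) % size)
                    if c in grid and grid[c] not in labels:
                        labels[grid[c]] = next_label
                        stack.append(c)
        next_label += 1
    return [labels[i] for i in range(len(grid))]
```

For example, with `docs` as a list of token lists, `clusters(ant_cluster(docs, size=10), 10)` returns one cluster label per text; the `size` passed to `clusters` must match the grid used by `ant_cluster`. Because the ants move at random, the result depends on `seed` and `iters`, and more iterations typically give more compact groups.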
