Published in: Journal of Information Recording

A New Experience in Persian Text Clustering using FarsNet Ontology


Abstract

Clustering large text corpora plays a key role in easy navigation and browsing of massive amounts of text data, particularly in search engines. Conventional clustering techniques compare documents by the surface similarity of words or extracted morphemes, which usually leads to non-semantic clusters. This paper considers Farsi, also known as Persian, because the amount of electronic Farsi text is growing rapidly. The documents are enriched with semantic relationships (synonymy, hypernymy, and hyponymy) extracted from the FarsNet lexical ontology. A word sense disambiguation (WSD) procedure is proposed to decrease uncertainty. After preprocessing, three clustering approaches, Bisecting K-means and LSI- and PLSI-based clustering, are applied to the pre-categorized Persian Hamshahri corpus. Experimental results show that clustering quality improves when the text data is enriched with the semantic relations, especially with the PLSI-based approach.
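The enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_LEXICON`, `enrich`, `cosine`, and the `weight` parameter are all hypothetical names, and the lexicon is a tiny English stand-in for FarsNet, whose actual API is not described here. Only synonyms and hypernyms are shown; hyponymy and the WSD step are omitted. The sketch shows how two documents with no words in common can still gain non-zero similarity once related terms are added to their bags of words.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for FarsNet; entries are illustrative only.
TOY_LEXICON = {
    "car": {"synonyms": ["automobile"], "hypernyms": ["vehicle"]},
    "automobile": {"synonyms": ["car"], "hypernyms": ["vehicle"]},
    "truck": {"synonyms": [], "hypernyms": ["vehicle"]},
}


def enrich(tokens, lexicon=TOY_LEXICON, weight=0.5):
    """Return a weighted bag of words: original tokens plus related terms.

    Related terms (synonyms and hypernyms) get a lower, assumed weight so
    that they support the original words rather than dominate them.
    """
    bag = Counter()
    for tok in tokens:
        bag[tok] += 1.0
        entry = lexicon.get(tok, {})
        for related in entry.get("synonyms", []) + entry.get("hypernyms", []):
            bag[related] += weight
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = ["car", "road"]
doc_b = ["truck", "highway"]

# Surface comparison: the documents share no words.
plain = cosine(Counter(doc_a), Counter(doc_b))

# Enriched comparison: both documents now contain the hypernym "vehicle".
enriched = cosine(enrich(doc_a), enrich(doc_b))
```

Here `plain` is zero because the two documents have no token in common, while `enriched` is positive because both pick up the shared hypernym. A clustering algorithm that works on such vectors would therefore see the two documents as closer after enrichment.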
