Venue: International Conference on Intelligent Systems Design and Applications

Suffix Tree Based Approach for Chinese Information Retrieval

Abstract

With the spread of the Internet, Chinese-language information retrieval has attracted great research interest in recent years. The absence of word boundaries in Chinese makes Chinese information retrieval (IR) different from IR for European languages. To apply traditional IR approaches to Chinese, sentences first have to be segmented into words, so word segmentation plays a key role in Chinese IR. Because word segmentation is not straightforward and its results are sometimes ambiguous, n-grams are used as an alternative. Several experimental studies have compared words with n-grams [5, 6] and examined word segmentation and its effect on information retrieval [3]. These studies show that using either words or n-grams leads to comparable performance, and that higher word segmentation accuracy does not necessarily result in better retrieval performance. In this paper we propose a suffix tree based approach for Chinese information retrieval without word segmentation.
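To make the two segmentation-free ideas mentioned in the abstract concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the paper's method: the abstract does not describe the actual index construction, so the function names (`char_ngrams`, `build_suffix_trie`, `contains`) are hypothetical, and the naive suffix trie below stands in for a real (compressed, linear-size) suffix tree.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text` (bigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_suffix_trie(text):
    """Build a naive suffix trie as nested dicts.

    Assumption for illustration only: this uses O(len(text)**2) time and
    space. A real suffix tree compresses single-child paths and can be
    built in linear time.
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, query):
    """Return True if `query` occurs as a substring of the indexed text."""
    node = trie
    for ch in query:
        if ch not in node:
            return False
        node = node[ch]
    return True


sentence = "中文信息检索"  # "Chinese information retrieval", unsegmented
bigrams = char_ngrams(sentence)
trie = build_suffix_trie(sentence)

print(bigrams)
print(contains(trie, "信息检索"))  # substring query, no word segmentation needed
print(contains(trie, "检信"))      # not a substring of the sentence
```

Neither function needs a segmenter: the n-gram view breaks the text into fixed-length overlapping pieces, while the suffix structure answers substring queries of any length directly over the raw character sequence.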
