Conference: IEEE International Conference on Systems, Man, and Cybernetics

Statistical learning and analyses of Chinese ancient books for information retrieval


Abstract

Full-text retrieval techniques for modern Chinese have been studied for a long time, but the same cannot be said for ancient Chinese books, especially in China. This paper tries to identify the characteristics of ancient Chinese books that can be used for information retrieval. Statistical analysis was carried out on a collection of ancient Chinese books totaling over 35,000,000 words, including most of the works in common use. Based on these experiments, some characteristics of ancient Chinese works are analyzed and compared with modern Chinese, including the basic unit of ancient works, the proportion of double-character words, sentence length, and the field dependency of ancient Chinese works. We then draw conclusions about ancient Chinese that are useful for information retrieval, especially when building inverted indexes and selecting the index unit. Based on these conclusions, a full-text retrieval system for ancient Chinese books has been designed and implemented. The results show that statistical learning and analysis are a great help in ancient Chinese information retrieval.
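The abstract does not say which index unit the authors chose or how their inverted index is laid out. As a rough illustration of what "selecting the index unit" can mean, the following Python sketch builds a positional inverted index over character n-grams, with n=1 for single characters and n=2 for double-character units. The function names `build_index` and `search`, the document ids, and the two short sample passages are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_index(docs, n=1):
    """Map every character n-gram to the (doc_id, position) pairs where it starts."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return index


def search(index, query, n=1):
    """Return sorted doc ids in which the query occurs as a contiguous string."""
    if len(query) < n:
        return []
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    # Candidate start positions come from the postings of the first n-gram.
    candidates = set(index.get(grams[0], []))
    # Keep a candidate only if each later n-gram appears at the matching offset.
    for offset, gram in enumerate(grams[1:], start=1):
        postings = set(index.get(gram, []))
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted({d for d, _ in candidates})


# Hypothetical sample data, used only to exercise the sketch.
docs = {
    "lunyu": "學而時習之不亦說乎",
    "daxue": "大學之道在明明德",
}
unigram_index = build_index(docs, n=1)  # index unit: single character
bigram_index = build_index(docs, n=2)   # index unit: double-character string

print(search(unigram_index, "學而", n=1))
print(search(bigram_index, "明德", n=2))
```

A single-character unit needs no word segmentation and can match any substring, at the cost of longer posting lists. A double-character unit gives shorter posting lists but cannot answer one-character queries. Statistics such as the proportion of double-character words are the kind of evidence that could inform this trade-off.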
