International Conference on Information Science and Control Engineering

The Application Analysis of the Construction Method of Minimum Entropy Unsupervised Thesaurus in Ancient Chinese Word Segmentation



Abstract

Ancient Chinese text segmentation is foundational to the intelligent processing of ancient books. In this paper, an unsupervised thesaurus (lexicon) construction algorithm based on the minimum entropy model is applied to a large-scale corpus of ancient texts to extract a lexicon of adjacent characters that co-occur with high frequency. The lexicon is then combined with existing word segmentation tools in ancient text segmentation experiments. The results show that the method improves segmentation to different degrees for texts from different periods, which suggests that the lexicon is effective only within a certain range. This article is also one of the few works that apply monolingual word segmentation methods to ancient Chinese word segmentation, and it enriches the research in related fields.
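The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to extract adjacent characters that co-occur with high frequency. It assumes pointwise mutual information (PMI) as the association score, and the function name `build_lexicon` and the thresholds `min_count` and `min_pmi` are hypothetical; none of this is confirmed as the paper's actual method.

```python
import math
from collections import Counter


def build_lexicon(corpus, min_count=2, min_pmi=1.0):
    """Cut each line between weakly associated neighbors; count the pieces.

    Assumption: association is measured by PMI of adjacent characters.
    This is an illustrative stand-in, not the paper's confirmed algorithm.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for line in corpus:
        char_counts.update(line)
        pair_counts.update(line[i:i + 2] for i in range(len(line) - 1))

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    def pmi(pair):
        p_pair = pair_counts[pair] / total_pairs
        p_first = char_counts[pair[0]] / total_chars
        p_second = char_counts[pair[1]] / total_chars
        return math.log(p_pair / (p_first * p_second))

    # Adjacent pairs that are both frequent and strongly associated.
    strong = {
        pair for pair, count in pair_counts.items()
        if count >= min_count and pmi(pair) >= min_pmi
    }

    pieces = Counter()
    for line in corpus:
        if not line:
            continue
        word = line[0]
        for i in range(1, len(line)):
            if line[i - 1:i + 1] in strong:
                word += line[i]      # keep strongly bound neighbors together
            else:
                pieces[word] += 1    # weak link: cut here
                word = line[i]
        pieces[word] += 1

    # Keep only multi-character pieces that recur often enough.
    return {w: c for w, c in pieces.items() if len(w) > 1 and c >= min_count}


sample = ["天下之人", "天下大同", "治天下者"]
print(build_lexicon(sample))  # {'天下': 3}
```

A lexicon produced this way could then be supplied to an existing segmenter as a user dictionary, which is the general shape of the combination step the abstract describes.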
