International Conference on Human Centered Computing

Ancient Chinese Lexicon Construction Based on Unsupervised Algorithm of Minimum Entropy and CBDB Optimization


Abstract

Ancient Chinese text segmentation is foundational to the intelligent processing of ancient books. In this paper, an unsupervised lexicon construction algorithm based on the minimum entropy model is applied to a large-scale corpus of ancient texts, and a lexicon of high-frequency co-occurring adjacent characters is extracted. Two experiments were performed with this lexicon. First, ancient text segmentation results are compared before and after the lexicon is imported into a word segmentation tool. Second, entries from CBDB such as personal names, place names, official titles, and interpersonal relationships are added to the lexicon, and segmentation results are again compared before and after the optimized lexicon is imported into the tool. The two experiments show that the lexicon improves segmentation to different degrees for texts from different periods, and that the additional gain from the CBDB data is not obvious. This article is one of the few works that apply monolingual word segmentation to ancient Chinese word segmentation, and it enriches research in related fields.
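As a rough illustration of how a lexicon of frequently co-occurring adjacent characters might be extracted without supervision, the sketch below scores each adjacent character pair by pointwise mutual information (PMI), cuts the text wherever the score is weak, and keeps the pieces that recur. This is a simplified stand-in, not the paper's minimum entropy algorithm, which the abstract does not spell out. The function name `build_lexicon`, the thresholds, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def build_lexicon(texts, min_count=2, min_pmi=1.0, max_len=4):
    """Hypothetical helper: extract candidate words from unsegmented texts.

    Adjacent characters stay joined only if their pair is frequent
    (>= min_count) and cohesive (PMI >= min_pmi); the text is cut elsewhere.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        char_counts.update(text)
        pair_counts.update(text[i:i + 2] for i in range(len(text) - 1))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    def pmi(pair):
        # log( P(ab) / (P(a) * P(b)) ), using natural logarithm
        p_ab = pair_counts[pair] / total_pairs
        p_a = char_counts[pair[0]] / total_chars
        p_b = char_counts[pair[1]] / total_chars
        return math.log(p_ab / (p_a * p_b))

    strong_pairs = {
        pair for pair, count in pair_counts.items()
        if count >= min_count and pmi(pair) >= min_pmi
    }

    # Cut each text between characters that do not form a strong pair.
    candidates = Counter()
    for text in texts:
        if not text:
            continue
        piece = text[0]
        for i in range(1, len(text)):
            if text[i - 1:i + 1] in strong_pairs:
                piece += text[i]
            else:
                candidates[piece] += 1
                piece = text[i]
        candidates[piece] += 1

    # Keep multi-character pieces that recur often enough.
    return {
        word: count for word, count in candidates.items()
        if 2 <= len(word) <= max_len and count >= min_count
    }


# Toy corpus (invented for illustration only, not data from the paper).
corpus = ["天子曰善", "天子之命", "諸侯朝天子", "諸侯之命"]
print(build_lexicon(corpus))
```

On a corpus this small the output also contains frequent collocations that are not true words, which is one reason an extracted lexicon is typically evaluated by its effect on a downstream segmentation tool, as the two experiments above do.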
