Word Extraction Based on Semantic Constraints in Chinese Word-Formation


Abstract

This paper presents a novel approach to Chinese word extraction based on the semantic information of characters. A thesaurus of Chinese characters is constructed. A Chinese lexicon of 63,738 two-character words, together with the character thesaurus, is used to learn the semantic constraints between characters in Chinese word-formation, yielding a semantic-tag-based HMM. The Baum-Welch re-estimation scheme is then used to train the HMM parameters in an unsupervised manner. Various statistical measures for estimating the likelihood that a character string is a word are also tested. Large-scale experiments show promising results: the F-score of this word extraction method reaches 68.5%, whereas its counterpart, the character-based mutual information method, reaches only 47.5%.
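
For orientation, the baseline named in the abstract, character-based mutual information, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes pointwise mutual information over adjacent character pairs estimated from raw counts, and the function names `bigram_pmi` and `f_score` are hypothetical. The F-score helper assumes the usual harmonic mean of precision and recall, which the abstract does not spell out.

```python
import math
from collections import Counter


def bigram_pmi(corpus):
    """Score every adjacent character pair in `corpus` (a list of strings)
    by pointwise mutual information: log2(P(xy) / (P(x) * P(y)))."""
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        pair_counts.update(sentence[i:i + 2] for i in range(len(sentence) - 1))

    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())

    scores = {}
    for pair, count in pair_counts.items():
        p_xy = count / n_pairs                 # relative frequency of the pair
        p_x = char_counts[pair[0]] / n_chars   # relative frequency of 1st char
        p_y = char_counts[pair[1]] / n_chars   # relative frequency of 2nd char
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


def f_score(precision, recall):
    """Balanced F-score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A candidate two-character string would then be accepted as a word when its score exceeds a threshold tuned on held-out data. The threshold value is an assumption here, since the abstract gives none.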
