Conference: International Conference on Text, Speech and Dialogue
Disyllabic Chinese Word Extraction Based on Character Thesaurus and Semantic Constraints in Word-Formation

Abstract

This paper presents a novel approach to Chinese disyllabic word extraction based on the semantic information of characters. Two thesauri of Chinese characters, one manually crafted and one machine-generated, are constructed. A Chinese wordlist of 63,738 two-character words is used together with the character thesauri to learn semantic constraints between characters in Chinese word-formation, resulting in two types of semantic-tag-based HMM. Experiments show that (1) both schemes outperform their character-based counterpart; (2) the machine-generated thesaurus outperforms the hand-crafted one to some extent in word extraction; and (3) a proper combination of semantic-tag-based and character-based methods could benefit word extraction.
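As a rough illustration of the idea of learning tag-level constraints from a wordlist, the following Python sketch maps each character to a semantic tag and estimates a smoothed tag-to-tag transition probability from two-character words. It is not the authors' model: the abstract does not specify the HMM structure, the tag inventory, or the smoothing, so the toy thesaurus, the toy wordlist, the tag names, the add-one smoothing, and all function names here are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy thesaurus: character -> semantic tag (tags invented for illustration).
THESAURUS = {
    "山": "NATURE", "水": "NATURE", "河": "NATURE", "海": "NATURE",
    "吃": "ACTION", "喝": "ACTION", "饭": "FOOD", "茶": "FOOD",
}

# Hypothetical toy wordlist standing in for the 63,738-word list in the paper.
WORDLIST = ["山水", "河水", "海水", "山河", "吃饭", "喝茶", "喝水"]


def train_tag_bigrams(wordlist, thesaurus):
    """Count (tag1, tag2) pairs over two-character words fully covered by the thesaurus."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in wordlist:
        if len(word) != 2:
            continue
        c1, c2 = word
        if c1 in thesaurus and c2 in thesaurus:
            t1, t2 = thesaurus[c1], thesaurus[c2]
            pair_counts[(t1, t2)] += 1
            first_counts[t1] += 1
    return pair_counts, first_counts


def score_candidate(candidate, thesaurus, pair_counts, first_counts):
    """Add-one smoothed estimate of P(tag2 | tag1) for a two-character candidate.

    Returns None if the candidate is not two characters long or a character is untagged.
    """
    if len(candidate) != 2:
        return None
    c1, c2 = candidate
    if c1 not in thesaurus or c2 not in thesaurus:
        return None
    t1, t2 = thesaurus[c1], thesaurus[c2]
    num_tags = len(set(thesaurus.values()))
    return (pair_counts[(t1, t2)] + 1) / (first_counts[t1] + num_tags)


pair_counts, first_counts = train_tag_bigrams(WORDLIST, THESAURUS)
for candidate in ["海河", "水吃"]:
    print(candidate, score_candidate(candidate, THESAURUS, pair_counts, first_counts))
```

In this toy setup, the candidate 海河 does not occur in the wordlist, yet it scores higher than 水吃 because its tag pair (NATURE, NATURE) is frequent in the training words. A purely character-based count would have no evidence for either string, which is one plausible reading of why tag-based constraints could help with unseen character pairs.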