Conference: 2013 International Conference of Soft Computing and Pattern Recognition

Identifying coordinated compound words for Vietnamese word segmentation

Abstract

This paper proposes a dictionary-based method for identifying coordinated compound words in Vietnamese. Whether two contiguous simple words in a text form a coordinated compound word is decided from their properties, their parts of speech, and the similarity between their definitions in the dictionary of the Vietnamese Computational Lexicon (VCL). We also rely on sets of synonyms and antonyms to identify coordinated compound words (coordinated di-syllable phrases) and compile them into a list. We then apply a number of rules, based on the relations between coordinated di-syllable phrases, to identify 3- or 4-syllable phrases and idioms. We carried out two major experiments: one to identify coordinated compounds and create a list of them, the other to improve the accuracy of Vietnamese word segmentation. The second experiment showed that the word segmentation F-score increases by 0.11% to 0.41% (the error rate decreases by 3.32% to 12.6%). This is a new approach with high practical value.
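
The pairwise test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `Entry` record, the toy lexicon with English glosses, the Jaccard token overlap used as the definition-similarity measure, and the `threshold` value are all assumptions made for illustration; the real method works on VCL entries, and the abstract does not specify its similarity measure or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary record; not the actual VCL schema."""
    pos: str                              # part-of-speech tag
    definition: str                       # gloss text
    synonyms: frozenset = frozenset()
    antonyms: frozenset = frozenset()


def definition_similarity(a: str, b: str) -> float:
    """Jaccard overlap of gloss tokens (an assumed stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_coordinated_pair(w1, w2, lexicon, threshold=0.5):
    """Heuristic: same POS, plus a synonym/antonym link or similar glosses."""
    e1, e2 = lexicon.get(w1), lexicon.get(w2)
    if e1 is None or e2 is None or e1.pos != e2.pos:
        return False
    if w2 in e1.synonyms or w1 in e2.synonyms:
        return True
    if w2 in e1.antonyms or w1 in e2.antonyms:
        return True
    return definition_similarity(e1.definition, e2.definition) >= threshold


def find_coordinated_compounds(tokens, lexicon, threshold=0.5):
    """Return every pair of contiguous simple words that passes the test."""
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if is_coordinated_pair(a, b, lexicon, threshold)]


# Toy lexicon with illustrative English glosses (not VCL data).
LEXICON = {
    "tôi": Entry("P", "first person singular pronoun"),
    "mua": Entry("V", "obtain goods for money", antonyms=frozenset({"bán"})),
    "bán": Entry("V", "give goods for money", antonyms=frozenset({"mua"})),
    "quần": Entry("N", "garment worn on the lower body"),
    "áo": Entry("N", "garment worn on the upper body"),
}

pairs = find_coordinated_compounds(["tôi", "mua", "bán", "quần", "áo"], LEXICON)
print(pairs)
```

In this toy run, "mua bán" is accepted through the antonym link and "quần áo" through gloss overlap, while pairs whose parts of speech differ are rejected. A segmenter could then treat the accepted pairs as single candidate words.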