WSEAS Transactions on Computers

Increase The Efficiency of English-Chinese Sentence Alignment: Target Range Restriction and Empirical Selection of Stop Words



Abstract

In this paper, we use a lexical method to align sentences in an English-Chinese corpus. Past research shows that dictionary-based alignment involves a large amount of word matching and many dictionary lookups. To address these two issues, we first restrict the range of candidate target sentences based on the location of the source sentence relative to the beginning of the text. In addition, careful empirical selection of stop words, based on word frequencies in the source text, reduces the number of dictionary lookups. Experimental results show that the amount of word matching can be cut by 75% and the number of dictionary lookups by as much as 43% without sacrificing precision or recall. A second experiment used twenty New York Times articles containing 598 sentences and 18,395 words. The resulting precision is 95.6% and the recall is 93.8%. Of all predicted alignments, 86% are 1:1 (one source sentence to one target sentence), 8% are 1:2, and 6% are 2:1. Further analysis shows that most errors occur in alignments of types 1:2 and 2:1. Future work should focus on problems with these two alignment types.
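The two ideas in the abstract, restricting candidate target sentences by relative position and choosing stop words from source-text frequencies, can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formulas, so everything here is an assumption made for illustration: the proportional position mapping, the fixed `window` half-width, the `top_k` frequency cut-off, whitespace tokenization, and all function names.

```python
from collections import Counter


def candidate_range(src_index, n_src, n_tgt, window=3):
    """Indices of target sentences to compare against one source sentence.

    Assumption: the source sentence's relative position in its text maps
    proportionally onto the target text, and only sentences within
    `window` of that position are considered.
    """
    if n_src <= 0 or n_tgt <= 0:
        return range(0)
    # Expected target position for this source position.
    center = round(src_index * (n_tgt - 1) / max(n_src - 1, 1))
    lo = max(0, center - window)
    hi = min(n_tgt, center + window + 1)
    return range(lo, hi)


def select_stop_words(source_sentences, top_k=10):
    """Treat the top_k most frequent source words as stop words (hypothetical cut-off)."""
    counts = Counter(
        word.lower() for sentence in source_sentences for word in sentence.split()
    )
    return {word for word, _ in counts.most_common(top_k)}


def words_to_look_up(sentence, stop_words):
    """Words that would still need a dictionary lookup after stop-word filtering."""
    return [w for w in sentence.lower().split() if w not in stop_words]
```

With these helpers, each source sentence is compared only against a handful of target sentences instead of the whole target text, and frequent words are skipped before any dictionary access. How much this saves depends on `window` and `top_k`, which would have to be tuned empirically, as the abstract suggests the authors did for their stop-word list.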
