Pacific Asia Conference on Language, Information and Computation; November 1-3, 2006; Wuhan, China

Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora



Abstract

Bilingual named entity extraction is important for cross-language information processing tasks such as machine translation (MT) and cross-lingual information retrieval (CLIR). Much previous work has extracted bilingual named entities from parallel corpora. Here we propose a multi-feature based method for extracting bilingual named entities from comparable corpora. We first recognize the Chinese and English named entities separately in the Chinese and English parts of the comparable corpus. Feature scores are then calculated for every possible pair of Chinese and English named entities. Finally, we combine these feature scores and decide which pairs are mutual translations. To calculate the translation score, we do not use the IBM Model 1 formula as previous approaches did. Instead, we use a modified edit distance that takes word order into account. Experiments show that the F-score of this method increases by 11%. The multi-feature integration strategy also yields encouraging results.
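The abstract does not spell out the modified edit distance or the list of features, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses a plain word-level Levenshtein distance as a stand-in for the paper's modified edit distance, assumes the Chinese named entity has already been glossed word by word into English (for example with a bilingual dictionary), and combines feature scores with a weighted sum. The function names, the feature names, and the weights are all hypothetical.

```python
from typing import Dict, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over word tokens rather than characters.

    Stand-in for the paper's modified edit distance, which the abstract
    does not define.
    """
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def translation_score(glossed: Sequence[str], english: Sequence[str]) -> float:
    """Map the distance into [0, 1]; 1.0 means identical word sequences."""
    longest = max(len(glossed), len(english))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(glossed, english) / longest


def combine(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores (hypothetical combination rule)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical glossed Chinese entity compared with two English candidates.
glossed = ["bank", "of", "china"]
same_order = translation_score(glossed, ["bank", "of", "china"])
swapped = translation_score(glossed, ["china", "of", "bank"])

# Hypothetical feature names and weights, for illustration only.
pair_score = combine({"translation": same_order, "cooccurrence": 0.5},
                     {"translation": 0.6, "cooccurrence": 0.4})
```

In this sketch the candidate with the same words in a different order scores lower than the exact match, which a bag-of-words overlap would not distinguish. That is the property the abstract attributes to using an edit distance that takes word order into account.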
