Methods for extracting and classifying pairs of cognates and false friends


Abstract

The identification of cognates has attracted the attention of researchers working in the area of Natural Language Processing, but the identification of false friends is still an under-researched area. This paper proposes novel methods for the automatic identification of both cognates and false friends from comparable bilingual corpora. The methods are not dependent on the existence of parallel texts, and make use of only monolingual corpora and a bilingual dictionary necessary for the mapping of co-occurrence data across languages. In addition, the methods do not require that the newly discovered cognates or false friends are present in the dictionary and hence are capable of operating on out-of-vocabulary expressions. These methods are evaluated on English, French, German and Spanish corpora in order to identify English-French, English-German, English-Spanish and French-Spanish pairs of cognates or false friends. The experiments were performed in two settings: (i) assuming 'ideal' extraction of cognates and false friends from plain-text corpora, i.e. when the evaluation data contains only cognates and false friends, and (ii) a real-world extraction scenario where cognates and false friends have to first be identified among words found in two comparable corpora in different languages. The evaluation results show that the developed methods identify cognates and false friends with very satisfactory results for both recall and precision, with methods that incorporate background semantic knowledge, in addition to co-occurrence data obtained from the corpora, delivering the best results.
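
The abstract names two ingredients without giving formulas: orthographic similarity to find candidate pairs, and co-occurrence data mapped across languages through a bilingual dictionary to separate cognates from false friends. The sketch below is one possible reading of that pipeline, not the authors' implementation. The longest common subsequence ratio, the cosine measure, both thresholds, all function names, and the toy co-occurrence counts are assumptions made for illustration. It also omits the background semantic knowledge that the abstract reports as giving the best results.

```python
from collections import Counter
from math import sqrt


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """LCS ratio: LCS length divided by the length of the longer word."""
    if not a or not b:
        return 0.0
    return lcs_length(a.lower(), b.lower()) / max(len(a), len(b))


def map_context(context: Counter, dictionary: dict) -> Counter:
    """Project source-language co-occurrence counts into the target language.

    Only the context words need dictionary entries; the word under test
    itself may be out of vocabulary.
    """
    mapped = Counter()
    for word, count in context.items():
        for translation in dictionary.get(word, ()):
            mapped[translation] += count
    return mapped


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify_pair(src, tgt, src_context, tgt_context, dictionary,
                  ortho_threshold=0.6, sem_threshold=0.5):
    """Label a word pair; both thresholds are illustrative, not tuned values."""
    if orthographic_similarity(src, tgt) < ortho_threshold:
        return "unrelated"
    similarity = cosine(map_context(src_context, dictionary), tgt_context)
    return "cognate" if similarity >= sem_threshold else "false friend"


# Toy English-French data (hypothetical counts, not taken from any corpus).
EN_FR = {"book": ["livre"], "borrow": ["emprunter"], "read": ["lire"],
         "write": ["écrire"], "send": ["envoyer"], "post": ["poste"]}
library_ctx = Counter({"book": 5, "borrow": 4, "read": 3})
librairie_ctx = Counter({"livre": 3, "acheter": 5, "vendre": 4, "prix": 3})
letter_ctx = Counter({"write": 4, "send": 3, "post": 2})
lettre_ctx = Counter({"écrire": 4, "envoyer": 3, "recevoir": 2})

print(classify_pair("library", "librairie", library_ctx, librairie_ctx, EN_FR))
print(classify_pair("letter", "lettre", letter_ctx, lettre_ctx, EN_FR))
```

In this toy setup, "library" and "librairie" look alike but share little mapped context, while "letter" and "lettre" look alike and occur with translations of each other's neighbours. Only the context words pass through the dictionary, which is how a design of this kind can handle candidate pairs that are absent from it.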

Bibliographic details

  • Source
    Machine Translation | 2007, Issue 1 | pp. 29-53 | 25 pages
  • Author affiliations

    Research Institute for Information and Language Processing, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1SB, UK;

    Research Institute for Information and Language Processing, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1SB, UK;

    Mathematics and Informatics Department, University of Plovdiv, 4003 Plovdiv, Bulgaria;

    Research Institute for Information and Language Processing, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1SB, UK;

  • Indexed in: Engineering Index (EI), USA;
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    cognates; faux amis; orthographic similarity; distributional similarity; semantic similarity; translational equivalence;

  • Date added to database: 2022-08-18 00:40:29
