Journal: Intelligent Data Analysis

Mining monolingual and bilingual corpora

Abstract

In this paper, we describe two new methods for mining monolingual and bilingual text corpora, both of which rely heavily on association rules and triggers. The first method, based on association rules, is applied to query expansion. Experiments on French newspapers and on a set of scientific documents show that the proposed approach outperforms the baseline model. The second method targets machine translation and is motivated by the results obtained with triggers in statistical language modeling. To build a translation table, association rules and triggers are generalized to mine bilingual corpora; to this end, we propose the concepts of inter-lingual association rules and inter-lingual triggers, respectively. Both methods have been integrated into a real statistical machine translation system. The experiments demonstrate the practical feasibility of these approaches for machine translation and show that inter-lingual triggers achieve better results than those obtained using IBM Model 3.
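The abstract does not define how inter-lingual triggers are scored, so the following is only a minimal sketch of one plausible reading: each source word "triggers" the target words with which it shares the highest mutual information over a sentence-aligned corpus, by analogy with triggers in statistical language modeling. The function name `interlingual_triggers`, the `top_k` parameter, the scoring formula, and the toy French-English corpus are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(pairs, top_k=2):
    """Rank target words for each source word by a mutual-information score.

    pairs: list of (source_tokens, target_tokens) aligned sentence pairs.
    Returns {source_word: [(target_word, score), ...]} with at most top_k entries.
    The score P(f,e) * log(P(f,e) / (P(f) * P(e))) is an assumed definition.
    """
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s, t = set(src), set(tgt)          # count each word once per sentence
        src_count.update(s)
        tgt_count.update(t)
        joint_count.update(product(s, t))  # every cross-lingual co-occurrence

    table = {}
    for (f, e), c in joint_count.items():
        p_fe = c / n
        p_f = src_count[f] / n
        p_e = tgt_count[e] / n
        score = p_fe * math.log(p_fe / (p_f * p_e))
        table.setdefault(f, []).append((e, score))

    # Keep only the best-scoring target words for each source word.
    return {
        f: sorted(cands, key=lambda x: x[1], reverse=True)[:top_k]
        for f, cands in table.items()
    }


# Hypothetical toy corpus of aligned sentence pairs.
pairs = [
    (["le", "chat"], ["the", "cat"]),
    (["le", "chien"], ["the", "dog"]),
    (["un", "chat"], ["a", "cat"]),
    (["un", "chien"], ["a", "dog"]),
]
table = interlingual_triggers(pairs)
best = {f: cands[0][0] for f, cands in table.items()}
```

On this toy corpus, `best` maps each French word to the English word it co-occurs with most distinctively. A table of this kind could play the role of the translation table mentioned in the abstract, although a real system would need smoothing and far more data.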
