Conference: 9th International Conference on Language Resources and Evaluation

Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them

Abstract

Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora in a specific domain. We then compare the evaluation of the focused crawling algorithms with the performance of linguistic processes executed after training on the corresponding generated corpora. We also propose a novel approach to focused crawling that exploits the expressive power of multiword expressions.
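As a rough illustration of what traversing the web graph "in a targeted way" can look like, the sketch below runs a generic best-first focused crawl over a small in-memory graph. It is not the approach proposed in the paper: the toy graph `TOY_WEB`, the term list `TOPIC`, the phrase-overlap `relevance` score, and the `budget` parameter are all assumptions made for this example.

```python
import heapq


def relevance(text, topic_terms):
    """Fraction of topic terms found in the page text (toy score, an assumption)."""
    lowered = text.lower()
    return sum(term in lowered for term in topic_terms) / len(topic_terms)


def focused_crawl(graph, seeds, topic_terms, budget):
    """Best-first crawl: expand first the links found on the most relevant pages."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited, collected = set(), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in graph:
            continue
        visited.add(url)
        text, links = graph[url]
        score = relevance(text, topic_terms)
        if score > 0:
            collected.append(url)  # keep pages that mention the topic at all
        for link in links:
            if link not in visited:
                # An outgoing link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected


# Hypothetical toy web graph: url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("machine translation with comparable corpora", ["b", "c"]),
    "b": ("football results and league tables", ["d"]),
    "c": ("building comparable corpora for machine translation", ["e"]),
    "d": ("more football news", []),
    "e": ("statistical machine translation tutorial", []),
}
TOPIC = ["machine translation", "comparable corpora"]

result = focused_crawl(TOY_WEB, ["a"], TOPIC, budget=4)
print(result)
```

With a budget of four page visits, the crawl spends its last visit on the link found on an on-topic page rather than on the link found on the off-topic one. A real crawler would replace the dictionary lookup with HTTP fetching and link extraction.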
