ANLP 2011

JRC-NAMES: A Freely Available, Highly Multilingual Named Entity Resource



Abstract

This paper describes a new, freely available, highly multilingual named entity resource for person and organisation names. It has been compiled over seven years of large-scale multilingual news analysis combined with Wikipedia mining, and contains 205,000 person and organisation names plus about the same number of spelling variants, written in over 20 different scripts and in many more languages. This resource, produced as part of the Europe Media Monitor activity (EMM, http://emm.newsbrief.eu/overview.html), can be used for a number of purposes. These include improving name search in databases or on the internet, seeding machine learning systems to learn named entity recognition rules, improving machine translation results, and more. We describe here how this resource was created; we give statistics on its current size; we address the issue of morphological inflection; and we give details regarding its functionality. Updates to this resource will be made available daily.
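One use named above is improving name search by grouping spelling variants. A minimal sketch of that idea follows, assuming a hypothetical list of (entity ID, variant) pairs rather than the actual JRC-Names distribution format. Each variant is normalized (case-folded, accents stripped, whitespace collapsed) and indexed to its entity, so a query for any one spelling returns every known spelling of the same person or organisation. The function names `normalize`, `build_index` and `expand_query`, the entity IDs and the sample variants are illustrative only and are not taken from the resource.

```python
import unicodedata
from collections import defaultdict

# Hypothetical sample data: (entity_id, spelling_variant) pairs.
# Not read from JRC-Names; IDs and variants are for illustration only.
SAMPLE_ENTRIES: list[tuple[int, str]] = [
    (1, "Muammar Gaddafi"),
    (1, "Muammar al-Qaddafi"),
    (1, "Mouammar Kadhafi"),
    (1, "Муаммар Каддафи"),
    (2, "European Commission"),
    (2, "Commission européenne"),
]


def normalize(name: str) -> str:
    """Case-fold, strip combining accents and collapse whitespace (an assumed matching policy)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_marks.casefold().split())


def build_index(entries):
    """Return (index, variants): normalized variant -> entity IDs, and entity ID -> original variants."""
    index: dict[str, set[int]] = defaultdict(set)
    variants: dict[int, set[str]] = defaultdict(set)
    for entity_id, variant in entries:
        index[normalize(variant)].add(entity_id)
        variants[entity_id].add(variant)
    return index, variants


def expand_query(query: str, index, variants) -> list[str]:
    """Return all known spellings of every entity whose variant matches the normalized query."""
    hits = index.get(normalize(query), set())
    return sorted({v for eid in hits for v in variants[eid]})


INDEX, VARIANTS = build_index(SAMPLE_ENTRIES)

if __name__ == "__main__":
    # A query in one spelling retrieves the other spellings, including the Cyrillic one.
    print(expand_query("mouammar kadhafi", INDEX, VARIANTS))
```

This sketch only matches variants that are already listed (after normalization); it does not transliterate between scripts or handle morphological inflection, which the paper treats as a separate issue.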
