JRC-Names: Multilingual Entity Name variants and titles as Linked Data

Abstract

Since 2004, the European Commission's Joint Research Centre (JRC) has been analysing the online version of printed media in over twenty languages and has automatically recognised and compiled large amounts of named entities (persons and organisations) and their many name variants. The collected variants include not only standard spellings in various countries, languages and scripts, but also frequently found spelling mistakes and lesser used name forms, all occurring in real-life text (e.g. Benjamin/Binyamin/Bibi/Benyamín/Biniamin/Беньямин/بنیامین Netanyahu/Netanjahu/Nétanyahou/Netahnyahu/Нетаньяху/نتنیاهو). This entity name variant data, known as JRC-Names, has been available for public download since 2011. In this article, we report on our efforts to render JRC-Names as Linked Data (LD), using lemon, the lexicon model for ontologies. Besides adhering to Semantic Web standards, this new release goes beyond the initial one in that it includes titles found next to the names, as well as the date ranges in which the titles and the name variants were found. It also establishes links towards existing datasets, such as DBpedia and Talk-Of-Europe. As a multilingual linguistic linked dataset, JRC-Names can help bridge the gap between structured data and natural languages, thus supporting large-scale data integration (e.g. cross-lingual mapping) and web-based content processing (e.g. entity linking). JRC-Names is publicly available through the dataset catalogue of the European Union's Open Data Portal.