Journal of Language Modelling

Populating a multilingual ontology of proper names from open sources

Abstract

Even though proper names play a central role in natural language processing (NLP) applications, they are still under-represented in lexicons, annotated corpora, and other resources dedicated to text processing. One of the main challenges is that proper names are both prevalent and dynamic. At the same time, large and regularly updated knowledge sources containing partially structured data, such as Wikipedia or GeoNames, are publicly available and contain large numbers of proper names. We present a method for the semi-automatic enrichment of Prolexbase, an existing multilingual ontology of proper names dedicated to natural language processing, with data extracted from these open sources in three languages: Polish, English and French. Fine-grained data extraction and integration procedures allow the user to enrich the previous contents of Prolexbase with new incoming data. All data are manually validated and available under an open licence.