Corpus-Driven Annotation Enrichment

Abstract

A reference library can be described as a corpus: an individually composed collection of documents, such as related research work, documents by favorite authors, or the proceedings of a conference. Enriching documents with meaningful annotations benefits the performance of applications such as semantic search, content aggregation, automated relationship discovery, query answering, and information retrieval. Available (semi-)automatic annotation tools ignore the individual composition of documents in a corpus, because they annotate documents with generic, named-entity-related data. In this paper, we present an unsupervised, corpus-driven annotation enrichment approach that takes the composition of documents into account. It uses an EM-like algorithm to enrich weakly annotated documents with meaningful annotations from related documents in the same corpus.
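The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below shows one way an EM-like loop *could* carry annotations from related documents in the same corpus over to weakly annotated ones. It is not the authors' method, and every detail is an assumption. Documents are reduced to term sets, and relatedness is Jaccard similarity. The function name `enrich`, the `threshold` parameter, and the weighting scheme are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrich(
    terms: dict[str, set[str]],
    seeds: dict[str, set[str]],
    threshold: float = 0.3,
    max_iter: int = 20,
    tol: float = 1e-6,
) -> dict[str, dict[str, float]]:
    """Hypothetical EM-style enrichment: returns doc -> {annotation: weight}.

    terms: each document's term set (an assumed document representation).
    seeds: the original (possibly weak) annotations of each document.
    """
    docs = list(terms)
    # Pairwise similarity is computed once, because the corpus is fixed.
    sim = {(d, e): jaccard(terms[d], terms[e]) for d in docs for e in docs if d != e}
    # Seed annotations start (and stay) at weight 1.0.
    weights = {d: {a: 1.0 for a in seeds.get(d, set())} for d in docs}

    for _ in range(max_iter):
        new_weights: dict[str, dict[str, float]] = {}
        for d in docs:
            # E-like step: score each annotation by the similarity-weighted
            # average of the neighbours' current annotation weights.
            total = sum(sim[(d, e)] for e in docs if e != d)
            scores: dict[str, float] = {}
            if total > 0:
                for e in docs:
                    if e == d or sim[(d, e)] == 0:
                        continue
                    for a, w in weights[e].items():
                        scores[a] = scores.get(a, 0.0) + sim[(d, e)] * w / total
            # M-like step: accept inferred annotations above the threshold,
            # then clamp the seed annotations back to 1.0.
            updated = {a: s for a, s in scores.items() if s >= threshold}
            updated.update({a: 1.0 for a in seeds.get(d, set())})
            new_weights[d] = updated
        # Stop early once the largest weight change falls below tol.
        delta = max(
            (abs(new_weights[d].get(a, 0.0) - weights[d].get(a, 0.0))
             for d in docs
             for a in set(new_weights[d]) | set(weights[d])),
            default=0.0,
        )
        weights = new_weights
        if delta < tol:
            break
    return weights


if __name__ == "__main__":
    # Toy corpus (made up for illustration): p3 has no annotations of its own.
    terms = {
        "p1": {"annotation", "corpus", "semantic"},
        "p2": {"annotation", "corpus", "search"},
        "p3": {"annotation", "corpus", "semantic", "search"},
        "p4": {"cooking", "recipes"},
    }
    seeds = {
        "p1": {"semantic-web"},
        "p2": {"semantic-web", "information-retrieval"},
        "p3": set(),
        "p4": {"food"},
    }
    for doc, ann in enrich(terms, seeds).items():
        print(doc, sorted(ann))
```

In the toy corpus, the unannotated `p3` picks up both annotations of its two similar neighbours. The unrelated `p4` keeps only its own. The alternation is closer to label propagation than to textbook EM, since no likelihood is maximized explicitly. Seed annotations are clamped so that enrichment only adds to what a document already had.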