Conference: 6th Workshop on Ontologies and Lexical Resources

Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization



Abstract

Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using document-level categories, sub-document-level context, and extracted entities and relations for the CDC task. We train a categorization component with an efficient flat algorithm, using thousands of ODP categories and over a million web documents. We propose using ranked categories as coreference information, which is particularly suitable for web documents that vary widely in style and content. An ensemble composite coreference function, amenable to inactive features, combines these three levels of evidence for disambiguation. A thorough feature importance study analyzes how each of the three components contributes to the coreference results. The overall solution is evaluated on the WePS benchmark data and demonstrates superior performance.
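The abstract does not give the form of the composite coreference function, so the following is only a minimal sketch of how three levels of evidence could be combined while tolerating inactive features. It assumes reciprocal-rank weighting for ranked categories, cosine similarity over context words, Jaccard overlap of extracted entities, a weighted average taken over active features only, and threshold-based single-link clustering into chains. The names `Doc`, `composite_sim`, `WEIGHTS`, and `threshold` and all numeric values are hypothetical and are not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """One web page mentioning the ambiguous person name (hypothetical schema)."""
    doc_id: str
    ranked_categories: list[str] = field(default_factory=list)  # best category first
    context_tokens: list[str] = field(default_factory=list)     # words near the mention
    entities: set[str] = field(default_factory=set)             # co-occurring extracted entities


def _cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv)


def category_sim(a: Doc, b: Doc) -> Optional[float]:
    # Inactive (None) when either document has no predicted categories.
    if not a.ranked_categories or not b.ranked_categories:
        return None
    # Assumption: weight each category by reciprocal rank (1, 1/2, 1/3, ...).
    wa = {c: 1.0 / (r + 1) for r, c in enumerate(a.ranked_categories)}
    wb = {c: 1.0 / (r + 1) for r, c in enumerate(b.ranked_categories)}
    return _cosine(wa, wb)


def context_sim(a: Doc, b: Doc) -> Optional[float]:
    if not a.context_tokens or not b.context_tokens:
        return None
    return _cosine(Counter(a.context_tokens), Counter(b.context_tokens))


def entity_sim(a: Doc, b: Doc) -> Optional[float]:
    if not a.entities or not b.entities:
        return None
    return len(a.entities & b.entities) / len(a.entities | b.entities)


# Illustrative weights only; not values reported in the paper.
WEIGHTS = {"category": 0.3, "context": 0.4, "entity": 0.3}


def composite_sim(a: Doc, b: Doc, weights: dict = WEIGHTS) -> float:
    """Weighted average over active features; inactive ones are skipped."""
    scores = {"category": category_sim(a, b),
              "context": context_sim(a, b),
              "entity": entity_sim(a, b)}
    active = {k: s for k, s in scores.items() if s is not None}
    if not active:
        return 0.0
    total = sum(weights[k] for k in active)  # renormalize over active features
    return sum(weights[k] * s for k, s in active.items()) / total


def coreference_chains(docs: list[Doc], threshold: float = 0.5) -> list[list[str]]:
    """Single-link clustering via union-find (an assumed clustering step)."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if composite_sim(docs[i], docs[j]) >= threshold:
                parent[find(i)] = find(j)
    chains: dict[int, list[str]] = {}
    for i, d in enumerate(docs):
        chains.setdefault(find(i), []).append(d.doc_id)
    return sorted(chains.values())


if __name__ == "__main__":
    docs = [
        Doc("d1", ["Science/Computer_Science", "Reference/Education"],
            "professor machine learning university".split(), {"Stanford"}),
        Doc("d2", ["Science/Computer_Science"],
            "machine learning research professor".split(), {"Stanford", "NIPS"}),
        Doc("d3", ["Sports/Basketball"], "guard scored points game".split(), {"Lakers"}),
    ]
    print(coreference_chains(docs))  # → [['d1', 'd2'], ['d3']]
```

Renormalizing the weights over the active features means a page with no extracted entities is still compared on categories and context, rather than being penalized with a zero entity score.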
