Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization

Abstract

Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using document-level categories, sub-document-level context, and extracted entities and relations for the CDC task. We train a categorization component with an efficient flat algorithm using thousands of ODP categories and over a million web documents. We propose to use ranked categories as coreference information, which is particularly suitable for web documents that differ widely in style and content. An ensemble composite coreference function, amenable to inactive features, combines these three levels of evidence for disambiguation. A thorough feature importance study is conducted to analyze how these three components contribute to the coreference results. The overall solution is evaluated on the WePS benchmark data and demonstrates superior performance.
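
The combination of the three evidence levels described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the form of the ensemble function, so the sketch swaps in a simple weighted average that renormalizes over whichever features are active, a Jaccard overlap over ranked category labels, and single-link clustering with a fixed threshold. All names (WEIGHTS, category_overlap, composite_score, cluster_documents), the weights, and the threshold are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights for the three evidence levels (not taken from the paper).
WEIGHTS = {"category": 0.4, "context": 0.3, "entity": 0.3}


def category_overlap(ranked_a: List[str], ranked_b: List[str]) -> float:
    """Jaccard overlap between two documents' top-ranked category labels."""
    a, b = set(ranked_a), set(ranked_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def composite_score(features: Dict[str, Optional[float]],
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average over active features; None marks an inactive feature."""
    active = {k: v for k, v in features.items() if v is not None}
    if not active:
        return 0.0
    total = sum(weights[k] for k in active)
    return sum(weights[k] * v for k, v in active.items()) / total


def cluster_documents(pair_scores: Dict[Tuple[int, int], float],
                      n_docs: int, threshold: float = 0.5) -> List[int]:
    """Single-link clustering: merge documents whose score reaches the threshold."""
    parent = list(range(n_docs))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in pair_scores.items():
        if score >= threshold:
            parent[find(i)] = find(j)
    # Documents sharing a root label are treated as the same person.
    return [find(i) for i in range(n_docs)]
```

In this sketch, a document pair with no extracted entities would pass None for the entity feature, and the score would then rest on the category and context evidence alone rather than being penalized for the missing feature.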

Bibliographic record

Source
6th Workshop on Ontologies and Lexical Resources | 2010 | pp. 483-491 | 9 pages
Conference location: Beijing (CN)
Authors
Jian Huang; Pucktada Treeratpituk; Sarah M. Taylor; C. Lee Giles
Author affiliations

Information Sciences and Technology, Pennsylvania State University;

Information Sciences and Technology, Pennsylvania State University;

Lockheed Martin ISGS;

Information Sciences and Technology, Pennsylvania State University;

Original format: PDF
Language: English
Chinese Library Classification: Programming, software engineering

Similar literature

1. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication. 2020, No. 4
2. CESS-A System to Categorize Bangla Web Text Documents [J]. Dhar Ankita, Mukherjee Himadri, Dash Niladri Sekhar. ACM Transactions on Asian Language Information Processing. 2020, No. 5
3. Exploiting Associations between Word Clusters and Document Classes for Cross-domain Text Categorization [J]. Fuzhen Zhuang, Ping Luo, Hui Xiong. Statistical Analysis and Data Mining. 2011, No. 1
4. Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization [C]. Jian Huang, Pucktada Treeratpituk, Sarah M. Taylor. International Conference on Computational Linguistics. 2010
5. Coreference, cross-document coreference, and information extraction methodologies [D]. Bagga, Amit. 1998
6. Relevance of health level 7 clinical document architecture and integrating the healthcare enterprise cross-enterprise document sharing profile for managing chronic wounds in a telemedicine context [O]. Philippe Finet, Bernard Gibaud, Olivier Dameron. 2016
7. Cross-document coreference: An approach to capturing coreference without context [O]. Kristin Wright-Bettner, Martha Palmer, Guergana Savova. 2019
8. Cross-Document Coreference on a Large Scale Corpus [R]. Gooi, C. H., Allan, J. 2004
