Parallel Community Detection for Cross-Document Coreference


Abstract

This paper presents a highly parallel solution for cross-document coreference resolution that can handle the billions of documents on the current web. At the core of our solution lies a novel algorithm for community detection in large-scale graphs. We construct these graphs by representing documents' keywords as nodes and the co-occurrence of those keywords in a document as edges. We then exploit a particular property of such graphs: coreferent words are topologically clustered and can be discovered efficiently by our community detection algorithm. The accuracy of our technique is considerably higher than that of the state of the art, while the convergence time is far shorter. In particular, we increase the accuracy on a baseline dataset by more than 15% compared to the best result reported so far. Moreover, we outperform the best reported result on a dataset provided for the Word Sense Induction task in SemEval 2010.
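The graph construction described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of keywords, and it uses generic, sequential label propagation as a stand-in for community detection; the abstract does not specify the paper's own parallel algorithm, so the function names, the tie-breaking rule, and the sample documents below are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Keywords become nodes; sharing a document adds weight to an edge."""
    graph = defaultdict(Counter)
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Generic label propagation (assumed stand-in, not the paper's method)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            # Each neighbour votes for its current label, weighted by the edge.
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            # Deterministic tie-break: heaviest vote, then smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical keyword lists, one per document.
docs = [
    ["obama", "president", "white_house"],
    ["obama", "president", "senate"],
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
]
labels = label_propagation(build_cooccurrence_graph(docs))
```

In this toy input the two groups of keywords never share a document, so they end up with different community labels. A web-scale version would need a distributed graph representation and a parallel update schedule, which this sketch does not attempt.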
