Web person disambiguation using hierarchical co-reference model

Abstract

As one of the entity disambiguation tasks, Web Person Disambiguation (WPD) identifies different persons who share the same name by grouping the search results for each person into a separate cluster. Most current research uses clustering methods for WPD. These approaches require tuned thresholds, which are biased towards the training data and may not work well on other datasets. In this paper, we propose a novel approach that uses pairwise co-reference modeling for WPD without the need for threshold tuning. Because person names are named entities, they can be disambiguated with semantic measures based on the so-called cross-document co-reference resolution criterion. The algorithm first forms a forest with person names as observable leaf nodes. It then stochastically builds an entity hierarchy, merging names into a sub-tree that represents a latent entity group whenever they have a co-referential relationship across documents. Because the joining and partitioning of nodes is based on co-reference-based comparative values, our method is independent of training data and requires no parameter tuning. Experiments show that this semantics-based method achieves performance comparable to the top two state-of-the-art systems without using any training data. The stochastic approach also gives our algorithm near-linear processing time, making it much more efficient than hierarchical agglomerative clustering (HAC). Because our model allows a small number of upper-level entity nodes to summarize a large number of name mentions, it has much greater semantic representation power and scales much better to large collections of name mentions than HAC-based algorithms.
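
The stochastic tree-building step described above can be illustrated with a small Python sketch. It is not the authors' model: the bag-of-words mention features, the merge-only proposals, the fixed proposal budget (`steps_per_mention`) and the merge test are all assumptions, and `Node`, `unit`, `dot`, `leaves` and `disambiguate` are hypothetical names. In place of the paper's co-reference-based comparative values, the sketch compares two entity summaries against the collection's own mean pairwise cosine similarity, so each decision uses a data-derived baseline rather than a tuned threshold. Each upper-level node stores only the summed vector and mention count of its leaves, so a merge can be scored without revisiting individual mentions.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Leaf = one name mention; internal node = a latent entity (person)."""
    vec: Counter   # sum of the unit-length feature vectors of all leaves below
    size: int      # number of leaf mentions below this node
    children: list = field(default_factory=list)
    label: str = ""  # mention id, set on leaves only


def unit(bag):
    """L2-normalise a bag-of-words Counter (positive counts assumed)."""
    norm = math.sqrt(sum(v * v for v in bag.values()))
    return Counter({w: v / norm for w, v in bag.items()}) if norm else Counter()


def dot(a, b):
    """Sparse dot product; iterates over the smaller Counter."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def leaves(node):
    """Collect the mention ids under a node (iterative, no recursion limit)."""
    stack, out = [node], []
    while stack:
        cur = stack.pop()
        if cur.children:
            stack.extend(cur.children)
        else:
            out.append(cur.label)
    return out


def disambiguate(mentions, steps_per_mention=20, seed=0):
    """Stochastically merge mention leaves into entity subtrees.

    mentions: dict of mention id -> bag-of-words Counter (hypothetical format).
    Returns the root nodes of the forest, one per predicted person.
    """
    rng = random.Random(seed)
    roots = [Node(vec=unit(bag), size=1, label=mid) for mid, bag in mentions.items()]
    n = len(roots)
    if n < 2:
        return roots
    # Baseline: mean cosine similarity over all ordered mention pairs, taken
    # from the collection itself via sum_{i != j} u_i.u_j = S.S - sum_i u_i.u_i.
    total = Counter()
    for r in roots:
        total.update(r.vec)
    self_sim = sum(dot(r.vec, r.vec) for r in roots)
    mu = (dot(total, total) - self_sim) / (n * (n - 1))

    for _ in range(steps_per_mention * n):
        if len(roots) < 2:
            break
        i, j = rng.sample(range(len(roots)), 2)
        a, b = roots[i], roots[j]
        # Sum over all cross pairs of (cosine - mu), computed from the two summaries.
        gain = dot(a.vec, b.vec) - mu * a.size * b.size
        if gain > 0:
            roots[i] = Node(vec=a.vec + b.vec, size=a.size + b.size, children=[a, b])
            roots[j] = roots[-1]  # O(1) removal: move the last root into slot j
            roots.pop()
    return roots


if __name__ == "__main__":
    pages = {
        "d1": Counter("professor machine learning university".split()),
        "d2": Counter("machine learning university lecture".split()),
        "d3": Counter("guitar band album tour".split()),
        "d4": Counter("band album concert tour".split()),
    }
    for group in sorted(sorted(leaves(r)) for r in disambiguate(pages)):
        print(group)
```

On this toy input the two research-themed pages and the two music-themed pages end up in separate entity trees. A fuller version would also propose split or move operations so that an early wrong merge can be undone, which the merge-only loop above cannot do.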

Bibliographic record

Authors: Xu J; Lu Q; Li ML; Li WJ
Year: 2015
Original format: PDF
Language: English

Similar documents

1. Personal Data Retrieval and Disambiguation in Web Person Search [J]. Yuliang Wei, Guodong Xin, Wei Wang. IEICE Transactions on Information and Systems, 2019, no. 2.

2. A Graph-based Approach to Person Name Disambiguation in Web [J]. Hojjat Emami. ACM Transactions on Management Information Systems, 2019, no. 2.

3. Web Person Name Disambiguation Using Social Links and Enriched Profile Information [J]. Hojjat Emami, Hossein Shirazi, Ahmad Abdollahzadeh Barforoush. Computing and Informatics, 2018, no. 6.

4. Web Person Disambiguation Using Hierarchical Co-reference Model [C]. Jian Xu, Qin Lu, Minglei Li. International Conference on Intelligent Text Processing and Computational Linguistics, 2015.

5. Modeling and thematic analysis of neighborhood structures in the Web and hierarchical identification of Web communities [D]. Isheeta Nargis. 2007.

6. Functional impairment trajectories among persons with HIV disease: a hierarchical linear models approach [O]. S Crystal, U Sambamoorthi. 1996.

7. Selecting Hierarchical Clustering Cut Points for Web Person-Name Disambiguation [O]. Jun Gong, Douglas W. Oard. 2012.

8. Models of Distribution Computation: Behavior Characterization of Intelligent Problem Solving for an Agent Hierarchy in a Competitive (Web) Environment [R]. A. Lorincz. 2002.

