Person name disambiguation in the multicultural and online setting.

Abstract

With the recent rise in popularity of social network sites, more and more personal information is becoming available online. Since a person's information is generally available in various formats across multiple sites, there is growing interest in consolidating such personal information from multiple information sources. The goal of person name disambiguation is to group these person references by the real-world people they refer to. These references can range from personal homepages to names mentioned in news articles.

This dissertation examines the person name disambiguation problem in three different settings: (1) name-based person name disambiguation, (2) metadata-based person name disambiguation, and (3) person name disambiguation in an online setting. In the simplest setting, name-based person name disambiguation, records are disambiguated based purely on personal names. Since personal names are culture-dependent, we propose a novel name matching similarity that takes the ethnicity of the names into consideration. More specifically, we propose a name-ethnicity classifier based on multinomial logistic regression and an ethnicity-sensitive name matching similarity based on the Smith-Waterman alignment algorithm, where different cost matrices are applied depending on the ethnicity of the names being compared.

In the second setting, we examine the person name disambiguation problem where information other than personal names is also available. This additional information includes both association information, such as one's affiliation and social network, and contextual information, such as the content of the document in which one's name is mentioned. We propose a random forest-based method that aggregates multiple types of metadata information to determine whether two or more person name records should be linked.
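The ethnicity-sensitive name matching from the first setting can be illustrated with a minimal Python sketch. Two names are scored with Smith-Waterman local alignment, and the substitution table is chosen according to an ethnicity label. In the dissertation that label would come from the name-ethnicity classifier. Here it is simply passed in. Everything specific in the sketch is an illustrative assumption and not the dissertation's actual cost matrices. That includes the table `NEAR_MATCHES`, the hypothetical key `hypothetical_a` with its letter pairs, the score constants, and the length normalization. The sketch also uses similarity scores rather than costs.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical per-ethnicity tables: letter pairs treated as near-matches
# (e.g. common transliteration variants). Illustrative only.
NEAR_MATCHES: Dict[str, Set[FrozenSet[str]]] = {
    "default": set(),
    "hypothetical_a": {frozenset("vw"), frozenset("ck"), frozenset("iy")},
}

# Assumed scoring constants (similarity scores, not costs).
MATCH, NEAR, MISMATCH, GAP = 2.0, 1.0, -1.0, -1.0


def substitution_score(a: str, b: str, ethnicity: str) -> float:
    """Score for aligning character a with b under an ethnicity-specific table."""
    if a == b:
        return MATCH
    if frozenset((a, b)) in NEAR_MATCHES.get(ethnicity, set()):
        return NEAR
    return MISMATCH


def smith_waterman(s: str, t: str, ethnicity: str = "default") -> float:
    """Best local-alignment score between two names (Smith-Waterman)."""
    s, t = s.lower(), t.lower()
    rows, cols = len(s) + 1, len(t) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Local alignment: a cell never drops below zero.
            h[i][j] = max(
                0.0,
                h[i - 1][j - 1] + substitution_score(s[i - 1], t[j - 1], ethnicity),
                h[i - 1][j] + GAP,
                h[i][j - 1] + GAP,
            )
            best = max(best, h[i][j])
    return best


def name_similarity(s: str, t: str, ethnicity: str = "default") -> float:
    """Normalize by the best achievable score of the shorter name, giving [0, 1]."""
    if not s or not t:
        return 0.0
    return smith_waterman(s, t, ethnicity) / (MATCH * min(len(s), len(t)))
```

With these toy tables, `name_similarity("viktor", "wiktor")` is about 0.83 under `default` and about 0.92 under `hypothetical_a`, because the v/w pair earns partial credit instead of a mismatch penalty.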
In the last setting, we consider the person name disambiguation problem from the perspective of a real system, where the number of person references to be disambiguated is not static but ever increasing. Here we propose an online clustering method with constraints for person name disambiguation, in which the integrity of each person cluster is continuously enforced. Our experiment shows that our method outperforms the previous static clustering approach without constraints.
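The online setting can be pictured as a stream in which references arrive one at a time. Each new reference is assigned greedily to the most similar existing cluster unless a constraint forbids it. The sketch rests on several assumptions. All references share one ambiguous name, so there is one clusterer per name. The similarity is Jaccard overlap of hypothetical metadata feature sets. The constraint is an assumed cannot-link rule that treats two mentions from the same document as different people. The class and field names are hypothetical. Constraints are checked only at insertion time, and clusters are never re-split or merged. It is therefore a simplified illustration of constrained online clustering, not the dissertation's method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Reference:
    """One person-name mention (hypothetical fields for illustration)."""
    ref_id: str
    name: str
    doc_id: str          # document in which the name appears
    features: Set[str]   # e.g. coauthors, affiliation tokens, keywords


@dataclass
class Cluster:
    members: List[Reference] = field(default_factory=list)
    features: Set[str] = field(default_factory=set)
    doc_ids: Set[str] = field(default_factory=set)

    def add(self, ref: Reference) -> None:
        self.members.append(ref)
        self.features |= ref.features
        self.doc_ids.add(ref.doc_id)


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def violates_constraints(ref: Reference, cluster: Cluster) -> bool:
    """Assumed cannot-link rule: at most one mention per document per cluster."""
    return ref.doc_id in cluster.doc_ids


class OnlineClusterer:
    def __init__(self, threshold: float = 0.3,
                 similarity: Callable[[Set[str], Set[str]], float] = jaccard):
        self.threshold = threshold
        self.similarity = similarity
        self.clusters: List[Cluster] = []

    def add(self, ref: Reference) -> int:
        """Assign a newly arrived reference and return its cluster index."""
        best_idx, best_sim = -1, self.threshold
        for idx, cluster in enumerate(self.clusters):
            if violates_constraints(ref, cluster):
                continue  # this reference may not join this cluster
            sim = self.similarity(ref.features, cluster.features)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx == -1:
            # No admissible cluster is similar enough: start a new person.
            self.clusters.append(Cluster())
            best_idx = len(self.clusters) - 1
        self.clusters[best_idx].add(ref)
        return best_idx
```

For example, a second mention from a document already represented in a cluster is pushed into a new cluster even when its features overlap heavily with that cluster.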

Bibliographic details

  • Author: Treeratpituk, Pucktada
  • Author affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subject: Information technology
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 130
  • Original format: PDF
  • Language: English