Conference proceedings: Advances in Natural Language Processing

Automatically Extracting Personal Name Aliases from the Web

Abstract

Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to build a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a mean reciprocal rank of 0.6718, a statistically significant improvement. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
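
The abstract reports its headline result as a mean reciprocal rank (MRR). As a reminder of how that metric is usually computed, here is a minimal sketch. It assumes the standard definition (average over names of 1/rank of the first correct alias, with 0 when no correct alias is ranked). The function name `mean_reciprocal_rank` and the toy data are illustrative assumptions, not taken from the paper's code or dataset.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average over names of 1/rank of the first correct alias.

    A name whose candidate list contains no correct alias contributes 0.
    """
    total = 0.0
    for name, candidates in ranked.items():
        correct = gold.get(name, set())
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break  # only the first correct alias counts
    return total / len(ranked) if ranked else 0.0


# Hypothetical toy input: ranked candidate aliases per name, plus gold aliases.
ranked = {
    "Hideki Matsui": ["Godzilla", "Yankees", "Matsui"],
    "Will Smith": ["actor", "Fresh Prince", "Hollywood"],
}
gold = {
    "Hideki Matsui": {"Godzilla"},
    "Will Smith": {"Fresh Prince"},
}

mrr = mean_reciprocal_rank(ranked, gold)  # (1/1 + 1/2) / 2 = 0.75
```

Under this definition, an MRR of 0.6718 means that the first correct alias tends to appear near the top of the ranked candidate list, between ranks 1 and 2 on average in reciprocal terms.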
