Conference paper, Symposium on Computational Intelligence for Security and Defense Applications

He says, she says. Pat says, Tricia says. How much reference resolution matters for entity extraction, relation extraction, and social network analysis

Abstract

Anaphora resolution (AR) identifies the entities that pronouns refer to. Coreference resolution (CR) links the various mentions of an entity to one another. For our data, the findings suggest that deduplicating and normalizing text with AR and CR affects the literal mention, frequency, identity, and existence of about 75% of the entities in the texts. The effect is more moderate at the relation level: 13% of the links are modified and 8% are removed. Performing social network analysis on the relations extracted from the texts yields findings that run contrary to the corpus statistics: AR and CR shift network-analytic measures in different directions, AR alters these measures more strongly than CR does, and each technique identifies a different set of most crucial nodes. Taken together, the corpus statistics and the social network analysis suggest that CR is more effective at normalizing entities, while AR is the more powerful technique for splitting up generic nodes into named entities with adjusted weights. The data changes caused by AR and CR are meaningful both qualitatively and quantitatively: the statistical properties of entities and relations change along with their identities. Consequently, the relational data represent the underlying social structure more faithfully. Our results can help analysts eliminate some misinterpretations of graphs distilled from texts and select those nodes in social networks on which reference resolution should be performed.
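As a rough illustration of the entity- and link-level effects described above, the following Python sketch builds a relation network from a handful of extracted relations in three ways: unresolved, with CR only, and with AR followed by CR. Everything here is an assumption made for illustration: the relations, the names other than Pat and Tricia (Patrick, Patricia, Sam, Acme, Globex), and the AR and CR outputs are hypothetical hand-written dictionaries rather than the output of a real resolver, and plain degree (number of distinct neighbours) stands in for the network-analytic measures used in the paper. It is not the authors' pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical toy relations extracted from text. Pronoun mentions carry an
# occurrence id ("he#1") so that each occurrence can be resolved on its own.
RELATIONS: List[Tuple[str, str]] = [
    ("Pat", "Tricia"),
    ("he#1", "Acme"),
    ("she#1", "Acme"),
    ("Patrick", "Globex"),
    ("he#2", "Globex"),
    ("Patricia", "Sam"),
    ("Pat", "he#3"),
]

# Hypothetical anaphora resolution output: pronoun occurrence -> antecedent.
AR: Dict[str, str] = {"he#1": "Pat", "she#1": "Tricia", "he#2": "Sam", "he#3": "Pat"}

# Hypothetical coreference resolution output: name variant -> canonical entity.
CR: Dict[str, str] = {"Patrick": "Pat", "Patricia": "Tricia"}


def normalize(mention: str, ar: Optional[Dict[str, str]], cr: Optional[Dict[str, str]]) -> str:
    """Resolve a pronoun occurrence (AR), drop the occurrence id, then canonicalize (CR)."""
    if ar is not None:
        mention = ar.get(mention, mention)
    node = mention.split("#")[0]  # unresolved pronouns collapse into one generic node
    if cr is not None:
        node = cr.get(node, node)
    return node


def build_graph(
    relations: List[Tuple[str, str]],
    ar: Optional[Dict[str, str]] = None,
    cr: Optional[Dict[str, str]] = None,
) -> Counter:
    """Return undirected edge weights; links that turn into self-loops are removed."""
    edges: Counter = Counter()
    for a, b in relations:
        u, v = normalize(a, ar, cr), normalize(b, ar, cr)
        if u != v:
            edges[frozenset((u, v))] += 1
    return edges


def degree(edges: Counter) -> Dict[str, int]:
    """Degree as the number of distinct neighbours per node."""
    neighbours = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(ns) for node, ns in neighbours.items()}


if __name__ == "__main__":
    for label, ar, cr in [("raw", None, None), ("CR", None, CR), ("AR+CR", AR, CR)]:
        deg = degree(build_graph(RELATIONS, ar, cr))
        top = sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
        print(f"{label}: {len(deg)} nodes, top = {top}")
```

Run as a script, it prints the node count and the two highest-degree nodes for each variant. In the raw graph the generic pronoun node `he` has the highest degree; with CR alone it ties with `Pat`; once AR resolves each pronoun occurrence, `he` and `she` disappear, their links are redistributed to `Pat`, `Tricia`, and `Sam`, and the relation `("Pat", "he#3")` becomes a self-loop and is dropped. Resolving AR per occurrence, before the occurrence id is stripped, is what lets one generic node split into several named entities.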