Helping Editors Choose Better Seed Sets for Entity Set Expansion

Abstract

Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised expansion algorithms with manual cleaning of the resulting expanded sets. In this paper, we study the effects of different seed sets in a state-of-the-art semi-supervised expansion system and show a tremendous variation in expansion performance depending on the choice of seeds. We further show that human editors, in general, provide very bad seed sets, which perform well below the average random seed set. We identify three factors of seed set composition, namely prototypicality, ambiguity and coverage, and we investigate their effects on expansion performance. Finally, we propose various automatic systems for improving editor-generated seed sets, which seek to remove ambiguous and other error-prone seed instances. An extensive experimental analysis shows that expansion quality, measured in R-precision, can be improved on average by a maximum of 46% by removing the right seeds from a seed set. Our automatic methods outperform the human editors' seed sets and on average improve expansion performance by up to 34% over the original seed sets.
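As a rough illustration of the evaluation metric named in the abstract: R-precision is conventionally defined as the precision of a ranked list at rank R, where R is the number of gold-standard entities. The sketch below follows that standard definition only. The function name `r_precision`, the toy entity names, and the assumption that the ranked expansion contains no duplicates are illustrative and are not taken from the paper.

```python
from typing import Sequence, Set


def r_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Precision at rank R, where R is the size of the gold-standard set.

    Assumes `ranked` is a duplicate-free expansion ordered best-first
    (an assumption of this sketch, not a claim about the paper's setup).
    """
    r = len(gold)
    if r == 0:
        return 0.0  # no gold entities: define the score as zero
    top_r = ranked[:r]  # keep only the first R expanded entities
    hits = sum(1 for entity in top_r if entity in gold)
    return hits / r


# Toy example: four gold entities, one wrong entity ranked second.
gold_set = {"paris", "london", "berlin", "madrid"}
expansion = ["paris", "apple", "london", "berlin", "madrid"]
score = r_precision(expansion, gold_set)  # 3 of the top 4 are correct
```

Under this definition, comparing the score of an expansion before and after dropping a seed is one simple way to quantify whether removing that seed helped.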

Bibliographic record

Source
ACM Conference on Information and Knowledge Management (CIKM '09), 2009, pp. 225-234, 10 pages
Authors
Vishnu Vyas; Patrick Pantel; Eric Crestan
Original format: PDF
Chinese Library Classification: Information processing
Keywords
seed set expansion; information extraction; seed set refinement

Similar literature

1. Entity set expansion in knowledge graph: a heterogeneous information network perspective [J]. Chuan SHI, Jiayu DING, Xiaohuan CAO. Frontiers of Computer Science. 2021, no. 1
2. Pattern Rank+NN: A Ranking Framework Bringing User Behaviors into Entity Set Expansion from Web Search Queries [J]. Xiao Zhijun, Li Cuiping, Chen Hong. ACM Transactions on the Web. 2020, no. 3
3. Neural variational entity set expansion for automatically populated knowledge graphs [J]. Rastogi Pushpendre, Poliak Adam, Lyzinski Vince. Information Retrieval. 2019, no. 3a4
4. Helping editors choose better seed sets for entity set expansion [C]. Vishnu Vyas, Patrick Pantel, Eric Crestan. 18th ACM Conference on Information and Knowledge Management. 2009
5. Improving hierarchical multiclass perceptrons for entity detection using Google Sets and pronoun disambiguation [D]. Quinn, Michael. 2008
6. Landscape of Fluid Sets of Hairpin-Derived 21-/24-nt-Long Small RNAs at Seed Set Uncovers Special Epigenetic Features in Picea glauca [O]. Yang Liu, Yousry A. El-Kassaby. 2017
7. End-to-End Bootstrapping Neural Network for Entity Set Expansion [O]. Lingyong Yan, Xianpei Han, Ben He. 2020
8. Entity List Completion Using Set Expansion Techniques [R]. Dalvi, B., Callan, J., Cohen, W. 2011
