Big Data versus the Crowd: Looking for Relationships in All the Right Places


Abstract

Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data from distant supervision and crowd sourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-source labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.
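The abstract reports quality as precision, recall, and F1 score. As a minimal sketch of how these metrics are commonly computed for relation extraction, the example below compares a set of predicted relation tuples against a gold set. The function name `precision_recall_f1` and the toy tuples are illustrative assumptions, not taken from the paper, and the paper's own evaluation protocol may differ.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of relation tuples.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted tuples that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold tuples that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (assumed for illustration only).
gold = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Marie Curie", "born_in", "Warsaw"),
}
predicted = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Marie Curie", "born_in", "Paris"),
}
p, r, f1 = precision_recall_f1(predicted, gold)
print(p, r, f1)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is often used as a single quality figure.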


Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics (ACL 2012), 2012, pp. 825-834, 10 pages
Authors
Ce Zhang; Feng Niu; Christopher Re; Jude Shavlik
Original format: PDF
Chinese Library Classification: programming, software engineering

