International Conference on Management of Data

Designing a Scalable Crowdsourcing Platform



Abstract

Computers are extremely efficient at crawling, storing, and processing huge volumes of structured data. They are great at exploiting link structures to generate valuable knowledge. Yet there are plenty of data processing tasks that remain difficult today: labeling sentiment, moderating images, and mining structured content from the web are still too hard for computers. Automated techniques can get us a long way on some of those, but human intelligence is required when an accurate decision is ultimately important. In many cases that decision is easy for people and can be made quickly, in a few seconds to a few minutes. By creating millions of simple online tasks we create a distributed computing machine. By shipping the tasks to millions of contributors around the globe, we make this human computer available 24/7 to make important decisions about your data. In this talk, I will describe our approach to designing CrowdFlower, a scalable crowdsourcing platform, as it has evolved over the last 4 years.

We think about crowdsourcing in terms of Quality, Cost, and Speed. They are the ultimate design objectives of a human computer. Unfortunately, we can't have all three: a general price-constrained task requiring 99.9% accuracy and a 10-minute turnaround is not possible today. I will discuss the design decisions behind CrowdFlower that allow us to pursue any two of these objectives. I will briefly present examples of common crowdsourced tasks and of tools built into the platform that make the design of complex tasks easy, such as the CrowdFlower Markup Language (CML).

Quality control is the single most important challenge in crowdsourcing. To enable an unidentified crowd of people to produce meaningful work, we must be certain that we can filter out bad contributors and produce high-quality output. Initially we used only consensus. As the diversity and size of our crowd grew, so did the number of people attempting fraud. CrowdFlower developed a "gold standard" to block attempts at fraud. The use of gold allowed us to train contributors in the details of specific domains. By defining expected responses for a subset of the work and providing explanations of why a given response is expected, we are able to distribute tasks to an ever-expanding anonymous workforce without sacrificing quality. As the volume of and demand for gold standard data grew, we developed automated techniques to generate gold in unlimited quantities, both to better train workers and to minimize the internal human resources required to run these jobs. As humans naturally make mistakes, we collect and aggregate multiple judgments to reach our target quality. When tasks are too subjective to specify a gold standard, CrowdFlower can fall back on peer review. Finally, we track contributors' historical performance in a specific domain to reduce the amount of training and evaluation a contributor has to go through in a single task.

When cost is the driving factor in the decision to crowdsource, CrowdFlower provides access to millions of contributors. Because we automatically determine who can contribute, we can allow absolutely anyone to try a task to see if they can do it. Crowdsourcing differs from traditional service-oriented businesses in that costs increase with volume: the list of tasks that a crowd can choose from is an open marketplace, and contributors gravitate toward the best-paying or most enjoyable tasks available. I will discuss the techniques we use to keep the marketplace as stable as possible.
The speed at which work can be completed is often the primary requirement of our clients. To maximize the scalability of our workforce, we use a channel-based approach for our labor partnerships. As a result, we rarely have supply-side constraints. This strategy also gives us a worldwide presence to achieve 24/7 processing. Some of our channel partners reward their users with virtual currency or other valuable items. I will briefly address the relationship between game dynamics and crowdsourced work. As CrowdF
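
To make the quality-control ideas in the abstract concrete, here is a minimal sketch of gold-standard filtering combined with consensus aggregation. Everything in it is an illustrative assumption rather than CrowdFlower's actual implementation: the 70% accuracy threshold, the data layout, and the function names are invented, and majority vote is only the simplest possible consensus rule.

```python
from collections import Counter, defaultdict

# Illustrative cutoff (an assumption): contributors must answer at least
# 70% of the hidden gold questions correctly to be considered trusted.
GOLD_ACCURACY_THRESHOLD = 0.7

def trusted_contributors(judgments, gold):
    """Filter contributors by accuracy on hidden gold-standard questions.

    judgments: iterable of (contributor, task_id, answer) tuples
    gold: dict mapping task_id -> expected answer for the gold subset
    """
    seen, correct = Counter(), Counter()
    for contributor, task_id, answer in judgments:
        if task_id in gold:  # the contributor answered a gold question without knowing it
            seen[contributor] += 1
            correct[contributor] += answer == gold[task_id]
    return {c for c in seen if correct[c] / seen[c] >= GOLD_ACCURACY_THRESHOLD}

def aggregate_by_consensus(judgments, trusted):
    """Majority vote over redundant judgments, counting trusted contributors only."""
    votes = defaultdict(Counter)
    for contributor, task_id, answer in judgments:
        if contributor in trusted:
            votes[task_id][answer] += 1
    return {task_id: counts.most_common(1)[0][0] for task_id, counts in votes.items()}

if __name__ == "__main__":
    gold = {"g1": "positive"}  # one gold question with a known expected answer
    judgments = [
        ("alice", "g1", "positive"), ("alice", "t1", "negative"),
        ("bob",   "g1", "negative"), ("bob",   "t1", "positive"),  # bob misses gold
        ("carol", "g1", "positive"), ("carol", "t1", "negative"),
    ]
    trusted = trusted_contributors(judgments, gold)    # {"alice", "carol"}
    print(aggregate_by_consensus(judgments, trusted))  # {'g1': 'positive', 't1': 'negative'}
```

In the same spirit, the peer-review fallback and per-domain performance history mentioned in the abstract would stand in for `gold` when a task is too subjective to have a single expected answer.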


