International Conference on Semantic Web

Disrupting the Semantic Comfort Zone


Abstract

Ambiguity in interpreting signs is not a new idea, yet the vast majority of research in machine interpretation of signals such as speech, language, images, video, and audio tends to ignore ambiguity. This is evidenced by the fact that metrics for the quality of machine understanding rely on a ground truth in which each instance (a sentence, a photo, a sound clip, etc.) is assigned a discrete label, or set of labels, and the machine's prediction for that instance is compared to the label to determine whether it is correct. This determination yields the familiar precision, recall, accuracy, and f-measure metrics, but it clearly presupposes that the determination can be made at all. CrowdTruth is a form of collective intelligence based on a vector representation that accommodates diverse interpretation perspectives and encourages human annotators to disagree with each other, in order to expose latent elements such as ambiguity and worker quality. In other words, CrowdTruth assumes that when annotators disagree on how to label an example, it is because the example is ambiguous, the worker is not doing the task properly, or the task itself is not clear. Previous work on CrowdTruth focused on how the disagreement signals from low-quality workers and from unclear tasks can be isolated. Recently, we observed that disagreement can also signal ambiguity. The basic hypothesis is that if workers disagree on the correct label for an example, then it will be more difficult for a machine to classify that example. An elaborate data analysis of whether the source of the disagreement is ambiguity supports our intuition: low-clarity sentences signal ambiguity, while high-clarity sentences quite obviously express one or more of the target relations. In this talk I will share the experiences and lessons learned on the path to understanding diversity in human interpretation, and the ways to capture it as ground truth so that machines can deal with such diversity.
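
To make the idea of a disagreement-preserving vector representation concrete, below is a minimal Python sketch of a CrowdTruth-style setup, assuming a relation-extraction task with a small fixed set of candidate labels. The function names (unit_vector, label_clarity), the labels, and the toy annotations are hypothetical illustrations, not the actual CrowdTruth metrics or data; the real framework also derives worker-quality and task-clarity scores.

# A minimal sketch of a CrowdTruth-style vector representation.
# Assumption: each worker's annotation of a unit (e.g. a sentence) is a
# binary vector over candidate labels; per-worker vectors are summed
# rather than reduced to a majority vote.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def unit_vector(worker_vectors):
    """Aggregate one unit by summing the per-worker label vectors,
    instead of collapsing them into a single 'correct' label."""
    return [sum(column) for column in zip(*worker_vectors)]

def label_clarity(worker_vectors, labels):
    """Per-label clarity score: cosine between the aggregated unit vector
    and each label's one-hot vector. Disagreement keeps scores below 1.0."""
    aggregated = unit_vector(worker_vectors)
    scores = {}
    for i, label in enumerate(labels):
        one_hot = [1 if j == i else 0 for j in range(len(labels))]
        scores[label] = cosine(aggregated, one_hot)
    return scores

# Hypothetical unit: three workers annotate one sentence with the relations
# they see expressed; worker B selects two labels, worker C selects "none".
labels = ["treats", "causes", "none"]
annotations = [
    [1, 0, 0],  # worker A
    [1, 1, 0],  # worker B
    [0, 0, 1],  # worker C
]
print(label_clarity(annotations, labels))
# e.g. {'treats': 0.82, 'causes': 0.41, 'none': 0.41} -- no label is fully
# "correct"; low clarity across the board would flag an ambiguous sentence.

In this sketch, a sentence on which workers converge yields one label with a score near 1.0, while a contested sentence spreads its score mass across labels, which is precisely the signal used to separate ambiguity from worker or task problems.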
