Journal: ACM Transactions on Multimedia Computing, Communications, and Applications

Opinion Question Answering by Sentiment Clip Localization

Abstract

This article considers multimedia question answering beyond factoid and how-to questions. We are interested in searching videos to answer opinion-oriented questions that are controversial and hotly debated. Example questions include "Should Edward Snowden be pardoned?" and "Obamacare-unconstitutional or not?". These questions often evoke emotional responses, either positive or negative, and are therefore likely to be better answered by videos than by text, because emotional signals are vividly displayed through facial expression and speaking tone. Two problems arise, however. First, a potential answer lasting 60s may be embedded in a 10min video, which degrades the user experience compared with reading the answer as text. Second, a text-based opinion question may be short and vague, while video answers are spoken, grammatically less structured, and noisy because of speech transcription errors. Direct word matching or syntactic analysis of sentence structure, as adopted in factoid and how-to question answering, is unlikely to find video answers. The first problem, answer localization, is addressed by audiovisual analysis of the emotional signals in videos to locate segments likely to express opinions. The second problem, question-answer matching, is tackled by a deep architecture that nonlinearly matches the text words of questions with the speech in videos. Experiments are conducted on eight controversial topics, using questions crawled from Yahoo! Answers and Internet videos from YouTube.
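
To make the answer-localization problem concrete, the following is a minimal sketch of picking a 60s candidate clip out of a 10min video. It is not the article's method: the abstract does not describe how emotional signals are scored or how segments are selected. The sketch assumes a hypothetical list of per-second emotion-intensity scores (which some audiovisual model would have to produce) and simply returns the fixed-length window with the highest mean score. The function name `best_window` and the synthetic scores are illustrative assumptions.

```python
from typing import List, Tuple


def best_window(scores: List[float], window: int) -> Tuple[int, float]:
    """Return (start_index, mean_score) of the contiguous window with the highest mean."""
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    current = sum(scores[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(scores) - window + 1):
        # Slide by one step: add the entering score, drop the leaving one.
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_sum / window


# Hypothetical per-second emotion-intensity scores for a 600 s (10 min) video:
# low everywhere except a 60 s stretch starting at 200 s (synthetic data).
scores = [0.1] * 600
for t in range(200, 260):
    scores[t] = 0.9

start, mean_score = best_window(scores, window=60)
print(f"candidate clip: {start}s to {start + 60}s, mean score {mean_score:.2f}")
```

A fixed window length is the simplest possible choice here; a real system would likely need variable-length segments and scores derived from both the audio and visual channels.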
