
Multimedia annotation through search and mining.


Abstract

Multimedia annotation is an application of computer vision in which objects or concepts recognized in a multimedia document are presented as text labels. Annotation algorithms typically depend on complicated feature extraction and matching pipelines that attempt to learn a model for each individual annotation. This work, however, shows that large datasets can be annotated effectively without per-label models by combining information from low-level visual features with annotation mining of the data, a technique referred to as annotation by mining. The method is especially effective in the presence of aliased, redundant data, which is characteristic of social media sites and of content available on the web. This formulation lets us address the problem in a way that is highly scalable and fast regardless of dictionary size.

The work places particular emphasis on learning using graph theory. Such an approach leads to algorithms that effectively combine disparate feature metrics by examining the stability and smoothness of a graph constructed in any metric space. Specifically, a concept of "graph smoothness" is formulated that reflects how different attributes are distributed over the graph. This smoothness measurement allows us to extract visual annotations and geographic place annotations, as well as to find weighting parameters for disparate similarity modalities. The approach is validated on two sets of videos, a collection of TRECVID news videos and a set crawled from the online repository hosted by YouTube, and on two image databases crawled from the set of Flickr geotagged photos. It is shown to mine accurate annotations out of noisy transcripts and noisily tagged social media data while scaling to dictionary sizes of more than 430,000 words.
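The notion of "graph smoothness" mentioned in the abstract can be made concrete with a toy sketch. The Python snippet below is not the dissertation's actual formulation; it is a minimal illustration under assumed choices: a k-nearest-neighbor similarity graph with a Gaussian kernel over visual feature vectors, and a candidate tag's smoothness measured as the normalized Dirichlet energy of its 0/1 presence indicator on that graph (lower energy means the tag varies little between strongly connected items). The names knn_similarity_graph and smoothness, and the parameters k and sigma, are illustrative only.

```python
# Minimal sketch of a graph-smoothness score for candidate annotations.
# Assumptions (not from the dissertation): items are visual feature
# vectors, the graph is a symmetrized k-NN graph with Gaussian weights,
# and smoothness is the normalized Dirichlet energy of a tag indicator.

import numpy as np

def knn_similarity_graph(features, k=5, sigma=1.0):
    """Dense affinity matrix of a symmetrized k-NN graph (Gaussian kernel)."""
    n = len(features)
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    # keep only each node's k strongest edges, then symmetrize
    keep = np.zeros_like(w, dtype=bool)
    for i in range(n):
        keep[i, np.argsort(w[i])[-k:]] = True
    return np.where(keep | keep.T, w, 0.0)

def smoothness(w, indicator):
    """Normalized Dirichlet energy of a tag's 0/1 indicator on the graph."""
    diff = indicator[:, None] - indicator[None, :]
    energy = 0.5 * np.sum(w * diff ** 2)
    return energy / (w.sum() + 1e-12)

# Toy usage: items 0-2 form one visual cluster tagged "beach",
# items 3-4 form another cluster; "noise" is a scattered tag.
feats = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
                  [5.0, 5.1], [5.1, 5.0]])
w = knn_similarity_graph(feats, k=2, sigma=1.0)
beach = np.array([1, 1, 1, 0, 0], dtype=float)   # aligned with the graph
noise = np.array([1, 0, 1, 0, 1], dtype=float)   # scattered, "rough" tag
print(smoothness(w, beach), smoothness(w, noise))  # beach scores lower (smoother)
```

Under this reading, a tag whose occurrences line up with tight visual clusters scores as smooth and is a plausible annotation for those clusters, while a scattered tag scores as rough; a comparable score could also be used to weight competing similarity modalities against each other.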

Record details

  • Author

    Moxley, Emily K.;

  • Affiliation

    University of California, Santa Barbara.;

  • Degree grantor University of California, Santa Barbara.
  • Subject Engineering, Electronics and Electrical.
  • Degree Ph.D.
  • Year 2009
  • Pages 191 p.
  • Total pages 191
  • Format PDF
  • Language eng
