
Statistical inference for dynamical, interacting multi-object systems with emphasis on human small group interactions.



Abstract

In the first part of this dissertation we present a class of sequential block sampling algorithms for tracking an unknown and variable number of objects. The proposed algorithms are applicable to multi-object tracking scenarios in which the only available observations are detector outputs, and also to scenarios that combine detector outputs with more complex observations that figure in data-association-free likelihood models. The proposed algorithms provide a way to construct block proposal distributions using detection-based observations. Key parts of the proposed algorithms are methods for sampling block proposal distributions. We propose two novel methods for this purpose: one is based on a variational approximation scheme and the other is an adaptive MCMC sampling scheme. Samples from the block proposal distributions are further used in the sequential MCMC (or SMC) framework. We tested the proposed schemes on two synthetic datasets. The results demonstrate the benefits of processing longer observation sequences in multi-object tracking problems in a more efficient manner than the classical sequential sampling schemes.

In the second part, we present a multi-target tracking algorithm for tracking multiple speakers with a microphone array. The sound source trajectories reconstructed by the mixture particle filter do not necessarily correspond to speech only. Therefore, we apply an adapted optimal change point algorithm to segment the obtained sound source trajectories into speech and non-speech segments. The algorithm is tested on a multi-participant meeting database as a separate module and as part of a multi-modal system for automatic meeting monitoring. In both cases it provided significant improvements on the speaker detection and segmentation tasks.

In the third part, we present a modality fusion algorithm that exploits complementary properties of video tracking, microphone array localization and speaker identification, and solves the problem of speaker segmentation in the presence of overlapped speech. The proposed algorithm is unique from multiple perspectives. First, we suggest a hidden Markov model architecture that performs fusion of three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel likelihood model for the microphone array observations for dealing with overlapped speech. We propose a modification of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function that takes into account possible microphone occlusions. We employ the multi-object detect-before-tracking approach and use the local maxima of the modified SPR-GCC-PHAT functions as sound source detectors. Multiple detection locations are fused into the joint likelihood by joint probabilistic data association.

We present a new multi-modal database for analysis of participant behaviors in dyadic interactions. This database contains multiple channels with close- and far-field audio, a high-definition camera array and motion capture data. The presence of motion capture allows precise analysis of low-level body language descriptors and their comparison with similar descriptors derived from video data. The data is manually labeled by multiple human annotators using psychology-informed guides. We analyzed the relation between approach-avoidance (A-A) behavior and various non-verbal body language and acoustic features, and the influence of the audio and video channels on experts' labeling decisions. We also analyzed the dependency of the statistical interaction descriptors and A-A labels on participants' roles.

Finally, we propose an ordinal regression (OR) algorithm, and its extension applicable to time series, for estimating the approach-avoidance (A-A) behavior quantifiers (labels) in human dyadic interactions. The proposed algorithm transforms the ordinal regression into multiple binary classification problems, solves them with independent score-outputting classifiers, and fits the cumulative logit logistic regression model with proportional odds (CLLRMP) to the classifier score vectors. (Abstract shortened by UMI.)
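
The reduction of ordinal regression to several binary problems, mentioned at the end of the abstract, can be illustrated with a minimal Python sketch. The abstract does not say which decomposition the dissertation uses, so the threshold encoding "label > k" below is an assumption (it is a common choice), and the function names are hypothetical. The sketch also leaves out the CLLRMP fitting step: it simply assumes that each binary classifier already outputs a calibrated estimate of P(y > k).

```python
from typing import List, Sequence


def ordinal_to_binary_targets(labels: Sequence[int], num_levels: int) -> List[List[int]]:
    """Build one binary target vector per threshold k = 1..num_levels-1.

    Assumed encoding (not taken from the dissertation): target = 1 if label > k.
    Labels are integers in 1..num_levels.
    """
    return [[1 if y > k else 0 for y in labels] for k in range(1, num_levels)]


def class_probabilities(exceed_probs: Sequence[float]) -> List[float]:
    """Convert P(y > k), k = 1..K-1, into P(y = j), j = 1..K.

    Assumes the inputs are calibrated and non-increasing in k; otherwise
    some differences could come out negative.
    """
    # Pad with the trivial bounds P(y > 0) = 1 and P(y > K) = 0.
    cumulative = [1.0] + list(exceed_probs) + [0.0]
    return [cumulative[j] - cumulative[j + 1] for j in range(len(cumulative) - 1)]


def predict_level(exceed_probs: Sequence[float]) -> int:
    """Return the most probable ordinal level (1-based)."""
    probs = class_probabilities(exceed_probs)
    return 1 + max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Four hypothetical annotations on a 4-level scale.
    targets = ordinal_to_binary_targets([1, 3, 2, 4], num_levels=4)
    for k, t in enumerate(targets, start=1):
        print(f"target for 'label > {k}':", t)

    # Hypothetical outputs of the three binary classifiers for one sample.
    print(class_probabilities([0.9, 0.6, 0.1]))
    print(predict_level([0.9, 0.6, 0.1]))
```

In the approach described in the abstract, the raw classifier scores are not used directly in this way; a proportional-odds cumulative logit model is fitted to the score vectors, which also guarantees that the resulting cumulative probabilities are properly ordered.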

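Bibliographic record

  • Author

    Rozgic, Viktor

  • Author affiliation

    University of Southern California

  • Degree-granting institution: University of Southern California
  • Subject: Statistics; Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 128 p.
  • Total pages: 128
  • Original format: PDF
  • Language of text: eng
  • CLC classification:
  • Keywords: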
