On classification and segmentation of massive audio data streams

Charu C. Aggarwal

Abstract

In recent years, the proliferation of VoIP data has created a number of applications in which it is desirable to perform quick online classification and recognition of massive voice streams. Such applications are typically encountered in real-time intelligence and surveillance. In many cases, the data streams are in compressed format, and data must often be processed at rates of gigabits per second. All known techniques for speaker voice analysis require an offline training phase in which the system is trained with known segments of speech. The state-of-the-art method for text-independent speaker recognition is Gaussian mixture modeling (GMM), which requires an iterative expectation-maximization procedure for training and therefore cannot be implemented in real time. In many real applications (such as surveillance), it is desirable to perform the recognition process online, so that the system can be quickly adapted to new segments of the data. In many cases, it may also be desirable to quickly create databases of training profiles for speakers of interest. In this paper, we discuss the details of such an online voice recognition system. For this purpose, we use our micro-clustering algorithms to design concise signatures of the target speakers. One surprising and insightful observation from our experience with this system is that, although it was originally designed only for efficiency, it also turned out to be more accurate than the widely used GMM. The reason is the conciseness of the micro-cluster model, which makes it less prone to overtraining. This is evidence that it is often possible to get the best of both worlds and outperform complex models in both efficiency and accuracy. We present experimental results illustrating the effectiveness and efficiency of the method.
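The abstract does not spell out the micro-clustering algorithm, so the following is only a minimal sketch of the general idea: each speaker is summarized by a small, bounded set of additive cluster summaries (point count and per-dimension linear sum) that can be updated in one pass, with no iterative training. It assumes each audio frame has already been converted to a numeric feature vector. The names `MicroCluster`, `SpeakerSignature`, `classify`, and the parameters `max_clusters` and `radius` are hypothetical and are not taken from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class MicroCluster:
    """Additive summary of a group of points: count and linear sum.

    (Hypothetical structure for illustration; the paper's summary may differ.)
    """

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)

    def add(self, point):
        # Constant-time update: no stored points, no re-training.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x

    def centroid(self):
        return [s / self.n for s in self.ls]


class SpeakerSignature:
    """A bounded set of micro-clusters standing in for one speaker."""

    def __init__(self, max_clusters, radius):
        self.max_clusters = max_clusters  # assumed cap on signature size
        self.radius = radius              # assumed absorption threshold
        self.clusters = []

    def update(self, point):
        """Absorb one feature vector in a single pass."""
        if self.clusters:
            nearest = min(self.clusters,
                          key=lambda c: sq_dist(c.centroid(), point))
            near_enough = sq_dist(nearest.centroid(), point) <= self.radius ** 2
            # Absorb if close, or if the signature is already full.
            if near_enough or len(self.clusters) >= self.max_clusters:
                nearest.add(point)
                return
        self.clusters.append(MicroCluster(point))

    def distance(self, point):
        """Distance from a point to the closest centroid (needs >= 1 cluster)."""
        return math.sqrt(min(sq_dist(c.centroid(), point)
                             for c in self.clusters))


def classify(signatures, frames):
    """Return the key of the signature with the smallest mean frame distance."""
    def score(name):
        return sum(signatures[name].distance(f) for f in frames) / len(frames)
    return min(signatures, key=score)
```

Because each summary is additive, enrolling a new speaker only requires streaming that speaker's frames through `update` once, which is the property that makes this style of model attractive for online use. The nearest-centroid scoring in `classify` is a stand-in chosen for brevity, not the scoring rule used in the paper.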

Bibliographic details

Source: Knowledge and Information Systems, 2009, no. 2, pp. 137-156 (20 pages)
Author: Charu C. Aggarwal
Original format: PDF
Language: English
Keywords: Classification; Segmentation; Audio streams
