Conference: International Joint Conference on Computer, Information, and Systems Sciences, and Engineering

Clustering-Based Topic Identification of Transcribed Arabic Broadcast News

Abstract

In this research, different clustering techniques are applied to group transcribed textual documents obtained from audio streams. Since audio transcripts are normally highly erroneous, it is essential to reduce the negative impact of errors introduced at the speech recognition stage. In an attempt to overcome some of these errors, different stemming techniques are applied to the transcribed text. The goal of this research is to achieve automatic topic clustering of transcribed speech documents, and to investigate the impact of applying stemming techniques in combination with a Chi-square similarity measure on the accuracy of the selected clustering algorithms. The evaluation, using the F-measure, showed that root-based stemming in combination with spectral clustering achieved the highest accuracy.
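
The abstract names a Chi-square similarity measure and an F-measure evaluation but gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes the common chi-square distance between two non-negative term-frequency vectors, and the class-weighted clustering F-measure in which each reference topic is scored against its best-matching cluster. The function names `chi_square_distance` and `clustering_f_measure` are hypothetical and do not come from the paper, and the authors' exact definitions may differ.

```python
from collections import Counter


def chi_square_distance(p, q):
    """Chi-square distance between two non-negative term-frequency vectors.

    Assumed form: 0.5 * sum((a - b)^2 / (a + b)); not taken from the paper.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a + b > 0:  # skip terms that are absent from both documents
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def clustering_f_measure(true_topics, cluster_ids):
    """Class-weighted F-measure: each topic is matched to its best cluster.

    Assumed variant; the abstract does not say which F-measure was used.
    """
    n = len(true_topics)
    topic_sizes = Counter(true_topics)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_topics, cluster_ids))

    score = 0.0
    for topic, t_size in topic_sizes.items():
        best = 0.0
        for cluster, c_size in cluster_sizes.items():
            shared = overlap[(topic, cluster)]
            if shared == 0:
                continue
            precision = shared / c_size
            recall = shared / t_size
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each topic's best F-score by its share of the documents.
        score += (t_size / n) * best
    return score
```

If the vectors are first normalized to sum to 1, this distance lies between 0 and 1, so one plausible similarity is `1 - chi_square_distance(p, q)`. A matrix of such similarities could then serve as the affinity input to a spectral clustering routine.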
