
Topic extraction with multiple topic-words in broadcast-news speech


Abstract

This paper reports on topic extraction in Japanese broadcast-news speech. Using continuous speech recognition, we studied the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of a news item, which is a more detailed and more flexible approach than using a single word or a single category. A topic extraction model gives the degree of relevance between each topic-word and each word in an article. The topic-words with the highest total relevance score, summed over all words in the article, are extracted. We trained the topic extraction model on five years of newspapers, using the frequencies of topic-words taken from headlines and of words in articles. The degree of relevance between topic-words and words in articles is calculated from statistical measures, i.e., mutual information or the χ²-value. In topic extraction experiments on recognized broadcast-news speech, we extracted five topic-words from the 10-best hypotheses using a χ²-based model and found that 76.6% of them agreed with the topic-words chosen by subjects.
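
The abstract does not give the exact formulas, so the following is only a minimal sketch of how a χ²-based relevance model of this kind might look, not the authors' implementation. It assumes article-level co-occurrence counts between headline topic-words and body words, the standard 2×2 contingency χ² statistic, and that only positive associations contribute to the score. The function names `train_chi2` and `extract_topics` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import product


def train_chi2(corpus):
    """Build a chi-square relevance table from (headline topic-words, article words) pairs.

    Assumption: counts are per article (presence/absence), not per token.
    """
    n = len(corpus)
    topic_df = Counter()  # articles whose headline contains topic-word t
    word_df = Counter()   # articles whose body contains word w
    pair_df = Counter()   # articles containing both t (headline) and w (body)
    for topics, words in corpus:
        topics, words = set(topics), set(words)
        topic_df.update(topics)
        word_df.update(words)
        pair_df.update(product(topics, words))

    relevance = {}
    for (t, w), a in pair_df.items():
        b = topic_df[t] - a   # t in headline, w absent from body
        c = word_df[w] - a    # w in body, t absent from headline
        d = n - a - b - c     # neither occurs
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Assumption: keep only pairs that co-occur more often than chance.
        if denom == 0 or a * d <= b * c:
            continue
        relevance[(t, w)] = n * (a * d - b * c) ** 2 / denom
    return relevance


def extract_topics(relevance, article_words, k=5):
    """Return the k topic-words with the highest total relevance to the article."""
    counts = Counter(article_words)
    totals = Counter()
    for (t, w), score in relevance.items():
        if w in counts:
            # Sum relevance over every occurrence of w in the article.
            totals[t] += score * counts[w]
    return [t for t, _ in totals.most_common(k)]


# Hypothetical toy data standing in for the newspaper training set.
toy_corpus = [
    (["election"], ["vote", "party", "minister"]),
    (["election"], ["vote", "ballot", "party"]),
    (["earthquake"], ["quake", "damage", "rescue"]),
    (["earthquake"], ["quake", "rescue", "minister"]),
]

toy_relevance = train_chi2(toy_corpus)
toy_topics = extract_topics(toy_relevance, ["vote", "party", "rescue"], k=2)
```

For recognized speech, `article_words` could plausibly be the words pooled from the N-best hypotheses of a news item, though how the paper combines the 10-best hypotheses is not stated in the abstract.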
