Conference proceedings: European Conference on Speech Communication and Technology - EUROSPEECH 2003 (INTERSPEECH 2003), vol. 4; September 1-4, 2003; Geneva (CH)

Automatic Title Generation for Chinese Spoken Documents Using an Adaptive K Nearest-Neighbor Approach


Abstract

The purpose of automatic title generation is to understand a document and summarize it in only a few readable words or phrases. This is important for browsing and retrieving spoken documents: they may be automatically transcribed, but titles indicating their subject matter make them much more useful. Title generation for Chinese additionally requires solving problems such as word segmentation and key phrase extraction. In this paper, we develop a new approach to title generation for Chinese spoken documents. It comprises key phrase extraction, topic classification, and a new title generation model based on an adaptive K nearest-neighbor concept. Tests were performed with a training corpus of 151,537 news stories in text form with human-generated titles and a testing corpus of 210 broadcast news stories. The evaluation included both objective F1 measures and a 5-level subjective human evaluation. Very positive results were obtained.
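The abstract does not say how the objective F1 measure is computed. As a minimal sketch, assuming F1 is taken over the words shared by a generated title and its human-written reference (both already segmented into word lists), it could look like the following. The function name `title_f1` and the token-overlap definition are illustrative assumptions, not the authors' stated method.

```python
from collections import Counter


def title_f1(generated, reference):
    """Word-overlap F1 between two titles given as lists of tokens.

    Assumption: overlap is counted with multiplicity (multiset intersection).
    The paper's exact F1 definition is not given in the abstract.
    """
    if not generated or not reference:
        return 0.0
    # Multiset intersection: each shared word counts at most as often
    # as it appears in both titles.
    overlap = sum((Counter(generated) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(generated)  # share of generated words that match
    recall = overlap / len(reference)     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)
```

For Chinese titles, the token lists would come from a word segmenter, which is one reason the abstract lists word segmentation as a prerequisite problem.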
