Conference: International Conference on Applications of Natural Language to Information Systems

An Approach to Indexing and Clustering News Stories Using Continuous Language Models

Abstract

Within the vocabulary used in a set of news stories, a minority of terms will be topic-specific, in that they occur largely or solely within the stories belonging to a common event. When applying unsupervised learning techniques such as clustering, it is useful to determine which words are event-specific and which topic each one relates to. Continuous language models are used to model the generation of news stories over time, and two measures are derived from these models: bendiness, which indicates whether a word is event-specific, and shape distance, which indicates whether two terms are likely to relate to the same topic. These measures are used to construct a new clustering technique that identifies and characterises the underlying events within the news stream.
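The abstract does not give formulas for bendiness or shape distance, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each term's usage over time can be summarised as a continuous profile, here a Gaussian kernel density estimate of the times of the stories containing the term. Two hypothetical stand-ins are then computed: `event_specificity` (a proxy for bendiness) measures how far a term's profile departs from the background profile of all stories, and `shape_distance` compares the profiles of two terms. All function names, the bandwidth, and the choice of half-L1 (total variation) distance are assumptions made for illustration.

```python
import numpy as np

def kde_profile(times, grid, bandwidth):
    """Gaussian kernel density estimate of occurrence times on a uniform grid,
    normalised so that it integrates to 1 (hypothetical continuous profile)."""
    times = np.asarray(times, dtype=float)
    diffs = (grid[:, None] - times[None, :]) / bandwidth
    density = np.exp(-0.5 * diffs ** 2).sum(axis=1)
    dx = grid[1] - grid[0]
    return density / (density.sum() * dx)

def profile_distance(p, q, grid):
    """Half the L1 distance between two densities on a uniform grid (in [0, 1])."""
    dx = grid[1] - grid[0]
    return 0.5 * np.abs(p - q).sum() * dx

def event_specificity(term_times, all_times, grid, bandwidth):
    """Hypothetical proxy for 'bendiness': a term spread evenly across the stream
    scores near 0; a term concentrated in one burst of stories scores near 1."""
    p = kde_profile(term_times, grid, bandwidth)
    q = kde_profile(all_times, grid, bandwidth)
    return profile_distance(p, q, grid)

def shape_distance(times_a, times_b, grid, bandwidth):
    """Hypothetical proxy for 'shape distance': small when two terms peak together."""
    p = kde_profile(times_a, grid, bandwidth)
    q = kde_profile(times_b, grid, bandwidth)
    return profile_distance(p, q, grid)

# Toy stream (invented data): one story per day for 100 days.
all_times = np.arange(100)
grid = np.linspace(-20, 120, 1401)  # extends past the data to cover kernel tails
quake = np.arange(40, 46)           # term used only on days 40-45
aftershock = np.arange(41, 47)      # term used on overlapping days
election = np.arange(80, 86)        # term used in a separate burst

print(event_specificity(all_times, all_times, grid, 2.0))   # background-like term
print(event_specificity(quake, all_times, grid, 2.0))       # bursty term
print(shape_distance(quake, aftershock, grid, 2.0))         # same event
print(shape_distance(quake, election, grid, 2.0))           # different events
```

In this sketch, terms with high `event_specificity` would be the candidates for clustering, and `shape_distance` would decide which of them are grouped into the same event; the paper's actual models and measures may differ substantially.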