
Selecting Labels for News Document Clusters



Abstract

This work addresses the problem of determining meaningful, terse labels for clusters of news documents. We analyze a number of alternatives for selecting headlines and/or sentences from the documents in a cluster (obtained as the result of an entity-event-duration query), and formalize an approach to extracting, from well-supported headlines/sentences of the cluster, a short phrase that can serve as the cluster label. Our technique maps a sentence to a set of significant stems that approximates its semantics, so that sentences can be compared. Finally, the cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, which restores the word-order information lost in the formalization of semantic equivalence.
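The pipeline outlined in the abstract (map each sentence to a set of stems, find the well-supported ones, then cut a contiguous phrase out of a selected headline) can be illustrated with a minimal Python sketch. The abstract does not specify the stemmer, the criterion for a "significant" stem, or how support is measured, so everything below is an assumption made for illustration: `crude_stem` is a naive suffix stripper standing in for a real stemmer, "supported" means a stem occurs in at least `min_support` headlines, and the label is taken as the shortest window of words covering the supported stems. All function names are hypothetical and are not the authors' implementation.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "or",
             "as", "after", "is", "are", "by", "with", "at"}


def crude_stem(word):
    """Naive suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_set(text):
    """Map a sentence (or a single word) to its set of non-stopword stems."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOPWORDS}


def select_label(headlines, min_support=2):
    """Pick a well-supported headline and return a contiguous phrase from it."""
    sets = [stem_set(h) for h in headlines]

    # A stem counts as supported if it occurs in at least min_support headlines.
    doc_freq = Counter(stem for s in sets for stem in s)
    supported = {stem for stem, n in doc_freq.items() if n >= min_support}

    # Select the headline containing the most supported stems (earliest wins ties).
    best = max(range(len(headlines)), key=lambda i: len(sets[i] & supported))
    target = sets[best] & supported
    if not target:
        return headlines[best]

    # Recover word order: shortest contiguous window covering all target stems.
    words = headlines[best].split()
    word_stems = [stem_set(w) for w in words]
    span = (0, len(words))
    for i in range(len(words)):
        seen = set()
        for j in range(i, len(words)):
            seen |= word_stems[j] & target
            if seen == target:
                if j + 1 - i < span[1] - span[0]:
                    span = (i, j + 1)
                break
    return " ".join(words[span[0]:span[1]])


headlines = [
    "Storm floods coastal towns overnight",
    "Coastal towns flooded as storm hits",
    "Officials respond after storm floods coastal towns",
]
print(select_label(headlines))
```

For these three made-up headlines the sketch prints "Storm floods coastal towns": the stems storm, flood, coastal and town appear in every headline, while "overnight" is unsupported and falls outside the extracted window. Comparing stem sets discards word order, which is why the final step returns to the original word sequence of the chosen headline.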
