
Exploration of a Text Collection and Identification of Topics by Clustering


Abstract

An application of cluster analysis to identify topics in a collection of poster abstracts from the 2006 Society for Neuroscience (SfN) Annual Meeting is presented. Topics were identified by selecting, from the abstracts belonging to each cluster, the terms with the highest scores under several ranking schemes. The ranking scheme based on log-entropy performed better in this task than the more classical TF-IDF schemes. The extracted topics were evaluated by assigning each cluster to one dominant category and comparing it with previously defined thematic categories for which titles are available. The results show that repeated bisecting k-means performs better than standard k-means.
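As a rough illustration of the term-ranking step, the sketch below scores terms with a log-entropy weighting and lists the top-scoring terms per cluster. It is not the authors' implementation. The abstract does not give the exact formula, so the code assumes the commonly used definition (local weight `log(1 + tf)`, global weight `1 + sum(p * log p) / log n`). Summing weights over the documents of a cluster is also an assumption, and the function names `log_entropy_weights` and `top_terms` are hypothetical. Cluster labels are taken as given here; in the paper they would come from k-means or repeated bisecting k-means.

```python
import math
from collections import Counter, defaultdict


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting (not confirmed by
    the source): weight = log(1 + tf) * (1 + sum_j p_j * log(p_j) / log(n)),
    where p_j = tf in document j / total frequency of the term.
    """
    n = len(docs)
    tfs = [Counter(doc) for doc in docs]

    # Total frequency of each term over the whole collection.
    totals = Counter()
    for tf in tfs:
        totals.update(tf)

    # Global entropy weight: 1 for a term confined to one document,
    # 0 for a term spread evenly over all documents.
    global_w = {}
    for term, total in totals.items():
        h = 0.0
        for tf in tfs:
            if term in tf:
                p = tf[term] / total
                h += p * math.log(p)
        global_w[term] = 1.0 + h / math.log(n) if n > 1 else 1.0

    return [
        {term: math.log(1 + count) * global_w[term] for term, count in tf.items()}
        for tf in tfs
    ]


def top_terms(weights, labels, k=3):
    """Sum term weights within each cluster and return the k best terms."""
    scores = defaultdict(lambda: defaultdict(float))
    for doc_weights, label in zip(weights, labels):
        for term, value in doc_weights.items():
            scores[label][term] += value
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


# Toy collection: "the" occurs once in every document, so it carries no weight.
docs = [
    ["neuron", "the"],
    ["synapse", "the"],
    ["neuron", "the"],
    ["cortex", "the"],
]
labels = [0, 1, 0, 1]  # hypothetical cluster assignment
weights = log_entropy_weights(docs)
print(top_terms(weights, labels, k=1))
```

In this toy collection the evenly spread term "the" receives a global weight of about zero, so it drops out of the ranking. This is the behaviour that makes entropy-based weighting attractive for picking topic labels.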
