Conference: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Improving Document Clustering Performance: The Use of an Automatically Generated Ontology to Augment Document Representations

Abstract

Clustering documents is a common task in a range of information retrieval systems and applications, and many approaches for improving the clustering process have been proposed. One approach is to use an ontology to better inform the classifier of word context by expanding the items to be clustered with additional terms. WordNet is commonly cited as an appropriate source from which to draw these additional terms; however, it may not be sufficient to achieve strong performance. We have two aims in this paper. First, we show that the use of WordNet may lead to suboptimal performance. This problem may be accentuated when a document set has been drawn from comments made in social forums, owing to the unstructured nature of online conversations compared to standard document sets. Second, we propose a novel method that involves constructing a bespoke ontology to facilitate better clustering. We present a study of clustering applied to a sample of threads from a social forum and investigate the effectiveness of these methods.
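
To make the idea of ontology-based expansion concrete, the following is a minimal sketch in Python, not the authors' method. It assumes a hypothetical hand-written ontology (`TOY_ONTOLOGY`) that maps a term to related terms, and plain bag-of-words cosine similarity as the measure a clustering algorithm might use. The names `expand` and `cosine` and the two sample documents are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology: each term maps to related, more general terms.
# In practice these terms might be drawn from WordNet or from a bespoke
# ontology built for the document set; this dictionary is an assumption.
TOY_ONTOLOGY = {
    "laptop": ["computer"],
    "notebook": ["computer"],
    "football": ["sport"],
    "soccer": ["sport"],
}


def expand(tokens, ontology):
    """Return the tokens plus any related terms the ontology lists for them."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(ontology.get(tok, []))
    return expanded


def cosine(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two short forum-style posts that share a topic but no surface words.
doc_a = "my laptop crashed".split()
doc_b = "notebook keeps freezing".split()

plain = cosine(doc_a, doc_b)
augmented = cosine(expand(doc_a, TOY_ONTOLOGY), expand(doc_b, TOY_ONTOLOGY))
print(f"plain={plain:.2f} augmented={augmented:.2f}")
```

In this toy case the two posts have zero similarity before expansion and a non-zero similarity afterwards, because both gain the shared term "computer". Whether expansion helps on real data depends on how well the ontology covers the vocabulary of the documents, which is the concern the abstract raises about WordNet and forum text.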