
Concept Discovery from Text


Abstract

Broad-coverage lexical resources such as WordNet are extremely useful. However, they often include many rare senses while missing domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers concepts from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning elements to their most similar cluster. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology that is based on the editing distance between output clusters and classes extracted from WordNet (the answer key). Our experiments show that CBC outperforms several well-known clustering algorithms in cluster quality.
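The assignment step described in the abstract (a committee's centroid serves as the cluster's feature vector, and each element goes to its most similar cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is an assumption here, the committees are taken as given rather than discovered, and the function names and toy feature vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (an assumed measure; the abstract does not specify one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_to_committees(elements, committees):
    """Assign every element to the committee whose centroid is most similar.

    elements:   dict mapping element name -> feature vector
    committees: dict mapping cluster label -> list of member element names
                (assumed to be already discovered; discovery is not shown)
    """
    # The centroid of a committee's members is the cluster's feature vector.
    centroids = {
        label: centroid([elements[name] for name in members])
        for label, members in committees.items()
    }
    # Each element is assigned to its single most similar cluster.
    return {
        name: max(centroids, key=lambda label: cosine(vector, centroids[label]))
        for name, vector in elements.items()
    }


# Toy two-dimensional feature vectors, invented for illustration only.
elements = {
    "dog": [1.0, 0.1],
    "cat": [0.9, 0.2],
    "horse": [0.8, 0.3],
    "car": [0.1, 1.0],
    "truck": [0.2, 0.9],
    "bus": [0.0, 0.8],
}
committees = {"animals": ["dog", "cat"], "vehicles": ["car", "truck"]}

assignments = assign_to_committees(elements, committees)
```

In this toy setup, "horse" and "bus" are not committee members, yet they are still placed in a cluster because only the committee centroids, not the full membership, define each cluster's feature vector.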
