Pacific Symposium on Biocomputing 2001, Jan 3-7, 2001, Mauna Lani, Hawaii

TEXTQUEST: DOCUMENT CLUSTERING OF MEDLINE ABSTRACTS FOR CONCEPT DISCOVERY IN MOLECULAR BIOLOGY

Abstract

We present an algorithm for large-scale document clustering of biological text, obtained from Medline abstracts. The algorithm is based on statistical treatment of terms, stemming, the idea of a 'go-list', unsupervised machine learning and graph layout optimization. The method is flexible and robust, controlled by a small number of parameter values. Experiments show that the resulting document clusters are meaningful as assessed by cluster-specific terms. Despite the statistical nature of the approach and its minimal semantic analysis, these terms provide a shallow description of the document corpus and support concept discovery.
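The abstract names the pipeline stages without giving their details. The Python sketch below shows one plausible way to chain them, under loudly stated assumptions: the crude suffix stemmer, the document-frequency rule used to build the go-list (read here as a whitelist of retained terms, the inverse of a stop-list), tf-idf weighting, and the threshold-graph clustering are all hypothetical stand-ins, not TextQuest's published method, and the graph layout optimization step is omitted. The final function ranks terms within each cluster, mirroring how the abstract assesses clusters by cluster-specific terms.

```python
import math
import re
from collections import Counter

# ASSUMPTION: hypothetical suffix list for a crude stemmer (NOT the Porter algorithm).
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_go_list(docs, min_df=2, max_df_frac=0.8):
    """ASSUMPTION: hypothetical go-list rule. Keep stems that occur in at least
    min_df documents but in no more than max_df_frac of all documents."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return {t for t, c in df.items() if c >= min_df and c / n <= max_df_frac}


def tfidf_vectors(docs, go_list):
    """Sparse tf-idf vectors (dicts) restricted to go-list terms."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d) if t in go_list)
    vecs = []
    for d in docs:
        tf = Counter(t for t in d if t in go_list)
        vecs.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vecs, threshold=0.2):
    """Stand-in clustering: connected components (via union-find) of the graph
    whose edges join documents with cosine similarity >= threshold."""
    parent = list(range(len(vecs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vecs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_terms(vecs, members, top=3):
    """Rank terms by their summed tf-idf weight within one cluster."""
    total = Counter()
    for i in members:
        total.update(vecs[i])
    return [t for t, _ in total.most_common(top)]


# Toy corpus standing in for Medline abstracts (invented for illustration).
abstracts = [
    "Protein kinase signaling regulates cell cycle progression.",
    "Kinase inhibitors block signaling in the cell cycle.",
    "Ribosomal RNA sequences reveal bacterial phylogeny.",
    "Bacterial phylogeny inferred from ribosomal RNA genes.",
]
docs = [tokenize(a) for a in abstracts]
go = build_go_list(docs)
vecs = tfidf_vectors(docs, go)
clusters = cluster(vecs)
for members in clusters:
    print(members, cluster_terms(vecs, members))
```

On this toy input the two kinase abstracts and the two ribosomal RNA abstracts fall into separate clusters. A real corpus would call for a proper stemmer (such as Porter's) and a clustering method that copes with large, noisy similarity graphs rather than a single global threshold.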