Conference: 2011 IEEE International Conference on Bioinformatics and Biomedicine

Discriminative Application of String Similarity Methods to Chemical and Non-chemical Names for Biomedical Abbreviation Clustering

Abstract

Clustering terms by measuring the string similarity between them is known to be an effective way to improve the quality of texts and dictionaries. However, in our observations, chemical names are difficult to cluster with string similarity measures such as the edit distance. To demonstrate this difficulty clearly, we compared the string similarities obtained with the edit distance, the Monge-Elkan score, Soft TFIDF, and the bigram Dice coefficient for chemical names against those for other terms. The experimental results show that applying string similarity methods discriminatively to chemical and non-chemical names may be a simple but effective way to improve the performance of term clustering.
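
Two of the measures named above, the edit distance and the bigram Dice coefficient, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the lowercasing step, and the example strings are assumptions made here, and the abstract does not say how the paper tokenizes or normalizes names. The pair "1-propanol" / "2-propanol" is chosen only because the two names denote different compounds yet differ by a single character, which hints at why surface similarity can mislead for chemical names.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list[str]:
    """Character bigrams of s; lowercasing is an assumption of this sketch."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram multisets."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ca & cb).values())  # multiset intersection size
    return 2 * overlap / total


if __name__ == "__main__":
    # Illustrative pair, not taken from the paper's data.
    print(edit_distance("1-propanol", "2-propanol"))
    print(round(bigram_dice("1-propanol", "2-propanol"), 3))
```

Both measures rate this pair as nearly identical, so a single similarity threshold tuned on ordinary terms would likely merge the two names into one cluster.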
