Source: Bioinformatics (U.S. National Institutes of Health literature)

Discriminative and informative features for biomolecular text mining with ensemble feature selection

Abstract

Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results.

Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools.

Availability: The FS algorithms and classifiers are available in Java-ML (). The datasets are publicly available from the BioNLP'09 Shared Task web site ().

Contact: