Biomedical Informatics Insights

Text Categorization of Heart Lung and Blood Studies in the Database of Genotypes and Phenotypes (dbGaP) Utilizing n-grams and Metadata Features

Abstract

The Database of Genotypes and Phenotypes (dbGaP) allows researchers to understand phenotypic contribution to genetic conditions, generate new hypotheses, confirm previous study results, and identify control populations. However, effective use of the database is hindered by suboptimal study retrieval. Our objective is to evaluate text classification techniques to improve study retrieval in dbGaP. We trained standard machine learning algorithms (naive Bayes, support vector machines, and the C4.5 decision tree) on dbGaP study text, incorporating n-gram features and study metadata, to identify heart, lung, and blood studies. We used the χ² feature selection algorithm to identify the features that contributed most to classification performance and experimented with dbGaP-associated PubMed papers as a proxy for topicality. Classifier performance compared favorably with keyword-based search results. We conclude that text categorization is a useful complement to document retrieval techniques in dbGaP.
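The n-gram extraction and χ² feature selection steps mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`ngrams`, `chi2_scores`, `top_features`), the whitespace tokenizer, the use of binary presence/absence features, and the six toy study descriptions with their labels are all assumptions made for illustration only.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) found in a text.

    Assumption: lowercase + whitespace tokenization, chosen for brevity.
    """
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi2_scores(docs, labels, n_max=2):
    """Score each n-gram with the chi-squared statistic of its 2x2
    (present/absent) x (positive/negative) contingency table."""
    feats = [ngrams(d, n_max) for d in docs]
    n = len(docs)
    n_pos = sum(labels)
    # Document frequencies overall and within the positive class.
    df_all = Counter(f for fs in feats for f in fs)
    df_pos = Counter(f for fs, y in zip(feats, labels) if y for f in fs)
    scores = {}
    for f, total in df_all.items():
        a = df_pos[f]          # feature present, positive class
        b = total - a          # feature present, negative class
        c = n_pos - a          # feature absent, positive class
        d = n - n_pos - b      # feature absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the feature or the label is constant.
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_features(docs, labels, k=5, n_max=2):
    """Return the k highest-scoring n-grams (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels, n_max)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: 1 = heart, lung, or blood study; 0 = other.
docs = [
    "genome wide association study of coronary heart disease",
    "lung function and asthma in children",
    "blood pressure and heart failure cohort",
    "genetic risk of breast cancer",
    "autism spectrum disorder family study",
    "glaucoma genetics in adults",
]
labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for feature in top_features(docs, labels, k=5):
        print(feature)
```

In a full pipeline, the selected n-grams (together with any metadata features) would then be passed to a classifier such as naive Bayes, a support vector machine, or a decision tree; that step is omitted here to keep the sketch short.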