Journal: BMC Bioinformatics

ALE: automated label extraction from GEO metadata

Abstract

Background

NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, the information describing each experiment (its metadata) is an open-ended, non-standardized textual description supplied by the depositor. As a result, experiments cannot be classified for meta-analysis by factors such as gender, age of the sample donor, or tissue of origin until labels for those factors have been assigned. Automated labeling is preferable, chiefly because of the volume of data to be processed, but also because it ensures standardization and consistency. Although some of these labels can be extracted directly from the textual metadata, many of the available datasets contain no explicit text stating the age or gender of the study subjects. To bridge this gap, machine-learning methods can be trained on the gene expression patterns associated with the text-derived labels to refine label-prediction confidence.
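The abstract describes two stages. First, labels such as gender and age are extracted directly from free-text metadata. Second, the expression patterns of text-labeled samples are used to predict and score labels for the samples whose text says nothing.

The following is a minimal Python sketch of that idea under explicit assumptions. All of the following are hypothetical stand-ins chosen for brevity, not ALE's actual rules or model:

* the regular expressions;
* the function names (`extract_labels`, `train_centroids`, `predict`, `refine`);
* the toy two-gene data;
* the nearest-centroid classifier with a margin-based confidence.

```python
import re
from statistics import mean

# Hypothetical free-text patterns; real GEO metadata is far more varied.
SEX_PATTERNS = {
    "female": re.compile(r"\b(?:sex|gender)\s*:\s*(?:female|f)\b", re.IGNORECASE),
    "male": re.compile(r"\b(?:sex|gender)\s*:\s*(?:male|m)\b", re.IGNORECASE),
}
AGE_PATTERN = re.compile(r"\bage(?:\s*\(years\))?\s*:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_labels(text):
    """Step 1 (assumed): pull explicit gender/age labels out of depositor text."""
    hits = [label for label, pat in SEX_PATTERNS.items() if pat.search(text)]
    sex = hits[0] if len(hits) == 1 else None  # missing or conflicting -> unlabeled
    age_match = AGE_PATTERN.search(text)
    age = float(age_match.group(1)) if age_match else None
    return {"sex": sex, "age": age}


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(vectors, labels):
    """Step 2 (assumed): average the expression profiles sharing a text-derived label."""
    groups = {}
    for vec, label in zip(vectors, labels):
        if label is not None:
            groups.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in groups.items()}


def predict(centroids, vector):
    """Step 3 (assumed): nearest-centroid label plus a confidence in [0, 1].

    Requires at least two labeled classes. Confidence is the relative gap between
    the distances to the closest and second-closest centroids.
    """
    ranked = sorted((sq_dist(vector, c), label) for label, c in centroids.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    confidence = (d2 - d1) / (d1 + d2) if d1 + d2 > 0 else 0.0
    return best, confidence


def refine(samples):
    """Return (text_label, predicted_label, confidence) for each (text, vector) pair."""
    text_labels = [extract_labels(text)["sex"] for text, _ in samples]
    centroids = train_centroids([vec for _, vec in samples], text_labels)
    return [(tl, *predict(centroids, vec)) for tl, (_, vec) in zip(text_labels, samples)]


if __name__ == "__main__":
    # Toy data: two invented "marker gene" expression values per sample.
    samples = [
        ("Sex: Female; age: 34", [9.0, 1.0]),
        ("gender: F", [8.5, 1.5]),
        ("Sex: male; age (years): 51", [1.0, 8.0]),
        ("gender: M", [1.5, 9.0]),
        ("tissue: liver", [8.8, 1.2]),  # no gender stated in the text
    ]
    for row in refine(samples):
        print(row)
```

In the toy run, the last sample has no gender in its text but is assigned "female" with high confidence because its profile sits next to the female centroid. Nearest-centroid is used here only because it fits in a few lines of standard-library Python. A realistic pipeline would normalize expression values and would likely use a stronger classifier with calibrated probabilities.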