Database: The Journal of Biological Databases and Curation

An entropy-reducing data representation approach for bioinformatic data


Abstract

Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, for example in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-based agricultural, aquacultural and environmental sampling studies and commercial services. Even where rich semantic resources are available, semantic approaches to problems such as contrasting and comparing reference assemblies, and utilising multiple references in parallel to avoid reference bias, are costly and difficult to fully automate. We introduce and discuss non-semantic labelling, a non-semantic data representation approach intended mainly for bioinformatic data. Non-semantic labelling involves tensorially combining multiple kinds of model-based entropy-reducing data representation with multiple representation models, so as to map both data and models into dual metric representation spaces. The goals are to reduce the statistical complexity of the data and to highlight latent structure via machine learning and statistical analyses conducted within the dual representation spaces. As part of the framework, we introduce a novel algebraic abstraction of data representation mappings and present four proof-of-concept examples of its application to problems such as comparing and contrasting sequence assemblies, utilising multiple references for annotation, and developing quality control diagnostics in a variety of high-throughput sequencing contexts. Database URL:
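The idea of mapping both data and models into dual representation spaces can be illustrated with a toy sketch. This is not the authors' implementation: the choice of k-mer probability models, add-one smoothing, mean code length as the label, and Euclidean distance as the metric are all assumptions made here for illustration, and every function name is hypothetical. It covers a single kind of representation only, and it assumes sequences contain only the characters A, C, G and T.

```python
import math
from collections import Counter
from itertools import product

K = 2              # k-mer length (illustrative assumption)
ALPHABET = "ACGT"  # sequences are assumed to contain only these characters


def kmers(seq, k=K):
    """Return the overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit_model(reference, k=K):
    """Fit a k-mer probability model to a reference, with add-one smoothing."""
    counts = Counter(kmers(reference, k))
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[v] for v in vocab) + len(vocab)
    return {v: (counts[v] + 1) / total for v in vocab}


def code_length(seq, model, k=K):
    """Mean bits per k-mer needed to encode seq under the model."""
    ks = kmers(seq, k)
    return -sum(math.log2(model[x]) for x in ks) / len(ks)


def label_matrix(sequences, models):
    """One row per sequence, one column per model."""
    return [[code_length(s, m) for m in models] for s in sequences]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy data: two "references" and three "reads".
references = ["ACACACACACACACAC", "GGGTTTGGGTTTGGGT"]
sequences = ["ACACACAC", "CACACACA", "GGGTTTGG"]

models = [fit_model(r) for r in references]
labels = label_matrix(sequences, models)

# Rows place each sequence in a space indexed by models;
# columns place each model in a space indexed by sequences.
data_points = labels
model_points = [list(col) for col in zip(*labels)]

print(euclidean(data_points[0], data_points[1]))    # two AC-rich reads
print(euclidean(data_points[0], data_points[2]))    # AC-rich vs GT-rich read
print(euclidean(model_points[0], model_points[1]))  # distance between models
```

In this sketch, a read that a model encodes cheaply resembles that model's reference, so reads with similar composition end up close together in the row space without any annotation being consulted.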
