GigaScience

HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes

Abstract

Background: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. Results: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
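
The idea of expressing an annotation rule as a query over protein data in triple form can be illustrated with a small sketch. This is not an actual HAMAP rule and does not use a real SPARQL engine: the `ex:` identifiers, the signature `SIG_0001`, the annotation text, and the `apply_rule` helper are all hypothetical, and the SPARQL string only shows, as an assumption, the rough shape such a rule might take as a CONSTRUCT query. The Python function mimics what an engine would do for that one query, using only the standard library.

```python
# Toy stand-in for applying one annotation rule to protein data held as triples.
# Every identifier below (ex:..., SIG_0001, the annotation text) is hypothetical.

Triple = tuple[str, str, str]

# Assumed shape of a rule written as a SPARQL 1.1 CONSTRUCT query (illustration only;
# it is never executed here).
RULE_AS_SPARQL = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?protein ex:annotation "Hypothetical kinase" }
WHERE {
  ?protein ex:matchesSignature ex:SIG_0001 ;
           ex:organismInTaxon  ex:Bacteria .
}
"""


def apply_rule(triples: set[Triple]) -> set[Triple]:
    """Return a new annotation triple for each protein that meets both conditions."""
    # Proteins whose sequence matches the (hypothetical) signature.
    with_signature = {
        s for (s, p, o) in triples
        if p == "ex:matchesSignature" and o == "ex:SIG_0001"
    }
    # Proteins whose organism falls within the (hypothetical) taxonomic scope.
    in_taxon = {
        s for (s, p, o) in triples
        if p == "ex:organismInTaxon" and o == "ex:Bacteria"
    }
    # Both conditions must hold, as in the WHERE clause above.
    return {
        (protein, "ex:annotation", "Hypothetical kinase")
        for protein in with_signature & in_taxon
    }


proteins: set[Triple] = {
    ("ex:P1", "ex:matchesSignature", "ex:SIG_0001"),
    ("ex:P1", "ex:organismInTaxon", "ex:Bacteria"),
    ("ex:P2", "ex:matchesSignature", "ex:SIG_0001"),
    ("ex:P2", "ex:organismInTaxon", "ex:Eukaryota"),
}

annotations = apply_rule(proteins)
print(sorted(annotations))
```

In this sketch only `ex:P1` satisfies both conditions, so it is the only protein that receives the annotation. With a real SPARQL engine, the rule file and the RDF protein data would be supplied to the engine directly, and no custom matching code of this kind would be needed.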