Journal: Progress in Artificial Intelligence

HAMAP as SPARQL rules: A portable annotation pipeline for genomes and proteomes


Abstract

Background: Genome and proteome annotation pipelines are generally custom-built and are not easily reused by other groups. This leads to duplicated effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and of tools for genome and proteome annotation.

Results: Here we demonstrate one approach to generating portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept combines two components: our own rule-based annotation pipeline, HAMAP, which provides functional annotation of protein sequences with the same depth and quality as UniProtKB/Swiss-Prot; and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into standard W3C SPARQL 1.1 syntax and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach generates annotation identical to that produced by our in-house pipeline while using only standard, off-the-shelf solutions, and it is applicable to any genome or proteome annotation pipeline.

Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code for using HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
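The core mechanism described in the Results is that an annotation rule becomes a SPARQL query: its WHERE clause states the conditions a protein must satisfy, and its output adds annotation triples for every protein that matches. The sketch below illustrates that pattern-match-then-emit idea in plain Python, without a SPARQL engine, under clearly stated assumptions: the `ex:` vocabulary, the protein identifiers, and the single three-condition rule are hypothetical and do not reflect HAMAP's actual RDF schema or rule content, and the toy matcher handles only basic graph patterns (no FILTER, OPTIONAL, or negation). The published rules are meant to be run by a real SPARQL 1.1 engine.

```python
from typing import Dict, List, Optional, Set, Tuple

# A triple is (subject, predicate, object); terms starting with "?" are variables.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Return True if the term is a query variable such as '?protein'."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one data triple under an existing binding.

    Returns the extended binding, or None if the triple is incompatible."""
    extended = dict(binding)
    for p_term, d_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound to a different value cannot match.
            if p_term in extended and extended[p_term] != d_term:
                return None
            extended[p_term] = d_term
        elif p_term != d_term:
            # Constant terms must match exactly.
            return None
    return extended


def match_bgp(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern (a conjunction of triple patterns) by nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in data:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def construct(template: List[Triple], where: List[Triple], data: List[Triple]) -> Set[Triple]:
    """CONSTRUCT-style rule: instantiate the template once per solution of the WHERE pattern.

    As in SPARQL CONSTRUCT, template triples containing an unbound variable are skipped."""
    output: Set[Triple] = set()
    for binding in match_bgp(where, data):
        for s, p, o in template:
            terms = tuple(binding.get(t) if is_var(t) else t for t in (s, p, o))
            if None not in terms:
                output.add(terms)
    return output


# Hypothetical input: two proteins described as RDF-like triples ("ex:" is invented).
DATA: List[Triple] = [
    ("ex:P1", "rdf:type", "ex:Protein"),
    ("ex:P1", "ex:matchesSignature", "ex:SignatureA"),
    ("ex:P1", "ex:organismLineage", "ex:Bacteria"),
    ("ex:P2", "rdf:type", "ex:Protein"),
    ("ex:P2", "ex:matchesSignature", "ex:SignatureA"),
    ("ex:P2", "ex:organismLineage", "ex:Eukaryota"),
]

# Hypothetical rule, roughly equivalent to the SPARQL 1.1 query:
#   CONSTRUCT { ?protein ex:hasAnnotation ex:AnnotationA }
#   WHERE { ?protein a ex:Protein ;
#                    ex:matchesSignature ex:SignatureA ;
#                    ex:organismLineage ex:Bacteria . }
RULE_WHERE: List[Triple] = [
    ("?protein", "rdf:type", "ex:Protein"),
    ("?protein", "ex:matchesSignature", "ex:SignatureA"),
    ("?protein", "ex:organismLineage", "ex:Bacteria"),
]
RULE_TEMPLATE: List[Triple] = [("?protein", "ex:hasAnnotation", "ex:AnnotationA")]

if __name__ == "__main__":
    # Only ex:P1 satisfies all three conditions, so only it receives the annotation.
    for triple in sorted(construct(RULE_TEMPLATE, RULE_WHERE, DATA)):
        print(triple)
```

A real SPARQL engine would replace the nested-loop join with indexed lookups and support the full query language; the loop form is used here only because it makes the match-then-emit structure of a rule easy to follow.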
