Journal: Journal of Bioinformatics and Computational Biology

Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences



Abstract

Genome sequences contain many patterns of biomedical significance, and repetitive sequences of various kinds are a primary component of most of them. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
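
To make the two quantities named in the abstract concrete, here is a minimal Python sketch of n-gram counting and windowed perplexity over a DNA string. It is not the toolkit's implementation: the toolkit is described as suffix-array based, whereas this sketch uses plain dictionary counts for brevity. The function names, the add-one smoothing, and the window and step parameters are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet; ambiguous bases are not handled


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train_model(seq, n):
    """Return (n-gram counts, context counts) for an order-n model.

    Context counts are summed from the n-gram counts so that the smoothed
    conditional probabilities for each context add up to one.
    """
    ngrams = ngram_counts(seq, n)
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count
    return ngrams, contexts


def perplexity(window, ngrams, contexts, n):
    """Perplexity of window under an add-one-smoothed n-gram model (assumed smoothing)."""
    steps = len(window) - n + 1
    if steps <= 0:
        raise ValueError("window must be at least n symbols long")
    log_prob = 0.0
    for i in range(steps):
        gram = window[i:i + n]
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
        log_prob += math.log2(p)
    return 2 ** (-log_prob / steps)


def windowed_perplexity(seq, ngrams, contexts, n, window, step):
    """Slide a fixed-size window over seq and return (start, perplexity) pairs."""
    return [
        (start, perplexity(seq[start:start + window], ngrams, contexts, n))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Under these assumptions, a window that resembles the training sequence (for example, a repeat the model has seen many times) scores a lower perplexity than a window of unseen n-grams, which is the kind of signal that could be scanned for along a genome.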
