Proceedings of the National Academy of Sciences of the United States of America

Enhanced protein domain discovery by using language modeling techniques from speech recognition.

Abstract

Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.
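The abstract does not say which language model the authors used. The sketch below is one illustrative possibility and should not be read as the paper's method. It assumes a bigram (n-gram) model over the ordered domain labels of a protein, read from N- to C-terminus, with add-one smoothing. It also borrows the speech-recognition practice of adding a weighted language-model log-probability to the per-unit recognizer score, here a per-domain HMM score. The domain labels (`DOM_A`, `DOM_B`, `DOM_C`), the HMM scores, the training architectures and the `lm_weight` parameter are all hypothetical placeholders, not data from the paper.

```python
import math
from collections import Counter
from typing import Callable, List

START, END = "<s>", "</s>"

def train_bigram(architectures: List[List[str]]) -> Callable[[str, str], float]:
    """Train an add-one smoothed bigram model over (hypothetical) domain labels."""
    history = Counter()  # how often each token appears as the left context
    pairs = Counter()    # how often each (left, right) transition occurs
    vocab = {END}        # every possible "next" token: domain labels plus END
    for arch in architectures:
        tokens = [START] + arch + [END]
        vocab.update(arch)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            pairs[(prev, cur)] += 1
    v = len(vocab)

    def logprob(prev: str, cur: str) -> float:
        # Laplace smoothing keeps unseen transitions at a small nonzero probability.
        return math.log((pairs[(prev, cur)] + 1) / (history[prev] + v))

    return logprob

def lm_logprob(arch: List[str], logprob: Callable[[str, str], float]) -> float:
    """Natural-log probability of a whole domain architecture under the bigram model."""
    tokens = [START] + arch + [END]
    return sum(logprob(p, c) for p, c in zip(tokens, tokens[1:]))

def combined_score(arch: List[str], hmm_scores: List[float],
                   logprob: Callable[[str, str], float],
                   lm_weight: float = 1.0) -> float:
    """Recognizer score (sum of per-domain HMM scores) plus weighted LM context score."""
    return sum(hmm_scores) + lm_weight * lm_logprob(arch, logprob)

# Hypothetical training set: DOM_A is usually followed by DOM_B.
training = [["DOM_A", "DOM_B"]] * 3 + [["DOM_A"], ["DOM_C"]]
lp = train_bigram(training)

# Two competing parses of the same hypothetical protein; DOM_B is only a weak HMM hit.
alone = combined_score(["DOM_A"], [10.0], lp)
with_b = combined_score(["DOM_A", "DOM_B"], [10.0, 0.5], lp)
print(f"{alone:.3f} {with_b:.3f}")  # prints 8.026 8.659
```

Here the weak `DOM_B` hit would contribute little on its own. Because `DOM_B` frequently follows `DOM_A` in the training architectures, the context score tips the decision toward the two-domain parse. This is the kind of context effect the abstract describes, shown on toy numbers only. In practice the relative weighting of HMM and language-model scores (`lm_weight` here) would have to be tuned.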