Statistical Method of Context Evaluation for Biological Sequence Similarity

Abstract

In this paper we propose and test a new strategy for detecting and measuring similarity between protein sequences. Our approach has its roots in computational linguistics and in the related techniques for quantifying and comparing the content of character strings. The pairwise comparison of proteins relies on content regularities that are expected to characterize each sequence uniquely. These regularities are captured by n-gram modelling techniques and exploited through measures related to cross-entropy. In this new attempt to bring theoretical ideas from computational linguistics into bioinformatics, we experimented with two implementations, both aimed at developing practical, computationally efficient algorithms for expressing protein similarity. The experimental analysis reported here provides evidence for the usefulness of the proposed approach and motivates further development of linguistics-based tools for analysing biological sequences.
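The abstract combines two ideas: an n-gram model built from one protein sequence, and a cross-entropy measure of how well that model predicts another sequence. The sketch below illustrates that general idea only. It is not either of the paper's two implementations. The choices are assumptions made for illustration: the 20-letter amino-acid alphabet, add-alpha (Laplace) smoothing, the default bigram order `n=2`, and averaging the two directions into a symmetric score. The helper names `ngram_model`, `cross_entropy` and `similarity_distance` are hypothetical.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids (real data may also contain X, B, Z, U).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def ngram_model(seq, n=2, alpha=1.0):
    """Build an add-alpha smoothed n-gram model p(symbol | previous n-1 symbols) from seq."""
    full_counts = Counter()  # counts of each n-gram
    ctx_counts = Counter()   # counts of each (n-1)-symbol context
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        full_counts[gram] += 1
        ctx_counts[gram[:-1]] += 1
    vocab = len(ALPHABET)

    def prob(context, symbol):
        # Smoothing keeps unseen n-grams at a small nonzero probability.
        return (full_counts[context + symbol] + alpha) / (ctx_counts[context] + alpha * vocab)

    return prob


def cross_entropy(test_seq, model, n=2):
    """Average number of bits per residue needed to encode test_seq under model."""
    if len(test_seq) < n:
        raise ValueError("test sequence is shorter than the n-gram order")
    total = 0.0
    for i in range(n - 1, len(test_seq)):
        context = test_seq[i - n + 1:i]
        total -= math.log2(model(context, test_seq[i]))
    return total / (len(test_seq) - n + 1)


def similarity_distance(a, b, n=2):
    """Symmetrised cross-entropy between two sequences; lower means more similar."""
    return 0.5 * (cross_entropy(b, ngram_model(a, n), n)
                  + cross_entropy(a, ngram_model(b, n), n))
```

For example, `similarity_distance(a, a)` comes out smaller than `similarity_distance(a, c)` when `c` shares no bigram contexts with `a`. Every residue of `a` then falls back to the uniform probability 1/20 under the model of `c`, which costs log2(20) ≈ 4.32 bits. Averaging the two directions makes the score symmetric, although plain cross-entropy is not.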