Conference on Artificial Intelligence in Medicine (AIME 2005); 23-27 July 2005; Aberdeen (GB)

Signature Recognition Methods for Identifying Influenza Sequences

Abstract

Accuracy is one of the most important requirements when identifying biological sequences. However, because biological data are growing exponentially and are highly diverse, the need to compute within a reasonable time often compromises accuracy. We propose novel approaches for identifying DNA sequences accurately and in less time by discovering signatures: sequence patterns that carry enough distinctive information to identify a sequence. The approaches search for the best combination of n-gram patterns and six statistical scoring algorithms commonly used in Information Retrieval research, and then use the resulting signatures to build a similarity scoring model for identifying the DNA. We develop two approaches to signature discovery. The first uses only statistical information extracted directly from the sequences. The second also uses prior knowledge of the DNA in the discovery process. From our experiments on influenza virus, we found that: 1) our technique identifies the influenza virus with an accuracy of up to 99.69% when 11-grams are used and prior knowledge is applied; 2) signatures that are too short or too long yield lower efficiency; and 3) most scoring algorithms perform well for identification, except the Rocchio algorithm, whose results are approximately 9% lower than the others. Moreover, this technique can be applied to identifying other organisms.
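
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the first approach (statistics only, no prior knowledge). It is not the authors' implementation. The abstract does not list the six scoring algorithms, so a TF-IDF-style weight is assumed here as one common Information Retrieval choice. The function names (`ngrams`, `discover_signatures`, `identify`), the `top_k` cutoff, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def discover_signatures(labelled, n, top_k):
    """Select up to top_k n-grams per class as signatures.

    Assumption: a TF-IDF-style weight (count in the class times the log of
    the inverse class frequency) stands in for the paper's scoring step.
    """
    class_counts = {}
    for label, seq in labelled:
        class_counts.setdefault(label, Counter()).update(ngrams(seq, n))

    n_classes = len(class_counts)
    # Number of classes in which each n-gram occurs at least once.
    class_freq = Counter(g for counts in class_counts.values() for g in counts)

    signatures = {}
    for label, counts in class_counts.items():
        weights = {
            g: tf * math.log(n_classes / class_freq[g])
            for g, tf in counts.items()
        }
        # Sort by descending weight, then alphabetically for determinism.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        # n-grams found in every class get weight 0 and are dropped.
        signatures[label] = {g: w for g, w in ranked[:top_k] if w > 0}
    return signatures


def identify(query, signatures, n):
    """Score a query against each class and return (best_label, scores)."""
    present = set(ngrams(query, n))
    scores = {
        label: sum(w for g, w in sig.items() if g in present)
        for label, sig in signatures.items()
    }
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration (not influenza sequences).
training = [
    ("A", "ACGTACGTAA"),
    ("A", "ACGTACGTTA"),
    ("B", "TTGGCCTTGG"),
    ("B", "TTGGCCTTGA"),
]
signatures = discover_signatures(training, n=3, top_k=4)
label, scores = identify("GGACGTACGG", signatures, n=3)
print(label, scores)
```

In this sketch, `n` plays the role of the n-gram length that the abstract reports tuning (11-grams gave the best reported accuracy), and swapping the weight formula inside `discover_signatures` would be one way to compare different scoring algorithms.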
