
Studies on information-theoretics based data-sequence pattern-discriminant algorithms: Applications in bioinformatic data mining.



Abstract

This research studies information-theoretic (IT) aspects of data-sequence patterns and develops discriminant algorithms that distinguish the features of underlying sequence patterns having characteristic, inherent stochastic attributes. Potential applications of such algorithms include bioinformatic data mining.

Consistent with this scope, the research considers specific details of information theory and entropy vis-à-vis sequence patterns with stochastic attributes, such as the DNA sequences of molecular biology. Applying information-theoretic concepts (essentially in Shannon's sense), the following distinct sets of metrics are developed and applied in algorithms for data-sequence pattern discrimination: (i) divergence or cross-entropy algorithms of the Kullback-Leibler type and of the general Csiszár class; (ii) statistical distance measures; (iii) ratio metrics; (iv) a Fisher-type linear-discriminant measure; and (v) a complexity metric based on information redundancy.

These measures are adopted to ascertain codon-noncodon delineations in DNA sequences that consist of crisp and/or fuzzy nucleotide domains across their chains. The Fisher measure is also used in codon-noncodon delineation and in motif detection. The algorithms are tested on DNA sequences of human and some bacterial organisms. The relative efficacy of the metrics and the algorithms is determined and discussed, and the potential of such algorithms to supplement prevailing methods is indicated. Scope for future studies is identified in terms of persisting open questions.
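As a rough illustration of the Kullback-Leibler type divergence named in item (i), the sketch below compares the nucleotide frequency distributions of two DNA fragments. It is not the thesis's algorithm: the function names, the add-one pseudocount, and the two toy sequences are all assumptions made for this example.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGT"


def nucleotide_distribution(seq, pseudocount=1.0):
    """Estimate P(base) over A, C, G, T; other symbols (e.g. N) are ignored.

    The pseudocount is an assumption of this sketch; it keeps every
    probability nonzero so the divergence below stays finite.
    """
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[b] * log2(p[b] / q[b]) for b in ALPHABET)


def symmetric_kl(p, q):
    """Symmetrized form D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical toy fragments, not data from the thesis.
gc_rich = nucleotide_distribution("ATGGCGGCGCTGGCCGCG")
at_rich = nucleotide_distribution("TATAAATTTATATAATTA")

print(kl_divergence(gc_rich, at_rich))
print(symmetric_kl(gc_rich, at_rich))
```

A larger divergence indicates that the two fragments have more dissimilar base compositions. A discriminant built on this idea would presumably compare a window of sequence against reference distributions for each class, but how the thesis does so is not stated in the abstract.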

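Bibliographic record

  • Author: Arredondo, Tomas Vidal
  • Author affiliation: Florida Atlantic University
  • Degree-granting institution: Florida Atlantic University
  • Subject: Engineering, Biomedical; Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 376 p.
  • Total pages: 376
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biomedical engineering; Radio electronics and telecommunication technology
  • Date added to database: 2022-08-17 11:45:40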
