Statistical methods for feature extraction in shape analysis and bioinformatics.


Abstract

Feature extraction aims to explain the underlying phenomena of interest in a given set of input data by reducing the amount of resources required to describe it accurately. The term remains very broad, as it covers many different objectives and encompasses multiple types of techniques, methods and processes.

The work contained in this thesis explores two types of feature extraction, from two different domains, namely 3D shape analysis and bioinformatics. The objective of both projects is to detect and understand the relevant information in a noise-corrupted data set. However, the two processes differ significantly: one aims to compress and smooth signals, while the other consists of clustering data.

In the first part of this thesis, a method for shape representation, compression and smoothing is proposed. First, it is shown that, similarly to spherical shapes, triangulated genus-one surfaces can be encoded using second-generation wavelet decomposition. Next, a novel model is proposed for wavelet-based surface compression and smoothing. This part of the work aims to develop an efficient and robust process for eliminating irrelevant and noise-corrupted parts of the shape signal. Surfaces are encoded using wavelet filtering, and the objective of the proposed methodology is to separate noise-like wavelet coefficients from those contributing to the relevant part of the signal. The technique developed in this thesis consists of adaptively thresholding coefficients using a data-driven Bayesian framework. Once thresholding is performed, the coefficients that have been identified as irrelevant are removed and the inverse wavelet transform is applied to the "clean" set of wavelet coefficients. Experimental results show the efficiency of the proposed technique for surface smoothing and compression.

The second part of this thesis proposes a statistical model for studying RNA (ribonucleic acid) spatial conformations. The functional diversity of the RNA molecule depends on the ability of the RNA polymer to fold into a large number of precisely defined spatial forms. Therefore, one of the main challenges of bioinformatics is to establish a clearer understanding of the structure/function relationships in these molecules. If the functionality of a specific substructure (or unit block) from a given part of an RNA strand is known, then the functionality of similar substructures is assumed to be similar. It is therefore important to find an efficient way to classify the unit blocks of the RNA molecule. Each type of substructure can be geometrically characterized by a set of d parameters, which defines the spatial arrangement of its constituents. Thus, a set of substructures from the same family can be represented as a point cloud in a d-dimensional data space. A similarity measure can then be defined to perform clustering on this data set and classify the corresponding substructures into a limited number of groups. In the proposed work, a statistical clustering model is applied to this RNA structure classification problem. First, single-nucleotide structures are classified with respect to their spatial configurations. Application of the method to various data sets validates the process, and further analysis is conducted to compare the results to other classifications. Second, the same clustering scheme is applied to base doublet geometries (base pairs and base stacking). These conformations offer more complex and challenging data sets. The proposed clustering results bring new features into the existing classification schemes.
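
The transform, threshold, inverse-transform pipeline described in the first part of the abstract can be illustrated on a plain 1D signal. The sketch below is not the thesis's method: it swaps in a one-level Haar transform for the second-generation surface wavelets and a fixed universal threshold for the data-driven Bayesian rule. All function names are hypothetical, and the input is assumed to be an even-length list of floats.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length list."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    signal = []
    for s, d in zip(approx, detail):
        signal.extend(((s + d) / SQRT2, (s - d) / SQRT2))
    return signal

def hard_threshold(coeffs, threshold):
    """Set to zero every coefficient whose magnitude does not exceed the threshold."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]

def denoise(signal):
    """Transform, threshold the detail coefficients, then reconstruct."""
    approx, detail = haar_forward(signal)
    # Assumption of this sketch: noise scale from the median absolute
    # detail coefficient, and the universal threshold sigma * sqrt(2 ln n).
    # The thesis instead uses an adaptive, data-driven Bayesian rule.
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    return haar_inverse(approx, hard_threshold(detail, threshold))
```

Calling `denoise(samples)` on an even-length list returns a list of the same length. Only the detail coefficients are thresholded, so the coarse shape of the signal is preserved while small, noise-like fluctuations are dropped.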
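
The formulation in the second part, substructures as points in a d-dimensional space grouped by a similarity measure, can be sketched with a generic clustering routine. The abstract does not specify the statistical clustering model, so the example below substitutes plain k-means with squared Euclidean distance purely as a stand-in. The function names, the `initial_centers` parameter, and the toy data are all hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, initial_centers=None, iterations=50, seed=0):
    """Cluster points (tuples of floats) into k groups.

    Returns (centers, labels). A stand-in for the thesis's statistical
    clustering model, which the abstract does not describe in detail.
    """
    rng = random.Random(seed)
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return centers, labels
```

For example, `k_means(points, 2)` on a list of 2D tuples returns two centers and one label per point. Real conformational parameters may call for a different similarity measure than Euclidean distance, which is one reason this should be read only as an illustration of the point-cloud view.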

Bibliographic record

  • Author affiliation: Georgia Institute of Technology
  • Degree-granting institution: Georgia Institute of Technology
  • Subjects: Engineering, Biomedical; Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 146
  • Original format: PDF
  • Language of text: English
