
Genome data modeling and data compression.


Abstract

Genome data modeling is an important area of research, and several data models have been proposed for representing and storing genome data. Challenges in biological data management include data storage, retrieval, redundancy, and integrity. In this thesis we propose two data models for representing and storing genome sequence data. Instead of storing the whole sequence of each gene separately, both models store each common subsequence only once, identified by a sequence ID or GenBank identification number. The position number is stored as well, so that the whole sequence can be retrieved correctly. This would significantly reduce storage space requirements and help maintain data integrity. The second model also includes a pre-coding routine to reduce storage requirements further. A study of randomness in genome data is also included. Both data models were tested, and the results were satisfactory: sequences could be compressed when they had a significant amount of commonality, and the retrieval algorithm reconstructed them correctly.
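The storage idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the class `SequenceStore`, its method names, and the segment IDs are hypothetical, and the common subsequences are assumed to be known in advance, since the abstract does not say how they are found. Each gene is stored as a list of pieces, where a piece is either literal text or a reference to a shared subsequence, and every piece carries its position so that the full sequence can be rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceStore:
    """Hypothetical store: shared subsequences are kept once, genes refer to them."""
    shared: dict = field(default_factory=dict)  # segment ID -> subsequence (stored once)
    genes: dict = field(default_factory=dict)   # gene ID -> list of (position, kind, payload)

    def add_shared(self, seg_id, subsequence):
        if not subsequence:
            raise ValueError("shared subsequence must be non-empty")
        self.shared[seg_id] = subsequence

    def _match_at(self, sequence, pos):
        # Return the ID of the longest shared subsequence starting at pos, or None.
        best = None
        for seg_id, sub in self.shared.items():
            if sequence.startswith(sub, pos):
                if best is None or len(sub) > len(self.shared[best]):
                    best = seg_id
        return best

    def add_gene(self, gene_id, sequence):
        pieces = []
        pos = 0
        literal_start = 0
        while pos < len(sequence):
            seg_id = self._match_at(sequence, pos)
            if seg_id is None:
                pos += 1
                continue
            if literal_start < pos:
                # Text before the match is unique to this gene: store it literally.
                pieces.append((literal_start, "lit", sequence[literal_start:pos]))
            # Store only the position and the ID of the shared subsequence.
            pieces.append((pos, "ref", seg_id))
            pos += len(self.shared[seg_id])
            literal_start = pos
        if literal_start < len(sequence):
            pieces.append((literal_start, "lit", sequence[literal_start:]))
        self.genes[gene_id] = pieces

    def retrieve(self, gene_id):
        # Rebuild the sequence by concatenating pieces in position order.
        parts = []
        for _position, kind, payload in sorted(self.genes[gene_id]):
            parts.append(self.shared[payload] if kind == "ref" else payload)
        return "".join(parts)
```

In this sketch, two genes that both contain the same 15-base subsequence each hold a short reference to it instead of a copy, so the saving grows with the length of the shared text and the number of genes that contain it. When there is little commonality, the per-piece position and ID overhead can outweigh the saving, which matches the abstract's remark that compression was achieved when commonality was significant.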

Bibliographic record

  • Author

    Radhakrishnan, Radhika.

  • Author affiliation

    University of Nevada, Reno.

  • Degree-granting institution: University of Nevada, Reno.
  • Subject: Computer science; Bioinformatics.
  • Degree: M.S.
  • Year: 2007
  • Pages: 48 p.
  • Total pages: 48
  • Original format: PDF
  • Language of text: eng
