Most current mainstream software for DNA sequence analysis implements its own sequence storage and query functions. This duplicates work across tools, and the designs ignore parallelism, scalability, and distributed environments, even as the volume of DNA sequence data grows rapidly. To meet the need to analyze DNA sequences from different species and to keep pace with this growth, a DNA sequence matching library for terabyte-scale biological data, called the k-mer searching interface (KSI), was designed and implemented. It is based on k-mer matching, the basic operation of DNA sequence analysis. KSI provides a set of application programming interfaces (APIs) for distributed computing environments and is optimized for DNA sequence matching in the biological computing field. The experimental results show that KSI is an efficient and scalable solution for big bio-data processing.
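To illustrate the k-mer matching operation that the abstract names as the basis of KSI, the following is a minimal single-machine sketch. It is not KSI's API or implementation: the function names `build_kmer_index` and `find_kmer`, the in-memory dictionary index, and the sample sequences are all assumptions made for illustration, and nothing here reflects the distributed design described above.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every length-k substring to the (sequence id, offset) pairs where it occurs.

    Hypothetical helper for illustration only; not part of KSI.
    """
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        # Slide a window of width k across the sequence.
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_kmer(index, kmer):
    """Return all recorded occurrences of kmer, or an empty list if it is absent."""
    # .get avoids creating an empty entry in the defaultdict on a miss.
    return index.get(kmer, [])


# Toy input: two short, made-up DNA sequences.
sequences = {"seq1": "ACGTACGT", "seq2": "TTACGA"}
index = build_kmer_index(sequences, k=3)
hits = find_kmer(index, "ACG")
print(hits)
```

A terabyte-scale system would presumably need to partition such an index across machines; how KSI does this is not stated in the abstract, so the sketch leaves it out.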