In many de novo sequencing applications, partial knowledge is relatively easy to obtain. For example, antibodies contain conserved and hypervariable segments, and a peptide from a digest may recognizably overlap both types of regions. Nerve toxins are another important class of de novo sequencing targets, and in these peptides the numbers and positions of cysteines are often predictable from homology. Finally, in some applications, such as biotechnology or the analysis of fossil bones, the identity of the protein of interest may be known even though its exact sequence is not. Partial knowledge constrains the search space for de novo sequencing, thereby reducing the error rate and enabling exact sequencing even in cases of incomplete fragmentation. Current de novo sequencing algorithms cannot take full advantage of partial knowledge, however, so we devised a new algorithm that builds partial knowledge into the core graph algorithm that generates candidate sequences.