As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically tested immediately by various algorithms for clues as to their biological structure and function. One question about a new protein concerns its cellular location, that is, where the protein resides in a living organism (e.g., extracellular, membrane, nuclear). A human-created five-way classification algorithm for cellular location, using statistical techniques, was recently reported to achieve 76 percent accuracy. This paper describes a two-way classification algorithm that was evolved using genetic programming and achieves 83 percent accuracy for determining whether a protein is an extracellular protein, 84 percent for nuclear proteins, 89 percent for membrane proteins, and 83 percent for anchored membrane proteins. Unlike the statistical calculation, the genetically evolved programs employ a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, named memory, indexed memory, set-creating operations, and look-ahead. The genetically evolved classification program can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif.