Since the beginning of the Human Genome Project, which aims at sequencing all of the human genetic information, a large amount of sequence data has been generated. Much attention is now given to the analysis of this data. A large part of this analysis is carried out with intelligent computational techniques. However, many genetic databases contain noisy data, which can degrade the performance of the computational techniques applied to them. This work studies the influence of noisy data on the training of three different learning methods: Decision Trees, Artificial Neural Networks and Support Vector Machines. The task investigated is the recognition of splice junctions in DNA sequences, which is part of the gene identification problem. Results indicate that eliminating noisy patterns from the dataset can improve the learning algorithms' performance, with no significant reduction in their generalization ability.