Microbial genomes at NCBI form a large collection of almost 30,000 genomes from more than 5,000 species. The quality and sampling density of bacterial genome assemblies vary greatly: human pathogens are densely sampled, while other bacteria are more sparsely represented. Variation in how frequently different proteins occur in genome annotations further complicates the analysis and presentation of the data. Redundancy in the results makes them difficult to analyze and use, because nearest-neighbor lists often contain many nearly identical objects, making it difficult or impossible to reflect more distant neighbor relationships. These complex data must therefore be organized, processed, and displayed at multiple levels of resolution, with appropriate levels of phylogenomic resolution and protein similarity and an adequate sampling strategy.