Background: In bioinformatics, we pre-process raw data into a format ready for answering medical and biological questions. A key step in processing is labeling the measured features with the identities of the molecules purportedly assayed: “molecular identification” (MI). Biological meaning comes from correctly matching these molecular measurements to actual molecular species, but MI can be incorrect. Identifier filtering (IDF) selects the features with more trusted MI, leaving a smaller but more correct dataset. Identifier mapping (IDM) is needed when an analyst combines data from two high-throughput (HT) measurement platforms run on the same samples. IDM produces ID pairs, one ID from each platform, where the mapping declares that the two analytes are associated through a causal path, direct or indirect (for example, pairing an ID for an mRNA species with an ID for the protein species that is its putative translation). Many competing solutions for IDF and IDM exist. Analysts need a rigorous method for evaluating and comparing these choices.