Automated Identification of Protein Classification and Detection of Annotation Errors in Protein Databases Using Statistical Approaches

Abstract

Because of the importance of proteins in the life sciences, biologists have spent the past few decades putting great effort into elucidating their structures, functions and expression profiles, in order to understand their roles in living cells. Protein databases are now widely used by biologists, so it is critical that the information researchers work with be as accurate as possible. However, these databases are growing rapidly, and existing protein databases are already known to contain annotation errors. In this paper, we investigate why protein databases contain mis-annotated sequence data. Then, using statistical approaches, we derive a method to automatically filter the data in these databases and assess its reliability. This is important for providing accurate information to researchers, and it will help reduce further annotation errors resulting from existing mis-annotated sequence data. Our initial experiments confirmed our theoretical findings and showed that our methods can effectively detect mis-annotated sequence data.
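
The abstract does not say which statistics the authors use. The sketch below is an illustration only, not the paper's method: it flags a sequence when its mean similarity to some other annotated class exceeds its mean similarity to the other members of its own class. The 3-mer Jaccard similarity (a stand-in for a proper alignment score), the function names `kmers`, `jaccard` and `flag_suspect_annotations`, and the toy sequences and class labels are all hypothetical.

```python
from statistics import mean


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets; a hypothetical stand-in for an alignment score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_suspect_annotations(records, k: int = 3):
    """Flag records that look more similar to another class than to their own.

    records: list of (identifier, sequence, class_label) tuples.
    Returns (identifier, assigned_label, better_matching_label) for each flagged record.
    """
    profiles = {rid: kmers(seq, k) for rid, seq, _ in records}
    by_label: dict[str, list[str]] = {}
    for rid, _, label in records:
        by_label.setdefault(label, []).append(rid)

    suspects = []
    for rid, _, label in records:
        own = [other for other in by_label[label] if other != rid]
        if not own:
            continue  # singleton class: no peers to compare against
        # Baseline: mean similarity to the other members of the annotated class
        best_label, best_score = None, mean(jaccard(profiles[rid], profiles[o]) for o in own)
        # Look for a competing class with a strictly higher mean similarity
        for other_label, members in by_label.items():
            if other_label == label:
                continue
            score = mean(jaccard(profiles[rid], profiles[o]) for o in members)
            if score > best_score:
                best_label, best_score = other_label, score
        if best_label is not None:
            suspects.append((rid, label, best_label))
    return suspects


# Toy data (hypothetical): P3 is labelled "kinase" but resembles the transporters.
records = [
    ("P1", "MKKLLAAGGH", "kinase"),
    ("P2", "MKKLLAAGGW", "kinase"),
    ("P3", "PQRSTVWYPQ", "kinase"),
    ("P4", "PQRSTVWYPA", "transporter"),
    ("P5", "PQRSTVWYPC", "transporter"),
]
print(flag_suspect_annotations(records))  # → [('P3', 'kinase', 'transporter')]
```

Comparing class means rather than a single nearest neighbour makes it less likely that one noisy sequence triggers a flag on its own; a real pipeline would presumably replace the k-mer overlap with pairwise alignment scores.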

Bibliographic details

Source
PAKDD 2006 International Workshop on Knowledge Discovery in Life Science Literature (KDLL 2006); 9 April 2006; Singapore (SG) | 2006 | pp. 123-138 | 16 pages
Conference location: Singapore (SG)
Authors
Kang Ning; Hon Nian Chua
Affiliation

School of Computing, National University of Singapore, 3 Science Drive 2, 117543, Singapore

Original format: PDF
Text language: English
Chinese Library Classification: Quantitative biology

Similar documents

1. SUS-BAR: a database of pig proteins with statistically validated structural and functional annotation [J]. Damiano Piovesan, Giuseppe Profiti, Luca Fontanesi. Database. 2013, No. 3

2. EUCLID: automatic classification of proteins in functional classes by their database annotations [J]. Javier Tamames... Bioinformatics. 1998, No. 6

3. ARC: Automated Resource Classifier for agglomerative functional classification of prokaryotic proteins using annotation texts [J]. Muthiah Gnanamani, Naveen Kumar, Srinivasan Ramachandran. Journal of Biosciences. 2007, No. 5

4. Automated Identification of Protein Classification and Detection of Annotation Errors in Protein Databases Using Statistical Approaches [C]. Kang Ning, Hon Nian Chua. PAKDD International Workshop on Knowledge Discovery in Life Science Literature. 2006

5. Identification of interface residues involved in protein-protein and protein-DNA interactions from sequence using machine learning approaches [D]. Yan, Changhui. 2005

6. Molecular and statistical approaches to the detection and correction of errors in genotype databases [O]. L M Brzustowicz, C Mérette, X Xie. 1993

7. Design and practical usage of web biological databases for the annotation and classification of proteins [O]. Hermoso Pulido Toni. 2015
