DDBJ Data Analysis Challenge: a machine learning competition to predict Arabidopsis chromatin feature annotations from DNA sequences

Abstract

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%-9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.
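The abstract reports model quality as AUC scores. As a reminder of what that metric measures, here is a minimal sketch, assuming binary labels and real-valued prediction scores. The function name `roc_auc` and the toy data are hypothetical and are not taken from the challenge. The sketch computes the area under the ROC curve as the fraction of positive/negative pairs in which the positive example receives the higher score, counting ties as one half. The challenge's own scoring code may differ.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    The AUC equals the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5).
    This pairwise form is O(n_pos * n_neg); it is meant for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = region carries the chromatin feature, 0 = it does not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for the reported top score of 0.95.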

Bibliographic information

Source: Genes & Genetic Systems, 2020, Issue 1, 8 pages
Authors: Eli Kaminuma, Yukino Baba, Masahiro Mochizuki, et al.
Original format: PDF
Language: English
CLC classification: Genetics
Keywords: chromatin features prediction; deep learning; machine learning competition; sequence read archive

