Journal: Genes & Genetic Systems

DDBJ Data Analysis Challenge: a machine learning competition to predict Arabidopsis chromatin feature annotations from DNA sequences



Abstract

Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%-9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.
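For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC score can be computed for a binary classifier. It assumes the AUC reported above is the area under the ROC curve, which the abstract does not state explicitly. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the challenge data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return ROC AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = region carries the chromatin feature, 0 = it does not.
    toy_labels = [1, 0, 1, 0, 1]
    toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
    print(round(roc_auc(toy_labels, toy_scores), 3))  # 5 of 6 pairs ranked correctly -> 0.833
```

Under this reading, a score of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative examples. The pairwise loop is quadratic in the number of examples, so it is suited to illustration rather than to large data sets.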
