Abstract

Background: Biomedical semantic indexing is important for information retrieval and for many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings (MeSH). Because the category distribution in the training data is unbalanced, sampling methods are difficult to apply to the semantic indexing task.

Results: In this paper, we present a novel deep serial multi-task learning model. The primary task treats biomedical semantic indexing as a multi-label text classification problem that takes the relations between labels into account. The auxiliary task is a regression task that predicts the number of MeSH headings for a citation and provides hints that help the network converge faster. Experimental results on the BioASQ-Task5A open dataset show that our model outperforms the state-of-the-art solution "MTI", proposed by the US National Library of Medicine. Further, it not only achieves the highest precision among all the solutions in BioASQ-Task5A but also converges faster than some naive deep learning methods.

Conclusions: Rather than running in parallel, as in an ordinary multi-task structure, the tasks in our model are serial and tightly coupled. The model achieves satisfactory performance without any handcrafted features.
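The interplay of the two tasks can be illustrated with a minimal sketch. The abstract does not specify how the predicted MeSH count is coupled to the classifier, so everything below is an assumption made for illustration: the function names (`bce`, `joint_loss`, `select_labels`), the `aux_weight` parameter, the squared-error count loss, and the top-k selection rule are all hypothetical and are not taken from the paper. The label scores and the four-heading vocabulary are made-up example values.

```python
import math

def bce(scores, targets, eps=1e-12):
    """Mean binary cross-entropy over labels (stand-in for the primary multi-label task)."""
    return -sum(
        t * math.log(max(s, eps)) + (1 - t) * math.log(max(1 - s, eps))
        for s, t in zip(scores, targets)
    ) / len(scores)

def joint_loss(scores, targets, predicted_count, aux_weight=0.1):
    """Primary loss plus a weighted squared error on the label count.

    Assumption: the auxiliary regression target is the number of positive labels.
    """
    true_count = sum(targets)
    return bce(scores, targets) + aux_weight * (predicted_count - true_count) ** 2

def select_labels(scores, predicted_count, vocab):
    """Keep the k highest-scoring labels, where k is the rounded predicted count.

    Assumption: k is clamped to the range [1, len(vocab)].
    """
    k = max(1, min(len(vocab), round(predicted_count)))
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

# Illustrative values only: four headings with hypothetical classifier scores.
vocab = ["Humans", "Neoplasms", "Algorithms", "Mice"]
scores = [0.92, 0.15, 0.71, 0.40]
print(select_labels(scores, 2.3, vocab))  # prints ['Humans', 'Algorithms']
```

In this sketch the count prediction feeds into label selection, so the two outputs are used in sequence rather than side by side, which is one plausible reading of "serial and tightly coupled".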