Bio-AnswerFinder: a system to find answers to questions from biomedical texts

Ibrahim Burak Ozyurt; Anita Bandrowski; Jeffrey S Grethe

Abstract

The ever-accelerating pace of biomedical research produces a corresponding acceleration in the volume of biomedical literature. Since new research builds upon existing knowledge, the growth of the knowledge encoded in that literature makes easy access to it more vital over time. Toward the goal of making the implicit knowledge in the biomedical literature easily accessible to biomedical researchers, we introduce a question answering system called Bio-AnswerFinder. Bio-AnswerFinder ranks answers using a weighted relaxed word mover's distance based similarity on word/phrase embeddings learned from PubMed abstracts, after filtering candidates by the entity type of the question focus. Our approach retrieves relevant documents iteratively via enhanced keyword queries to a traditional search engine. To improve document retrieval performance, we introduce a supervised long short-term memory (LSTM) neural network that selects keywords from the question to drive the iterative keyword search. Our unsupervised baseline system achieves a mean reciprocal rank (MRR) of 0.46 and a Precision@1 of 0.32 on 936 questions from BioASQ. The answer sentences are then reranked by a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) classifier trained on 100 answer candidate sentences per question for 492 BioASQ questions. To test ranking performance, we report a blind test on 100 questions scored by three independent annotators. These experts preferred BERT-based reranking, with improvements of 7% in MRR and 13% in Precision@1 on average.
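
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how mean reciprocal rank (MRR) and Precision@1 are conventionally computed from ranked answer lists. It is not the authors' evaluation code. The function names and the toy data are assumptions made for illustration, and each question is assumed to be represented as a list of booleans marking whether the answer at each rank is correct.

```python
from typing import Sequence


def reciprocal_rank(ranked_relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first correct answer, or 0.0 if none is correct."""
    for position, is_correct in enumerate(ranked_relevance, start=1):
        if is_correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(per_question: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all questions."""
    if not per_question:
        raise ValueError("need at least one question")
    return sum(reciprocal_rank(r) for r in per_question) / len(per_question)


def precision_at_1(per_question: Sequence[Sequence[bool]]) -> float:
    """Fraction of questions whose top-ranked answer is correct."""
    if not per_question:
        raise ValueError("need at least one question")
    hits = sum(1 for r in per_question if len(r) > 0 and r[0])
    return hits / len(per_question)


# Hypothetical toy data: three questions with ranked answer candidates.
toy_rankings = [
    [True, False],          # correct answer at rank 1
    [False, False, True],   # correct answer at rank 3
    [False, False],         # no correct answer retrieved
]

print(f"MRR = {mean_reciprocal_rank(toy_rankings):.3f}")
print(f"Precision@1 = {precision_at_1(toy_rankings):.3f}")
```

In this toy case the reciprocal ranks are 1, 1/3 and 0, so MRR is their mean, and Precision@1 is 1/3 because only the first question has a correct answer at rank 1. Precision@1 can never exceed MRR, since a question contributes to Precision@1 only when its reciprocal rank is 1. This is consistent with the reported baseline scores of 0.32 and 0.46.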
