Journal: JMIR Medical Informatics

A Semi-Supervised Learning Approach to Enhance Health Care Community–Based Question Answering: A Case Study in Alcoholism



Abstract

Background: Community-based question answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for Web-based health communities.

Objective: In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand which information embedded in Web-based health content serves as a good feature for identifying valid answers.

Methods: Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. To rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline.

Results: On our dataset, the semi-supervised learning algorithm has an accuracy of 86.2%. Unified Medical Language System (UMLS)–based (health-related) features used in the model enhance the algorithm's performance by approximately 8%. A reasonably high rate of accuracy is obtained given that the data are considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions.

Conclusions: Overall, our automated QA system based on historical QA pairs is shown to be effective on the dataset in this case study. It was developed for general use in the health care domain and can also be applied to other CQA sites.
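The retrieval step and the features named in the Results can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which retrieval model, distance measure, or learner was used, so the bag-of-words cosine similarity, the tiny stop-word list, and the HEALTH_TERMS set (a hypothetical stand-in for a UMLS concept lookup) are all assumptions, as are the function names. The semi-supervised ranking step is deliberately left out; the sketch only shows how candidates might be retrieved from resolved QA pairs and turned into feature values that a ranker could consume.

```python
import math
import re
from collections import Counter

# Assumption: a toy stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "i", "to", "of", "and",
              "my", "do", "can", "how", "what", "for", "in", "it"}

# Assumption: hypothetical stand-in for a UMLS-based health-term lookup.
HEALTH_TERMS = {"alcohol", "alcoholism", "withdrawal", "detox",
                "liver", "drinking", "relapse"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(question, resolved_qa, k=2):
    """Return the k resolved (score, past_question, answer) triples whose
    past question is most similar to the new question."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(past_q))), past_q, answer)
              for past_q, answer in resolved_qa]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def features(question, past_question, answer):
    """Compute illustrative versions of the features listed in the abstract."""
    q_tokens = tokenize(question)
    p_tokens = tokenize(past_question)
    return {
        # Assumption: "text length" is taken as the answer's token count.
        "answer_length": len(tokenize(answer)),
        "question_stop_words": sum(t in STOP_WORDS for t in q_tokens),
        # Assumption: distance is 1 minus cosine similarity.
        "question_distance": 1.0 - cosine(Counter(q_tokens), Counter(p_tokens)),
        "health_term_overlap": len(set(q_tokens) & set(p_tokens) & HEALTH_TERMS),
    }
```

In a full pipeline, the feature dictionaries for each candidate would be passed to a ranking model trained on a mix of labeled and unlabeled QA pairs; which model to use is left open here.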
