Journal of the American Medical Informatics Association (JAMIA)

Expert guided natural language processing using one-class classification

Abstract

Introduction: Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing.

Objectives: To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text.

Methods: The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%.

Results: When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had considerably superior performance (F = 0.61 vs. F = 0.17 for the best performing model), attributable mainly to improved precision (p = .88 vs. p = .09 for the best performing model).

Conclusions: 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVM classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.
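The gap between the balanced and the realistic results comes down to prevalence arithmetic, which a small sketch can make concrete. The dataset sizes below (88 positive and 88 negative notes; 10 000 records at 1.4% prevalence, i.e. 140 positives) come from the abstract. The sensitivity and specificity values are hypothetical, chosen only for illustration, and are not figures reported by the authors.

```python
def confusion_counts(n_records, prevalence, sensitivity, specificity):
    """Expected true positives, false positives and false negatives."""
    positives = round(n_records * prevalence)
    negatives = n_records - positives
    tp = sensitivity * positives
    fp = (1.0 - specificity) * negatives
    fn = positives - tp
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F-measure from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# From the abstract: realistic dataset size and breast cancer prevalence.
N_RECORDS = 10_000
PREVALENCE = 0.014

# Hypothetical classifier operating point (assumption, not from the paper).
SENSITIVITY = 0.95
SPECIFICITY = 0.90

# Same classifier, two class distributions.
balanced = precision_recall_f1(
    *confusion_counts(176, 0.5, SENSITIVITY, SPECIFICITY)
)
realistic = precision_recall_f1(
    *confusion_counts(N_RECORDS, PREVALENCE, SENSITIVITY, SPECIFICITY)
)

for name, (p, r, f) in (("balanced", balanced), ("realistic", realistic)):
    print(f"{name:>9}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Under these assumed values, recall is unchanged between the two settings, but precision falls from roughly 0.90 on the balanced set to roughly 0.12 at 1.4% prevalence, because the 9 860 negative records generate far more false positives than the 140 positive records can offset. This is the same direction of effect the abstract attributes to the 2C-SVMs; it does not reproduce the authors' models or their reported scores.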
