Journal of Biomedical Semantics

Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?

Abstract

Background: Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the model may be built to detect all types simultaneously (all-types-at-once) or for one or a few selected types at a time (one-type-at-a-time or a-few-types-at-a-time). It is of interest to investigate which strategy yields better detection performance.

Results: Hidden Markov models using the different strategies were evaluated on a clinical corpus annotated with three concept types (the i2b2/VA corpus) and a biology literature corpus annotated with five concept types (the JNLPBA corpus). Ten-fold cross-validation tests were conducted, and the experimental results showed that models trained for multiple concept types consistently yielded better performance than those trained for a single concept type. F-scores for the former strategies were higher than those for the latter by 0.9 to 2.6% on the i2b2/VA corpus and by 1.4 to 10.1% on the JNLPBA corpus, depending on the target concept types. Improved boundary detection and reduced type confusion were observed for the all-types-at-once strategy.

Conclusions: The current results suggest that detection of concept phrases could be improved by tackling multiple concept types simultaneously. This also suggests that multiple concept types should be annotated when developing a new corpus for machine learning models. Further investigation is needed to gain insight into the underlying mechanism by which good performance is achieved when multiple concept types are considered.
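To make the difference between the two training strategies concrete, the sketch below shows how a single annotated sentence maps to the tag sequences a sequence tagger such as an HMM would be trained on under each strategy. It is an illustration only: the abstract does not specify the tag encoding, so a BIO scheme over the three i2b2/VA concept types (problem, test, treatment) and an invented example sentence are assumed.

```python
# A minimal sketch, not from the paper: the abstract does not state the exact
# tag encoding, so a BIO scheme over the three i2b2/VA concept types
# (problem, test, treatment) is assumed here for illustration.

# Toy annotated sentence: (token, concept type or None)
sentence = [
    ("chest", "problem"), ("pain", "problem"),
    ("was", None), ("evaluated", None), ("with", None),
    ("an", None), ("EKG", "test"),
]

def bio_tags(tokens, keep_types):
    """Encode tokens as BIO tags, keeping only the given concept types."""
    tags, prev = [], None
    for _, ctype in tokens:
        if ctype in keep_types:
            tags.append(("I-" if ctype == prev else "B-") + ctype)
        else:
            tags.append("O")
        prev = ctype
    return tags

# All-types-at-once: a single model learns one tag sequence covering
# every concept type (label space: B/I for each type, plus O).
print(bio_tags(sentence, {"problem", "test", "treatment"}))
# ['B-problem', 'I-problem', 'O', 'O', 'O', 'O', 'B-test']

# One-type-at-a-time: a separate model per concept type, each trained
# on a reduced tag sequence in which the other types are collapsed to O.
for ctype in ("problem", "test", "treatment"):
    print(ctype, bio_tags(sentence, {ctype}))
# problem   ['B-problem', 'I-problem', 'O', 'O', 'O', 'O', 'O']
# test      ['O', 'O', 'O', 'O', 'O', 'O', 'B-test']
# treatment ['O', 'O', 'O', 'O', 'O', 'O', 'O']
```

Under the one-type-at-a-time encoding, a model never sees the competing concept types as distinct labels, which is consistent with the abstract's observation that handling all types at once reduces type confusion.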
