Supervised classification of end-of-lines in clinical text with no manual annotation

Abstract

In some plain-text documents, end-of-line marks may or may not mark the boundary of a text unit (e.g., a paragraph). This vexing problem is likely to affect downstream natural language processing components, but it is seldom addressed in the literature. We propose a method that uses no manual annotation to classify whether each end-of-line should be treated as a simple space (a soft line break) or as a true text-unit boundary. This method, which includes self-training and co-training steps based on token and line-length features, achieves an F-measure of 0.943 on a corpus of short e-books with controlled format, 0.904 on a random sample of 24 clinical texts with soft line breaks, and 0.898 on a larger set of mixed clinical texts that may or may not contain soft line breaks: a fairly high value for a method with no manual annotation.
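
The abstract names the ingredients (token and line-length features, self-training and co-training, no manual annotation) without detailing them. The Python sketch below is one hedged illustration of how such a pipeline could fit together; it is not the authors' implementation. Everything concrete in it is a hypothetical assumption: the label convention (1 = boundary, 0 = soft break), the seed rule that stands in for manual labels (a blank next line, or a short line ending in punctuation, counts as a boundary; a line at least 90% as long as the widest line and followed by a lowercase word counts as a soft break), the binned length feature and the token features, the multinomial-style Naive Bayes learner, and the names `length_view`, `token_view`, `seed_label`, `co_train`, and `rejoin`.

```python
import math
from collections import Counter

# Hypothetical label convention: 1 = true text-unit boundary, 0 = soft line break.
BOUNDARY, SOFT = 1, 0


def length_view(lines, i, width):
    """View 1 (hypothetical): length of lines[i] relative to the widest line, in 10% bins."""
    ratio = len(lines[i].rstrip()) / width
    return [f"len_bin={min(int(ratio * 10), 10)}"]


def token_view(lines, i):
    """View 2 (hypothetical): tokens on either side of the end-of-line after lines[i]."""
    cur, nxt = lines[i].split(), lines[i + 1].split()
    last = cur[-1] if cur else ""
    first = nxt[0] if nxt else ""
    return [
        f"last_punct={bool(last) and last[-1] in '.:;!?'}",
        f"next_lower={first[:1].islower()}",
        f"next_bullet={bool(first) and (first[0] in '-*' or first[0].isdigit())}",
        f"next_empty={not nxt}",
    ]


def seed_label(lines, i, width):
    """Hypothetical annotation-free seed rule: BOUNDARY, SOFT, or None (unlabeled)."""
    cur, nxt = lines[i].rstrip(), lines[i + 1].strip()
    if not nxt or (cur.endswith((".", ":", "!", "?")) and len(cur) < 0.6 * width):
        return BOUNDARY  # blank next line, or a short line ending a sentence
    if len(cur) >= 0.9 * width and nxt[:1].islower():
        return SOFT  # near-full line continued by a lowercase word
    return None


class NaiveBayes:
    """Multinomial-style Naive Bayes over feature strings, with add-one smoothing."""
    def fit(self, X, y):
        self.n, self.prior = len(y), Counter(y)
        self.counts = {c: Counter() for c in self.prior}
        self.vocab = set()
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def proba(self, feats):
        logp = {}
        for c, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab) + 1
            logp[c] = math.log(self.prior[c] / self.n) + sum(
                math.log((counts[f] + 1) / denom) for f in feats)
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return {c: math.exp(v - top) / z for c, v in logp.items()}


def co_train(lines, rounds=5, per_round=2):
    """Label each end-of-line between lines[i] and lines[i + 1] (needs >= 2 lines)."""
    width = max(len(line.rstrip()) for line in lines) or 1
    idx = range(len(lines) - 1)
    views = [lambda i: length_view(lines, i, width), lambda i: token_view(lines, i)]
    seeds = {i: seed_label(lines, i, width) for i in idx}
    labels = {i: lab for i, lab in seeds.items() if lab is not None}
    if not labels:
        raise ValueError("the seed rule labeled no end-of-line")
    for _ in range(rounds):
        for view in views:  # each view's classifier labels data for the shared pool
            pool = [i for i in idx if i not in labels]
            if not pool:
                break
            ids = list(labels)
            clf = NaiveBayes().fit([view(i) for i in ids], [labels[i] for i in ids])
            scored = []
            for i in pool:
                p = clf.proba(view(i))
                best = max(p, key=p.get)
                scored.append((p[best], i, best))
            # Adopt only this view's most confident predictions.
            for _, i, best in sorted(scored, reverse=True)[:per_round]:
                labels[i] = best

    def both(i):
        return views[0](i) + views[1](i)

    ids = list(labels)
    final = NaiveBayes().fit([both(i) for i in ids], [labels[i] for i in ids])
    return [max(p, key=p.get) for p in (final.proba(both(i)) for i in idx)]


def rejoin(lines, labels):
    """Soft breaks (0) become spaces; boundaries (1) stay as newlines."""
    out = [lines[0]]
    for line, label in zip(lines[1:], labels):
        out.append(("\n" if label == BOUNDARY else " ") + line)
    return "".join(out)
```

Calling `rejoin(lines, co_train(lines))` on the lines of a wrapped note returns the text with predicted soft breaks turned into spaces. Self-training is the single-view special case of the same loop (one classifier repeatedly labeling its own most confident unlabeled end-of-lines); splitting the features into two views is what makes it co-training, since each view's confident predictions can become training data for the other.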

Bibliographic record

Source
Workshop on building and evaluating resources for biomedical text mining, 2016, xi, 142 p. (9 pages)
Authors
Pierre Zweigenbaum; Cyril Grouin; Thomas Lavergne
Original format
PDF
Chinese Library Classification
Programming; software engineering

Similar literature

1. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text [J]. Brett R. South, Danielle Mowery, Ying Suo, et al. Journal of Biomedical Informatics, 2014.

2. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications, 2012, issue 3a4.

3. Quantitative analysis of manual annotation of clinical text samples [J]. Minarro-Gimenez Jose A., Cornet Ronald, Jaulent M. C., et al. International Journal of Medical Informatics, 2019, issue MARa.

4. Supervised classification of end-of-lines in clinical text with no manual annotation [C]. Pierre Zweigenbaum, Cyril Grouin, Thomas Lavergne. Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining, 2016.

5. Evaluating the effects of noninteractive and machine-assisted interactive manual clinical text annotation approaches on the quality of reference standards [D]. South, Brett Ray, 2014.

6. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text [O]. Brett R. South, Danielle Mowery, Ying Suo, et al.

7. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text [O]. South Brett R., Mowery Danielle, Suo Ying, et al., 2014.
