Supervised classification of end-of-lines in clinical text with no manual annotation

Abstract

In some plain text documents, end-of-line marks may or may not mark the boundary of a text unit (e.g., of a paragraph). This vexing problem is likely to impact subsequent natural language processing components, but is seldom addressed in the literature. We propose a method which uses no manual annotation to classify whether end-of-lines must actually be seen as simple spaces (soft line breaks) or as true text unit boundaries. This method, which includes self-training and co-training steps based on token and line length features, achieves 0.943 F-measure on a corpus of short e-books with controlled format, F=0.904 on a random sample of 24 clinical texts with soft line breaks, and F=0.898 on a larger set of mixed clinical texts which may or may not contain soft line breaks, a fairly high value for a method with no manual annotation.
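
The idea of learning from line length and token features without manual labels can be illustrated with a minimal sketch. It is not the authors' implementation: the seed thresholds (0.9 and 0.5 of the widest line), the two token features, the tiny Naive Bayes classifier, the confidence margin, and all function names are assumptions made for illustration, and only a self-training loop is shown (the co-training step mentioned in the abstract is omitted).

```python
import math
from collections import Counter

LABELS = ("soft", "hard")

def line_end_pairs(lines):
    """Pairs (line, next_line) for each end-of-line between two non-blank lines."""
    return [(a, b) for a, b in zip(lines, lines[1:]) if a.strip() and b.strip()]

def seed_label(line, width):
    """Automatic seed label from line length alone (hypothetical thresholds)."""
    ratio = len(line) / width
    if ratio >= 0.9:   # line fills the page width: probably wrapped
        return "soft"
    if ratio <= 0.5:   # line stops early: probably a real boundary
        return "hard"
    return None        # ambiguous: left for the classifier

def token_features(line, next_line):
    """Two illustrative token features around the line break."""
    last, first = line.split()[-1], next_line.split()[0]
    return ["last_punct=" + str(last[-1] in ".:;!?"),
            "next_upper=" + str(first[0].isupper())]

def train_nb(examples):
    """Count classes and per-class feature values for a tiny Naive Bayes."""
    class_counts = Counter()
    feat_counts = {label: Counter() for label in LABELS}
    for feats, label in examples:
        class_counts[label] += 1
        feat_counts[label].update(feats)
    return class_counts, feat_counts

def predict_nb(model, feats):
    """Return (best label, log-score margin), using add-one smoothing."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in LABELS:
        score = math.log((class_counts[label] + 1) / (total + 2))
        for f in feats:  # each feature is binary, hence the +2
            score += math.log((feat_counts[label][f] + 1) / (class_counts[label] + 2))
        scores[label] = score
    return max(scores, key=scores.get), abs(scores["soft"] - scores["hard"])

def classify_line_ends(text, rounds=2, margin=1.0):
    """Label each line end as 'soft' or 'hard' with seed labels plus self-training."""
    lines = text.split("\n")
    pairs = line_end_pairs(lines)
    if not pairs:
        return []
    width = max(len(line) for line in lines)
    feats = [token_features(a, b) for a, b in pairs]
    labels = [seed_label(a, width) for a, _ in pairs]
    for _ in range(rounds):
        model = train_nb([(f, l) for f, l in zip(feats, labels) if l is not None])
        for i, f in enumerate(feats):
            if labels[i] is None:
                guess, confidence = predict_nb(model, f)
                if confidence >= margin:  # keep only confident predictions
                    labels[i] = guess
    # Force a decision on whatever is still unlabeled.
    model = train_nb([(f, l) for f, l in zip(feats, labels) if l is not None])
    return [l if l is not None else predict_nb(model, f)[0]
            for f, l in zip(feats, labels)]
```

Calling `classify_line_ends(text)` on a document returns one label per end-of-line that sits between two non-blank lines. In this sketch the seed step uses only line length and the classifier uses only token features, so the length heuristic supplies the training labels that a human annotator would otherwise provide.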

Bibliographic record

Source
Fifth workshop on building and evaluating resources for biomedical text mining | 2016 | pp. 80-88 | 9 pages
Conference location: Osaka (JP)
Authors
Pierre Zweigenbaum; Cyril Grouin; Thomas Lavergne
Author affiliations

LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France;

LIMSI, CNRS, Université Paris-Saclay, 91405 Orsay, France;

LIMSI, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France

Original format: PDF
Language: English (eng)
