Extraction of Spelling Variations from Language Structure for Noisy Text Correction

Abstract

We describe a novel approach for the extraction of spelling variations from a list of instances. It relates distinctive infixes to distinctive infixes of referenced words. The distinctive infixes are extracted automatically from a (multi)set of instances and a referenced dictionary, without any additional expert knowledge. Based on the spelling variations retrieved during a learning (training) phase, we develop a correction algorithm which suggests and ranks candidates for a particular noisy word. The main advantage of our approach is that it provides good corrections for unobserved noisy words, while it is almost perfect on words observed during learning. Our experimental results on the normalisation of a typical reference corpus of Early Modern English letters significantly improve over previous results of VARD2. We also achieve better results than those reported in [SMM07] and [MMGRSR07] on the OCR correction of the TREC-5 Confusion Track corpus [5].
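
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it names: learn infix rewrites from (noisy, reference) pairs, then use them to suggest and rank dictionary candidates for an unseen noisy word. It is not the authors' method. The function names (`differing_infixes`, `learn_variations`, `suggest`), the prefix/suffix stripping heuristic, the frequency-based ranking, and the toy word pairs are all assumptions made for illustration.

```python
from collections import Counter


def differing_infixes(noisy, reference):
    """Strip the longest common prefix and suffix; return the two differing middles."""
    limit = min(len(noisy), len(reference))
    p = 0
    while p < limit and noisy[p] == reference[p]:
        p += 1
    s = 0
    while s < limit - p and noisy[-1 - s] == reference[-1 - s]:
        s += 1
    return noisy[p:len(noisy) - s], reference[p:len(reference) - s]


def learn_variations(pairs):
    """Count how often each (noisy_infix, reference_infix) rewrite occurs in training pairs."""
    counts = Counter()
    for noisy, reference in pairs:
        if noisy != reference:
            counts[differing_infixes(noisy, reference)] += 1
    return counts


def suggest(word, variations, dictionary):
    """Apply each learned rewrite once at every position; rank dictionary hits by rewrite count."""
    scores = Counter()
    for (src, dst), count in variations.items():
        if not src:
            continue  # simplification: pure insertions are skipped in this sketch
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate in dictionary:
                scores[candidate] = max(scores[candidate], count)
            start = word.find(src, start + 1)
    return [w for w, _ in scores.most_common()]


# Toy training pairs (illustrative only, not taken from the paper's corpora).
training = [("loue", "love"), ("haue", "have"), ("giue", "give"),
            ("vp", "up"), ("vnder", "under")]
variations = learn_variations(training)

# "moue" was never observed in training, but the learned u -> v rewrite covers it.
print(suggest("moue", variations, {"move", "mouse", "unto"}))  # ['move']
```

Ranking by raw rewrite frequency is the simplest choice that still lets an unobserved noisy word be corrected; a realistic system would need context around each infix and a probabilistic score.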

Bibliographic details

Source
International Conference on Document Analysis and Recognition | 2013 | pp. 324-328 | 5 pages
Authors
Gerdjikov Stefan; Mihov Stoyan; Nenchev Vladislav;
Original format: PDF
Keywords
finite state automata; noisy texts correction; spelling variations;

Similar literature

1. Using a Natural Language Processing and Machine Learning Algorithm Program to Analyze Inter-Radiologist Report Style Variation and Compare Variation Between Radiologists When Using Highly Structured Versus More Free Text Reporting [J]. Lane F. Donnelly, Robert Grzeszczuk, Carolina V. Guimaraes. Current Problems in Diagnostic Radiology. 2019, No. 6
2. Representation, Analysis, and Extraction of Knowledge from Unstructured Natural Language Texts [J]. Hoherchak H., Darchuk N., Kryvyi S. Cybernetics and Systems Analysis. 2021, No. 3
3. Automatic Extraction of Engineering Rules From Unstructured Text: A Natural Language Processing Approach [J]. Xinfeng Ye, Yuqian Lu. Journal of Computing and Information Science in Engineering. 2020, No. 3
4. Extraction of Spelling Variations from Language Structure for Noisy Text Correction [C]. Stefan Gerdjikov, Stoyan Mihov, Vladislav Nenchev. International Conference on Document Analysis and Recognition. 2013
5. Natural Language Processing on Noisy Text [D]. Dong, Rui. 2021
6. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system [O]. Beata Fonferko-Shadrach, Arron S Lacey, Angus Roberts. 2019
7. Towards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems [O]. Arthur Flor de Sousa Neto, Byron Leite Dantas Bezerra, Alejandro Héctor Toselli. 2020
