International Conference on Big Data and Artificial Intelligence

Multi-Task Fine-Tuning on BERT Using Spelling Errors Correction for Chinese Text Classification Robustness



Abstract

Spelling errors are common in daily life and in industrial applications, arising from automatic speech recognition, optical character recognition, and human writing. Because they lack robustness, text classification models trained on clean datasets tend to perform poorly on datasets containing spelling errors. We conduct experiments to measure the influence of spelling errors on the performance of Chinese text classification, and we address Chinese text classification with spelling errors by multi-task fine-tuning on BERT, using a spelling error correction task to assist the classification task. Results on four Chinese text classification datasets show that our method effectively improves the robustness of the classification model, reducing the influence of spelling errors and demonstrating the effectiveness of multi-task fine-tuning on BERT.
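The multi-task setup described above can be sketched as a shared encoder with two heads: a sentence-level classification head and a token-level spelling correction head trained jointly. This is a minimal illustrative sketch, not the authors' implementation: a small `nn.TransformerEncoder` stands in for BERT to keep the example self-contained, and the loss weight `alpha` is an assumed hyperparameter the abstract does not report.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder with two heads: sentence classification and
    per-token spelling correction (predict the correct character at
    each position). A stand-in TransformerEncoder replaces BERT."""
    def __init__(self, vocab_size=100, hidden=64, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls_head = nn.Linear(hidden, num_classes)   # reads position 0 as [CLS]
        self.corr_head = nn.Linear(hidden, vocab_size)   # per-token correction logits

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))          # (B, T, H)
        return self.cls_head(h[:, 0]), self.corr_head(h)

def multitask_loss(cls_logits, corr_logits, labels, correct_ids, alpha=0.5):
    # alpha weights the auxiliary correction loss; its value here is assumed.
    ce = nn.CrossEntropyLoss()
    cls_loss = ce(cls_logits, labels)
    corr_loss = ce(corr_logits.reshape(-1, corr_logits.size(-1)),
                   correct_ids.reshape(-1))
    return cls_loss + alpha * corr_loss

# One joint training step on random toy data.
model = MultiTaskModel()
ids = torch.randint(0, 100, (8, 16))      # noisy input characters
gold = torch.randint(0, 100, (8, 16))     # spelling-corrected characters
labels = torch.randint(0, 4, (8,))        # class labels
cls_logits, corr_logits = model(ids)
loss = multitask_loss(cls_logits, corr_logits, labels, gold)
loss.backward()
```

In this sketch both heads backpropagate through the shared encoder, so the correction signal regularizes the representations the classifier relies on, which is the mechanism the paper credits for the robustness gain.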

