AUTOMATIC EXTRACTION OF A TRAINING CORPUS FOR A DATA CLASSIFIER BASED ON MACHINE LEARNING ALGORITHMS

Abstract

An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
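
The pipeline the abstract describes (dictionary-based segmentation, signature-based corpus extraction, training, and iterative corpus expansion) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy dictionaries, the trailing `[label]` signature format, the greedy longest-match segmenter, the hand-written naive Bayes model, and the confidence threshold used to decide which newly classified strings join the corpus.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical dictionaries: a general one plus a domain-specific adaptive one.
CONVENTIONAL = {"steel", "pipe", "green", "tea"}
ADAPTIVE = {"kg"}
COMPOSITE = CONVENTIONAL | ADAPTIVE

# Hypothetical signature: a trailing "[label]" naming a pre-established class.
SIGNATURE = re.compile(r"\[(\w+)\]$")
CLASSES = {"metal", "food"}


def segment(text, dictionary=COMPOSITE):
    """Greedy longest-match segmentation; unknown characters become tokens."""
    text = text.lower()
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:  # no dictionary word starts at position i
            tokens.append(text[i])
            i += 1
    return tokens


def extract_corpus(strings):
    """Split strings into labeled samples (signature found) and the rest."""
    corpus, unlabeled = [], []
    for s in strings:
        m = SIGNATURE.search(s)
        if m and m.group(1) in CLASSES:
            corpus.append((segment(s[:m.start()]), m.group(1)))
        else:
            unlabeled.append(s)
    return corpus, unlabeled


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, corpus):
        self.token_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for tokens, label in corpus:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        """Return (most likely label, its posterior probability)."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            logp = math.log(n / total)
            for t in tokens:
                logp += math.log((self.token_counts[label][t] + 1) / denom)
            scores[label] = logp
        top = max(scores.values())
        weights = {k: math.exp(v - top) for k, v in scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def expand(corpus, new_strings, threshold=0.8):
    """Classify new strings; add confident ones to the corpus and retrain."""
    corpus = list(corpus)  # leave the caller's seed corpus untouched
    model = NaiveBayes().fit(corpus)
    for s in new_strings:
        tokens = segment(s)
        label, confidence = model.predict(tokens)
        if confidence >= threshold:  # assumed acceptance rule
            corpus.append((tokens, label))
            model = NaiveBayes().fit(corpus)
    return model, corpus


seed, unlabeled = extract_corpus(["steelpipe[metal]", "greentea[food]", "pipe10kg"])
model, corpus = expand(seed, unlabeled, threshold=0.6)
label, confidence = model.predict(segment("pipe10kg"))
```

In this toy run the two strings carrying a signature seed the corpus, and the unsigned string is classified and then folded back in as a new training sample. A real system would also hold out part of the corpus for testing, as the abstract mentions; that split is omitted here for brevity.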

Bibliographic data

Publication number: US2018365322A1

Patent type:
Publication date: 2018-12-20

Original format: PDF
Applicant/Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED

Application number: US201815977665
Inventors: FANG HOU; YIKAI WU; XIAOPEI CHENG; SIFEI DING

Filing date: 2018-05-11
Classification: G06F17/30; G06F15/18; G06K9/62
Country: US
Date indexed: 2022-08-21 12:09:46
