Systems and methods for generating training documents used by classification algorithms

Abstract

The disclosed computer-implemented method for generating training documents used by classification algorithms may include (i) identifying a set of training documents used by a classification system to classify documents written in a first language, (ii) generating a list of tokens from within the training documents that indicate critical terms representative of classes defined by the classification system, (iii) translating the list of tokens from the first language to a second language, (iv) creating, based on the translated tokens, a set of simulated training documents that enables the classification system to classify documents written in the second language, and (v) classifying an additional document written in the second language based on the set of simulated training documents. Various other methods, systems, and computer-readable media are also disclosed.
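
The five steps in the abstract can be illustrated with a small sketch. This is not the patented implementation; it is a minimal toy version under stated assumptions. The function names (`critical_tokens`, `translate_tokens`, `simulate_documents`, `classify`), the frequency-difference scoring, the hand-written English-to-Spanish dictionary, and the overlap-based classifier are all hypothetical choices made for illustration, since the abstract does not specify how tokens are selected, translated, or used for classification.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def critical_tokens(training_docs, top_k=2):
    """Step (ii): per class, keep tokens seen more often inside the class than outside it.

    The scoring rule is an assumption; the abstract only says the tokens
    indicate "critical terms representative of classes".
    """
    per_class = {}
    for text, label in training_docs:
        per_class.setdefault(label, Counter()).update(tokenize(text))
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    result = {}
    for label, counts in per_class.items():
        # Score = occurrences in this class minus occurrences in all other classes.
        scored = {tok: 2 * c - total[tok] for tok, c in counts.items()}
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        result[label] = [t for t in ranked if scored[t] > 0][:top_k]
    return result


def translate_tokens(tokens_by_class, dictionary):
    """Step (iii): map each token to the second language; drop tokens with no entry."""
    return {
        label: [dictionary[t] for t in tokens if t in dictionary]
        for label, tokens in tokens_by_class.items()
    }


def simulate_documents(translated_by_class):
    """Step (iv): build one simulated training document per class from translated tokens."""
    return [(" ".join(tokens), label) for label, tokens in translated_by_class.items()]


def classify(document, simulated_docs):
    """Step (v): pick the class whose simulated document shares the most tokens."""
    doc_tokens = set(tokenize(document))
    best_label, best_overlap = None, 0
    for text, label in simulated_docs:
        overlap = len(doc_tokens & set(tokenize(text)))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label  # None when no simulated document overlaps


# Step (i): a toy first-language (English) training set, invented for this example.
training_docs = [
    ("Invoice payment due for the account", "finance"),
    ("Payment received, invoice closed", "finance"),
    ("Patient diagnosis and treatment plan", "medical"),
    ("Treatment notes for the patient", "medical"),
]

# Hypothetical stand-in for a real translation service.
en_to_es = {
    "invoice": "factura",
    "payment": "pago",
    "patient": "paciente",
    "treatment": "tratamiento",
}

tokens = critical_tokens(training_docs, top_k=2)
simulated = simulate_documents(translate_tokens(tokens, en_to_es))
label = classify("El pago de la factura está pendiente", simulated)
print(label)
```

In this sketch only a short token list is translated rather than whole documents, which mirrors the ordering of steps (ii) to (iv) in the abstract. A practical system would likely replace the dictionary with a translation service and the overlap count with a trained classifier.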