
Ingesting documents using multiple ingestion pipelines


Abstract

A primary ingestion pipeline configured for use in natural language processing includes annotators configured for annotating documents. The annotators and documents to be annotated are evaluated. Based on the evaluations, an ingestion risk score is generated for each document. Each ingestion risk score represents a likelihood that an associated document will not successfully be annotated by the annotators. Each ingestion risk score is compared to a set of risk criteria. Based on the comparisons, a determination is made that each document of a first set of documents satisfies the set of risk criteria. A further determination is made, based on the comparisons, that each document of a second set of documents does not satisfy the set of risk criteria. In response to these determinations, the first set of documents is entered into the primary ingestion pipeline and the second set of documents is provided special handling.
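The routing described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the patented method: the `Document` and `Annotator` types, the scoring heuristic (the fraction of annotators expected to fail on a document because of an unsupported format or excessive length), and the single-threshold risk criterion `max_risk`. The abstract itself does not specify how the score is computed or what the risk criteria are.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class Document:
    """Hypothetical document record."""
    doc_id: str
    text: str
    fmt: str  # file format, e.g. "txt" or "pdf"


@dataclass
class Annotator:
    """Hypothetical annotator description used for the evaluation step."""
    name: str
    supported_formats: FrozenSet[str]
    max_chars: int  # longest text this annotator is assumed to handle


def ingestion_risk_score(doc: Document, annotators: List[Annotator]) -> float:
    """Return a score in [0, 1]: the fraction of annotators expected to fail.

    This heuristic is an assumption; the abstract only says the score
    represents the likelihood that annotation will not succeed.
    """
    if not annotators:
        return 1.0  # nothing can annotate the document
    failures = sum(
        1
        for a in annotators
        if doc.fmt not in a.supported_formats or len(doc.text) > a.max_chars
    )
    return failures / len(annotators)


def route(
    docs: List[Document],
    annotators: List[Annotator],
    max_risk: float = 0.25,  # assumed risk criterion: score must not exceed this
) -> Tuple[List[Document], List[Document]]:
    """Split documents into (primary pipeline, special handling)."""
    primary: List[Document] = []
    special: List[Document] = []
    for doc in docs:
        score = ingestion_risk_score(doc, annotators)
        if score <= max_risk:
            primary.append(doc)   # satisfies the risk criteria
        else:
            special.append(doc)   # does not satisfy the risk criteria
    return primary, special
```

Calling `route(docs, annotators)` returns two lists: the documents to enter into the primary ingestion pipeline and the documents set aside for special handling. A real system would likely replace the threshold with a richer set of criteria and derive the score from evaluation of the actual annotators.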
