Identifying a stale data source to improve NLP accuracy

Abstract

In some NLP systems, a query is compared against different data sources stored in a corpus to provide an answer to the query. However, the best data sources for answering the query may not currently be contained within the corpus, or the data sources in the corpus may contain stale data that yields an inaccurate answer. When receiving a query, the NLP system may evaluate the query to identify a data source that is likely to contain an answer to it. If the data source is not currently contained within the corpus, the NLP system may ingest the data source. If the data source is already within the corpus, however, the NLP system may determine a time-sensitivity value associated with at least some portion of the query. This value may then be used to determine whether the data source should be re-ingested, that is, whether the information contained in the corpus is stale.
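The decision flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented method: the keyword table `TIME_SENSITIVE_TERMS`, the linear mapping from sensitivity to a maximum allowed age, and the names `CorpusEntry`, `time_sensitivity`, `max_allowed_age`, and `decide` are all assumptions introduced here. The step that identifies a likely data source for the query is also assumed to have already happened, so the source is passed in as `source_id`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class CorpusEntry:
    """Hypothetical record of a data source already held in the corpus."""
    source_id: str
    ingested_at: datetime


# Assumed keyword weights in [0, 1]; a real system would derive these differently.
TIME_SENSITIVE_TERMS = {
    "current": 1.0,
    "today": 1.0,
    "latest": 0.9,
    "weather": 0.9,
    "price": 0.8,
}


def time_sensitivity(query: str) -> float:
    """Return the highest weight of any time-sensitive term found in the query."""
    words = (w.strip("?.,!") for w in query.lower().split())
    return max((TIME_SENSITIVE_TERMS.get(w, 0.0) for w in words), default=0.0)


def max_allowed_age(sensitivity: float,
                    longest: timedelta = timedelta(days=365)) -> timedelta:
    """Assumed linear rule: the more time-sensitive, the shorter the allowed age."""
    return longest * (1.0 - sensitivity)


def decide(query: str, source_id: str,
           corpus: Dict[str, CorpusEntry], now: datetime) -> str:
    """Choose between ingesting, re-ingesting, or using the stored copy."""
    entry = corpus.get(source_id)
    if entry is None:
        return "ingest"          # source is not in the corpus yet
    age = now - entry.ingested_at
    if age > max_allowed_age(time_sensitivity(query)):
        return "re-ingest"       # stored copy is considered stale for this query
    return "use-cached"
```

With a source ingested nine days ago, a query such as "What is the current weather?" would trigger a re-ingest under these assumed weights, while "Who wrote Hamlet?" would be answered from the stored copy.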