Ground Truth Generation for Machine Learning Based Quality Assessment of Corpora

Abstract

A mechanism is provided in a computing device configured with instructions executing on a processor of the computing device to implement a ground truth generation system for quality assessment scoring of articles in a corpus. The ground truth generation system receives recommendations of a set of recommended articles from subject matter experts. The ground truth generation system identifies a set of non-recommended articles. A topic clustering component within the ground truth generation system performs topic clustering on a combination of the set of recommended articles and the set of non-recommended articles to form a set of topic clusters containing recommended articles and non-recommended articles. The ground truth generation system identifies a first number of recommended articles and a second number of non-recommended articles in each of the set of topic clusters to form a quality assessment training set. The mechanism trains a quality assessment machine learning model using the quality assessment training set.
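
The per-cluster selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function `build_training_set`, its parameters, and the toy clustering rule `toy_cluster` are hypothetical names introduced here, and random sampling within each cluster is an assumption, since the abstract does not say how the "first number" and "second number" of articles are chosen. Any real topic clustering method could be substituted for `toy_cluster`.

```python
import random
from collections import defaultdict


def build_training_set(recommended, non_recommended, assign_cluster,
                       n_recommended, n_non_recommended, seed=0):
    """Pick up to n_recommended positives and n_non_recommended negatives
    from every topic cluster (hypothetical helper, not from the patent)."""
    rng = random.Random(seed)

    # Cluster the combined articles, keeping the two labels apart per cluster.
    # Label 1 = recommended by subject matter experts, 0 = non-recommended.
    clusters = defaultdict(lambda: {1: [], 0: []})
    for label, articles in ((1, recommended), (0, non_recommended)):
        for article in articles:
            clusters[assign_cluster(article)][label].append(article)

    training_set = []
    for cluster_id in sorted(clusters):
        for label, quota in ((1, n_recommended), (0, n_non_recommended)):
            pool = clusters[cluster_id][label]
            # Assumption: sample at random; take the whole pool if it is
            # smaller than the requested quota.
            chosen = rng.sample(pool, min(quota, len(pool)))
            training_set.extend((article, label, cluster_id) for article in chosen)
    return training_set


def toy_cluster(article):
    # Stand-in for a real topic model: the first word is treated as the topic.
    return article.split()[0]


recommended = [
    "cardiology trial results",
    "cardiology guideline update",
    "oncology review",
]
non_recommended = [
    "cardiology blog post",
    "oncology forum thread",
    "oncology press release",
    "oncology advert",
]

training = build_training_set(
    recommended, non_recommended, toy_cluster,
    n_recommended=1, n_non_recommended=2,
)
for article, label, cluster_id in training:
    print(cluster_id, label, article)
```

Each resulting `(article, label, cluster_id)` tuple could then be fed to any supervised classifier to train the quality assessment model. Drawing a fixed quota from every cluster keeps the training set from being dominated by whichever topics happen to have the most articles.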
