Document relevancy analysis within machine learning systems including determining closest cosine distances of training examples

Abstract

Systems and methods are provided that quantify the relevance of a document relative to a training corpus and select one or more best matches. Methods may include generating an example-based explanation for the relevancy of a document to a training corpus by executing a support vector machine classifier. The classifier performs a centroid classification of a relevant document in a term frequency-inverse document frequency (TF-IDF) feature space relative to the training examples in the training corpus, and the example-based explanation is generated by selecting a best match for the relevant document from the training examples based upon the centroid classification. Determining the training example having the closest cosine distance to the relevant document includes ranking the training examples by linearly stretching their internal best match scores to cover the complete unit interval.
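
The ranking step described above can be illustrated with a minimal sketch. It is not the patented implementation: it omits the support vector machine and centroid classification entirely and only shows one plausible reading of "closest cosine distance" (highest cosine similarity in TF-IDF space) followed by a linear min-max stretch of the scores onto the unit interval. All function names, the whitespace tokenizer, and the smoothed IDF weighting are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF is an assumption; the abstract does not fix a weighting.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stretch_to_unit_interval(scores):
    """Linearly rescale scores so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, so every example ties.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rank_training_examples(document, training_examples):
    """Rank training examples by cosine similarity to the document.

    Returns (example, stretched_score) pairs, best match first.
    """
    # Naive lowercase whitespace tokenization (an assumption).
    tokenized = [text.lower().split() for text in [document] + training_examples]
    vectors = tfidf_vectors(tokenized)
    doc_vec, train_vecs = vectors[0], vectors[1:]
    raw = [cosine_similarity(doc_vec, v) for v in train_vecs]
    stretched = stretch_to_unit_interval(raw)
    order = sorted(range(len(raw)), key=lambda i: raw[i], reverse=True)
    return [(training_examples[i], stretched[i]) for i in order]
```

Calling `rank_training_examples("breach of contract claim", corpus)` on a small list of strings returns the corpus ordered by similarity, with the best match scored 1.0 and the worst scored 0.0. The stretch changes only the scale of the scores, not their order, which makes rankings from different documents easier to present side by side.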
