
Language Agnostic Machine Learning Model for Title Standardization


Abstract

In an example embodiment, a system is provided in which a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained whose input is a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy or English strings), and whose output is, for each candidate, a probability that the raw title and that candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and from relations between languages.
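
The input/output contract described in the abstract (one raw title, a list of candidates, one probability per candidate) can be illustrated with a minimal Python sketch. This is not the patented network: the hashed character-trigram embedding, the names `embed` and `match_probabilities`, and the constants `DIM`, `scale`, and `bias` are all assumptions chosen for illustration, and nothing here is trained. The sketch only shows how a representation that skips language identification can still pick up overlap from loan words such as "software".

```python
import math
import zlib

DIM = 4096  # hypothetical embedding width, not taken from the source


def char_ngrams(text, n=3):
    """Split a title into overlapping character n-grams (no language ID step)."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(text):
    """Hash character trigrams into a fixed-size, L2-normalized count vector.

    Assumption: this stands in for a learned encoder; a real model would
    learn these representations from multilingual training data.
    """
    vec = [0.0] * DIM
    for gram in char_ngrams(text):
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def match_probabilities(raw_title, candidates, scale=8.0, bias=-3.0):
    """Return one independent probability per candidate title.

    `scale` and `bias` are hypothetical stand-ins for learned parameters;
    the score is a sigmoid over the cosine similarity of the two embeddings.
    """
    query = embed(raw_title)
    probs = {}
    for cand in candidates:
        cosine = sum(a * b for a, b in zip(query, embed(cand)))
        probs[cand] = 1.0 / (1.0 + math.exp(-(scale * cosine + bias)))
    return probs


if __name__ == "__main__":
    scores = match_probabilities(
        "Ingeniero de Software",
        ["Software Engineer", "Registered Nurse"],
    )
    for title, p in scores.items():
        print(f"{title}: {p:.3f}")
```

Because the Spanish query and the English candidate share the loan word "software", their trigram vectors overlap and that candidate receives the higher score, even though no step identifies the query language. A trained model would replace the fixed hashing and constants with learned parameters.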
