Language Agnostic Machine Learning Model for Title Standardization


Abstract

In an example embodiment, a system is provided whereby a machine learning model is trained to predict a standardization for a given raw title. A neural network may be trained whose input is a raw title (such as a query string) and a list of candidate titles (either title identifications in a taxonomy, or English strings), which produces a probability that the raw title and each candidate belong to the same title. The model is able to standardize titles in any language included in the training data without first having to perform language identification or normalization of the title. Additionally, the model is able to benefit from the existence of “loan words” (words adopted from a foreign language with little or no modification) and relations between languages.
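The scoring interface described above (one raw title and a list of candidates in, one probability per candidate out) can be sketched as follows. This is only an illustration under stated assumptions, not the patented model: the abstract does not disclose the network architecture, so hashed character trigrams stand in for a learned encoder, and `weight`, `bias`, and `DIM` are hypothetical hand-picked values where a trained network would have learned parameters. Character-level features are used because they need no language identification and let loan words such as "software" match across languages.

```python
import math
import zlib
from collections import Counter

DIM = 4096  # hypothetical size of the hashed feature space


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased, space-padded text (no language ID, no tokenizer)."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def encode(text):
    """L2-normalized sparse bag of hashed character trigrams (stand-in for a learned encoder)."""
    counts = Counter(zlib.crc32(g.encode("utf-8")) % DIM for g in char_ngrams(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()} if norm else {}


def score(raw_title, candidates, weight=8.0, bias=-2.0):
    """Return one independent probability per candidate title.

    weight and bias are placeholders (assumptions) for parameters that a
    trained model would learn from labeled (raw title, standard title) pairs.
    """
    q = encode(raw_title)
    probs = []
    for cand in candidates:
        c = encode(cand)
        cosine = sum(v * c.get(k, 0.0) for k, v in q.items())
        probs.append(1.0 / (1.0 + math.exp(-(weight * cosine + bias))))
    return probs


# A Spanish raw title scored against English candidate titles.
candidates = ["Software Engineer", "Registered Nurse", "Accountant"]
probs = score("Ingeniero de Software", candidates)
best = candidates[max(range(len(probs)), key=probs.__getitem__)]
```

Here the Spanish query shares the trigrams of the loan word "software" with the first candidate, so that candidate receives the highest score without any language detection step. In a real system the encoder and the scoring parameters would be trained jointly on multilingual data rather than fixed by hand.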

Bibliographic details

Publication number: US2020097812A1

Patent type:
Publication date: 2020-03-26

Original format: PDF
Applicant/assignee: MICROSOFT TECHNOLOGY LICENSING LLC

Application number: US201816142441
Inventors: SEBASTIAN ALEXANDER CSAR; URI MERHAV; DAN SHACHAM

Filing date: 2018-09-26
Classification: G06N3/08; G06Q10/06; G06F17/30
Country: US
Database entry time: 2022-08-21 11:21:35
