
Method for systematic mass normalization of titles


Abstract

A method for normalizing raw titles to canonical titles is described. The method includes designating a set of canonical titles, generating a set of n-grams for each canonical title, assigning a set of attributes to each n-gram, assigning a set of labels to each of the attributes, and storing the labeled canonical title and labeled n-grams in a database. In some examples, a new title may be mapped to an existing canonical title in the database by generating a set of n-grams for the new title, looking up the n-grams in the database of canonical titles, retrieving the set of labels assigned to n-grams in the database that match n-grams from the new title, and assigning those labels to the corresponding attributes of the new title. The new title may then be mapped to a canonical title on the basis of similarly labeled attributes.
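
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the job-title examples, the attribute names (`seniority`, `role`), the in-memory dictionaries standing in for the database, and the "count of matching labels" scoring rule are all assumptions made for the example.

```python
import re


def ngrams(title, max_n=3):
    """Return all word n-grams of a title for n = 1..max_n."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


# Hypothetical hand annotations: canonical title -> n-gram -> {attribute: label}.
ANNOTATIONS = {
    "senior software engineer": {
        "senior": {"seniority": "senior"},
        "software engineer": {"role": "software_engineer"},
    },
    "junior data analyst": {
        "junior": {"seniority": "junior"},
        "data analyst": {"role": "data_analyst"},
    },
}


def build_database(annotations):
    """Store labeled n-grams and labeled canonical titles (dicts stand in for a database)."""
    ngram_labels = {}
    canonical_labels = {}
    for title, labeled in annotations.items():
        merged = {}
        for gram in ngrams(title):
            labels = labeled.get(gram)
            if labels:
                ngram_labels.setdefault(gram, {}).update(labels)
                merged.update(labels)
        canonical_labels[title] = merged
    return ngram_labels, canonical_labels


def label_title(raw_title, ngram_labels):
    """Look up the raw title's n-grams and collect the labels of those that match."""
    labels = {}
    for gram in ngrams(raw_title):
        labels.update(ngram_labels.get(gram, {}))
    return labels


def map_title(raw_title, ngram_labels, canonical_labels):
    """Pick the canonical title sharing the most attribute labels (assumed scoring rule)."""
    labels = label_title(raw_title, ngram_labels)
    best, best_score = None, 0
    for title, canon in canonical_labels.items():
        score = sum(1 for attr, lab in canon.items() if labels.get(attr) == lab)
        if score > best_score:
            best, best_score = title, score
    return best  # None when no labeled n-gram matches


NGRAM_LABELS, CANONICAL_LABELS = build_database(ANNOTATIONS)
print(map_title("Senior Software Engineer II", NGRAM_LABELS, CANONICAL_LABELS))
```

Here the raw title "Senior Software Engineer II" shares the n-grams "senior" and "software engineer" with a canonical title, so it inherits both labels and is mapped to that title. A title with no matching n-grams returns `None`. How the abstract's method resolves ties or conflicting labels is not stated, so the sketch does not attempt to model that.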
