DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION

Abstract

A method for the domain-specific normalization of a corpus of text, where the domain may be an industrial, organizational, demographic, or geographic domain. The method includes loading a corpus of text into a memory of a computer (310) and determining a domain for the corpus of text (320). The method also includes retrieving a lexicon of replacement words for the determined domain (330) and simplifying the corpus of text using the retrieved lexicon (340). The domain may be determined by inference from words already present in the corpus of text, or from supplied metadata. The lexicon may be a set of source terms, each of which can be mapped to one of multiple different replacement terms, the selected replacement term having a complexity value aligned with the average complexity score of those replacement terms.
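The four steps in the abstract (load, determine domain, retrieve lexicon, simplify) can be sketched in Python. This is only an illustrative reading of the abstract, not the patented implementation: the lexicon contents, the cue words, the complexity values, and every function name (`determine_domain`, `pick_replacement`, `normalize`) are hypothetical. Domain inference is approximated here by simple cue-word overlap, and "aligned with the average complexity score" is interpreted as choosing the candidate whose complexity value is closest to the mean of its candidates.

```python
import re
from statistics import mean

# Hypothetical per-domain lexicons:
# source term -> {candidate replacement: complexity value}.
LEXICONS = {
    "medical": {
        "myocardial infarction": {
            "heart attack": 2.0,
            "cardiac event": 3.0,
            "coronary": 4.0,
        },
        "hypertension": {"high blood pressure": 2.0},
    },
    "legal": {
        "plaintiff": {"person suing": 1.0, "claimant": 2.0, "complainant": 4.0},
    },
}

# Hypothetical cue words used to infer a domain from the text itself.
DOMAIN_CUES = {
    "medical": {"patient", "diagnosis", "hypertension"},
    "legal": {"court", "plaintiff", "statute"},
}


def determine_domain(corpus, metadata=None):
    """Prefer supplied metadata; otherwise infer from words in the corpus."""
    if metadata and metadata.get("domain") in LEXICONS:
        return metadata["domain"]
    words = set(re.findall(r"[a-z]+", corpus.lower()))
    # Pick the domain with the largest cue-word overlap
    # (on a tie, the first domain listed in DOMAIN_CUES wins).
    return max(DOMAIN_CUES, key=lambda d: len(words & DOMAIN_CUES[d]))


def pick_replacement(candidates):
    """Choose the candidate whose complexity is closest to the average."""
    avg = mean(candidates.values())
    return min(candidates, key=lambda term: abs(candidates[term] - avg))


def normalize(corpus, metadata=None):
    """Return (domain, simplified corpus) for an in-memory corpus string."""
    domain = determine_domain(corpus, metadata)
    lexicon = LEXICONS[domain]
    # Replace longer source terms first so multi-word terms are not split.
    for source in sorted(lexicon, key=len, reverse=True):
        replacement = pick_replacement(lexicon[source])
        pattern = r"\b" + re.escape(source) + r"\b"
        corpus = re.sub(pattern, lambda _m: replacement, corpus,
                        flags=re.IGNORECASE)
    return domain, corpus
```

Calling `normalize("The patient had a myocardial infarction and hypertension.")` infers the `medical` domain from the cue words and rewrites both terms, while passing `{"domain": "legal"}` as metadata skips inference. A production system would likely replace the cue-word heuristic with a trained classifier and load lexicons from storage rather than from in-code dictionaries.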
