
DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION


Abstract

Embodiments of the present invention provide a method, system and computer program product for the domain-specific normalization of a corpus of text. In an embodiment of the invention, a method is provided for normalizing a corpus of text specific to a domain, the domain including an industrial, organizational, demographic or geographic domain. The method includes loading a corpus of text into the memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is inferred from words already present in the corpus of text. In another aspect of the embodiment, the domain is determined from metadata provided with the corpus of text.
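
The steps named in the abstract (determine a domain, retrieve that domain's lexicon, simplify the text) can be illustrated with a minimal Python sketch. This is not the patented implementation. The lexicon entries, the cue-word sets, the `domain` metadata key, and the function names `determine_domain` and `normalize` are all hypothetical, and the cue-word overlap count merely stands in for whatever inference the embodiment actually uses.

```python
import re

# Hypothetical per-domain lexicons: domain term -> plainer replacement.
LEXICONS = {
    "medical": {
        "myocardial infarction": "heart attack",
        "hypertension": "high blood pressure",
    },
    "legal": {"lessee": "tenant", "lessor": "landlord"},
}

# Hypothetical cue words used to infer a domain from the text itself.
DOMAIN_CUES = {
    "medical": {"patient", "diagnosis", "hypertension"},
    "legal": {"lessee", "lessor", "contract"},
}


def determine_domain(corpus, metadata=None):
    """Prefer metadata supplied with the corpus; otherwise infer from its words."""
    if metadata and metadata.get("domain") in LEXICONS:
        return metadata["domain"]
    words = set(re.findall(r"[a-z]+", corpus.lower()))
    # Score each domain by how many of its cue words appear in the corpus.
    scores = {d: len(words & cues) for d, cues in DOMAIN_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def normalize(corpus, metadata=None):
    """Replace domain-specific terms using the lexicon of the determined domain."""
    domain = determine_domain(corpus, metadata)
    if domain is None:
        return corpus  # no domain found: leave the text unchanged
    lexicon = LEXICONS[domain]
    # Try longer terms first so multi-word entries match before shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: lexicon[m.group(0).lower()], corpus)


if __name__ == "__main__":
    print(normalize("The patient has hypertension."))
    print(normalize("The lessee pays rent.", {"domain": "legal"}))
```

In this sketch, metadata takes priority over inference when both are available. That ordering is a design choice made for the example; the abstract presents the two as alternative aspects and does not rank them.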
