Method for building linguistic models from a corpus

Abstract

A method iteratively integrates clustering techniques with phrase acquisition techniques to build complex linguistic models from a corpus. A set of features is initialized from the corpus. Thereafter, the method determines, according to a predetermined cost function, whether to process the features by phrase clustering processing or by phrase grammar learning processing. If phrase clustering processing is performed, the method then processes an interstitial set of features, comprising both the old features and the newly established clusters, by phrase grammar learning processing. The features obtained as the output of phrase grammar learning are re-indexed as the set of features for a subsequent iteration. The method may be repeated over several iterations to build a hierarchical linguistic model.
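
The control flow described in the abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the patented implementation: the names `cluster_features`, `learn_phrases`, `build_model` and the `use_clustering` callback (standing in for the predetermined cost function) are hypothetical, clustering is reduced to merging tokens that occur in identical left/right contexts, and phrase grammar learning is reduced to fusing the most frequent adjacent pair.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set, Tuple

Corpus = List[List[str]]


def cluster_features(corpus: Corpus) -> Dict[str, str]:
    """Toy clustering (assumption): merge tokens whose sets of
    (left, right) neighbours are identical. Returns token -> cluster label."""
    contexts: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for left, tok, right in zip(padded, padded[1:], padded[2:]):
            contexts[tok].add((left, right))
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for tok, ctx in contexts.items():
        groups[frozenset(ctx)].append(tok)
    mapping: Dict[str, str] = {}
    for members in groups.values():
        if len(members) > 1:  # singletons stay as plain tokens
            label = "C{" + "|".join(sorted(members)) + "}"
            for tok in members:
                mapping[tok] = label
    return mapping


def learn_phrases(corpus: Corpus, min_count: int = 2) -> Corpus:
    """Toy phrase acquisition (assumption): fuse the single most frequent
    adjacent pair into one phrase token, if it occurs at least min_count times."""
    pairs = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
    if not pairs:
        return corpus
    (a, b), count = pairs.most_common(1)[0]
    if count < min_count:
        return corpus
    fused: Corpus = []
    for sent in corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                out.append(f"{a}_{b}")
                i += 2
            else:
                out.append(sent[i])
                i += 1
        fused.append(out)
    return fused


def build_model(
    corpus: Corpus,
    use_clustering: Callable[[Set[str], int], bool],  # stand-in for the cost function
    iterations: int = 3,
) -> Tuple[Dict[str, int], Corpus]:
    # Initialize the feature set from the corpus.
    features = {tok for sent in corpus for tok in sent}
    index = {feat: i for i, feat in enumerate(sorted(features))}
    for it in range(iterations):
        if use_clustering(features, it):
            mapping = cluster_features(corpus)
            corpus = [[mapping.get(tok, tok) for tok in sent] for sent in corpus]
            # Interstitial set: old features plus newly established clusters.
            features = features | set(mapping.values())
        # Phrase learning runs in either branch.
        corpus = learn_phrases(corpus)
        features = features | {tok for sent in corpus for tok in sent}
        # Re-index the features for the next iteration.
        index = {feat: i for i, feat in enumerate(sorted(features))}
    return index, corpus
```

Calling `build_model(corpus, lambda feats, it: it == 0, iterations=2)` would cluster once and then run two rounds of phrase fusion, so later phrase tokens can be built from cluster labels. That nesting is what makes the result hierarchical in this sketch. A real cost function would presumably compare some measure of model quality for the two options, which the abstract does not specify.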

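Bibliographic details

  • Publication number: US6415248B1
  • Publication date: 2002-07-02
  • Original format: PDF
  • Applicant/assignee: AT&T CORP.
  • Application number: US19990443891
  • Inventors: SRINIVAS BANGALORE; GIUSEPPE RICCARDI
  • Filing date: 1999-11-19
  • Classification: G06F172/00; G10L130/00
  • Country: US
  • Date added to database: 2022-08-22 00:47:17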
