A Method for the N-gram Language Modeling Based on Keyword

Abstract

PURPOSE: A method for constructing a keyword-based N-gram language model is provided. The parts of speech needed to convey meaning are defined as keyword parts of speech, a corpus consisting of keyword parts of speech is extracted from a large-scale corpus, and a keyword-based N-gram language model is built from it. CONSTITUTION: A text corpus is preprocessed so that it contains only Hangul characters (S201). The morphemes composing the paragraphs of the preprocessed text corpus are analyzed and tagged with their parts of speech (S202). The morphemes are merged into pseudo-morpheme units (S203). A sentence corpus consisting of keyword parts of speech is extracted from the processed text corpus (S204). A keyword vocabulary dictionary is compiled from the extracted sentence corpus (S205). A keyword-based N-gram language model and a keyword pronunciation dictionary are constructed from the keyword vocabulary dictionary (S206, S207).
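
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not list the keyword parts of speech, so the tag set `KEYWORD_TAGS` (Sejong-style noun, verb, and adjective tags) is hypothetical, as are all function names. The input is assumed to be already morpheme-analyzed and tagged (S202). Pseudo-morpheme merging (S203) and the pronunciation dictionary (S207) are omitted, and the unsmoothed maximum-likelihood bigram estimate stands in for whatever estimation the patent actually uses.

```python
import re
from collections import Counter

# Assumption: "Hangul characters" means precomposed syllables U+AC00..U+D7A3.
NON_HANGUL = re.compile(r"[^\uAC00-\uD7A3\s]")

# Assumption: hypothetical keyword tag set; the abstract does not specify one.
KEYWORD_TAGS = {"NNG", "NNP", "VV", "VA"}


def keep_hangul(text):
    """S201: replace every non-Hangul, non-space character and squeeze spaces."""
    return " ".join(NON_HANGUL.sub(" ", text).split())


def extract_keywords(tagged_sentence):
    """S204: keep only morphemes whose tag is in the keyword tag set."""
    return [morph for morph, tag in tagged_sentence if tag in KEYWORD_TAGS]


def build_vocabulary(keyword_sentences):
    """S205: count how often each keyword occurs in the extracted corpus."""
    return Counter(word for sent in keyword_sentences for word in sent)


def count_ngrams(keyword_sentences, n=2):
    """S206: count N-grams over keyword sentences with boundary padding."""
    counts = Counter()
    for sent in keyword_sentences:
        padded = ["<s>"] * (n - 1) + list(sent) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def bigram_probability(bigrams, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    total = sum(c for (p, _), c in bigrams.items() if p == prev)
    return bigrams[(prev, word)] / total if total else 0.0


# Toy input: two sentences assumed to be tagged by an external analyzer (S202).
tagged_corpus = [
    [("나", "NP"), ("는", "JX"), ("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("ㄴ다", "EF")],
    [("학교", "NNG"), ("가", "JKS"), ("크", "VA"), ("다", "EF")],
]
keyword_sentences = [extract_keywords(s) for s in tagged_corpus]
vocabulary = build_vocabulary(keyword_sentences)
bigrams = count_ngrams(keyword_sentences, n=2)
```

Because particles and endings are dropped before counting, the bigram context spans content words that were not adjacent in the original sentence, which appears to be the point of modeling on keywords only.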

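Bibliographic data

  • Publication number: KR100474359B1
  • Patent type:
  • Publication date: 2005-03-10
  • Original format: PDF
  • Applicant/patentee:
  • Application number: KR20020079354
  • Inventors: 김현숙; 정의정; 전형배; 이영직
  • Filing date: 2002-12-12
  • Classification: G10L15/02
  • Country: KR
  • Date added to database: 2022-08-21 22:04:05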
