Conference: Natural Language Processing Pacific Rim Symposium

Novel statistical toolkit for large corpus processing and language model building



Abstract

In this paper, we present a series of algorithms and tools for processing large text corpora in order to build high-performance statistical language models. Our goal is that, given a raw corpus as input, accurate and robust topic-dependent language models can be obtained automatically. All the tools are based on three core technologies that we developed: lexicons with a tree structure, fuzzy training subsets, and neural-network-based detection of topic changes in text.
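The abstract names a tree-structured lexicon as one of the core technologies but gives no detail about it. The following is a minimal sketch, assuming the "tree structure" is a character trie used for dictionary lookup and for forward maximum-matching segmentation of unsegmented text (e.g. Chinese). The class and method names (`TrieLexicon`, `longest_match`, `segment`) are hypothetical and are not taken from the authors' toolkit.

```python
from typing import Dict, List


class TrieNode:
    """One node of a tree-structured lexicon; children are keyed by character."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root to here spells a word


class TrieLexicon:
    """Hypothetical tree-structured lexicon: words sharing a prefix share a path."""

    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest lexicon word starting at text[start], or 0 if none."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break  # no lexicon word continues with this character
            if node.is_word:
                best = i - start + 1
        return best

    def segment(self, text: str) -> List[str]:
        """Forward maximum matching; characters not covered by any word become 1-char tokens."""
        tokens: List[str] = []
        i = 0
        while i < len(text):
            length = self.longest_match(text, i) or 1
            tokens.append(text[i : i + length])
            i += length
        return tokens


if __name__ == "__main__":
    lexicon = TrieLexicon(["语言", "语言模型", "模型", "建立"])
    print(lexicon.segment("建立语言模型"))
```

With this lexicon, `segment("建立语言模型")` returns `["建立", "语言模型"]`. The longer entry wins over `语言` because matching is greedy. Looking up the longest match at a position costs one child lookup per character scanned, independent of how many words the lexicon holds. This is a common reason to store a large vocabulary as a tree.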
