Novel statistical toolkit for large corpus processing and language model building

Abstract

In this paper, we present a series of algorithms and tools for processing large text corpora in order to build high-performance statistical language models. Our goal is to take a raw corpus as input and automatically obtain highly accurate, robust, topic-dependent language models. All the tools are built on three core technologies that we developed: tree-structured lexicons, fuzzy training subsets, and neural-network-based topic change detection in text.
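The abstract does not say how the tree-structured lexicon is organized. One common way to store a lexicon as a tree is a prefix tree (trie), in which words sharing a prefix share a path from the root, so the longest dictionary word at any position can be found in a single left-to-right walk. The sketch below assumes that reading and pairs it with greedy forward maximum matching, a standard segmentation method for text without word delimiters such as Chinese. The names `TreeLexicon`, `longest_match` and `segment` are hypothetical, and this is not presented as the authors' implementation.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node of a prefix tree; children are keyed by a single character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the root-to-here path spells a lexicon entry


class TreeLexicon:
    """Hypothetical tree-structured lexicon.

    Assumption: the paper's "lexicon with tree structure" is modeled here
    as a character trie; the original design is not described in the abstract.
    """

    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        # Walk down the tree, creating nodes for characters not yet present.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Return the length of the longest lexicon word starting at text[start], or 0."""
        node: Optional[TrieNode] = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break  # no lexicon word continues with this character
            if node.is_word:
                best = i - start + 1
        return best

    def segment(self, text: str) -> List[str]:
        """Greedy forward maximum matching; unknown characters become one-character tokens."""
        tokens: List[str] = []
        i = 0
        while i < len(text):
            n = self.longest_match(text, i) or 1
            tokens.append(text[i:i + n])
            i += n
        return tokens


lexicon = TreeLexicon(["statistical", "stat", "language", "model", "models"])
print(lexicon.segment("statisticallanguagemodels"))  # prints ['statistical', 'language', 'models']
```

Because each lookup follows at most one path of the tree, the cost of finding the longest match grows with the length of that match rather than with the size of the lexicon.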

Bibliographic record

Source: Natural language processing Pacific Rim symposium, 1999, 6 pages
Authors: Langzhou Chen; Taiyi Huang
Original format: PDF
Chinese Library Classification: Machine translation

Similar literature

1. Building Statistical Language Models for Persian Continuous Speech Recognition Systems Using the Peykare Corpus [J]. Mohammad Bahrani, Hossein Sameti. International journal of computer processing of languages. 2011, No. 1

2. Robust Language Modeling for a Small Corpus of Target Tasks Using Class-Combined Word Statistics and Selective Use of a General Corpus [J]. Yosuke Wada, Norihiko Kobayashi, Tetsunori Kobayashi. Systems and Computers in Japan. 2003, No. 12

3. Statistical analysis of orthographic and phonemic language corpus for word-based and phoneme-based Polish language modelling [J]. Piotr Kłosowski. EURASIP journal on audio, speech, and music processing. 2017, No. 1

4. Novel statistical toolkit for large corpus processing and language model building [C]. Langzhou Chen, Taiyi Huang. Natural language processing Pacific Rim symposium. 1999

5. Corpus patterns and elicited language: Implications for language storage and processing [D]. Nordquist, Dawn. 2006

6. CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines [O]. Ergin Soysal, Jingqi Wang, Min Jiang. 2017