Asian Language Processing, 2009 (IALP '09)

Semi-supervised Learning of Domain-Specific Language Models from General Domain Data



Abstract

We present a semi-supervised learning method for building domain-specific language models (LMs) from general-domain data. This method uses a small amount of domain-specific data as seeds to tap domain-specific resources residing in a larger amount of general-domain data, with the help of topic modeling technologies. The proposed algorithm first performs topic decomposition (TD) on the combined dataset of domain-specific and general-domain data using probabilistic latent semantic analysis (PLSA). It then derives domain-specific word n-gram counts using the mixture modeling scheme of PLSA. Finally, it uses the traditional n-gram modeling approach to construct domain-specific LMs from the domain-specific word n-gram counts. Experimental results show that this approach outperforms both state-of-the-art methods and a simulated supervised learning method on our data sets. In particular, the semi-supervised learning method achieves better performance even with a very small amount of domain-specific data.


