Domain adaption based on lda and word embedding in SMT


Abstract

Current methods for domain adaptation in SMT mostly assume that a small in-domain sample is available at training time. However, the target domain may not be known at training time, so the resulting system may not suit the actual translation task or may fall far short of user needs. We instead propose a method that avoids this situation. Our method has two parts: (1) we use word embeddings and an LDA model to divide the training corpus into semantically similar subdomains; (2) for an actual source sentence, we select a more suitable translation system using semantic clues. We run experiments on two language pairs and observe consistent improvements over three baselines.
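The selection step in part (2) can be pictured with a minimal Python sketch. The abstract does not say how the semantic clues are computed, so everything here is an assumption made for illustration: the toy 2-d word vectors in `EMB`, the hand-made `SUBDOMAINS` (standing in for the output of the LDA and word-embedding partition in part (1)), and the use of cosine similarity between an averaged sentence vector and each subdomain centroid. It is not the authors' implementation.

```python
import math

# Toy 2-d word vectors; purely illustrative, not trained embeddings.
EMB = {
    "patient": (1.0, 0.1), "dose": (0.9, 0.2), "drug": (0.8, 0.0),
    "court": (0.1, 1.0), "law": (0.0, 0.9), "judge": (0.2, 0.8),
}

def sentence_vector(tokens):
    """Average the vectors of known tokens; return None if none is known."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(sentences):
    """Mean of the sentence vectors of one subdomain."""
    vecs = [v for v in map(sentence_vector, sentences) if v is not None]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def select_subdomain(tokens, centroids, default):
    """Pick the subdomain whose centroid is closest to the source sentence."""
    vec = sentence_vector(tokens)
    if vec is None:
        return default  # no semantic clue available: fall back
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))

# Hypothetical subdomains, as if part (1) had already partitioned the corpus.
SUBDOMAINS = {
    "medical": [["patient", "dose"], ["drug", "dose"]],
    "legal": [["court", "law"], ["judge", "law"]],
}
CENTROIDS = {name: centroid(sents) for name, sents in SUBDOMAINS.items()}
```

In such a setup, `select_subdomain(tokens, CENTROIDS, "general")` would return the name of the subdomain whose translation system should handle the sentence, falling back to a hypothetical general-purpose system when no token has an embedding.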

Bibliographic details

Source
International Conference on Asian Language Processing | 2017 | pp. 123-126 | 4 pages
Conference location: Singapore (SG)
Authors
Shaolin Zhu; Yating Yang; Xiao Li; Tonghai Jiang; Lei Wang; Xi Zhou; Chenggang Mi;
Author affiliations

The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Key Laboratory of Speech Language Information Processing of Xinjiang, Urumqi, China

Original format: PDF
Keywords
1/f noise;

Similar literature

1. Detecting new Chinese words from massive domain texts with word embedding [J]. Qian Yu, Du Yang, Deng Xiongwen. Journal of Information Science, 2019, No. 2
2. Out-domain Chinese new word detection with statistics-based character embedding [J]. Liang Yuzhi, Yang Min, Zhu Jia. Natural Language Engineering, 2019, No. PTa2
3. Evaluating semantic relations in neural word embeddings with biomedical and general domain knowledge bases [J]. Zhiwei Chen, Zhe He, Xiuwen Liu. BMC Medical Informatics and Decision Making, 2018, No. 2
4. Domain adaption based on lda and word embedding in SMT [C]. Shaolin Zhu, Yating Yang, Xiao Li. International Conference on Asian Language Processing, 2017
5. SMT-Based and Disjunctive Relational Abstract Domains for Static Analysis [D]. Chen, Junjie. 2015
6. Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes [O]. Yaoyun Zhang, Hee-Jin Li, Jingqi Wang. 2018
7. Domain Adapted Word Embeddings for Improved Sentiment Classification [O]. Prathusha Kameswara Sarma, Yingyu Liang, Bill Sethares. 2018
