Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop

Unwritten languages demand attention too! Word discovery with encoder-decoder models


Abstract

Word discovery is the task of extracting words from unsegmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten-language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision, and another with limited supervision in which the most frequent words are available. The results show that at least 27% of the gold-standard vocabulary can be retrieved by training an encoder-decoder neural machine translation system on only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. As a future perspective, this approach is also well suited to working directly from speech.
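
The abstract does not spell out how word boundaries are read off the translation alignments or how vocabulary retrieval is scored. The following is a minimal sketch under stated assumptions: a boundary is placed wherever two neighboring source symbols attend most strongly to different target words, and retrieval is measured as the fraction of gold-standard word types recovered. The function names, the toy attention matrix, and the scoring definition are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Sequence, Set


def segment_by_attention(symbols: Sequence[str],
                         attention: Sequence[Sequence[float]]) -> List[str]:
    """Split `symbols` wherever the most-attended target word changes.

    attention[i][j] is an assumed soft-alignment weight between source
    symbol i and target word j (for example, taken from an attention layer).
    """
    if len(symbols) != len(attention):
        raise ValueError("need one attention row per source symbol")
    words: List[str] = []
    current = ""
    previous_target = None
    for symbol, row in zip(symbols, attention):
        # Index of the target word this symbol attends to most strongly.
        target = max(range(len(row)), key=row.__getitem__)
        if previous_target is not None and target != previous_target:
            # Aligned target word changed: close the current segment.
            words.append(current)
            current = ""
        current += symbol
        previous_target = target
    if current:
        words.append(current)
    return words


def vocabulary_recall(discovered: Set[str], gold: Set[str]) -> float:
    """Fraction of gold-standard word types found among discovered types."""
    if not gold:
        return 0.0
    return len(discovered & gold) / len(gold)


# Toy input: 6 unsegmented symbols, 2 target words; the weights are made up.
symbols = list("thecat")
attention = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
segments = segment_by_attention(symbols, attention)
recall = vocabulary_recall(set(segments), {"the", "cat", "sat"})
```

In this toy case the first three symbols align to the first target word and the last three to the second, so a single boundary is inserted between them. Each discovered segment stays paired with a target word, which is how such alignments could feed a bilingual lexicon.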
