Unwritten languages demand attention too! Word discovery with encoder-decoder models

Abstract

Word discovery is the task of extracting words from unsegmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten-language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision, and another with limited supervision in which the most frequent words are available. The results show that it is possible to retrieve at least 27% of the gold-standard vocabulary by training an encoder-decoder neural machine translation system on only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. Looking ahead, this approach is also well suited to working directly from speech.
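
The abstract says words are retrieved through the alignments of an encoder-decoder model but does not describe the exact procedure. The minimal Python sketch below assumes one simple rule, which is an illustration and not necessarily the paper's method: each symbol of an unsegmented sentence is assigned to the translation word that receives its highest attention weight, and consecutive symbols sharing that word form one segment. It also assumes the vocabulary figure is a type-level recall, meaning the share of gold-standard word types found among the discovered types. The function names, attention values, and toy sentence are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def segment_from_attention(
    symbols: Sequence[str], attention: Sequence[Sequence[float]]
) -> List[str]:
    """Split an unsegmented symbol sequence using an attention matrix.

    Hypothetical rule (an assumption, not the paper's confirmed method):
    attention[i][j] is the weight linking symbol i to translation word j;
    consecutive symbols whose highest-weight word is the same form one segment.
    """
    if len(symbols) != len(attention):
        raise ValueError("expected one attention row per symbol")
    segments: List[str] = []
    prev_word: Optional[int] = None
    for symbol, row in zip(symbols, attention):
        # Index of the translation word with the highest weight for this symbol.
        best_word = max(range(len(row)), key=row.__getitem__)
        if best_word == prev_word:
            segments[-1] += symbol  # same aligned word: extend the current segment
        else:
            segments.append(symbol)  # new aligned word: start a new segment
        prev_word = best_word
    return segments


def vocabulary_recall(discovered: Set[str], gold: Set[str]) -> float:
    """Share of gold-standard word types that also appear among discovered types."""
    if not gold:
        return 0.0
    return len(discovered & gold) / len(gold)


if __name__ == "__main__":
    # Toy data: "thedog" paired with a two-word translation ["le", "chien"].
    attention = [[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3
    segments = segment_from_attention(list("thedog"), attention)
    print(segments)  # ['the', 'dog']
    print(vocabulary_recall(set(segments), {"the", "dog", "cat", "a"}))  # 0.5
```

With the toy input, the symbols of `thedog` split into `the` and `dog`, and two of the four gold types are found, giving a recall of 0.5. A greedy argmax rule like this merges adjacent words that attend to the same translation word, so it is best read as a simple baseline rather than a tuned segmenter.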

Bibliographic record

Source: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017, pp. 458-465 (8 pages)
Conference location: Okinawa (JP)
Authors: Marcely Zanon Boito; Alexandre Bérard; Aline Villavicencio; Laurent Besacier
Author affiliations:

Laboratoire d'Informatique de Grenoble, Univ. Grenoble Alpes (UGA), France;

Laboratoire d'Informatique de Grenoble, Univ. Grenoble Alpes (UGA), France;

Institute of Informatics, UFRGS, Brazil;

Laboratoire d'Informatique de Grenoble, Univ. Grenoble Alpes (UGA), France;

Original format: PDF
Language: English
Keywords: Speech; Task analysis; Training; Documentation; Computational modeling; Smoothing methods; Vocabulary