IEEE/ACM Transactions on Audio, Speech, and Language Processing

Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks

Abstract

Logographs (Chinese characters) have recursive structures (i.e., hierarchies of sub-units within each logograph) that carry phonological and semantic information. The developmental psychology literature suggests that native speakers leverage these structures when learning to read. Exploiting these structures could lead to better embeddings that benefit many downstream tasks. We propose building hierarchical logograph (character) embeddings from the recursive structures of logographs using treeLSTM, a recursive neural network. Using a recursive neural network imposes a prior on the mapping from logographs to embeddings, since the network must read in a logograph's sub-units in the order specified by its recursive structure. Based on human behavior in language learning and reading, we hypothesize that modeling logographs' structures with a recursive neural network should be beneficial. To verify this hypothesis, we consider two tasks: (1) predicting the Cantonese pronunciation of logographs from their structures and (2) language modeling. Empirical results show that the proposed hierarchical embeddings outperform baseline approaches. Diagnostic analysis suggests that hierarchical embeddings constructed using treeLSTM are less sensitive to distractors and are thus more robust, especially on complex logographs.
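The bottom-up composition described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract says only "treeLSTM", so the sketch assumes the binary (N-ary, N = 2) Tree-LSTM of Tai et al. (2015). That variant's position-specific recurrent matrices make the encoding depend on which sub-unit occupies which slot.

The sketch also assumes that each character is decomposed into a tree whose internal nodes are layout operators, written as Unicode Ideographic Description Characters such as ⿰ (left-right), and whose leaves are components. The decompositions, the names `BinaryTreeLSTM` and `embed`, the hidden size, and the random weights are all hypothetical.

```python
import math
import random

DIM = 4    # hypothetical hidden/embedding size
ARITY = 2  # binary layouts only, e.g. ⿰ (left-right) or ⿱ (top-bottom)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vsum(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


class BinaryTreeLSTM:
    """Hypothetical N-ary Tree-LSTM with N = 2: each child slot has its own
    recurrent matrices, so the first (left/top) and second (right/bottom)
    sub-units are not interchangeable."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.W = {g: mat() for g in "iofu"}        # node input -> gate
        self.b = {g: [0.0] * dim for g in "iofu"}  # gate biases
        self.U = {g: [mat() for _ in range(ARITY)] for g in "iou"}      # slot s -> gate
        self.Uf = [[mat() for _ in range(ARITY)] for _ in range(ARITY)]  # slot s -> forget gate k

    def node(self, x, children):
        """Combine input x with up to ARITY child (h, c) states; empty slots are zeros."""
        zero = [0.0] * self.dim
        kids = list(children) + [(zero, zero)] * (ARITY - len(children))
        hs = [h for h, _ in kids]

        def pre(W, b, Us):
            return vsum(matvec(W, x), b, *[matvec(Us[s], hs[s]) for s in range(ARITY)])

        i = [sigmoid(z) for z in pre(self.W["i"], self.b["i"], self.U["i"])]
        o = [sigmoid(z) for z in pre(self.W["o"], self.b["o"], self.U["o"])]
        u = [math.tanh(z) for z in pre(self.W["u"], self.b["u"], self.U["u"])]
        c = [ig * ug for ig, ug in zip(i, u)]
        for k, (_, c_k) in enumerate(kids):
            # each child k gets its own forget gate, conditioned on all slots
            f_k = [sigmoid(z) for z in pre(self.W["f"], self.b["f"], self.Uf[k])]
            c = [cc + f * ck for cc, f, ck in zip(c, f_k, c_k)]
        h = [og * math.tanh(cc) for og, cc in zip(o, c)]
        return h, c


def embed(tree, model, table):
    """Post-order: sub-units are encoded before the node that joins them."""
    if isinstance(tree, str):  # leaf component, e.g. 女
        return model.node(table[tree], [])
    op, kids = tree            # layout operator + ordered sub-units
    return model.node(table[op], [embed(k, model, table) for k in kids])


rng = random.Random(1)
table = {s: [rng.uniform(-1, 1) for _ in range(DIM)] for s in "⿰女子馬"}  # hypothetical lookup table
model = BinaryTreeLSTM(DIM)

h_hao, _ = embed(("⿰", ["女", "子"]), model, table)   # 好: 女 left, 子 right
h_ma, _ = embed(("⿰", ["女", "馬"]), model, table)    # 媽: 女 left, 馬 right
h_swap, _ = embed(("⿰", ["子", "女"]), model, table)  # same parts, order swapped
print([round(v, 3) for v in h_hao])
```

Because the tree is consumed bottom-up, 女 and 子 are encoded before the ⿰ node that joins them. Swapping the two components yields a different vector. A Child-Sum Tree-LSTM would instead sum the children's states and ignore their order within a node. That is one reason an order-sensitive variant was chosen for this sketch.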
