Australasian Joint Conference on Artificial Intelligence

A Generative Deep Learning for Generating Korean Abbreviations

Abstract

An abbreviation is a short form of a sequence of words or phrases. Abbreviations have long been used as an efficient way of communicating within a community, and they are now used more widely and more often because electronic communication channels such as the World Wide Web and Twitter have become available. One critical issue is that new abbreviations are generated continuously whenever new material, such as a novel or a TV drama, appears. Therefore, a method for understanding the generation and detection of abbreviations is required for their further processing. The simplest and best-known approach to abbreviation generation is to use rules designed by human experts, but such rule-based methods are not appropriate for Korean abbreviations, for two major reasons. First, Korean abbreviations are generated much more irregularly than English ones, so the rules become too complex to cover all the irregularities. Second, many Korean abbreviations contain characters or syllables that do not appear in the original sequence of words, owing to pronunciation. As a result, a great number of rules for generating new characters or syllables would have to be written, which makes rule-based methods impractical. As a solution to this problem, this paper proposes a generative deep learning architecture for generating Korean abbreviations. The proposed architecture consists of two Long Short-Term Memory (LSTM) networks: one LSTM encodes a variable-length source sequence into a fixed-length vector, and the other decodes that vector into a shorter, variable-length target sequence. In our experiments on the Korean abbreviation set from the National Institute of Korean Language, the proposed method achieves an accuracy of 21.4%, a 420% improvement in accuracy over a simple rule-based method. This result shows that the proposed method is effective in generating Korean abbreviations.
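The two-LSTM encoder-decoder data flow described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the hidden size, the toy character vocabulary, the one-hot inputs, the reuse of `<eos>` as the start symbol, the greedy decoding, and the helper names (`LSTMCell`, `encode`, `decode`). Because the weights are random, the decoded string is meaningless; the sketch only shows that inputs of different lengths map to a vector of the same length, and that the decoder emits a sequence no longer than a chosen limit.

```python
import math
import random

random.seed(0)

HIDDEN = 8  # assumed size of the fixed-length vector
VOCAB = ["<eos>"] + list("남자친구여생일파티 ")  # assumed toy character vocabulary


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


class LSTMCell:
    """A standard LSTM cell with random (untrained) weights."""

    def __init__(self, input_size, hidden_size):
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: rand_matrix(hidden_size, input_size + hidden_size) for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def _gate(self, name, act, z):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.w[name], self.b[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", sigmoid, z)
        f = self._gate("f", sigmoid, z)
        o = self._gate("o", sigmoid, z)
        g = self._gate("g", math.tanh, z)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode(cell, text):
    """Read a variable-length string and return the final (h, c) state."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for ch in text:
        h, c = cell.step(one_hot(VOCAB.index(ch), len(VOCAB)), h, c)
    return h, c


def decode(cell, w_out, state, max_len):
    """Greedily emit characters from the encoded state until <eos> or max_len."""
    h, c = state
    prev = 0  # assumption: <eos> doubles as the start symbol
    out = []
    for _ in range(max_len):
        h, c = cell.step(one_hot(prev, len(VOCAB)), h, c)
        scores = [sum(w * v for w, v in zip(row, h)) for row in w_out]
        prev = max(range(len(VOCAB)), key=scores.__getitem__)
        if prev == 0:
            break
        out.append(VOCAB[prev])
    return "".join(out)


encoder = LSTMCell(len(VOCAB), HIDDEN)
decoder = LSTMCell(len(VOCAB), HIDDEN)
w_out = rand_matrix(len(VOCAB), HIDDEN)  # projects the hidden state to vocabulary scores

for source in ["남자 친구", "생일 파티"]:
    state = encode(encoder, source)
    # Cap the output at the source length, since an abbreviation is shorter.
    candidate = decode(decoder, w_out, state, max_len=len(source))
    print(source, len(state[0]), repr(candidate))
```

In a trained system the weights would be learned from pairs of full expressions and their abbreviations, which is how such a model could in principle produce syllables that never occur in the source; the sketch above does no training at all.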
