Conference: Annual Meeting of the Association for Computational Linguistics

From Characters to Words to in Between: Do We Capture Morphology?

Abstract

Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we present experiments that systematically vary (1) the basic unit of representation, (2) the composition of these representations, and (3) the morphological typology of the language modeled. Our results extend previous findings that character representations are effective across typologies, and we find that a previously unstudied combination of character trigram representations composed with bi-LSTMs outperforms most others. But we also find room for improvement: none of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data.
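To make the idea of composing a word representation from subword units concrete, here is a minimal sketch that splits a word into character trigrams and sums one vector per trigram. The helper names (`char_ngrams`, `unit_vector`, `compose_by_addition`), the boundary markers `^` and `$`, and the pseudo-random vectors are all assumptions for illustration. Plain addition stands in for a composition function and is not the bi-LSTM composition the abstract refers to; a real model would learn its embeddings.

```python
import random


def char_ngrams(word, n=3, boundary=("^", "$")):
    """Return the character n-grams of `word`, padded with boundary markers."""
    padded = boundary[0] + word + boundary[1]
    if len(padded) < n:
        # Word too short to yield a full n-gram: keep the padded word itself.
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def unit_vector(unit, dim=8):
    """Hypothetical embedding lookup: a fixed pseudo-random vector per unit.

    Seeding with the unit string makes the vector reproducible; a trained
    model would use learned parameters instead.
    """
    rng = random.Random(unit)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def compose_by_addition(units, dim=8):
    """Compose a word vector by summing the vectors of its subword units."""
    total = [0.0] * dim
    for unit in units:
        for i, value in enumerate(unit_vector(unit, dim)):
            total[i] += value
    return total


trigrams = char_ngrams("cats")
word_vector = compose_by_addition(trigrams)
print(trigrams)
print(len(word_vector))
```

Under this scheme "cat" and "cats" share the trigrams `^ca` and `cat`, so their composed vectors have components in common. This is one way such representations could reflect morphological regularities.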
