SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?



Abstract

This work investigates word-pieces (WPs), the most basic units underlying contextualized word embeddings such as BERT. In Morphologically-Rich Languages (MRLs) that exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused or intertwined, and cannot be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, we can improve WP tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We conjecture that this is due to the naive linear tokenization of words into word-pieces, and suggest that linguistically-informed word-piece schemes that make morphological knowledge explicit might boost performance for MRLs.
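To make the "naive linear tokenization" mentioned above concrete, here is a minimal sketch of greedy longest-match-first segmentation, the general scheme used by BERT-style WordPiece tokenizers. The function name `wordpiece_tokenize` and the toy vocabulary are illustrative assumptions, not taken from the paper; the paper's actual tokenizer and vocabulary are not given here.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into word-pieces by greedy longest-match-first lookup.

    Non-initial pieces carry the "##" continuation prefix. The split is
    strictly left to right, so it can only produce contiguous substrings.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for illustration.
toy_vocab = {"liv", "##ing", "life", "##s"}
pieces = wordpiece_tokenize("living", toy_vocab)
unknown = wordpiece_tokenize("xyz", toy_vocab)
print(pieces)
print(unknown)
```

Because every piece is a contiguous slice of the surface string, a segmentation of this kind has no way to isolate morphemes that are fused or interleaved within a word, which is the limitation the abstract raises for MRLs.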
