IEEE/ACM Transactions on Audio, Speech, and Language Processing

Effective Subword Segmentation for Text Comprehension

Abstract

Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or complex words. However, a character is not a natural minimal linguistic unit for representation or for composing word embeddings, because character-level modeling ignores the linguistic coherence of consecutive characters inside a word. This paper presents a general subword-augmented embedding framework for learning and composing computationally derived subword-level representations. We survey a series of unsupervised segmentation methods for subword acquisition and several subword-augmented strategies for text understanding, and show that subword-augmented embedding significantly improves our baselines on various types of text understanding tasks on both English and Chinese benchmarks.
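The abstract does not name the unsupervised segmentation methods it surveys. To illustrate what "computationally derived" subwords can look like, here is a minimal sketch of byte pair encoding (BPE), one widely used unsupervised merge-based method. It is an illustrative choice, not a claim about the paper's exact procedure. The function names `learn_bpe`, `segment`, and `_merge`, the `</w>` end-of-word marker, and the toy word counts are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every left-to-right occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge operations from a (hypothetical) word-frequency table."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {_merge(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a (possibly unseen) word into subwords by replaying merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, num_merges=5)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
    print(segment("lowest", merges))  # ['low', 'est</w>']
```

The unseen word "lowest" is split into "low" and "est</w>", units shared with frequent training words, rather than into isolated characters. In a subword-augmented setup, units like these would then be looked up in an embedding table and composed into a word-level representation. That composition step is not shown here, and the paper's own composition strategies may differ.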