
Towards Lossless Encoding of Sentences


Abstract

Much work has been done on image compression via machine learning, but the compression of natural language has received comparatively little attention. Compressing text into lossless representations while keeping features easily retrievable is not a trivial task, yet it has huge benefits. Most methods designed to produce feature-rich sentence embeddings focus solely on performing well on downstream tasks and are unable to properly reconstruct the original sequence from the learned embedding. In this work, we propose a near-lossless method for encoding long sequences of text, as well as all of their sub-sequences, into feature-rich representations. We test our method on sentiment analysis and show good performance across all sub-sentence and sentence embeddings.
