IEEE Computer Architecture Letters

HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization

Abstract

The recent advancement of natural language processing (NLP) models is the result of ever-increasing model sizes and datasets. Most modern NLP models adopt the Transformer-based architecture, whose main bottleneck lies in the self-attention mechanism. Because the computation required for self-attention grows rapidly with model size, self-attention has become the main challenge in deploying NLP models. Several prior works have sought to address this bottleneck, but most of them suffer from significant design overheads and additional training requirements. In this work, we propose HAMMER, a hardware-friendly approximate computing solution for self-attention employing mean-redistribution and linearization, which effectively increases the performance of the self-attention mechanism with low overheads. Compared to previous state-of-the-art self-attention accelerators, HAMMER improves performance by $1.2$-$1.6\times$ and energy efficiency by $1.2$-$1.5\times$.
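For context, the bottleneck the abstract refers to is standard scaled dot-product self-attention, whose score matrix and softmax scale quadratically with sequence length. The NumPy sketch below shows only that baseline computation; it does not implement HAMMER's mean-redistribution or linearization (those details are in the full paper), and all names are illustrative.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax along the given axis.
        x = x - np.max(x, axis=axis, keepdims=True)
        e = np.exp(x)
        return e / np.sum(e, axis=axis, keepdims=True)

    def self_attention(Q, K, V):
        # Standard scaled dot-product self-attention.
        # Q, K, V: (seq_len, d) arrays. The (seq_len x seq_len) score
        # matrix and its row-wise softmax are the quadratic-cost part
        # that approximation schemes such as HAMMER target.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)        # O(seq_len^2 * d)
        weights = softmax(scores, axis=-1)   # row-wise normalization
        return weights @ V                   # O(seq_len^2 * d)

    # Toy usage: cost grows quadratically with sequence length.
    rng = np.random.default_rng(0)
    seq_len, d = 128, 64
    Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
    out = self_attention(Q, K, V)
    print(out.shape)  # (128, 64)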
