A data distribution-aware full integer quantization training framework
DOI:
CSTR:
Author:
Affiliation:

School of Microelectronics, Tianjin University

Author bio:

Corresponding author:

CLC number:

TP391

Fund project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)




Abstract:

To improve the training speed of deep neural networks and reduce resource consumption on edge devices, quantization training methods have been extensively studied. Compared with floating-point or mixed-precision approaches, full integer quantization offers significant potential in edge training scenarios due to its strong hardware compatibility and high computational efficiency. However, conventional integer quantization struggles to adapt to the dynamic changes in data distributions during training, often leading to significant accuracy loss. To address this issue, a data distribution-aware full integer quantization training framework is proposed, which employs a piecewise quantization method to accurately handle long-tailed data distributions and incorporates an adaptive search method to dynamically adjust quantization parameters based on the data distribution. Experimental results for training ResNet models on multiple datasets show that the accuracy loss is no more than 2.44% compared with floating-point training, and that the proposed framework reduces accuracy loss by up to 90.61% compared with existing integer training methods. Furthermore, the framework is deployed on an FPGA; compared with a floating-point training framework, it saves 27% of memory resources and 53% of DSP resources, and reduces execution time by 53%.
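The abstract names two techniques, piecewise quantization for long-tailed data and an adaptive, distribution-driven search for quantization parameters, but gives no pseudocode. The sketch below is a rough illustration only, not the paper's method: all function names, the quantile candidate set, and the MSE search criterion are assumptions. It quantizes the dense body and the sparse tail of a tensor with separate scales, and grid-searches the breakpoint over quantiles of the observed distribution; a real integer-only implementation would also need to store a per-value region flag and avoid the floating-point arithmetic used here for clarity.

```python
import numpy as np

def piecewise_quantize(x, t, num_bits=8):
    # Hypothetical sketch: values whose magnitude is <= t form the dense
    # "body" and get a fine scale; the sparse long tail gets its own coarser
    # scale, so outliers do not stretch the body's quantization step.
    qmax = 2 ** (num_bits - 1) - 1
    body = np.abs(x) <= t
    scale_body = max(float(t), 1e-12) / qmax
    scale_tail = max(float(np.abs(x).max()), 1e-12) / qmax
    q = np.where(body,
                 np.clip(np.round(x / scale_body), -qmax, qmax),
                 np.clip(np.round(x / scale_tail), -qmax, qmax)).astype(np.int32)
    # Dequantize with the matching per-region scale (a real implementation
    # must also keep the body/tail flag alongside q).
    deq = np.where(body, q * scale_body, q * scale_tail)
    return q, deq

def adaptive_threshold_search(x, num_bits=8, quantiles=(0.5, 0.7, 0.9, 0.99)):
    # Assumed stand-in for the paper's adaptive search: try breakpoints drawn
    # from the observed distribution and keep the one with the lowest
    # reconstruction error, so the parameters track distribution shifts.
    best_t, best_err = None, np.inf
    for p in quantiles:
        t = np.quantile(np.abs(x), p)
        _, deq = piecewise_quantize(x, t, num_bits)
        err = float(np.mean((x - deq) ** 2))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Usage on a heavy-tailed tensor: the searched breakpoint adapts to the data.
x = np.random.standard_cauchy(10_000).astype(np.float32)
t = adaptive_threshold_search(x)
q, deq = piecewise_quantize(x, t)
print(f"breakpoint={t:.4f}  reconstruction MSE={np.mean((x - deq) ** 2):.6f}")
```

Splitting the range this way keeps the quantization step in the dense region small even when a few outliers dominate the dynamic range, which is the usual failure mode of a single-scale uniform quantizer on long-tailed data.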


History
  • Received: 2025-04-24
  • Revised: 2025-04-28
  • Accepted: 2025-05-23