PTQ vs. TAQ
c1. Post-Training Quantization (PTQ)
ZeroQ, EasyQuant, LAPQ, ACIQ,
DFQ
c2. Training-Aware Quantization (TAQ)
S&Q
3. Binary, Ternary, 3-4-5-bit, Flexible
c1. Binary
c2. Ternary
c3. 3-4-5-bit
c4. Flexible
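The PTQ methods listed above share a common core: derive a scale (and zero-point) from calibration statistics and map tensors to a low-bit integer grid, without retraining. A minimal sketch of uniform affine quantization under min-max calibration (function names are illustrative, not from any specific paper):

```python
import numpy as np

def calibrate(x, n_bits=8):
    """Derive scale and zero-point from min-max calibration statistics."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    """Map float values onto the unsigned integer grid."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** n_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values from integers."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round-trip error of each element is bounded by the scale, which is the basic accuracy/bit-width trade-off the papers below refine.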
2020
ECCV2020
- PWLQ: Post-Training Piecewise Linear Quantization for Deep Neural Networks. Samsung
CVPR2020
- ZeroQ: A Novel Zero-Shot Quantization Framework. Berkeley, Peking University
- AdaBits: Neural Network Quantization with Adaptive Bit-Widths. ByteDance. Quantizes models via three strategies: direct adaptation, progressive training, and joint training.
- BiDet: An Efficient Binarized Object Detector. CVPR2020
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy. CVPR2020 MIT. A NAS approach combining architecture search, channel pruning, and HAQ; more practical to apply, and the motivation fits well.
- IR-Net: Forward and Backward Information Retention for Accurate Binary Neural Networks. CVPR2020
ICLR2020
- LSQ: Learned Step Size Quantization ICLR2020
- Mixed Precision DNNs: All you need is a good parametrization ICLR2020 sony
- SAT: Rethinking neural network quantization. Scale-Adjusted Training. ICLR2020 (rejected)
- LLSQ: Learned Symmetric Quantization of Neural Networks for Low-precision Integer Hardware. ICLR2020 ICT
- HAWQv2: Hessian Aware trace-Weighted Quantization of Neural Networks
- SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency. IBM Watson Lab
- BBG: Balanced Binary Neural Networks with Gated Residual.
- EasyQuant: Post-training Quantization via Scale Optimization. Optimizes the cosine distance of each layer's outputs; the released code includes speedup experiments.
- LAPQ: Loss Aware Post-training Quantization. code intel AIPG. An evolution of ACIQ: a simpler method, a more direct implementation, and better results.
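EasyQuant's per-layer scale optimization described above can be sketched as a grid search over the weight scale that maximizes cosine similarity between the full-precision and quantized layer outputs (a simplified weights-only sketch with a single linear layer; names and the search grid are assumptions, not the paper's exact procedure):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two flattened tensors."""
    return float(a.ravel() @ b.ravel()
                 / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fake_quant(w, scale, n_bits=8):
    """Symmetric signed fake-quantization of weights at a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(w, x, n_bits=8, n_steps=100):
    """Grid-search the scale maximizing cosine similarity between the
    FP32 output y = x @ w and its fake-quantized counterpart."""
    y_fp = x @ w
    base = np.abs(w).max() / (2 ** (n_bits - 1) - 1)  # max-abs baseline scale
    best_s, best_sim = base, -1.0
    for alpha in np.linspace(0.5, 1.2, n_steps):
        s = alpha * base
        sim = cosine_sim(y_fp, x @ fake_quant(w, s, n_bits))
        if sim > best_sim:
            best_sim, best_s = sim, s
    return best_s, best_sim
```

EasyQuant additionally alternates this search between weight and activation scales layer by layer; the sketch shows only the inner scale search.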
2019
ICLR2019
- ACIQ (pre): Analytical Clipping for Integer Quantization of neural networks. ICLR2019 (rejected) intel AIPG
- Per-Tensor Fixed-point quantization of the back-propagation algorithm. ICLR2019
- RQ: Relaxed Quantization for discretized NNs. ICLR2019
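ACIQ's key idea is that clipping the tensor before quantization trades clipping error against rounding error. The paper derives the threshold analytically from an assumed distribution; an empirical sketch of the same trade-off, sweeping the threshold and keeping the MSE minimizer (illustrative, not the paper's closed-form solution):

```python
import numpy as np

def quant_mse(x, clip, n_bits=4):
    """MSE between x and its clipped, uniformly quantized reconstruction."""
    levels = 2 ** n_bits - 1
    scale = 2 * clip / levels            # symmetric range [-clip, clip]
    xc = np.clip(x, -clip, clip)
    xq = np.round(xc / scale) * scale
    return float(np.mean((x - xq) ** 2))

def best_clip(x, n_bits=4, n_steps=200):
    """Sweep clipping thresholds up to max|x| and pick the MSE minimizer."""
    grid = np.linspace(0.1, 1.0, n_steps) * float(np.abs(x).max())
    mses = [quant_mse(x, c, n_bits) for c in grid]
    return float(grid[int(np.argmin(mses))])
```

For bell-shaped weight distributions at 4 bits, the optimal threshold lands well below max|x|, which is why naive max-abs scaling loses accuracy at low bit-widths.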
NIPS2019
- ACIQ Post training 4-bit quantization of convolution networks for rapid-deployment. NIPS 2019 AIPG, Intel
ICCV2019
- DSQ: Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks. ICCV2019 SenseTime, Beihang
- DFQ: Data-Free Quantization through Weight Equalization and Bias Correction. Qualcomm
CVPR2019
- FQN: Fully Quantized Network for Object Detection. CVPR2019
- QIL: Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss CVPR2019
Other
- SAWB: Accurate and efficient 2-bit quantized neural networks. sysml2019
- SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks. 2019 AIPG, Intel
- Distributed Low Precision Training Without Mixed Precision. Oxford snowcloud.ai
- Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers. ICT, Cambricon. int8 for weights and activations, int16 for most of the gradients; speeds up training through quantization.
- WAGEUBN: Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers. Arguably a follow-up to WAGE.
2018
ICLR2018
- VNQ: Variational network quantization. ICLR2018
- WAGE: Training and Inference with Integers in Deep Neural Networks. ICLR2018 oral, tsinghua. Quantizes not only weights and activations but also errors and gradients.
- Alternating multi-bit quantization for recurrent neural networks. ICLR2018 alibaba
- Mixed Precision Training. FP16 training ICLR2018 baidu
- Model Compression via distillation and quantization. ICLR2018 google
- Quantized back-propagation: training binarized neural networks with quantized gradients. ICLR2018
CVPR2018
- Clip-Q: Deep network compression learning by In-Parallel Pruning Quantization. CVPR2018 SFU. Performs quantization and pruning jointly for better compression: first uses Bayesian optimization to search layer-wise (p, q), then fine-tunes to compress the weights as much as possible; quantization and compression of activations are not addressed.
- ELQ: Explicit loss-error-aware quantization for low-bit deep neural networks. CVPR2018 intel tsinghua
- Quantization and training of neural networks for efficient integer-arithmetic-only inference. CVPR2018 Google
- TSQ: two-step quantization for low-bit neural networks. CVPR2018
- SYQ: learning symmetric quantization for efficient deep neural networks. CVPR2018 xilinx
- Towards Effective Low-bitwidth Convolutional Neural Networks. CVPR2018
ECCV2018
- LQ-NETs: learned quantization for highly accurate and compact deep neural networks. ECCV2018 Microsoft
- Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved Representational capability and advanced training algorithm. ECCV2018 HKU
- V-Quant: Value-aware quantization for training and inference of neural networks. ECCV2018 facebook
NIPS2018
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks. NIPS2018 microsoft
- HAQ: Hardware-Aware automated quantization. NIPS workshop 2018 mit
- Scalable methods for 8-bits training of neural networks. NIPS2018 intel
AAAI2018
- From Hashing to CNNs: training binary weights via hashing. AAAI2018 nlpr
Other
- Synergy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. 2018 UC Berkeley
- Efficient Non-uniform quantizer for quantized neural network targeting Re-configurable hardware. 2018
- HALP: High-Accuracy Low-Precision Training. 2018 stanford
- PACT: parameterized clipping activation for quantized neural networks. 2018 IBM
- QUENN: Quantization engine for low-power neural networks. CF18ACM
- UNIQ: Uniform noise injection for non-uniform quantization of neural networks. 2018
- Training competitive binary neural networks from scratch. 2018
- A white-paper: Quantizing deep convolutional networks for efficient inference. 2018 google
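PACT, listed above, replaces ReLU with an activation clipped at a learnable ceiling α and quantizes uniformly on [0, α]; the gradient with respect to α flows through the clipped region via the straight-through estimator. A forward-pass sketch (α is a plain float here; the training loop and weight quantization are omitted):

```python
import numpy as np

def pact_forward(x, alpha, n_bits=4):
    """Clip activations to [0, alpha] (ReLU with a learnable ceiling),
    then fake-quantize uniformly over that range."""
    levels = 2 ** n_bits - 1
    y = np.clip(x, 0.0, alpha)
    scale = alpha / levels
    return np.round(y / scale) * scale

def pact_alpha_grad(x, alpha, upstream_grad):
    """Straight-through gradient of the loss w.r.t. alpha: sum of the
    upstream gradient over positions clipped at the ceiling (x >= alpha)."""
    return float(np.sum(upstream_grad * (x >= alpha)))
```

Because α is learned jointly with the weights, the clipping range adapts per layer instead of being fixed by calibration statistics.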
2017
- Flexpoint: an adaptive numerical format for efficient training of deep neural networks. 2017 intel
- INQ: Incremental network quantization, towards lossless CNNs with low-precision weights. ICLR2017 intel labs china
- TTQ: Trained ternary quantization. ICLR2017 stanford
- WRPN: wide reduced-precision networks. 2017 Accelerator Architecture Lab, Intel
- HWGQ: Deep Learning with Low Precision by Half-wave Gaussian Quantization. CVPR2017
- A Survey of Model Compression and Acceleration for Deep Neural Networks. 2017
- LP-SGD: Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent. ISCA2017
- How to Train a Compact Binary Neural Network with High Accuracy? NLPR, Microsoft
2015-2016
- Deep learning with limited numerical precision. 2015 IBM
- DoReFa-Net: Training low bit-width convolutional neural networks with low bit-width gradients. 2016
- BNN: Binarized Neural Networks. NIPS2016
- TWNs: Ternary weight networks. NIPS2016 ucas
- XNOR-Net: ImageNet Classification using binary convolutional neural networks. ECCV2016 washington
- Hardware-oriented approximation of convolutional neural networks. ICLR2016
- Quantized convolutional neural networks for mobile devices. CVPR2016 nlpr
Other
- Fixed point quantization of deep convolutional networks. 2016
- Training a binary weight object detector by knowledge transfer for autonomous driving. 2018
- Low-bit Quantization of Neural Networks for Efficient Inference. 2019 huawei
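For the binary entries above (BNN, XNOR-Net), weight binarization reduces to sign(W) with a per-filter scaling factor α = mean(|W|), which is the least-squares-optimal α for approximating W ≈ αB with B ∈ {−1, +1}. A sketch in the XNOR-Net style:

```python
import numpy as np

def binarize_xnor(w):
    """XNOR-Net-style weight binarization: W ~= alpha * sign(W),
    with alpha = mean(|W|) computed per output filter (first axis)."""
    alpha = np.abs(w).mean(axis=tuple(range(1, w.ndim)), keepdims=True)
    b = np.where(w >= 0, 1.0, -1.0)
    return alpha, b
```

The per-filter α is what separates XNOR-Net from plain BNN-style sign binarization and recovers much of the lost dynamic range at no extra inference cost beyond one multiply per filter.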