PTQ vs. TAQ
c1. Post-Training Quantization (PTQ)
ZeroQ, EasyQuant, LAPQ, ACIQ,
DFQ
c2. Training-Aware Quantization (TAQ)
S&Q
3. Binary, Ternary, 3-4-5-bit, Flexible
c1. Binary
c2. Ternary
c3. 3-4-5-bit
c4. Flexible
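The PTQ methods listed above share a common core: derive a scale (and zero-point) from calibration statistics and map tensors to a low-bit integer grid, without retraining. A minimal sketch of uniform affine quantization under min-max calibration (function names are illustrative, not from any specific paper):

```python
import numpy as np

def calibrate(x, n_bits=8):
    """Derive scale and zero-point from min-max calibration statistics."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    """Map float values onto the unsigned integer grid."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** n_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values from integers."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round-trip error of each element is bounded by the scale, which is the basic accuracy/bit-width trade-off the papers below refine.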
2020
ECCV2020
- PWLQ: Post-Training Piecewise Linear Quantization for Deep Neural Networks. Samsung
CVPR2020
- ZeroQ: A Novel Zero-Shot Quantization Framework. Berkeley, Peking University
- AdaBits: Neural Network Quantization with Adaptive Bit-Widths. ByteDance. Quantizes models via three strategies: direct adaptation, progressive training, and joint training.
- BiDet: An Efficient Binarized Object Detector. CVPR2020
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy. CVPR2020 MIT. A NAS approach combining architecture search, channel pruning, and HAQ; more practical to apply, and the motivation fits well.
- IR-Net: Forward and Backward Information Retention for Accurate Binary Neural Networks. CVPR2020
ICLR2020
- LSQ: Learned Step Size Quantization ICLR2020
- Mixed Precision DNNs: All you need is a good parametrization ICLR2020 sony
- SAT: Rethinking neural network quantization. Scale-Adjusted Training. ICLR2020 (rejected)
- LLSQ: Learned Symmetric Quantization of Neural Networks for Low-precision Integer Hardware. ICLR2020 ICT
- HAWQv2: Hessian Aware trace-Weighted Quantization of Neural Networks
- SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency. IBM Watson Lab
- BBG: Balanced Binary Neural Networks with Gated Residual.
- EasyQuant: Post-training Quantization via Scale Optimization. Optimizes the cosine distance of each layer's outputs; the released code includes speedup experiments.
- LAPQ: Loss Aware Post-training Quantization. code intel AIPG. An evolution of ACIQ: a simpler method, a more direct implementation, and better results.
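EasyQuant's per-layer scale optimization described above can be sketched as a grid search over the weight scale that maximizes cosine similarity between the full-precision and quantized layer outputs (a simplified weights-only sketch with a single linear layer; names and the search grid are assumptions, not the paper's exact procedure):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two flattened tensors."""
    return float(a.ravel() @ b.ravel()
                 / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fake_quant(w, scale, n_bits=8):
    """Symmetric signed fake-quantization of weights at a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(w, x, n_bits=8, n_steps=100):
    """Grid-search the scale maximizing cosine similarity between the
    FP32 output y = x @ w and its fake-quantized counterpart."""
    y_fp = x @ w
    base = np.abs(w).max() / (2 ** (n_bits - 1) - 1)  # max-abs baseline scale
    best_s, best_sim = base, -1.0
    for alpha in np.linspace(0.5, 1.2, n_steps):
        s = alpha * base
        sim = cosine_sim(y_fp, x @ fake_quant(w, s, n_bits))
        if sim > best_sim:
            best_sim, best_s = sim, s
    return best_s, best_sim
```

EasyQuant additionally alternates this search between weight and activation scales layer by layer; the sketch shows only the inner scale search.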
2019
ICLR2019
- ACIQ (pre): Analytical Clipping for Integer Quantization of neural networks. ICLR2019 (rejected) intel AIPG
- Per-Tensor Fixed-point quantization of the back-propagation algorithm. ICLR2019
- RQ: Relaxed Quantization for discretized NNs. ICLR2019
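ACIQ's key idea is that clipping the tensor before quantization trades clipping error against rounding error. The paper derives the threshold analytically from an assumed distribution; an empirical sketch of the same trade-off, sweeping the threshold and keeping the MSE minimizer (illustrative, not the paper's closed-form solution):

```python
import numpy as np

def quant_mse(x, clip, n_bits=4):
    """MSE between x and its clipped, uniformly quantized reconstruction."""
    levels = 2 ** n_bits - 1
    scale = 2 * clip / levels            # symmetric range [-clip, clip]
    xc = np.clip(x, -clip, clip)
    xq = np.round(xc / scale) * scale
    return float(np.mean((x - xq) ** 2))

def best_clip(x, n_bits=4, n_steps=200):
    """Sweep clipping thresholds up to max|x| and pick the MSE minimizer."""
    grid = np.linspace(0.1, 1.0, n_steps) * float(np.abs(x).max())
    mses = [quant_mse(x, c, n_bits) for c in grid]
    return float(grid[int(np.argmin(mses))])
```

For bell-shaped weight distributions at 4 bits, the optimal threshold lands well below max|x|, which is why naive max-abs scaling loses accuracy at low bit-widths.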
NIPS2019
- ACIQ Post training 4-bit quantization of convolution networks for rapid-deployment. NIPS 2019 AIPG, Intel
ICCV2019
- DSQ: Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks. ICCV2019 SenseTime, Beihang
- DFQ: Data-Free Quantization through Weight Equalization and Bias Correction. Qualcomm
CVPR2019
- FQN: Fully Quantized Network for Object Detection. CVPR2019
- QIL: Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss CVPR2019
Other
- SAWB: Accurate and efficient 2-bit quantized neural networks. sysml2019
- SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks. 2019 AIPG, Intel
- Distributed Low Precision Training Without Mixed Precision. Oxford snowcloud.ai
- Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers. ICT, Cambricon. int8 for weights and activations, int16 for most of the gradients; speeds up training through quantization.
- WAGEUBN: Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers. Arguably a follow-up to WAGE.
2018
ICLR2018
- VNQ: Variational network quantization. ICLR2018
- WAGE: Training and Inference with Integers in Deep Neural Networks. ICLR2018 oral, tsinghua. Quantizes not only weights and activations but also errors and gradients.
- Alternating multi-bit quantization for recurrent neural networks. ICLR2018 alibaba
- Mixed Precision Training. FP16 training ICLR2018 baidu
- Model Compression via distillation and quantization. ICLR2018 google
- Quantized back-propagation: training binarized neural networks with quantized gradients. ICLR2018
CVPR2018
- Clip-Q: Deep network compression learning by In-Parallel Pruning Quantization. CVPR2018 SFU. Performs quantization and pruning jointly for better compression: first uses Bayesian optimization to search layer-wise (p, q), then fine-tunes to compress the weights as much as possible; quantization and compression of activations are not addressed.
- ELQ: Explicit loss-error-aware quantization for low-bit deep neural networks. CVPR2018 intel tsinghua
- Quantization and training of neural networks for efficient integer-arithmetic-only inference. CVPR2018 Google
- TSQ: two-step quantization for low-bit neural networks. CVPR2018
- SYQ: learning symmetric quantization for efficient deep neural networks. CVPR2018 xilinx
- Towards Effective Low-bitwidth Convolutional Neural Networks. CVPR2018
ECCV2018
- LQ-NETs: learned quantization for highly accurate and compact deep neural networks. ECCV2018 Microsoft
- Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved Representational capability and advanced training algorithm. ECCV2018 HKU
- V-Quant: Value-aware quantization for training and inference of neural networks. ECCV2018 facebook
NIPS2018
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks. NIPS2018 microsoft
- HAQ: Hardware-Aware automated quantization. NIPS workshop 2018 mit
- Scalable methods for 8-bits training of neural networks. NIPS2018 intel
AAAI2018
- From Hashing to CNNs: training binary weights via hashing. AAAI2018 nlpr
Other
- Synergy: Algorithm-hardware co-design for convnet accelerators on embedded FPGAs. 2018 UC Berkeley
- Efficient Non-uniform quantizer for quantized neural network targeting Re-configurable hardware. 2018
- HALP: High-Accuracy Low-Precision Training. 2018 stanford
- PACT: parameterized clipping activation for quantized neural networks. 2018 IBM
- QUENN: Quantization engine for low-power neural networks. CF18ACM
- UNIQ: Uniform noise injection for non-uniform quantization of neural networks. 2018
- Training competitive binary neural networks from scratch. 2018
- A white-paper: Quantizing deep convolutional networks for efficient inference. 2018 google
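PACT, listed above, replaces ReLU with an activation clipped at a learnable ceiling α and quantizes uniformly on [0, α]; the gradient with respect to α flows through the clipped region via the straight-through estimator. A forward-pass sketch (α is a plain float here; the training loop and weight quantization are omitted):

```python
import numpy as np

def pact_forward(x, alpha, n_bits=4):
    """Clip activations to [0, alpha] (ReLU with a learnable ceiling),
    then fake-quantize uniformly over that range."""
    levels = 2 ** n_bits - 1
    y = np.clip(x, 0.0, alpha)
    scale = alpha / levels
    return np.round(y / scale) * scale

def pact_alpha_grad(x, alpha, upstream_grad):
    """Straight-through gradient of the loss w.r.t. alpha: sum of the
    upstream gradient over positions clipped at the ceiling (x >= alpha)."""
    return float(np.sum(upstream_grad * (x >= alpha)))
```

Because α is learned jointly with the weights, the clipping range adapts per layer instead of being fixed by calibration statistics.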
2017
- Flexpoint: an adaptive numerical format for efficient training of deep neural networks. 2017 intel
- INQ: Incremental network quantization, towards lossless CNNs with low-precision weights. ICLR2017 intel labs china
- TTQ: Trained ternary quantization. ICLR2017 stanford
- WRPN: wide reduced-precision networks. 2017 Accelerator Architecture Lab, Intel
- HWGQ: Deep Learning with Low Precision by Half-wave Gaussian Quantization. CVPR2017
- A Survey of Model Compression and Acceleration for Deep Neural Networks. 2017
- LP-SGD: Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent. ISCA2017
- How to Train a Compact Binary Neural Network with High Accuracy? NLPR, Microsoft
2015-2016
- Deep learning with limited numerical precision. 2015 IBM
- DoReFa-Net: Training low bit-width convolutional neural networks with low bit-width gradients. 2016
- BNN: Binarized Neural Networks. NIPS2016
- TWNs: Ternary weight networks. NIPS2016 ucas
- XNOR-Net: ImageNet Classification using binary convolutional neural networks. ECCV2016 washington
- Hardware-oriented approximation of convolutional neural networks. ICLR2016
- Quantized convolutional neural networks for mobile devices. CVPR2016 nlpr
Other
- Fixed point quantization of deep convolutional networks. 2016
- Training a binary weight object detector by knowledge transfer for autonomous driving. 2018
- Low-bit Quantization of Neural Networks for Efficient Inference. 2019 huawei
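For the binary entries above (BNN, XNOR-Net), weight binarization reduces to sign(W) with a per-filter scaling factor α = mean(|W|), which is the least-squares-optimal α for approximating W ≈ αB with B ∈ {−1, +1}. A sketch in the XNOR-Net style:

```python
import numpy as np

def binarize_xnor(w):
    """XNOR-Net-style weight binarization: W ~= alpha * sign(W),
    with alpha = mean(|W|) computed per output filter (first axis)."""
    alpha = np.abs(w).mean(axis=tuple(range(1, w.ndim)), keepdims=True)
    b = np.where(w >= 0, 1.0, -1.0)
    return alpha, b
```

The per-filter α is what separates XNOR-Net from plain BNN-style sign binarization and recovers much of the lost dynamic range at no extra inference cost beyond one multiply per filter.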