Quantized Neural Networks for Edge Devices – Why do we still need smaller models in the age of large models?

Room E006

Type Talk

Degree Program / Chair / Company Chair of Processor Design

Neural network models have been growing rapidly over recent decades, and larger models are increasingly difficult to deploy on edge devices. Recent research has shown that neural networks can also work at much lower precision, even 1 bit. This presentation therefore briefly introduces three topics:

1. Why must we explore ultra-low-precision quantization for neural networks on edge devices?

2. What is a quantized neural network, and how does it perform at different precisions and across different models?

3. How can we train a quantized neural network and implement it as a hardware accelerator on an FPGA or ASIC?
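To make the second topic concrete, the sketch below illustrates what quantization does to a weight tensor: uniform symmetric quantization to a chosen bit width, and 1-bit binarization in the style of BinaryConnect/XNOR-Net (sign plus a mean-magnitude scale). This is a minimal NumPy illustration, not the specific scheme presented in the talk; the function names are my own.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniform symmetric quantization to `bits` bits, then de-quantization.

    The scale maps the largest absolute weight onto the largest integer
    level; values are rounded to the nearest level and clipped.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)   # integer levels
    return q * scale                                # back to float for comparison

def binarize(w):
    """1-bit quantization: keep only the sign, scaled by the mean magnitude."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

# Quantization error grows as precision drops.
for bits in (8, 4, 2):
    mse = np.mean((w - quantize_uniform(w, bits)) ** 2)
    print(f"{bits}-bit MSE: {mse:.5f}")
print(f"1-bit MSE: {np.mean((w - binarize(w)) ** 2):.5f}")
```

Storing only the integer levels (and one scale per tensor) is what shrinks the model: an 8-bit tensor needs a quarter of the memory of float32, and a binarized one needs 1/32, which is why such networks map well onto small FPGA/ASIC datapaths.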