
TensorQuant


Features

Overview

TensorQuant is a toolbox for TensorFlow that allows users to investigate the effects of various quantization methods on deep neural networks. Its original purpose is to emulate custom numerical formats in deep neural networks. No changes to the local TensorFlow installation are required, and the toolbox remains highly compatible across TensorFlow versions.

Given a model of a DNN written in TensorFlow, the user specifies a dictionary that maps layer IDs to quantization methods. After some minor modifications to the original model description, TensorQuant can quantize the model with any user-defined method or with one of the methods provided by the toolbox.
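As a minimal sketch of this idea, such a dictionary could look like the following. The layer IDs and the quantizer function here are hypothetical placeholders, not TensorQuant's actual API; the real layer names depend on the model, and the real quantizers ship with the toolbox.

```python
import tensorflow as tf

# Hypothetical stand-in for a quantization method: round values to a coarse grid.
# TensorQuant provides its own quantizers; this is for illustration only.
def round_to_grid(x, steps=256.0):
    return tf.round(x * steps) / steps

# Hypothetical layer IDs; they must match the layer names used in the model description.
quantizer_map = {
    "Conv2d_1a": round_to_grid,  # quantize this layer's output
    "Logits": None,              # leave this layer in full precision
}
```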

Workflow

TensorQuant quantizes operators and variables by looping additional nodes into the computational graph. These insertions can reach very deep into the graph, which effectively allows the emulation of custom numerical formats, for example fixed-point formats.
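The sketch below illustrates the concept of looping a quantization node in directly after an existing operator, here emulating a signed fixed-point format with plain TensorFlow ops. It shows the idea only and does not use TensorQuant's own graph-rewriting machinery; the function name and bit widths are illustrative assumptions.

```python
import tensorflow as tf

def fixed_point(x, int_bits=4, frac_bits=12):
    # Round to the nearest multiple of 2**-frac_bits and saturate
    # at +/-(2**int_bits - 2**-frac_bits) to emulate overflow behavior.
    scale = 2.0 ** frac_bits
    limit = 2.0 ** int_bits - 1.0 / scale
    y = tf.round(x * scale) / scale
    return tf.clip_by_value(y, -limit, limit)

# Loop the quantization node in right after a convolution:
x = tf.random.normal([1, 8, 8, 3])
w = tf.random.normal([3, 3, 3, 16])
conv = tf.nn.conv2d(x, w, strides=1, padding="SAME")
conv_q = fixed_point(conv)  # downstream ops consume the quantized tensor
```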

Looping In Quantization Nodes

Quantization can be applied not only during inference but also during training. This makes it possible to train models with arbitrary numerical formats, for example with binary, ternary, or mixed-precision weights.
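A common way to make such quantized training work is a straight-through estimator: the forward pass uses the quantized weights while the gradient bypasses the non-differentiable rounding. The sketch below shows this with a generic TensorFlow custom gradient for binary weights; the function and variable names are assumptions for illustration and not part of the toolbox's API.

```python
import tensorflow as tf

@tf.custom_gradient
def binarize(w):
    # Forward pass: binary weights in {-1, +1}.
    # Backward pass: pass the incoming gradient straight through (STE).
    def grad(dy):
        return dy
    return tf.sign(w), grad

# During a training step, the binarized weights enter the forward pass while
# the full-precision variable receives the straight-through gradient updates.
w = tf.Variable(tf.random.normal([3, 3, 3, 16]))
with tf.GradientTape() as tape:
    w_bin = binarize(w)
    loss = tf.reduce_sum(w_bin ** 2)  # placeholder loss for illustration
grads = tape.gradient(loss, [w])
```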

Papers

Authors

Dominik Loroch (Fraunhofer ITWM)


Please cite this paper.