Memory optimization in convolution layer.
Created by: qingqing01
For large input images, a convolution layer implemented as im2col + GEMM allocates a large workspace for the im2col step. For example, in one OCR-based detection model, the input image size is 3 * 1500 * 1500 and the first convolution layer is:
conv1 = paddle.layer.img_conv(
    input=data,  # 3 * 1500 * 1500
    filter_size=7,
    num_channels=3,
    num_filters=32,
    stride=4,
    padding=3,
    act=paddle.activation.Relu(),
    bias_attr=None)
The output shape of this convolution layer is 32 * 375 * 375, and the extra im2col workspace for this layer is about 78 MB:
// Ci * Kh * Kw * output_height * output_width.
3 * 7 * 7 * 375 * 375 * sizeof(float) / 1024 / 1024 ≈ 78 MB
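The numbers above can be checked with a short script. This is only a sanity check of the arithmetic; `conv_out_size` is a hypothetical helper, not a Paddle API:

    # Verify the output shape and im2col workspace size quoted above.
    def conv_out_size(in_size, filter_size, stride, padding):
        # Standard convolution output-size formula.
        return (in_size + 2 * padding - filter_size) // stride + 1

    out_h = conv_out_size(1500, 7, 4, 3)  # 375
    out_w = conv_out_size(1500, 7, 4, 3)  # 375

    # im2col workspace: Ci * Kh * Kw * output_height * output_width floats
    workspace_bytes = 3 * 7 * 7 * out_h * out_w * 4  # sizeof(float) == 4
    print(out_h, out_w, workspace_bytes / 1024 / 1024)  # ~78.9 MB
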
More convolution layers like this in a network lead to even more memory usage. For mobile deployment in particular, this memory footprint is too large, so we need to optimize it. We adopt grouped im2col + GEMM to compute convolution layers that would otherwise need a large extra workspace.
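The idea can be sketched as follows: instead of expanding the whole output plane at once, im2col is run over a group of output rows at a time, so the workspace shrinks from Ci * Kh * Kw * output_height * output_width to Ci * Kh * Kw * group_rows * output_width. This is a NumPy sketch under assumed shapes, not the actual Paddle C++ implementation; `group_rows` and the function name are illustrative:

    import numpy as np

    def conv2d_grouped_im2col(x, w, stride, padding, group_rows=32):
        # x: (Ci, H, W) input; w: (Co, Ci, Kh, Kw) filters.
        ci, h, width = x.shape
        co, _, kh, kw = w.shape
        xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
        oh = (h + 2 * padding - kh) // stride + 1
        ow = (width + 2 * padding - kw) // stride + 1
        wm = w.reshape(co, ci * kh * kw)
        out = np.empty((co, oh, ow), dtype=x.dtype)
        for r0 in range(0, oh, group_rows):
            r1 = min(r0 + group_rows, oh)
            # Workspace covers only output rows [r0, r1):
            # Ci*Kh*Kw * group_rows * ow instead of Ci*Kh*Kw * oh * ow.
            col = np.empty((ci * kh * kw, (r1 - r0) * ow), dtype=x.dtype)
            for i, r in enumerate(range(r0, r1)):
                for c in range(ow):
                    patch = xp[:, r*stride:r*stride+kh, c*stride:c*stride+kw]
                    col[:, i * ow + c] = patch.ravel()
            # One GEMM per group, written straight into the output.
            out[:, r0:r1, :] = (wm @ col).reshape(co, r1 - r0, ow)
        return out

With group_rows equal to the full output height this degenerates to the original im2col + GEMM, so the group size is a direct knob trading a little GEMM efficiency for workspace memory.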