Memory optimization in convolution layer.
Created by: qingqing01
For large input images, a convolution layer implemented as im2col + GEMM allocates a large workspace for the im2col step. For example, in one OCR-based detection model, the input image size is 3 * 1500 * 1500 and the first convolution layer is:
conv1 = paddle.layer.img_conv(
    input=data,  # 3 * 1500 * 1500
    filter_size=7,
    num_channels=3,
    num_filters=32,
    stride=4,
    padding=3,
    act=paddle.activation.Relu(),
    bias_attr=None)
The output shape of this convolution layer is 32 * 375 * 375, and the extra im2col workspace for this layer is about 78 MB:
// Ci * Kh * Kw * output_height * output_width.
3 * 7 * 7 * 375 * 375 * sizeof(float) / 1024 / 1024 ≈ 78 MB
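The numbers above can be checked with a short script. This is only a sanity check of the arithmetic; `conv_out_size` is a hypothetical helper, not a Paddle API:

    # Verify the output shape and im2col workspace size quoted above.
    def conv_out_size(in_size, filter_size, stride, padding):
        # Standard convolution output-size formula.
        return (in_size + 2 * padding - filter_size) // stride + 1

    out_h = conv_out_size(1500, 7, 4, 3)  # 375
    out_w = conv_out_size(1500, 7, 4, 3)  # 375

    # im2col workspace: Ci * Kh * Kw * output_height * output_width floats
    workspace_bytes = 3 * 7 * 7 * out_h * out_w * 4  # sizeof(float) == 4
    print(out_h, out_w, workspace_bytes / 1024 / 1024)  # ~78.9 MB
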
More convolution layers like this in a network lead to even more memory usage. For mobile deployment in particular, this memory footprint is too large, so we need to optimize it. We adopt grouped im2col + GEMM to compute convolution layers that would otherwise need a large extra workspace.
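The idea can be sketched as follows: instead of expanding the whole output plane at once, im2col is run over a group of output rows at a time, so the workspace shrinks from Ci * Kh * Kw * output_height * output_width to Ci * Kh * Kw * group_rows * output_width. This is a NumPy sketch under assumed shapes, not the actual Paddle C++ implementation; `group_rows` and the function name are illustrative:

    import numpy as np

    def conv2d_grouped_im2col(x, w, stride, padding, group_rows=32):
        # x: (Ci, H, W) input; w: (Co, Ci, Kh, Kw) filters.
        ci, h, width = x.shape
        co, _, kh, kw = w.shape
        xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
        oh = (h + 2 * padding - kh) // stride + 1
        ow = (width + 2 * padding - kw) // stride + 1
        wm = w.reshape(co, ci * kh * kw)
        out = np.empty((co, oh, ow), dtype=x.dtype)
        for r0 in range(0, oh, group_rows):
            r1 = min(r0 + group_rows, oh)
            # Workspace covers only output rows [r0, r1):
            # Ci*Kh*Kw * group_rows * ow instead of Ci*Kh*Kw * oh * ow.
            col = np.empty((ci * kh * kw, (r1 - r0) * ow), dtype=x.dtype)
            for i, r in enumerate(range(r0, r1)):
                for c in range(ow):
                    patch = xp[:, r*stride:r*stride+kh, c*stride:c*stride+kw]
                    col[:, i * ow + c] = patch.ravel()
            # One GEMM per group, written straight into the output.
            out[:, r0:r1, :] = (wm @ col).reshape(co, r1 - r0, ow)
        return out

With group_rows equal to the full output height this degenerates to the original im2col + GEMM, so the group size is a direct knob trading a little GEMM efficiency for workspace memory.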