Created by: wzzju
Faster gemm_int8: up to a 2x speedup relative to float. Adds gemm_with_bias and gemm_with_relu_bias. Using gemm_with_relu_bias, we implement fusion_conv_add_relu_int8_op, whose inputs and outputs are both int8_t. Pooling now also supports int8_t inputs and outputs.
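
For reviewers, the semantics of a fused GEMM + bias + ReLU with int8_t inputs and outputs can be sketched roughly as below. This is a naive reference loop, not the optimized kernel in this PR. The function name `gemm_with_relu_bias_ref`, its signature, the per-row int32 bias, and the single float `scale` used to requantize the accumulator are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference, NOT the kernel from this PR.
// Computes C = requantize(relu(A * B + bias)) with row-major storage:
// A is M x K (int8), B is K x N (int8), C is M x N (int8).
// Assumptions: bias holds one int32 value per output row (output channel),
// and one float scale maps the int32 accumulator back to int8.
std::vector<int8_t> gemm_with_relu_bias_ref(int M, int N, int K,
                                            const std::vector<int8_t>& A,
                                            const std::vector<int8_t>& B,
                                            const std::vector<int32_t>& bias,
                                            float scale) {
  std::vector<int8_t> C(static_cast<std::size_t>(M) * N);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      // Accumulate in int32 so int8 products cannot overflow.
      int32_t acc = bias[m];
      for (int k = 0; k < K; ++k) {
        acc += static_cast<int32_t>(A[m * K + k]) *
               static_cast<int32_t>(B[k * N + n]);
      }
      // Fused ReLU: clamp negative sums to zero.
      acc = std::max<int32_t>(acc, 0);
      // Requantize and saturate to the int8 upper bound.
      float scaled = std::round(static_cast<float>(acc) * scale);
      scaled = std::min(scaled, 127.0f);
      C[m * N + n] = static_cast<int8_t>(scaled);
    }
  }
  return C;
}
```

With a 2x2 identity matrix as B, A = {1, 2, 3, 4}, bias = {-10, 100}, and scale = 1.0, the first output row is clamped to zero by the ReLU and the second row becomes {103, 104}. Fusing bias and ReLU into the GEMM epilogue avoids a separate pass over the output, which is presumably what makes the fused conv op worthwhile.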