Created by: wzzju
Speed up gemm_int8: the int8 GEMM is now up to 2x faster than the float GEMM. Add gemm_with_bias and gemm_with_relu_bias. Using gemm_with_relu_bias, implement fusion_conv_add_relu_int8_op, whose inputs and outputs are both int8_t.
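
For readers unfamiliar with the fused kernel, here is a minimal scalar reference of what a GEMM with fused bias and ReLU might compute. It is only a sketch, not the optimized code in this PR. The function name `gemm_with_relu_bias_ref`, the signature, the int32 per-row bias, the single float `scale`, and the row-major layout are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference kernel (illustration only, not the PR's implementation).
// Computes C = saturate_int8(round(scale * relu(A * B + bias))) where
// A is m x k, B is k x n, C is m x n, all row-major.
// Assumption: bias holds one int32 value per row of C (one per output channel
// when A is the filter matrix of a convolution lowered to GEMM).
std::vector<int8_t> gemm_with_relu_bias_ref(int m, int n, int k,
                                            const std::vector<int8_t>& a,
                                            const std::vector<int8_t>& b,
                                            const std::vector<int32_t>& bias,
                                            float scale) {
  assert(a.size() == static_cast<std::size_t>(m) * k);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  assert(bias.size() == static_cast<std::size_t>(m));

  std::vector<int8_t> c(static_cast<std::size_t>(m) * n);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      // Accumulate in int32 so int8 products cannot overflow for moderate k.
      int32_t acc = bias[i];
      for (int p = 0; p < k; ++p) {
        acc += static_cast<int32_t>(a[i * k + p]) *
               static_cast<int32_t>(b[p * n + j]);
      }
      // Fused ReLU: clamp negatives to zero before requantizing.
      acc = std::max<int32_t>(acc, 0);
      // Requantize to int8: scale, round to nearest, saturate at 127.
      float scaled = std::round(static_cast<float>(acc) * scale);
      scaled = std::min(scaled, 127.0f);
      c[i * n + j] = static_cast<int8_t>(scaled);
    }
  }
  return c;
}
```

Fusing the bias add and ReLU into the GEMM epilogue means the int32 accumulators are written out only once, as int8_t, instead of making extra passes over an intermediate buffer. This is what lets a fused conv + add + relu operator keep both its inputs and outputs in int8_t.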