Need to support non-contiguous access of C when using Eigen to compute gemm
Created by: Xreki
Paddle has a build option `USE_EIGEN_FOR_BLAS`, which makes Paddle use Eigen instead of OpenBLAS to compute BLAS functions such as `gemm`. On some platforms, such as `armeabi-v7a` on Android, Eigen is faster than OpenBLAS.
However, the current implementation of `EigenBlasGemm` does not support non-contiguous input and output:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/EigenGemm.cpp#L40-L62
```cpp
Eigen::array<int, 2> sizeA;
if (transA) {
  sizeA[0] = K;
  sizeA[1] = M;
  CHECK_EQ(M, lda);
} else {
  sizeA[0] = M;
  sizeA[1] = K;
  CHECK_EQ(K, lda);
}
Eigen::array<int, 2> sizeB;
if (transB) {
  sizeB[0] = N;
  sizeB[1] = K;
  CHECK_EQ(K, ldb);
} else {
  sizeB[0] = K;
  sizeB[1] = N;
  CHECK_EQ(N, ldb);
}
Eigen::array<int, 2> sizeC;
sizeC[0] = M;
sizeC[1] = N;
CHECK_EQ(N, ldc);
```
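These checks exist because `EigenBlasGemm` maps each operand as a dense Eigen tensor, and a dense map can only describe a matrix whose leading dimension equals its column count. A minimal sketch of that assumption (illustrative, not the exact Paddle code):

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

// A dense, row-major TensorMap addresses element (i, j) at C[i * N + j],
// i.e. it implicitly assumes the row stride (the leading dimension ldc)
// equals the number of mapped columns N. With padded rows (ldc > N) the
// map would read and write the wrong memory, hence CHECK_EQ(N, ldc).
void mapOutput(float* C, int M, int N) {
  Eigen::TensorMap<Eigen::Tensor<float, 2, Eigen::RowMajor>> c(C, M, N);
  c.setZero();  // touches exactly M * N contiguous floats starting at C
}
```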
However, the computation of GRU needs non-contiguous access to `C`:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/function/GruFunctor.h#L34
```cpp
BlasGemm<Device, T>::compute(false,               // transA
                             false,               // transB
                             batchSize,           // M
                             2 * frameSize,       // N
                             frameSize,           // K
                             1,                   // alpha
                             value.prevOutValue,  // A
                             frameSize,           // lda
                             value.gateWeight,    // B
                             frameSize * 2,       // ldb
                             1,                   // beta
                             value.gateValue,     // C
                             frameSize * 3);      // ldc
```
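The output `value.gateValue` is laid out with `3 * frameSize` values per row (presumably the GRU's gate buffers stored side by side), while this gemm writes only the first `2 * frameSize` columns of each row. A minimal sketch of the resulting addressing, using a hypothetical helper:

```cpp
// Hypothetical helper: row-major addressing with an explicit leading
// dimension ldc. When ldc > N, consecutive rows of the M x N output are
// separated by a gap of (ldc - N) elements, so C is not one dense block.
inline float* elementAt(float* C, int i, int j, int ldc) {
  return C + i * ldc + j;
}
// In this call, N = 2 * frameSize and ldc = 3 * frameSize, so the last
// frameSize entries of every row are left untouched by the gemm.
```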
In the call above, `N` is `frameSize * 2` but `ldc` is `frameSize * 3`, so networks with GRU layers fail with the following error:
```
I1128 12:51:08.703012 29921 Util.cpp:166] commandline: --use_gpu=False
F1128 12:51:08.752616 29921 EigenGemm.cpp:62] Check failed: N == ldc (400 vs. 600)
*** Check failure stack trace: ***
    @       0x81c44d  google::LogMessage::Fail()
    @       0x81fefc  google::LogMessage::SendToLog()
    @       0x81bf73  google::LogMessage::Flush()
    @       0x82140e  google::LogMessageFatal::~LogMessageFatal()
    @       0x5e3437  paddle::EigenBlasGemm<>::compute()
    @       0x4976fb  paddle::GruCompute::forward<>()
    @       0x5836ae  paddle::GatedRecurrentLayer::forwardBatch()
    @       0x584732  paddle::GatedRecurrentLayer::forward()
    @       0x4c8b3d  paddle::NeuralNetwork::forward()
    @       0x60ae06  paddle_gradient_machine_forward
    @       0x42450e  infer()
    @       0x413bf5  main
    @   0x318ae1ecdd  (unknown)
    @       0x42314d  (unknown)
Aborted
```
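One way to lift the restriction would be to map each operand with an explicit leading dimension instead of assuming dense storage. The sketch below uses `Eigen::Map` with `Eigen::OuterStride` rather than the tensor-contraction path that `EigenBlasGemm` currently uses, and it omits the `transA`/`transB` cases, so it only illustrates the idea and is not a drop-in patch:

```cpp
#include <Eigen/Dense>

using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
using Stride = Eigen::OuterStride<>;

// Illustrative strided gemm: C = alpha * A * B + beta * C, where each
// operand may have a leading dimension larger than its column count,
// i.e. rows need not be contiguous in memory.
void stridedGemm(int M, int N, int K, float alpha,
                 const float* A, int lda,
                 const float* B, int ldb,
                 float beta, float* C, int ldc) {
  Eigen::Map<const Matrix, Eigen::Unaligned, Stride> a(A, M, K, Stride(lda));
  Eigen::Map<const Matrix, Eigen::Unaligned, Stride> b(B, K, N, Stride(ldb));
  Eigen::Map<Matrix, Eigen::Unaligned, Stride> c(C, M, N, Stride(ldc));
  if (beta == 0) {
    c.noalias() = alpha * a * b;   // do not read possibly uninitialized C
  } else {
    c *= beta;
    c.noalias() += alpha * a * b;
  }
}
```

With such a mapping, the GRU call above could pass `ldc = frameSize * 3` and write only the first `2 * frameSize` columns of each row of `value.gateValue`.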