- 10 Jan, 2020 1 commit
-
-
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895)
* Add ernie c++ inference test (#21015)
  * Add ernie unit test test=develop
  * Add ernie unit test test=develop
  * Add ernie unit test test=develop
  * remove ngraph
  * optimize gpu test test=develop
  * optimize codes test=develop
* fix cmake fails on inference_download_and_uncompress (#21185)
  * solve cmake fails on inference_download_and_uncompress test=develop
  * solve cmake fails on inference_download_and_uncompress test=develop
* Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
  * Add fc padding to solve mkl performance test=develop
  * fix gpu pass and error information test=develop
  * fix fc_fuse_pass_test test=develop
  * fix error information test=develop
  * fix error information test=develop
  * fix name and add fc op padding test test=develop
  * fix attributes test=develop
  * optimize fc padding test=develop
  * fix test test=develop
* Polish the codes of fc when needs padding (#21378) test=develop
* Add ernie large c++ inference test (#21365)
  * add ernie-large test test=develop
  * add ernie large c++ inference test test=develop
* Modify padding strategy: remove weight copy in fc padding (#21650) test=develop
* optimize fc jit (#21878) test=develop

Co-authored-by: NYihua Xu <yihuaxu@hotmail.com>
-
- 11 Sep, 2019 1 commit
-
-
Committed by Yiqun Liu
* Refine the codes related to fc op.
* Add GPU implementation for fc functor.
* Apply fc_fuse_pass in GPU inference. test=develop
* Change the cmake for fc op.
* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
* Add an attribute to set the activation type in fc_op.
* Enhance the unittest of fc_op. test=develop
* Remove the declaration of FCOpGrad back to the header file. test=develop
* Set default value for newly added arguments in test_fc_op. test=develop
-
- 19 Feb, 2019 1 commit
-
-
Committed by xuezhong
test=develop
-
- 30 Jan, 2019 1 commit
-
-
Committed by xuezhong
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 2 commits
- 12 Dec, 2017 1 commit
-
-
Committed by QI JUN
The main fixes are as follows:
- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
-
- 26 Oct, 2017 1 commit
-
-
Committed by chengduoZH
-
- 23 Oct, 2017 1 commit
-
-
Committed by chengduoZH
-
- 28 Sep, 2017 1 commit
-
-
Committed by Liu Yiqun
-
- 27 Sep, 2017 1 commit
-
-
Committed by qiaolongfei
-
- 26 Sep, 2017 1 commit
-
-
Committed by caoying03
-
- 22 Sep, 2017 1 commit
-
-
Committed by caoying03
-
- 13 Sep, 2017 1 commit
-
-
Committed by caoying03
-
- 12 Sep, 2017 1 commit
-
-
Committed by caoying03
-