- 09 Jul, 2021 1 commit
-
-
Committed by arlesniak
* Use CBLAS for SelectedRows elementwise add operation. It's faster.
* template compilation fix
* reverted template compilation fix
* slimmed template compilation fix
Co-authored-by: Adam Osewski <adam.osewski@intel.com>
-
- 21 Jun, 2021 1 commit
-
-
Committed by lidanqing
* Add oneDNN AXPY handler.
* Add fallback for small tensors.
* Fix ifdefs
* Remove unnecessary namespace prefixes and add missing headers.
* Guard handler_axpy with proper ifdefs.
* Compilation of this function is possible only when Paddle is built with neither CUDA nor HIP.
* Move AXPY handler code to separate files.
* Use oneDNN AXPY handler in SGD op.
* Use axpy handler only when Paddle is built with oneDNN.
* Add test for SUM BF16 with big rows.
* Fix SFINAE rules for elementwise_add_to.
* Add test case for SGD with big rows.
* update
* update
Co-authored-by: Adam Osewski <adam.osewski@intel.com>
-
- 26 May, 2021 1 commit
-
-
Committed by chentianyu03
* modify matmul Op to use complex template types
* remove complex64/128 header file
-
- 06 May, 2021 1 commit
-
-
Committed by Adam Osewski
-
- 04 Feb, 2021 1 commit
-
-
Committed by wanghuancoder
* use iwyu to clean includes a second time, test=develop
-
- 25 Dec, 2020 1 commit
-
-
Committed by Chen Weihang
* add support for accumulating complex gradients
* add unittest for coverage
* update test dtype
* remove useless blank line
-
- 10 Sep, 2020 1 commit
-
-
Committed by Steffy-zxf
update error info for selected_rows_functor
-
- 11 May, 2020 1 commit
-
-
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all cases, test=develop
* revert attribute change, test=develop
* change to three funcs to work around gcc4.8 bug, test=develop
* polish some details, test=develop
-
- 30 Oct, 2019 1 commit
-
-
Committed by zhang wenhui
-
- 05 Sep, 2019 1 commit
-
-
Committed by 123malin
* test=develop, communicator merge add => merge average
-
- 12 Apr, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 09 Jan, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 28 Dec, 2018 1 commit
-
-
Committed by Qiao Longfei
-
- 14 Dec, 2018 2 commits
- 26 Nov, 2018 1 commit
-
-
Committed by minqiyang
test=develop
-
- 14 Nov, 2018 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 08 Nov, 2018 1 commit
-
-
Committed by minqiyang
Fix code to pass the cpplint syntax check
test=develop
-
- 27 Oct, 2018 2 commits
-
-
Committed by Qiao Longfei
-
Committed by Qiao Longfei
-
- 17 Oct, 2018 1 commit
-
-
Committed by Qiao Longfei
-
- 15 Oct, 2018 5 commits
-
-
Committed by Qiao Longfei
test=develop
-
Committed by Qiao Longfei
-
Committed by Qiao Longfei
-
Committed by Qiao Longfei
-
Committed by Qiao Longfei
-
- 12 Oct, 2018 1 commit
-
-
Committed by minqiyang
test=develop
-
- 11 Oct, 2018 2 commits
-
-
Committed by minqiyang
1. Accelerate SelectedRows MergeAdd functor
2. Add SelectedRowsSumTo functor to support MergeAdd of multiple SelectedRows into one
test=develop
-
Committed by Qiao Longfei
-
- 08 Oct, 2018 1 commit
-
-
Committed by qiaolongfei
-
- 18 Sep, 2018 1 commit
-
-
Committed by sneaxiy
-
- 28 Apr, 2018 1 commit
-
-
Committed by Abhinav Arora
* Fix CPPLint errors
* Fix CPPLint errors in sequence2batch
* Fix compilation
* Fix LSTM op and GRU op
* Fix LSTMP op
* Fix more cpplint errors in operators/math
* Address Code review feedback
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 2 commits
- 08 Feb, 2018 1 commit
-
-
Committed by Yu Yang
-
- 29 Dec, 2017 2 commits
-
-
Committed by typhoonzero
-
Committed by typhoonzero
-
- 27 Dec, 2017 1 commit
-
-
Committed by typhoonzero
-
- 12 Dec, 2017 1 commit
-
-
Committed by QI JUN
The main changes are:
- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
-