- 21 Jan, 2022 1 commit
-
-
Committed by Weilong Wu
-
- 21 Sep, 2021 1 commit
-
-
Committed by Adam Osewski
* Create stateful OneDNNAXPYHandler object. This makes it possible to call it multiple times without recreating the oneDNN primitives every time.
* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.
* OneDNN SGD kernel.
* Update call to use new OneDNNAXPYHandler object api.
* Setup seed in proper place.
* Enable OneDNN kernel only for single case.
* For dense param and sparse grad.
* Small refactor.
* Enable oneDNN by op attr or by cmd line flag.
* Use int64_t type for number of elements.
* Support dense param and grad from OneDNN kernel.
* Enable SGD OneDNN kernel when using MP BF16 optimizer.
* Force non-copyable/movable OneDNNAXPYHandler.
* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.
* Fix SFINAE rules.
* Remove recording event inside AXPY.
* Get rid of internal primitive caching.
* Stop using PP cache mechanism to store mem and primitive obj.
* Handler obj stores and reuses needed desc & prim
* Do not derive from MKLDNNHandlerT
-
- 21 Jun, 2021 1 commit
-
-
Committed by lidanqing
* Add oneDNN AXPY handler.
* Add fallback for small tensors.
* Fix ifdefs
* Remove unnecessary namespace prefixes and add missing headers.
* Guard handler_axpy with proper ifdefs.
* Compilation of this function is possible only when Paddle is not built with CUDA nor HIP.
* Move AXPY handler code to separate files.
* Use oneDNN AXPY handler in SGD op.
* Use axpy handler only when Paddle is built with oneDNN.
* Add test for SUM BF16 with big rows.
* Fix SFINAE rules for elementwise_add_to.
* Add test case for SGD with big rows.
* update
* update
Co-authored-by: Adam Osewski <adam.osewski@intel.com>
-
- 14 Apr, 2021 1 commit
-
-
Committed by Adam Osewski
* Initial draft for SGD BF16 kernel.
* Unit tests for SGD with BF16 data type.
* Add VLOG message to SGD BF16 op CPU kernel.
* Enhance error messages and error types.
* Refactor SGD op kernels to leverage some common code.
* Make it easier to add new kernel invoke code.
* Fix SGD op kernel for sparse grad.
* Unify quotes style.
* Fix error for ROCM compilation.
* Use specialized PADDLE_ENFORCE_xx functions.
-
- 27 Sep, 2020 1 commit
-
-
Committed by Chengmo
* fix sgd/momentum/dpsgd/rmsprop error message
-
- 24 Oct, 2019 1 commit
-
-
Committed by WangXi
-
- 08 Mar, 2019 1 commit
-
-
Committed by tensor-tang
test=develop
-
- 07 Mar, 2019 1 commit
-
-
Committed by tensor-tang
test=develop
-
- 04 Mar, 2019 1 commit
-
-
Committed by tensor-tang
test=develop
-
- 23 Feb, 2019 1 commit
-
-
Committed by tensor-tang
test=develop
-
- 27 Dec, 2018 2 commits
- 26 Nov, 2018 1 commit
-
-
Committed by minqiyang
test=develop
-
- 16 Nov, 2018 1 commit
-
-
Committed by Wu Yi
* wip simplify operator framework
* wip
* wip
* done test=develop
* clean test=develop
* fix test=develop
* fix deps test=develop
* fix cpu build test=develop
* fix tensorrt build test=develop
* fix tests test=develop
* fix test=develop
* fix cpu build test=develop
-
- 13 Nov, 2018 1 commit
-
-
Committed by Qiao Longfei
test=develop
-
- 08 Nov, 2018 1 commit
-
-
Committed by minqiyang
Fix code to support cpplint syntax check test=develop
-
- 17 Aug, 2018 1 commit
-
-
Committed by Qiao Longfei
Optimize selected rows for dist lookup table with rwlock
-
- 05 Jun, 2018 1 commit
-
-
Committed by Siddharth Goyal
-
- 29 May, 2018 1 commit
-
-
Committed by qiaolongfei
-
- 17 Apr, 2018 1 commit
-
-
Committed by Yancey1989
-
- 13 Apr, 2018 1 commit
-
-
Committed by Abhinav Arora
-
- 03 Apr, 2018 2 commits
-
-
Committed by qiaolongfei
-
Committed by qiaolongfei
-
- 09 Mar, 2018 1 commit
-
-
Committed by Yancey
Fix sparse update memory error for distributed training
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 2 commits
- 23 Dec, 2017 1 commit
-
-
Committed by chengduoZH
-
- 12 Dec, 2017 1 commit
-
-
Committed by QI JUN
The main fixes are as follows:
  - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
  - remove `eigen_device` interface in base class `DeviceContext`
  - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
  - remove unused `platform::EigenDeviceConverter`
  - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
  - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
-
- 18 Oct, 2017 3 commits
- 05 Oct, 2017 4 commits
-
-
Committed by qiaolongfei
-
Committed by qiaolongfei
-
Committed by qiaolongfei
-
Committed by Abhinav Arora
-
- 04 Oct, 2017 1 commit
-
-
Committed by Abhinav Arora
-
- 03 Oct, 2017 1 commit
-
-
Committed by Abhinav Arora
* Changing learning rate from attribute to input(float)
* Removing obsolete code
-
- 28 Sep, 2017 1 commit
-
-
Committed by Yu Yang
-
- 06 Sep, 2017 1 commit
-
-
Committed by Yu Yang
Fix #3902
-