- 18 Nov, 2022 13 commits
-
-
Committed by james
* correct sync behavior for XPU distributed training
  XPU supports an event mechanism similar to CUDA events, so it is advisable to use an event to sync the compute/comm streams for performance. However, this mechanism has never been fully tested, and inconsistent loss/ending_epochs have been reported. Therefore, this PR replaces event sync with stream waiting as a temporary solution.
* remove compile warning
-
Committed by Dandelight
-
Committed by james
* fix device id issue for xpu eager
  The XPU device id is not correctly set in eager mode, so vars stay on dev0 unless XPUDeviceGuard is called, leading to this error message for all node rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
* fix typo
* fix pybind error
-
Committed by Tian Zheng
* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
* Fix macro
* Add implementation for conv_kernel and conv_grad_kernel
* Modification after rebase onto latest develop
* Modify plan cache to comply with the API of phi::autotune
* Refactor to reduce duplicate code
* Review fix:
  - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
  - add const specifier for input tensor
  - add logging when plans fail to execute
  - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
* move plan building outside of cache
* Fix ROCM build
-
Committed by GGBond8488
-
Committed by Yuang Liu
-
Committed by Wang Xin
* remove "gpu_primitives.h" in fluid namespace
* fix PR-CI-GpuPS fail
* fix PR-CI-GpuPS fail
-
Committed by parap1uie-s
* Fix hAPI bug of not being compatible with LayerHook https://github.com/PaddlePaddle/Paddle/issues/47000
* Fix hAPI bug of not being compatible with LayerHook
* Allow specifying train_bs and eval_bs separately in hapi.fit()
* Update model.py
* Update Model.py
* Update test_model.py
* update model.py
-
Committed by zhangyikun02
-
Committed by feng_shuai
-
Committed by feng_shuai
-
Committed by Sylwester Fraczek
-
Committed by huangjiyi
-
- 17 Nov, 2022 27 commits
-
-
Committed by zyfncg
* clip extra and intermediate output of op
* fix bug
* fix bug
* polish code
* polish log
-
Committed by 傅剑寒
-
Committed by 傅剑寒
* remove unstack in nn.py under fluid
* remove unstack under fluid
-
Committed by HongyuJia
* clean fluid elementwise_pow, remove API
* clean elem_pow doc
* clean elementwise_mod
* clean elementwise min, floordiv, mod
-
Committed by Qi Li
* [NPU] add _npu_identity op and api, test=develop
* fix doc
* address comments
-
Committed by 傅剑寒
* remove swish in nn.py under fluid
* fix tswish test case
-
Committed by 傅剑寒
-
Committed by Wen Sun
-
Committed by wenbin
* int scale
* round
* revert commit
-
Committed by xiongkun
-
Committed by hong
-
Committed by huangjiyi
-
Committed by YuanRisheng
* standard api
* fix xpu bugs
-
Committed by Mountagha
-
Committed by taixiurong
-
Committed by 傅剑寒
-
Committed by Wang Xin
-
Committed by xiaoxiaohehe001
* add_cast_bool
* cast
-
Committed by Yiqun Liu
* Implement a common dims simplifier.
* Fix the include position error.
* Reduce the cpu overhead of broadcast computing.
-
Committed by ShenLiang
-
Committed by wuhuachaocoding
-
Committed by Kevin吴嘉文
-
Committed by huangjiyi
-
Committed by huangjiyi
* rm "paddle/fluid/operators/math.h" in phi
* rm "paddle/fluid/operators/math.h" in fluid
-
Committed by Yuang Liu
Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)
* add bfloat16 for adamw
* set lr not to bfloat16 for pure bf16 training
* update the logic
* update the adamw optimizer
* support bfloat for adam
-
Committed by zhouweiwei2014
-
Committed by sneaxiy
* add vectorized bfloat16 atomicAdd
* fix compile error
* fix compile error again
* fix V100 compile error
* fix V100 compile again
-