- 21 Nov 2022: 20 commits
  - Committed by wanghuancoder
    * refine reduce_all
  - Committed by JYChen
    * remove APIs in fluid.ops
    * fix test_activation_nn_grad
    * fix circular import error
    * fix ops
    * fix cos
    * fix divide not inplace
    * remove lazy-import part
  - Committed by zyfncg
    * fix wrong Eigen header include
    * fix compile bug
  - Committed by 傅剑寒
  - Committed by Vvsmile
    remove crop, which is not used in Paddle 2.0
  - Committed by PuQing
    * move threadpool, fix cmake
    * fix make
  - Committed by 傅剑寒
  - Committed by taixiurong
  - Committed by houj04
  - Committed by 傅剑寒
    * remove relu6 test case under fluid
    * fix relu6 test case in mkldnn_elt_act_fuse_pass
  - Committed by Vvsmile
    replace paddle.fluid.layers.selu with paddle.nn.functional.selu
  - Committed by Vvsmile
    * Remove API gather: replace paddle.fluid.layers.gather with paddle.gather
    * change calls to gather from the old style to the new style
  - Committed by engineer1109
  - Committed by wenbin
  - Committed by huangjiyi
    * move cross_entropy from fluid to phi
    * replace mutable_data with Alloc
    * use .template
  - Committed by Wen Sun
    * refactor: replace Collective & PointToPoint with NCCLEnv
    * refactor: rename to RunFnInNCCLEnv
    * refactor: pass std::function by value
  - Committed by LiYuRio
  - Committed by LiYuRio
  - Committed by PuQing
  - Committed by sneaxiy
- 20 Nov 2022: 1 commit
  - Committed by ccrrong
    * remove range
- 19 Nov 2022: 2 commits
  - Committed by Wen Sun
  - Committed by Aganlengzi
    * [CustomPlace] fix amp
    * [CustomPlace] fix amp
    * fix ut that ran too long because of matmul fp16
- 18 Nov 2022: 17 commits
  - Committed by wanghuancoder
  - Committed by MarDino
    * fuse qkvBiasAdd and transpose with split qkv
    * fix typo
    * fix format
    * fix name
    * add annotation
    * fix comment
  - Committed by yuehuayingxueluo
    * clear fluid APIs in fleet and passes
    * fix model.py
    * fix model.py
    * fix cpp_pass.py
  - Committed by Sławomir Siwek
    * clean up unused code
    * unify is_int8 is_bfloat16
    * simplify matmul_v2 FWD kernel
    * remove RunKernel methods
    * remove import namespace
    * remove headers
    * clean fluid/phi cross imports
    * remove fluid axpy_handler
    * delete fluid methods
    * activations
    * OneDNNMemDesc
    * MKLDNNFormatForSize
    * MatchShapeToLayout
    * MKLDNNMemoryFormat
    * MKLDNNFormat
    * ReorderMKLDNNHandler
    * to_void_cast
    * review suggestions
    * interpolate
    * remove fluid dependency
    * init
    * ExecuteMatMulV2
    * rm fluid kernel
    * matmul_grad
    * remove mutable_data
  - Committed by Vvsmile
    remove pad_constant_like, which is not used in Paddle 2.0
  - Committed by Zuza Gawrysiak
    * Migrate conv_transpose to phi
    * Move handler to kernel
    * kernel m
    * Fix formatting
    * handler
    * remove fluid
    * revert tcp_store
    * tcp_store
    * remove unused
    * Fix declaration
    * add dnn input
    * Fix typo

    Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
  - Committed by 201716010711
  - Committed by zyfncg
    * fix zero_allocator bug on host
    * fix test compile bug
    * add unittest
    * update test
  - Committed by 傅剑寒
  - Committed by MarDino
    * add quick gelu and fused bias add kernel
    * fix annotation
    * remove useless code
    * add fast gelu option and set it in multi transformer op
    * add flag to restrict whether fast gelu approximation is used
    * fix flags conflict
    * fix: use tanh function instead
    * add cudart version limit
    * use phi fast tanh func
    * fix comment
  - Committed by huangjiyi
    * move "paddle/phi/backends/gpu/gpu_device_function.h" to phi
    * update copyright years
    * rm "fluid/platform/device/gpu/gpu_device_function.h" in phi
    * fix ROCm compile bugs
  - Committed by Wen Sun
  - Committed by zhaoyingli
    * [AutoParallel] selective recompute
    * add cmakelist
  - Committed by james
    * correct sync behavior for XPU distributed training

      XPU supports an event mechanism similar to CUDA events, so using an event to sync the compute and comm streams is advisable for performance. However, this mechanism has never been fully tested, and inconsistent loss/ending_epochs have been reported. Therefore, this PR replaces event sync with stream waiting as a temporary solution.
    * remove compile warning
  - Committed by Dandelight
  - Committed by james
    * fix device id issue for XPU eager

      The XPU device id is not set correctly in eager mode, so vars are placed on dev0 unless XPUDeviceGurad is called. This leads to the following error message on every node with rank != 0: "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
    * fix typo
    * fix pybind error
  - Committed by Tian Zheng
    * Refactor conv_kernel and conv_grad_kernel to provide an interface for the CUDNNv8 implementation
    * Fix macro
    * Add implementation for conv_kernel and conv_grad_kernel
    * Modification after rebase onto latest develop
    * Modify plan cache to comply with the API of phi::autotune
    * Refactor to reduce duplicate code
    * Review fix:
      - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
      - add const specifier for input tensor
      - add logging when plans fail to execute
      - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
    * Move plan building outside of cache
    * Fix ROCM build