- 18 Mar, 2022 3 commits
- 17 Mar, 2022 4 commits
-
-
Committed by seemingwang
* extract sub-graph * graph-engine merging * fix * fix * fix heter-ps config
-
Committed by baoachun
-
Committed by Weilong Wu
* [Eager] Support eager grad interface, draft version * Support eager grad interface with allow_unused and multi startup_op * Fix code format * Fix allow_unused case, return PyNone if tensor not initialize * Support output's stop_gradient related to create_graph * Support grad exception case in eager mode, fix coverage CI * Update ToPyObject, return PyNone if not initialize * AccumulationNode add FLAGS_retain_grad_for_all_tensor * Fix ci issue * Fix CI issue * fix, use core.eager.Tensor * Add func SetBufferSlotRankZeros for GradTensorHolder * Support retain_graph by using ClearTensorWrappers * Support retain_graph by using ClearTensorWrappers * Update retain_graph and no_grad_vars related test case * Update code gen logic for ClearTensorWrappers * Fix by override statement * fix override func args * Support retain_graph, update unit tests * Updated ClearTensorWrappers logic * fix grad python interface * Use deep copy and update unit tests * Polish code * Polish code * Fix CI issue, Deep copy only use when user set grad_tensors * Fix CI, use Backward instead RunBackward * Fix CI, Declare kernel explicitly in test file * Polish, remove vector of TensorWrapper * Refactor the logic of grad/backward, polish codes * Update code after merge upstream develop * Polish after merge upstream develop * Update to adapt new GradNodeBase superclass * Fix error introduced during conflict resolution * Update purify potential_startup_nodes logic * Fix errors * Polish code * Remove useless args for ToPyObject * Remove useless TensorWrappersSet * Fix code-format, re-install pre-commit * Fix pre-process logic for potential_startup_ops * Update unit tests, use eager mode
-
Committed by Jiabin Yang
* fix copy_ problem by doing it with phi copy * improve test coverage * refactor copy with sr kernel
-
- 16 Mar, 2022 2 commits
-
-
Committed by ronnywang
-
Committed by Yulong Ao
* [Auto Parallel] Support the auto completion of while_op * [Auto Parallel] Improve the completion algorithms * [Auto Parallel] Fix bugs for ernie inference * [Auto Parallel] Remove attrs which cannot be pickled * [Auto Parallel] make the dims_mappings of LodTensorArray vars empty * [Auto Parallel] Fix bugs for the ernie inference in the pipeline parallel * [Auto Parallel] Remove unnecessary comments * [Auto Parallel] Fix a bug of the CMakeLists * [Auto Parallel] Use the newest APIs to write the unit test * [Auto Parallel] Remove unnecessary statements
-
- 15 Mar, 2022 4 commits
-
-
Committed by xiongkun
* run python api in eager mode and filter the out in argument list * fix code
-
Committed by furnace
* [NPU] add AMP O1 support * [NPU] fix NOTE and warnings
-
Committed by Haohongxiang
* refactor reducer * modify cmakelists * solve conflicts * rename group and update process_group * fix bugs of ProcessGroupNCCL * modify for CIs * refactoring reducer
-
Committed by zyfncg
* change the exception of getitem from pybind type to PADDLE_ENFORCE * fix bug * remove pybind::index_error exception
-
- 14 Mar, 2022 4 commits
-
-
Committed by Jiabin Yang
* eager, test=develop * fix bug, test=develop * eager, test=develop * merge legacy to fluid * eager, test=develop * eager, test=develop * Refactor TensorAdd func by template and remove gradient_accumulation in eager * Remove needless target name * eager, test=develop * eager, test=develop * Use overload instead of template * Remove legacy code * Remove legacy code * selectedrows, test=develop * Remove DataType test * eager, test=develop * eager, test=develop * support gan, test=develop * Using Tensor directly instead of using EagerTensor * support gradient_accumulation * make test_imperative_lod_tensor_to_selected_rows longer * make test_imperative_lod_tensor_to_selected_rows longer * refine code * ptb, test=develop * Rename all EagerTensor to Tensor * Rename some EagerTensor to Tensor * rename EagerTensor to EagerVariable * eager, test=develop * eager, test=develop * eager, test=develop * eager, test=develop * add more test * eager, test=develop * Support copiable selected rows and merge develop * save load, eager, test=develop * save load, eager, test=develop * refine, test=develop * remove useless _set_value method * refine, test=develop * refine, test=develop * revert static_runner, test=develop * EagerTensor to Tensor, test=develop * refine, test=develop * refine, test=develop * clear grad, test=develop * merge, develop * merge, develop * merge, test=develop * merge, test=develop * Support quant and part of slice * support legacy static save * extend slim tests time * remove imperative on inference * remove imperative on inference * merge develop * fix typo * fix typo * split slice related code into 2 part for imperative and eager * split slice from inference * split slice from inference * fix test_tensor_register_hook * support custom op in eager mode * fix inference deps error * split eager utils from custom operator * fix type match * fix typo Co-authored-by: Wang Huan <wanghuan29@baidu.com> Co-authored-by: Weilong Wu <veyron_wu@163.com> Co-authored-by: wanghuancoder <wanghuancoder@163.com>
-
Committed by 0x45f
-
Committed by Zhong Hui
[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors between python processes. (#37302) * Add support for paddle.multiprocessing * move multiprocessing to incubate.
-
Committed by 0x45f
* refine partial_program * fix code for test_mnist.py train * support quantify UT * make __fake_vars and _double_grads to lazy * fix comments
-
- 12 Mar, 2022 1 commit
-
-
Committed by Aganlengzi
* [custom kernel] fix static object de-initialize bug * fix text * fix text * refine log info
-
- 11 Mar, 2022 2 commits
- 10 Mar, 2022 2 commits
-
-
Committed by heliqi
* add onnxruntime predictor * Add code comments * support link paddle2onnx onnxruntime * support onnxruntime with python * support onnxruntime with python * support onnxruntime with windows * paddle2onnx compile with windows * support windows compile * support windows compile with onnxruntime * support windows compile with paddle2onnx * support mac compile * compile with mac * compile with mac * add code comments * fix remind word * code optimization * add test case * add test case * add inference demo_ci test case * fix compile paddle2onnx with no python * add inference demo_ci test case * add inference demo_ci test case * add inference infer_ut test case * support c go api and test cases * add coverage test case * add coverage test case * add capi test case * add capi test case
-
Committed by Lijunhui
-
- 09 Mar, 2022 2 commits
-
-
Committed by 0x45f
* adapt run_program OP for eager * fix program_id * refine code * fix test
-
Committed by huzhiqiang
-
- 08 Mar, 2022 2 commits
-
-
Committed by lilong12
* add pg_hccl
-
Committed by chenjian
* add python profiler package * update according to review * fix bug * fix bug * fix bug * add unit test * Revert "add unit test" This reverts commit 4e69ff71b0645e069afe5dd8fea0d07717852c48. * reduce for pr * add unit test * modify for pr * fix unittest * update for ci coverage * modify according to review * fix bug * improve coverage
-
- 07 Mar, 2022 3 commits
-
-
Committed by xiongkun
* add python api test in TestOp * test_python_api if self.python_api is set * fix code by CR
-
Committed by Ming-Xu Huang
* Added cuBlasLtHandle_t to device context. * Added fused_gemm_epilogue op. 1. Added fused_gemm_epilogue op to leverage cuBLASLt Epilogue. 2. Support fusion Act(X*Y + bias); X's dims must be >= 2 and Y's dims should be 2. 3. Act currently supports only ReLU (will add GeLU in the future). * Added UT to fused_gemm_epilogue op. * Added LinearAct Pattern 1. Added LinearAct into graph_pattern_detector.* to define (2.)'s pattern. 2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)). 3. act currently supports only ReLU (will support GeLU in the future). * Added FuseGemmEpiloguePass 1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU} fusion (GeLU will be supported in the future). 2. Only support matmul_v2 from nn.Linear. * Added pybind to BuildStrategy.fuse_gemm_epilogue_. * Added UT for fuse_gemm_epilogue_pass. * GeLU support and EpilogueSingleton 1. Added GeLU support to fused_gemm_epilogue op. 2. Added EpilogueSingleton to cache auxiliary pointer. 3. Added related UTs. * Rename cublaslt_epilogue_op to gemm_epilogue_op.*. * Added both train and infer pattern to LinearAct. 1. Added support of fwd graph with grad_ops linking to LinearAct. 2. Added related changes to fuse_gemm_epilogue_pass for above modification. * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass. * Added identity activation support to gemm_epilogue_op. * Added Linear Fusion (matmul_v2 + ele_add) 1. Added matmul_v2 + ele_add pattern to LinearActPattern. 2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass. * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.* * Add fused_gemm_epilogue_grad op. 1. Added fused_gemm_epilogue_grad to support backward epilogue fusion. * Add UTs to fused_gemm_epilogue_grad_op. * Change attribute name in fused_gemm_epilogue_grad_op for clarity. * Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op. * Added ElementwiseAdd+Matmul+Act graph pattern detection. * Fuse backward of Linear(Act(x)) 1. Added backward fusion pass to Linear(Act(x)). 2. Added backward fusion pass to Linear(x). * Added UTs to backward fusion of Linear(Act(x)). * Complete document of arguments to fused_gemm_epilogue_op. * Made arguments of some functions pass by reference. * Modify code with review comments. 1. Made arguments of some functions pass by reference. 2. Removed redundant code. 3. Followed Google code style to change code. * Made 'const' code style consistent * Fixed random seed of python UTs. * Set compile constraints for cuBLASLt 1. Require CUDA 11.6+ 2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6. * Code Review from Paddle 1. Changed argument name is_first_gemm to without_x_gradient for clarity. 2. Applied PADDLE_THROW in fused_gemm_epilogue_op. * Remove EpilogueSingleton 1. Applied ReserveSpace to replace Epilogue for passing auxiliary pointers between FWD and BWD. * Fix a logical error and enhance UTs. 1. Added act op count checking in UTs. 2. Fix issue to fuse backward of ReLU(Linear(X)). 3. TODO: solve GELU fusion issues. * Fix Linear and GeLU fusion issues. 1. Modified graph_pattern_detector to fit both Linear with GeLU and Linear with ReLU. 2. Modified data range in UTs to allow negative values. * Removed fused_gemm_epilogue_op.h. * Rename namespace pten to phi. * Rename name of arguments in fused_gemm_epilogue_op 1. bias -> Bias. 2. out -> Out. 3. reserve_space -> ReserveSpace. * Change EpiloguePassActivationCache as local variable. 1. Removed singleton in EpiloguePassActivationCache. 2. Made EpiloguePassActivationCache an argument to each pass function.
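For readers unfamiliar with the pattern named in the commit above, the following is a minimal pure-Python sketch of the unfused computation Act(X*Y + bias) that the fused_gemm_epilogue op is described as replacing. It is only a reference for the arithmetic, not Paddle's kernel or API: the function name `linear_act_reference`, the `ACTIVATIONS` table, and the nested-list matrix layout are all assumptions made for illustration. The three activation choices (ReLU, GeLU, identity) are the ones the commit message mentions.

```python
import math


def relu(v):
    return max(0.0, v)


def gelu(v):
    # Exact (erf-based) GeLU; whether the real op uses this form is not stated in the log.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def identity(v):
    return v


# Hypothetical lookup table, for illustration only.
ACTIVATIONS = {"relu": relu, "gelu": gelu, "none": identity}


def linear_act_reference(x, y, bias, act="relu"):
    """Compute Act(X*Y + bias) for X of shape M x K, Y of shape K x N, bias of length N."""
    fn = ACTIVATIONS[act]
    k, n = len(y), len(y[0])
    if any(len(row) != k for row in x):
        raise ValueError("inner dimensions of X and Y do not match")
    if len(bias) != n:
        raise ValueError("bias length must equal the number of columns of Y")
    return [
        [fn(sum(row[i] * y[i][j] for i in range(k)) + bias[j]) for j in range(n)]
        for row in x
    ]
```

A fused epilogue performs the bias add and activation as part of the matrix multiply instead of as separate passes over the output; the result should match this reference up to floating-point rounding.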
-
Committed by lilong12
-
- 04 Mar, 2022 1 commit
-
-
Committed by Zhanlue Yang
* [Eager][Yaml]Supported Scalar and ScalarArray for AutoCodeGen * Generate forward-only operators * [Yaml]Support parsing fwd & bwd returns with name * Fixed issues * Fixed minor issues
-
- 03 Mar, 2022 5 commits
-
-
Committed by Zhanlue Yang
-
Committed by ronnywang
-
Committed by lilong12
-
Committed by Jiabin Yang
* eager, test=develop * fix bug, test=develop * eager, test=develop * merge legacy to fluid * eager, test=develop * eager, test=develop * Refactor TensorAdd func by template and remove gradient_accumulation in eager * Remove needless target name * eager, test=develop * eager, test=develop * Use overload instead of template * Remove legacy code * Remove legacy code * selectedrows, test=develop * Remove DataType test * eager, test=develop * eager, test=develop * support gan, test=develop * Using Tensor directly instead of using EagerTensor * support gradient_accumulation * make test_imperative_lod_tensor_to_selected_rows longer * make test_imperative_lod_tensor_to_selected_rows longer * refine code * ptb, test=develop * Rename all EagerTensor to Tensor * Rename some EagerTensor to Tensor * rename EagerTensor to EagerVariable * eager, test=develop * eager, test=develop * eager, test=develop * eager, test=develop * add more test * eager, test=develop * Support copiable selected rows and merge develop * save load, eager, test=develop * save load, eager, test=develop * refine, test=develop * remove useless _set_value method * refine, test=develop * refine, test=develop * revert static_runner, test=develop * EagerTensor to Tensor, test=develop * refine, test=develop * refine, test=develop * clear grad, test=develop * merge, develop * merge, develop * merge, test=develop * merge, test=develop * Support quant and part of slice * support legacy static save * extend slim tests time * remove imperative on inference * remove imperative on inference * merge develop * fix typo * fix typo * split slice related code into 2 part for imperative and eager * split slice from inference * split slice from inference * fix test_tensor_register_hook Co-authored-by: Wang Huan <wanghuan29@baidu.com> Co-authored-by: Weilong Wu <veyron_wu@163.com> Co-authored-by: wanghuancoder <wanghuancoder@163.com>
-
Committed by lilong12
* add pg_gloo
-
- 02 Mar, 2022 4 commits
-
-
Committed by huzhiqiang
-
Committed by Yuang Liu
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
-
Committed by Baibaifan
-
Committed by wanghuancoder
* open eager when WITH_PYTHON, test=develop * refine, test=develop * refine, test=develop * add DWITH_PYTHON for gen_fluid_lib, test=develop
-
- 01 Mar, 2022 1 commit
-
-
Committed by Allen Guo
-