- 01 Mar, 2021 1 commit
-
-
Committed by Thunderbrook
* solve build gpu task core (#30626)
* build gpu task core
* format
* dump to cpu (#30750)
* dump to cpu
* format
* format
* format
* support multi node in heterps (#31102)
* push multi node
* multi node
* MultiThread
* remove log
* solve bug in 30829
* optimizer
-
- 23 Feb, 2021 2 commits
-
-
Committed by Chen Weihang
[CustomOp] New custom operator extension mechanism in 2.0.1
Cherry-pick of the PRs related to the basic implementation of the new custom operator
-
Committed by tangwei12
* test=develop, save/load, shrink
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
Co-authored-by: 123malin <malin10@baidu.com>
-
- 19 Feb, 2021 1 commit
-
-
Committed by Wilber
-
- 04 Feb, 2021 1 commit
-
-
Committed by 石晓伟
-
- 02 Feb, 2021 1 commit
-
-
Committed by Shang Zhizhou
* add dla
* add python api
Co-authored-by: shangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>
-
- 20 Jan, 2021 2 commits
-
-
Committed by AshburnLee
* Add TF32 support for A100 tensor core acceleration for cuBLAS (#28732)
* Fixed an error
* Fixed an error
-
Committed by AshburnLee
This PR is cherry-picked from PR #29192. Function: adds a TF32 switch for cuDNN. The switch is on by default and is turned off when the user sets it to False.
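For context on what a TF32 switch trades away: TF32 keeps the 8-bit exponent of float32 but only 10 explicit mantissa bits. The sketch below mimics that precision on the CPU by clearing the low 13 mantissa bits of a float32 value. It is an illustration only: `to_tf32` is a hypothetical helper, not a Paddle, cuBLAS, or cuDNN API, and it truncates, whereas hardware conversion may round instead.

```python
import struct


def to_tf32(x: float) -> float:
    """Truncate a value to TF32-like precision (hypothetical helper)."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 has 23 mantissa bits; TF32 keeps 10, so clear the low 13.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# An increment of 2**-10 fits in 10 mantissa bits and survives.
print(to_tf32(1.0 + 2.0 ** -10))
# An increment of 2**-11 needs an 11th mantissa bit and is dropped.
print(to_tf32(1.0 + 2.0 ** -11))
```

Workloads that are sensitive to this loss of mantissa bits are the ones where turning such a switch off would matter.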
-
- 19 Jan, 2021 2 commits
- 18 Jan, 2021 2 commits
-
-
Committed by guofei
* Modify the calculation logic of LambOptimizer (#29313)
* Modify the calculation logic of LambOptimizer
* Modify the calculation logic of LambOptimizer
* Modify the calculation logic of LambOptimizer
-
Committed by pangyoki
Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)
* add view strategy on squeeze, unsqueeze, reshape, flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* fix test_cross_entropy_loss error because of reshape2
* add inplace strategy
* add elementwise_add sub
* let backward op not use inplace
* grad op do not use inplace
* fix memory increase error and add leaf error message
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
* add unittest and leaf error message
* merge view error
* optimize op_function_generator format and support sum inplace op
* fix format of basic_engine
* fix format for framework
* little change of variable wrapper
* add reshape, squeeze, unsqueeze, scatter api
* add relu elu tanh softmax inplace api
* fix test_squeeze_op unittest
* fix test_relu_op unittest
* fix comment problems
* delete sample code of inplace api
* add reference of grad_pending_nodes in basic_engine
* fix unittest name
* add inplace apis into wlist
* fix error message
* add PADDLE_ENFORCE for set grad op twice
* fix header file error
-
- 15 Jan, 2021 1 commit
-
-
Committed by pangyoki
* Cherry-pick 30072, add dispensable input for core.ops.reshape2/expand/slice (#30072)
* add dispensable input 'shape' for core.ops.reshape2
* add dispensable inputs for core.ops.reshape2/expand/slice
* add ut
* save reshape update in pr 30180
* save reshape update v2 in pr 30180
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
-
- 13 Jan, 2021 3 commits
- 12 Jan, 2021 3 commits
-
-
Committed by Leo Chen
* change to tensor copy sync
* change to tensor copy sync
* make copy_to safe when use TensorCopy
* refine code
* add ut
* add cudapinned garbagecollector
* add testcase: cpu place -> cuda pinned place
-
Committed by Chengmo
* Fix server.h include device_context (#30243)
* fix cmake
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
* [Paddle.Fleet] Support local save sparse param (#30175)
* add save tensor support
Co-authored-by: seiriosPlus <tangwei12@baidu.com>
* add sparse embedding & load vars for 2.0 & gloo bug fix (#30306)
* add sparse embedding & load vars for 2.0
Change-Id: I36b59ed5f015189dc9d9d2e34a9357722d369f1b
* fix hdfs gloo
Change-Id: Ia84d579053720ad804183e54c9a04b4f031c79c6
* fix gloo hdfs
Change-Id: I5ab982fd483cddc10adcdef0b8aa83aca976cb9e
* move loadvar/sparse embedding from incubate to static
Change-Id: I57081d3545ad2efab78c72420d2162c0eacaf3a0
Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by Chengmo
-
- 11 Jan, 2021 1 commit
-
-
Committed by pangyoki
[Cherry-pick PR 29913] Add View (reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten ops (#29913) (#30258)
* add view strategy on squeeze, unsqueeze, reshape, flatten
* add squeeze unittest
* add unittests
* use View strategy as name rather than Reuse Allocation
* fix view api doc
* fix format
* use core.ops when input of reshape2 is Tensor
* fix test_cross_entropy_loss error because of reshape2
* delete selected_rows
* change op_function
* little change
* solve HandleViewBetweenInputAndOutput
-
- 08 Jan, 2021 2 commits
-
-
Committed by liym27
[cherry-pick 2.0] Fix bug: in dynamic mode, if start or end is negative, __getitem__ returns a wrong result (#30003) (#30146)
1. When slice_item is a slice:
   1) the start of __getitem__ should be std::max(start, 0)
   2) the end of __getitem__ should be std::min(end, dim)
2. When slice_item is an integer, it should be in [-dim_len, dim_len)
3. Fix error message to use accurate data
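The bounds rules in this commit message can be sketched in plain Python. This is not the Paddle implementation: `normalize_slice` and `normalize_index` are hypothetical helpers, and the step that adds `dim` to a negative bound before clamping is an assumption borrowed from Python's own list semantics, since the message only names the clamping.

```python
def normalize_slice(start: int, end: int, dim: int) -> tuple:
    """Map possibly negative slice bounds into [0, dim] (hypothetical helper)."""
    # Assumption: negative bounds count from the end, as in Python lists.
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    # Clamp both bounds so they can be used directly as offsets.
    start = min(max(start, 0), dim)
    end = min(max(end, 0), dim)
    return start, end


def normalize_index(index: int, dim: int) -> int:
    """Validate an integer index against [-dim, dim) and make it non-negative."""
    if not -dim <= index < dim:
        raise IndexError(f"index {index} is out of range [{-dim}, {dim})")
    return index + dim if index < 0 else index


values = list(range(5))
start, end = normalize_slice(-3, 10, len(values))
print(values[start:end])
print(values[normalize_index(-1, len(values))])
```

Comparing the normalized bounds against Python's built-in slicing over a range of inputs is a cheap way to check a rule like this.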
-
Committed by 123malin
* Add Lookahead and ModelAverage Optimizer (#30004)
* test=develop, add model_average and lookahead
* Improve Index select cuda kernel (#30139)
* test=develop, add index_select_cuda kernel
-
- 06 Jan, 2021 1 commit
-
-
Committed by liym27
[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)
Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the data, shape, and inplace_version_counter_ of the Tensor. In some cases, however, the data and inplace_version_counter_ must be shared but the shape must not. For example, the inplace op reshape can't share the shape. This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.
* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase
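The distinction described here, sharing data and the version counter while keeping separate shapes, can be illustrated with a small toy model. Everything in this sketch (`VersionCounter`, `ToyTensor`, `reshape_view`) is hypothetical and only mirrors the idea; it is not the C++ Tensor API.

```python
class VersionCounter:
    """Counts in-place modifications; meant to be shared between views."""

    def __init__(self):
        self.value = 0

    def bump(self):
        self.value += 1


class ToyTensor:
    def __init__(self, data, shape):
        self.data = data          # storage, may be shared with other views
        self.shape = list(shape)  # always owned by this tensor
        self.counter = VersionCounter()

    def share_version_counter_with(self, src):
        # Share only the counter object, not the shape.
        self.counter = src.counter


def reshape_view(src, new_shape):
    """Return a view that shares storage and counter but has its own shape."""
    out = ToyTensor(src.data, new_shape)
    out.share_version_counter_with(src)
    return out


x = ToyTensor([0, 1, 2, 3, 4, 5], [2, 3])
y = reshape_view(x, [3, 2])
y.data[0] = 42     # in-place write through the view
y.counter.bump()   # visible through x as well
print(x.data[0], x.shape, y.shape, x.counter.value)
```

Because the counter is a shared object, a write through either tensor is observable from both, while each keeps its own shape.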
-
- 05 Jan, 2021 1 commit
-
-
Committed by Thunderbrook
* add topo aware
* resource.h
* topo aware
* format
-
- 04 Jan, 2021 1 commit
-
-
Committed by Zhou Wei
* support deepcopy for Layer/Tensor/Paramerbase
* fix some code
-
- 29 Dec, 2020 2 commits
-
-
Committed by liuyuhui
* [Kunlun] PR1: Support one Kunlun card training in parallel executor (#29337)
* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)
* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926)
* add bkcl.so in whl for kunlun (#29947)
* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>
-
Committed by Thunderbrook
* cherry pick heter ps
* CMakeList
-
- 25 Dec, 2020 1 commit
-
-
Committed by tangwei12
* add ps table (#29463)
* add ps table
Change-Id: I468a04bd071d21ff52654926fcf4d5f3da19e178
* add service (#29560)
* add service, remove ut on mac
* fix heter_profiler & add heter stop method
* fix code style
* merge pscore
Change-Id: Ie7f60d1cdde6755a0c29db26863c6283e9843d57
* fix cmake
Change-Id: I6773509a7b4ca79139ecc40b7bf3eb318ceff8bb
* fix conflict
Change-Id: I35575be0c96a8520f9d756ea7f1ff0b904a165ba
* fix conflict
Change-Id: Ic926ea0b0d67803226d51241397ba3b510226bfa
-
- 22 Dec, 2020 1 commit
-
-
Committed by ShenLiang
* fix fleet for multi-stream
* fix memcpy for ncclid
* use sync to solve move operation
-
- 21 Dec, 2020 1 commit
-
-
Committed by Wilber
-
- 17 Dec, 2020 1 commit
-
-
Committed by ShenLiang
* Fix the download bug in the case of multiple machines (#29551)
* fix the download bug
* add sort for ips
* Fix bug of matmul_v2 for broadcast case (#29599)
* fix bug of matmul_v2 for broadcast
* Rebuild group automatically in dynamic graph distributed (#29255)
* add tensor_indices in AssignGroupBySize
* add rebuild group in reducer
* fix error message of gather nd (#29521)
-
- 05 Dec, 2020 1 commit
-
-
Committed by myq406450149
* enhance array_to_lod_tensor_op lod_tensor_to_array_op errors information. test=develop
* fix format. test=develop
* format fix. test=develop
* add lod_rank_table. test=develop
* fix format. test=develop
* fix doc info. test=develop
* fix np error
* add unbind dygraph api. test=develop
* fix unbind doc. test=develop
-
- 04 Dec, 2020 2 commits
-
-
Committed by liym27
[cherry-pick 2.0rc1][inplace] Add ShareHolderWith for class Variable and SharePlaceholderWith in VarBase.detach() to share the same Tensor/SelectedRows (#29267) (#29359)
-
Committed by Chen Weihang
* basic impl of type promote
* add comment & another testcase
* fix complex bugs & support python op promote type
* fix failed unittests & polish code
* add unittest for coverage
* change to only promote complex type
* polish code details
* polish several comments
-
- 03 Dec, 2020 1 commit
-
-
Committed by Zhen Wang
* Add pure fp16 training with master weights. (#27712)
* add the weight decay func for the momentum op
* Add the multi_precision function in Momentum Optimizer.
* Make sure that the initial values of the master weights are the same as the fp16 weights.
* add static loss scaling.
* add the rescale_grad function in the pure fp16 training.
* use the original momentum updating method.
* Polish some codes, such as variable names.
* add docstring for apis.
* update the var creation details of _create_master_weight.
* not modify codes about imperative momentum updating.
* Fix the error of test_dist_sparse_tensor_load_momentum UT.
* add unit test for multi precision fp16 training.
* add more unit tests for CI.
* Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
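Why master weights and loss scaling help in pure fp16 training can be shown numerically with the standard library alone. The sketch below is not Paddle code: `to_fp16`, the learning rate, the gradient values, and the loss scale of 1024 are all assumptions chosen for illustration, and Python's double-precision floats stand in for the higher-precision master copy.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LR = 1e-4
GRAD = 1.0
STEPS = 100

# Pure fp16 update: every result is rounded back to fp16, so an update
# smaller than half the fp16 spacing near 1.0 is lost on each step.
w_fp16 = to_fp16(1.0)
for _ in range(STEPS):
    w_fp16 = to_fp16(w_fp16 - LR * GRAD)

# Master-weight update: accumulate in higher precision, then cast a copy
# to fp16 for use in the forward and backward passes.
master = 1.0
for _ in range(STEPS):
    master -= LR * GRAD
w_from_master = to_fp16(master)

# Static loss scaling: a tiny gradient underflows to zero in fp16 unless
# it is scaled up first and unscaled afterwards in higher precision.
LOSS_SCALE = 1024.0
tiny_grad = 1e-8
unscaled = to_fp16(tiny_grad)
recovered = to_fp16(tiny_grad * LOSS_SCALE) / LOSS_SCALE

print(w_fp16, w_from_master, unscaled, recovered)
```

In this toy run the fp16-only weight never moves, the weight derived from the master copy does, and the scaled gradient survives where the unscaled one becomes zero.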
-
- 01 Dec, 2020 2 commits
-
-
Committed by chentianyu03
* add complex64 and complex128 type; add +-*/@ and slice operator for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
-
Committed by Zhou Wei
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* The leaf tensor concept is exposed and the gradient accumulation of leaf tensor
* fix coverage
* fix api doc
* fix CI unittest
* fix CI unittest
* fix unittest
* empty tensor doesn't need inner_var_
* fix some error message
-
- 30 Nov, 2020 1 commit
-
-
Committed by liym27
* Add a class TensorInplaceVersion to count the inplace version and put it in framework::Tensor instead of Allocation or Variable.
* Add a new attribute `_inplace_version` for VarBase.
* Raise exception if an inplace operation can result in incorrect gradient computation.
* Add a new interface _bump_inplace_version() for VarBase to bump the version whenever the Tensor is modified through an inplace operation.
* For api assign, call _bump_inplace_version() when it's an inplace operation in dynamic mode.
* Use original var_wrapper if the inplace_version is not changed.
* Replace SnapshotVarWrapperList with SnapshotVarWrapper to optimize performance.
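The mechanism listed above, recording a version when a tensor is saved for backward and raising if it changed in the meantime, can be sketched as follows. The classes `TrackedTensor` and `SavedForBackward` and the method `scale_` are hypothetical stand-ins, not Paddle's `VarBase` or its snapshot wrappers.

```python
class TrackedTensor:
    """Toy tensor that counts in-place modifications."""

    def __init__(self, data):
        self.data = list(data)
        self.inplace_version = 0

    def bump_inplace_version(self):
        self.inplace_version += 1

    def scale_(self, factor):
        # In-place op: mutate storage, then bump the version.
        for i, value in enumerate(self.data):
            self.data[i] = value * factor
        self.bump_inplace_version()


class SavedForBackward:
    """Snapshot of a tensor's version taken during the forward pass."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.version_at_save = tensor.inplace_version

    def get(self):
        # A changed version means the saved value is stale, so a gradient
        # computed from it would be wrong.
        if self.tensor.inplace_version != self.version_at_save:
            raise RuntimeError(
                "tensor was modified in place after being saved: "
                f"version {self.tensor.inplace_version}, "
                f"expected {self.version_at_save}"
            )
        return self.tensor


x = TrackedTensor([1.0, 2.0])
saved = SavedForBackward(x)  # e.g. the forward of y = x * x needs x later
x.scale_(10.0)               # in-place change before backward runs

try:
    saved.get()
except RuntimeError as err:
    print("detected:", err)
```

When the version is unchanged, `get()` returns the original tensor directly, which corresponds to reusing the original wrapper instead of copying it.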
-
- 27 Nov, 2020 1 commit
-
-
Committed by ShenLiang
* add reducer
* refine event for memorycopy
* add concat & split for allreduce
* apply concat & split for fuse tensor
* fix nccl dep
* fix the unittest, compile problem and ddp initialize problem
* fix unittest for mac & add some comments & solve the repeated param in sublayers
* fix unittest for windows & fix document
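The "concat & split for allreduce" idea, fusing several gradients into one buffer so a single collective call covers them, can be sketched in plain Python. This is an assumption-laden toy: `concat`, `split`, and `all_reduce_sum` are hypothetical helpers, and the all-reduce is simulated by an element-wise sum across in-process "workers" rather than a real NCCL call.

```python
def concat(tensors):
    """Flatten a list of 1-D gradients into one buffer and record their sizes."""
    flat = [value for tensor in tensors for value in tensor]
    sizes = [len(tensor) for tensor in tensors]
    return flat, sizes


def split(flat, sizes):
    """Cut a fused buffer back into the original per-parameter pieces."""
    pieces, offset = [], 0
    for size in sizes:
        pieces.append(flat[offset:offset + size])
        offset += size
    return pieces


def all_reduce_sum(buffers):
    """Simulated all-reduce: element-wise sum across workers."""
    return [sum(values) for values in zip(*buffers)]


# Two workers, each holding gradients for the same two parameters.
worker_grads = [
    [[1.0, 2.0], [3.0]],
    [[10.0, 20.0], [30.0]],
]

fused = [concat(grads) for grads in worker_grads]
sizes = fused[0][1]
reduced = all_reduce_sum([flat for flat, _ in fused])
print(split(reduced, sizes))
```

One reduction over the fused buffer replaces one reduction per parameter, which is the usual motivation for fusing small tensors.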
-
- 26 Nov, 2020 1 commit
-
-
Committed by Leo Chen
* split train_mode and has_grad
* fix format
* fix ci problems
* fix sample code
-