- 25 Mar, 2020 1 commit
-
-
Committed by Zeng Jinle
-
- 20 Mar, 2020 1 commit
-
-
Committed by Zeng Jinle
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* test commit for reverting
* Revert "test commit for reverting", test=develop This reverts commit 80aef42e.
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
-
- 09 Mar, 2020 1 commit
-
-
Committed by Zeng Jinle
* refine grad maker, test=develop
* refactor tracer stage 1, test=develop
* merge develop to solve conflict a third time, test=develop
-
- 02 Mar, 2020 1 commit
-
-
Committed by Zhen Wang
* update ScopeBufferedSSAGraphExecutor, AsyncSSAGraphExecutor, ThreadedSSAGraphExecutor, FastThreadedSSAGraphExecutor, ParallelSSAGraphExecutor, and ParallelExecutor for fetching unmerged results.
* add the unit test for fetch_unmerged.
* update ut for multi-card and multi-cpu.
* add the error message and the user suggestion in FetchOpHandle. test=develop
-
- 23 Feb, 2020 1 commit
-
-
Committed by tianshuo78520a
-
- 22 Feb, 2020 1 commit
-
-
Committed by tangwei12
* add sync communicator and implement
-
- 13 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
test=develop
-
- 12 Feb, 2020 1 commit
-
-
Committed by tangwei12
* add thread barrier for the compiled program
-
- 11 Feb, 2020 1 commit
-
-
Committed by Wilber
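Support compiling without depending on NCCL. [1/2] In a multi-GPU setup, if the build was not compiled with the WITH_NCCL switch turned on, the GPUs cannot communicate with each other, so only a single GPU can be selected for use. Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>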
-
- 07 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shape of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE message. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
- 05 Feb, 2020 1 commit
-
-
Committed by Wilber
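Added a WITH_NCCL option to cmake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. A single machine with a single GPU can turn NCCL compilation off; multi-GPU setups need NCCL turned on (the default), and if NCCL is turned off only a single GPU can be used. Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>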
-
- 17 Jan, 2020 1 commit
-
-
Committed by tangwei12
* add half_async in the communicator
* fix DistributedStrategy
-
- 13 Jan, 2020 1 commit
-
-
Committed by Chen Weihang
* polish error message of parallel executor, test=develop
* change PADDLE_ENFORCE, test=develop
-
- 10 Jan, 2020 1 commit
-
-
Committed by Zhen Wang
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some inputs&&outputs bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 19 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 18 Dec, 2019 1 commit
-
-
Committed by Huihuang Zheng
Fixed bugs: 1. The condition sub-graph was not pruned. 2. When the backward graph was extremely simple, all of the backward ops were pruned.
-
- 15 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 12 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 11 Dec, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 06 Dec, 2019 1 commit
-
-
Committed by Zeng Jinle
* polish infer shape registry, test=develop
* modify some operators registry, test=develop
-
- 28 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix ref_cnt pass, test=develop
* add cpp unittests to reference_count_pass, test=develop
* follow comments, test=develop
-
- 25 Nov, 2019 1 commit
-
-
Committed by zhouwei25
-
- 18 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix warnings of gcc 8 compilation, test=develop
* fix boost::bad_get, test=develop
* refine PADDLE_ENFORCE, test=develop
-
- 13 Nov, 2019 2 commits
-
-
Committed by Chen Weihang
Add examples for error message writing specification - PreconditionNotMet, Unimplemented, Unavailable (#21137)
* add examples for error spec, test=develop
* change ENFORCE to ENFORCE_**, test=develop
-
Committed by Chen Weihang
* add examples for error msg spec, test=develop
* change ENFORCE to ENFORCE_**, test=develop
* fix error, test=develop
-
- 12 Nov, 2019 1 commit
-
-
Committed by WangXi
-
- 05 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* support no need buffer vars in dygraph, test=develop
* fix inference compilation error, test=develop
* update no_need_buffer_vars_inference, test=develop
* add unittests for no_need_buffer_vars_context, test=develop
* refine no_need_buffer_vars by return ref, test=develop
* polish some codes, test=develop
-
- 01 Nov, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by 123malin
* update pserver decay blocks
* update distributed notify handler
-
- 31 Oct, 2019 1 commit
-
-
Committed by hong
* refactor dygraph, test=develop
* fix failed unittest, test=develop
* polish code, test=develop
* check windows ci error, test=develop try to fix windows ci error by np.allclose, test=develop
* polish vlog and profiler, test=develop
* try to fix preceding ops order, test=develop
* test transformer in windows ci, test=develop
* use python c-api to speed up tracer.trace, test=develop
* test=develop, fix docker with paddle nccl problem
* test=develop, add ut for debug string and gradient_accumulator
* test=develop, add tests for layer/gradient_accumulator/prepared_op
* test=develop, fix compile error for test_prepared_op
* test=develop, add more ut for dygraph
* test=develop, create API.spec for dygraph api change
* optimize grad maker; test=develop
* optimize grad maker
* test
* grad make optim; test=develop
* fix unittest bugs; test=develop
* add dygraph grad op maker and split_op
* grad op maker refactor; test=develop
* add dygraph grad maker; test=develop
* fix op deformable_conv_v1_op bug; test=develop
* fix deformable_conv prroi pool bugs;
* fix new op grad op maker bug; test=develop
* fix split by ref bug; test=develop
* fix dygraph auto prune bug; test=develop
* fix test_trace bug; test=develop
* fix fused emb seq pool bug; test=develop
* remove useless code in op_desc file; test=develop
* remove useless code, StrVarBaseNode; test=develop
* fix review issues; test=develop
* fix rank_loss grad maker; test=develop
* remove flag in VarBase; test=develop
* fix distributed_notify_op compile bug; test=develop
* fix reshape op double grad; test=develop
* fix expand as op; test=develop
* add imperative type_defs.h for demo_train; test=develop
* fix inference lib cmake; test=develop
* fix inference lib; test=develop
* fix inference_lib; test=develop
* fix inference cmake; test=develop
* fix inference lib; test=develop
* fix inference lib; test=develop
* remove condition dygraph grad maker, modify local name; test=develop
* fix split grad maker bug; test=develop
* fix pyramid_op bug; test=develop
* change travis time out limit; test=develop
* restore travis; test=develop
* change timeout limit; test=develop
-
- 28 Oct, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 18 Oct, 2019 2 commits
- 14 Oct, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
-
- 30 Sep, 2019 1 commit
-
-
Committed by chengduo
test=develop
-
- 27 Sep, 2019 1 commit
-
-
Committed by tangwei12
* add a base class for the Communicator
* add AsyncCommunicator Impl for async distributed training
-
- 26 Sep, 2019 1 commit
-
-
Committed by chengduo
test=develop
-
- 24 Sep, 2019 1 commit
-
-
Committed by chengduo
test=develop
-
- 23 Sep, 2019 1 commit
-
-
Committed by chengduo
* Add RecordHistoryLocalExecScopes test=develop
-