- 03 Mar, 2020 1 commit
Committed by Zhang Ting
* add fluid.device_guard to specify the device type for Op
- 12 Nov, 2019 1 commit
Committed by WangXi
- 29 Jul, 2019 1 commit
Committed by Zeng Jinle
* remove legacy memory optimization codes, test=develop
* follow huihuang's comments, test=develop
* follow luotao's comments, test=develop
- 02 Jul, 2019 1 commit
Committed by Yi Liu
1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter in the template, as we already knew the target DeviceContext
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis
- 27 Jun, 2019 1 commit
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runable with variables, add more unittest for use_program_cache test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* marcro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
- 08 May, 2019 1 commit
Committed by chengduo
* move pass to ir
* polish code test=develop
* fix dependency test=develop
- 21 Apr, 2019 1 commit
Committed by Zeng Jinle
* speedup gc and inplace softmax_with_cross_entropy_grad test=develop
* refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop
* follow comments test=develop
- 18 Apr, 2019 1 commit
Committed by gongweibao
- 08 Jan, 2019 1 commit
Committed by peizhilin
- 26 Dec, 2018 1 commit
- 25 Dec, 2018 1 commit
Committed by peizhilin
test=develop
- 08 Nov, 2018 1 commit
Committed by chengduo
* fix input<tensor> test=develop
* fix split_ids test=develop
* ElementwiseMul should not support SelectedRows
* fix scale op test=develop
* change GetTensorFromVar() method to GetTensorOrSelectedRowsFromVar()
* fix operator
* refine MultiOutput
* fix MultiOutput test=develop
* disable test_dist_save_load test=develop
* fix elementwise_op test=develop
* add get_sparse_as_op test=develop
* add info for check test=develop
* rename get_sparse_as_op with extract_rows_as_op. test=develop
* elementwise doesn't support selected_rows
* fix regularizer
* remove extract_rows_as test=develop
* fix ci test=develop
* add test for sum_op
* fix regularizer test=develop
* test=develop
* fix pserver weight decay multi inputs test=develop
- 30 Sep, 2018 1 commit
- 21 Sep, 2018 1 commit
Committed by Wu Yi
* wip
* clean up
* should fix running with memopt
* add ut
* mark lr schedule op role
* hide lr_schedule_guard
* use op_role_var instead of ufind
* unify dist test name
* wip for py3 support
* fix var deref
* fix python3 mem_opt order
* remove comments
- 16 Sep, 2018 1 commit
Committed by Yibing Liu
- 14 Sep, 2018 1 commit
Committed by Yibing Liu
- 04 Sep, 2018 1 commit
- 29 Aug, 2018 1 commit
Committed by Xin Pan
- 23 Aug, 2018 3 commits
Committed by guochaorong
This reverts commit b2df1700.
Committed by Wu Yi
* dist transpiler add control dependency var between send and recv
* fix async deps
* follow comments and refine
* fix deps connect for rpc ops
Committed by Yu Yang
* Add Python Callstacks when Op::Run error
* Skip op with sub-block
* refactor: refine callstack info's format
* Reshape only support matrix
* Polish Python code
* Fix UT
* Fix Py3
- 29 May, 2018 1 commit
Committed by Yancey1989
- 15 May, 2018 1 commit
Committed by yuyang18
- 07 Apr, 2018 1 commit
Committed by Yi Wang
* cpplint test and add tesnor_py_test.cc
* Update
* Update
- 12 Feb, 2018 1 commit
Committed by qingqing01
- 10 Feb, 2018 2 commits
- 05 Jan, 2018 1 commit
Committed by dzhwinter
* "add c++ side kernel selection"
* "add multiple kernel op test"
* "kernel selection only support cudnn"
* "better formatter"
* "small fix with UseCPU"
* "depends on change interface Get(Place, Library)"
* "fix CI"
* "fix python cudnn test"
* "leave the register cudnn op to another PR"
* "fix CI"
* "use all kernel by default"
* "fix CI"
- 25 Dec, 2017 1 commit
Committed by Qiao Longfei
* init kernel hint
* fix typo
* rm unused code
* add include in op_kernel.h
* restore op_kernel since it will be moved to op_kernel_type
* change force_cpu to use_cpu
* fix compilation
- 19 Dec, 2017 1 commit
Committed by qiaolongfei
- 25 May, 2017 1 commit
Committed by Yu Yang
- 09 Dec, 2016 1 commit
Committed by Yi Wang
- 22 Nov, 2016 1 commit
Committed by Luo Tao
- 29 Aug, 2016 1 commit
Committed by zhangjinchao01
ISSUE=4586495
git-svn-id: https://svn.baidu.com/idl/trunk/paddle@1408 1ad973e4-5ce8-4261-8a94-b56d1f490c56