- 28 Jan 2022 (1 commit)

Committed by Fan Zhang
* 12.3 first add metrics module
* add Mask/MultiTask
* add WuAUC
* [PSLIB] Update WuAUC Compute
* [PSLIB] Change WuAUC Compute Mehod
* [PSLIB] Clean WuAUC Compute
* [PSLIB] Clean Metric Module Unused Code
* mv metric instance
* [PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789)
* [PSLIB] Add Metrics Module, Support User-defined Add Metric
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI Coverage
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI
* [PSLIB] Modify According to CI Coverage
* [PSLIB] Modify According to CI Coverage
* [PSLIB] Modify According to CI Coverage
* modify role_maker
* update CMakeLists.txt

- 22 Sep 2020 (1 commit)

Committed by guofei
test=release/1.8

- 18 Aug 2020 (1 commit)

Committed by Thunderbrook
* add mock barrier all (#24786)
* add mock barrier all test=develop
* fix test=develop
* fix test=develop
* fix test=develop
* fix gloo error test=develop

Co-authored-by: xujiaqi01 <173596896@qq.com>

- 12 Aug 2020 (1 commit)

Committed by Thunderbrook
* fix dataset py3 (#25012)
* fix dataset py3 error
* test=develop
* fix logger (#24682)
* fix logger of FetchHandler,which may print log twice
* test=develop
* add timeout and http store in communication (#23436)
* add timeout and http store in communication, add revert and confirm in fleet
* test=develop
* modify datanorm op test=develop (#23030)

Co-authored-by: xujiaqi01 <173596896@qq.com>
Co-authored-by: yaoxuefeng <yaoxuefeng@baidu.com>

- 18 Mar 2020 (1 commit)

- 17 Mar 2020 (1 commit)

Committed by tangwei12
Fleet Parameter Server API Integrated

- 23 Feb 2020 (1 commit)

Committed by tianshuo78520a

- 02 Feb 2020 (1 commit)

Committed by xujiaqi01
* add GeneralRoleMaker which is for general usage
* test=develop

- 20 Nov 2019 (1 commit)

Committed by Dong Daxiang
test=develop

- 31 Oct 2019 (1 commit)

Committed by Chengmo
* fix PaddleCloud Role maker & add warning in distribute transpiler & change rpc_retry_times

- 23 Sep 2019 (1 commit)

Committed by tangwei12
* optimize cloud rolemaker, test=develop

- 06 Sep 2019 (1 commit)

Committed by 123malin
* fleet api add input check, test=develop

- 16 Aug 2019 (1 commit)

Committed by gongweibao
node_num is not needed for users, so remove them and fix the bugs about it!

- 12 Aug 2019 (1 commit)

Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode

- 25 Jul 2019 (1 commit)

Committed by guru4elephant
refine launch_ps and role_maker

- 22 Jul 2019 (1 commit)

Committed by tangwei12
do some odd jobs, test=develop

- 10 Jul 2019 (1 commit)

Committed by guru4elephant
* upgrade collective fleet api

- 08 Jul 2019 (1 commit)

Committed by guru4elephant
* add random port

- 02 Jul 2019 (1 commit)

Committed by guru4elephant
make fleet support mpi job submit directly.

- 27 Jun 2019 (1 commit)

Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runable with variables, add more unittest for use_program_cache test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* marcro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream
  update atol and rtol to 1e-05
  rm shard_index op and test
  modify read input from file to read from memory
  remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O

- 23 Jun 2019 (1 commit)

Committed by guru4elephant
* fix paddle cloud role maker bug

- 17 Jun 2019 (1 commit)

Committed by guru4elephant
add paddle cloud role maker for customized usage, note this is only for industrial users that have cloud environment pre-configuration (#18121)

add paddle cloud role maker for specific cloud usage. This pr will simplifies user's configuration in distributed training.

- 12 Jun 2019 (1 commit)

Committed by tangwei12
* fix save/load in Fleet
* add UT framework of Fleet

- 11 Jun 2019 (1 commit)

Committed by lilong12
* add 'UserDefinedRoleMakerNCCL' for collective mode.
* code style
* add the name UserDefinedRoleMakerNCCL to __all__
* rename to UserDefinedRoleMakerCollective
* rename to UserDefinedCollectiveRoleMaker

- 23 May 2019 (1 commit)

Committed by Qiao Longfei
Async exe support communicator

- 15 May 2019 (1 commit)

Committed by jiaqi
* support config file, cvm, load, save, shrink test=develop
* fix error of worker_num & add table.compress_in_save test=develop
* fix code style test=develop
* fix save model bug test=develop

- 09 May 2019 (1 commit)

Committed by tangwei12
* fix some logic in distributed transpiler, test=develop
* reformat fleet API, test=develop

- 25 Apr 2019 (1 commit)

Committed by tangwei12
* implement distributed transpiler with fleet

- 11 Apr 2019 (1 commit)

Committed by dongdaxiang

- 09 Apr 2019 (1 commit)

Committed by xjqbest
test=develop

- 30 Mar 2019 (1 commit)

Committed by xjqbest
test=develop

- 29 Mar 2019 (9 commits)

Committed by xjqbest
test=develop

Committed by dongdaxiang

Committed by xujiaqi01

Committed by xujiaqi01

Committed by dongdaxiang

Committed by dongdaxiang

Committed by dongdaxiang

Committed by dongdaxiang

Committed by dongdaxiang