- 28 Sep, 2021 1 commit
-
-
Committed by Thunderbrook
* ps gpu dump * remove log
-
- 14 Sep, 2021 1 commit
-
-
Committed by Yuang Liu
-
- 16 Aug, 2021 1 commit
-
-
Committed by Fan Zhang
-
- 20 Jul, 2021 1 commit
-
-
Committed by WangXi
-
- 09 Jul, 2021 1 commit
-
-
Committed by Yuang Liu
-
- 29 Jun, 2021 1 commit
-
-
Committed by Thunderbrook
* remove heterbox * remove heterbox
-
- 25 May, 2021 1 commit
-
-
Committed by danleifeng
* fix hogwild_worker dev_ctx place bug; test=develop
-
- 10 May, 2021 1 commit
-
-
Committed by Thunderbrook
* pslib with cmake * heter util * vlog * heter server test * add dtor * cmake
-
- 08 May, 2021 1 commit
-
-
Committed by danleifeng
* add trainprofiler for heterps in oneps; test=develop * add set_use_ps_gpu; test=develop
-
- 06 May, 2021 1 commit
-
-
Committed by gongweibao
-
- 23 Apr, 2021 1 commit
-
-
Committed by Baibaifan
solve hccl communicate conflict (#32447)
-
- 15 Apr, 2021 1 commit
-
-
Committed by Thunderbrook
* pscore support heterps * fleet cmake * fleet wrapper * macro * solve conflict * solve conflict * add unitest * paddle enforce * unitest * unitest * unitest
-
- 01 Apr, 2021 1 commit
-
-
Committed by tangwei12
* upgrade vlog * train from dataset fetch optimize
-
- 22 Mar, 2021 1 commit
-
-
Committed by lilong12
* add 1f1b scheduler for pp, test=develop
-
- 11 Mar, 2021 1 commit
-
-
Committed by Thunderbrook
* heter bug * format * format
-
- 02 Mar, 2021 1 commit
-
-
Committed by danleifeng
* topo and memory performance for heterps; test=develop * add trainwithprofiler in heter trainier; test=develop
-
- 25 Feb, 2021 1 commit
-
-
Committed by Qi Li
-
- 23 Dec, 2020 1 commit
-
-
Committed by Thunderbrook
* add heter box * add trainer, worker, wrapper... * format * for ci * format * remove boost get * boost & copyright * rename * rename * format * format * format Co-authored-by: yaoxuefeng6 <yaoxuefeng@baidu.com>
-
- 23 Nov, 2020 2 commits
-
-
Committed by lilong12
* update, test=develop
-
Committed by Thunderbrook
* ps gpu transpile * ps gpu * remove op * gps trainer * local ps * add macro * HeterBox * def cuda * tab * code style * style Co-authored-by: Thunderbrook <a754913769#163.com>
-
- 14 Oct, 2020 1 commit
-
-
Committed by zhang wenhui
* add multitask * add multitask, test=develop * fix code style, test=develop * add partail push dense, test=develop * fix has_kay in py3, test=develop * fix, test=develop * fix, test=develop * fix, test=develop
-
- 25 Sep, 2020 1 commit
-
-
Committed by Thunderbrook
* add xpu in heter mode test=develop * BOOST_CONST_GET; PADDLE_THROW test=develop * code style test=develop * code style test=develop * code style test=develop * refine test=develop * refine test=develop * refine test=develop * refine code test=develop
-
- 24 Sep, 2020 1 commit
-
-
Committed by wanghuancoder
* use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop
-
- 17 Sep, 2020 1 commit
-
-
Committed by lilong12
-
- 06 Aug, 2020 1 commit
-
-
Committed by Thunderbrook
* add heter ps mode * code style test=develop * add with_pslib test=develop * unitest test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * test monitor test=develop * prepare trainer test=develop * code style test=develop
-
- 30 Jul, 2020 1 commit
-
-
Committed by lilong12
* fix test_pipeline, test=develop
-
- 07 Jul, 2020 1 commit
-
-
Committed by lilong12
* add device_worker for pipeline, test=develop
-
- 03 Jun, 2020 1 commit
-
-
Committed by 123malin
* test=develop, add try_catch for debug
-
- 19 May, 2020 1 commit
-
-
Committed by hutuxian
* Refactor code for dump_field & dump_param: abstracting the common function in base class. * Support dump randomly & random with lineid * Support specify the random interval, which avoids printing too much logs.
-
- 01 Apr, 2020 1 commit
-
-
Committed by xujiaqi01
* add fleet pslib pull and push sparse op and push dense op * test=develop
-
- 17 Feb, 2020 1 commit
-
-
Committed by 123malin
-
- 11 Feb, 2020 2 commits
-
-
Committed by yaoxuefeng
* update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * udpate style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop
-
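Committed by Wilber
Support compiling without a dependency on NCCL. [1/2] In a multi-GPU setup, if the build was not compiled with the WITH_NCCL switch enabled, the cards cannot communicate with each other, so only a single card can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>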
-
- 17 Jan, 2020 1 commit
-
-
Committed by tangwei12
* add half_async in the communicator * fix DistributedStrategy
-
- 18 Dec, 2019 1 commit
-
-
Committed by xujiaqi01
* fix compiled error of butil when with_pslib=on and with_testing=on * test=develop
-
- 20 Nov, 2019 1 commit
-
-
Committed by Thunderbrook
* general table * add sparse table test=develop * no cvm test=develop * add no_cvm test=develop * add note test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * code style test=develop * add key of optimizer test=develop
-
- 15 Nov, 2019 1 commit
-
-
Committed by xujiaqi01
* copy some feasigns and corresponding embeddings from one sparse table to another * copy all feasigns and corresponding embeddings from one sparse table to another * copy all dense params from one table to another * copy some local vars to other local vars
-
- 31 Oct, 2019 1 commit
-
-
Committed by Thunderbrook
* support dump param to afs test=develop * code style test=develop * code style test=develop * dump param test=develop * dump param test=develop * dump param test=develop * dump param test=develop
-
- 25 Oct, 2019 1 commit
-
-
Committed by xujiaqi01
* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto. * add find_distributed_lookup_table_grads instead of hard code GRAD * support embedding stop gradient. push sparse has error before fix this.* * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this. * fix pull sparse, skip slots which do not have embedding. * fix collect feasign label info, skip slots which do not have embedding. * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables. * test=develop
-
- 18 Oct, 2019 1 commit
-
-
Committed by xujiaqi01
* add check nan / inf in downpour worker during training * test=develop
-