1. 08 11月, 2022 1 次提交
  2. 11 10月, 2022 1 次提交
  3. 23 9月, 2022 1 次提交
  4. 13 8月, 2022 1 次提交
    • Z
      fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      
      * fix logging risk
      
      * fix logging possible risk
      
      * write trainer_desc file
      
      * support split sparse params in local & remote
      
      * fix import paddle.fluid.core.PSGPU
      
      * fix import paddle.fluid.core.PSGPU
      
      * add remote_sparse & local_sparse config
      
      * fix unittest
      
      * fix test_dist_fleet_geo table error
      
      * fix PADDLE_ENFORCE error
      
      * fix other's pr conflict
      3f5c405f
  5. 26 7月, 2022 1 次提交
    • Z
      add horizontal federation learning ps feature (#44327) · 4bc22b69
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * fl-ps v1.0
      
      * .
      
      * support N + N mode
      
      * .
      
      * .
      
      * .
      
      * .
      
      * delete print
      
      * .
      
      * .
      
      * .
      
      * .
      
      * fix bug
      
      * .
      
      * .
      
      * fl-ps with coordinator ready
      
      * merge dev
      
      * update message parse only
      
      * update fl client scheduler
      
      * fix bug
      
      * update multithreads sync
      
      * fix ci errors
      
      * update role_maker.py
      
      * update role_maker.py
      
      * fix ci error: windows py import error
      
      * fix ci error: windows py import error
      
      * fix windows ci pylib import error
      
      * add dump fields & params
      
      * try to fix windows import fleet error
      
      * fix ps FLAGS error
      4bc22b69
  6. 02 7月, 2022 1 次提交
    • L
      unify cpu context, part2 (#44012) · 755438a7
      Leo Chen 提交于
      * fix init()
      
      * delete test_device_context
      
      * replace CPUDeviceContext with CPUContext
      
      * fix test_scalar
      
      * remove dot_op.cc
      
      * fix compile
      755438a7
  7. 26 6月, 2022 1 次提交
  8. 05 6月, 2022 1 次提交
  9. 02 4月, 2022 1 次提交
    • Z
      统一ps refine (#41234) · b3270adf
      zhaocaibei123 提交于
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      Co-authored-by: Nesythan <esythan@126.com>
      b3270adf
  10. 23 3月, 2022 1 次提交
    • Z
      two-phase training for ps (#40762) · b1a4668c
      zhaocaibei123 提交于
      * fix benchmark and communicator config
      
      * fix bugs of the_one_ps
      
      * multi program and fix bug in optimizer
      
      * multi program in the_one_ps
      
      * public commcontext
      
      * ps optimizer multi programs
      
      * cvm & datanorm backend
      
      * fix dim
      
      * fix unittest
      
      * fix
      
      * the one ps merge
      
      * remove comm
      
      * add DownpourLiteWorker
      
      * all
      
      * fix
      
      * fix
      
      * device worker downpour lite
      
      * fix
      
      * fix bug in global shuffle
      
      * save inference model
      
      * fix & add log
      
      * fix
      
      * remove log
      
      * fix
      
      * fix save summary
      
      * fix
      
      * fix pscore
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * remove logs
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * add some comments
      
      * fix
      Co-authored-by: Nesythan <esythan@126.com>
      b1a4668c
  11. 20 2月, 2022 1 次提交
  12. 18 2月, 2022 1 次提交
  13. 11 2月, 2022 1 次提交
  14. 06 2月, 2022 1 次提交
  15. 30 1月, 2022 1 次提交
  16. 25 1月, 2022 2 次提交
  17. 20 1月, 2022 1 次提交
  18. 24 12月, 2021 1 次提交
  19. 15 11月, 2021 1 次提交
  20. 28 10月, 2021 1 次提交
    • W
      save/load in ps runtime(the_one_ps) (#36097) · e7842ba6
      wangguanqun 提交于
      * add trainer desc config to distributed strategy
      
      * code style modified
      
      * data_feed set lod
      
      * fix bug
      
      * code style
      
      * fix bug
      
      * save load
      
      * save load
      
      * save unittest
      
      * add unittest of the_one_ps
      
      * unittest
      
      * add todo in communicator sendsparse
      e7842ba6
  21. 29 7月, 2021 2 次提交
    • S
      recover capacity check (#34478) · b9d6c987
      seemingwang 提交于
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundent threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      
      * move graph files into a folder
      
      * code style change
      
      * remove graph operations from base table
      
      * optimize get_feat function of graph engine
      
      * fix long long count problem
      
      * remove redandunt graph files
      
      * remove unused shell
      
      * recover dropout_op_pass.h
      
      * fix potential stack overflow when request number is too large & node add & node clear & node remove
      
      * when sample k is larger than neigbor num, return directly
      
      * using random seed generator of paddle to speed up
      
      * fix bug of random sample k
      
      * fix code style
      
      * fix code style
      
      * fix blocking_queue problem
      
      * fix style
      
      * fix
      
      * recover capacity check
      Co-authored-by: NHuang Zhengjie <270018958@qq.com>
      Co-authored-by: NWeiyue Su <weiyue.su@gmail.com>
      Co-authored-by: Nsuweiyue <suweiyue@baidu.com>
      Co-authored-by: Nluobin06 <luobin06@baidu.com>
      Co-authored-by: Nliweibin02 <liweibin02@baidu.com>
      Co-authored-by: Ntangwei12 <tangwei12@baidu.com>
      b9d6c987
    • S
      fix block_queue problem (#34461) · e9583166
      seemingwang 提交于
      e9583166
  22. 15 4月, 2021 1 次提交
    • T
      heterps support pscore (#32093) · 9f8c8f96
      Thunderbrook 提交于
      * pscore support heterps
      
      * fleet cmake
      
      * fleet wrapper
      
      * macro
      
      * solve conflict
      
      * solve conflict
      
      * add unitest
      
      * paddle enforce
      
      * unitest
      
      * unitest
      
      * unitest
      9f8c8f96
  23. 26 2月, 2021 1 次提交
  24. 04 2月, 2021 1 次提交
  25. 08 1月, 2021 1 次提交
  26. 14 12月, 2020 1 次提交
    • T
      add service (#29560) · 0034273b
      tangwei12 提交于
      * add service, remove ut on mac
      
      * fix heter_profiler & add heter stop method
      
      * fix code style
      0034273b
  27. 24 11月, 2020 1 次提交
  28. 30 9月, 2020 1 次提交
    • M
      fix distributed error info (#27206) · 20fb01fb
      MRXLT 提交于
      * fix distributed error info
      
      * bug fix; notest
      
      * error info refine
      
      * update error info
      
      * update error info
      
      * update error info
      
      * bug fix
      
      * bug fix
      
      * bug fix
      
      * bug fix
      20fb01fb
  29. 29 9月, 2020 1 次提交
  30. 24 9月, 2020 1 次提交
    • W
      use iwyu clean include (#27267) · df43905f
      wanghuancoder 提交于
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
      df43905f
  31. 23 9月, 2020 1 次提交
    • T
      large scale kv speedup (#26510) · bc5f0246
      tangwei12 提交于
      * rename communicator meet->BatchesCounter
      
      * fix parame recv for sparse
      
      * geo sparse init from pserver
      
      * optimize init from pserver
      
      * add large scale optimizer fuse(SGD/ADAM)
      
      * rectification init_worker and exe.run startup program
      bc5f0246
  32. 30 7月, 2020 1 次提交
  33. 09 3月, 2020 1 次提交
  34. 22 2月, 2020 1 次提交
  35. 17 1月, 2020 1 次提交
  36. 06 1月, 2020 1 次提交
  37. 19 12月, 2019 1 次提交
  38. 28 11月, 2019 1 次提交