1. 15 11月, 2019 2 次提交
  2. 12 11月, 2019 1 次提交
  3. 04 11月, 2019 1 次提交
  4. 31 10月, 2019 3 次提交
  5. 25 10月, 2019 1 次提交
    • X
      fix several sparse table issuses (#20686) · 48669aa8
      xujiaqi01 提交于
      * no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
      * add find_distributed_lookup_table_grads instead of hard code GRAD
      * support embedding stop gradient. push sparse has error before fix this.* 
      * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
      * fix pull sparse, skip slots which do not have embedding.
      * fix collect feasign label info, skip slots which do not have embedding.
      * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
      * test=develop
      48669aa8
  6. 18 10月, 2019 1 次提交
  7. 15 10月, 2019 3 次提交
  8. 14 10月, 2019 1 次提交
  9. 12 10月, 2019 1 次提交
  10. 11 10月, 2019 1 次提交
  11. 07 10月, 2019 1 次提交
  12. 30 9月, 2019 1 次提交
  13. 24 9月, 2019 1 次提交
    • X
      support change shuffle and train thread num (#19841) · cedc0477
      xujiaqi01 提交于
      * support change shuffle thread num
      * support change train thread num
      * fix receive shuffle data of each channel
      * data norm stop gradient
      * add check thread_tensor type and root_tensor type when merge metric
      * remove sleep in shuffle, add config
      * add config of pslib client to client communication
      * fix xbox str
      * add data norm op testcase
      * add flush in trainer finalize
      cedc0477
  14. 23 9月, 2019 2 次提交
    • M
      Forward recompute3 (#19913) · 9901f696
      mapingshuo 提交于
      * add recompute based checkpoints methods for large batch training
      test=develop
      
      * add append_backward_with_forward_recomputation
      test=develop
      
      * refine optimizer
      test=develop
      
      * update backward and optimizer
      test=develop
      
      * make Variable usable
      test=develop
      
      * add recompute code
      
      * refine optimizer
      test=develop
      
      * refine addup _append_backward_ops_with_checkpoints_
      1) for recompute part, just cache the grad_op_desc without appending to block
      2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
      test=develop
      
      * make method private
      
      * add recompute strategy into DistributedStrategy
      test=develop
      
      * checkpoint version3
      test=develop
      
      * remove some print information
      test=develop
      
      * remove unused sumop
      test=develop
      
      * try to fix recompute with graph building modules
      
      * add input names to vars should be held
      
      * add memory debug tool
      
      * backup backward
      
      * Fix bugs
      
      * add backward desc for op not in any segments
      
      * add exception info for sub_block
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * remove print functions
      
      test=develop
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * make Recompute a child class of Optimizer
      
      test=develop
      test=document_preview
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      test=develop
      test=document_preview
      
      * add document for Recompute
      
      test=develop
      test=document_preview
      
      * change API doc of Rcompute
      
      test=develop
      test=document_preview
      
      * code cleaning
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      * fix bugs when segments hold no element
      
      * add testcase for Recompute Optimizer
      
      test=develop
      test=document_preview
      
      * add test for apply_gradient, and code cleaning
      
      test=develop
      test=document_preview
      
      * add test case for load function
      
      * enable CI
      
      test=develop
      test=document
      
      * add test case
      
      test=develop
      test=document_preview
      
      * add sample code for 4 function of recompute optimizer
      
      test=develop
      test=document_preview
      9901f696
    • T
      paddle cloud role maker fix (#19646) · 278dd003
      tangwei12 提交于
      * optimize cloud rolemaker, test=develop
      278dd003
  15. 19 9月, 2019 1 次提交
  16. 17 9月, 2019 1 次提交
  17. 10 9月, 2019 1 次提交
  18. 06 9月, 2019 1 次提交
  19. 05 9月, 2019 1 次提交
  20. 30 8月, 2019 1 次提交
    • Y
      add thread scope stat accurate metrics test=develop (#19480) · 10ca3f96
      yaoxuefeng 提交于
      * add thread scope stat accurate metrics test=develop
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix conflict
      
      * fix style
      
      * fix style test=develop
      
      * fix error test=develop
      
      * fix error test=develop
      10ca3f96
  21. 29 8月, 2019 2 次提交
    • T
      support debug each output of each ins (#19004) · 1fe468d3
      Thunderbrook 提交于
      * dump slot
      
      * test
      
      * proto
      
      * dump slot
      
      * test
      
      * proto
      
      * code style
      
      * code style
      
      * code style
      
      * style
      
      * add delete after unseen days
      
      * add unseen days
      
      * code style
      
      * conflict solve
      test=develop
      
      * add clear model
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * learning rate
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * unitest
      
      * style
      
      * style
      
      * multi phase
      
      * add channel
      
      * code style
      
      * style
      
      * style
      
      * unitest
      
      * style
      
      * define
      
      * define
      test=develop
      
      * style
      test=develop
      
      * rm define
      test=develop
      
      * linux
      
      * linux
      test=develop
      
      * style
      test=develop
      
      * output format
      test=develop
      
      * windows ci
      test=develop
      1fe468d3
    • Z
      support fc sort by number, test=develop (#19466) · bd35a7f0
      zhang wenhui 提交于
      fleet_desc sort fc name by dictionary sort, but we want to sort by number.
      bd35a7f0
  22. 28 8月, 2019 2 次提交
  23. 27 8月, 2019 1 次提交
  24. 23 8月, 2019 1 次提交
  25. 16 8月, 2019 1 次提交
  26. 14 8月, 2019 3 次提交
  27. 12 8月, 2019 1 次提交
  28. 11 8月, 2019 1 次提交
    • Y
      add save cache model api in fleet& add slots shuffle in dataset module & add... · 9150cf50
      yaoxuefeng 提交于
      add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)
      
      * add ctr related metric layer test=develop
      
      * add save cache and slots shuffle test=develop
      
      * add save cache and slots shuffle test=develop
      
      * fix error
      
      * fix error
      
      * fix style for ci
      
      * fix for comments
      
      * change SlotsShuffle input to std::strinf for generality
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix stylr
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * change non-const reference to pointer
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style  test=develop
      
      * add return ins num in ctr metric op
      
      * change dtype to float in metric_op.py
      
      * fix error test=develop
      
      * fix style test=develop
      
      * fix API spec
      
      * fix API spec
      
      * fix API spec test=develop
      
      * add UT test=develop
      9150cf50
  29. 08 8月, 2019 1 次提交
  30. 02 8月, 2019 1 次提交
    • J
      support filelist size < trainer num && fix pull dense (#18956) · 02c370c3
      jiaqi 提交于
      * support filelist size < trainer num
      * pull dense when stop, to make sure local dense params are same as pserver, so save paddle model will save dense model same as pserver
      *  enable QueueDataset train same filelist for serveral times
      02c370c3