1. 15 Oct, 2019 1 commit
  2. 14 Oct, 2019 1 commit
  3. 12 Oct, 2019 1 commit
  4. 11 Oct, 2019 1 commit
  5. 07 Oct, 2019 1 commit
  6. 30 Sep, 2019 1 commit
  7. 24 Sep, 2019 1 commit
    • support change shuffle and train thread num (#19841) · cedc0477
      Committed by xujiaqi01
      * support change shuffle thread num
      * support change train thread num
      * fix receive shuffle data of each channel
      * data norm stop gradient
      * add check thread_tensor type and root_tensor type when merge metric
      * remove sleep in shuffle, add config
      * add config of pslib client to client communication
      * fix xbox str
      * add data norm op testcase
      * add flush in trainer finalize
  8. 23 Sep, 2019 2 commits
    • Forward recompute3 (#19913) · 9901f696
      Committed by mapingshuo
      * add recompute based checkpoints methods for large batch training
      test=develop
      
      * add append_backward_with_forward_recomputation
      test=develop
      
      * refine optimizer
      test=develop
      
      * update backward and optimizer
      test=develop
      
      * make Variable usable
      test=develop
      
      * add recompute code
      
      * refine optimizer
      test=develop
      
      * refine addup _append_backward_ops_with_checkpoints_
      1) for recompute part, just cache the grad_op_desc without appending to block
      2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
      test=develop
      
      * make method private
      
      * add recompute strategy into DistributedStrategy
      test=develop
      
      * checkpoint version3
      test=develop
      
      * remove some print information
      test=develop
      
      * remove unused sumop
      test=develop
      
      * try to fix recompute with graph building modules
      
      * add input names to vars should be held
      
      * add memory debug tool
      
      * backup backward
      
      * Fix bugs
      
      * add backward desc for op not in any segments
      
      * add exception info for sub_block
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * remove print functions
      
      test=develop
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * make Recompute a child class of Optimizer
      
      test=develop
      test=document_preview
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      test=develop
      test=document_preview
      
      * add document for Recompute
      
      test=develop
      test=document_preview
      
      * change API doc of Recompute
      
      test=develop
      test=document_preview
      
      * code cleaning
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      * fix bugs when segments hold no element
      
      * add testcase for Recompute Optimizer
      
      test=develop
      test=document_preview
      
      * add test for apply_gradient, and code cleaning
      
      test=develop
      test=document_preview
      
      * add test case for load function
      
      * enable CI
      
      test=develop
      test=document
      
      * add test case
      
      test=develop
      test=document_preview
      
      * add sample code for 4 functions of recompute optimizer
      
      test=develop
      test=document_preview
    • paddle cloud role maker fix (#19646) · 278dd003
      Committed by tangwei12
      * optimize cloud rolemaker, test=develop
  9. 19 Sep, 2019 1 commit
  10. 17 Sep, 2019 1 commit
  11. 10 Sep, 2019 1 commit
  12. 06 Sep, 2019 1 commit
  13. 05 Sep, 2019 1 commit
  14. 30 Aug, 2019 1 commit
    • add thread scope stat accurate metrics test=develop (#19480) · 10ca3f96
      Committed by yaoxuefeng
      * add thread scope stat accurate metrics test=develop
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix conflict
      
      * fix style
      
      * fix style test=develop
      
      * fix error test=develop
      
      * fix error test=develop
  15. 29 Aug, 2019 2 commits
    • support debug each output of each ins (#19004) · 1fe468d3
      Committed by Thunderbrook
      * dump slot
      
      * test
      
      * proto
      
      * dump slot
      
      * test
      
      * proto
      
      * code style
      
      * code style
      
      * code style
      
      * style
      
      * add delete after unseen days
      
      * add unseen days
      
      * code style
      
      * conflict solve
      test=develop
      
      * add clear model
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * learning rate
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * unittest
      
      * style
      
      * style
      
      * multi phase
      
      * add channel
      
      * code style
      
      * style
      
      * style
      
      * unittest
      
      * style
      
      * define
      
      * define
      test=develop
      
      * style
      test=develop
      
      * rm define
      test=develop
      
      * linux
      
      * linux
      test=develop
      
      * style
      test=develop
      
      * output format
      test=develop
      
      * windows ci
      test=develop
    • support fc sort by number, test=develop (#19466) · bd35a7f0
      Committed by zhang wenhui
      fleet_desc sorted fc names in dictionary order, but we want them sorted by number.
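The difference between dictionary order and numeric order can be illustrated with a short sketch. The layer names and the `natural_key` helper are hypothetical and only show the general idea; they are not the actual fleet_desc code.

```python
import re


def natural_key(name):
    """Build a sort key in which digit runs compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd index holds a run of digits.
    parts = re.split(r"(\d+)", name)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


# Hypothetical fc parameter names; real names may differ.
fc_names = ["fc_10.w_0", "fc_2.w_0", "fc_0.w_0", "fc_1.w_0"]

dictionary_order = sorted(fc_names)               # fc_10 lands before fc_2
numeric_order = sorted(fc_names, key=natural_key)  # fc_2 lands before fc_10
```

With a plain string sort, `fc_10` is placed ahead of `fc_2` because the character `1` compares lower than `2`; the key function avoids this by comparing the numbers as integers.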
  16. 28 Aug, 2019 2 commits
  17. 27 Aug, 2019 1 commit
  18. 23 Aug, 2019 1 commit
  19. 16 Aug, 2019 1 commit
  20. 14 Aug, 2019 3 commits
  21. 12 Aug, 2019 1 commit
  22. 11 Aug, 2019 1 commit
    • add save cache model api in fleet & add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871) · 9150cf50
      Committed by yaoxuefeng
      
      * add ctr related metric layer test=develop
      
      * add save cache and slots shuffle test=develop
      
      * add save cache and slots shuffle test=develop
      
      * fix error
      
      * fix error
      
      * fix style for ci
      
      * fix for comments
      
      * change SlotsShuffle input to std::string for generality
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * change non-const reference to pointer
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style  test=develop
      
      * add return ins num in ctr metric op
      
      * change dtype to float in metric_op.py
      
      * fix error test=develop
      
      * fix style test=develop
      
      * fix API spec
      
      * fix API spec
      
      * fix API spec test=develop
      
      * add UT test=develop
  23. 08 Aug, 2019 1 commit
  24. 02 Aug, 2019 1 commit
    • support filelist size < trainer num && fix pull dense (#18956) · 02c370c3
      Committed by jiaqi
      * support filelist size < trainer num
      * pull dense when stopping, so that local dense params match the pserver and the saved paddle model contains the same dense model as the pserver
      * enable QueueDataset to train on the same filelist several times
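When the file list is shorter than the number of trainers, some trainers necessarily receive no files. A minimal sketch of one way to split the list, assuming a round-robin assignment; the commit message does not say how Paddle actually distributes the files, and `split_filelist` is a hypothetical helper:

```python
def split_filelist(filelist, trainer_num):
    """Assign files to trainers round-robin; shards may be empty."""
    if trainer_num <= 0:
        raise ValueError("trainer_num must be positive")
    # Trainer i takes files i, i + trainer_num, i + 2 * trainer_num, ...
    return [filelist[i::trainer_num] for i in range(trainer_num)]


# Two files shared among four trainers: two trainers get an empty shard.
shards = split_filelist(["part-0", "part-1"], trainer_num=4)
```

Under this assumption, a trainer with an empty shard has to be able to start and finish without reading any data.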
  25. 01 Aug, 2019 1 commit
  26. 31 Jul, 2019 1 commit
    • set fleet_send_batch_num a default value according to trainer num · 233746d8
      Committed by jiaqi
      (1) set fleet_send_batch_num to a default value based on trainer num; the previous value was fixed at 80000, so if trainer num is much smaller or larger than 100, global shuffle may hit a timeout error.

      (2) fix load one table bug, add barrier
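A default that follows the trainer count could look like the sketch below. The linear scaling rule is an assumption made for illustration: the commit message only gives the old fixed value (80000) and implies it suited about 100 trainers. It does not state the formula Paddle uses, and `default_send_batch_num` is a hypothetical name.

```python
BASELINE_BATCH_NUM = 80000   # previous fixed value, from the commit message
BASELINE_TRAINER_NUM = 100   # trainer count at which that value worked


def default_send_batch_num(trainer_num):
    """Scale the old fixed value linearly with the trainer count (assumed rule)."""
    if trainer_num <= 0:
        raise ValueError("trainer_num must be positive")
    per_trainer = BASELINE_BATCH_NUM // BASELINE_TRAINER_NUM
    return per_trainer * trainer_num


small_job = default_send_batch_num(10)
baseline_job = default_send_batch_num(100)
```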
  27. 29 Jul, 2019 1 commit
    • add clear_model interface in fleetwrapper (#18815) · 52c1431e
      Committed by Thunderbrook
      * dump slot
      
      * test
      
      * proto
      
      * dump slot
      
      * test
      
      * proto
      
      * code style
      
      * code style
      
      * code style
      
      * style
      
      * add delete after unseen days
      
      * add unseen days
      
      * code style
      
      * conflict solve
      test=develop
      
      * add clear model
      
      * code style
      test=develop
      
      * code style
      test=develop
  28. 25 Jul, 2019 2 commits
  29. 24 Jul, 2019 1 commit
    • add slot to sparse table (#18686) · d8396281
      Committed by Thunderbrook
      The change includes 2 things:

      1. saving the delta model and shrinking the table were previously controlled by the same parameter; delete_after_unseen_days is now added to control table shrinking.
      2. values in the sparse table previously had no slot; a slot is now added to the sparse table, and DownpourCtrAccessor is added to support the new meta.
      test=develop
  30. 23 Jul, 2019 1 commit
    • support patch data, add load_one_table, fix bug (#18509) · d18aabb4
      Committed by jiaqi
      (1) support patch data (merge slots of instances with the same line id, modify the dense layer that changes its size)
      (2) add fleet load_one_table interface, supporting load from paddle model and load from pslib model
      (3) fix a push sparse bug that made push sparse take more time (about 10% in my testcase)
      (4) when some slots are not in one of your networks (join/update, etc.), data feed, collect label info, and push/pull sparse will skip these slots instead of throwing an error.
      (5) add more debug info in TrainFilesWithProfiler
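Item (1) above, merging the slots of instances that share a line id, can be sketched as follows. The data layout (a list of `(line_id, slots)` pairs with feature ids per slot) and the `merge_by_line_id` helper are assumptions for illustration, not the structures used in Paddle's data feed.

```python
def merge_by_line_id(instances):
    """Merge slot values of instances that share a line id.

    `instances` is a list of (line_id, {slot_name: [feature ids]}) pairs.
    """
    merged = {}
    for line_id, slots in instances:
        target = merged.setdefault(line_id, {})
        for slot_name, feature_ids in slots.items():
            # Append to the slot if it already exists, otherwise create it.
            target.setdefault(slot_name, []).extend(feature_ids)
    return merged


# Hypothetical instances: line "l1" appears twice and is merged into one record.
instances = [
    ("l1", {"slot_a": [1]}),
    ("l2", {"slot_a": [5]}),
    ("l1", {"slot_a": [3], "slot_b": [2]}),
]
merged = merge_by_line_id(instances)
```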
  31. 22 Jul, 2019 1 commit
  32. 10 Jul, 2019 1 commit
  33. 08 Jul, 2019 1 commit
  34. 02 Jul, 2019 1 commit