1. 04 11月, 2019 1 次提交
  2. 31 10月, 2019 3 次提交
  3. 25 10月, 2019 1 次提交
    • X
      fix several sparse table issuses (#20686) · 48669aa8
      xujiaqi01 提交于
      * no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
      * add find_distributed_lookup_table_grads instead of hard code GRAD
      * support embedding stop gradient. push sparse has error before fix this.* 
      * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
      * fix pull sparse, skip slots which do not have embedding.
      * fix collect feasign label info, skip slots which do not have embedding.
      * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
      * test=develop
      48669aa8
  4. 18 10月, 2019 1 次提交
  5. 15 10月, 2019 3 次提交
  6. 14 10月, 2019 1 次提交
  7. 12 10月, 2019 1 次提交
  8. 11 10月, 2019 1 次提交
  9. 07 10月, 2019 1 次提交
  10. 30 9月, 2019 1 次提交
  11. 24 9月, 2019 1 次提交
    • X
      support change shuffle and train thread num (#19841) · cedc0477
      xujiaqi01 提交于
      * support change shuffle thread num
      * support change train thread num
      * fix receive shuffle data of each channel
      * data norm stop gradient
      * add check thread_tensor type and root_tensor type when merge metric
      * remove sleep in shuffle, add config
      * add config of pslib client to client communication
      * fix xbox str
      * add data norm op testcase
      * add flush in trainer finalize
      cedc0477
  12. 23 9月, 2019 2 次提交
    • M
      Forward recompute3 (#19913) · 9901f696
      mapingshuo 提交于
      * add recompute based checkpoints methods for large batch training
      test=develop
      
      * add append_backward_with_forward_recomputation
      test=develop
      
      * refine optimizer
      test=develop
      
      * update backward and optimizer
      test=develop
      
      * make Variable usable
      test=develop
      
      * add recompute code
      
      * refine optimizer
      test=develop
      
      * refine addup _append_backward_ops_with_checkpoints_
      1) for recompute part, just cache the grad_op_desc without appending to block
      2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
      test=develop
      
      * make method private
      
      * add recompute strategy into DistributedStrategy
      test=develop
      
      * checkpoint version3
      test=develop
      
      * remove some print information
      test=develop
      
      * remove unused sumop
      test=develop
      
      * try to fix recompute with graph building modules
      
      * add input names to vars should be held
      
      * add memory debug tool
      
      * backup backward
      
      * Fix bugs
      
      * add backward desc for op not in any segments
      
      * add exception info for sub_block
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * modify code style
      
      test=develop
      
      * remove print functions
      
      test=develop
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * make Recompute a child class of Optimizer
      
      test=develop
      test=document_preview
      
      * add API spec
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      test=develop
      test=document_preview
      
      * add document for Recompute
      
      test=develop
      test=document_preview
      
      * change API doc of Rcompute
      
      test=develop
      test=document_preview
      
      * code cleaning
      
      test=develop
      test=document_preview
      
      * modify API spec
      
      * fix bugs when segments hold no element
      
      * add testcase for Recompute Optimizer
      
      test=develop
      test=document_preview
      
      * add test for apply_gradient, and code cleaning
      
      test=develop
      test=document_preview
      
      * add test case for load function
      
      * enable CI
      
      test=develop
      test=document
      
      * add test case
      
      test=develop
      test=document_preview
      
      * add sample code for 4 function of recompute optimizer
      
      test=develop
      test=document_preview
      9901f696
    • T
      paddle cloud role maker fix (#19646) · 278dd003
      tangwei12 提交于
      * optimize cloud rolemaker, test=develop
      278dd003
  13. 19 9月, 2019 1 次提交
  14. 17 9月, 2019 1 次提交
  15. 10 9月, 2019 1 次提交
  16. 06 9月, 2019 1 次提交
  17. 05 9月, 2019 1 次提交
  18. 30 8月, 2019 1 次提交
    • Y
      add thread scope stat accurate metrics test=develop (#19480) · 10ca3f96
      yaoxuefeng 提交于
      * add thread scope stat accurate metrics test=develop
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix style test=develop
      
      * fix conflict
      
      * fix style
      
      * fix style test=develop
      
      * fix error test=develop
      
      * fix error test=develop
      10ca3f96
  19. 29 8月, 2019 2 次提交
    • T
      support debug each output of each ins (#19004) · 1fe468d3
      Thunderbrook 提交于
      * dump slot
      
      * test
      
      * proto
      
      * dump slot
      
      * test
      
      * proto
      
      * code style
      
      * code style
      
      * code style
      
      * style
      
      * add delete after unseen days
      
      * add unseen days
      
      * code style
      
      * conflict solve
      test=develop
      
      * add clear model
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * support debug tensor of each ins
      test=develop
      
      * learning rate
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      
      * code style
      test=develop
      
      * code style
      test=develop
      
      * unitest
      
      * style
      
      * style
      
      * multi phase
      
      * add channel
      
      * code style
      
      * style
      
      * style
      
      * unitest
      
      * style
      
      * define
      
      * define
      test=develop
      
      * style
      test=develop
      
      * rm define
      test=develop
      
      * linux
      
      * linux
      test=develop
      
      * style
      test=develop
      
      * output format
      test=develop
      
      * windows ci
      test=develop
      1fe468d3
    • Z
      support fc sort by number, test=develop (#19466) · bd35a7f0
      zhang wenhui 提交于
      fleet_desc sort fc name by dictionary sort, but we want to sort by number.
      bd35a7f0
  20. 28 8月, 2019 2 次提交
  21. 27 8月, 2019 1 次提交
  22. 23 8月, 2019 1 次提交
  23. 16 8月, 2019 1 次提交
  24. 14 8月, 2019 3 次提交
  25. 12 8月, 2019 1 次提交
  26. 11 8月, 2019 1 次提交
    • Y
      add save cache model api in fleet& add slots shuffle in dataset module & add... · 9150cf50
      yaoxuefeng 提交于
      add save cache model api in fleet& add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871)
      
      * add ctr related metric layer test=develop
      
      * add save cache and slots shuffle test=develop
      
      * add save cache and slots shuffle test=develop
      
      * fix error
      
      * fix error
      
      * fix style for ci
      
      * fix for comments
      
      * change SlotsShuffle input to std::strinf for generality
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix stylr
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * fix style
      
      * change non-const reference to pointer
      
      * fix style
      
      * fix style
      
      * fix style test=develop
      
      * fix style  test=develop
      
      * add return ins num in ctr metric op
      
      * change dtype to float in metric_op.py
      
      * fix error test=develop
      
      * fix style test=develop
      
      * fix API spec
      
      * fix API spec
      
      * fix API spec test=develop
      
      * add UT test=develop
      9150cf50
  27. 08 8月, 2019 2 次提交
    • J
      add fleet util, add some interface in hdfs util (#18752) · a99bc64c
      jiaqi 提交于
      * add fleet util (fleet/utils/fleet_util.py): functions for users' convenience
      * add some interface in hdfs util : hdfs is_file、hdfs cat
      a99bc64c
    • M
      [WIP] Add Imdb train demo (#18895) · 4ad7c9d5
      mapingshuo 提交于
      * add train demo for imdb text classification task
      
      * make inference library release data_feed dataset dataset_factory data_feed_factory
      
      * add String Data Generator
      
      * new feature of train demo: save model params
      
      * New feature of train demo: set training config using gflags
      
      * change code style for CI
      
      * add readme and dataset for imdb demo trainer
      4ad7c9d5
  28. 02 8月, 2019 1 次提交
    • J
      support filelist size < trainer num && fix pull dense (#18956) · 02c370c3
      jiaqi 提交于
      * support filelist size < trainer num
      * pull dense when stop, to make sure local dense params are same as pserver, so save paddle model will save dense model same as pserver
      *  enable QueueDataset train same filelist for serveral times
      02c370c3
  29. 01 8月, 2019 1 次提交
  30. 31 7月, 2019 1 次提交
    • J
      set fleet_send_batch_num a default value according to trainer num · 233746d8
      jiaqi 提交于
      (1) set fleet_send_batch_num a default value according to trainer num, the previous 80000 is fixed,if trainer num is much less or larger than 100,global shuffle may have timeout error.
      
      (2) fix load one table bug, add barrier
      233746d8