1. 27 Jun, 2019 4 commits
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittests for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittests for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when testing
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump on py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
      b7128bac
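The "scale loss grad in collective sgd transpiler" item above can be illustrated with a small sketch. It assumes the usual data-parallel scheme in which each worker scales its gradient by 1/nranks before a sum all-reduce, so that every worker ends up with the mean gradient. The function names below are hypothetical and are not taken from the PaddlePaddle code base; the all-reduce is simulated in a single process.

```python
def scale_loss_grad(loss_grad, nranks):
    """Scale one worker's gradient by 1/nranks (hypothetical helper)."""
    return loss_grad / nranks


def allreduce_sum(per_rank_values):
    """Simulate a sum all-reduce: every rank receives the global sum."""
    total = sum(per_rank_values)
    return [total for _ in per_rank_values]


def averaged_gradients(per_rank_grads):
    """Scale first, then all-reduce, so each rank holds the mean gradient."""
    nranks = len(per_rank_grads)
    scaled = [scale_loss_grad(g, nranks) for g in per_rank_grads]
    return allreduce_sum(scaled)


if __name__ == "__main__":
    # Four simulated workers, each with a different local gradient.
    print(averaged_gradients([2.0, 4.0, 6.0, 8.0]))
```

Scaling before the all-reduce keeps the update magnitude independent of the number of workers, which is presumably why the transpiler needs this step; the exact place where the real transpiler applies the factor is not stated in the commit message.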
    • S
      add int8 mkldnn prior_box (#17242) · 9252e8fa
      Sylwester Fraczek committed
      add prior_box quantization code
      
      add scale algo rules for prior box
      
      test=develop
      9252e8fa
    • L
      some fixes for int8 mobilenet_ssd tester (#18112) · 5fd68ac1
      lidanqing committed
      * some fixes for int8 mobilenet_ssd tester
      test=develop
      
      * change wrong data file name
      test=develop
      
      * change test images bin file from 200 images to 100 images
      
      * change directory existence to file existence during downloading
      test=develop
      
      * reuse download_data
      test=develop
      
      * run full dataset when iterations=0
      test=develop
      5fd68ac1
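Two of the fixes above ("change directory existence to file existence during downloading" and "run full dataset when iterations=0") describe simple control-flow rules. The sketch below shows one way such rules could look. The real tester is not shown in this log, so the function names, the parameters, and the rounding-up of the batch count are all assumptions made for illustration.

```python
import os


def needs_download(target_file):
    """Return True unless the data file itself exists.

    Checking the file (not just its parent directory) avoids skipping the
    download when the directory exists but the file is missing.
    """
    return not os.path.isfile(target_file)


def resolve_iterations(iterations, dataset_size, batch_size):
    """Treat iterations == 0 as 'run the full dataset' (assumed semantics)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if iterations == 0:
        # Ceiling division so a final partial batch is still counted.
        return -(-dataset_size // batch_size)
    return iterations


if __name__ == "__main__":
    print(needs_download(os.getcwd()))
    print(resolve_iterations(0, 100, 8))
    print(resolve_iterations(5, 100, 8))
```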
    • J
      [MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5
      Jacek Czaja committed
      * - Reusing of reorder used in elementwise_add_mkldnn
      
      - Added MKL-DNN sum prim reusing
      
      test=develop
      
      - Compilation fixes
      
      test=develop
      
      - Yet another compilation fix
      
      test=develop
      
      - Yet another compilation fix
      
      test=develop
      
      - Yet another linking fix
      
      test=develop
      
      - Final compilation fix
      
      test=develop
      
      - lint fixes
      
      test=develop
      
      - Lint fixes
      
      test=develop
      
      * - Fixes after review
      
      test=develop
      c2efdfd5
  2. 26 Jun, 2019 9 commits
  3. 25 Jun, 2019 9 commits
  4. 24 Jun, 2019 6 commits
  5. 23 Jun, 2019 4 commits
  6. 22 Jun, 2019 2 commits
  7. 21 Jun, 2019 6 commits
    • P
      fix a bug in examples of metrics.Acc · cd9d57f5
      pkpk committed
      cd9d57f5
    • T
      refine core cmake warning and print more info (#18248) · 68da8b2a
      tensor-tang committed
      * refine core cmake warning and print more info
      
      test=develop
      
      * fix comments
      
      test=develop
      68da8b2a
    • Z
      Add StaticRNN.output code example (#18251) · 32c95f17
      zhaoyuchen2018 committed
      refine StaticRNN api doc
      test=develop
      test=document_preview
      32c95f17
    • X
      fix yolo_box example,test=develop (#18247) · 2f0d6826
      xiaoting committed
      2f0d6826
    • S
      fix some bugs when merging sparse embedding parameters, test=develop (#18223) · 6b3d9625
      songhao committed
      1. fix the bug that out_put_var in SaveSelectedRows would be an empty string
      2. use merge_sparse_lookup_table instead of the sum op in load_persistables_for_inference
      3. fix the bug in _clone_var_in_block_ when the var is SELECTED_ROWS.
      6b3d9625
    • J
      dataset (#17973) · 3f8031e2
      jiaqi committed
      (1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and to make the code more readable and flexible (a dataset can have a single output channel or multiple output channels). One earlier out-of-memory problem was caused by not releasing memory after training.
      (2) add Record because MultiSlotType costs too much memory (80B); this fixes the out-of-memory problem.
      (3) add Channel, Archive in paddle/fluid/framework
      (4) change dataset from shared_ptr to unique_ptr in pybind
      (5) move create/destroy readers from trainer to dataset
      (6) move shuffle from datafeed to dataset. The dataset holds memory; datafeed only loads data and feeds it to the network.
      (7) fix thread num bug of Dataset when filelist size < thread num
      (8) support set_queue_num in InMemoryDataset
      3f8031e2
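Item (7) above concerns the case where there are fewer input files than reader threads. The commit message does not say how the fix works; a plausible approach, sketched here with hypothetical helper names, is to cap the number of threads at the number of files so that no thread is left without work.

```python
def effective_thread_num(requested_threads, filelist):
    """Cap the thread count at the number of files (assumed fix strategy)."""
    if requested_threads < 1:
        raise ValueError("requested_threads must be positive")
    # Keep at least one thread even when the file list is empty.
    return max(1, min(requested_threads, len(filelist)))


def assign_files(filelist, requested_threads):
    """Distribute files round-robin over the effective number of threads."""
    n = effective_thread_num(requested_threads, filelist)
    shards = [[] for _ in range(n)]
    for index, name in enumerate(filelist):
        shards[index % n].append(name)
    return shards


if __name__ == "__main__":
    files = ["part-0", "part-1", "part-2"]
    # Eight threads requested, but only three files are available.
    print(effective_thread_num(8, files))
    print(assign_files(files, 8))
```

With this capping, every thread that is started receives at least one file whenever the file list is non-empty.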