1. 07 May, 2019 7 commits
    • Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) · 4f859408
      Zeng Jinle committed
      * add use_cuda to inplace pass,test=develop
      
      * add test softmax_with_xe_inplace test,test=develop
      
      * fix potential inplace bug
      test=develop
      
      * add more skip vars in mem opt pass,test=develop
      
      * follow comment,test=develop
      
      * follow comments,move duplicate out arg check to program->graph,test=develop
    • update softmax with axis arg test=develop (#17190) · e782b54b
      baojun committed
    • Softmax_cross_entropy op add axis (#16806) · a71d8fdb
      Kaipeng Deng committed
      * add attr axis infershape. test=develop
      
      * add CUDA kernel. test=develop
      
      * fix unittest. test=develop
      
      * fix unittest for soft_label. test=develop
      
      * fix fp16 unittest. test=develop
      
      * remove comment code. test=develop
      
      * refine test for axis. test=develop
      
      * add python api. test=develop
      
      * fix doc. test=develop
      
      * fix fp16 unittest. test=develop
      
      * fix ngraph test. test=develop
      
      * fix ENFORCE for test_imperative_transformer. test=develop
      
      * fit for ngraph test. test=develop
      
      * fix after rebase develop. test=develop
      
      * fix doc. test=develop
      
      * fix API.spec. test=develop
      
      * fix test_layers. test=develop
      
      * fix format. test=develop
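For readers unfamiliar with what an `axis` attribute means for this kind of operator, the following is a minimal sketch, not Paddle's implementation. It assumes hard (integer) labels and a 2-D nested list as input; the function name and signature are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_cross_entropy(logits, labels, axis=-1):
    """Hard-label softmax cross entropy over a 2-D nested list.

    `axis` selects the class dimension; `labels` holds one class index
    for each position along the other dimension.
    Hypothetical helper for illustration, not Paddle's operator.
    """
    if axis not in (0, 1, -1):
        raise ValueError("this sketch only handles 2-D input")
    # Transpose when the class dimension comes first, so that every
    # row below is one distribution over classes.
    rows = [list(col) for col in zip(*logits)] if axis == 0 else logits
    losses = []
    for row, label in zip(rows, labels):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
        # loss = -log(softmax(row)[label]) = logsumexp(row) - row[label]
        losses.append(log_sum - row[label])
    return losses
```

Calling it with `axis=0` on a transposed input gives the same losses as `axis=1` on the original, which is the behaviour an axis argument is meant to provide.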
    • Quant output scale (#17215) · a914d9b1
      Zhen Wang committed
      * Add MovingAverageAbsMaxScale operator which is only used for calculating the quantization scale.
      
      * test=develop
      
      * change the output into inplace. test=develop
      
      * Revert "test=develop"
      
      This reverts commit 696cf62699ba1e1c98f61f7345ac7060010eb29a.
      
      * Revert "change the output into inplace. test=develop"
      
      This reverts commit a19acd20f07eee82622701a3015e6e9c073a5e0b.
      
      * test=develop.
      
      * update the MovingAverageAbsMaxScaleOp test. test=develop
    • optimize sum op (#16820) · 32b62c25
      zhaoyuchen2018 committed
      * optimize sum op
      
      fuse multi eigen kernel calls into one cuda kernel.
      refine code
      
      test=develop.
      Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
      
      * Refine code.
      
      test=develop
      Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
      
      * Refine code according to comments.
      
      test=develop
      
      * refine code
      
      delete sum_op_gpu.h
      test=develop
      
      * Fix test error.
      
      test=develop
      Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
      
      * refine code in format.
      
      test=develop.
      
      * refine code
      
      test=develop
      Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
      
      * refine code
      
      test=develop
      Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
    • Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a
      石晓伟 committed
      * cherry-pick commit from 88770542
      
      * cherry-pick commit from 3f0b97df
      
      * cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn
      
      (cherry picked from commit 8643dbc2)
      
      * Cherry-Pick from 16662 : Anakin subgraph cpu support
      
      (cherry picked from commit 7ad182e1)
      
      * Cherry-pick from 1662, 16797.. : add anakin int8 support
      
      (cherry picked from commit e14ab180)
      
      * Cherry-pick from 16813 : change singleton to graph RegistBlock
      test=release/1.4
      
      (cherry picked from commit 4b9fa423)
      
      * Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2
      
      Support ShuffleNet and MobileNet-v2, test=release/1.4
      
      (cherry picked from commit a6fb066f)
      
      * Cherry-pick : anakin subgraph add opt config layout argument #16846
      test=release/1.4
      
      (cherry picked from commit 8121b3ec)
      
      * 1. add shuffle_channel_detect
      
      (cherry picked from commit 6efdea89)
      
      * update shuffle_channel op convert, test=release/1.4
      
      (cherry picked from commit e4726a06)
      
      * Modify symbol export rules
      
      test=develop
    • Refine api doc (#17230) · ef66baed
      jerrywgz committed
      * refine api comment, test=develop
  2. 06 May, 2019 2 commits
  3. 05 May, 2019 1 commit
  4. 30 April, 2019 4 commits
  5. 29 April, 2019 1 commit
  6. 28 April, 2019 2 commits
    • Refine dropout gpu memory (#17095) · 28d69d71
      Zeng Jinle committed
      * refine_dropout_mem,test=develop
      
      * # This is a combination of 14 commits.
      # The first commit's message is:
      remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (#17066)
      
      # This is the 2nd commit message:
      
      Fleet unify distributed training (#16791)
      
      * implement distributed transpiler with fleet
      # This is the 3rd commit message:
      
      ParallelDyGraph with GPU collective mode (#16827)
      
      implement dygraph.parallel.DataParallel to hook reduce op.
      
      # This is the 4th commit message:
      
      Init mixed precision training interface (#16856)
      
      * Init mixed precision training interface
      
      * Add fp16 test script
      
      test=develop
      
      * All initializers support float16
      
      test=develop
      
      * Code cleanup & add more code annotations
      
      test=develop
      
      * Update API spec
      
      test=develop
      
      * Add usage example in doc
      
      test=develop
      
      # This is the 5th commit message:
      
      fix reference_count_pass,test=develop (#17060)
      
      test=develop
      # This is the 6th commit message:
      
      Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)
      
      * Cache the information of linear interpolation in forward and use it in backward.
      test=develop
      
      * Fix cuda kernel.
      test=develop
      
      # This is the 7th commit message:
      
      remove unnecessary prepare_data (#17080)
      
      test=develop
      # This is the 8th commit message:
      
      fix interpolate cu. test=develop (#17101)
      
      # This is the 9th commit message:
      
      test=develop, double backward leaky_relu (#17067)
      
      backward of backward: leaky_relu
      # This is the 10th commit message:
      
      fix fuse optimizer ops (#17102)
      
      test=develop
      # This is the 11th commit message:
      
      truncated_gaussian_random supported in distributed training, test=develop (#17091)
      
      # This is the 12th commit message:
      
       Detailed coordinate description for yolov3 loss (#17007)
      
      * Detailed coordinate description for yolov3 loss
      
      test=develop
      
      * modified api.spec
      
      test=develop
      
      * modified loss name
      
      * fix api.spec
      
      test=develop
      
      * polish description
      
      test=develop
      
      * modified api.spec
      
      test=develop
      
      # This is the 13th commit message:
      
      fix test_weight_decay (#17109)
      
      test=develop
      # This is the 14th commit message:
      
      Path flag (#17105)
      
      * fix python/paddle/fluid/__init__.py detecting problems
    • Use CudnnWorkspaceHandle in exhaustive search (#17082) · b9494058
      Huihuang Zheng committed
      1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
      2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.
      
      test=develop
  7. 26 April, 2019 3 commits
  8. 25 April, 2019 2 commits
  9. 23 April, 2019 2 commits
    • Make conv cudnn workspace size configurable (#17036) · 0c335dcd
      Zeng Jinle committed
      * make_conv_cudnn_ws_size_configurable, test=develop
      
      * change std::max to std::min
      test=develop
    • Support backward of backward for Relu and add a new gradient checker by... · c1c2633a
      qingqing01 committed
      Support backward of backward for Relu and add a new gradient checker by comparing theoretical and numerical Jacobian. (#16862)
      
      * Support backward of backward and a new gradient checker
      * Rename decorators.py to decorator_helper.py, since Python on Windows CI has decorators package.
      
      1. Add ReluDoubleGradMaker when register relu_grad.
      2. Add a new gradient checker by comparing theoretical and numerical Jacobian.  Check double gradients by double_grad_check.
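The idea of checking gradients by comparing a theoretical and a numerical Jacobian can be sketched as follows. This is an illustrative example only, not the checker added in this commit: all function names are hypothetical, it works on plain Python lists, and it covers only the first-order comparison (the commit's `double_grad_check` targets second-order gradients).

```python
def relu(xs):
    """Elementwise ReLU on a list of floats."""
    return [max(x, 0.0) for x in xs]


def relu_jacobian(xs):
    """Analytic Jacobian of elementwise ReLU: a diagonal of 0/1 entries."""
    n = len(xs)
    return [[1.0 if i == j and xs[i] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def numerical_jacobian(fn, xs, eps=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d fn(xs)[i] / d xs[j]."""
    n_in = len(xs)
    n_out = len(fn(xs))
    jac = [[0.0] * n_in for _ in range(n_out)]
    for j in range(n_in):
        plus = list(xs)
        minus = list(xs)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = fn(plus), fn(minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def jacobians_match(analytic, numeric, atol=1e-4):
    """True when every pair of entries agrees within an absolute tolerance."""
    return all(abs(a - b) <= atol
               for row_a, row_n in zip(analytic, numeric)
               for a, b in zip(row_a, row_n))
```

A check then amounts to `jacobians_match(relu_jacobian(xs), numerical_jacobian(relu, xs))` for inputs away from the kink at zero, where the finite difference is not reliable.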
  10. 22 April, 2019 4 commits
  11. 21 April, 2019 1 commit
    • Refine model gpu memory (#16993) · 1202d3fc
      Zeng Jinle committed
      * speedup gc and inplace softmax_with_cross_entropy_grad
      test=develop
      
      * refine models gpu mem
      Merge skip vars and warning messages of mem opt
      remove relu mem opt
      test=develop
      
      * follow comments
      test=develop
  12. 20 April, 2019 1 commit
  13. 19 April, 2019 1 commit
  14. 18 April, 2019 1 commit
  15. 17 April, 2019 3 commits
  16. 16 April, 2019 5 commits