1. 11 Jan 2021, 3 commits
    • [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Committed by Zhen Wang
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Improve the doc of `amp_init`.
      
      * Change fp16 testing for users who have the infer program defined in a separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * do not use thrust.
      
      * fix some cuda memory access error.
      d8dfef54
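The entry above introduces pure fp16 training for the static-graph AMP API. Below is a minimal sketch of that workflow using the decorate/amp_init calls named in the commit; exact argument names such as use_pure_fp16 and the amp_init signature are assumptions and may differ between releases.

```python
import paddle

paddle.enable_static()

main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    x = paddle.static.data(name="x", shape=[None, 784], dtype="float32")
    y = paddle.static.data(name="y", shape=[None, 1], dtype="int64")
    logits = paddle.static.nn.fc(x, size=10)
    loss = paddle.nn.functional.cross_entropy(logits, y)

    optimizer = paddle.optimizer.Adam(learning_rate=1e-3)
    # Wrap the optimizer for pure fp16 training; the use_pure_fp16 argument
    # name is an assumption based on this changelog entry.
    optimizer = paddle.static.amp.decorate(
        optimizer,
        init_loss_scaling=128.0,
        use_dynamic_loss_scaling=True,
        use_pure_fp16=True)
    optimizer.minimize(loss)

place = paddle.CUDAPlace(0)
exe = paddle.static.Executor(place)
exe.run(startup_prog)
# amp_init casts fp32 parameters to fp16 and sets up master weights
# (exact signature assumed).
optimizer.amp_init(place, scope=paddle.static.global_scope())
```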
    • Quantization supports 2.0 APIs (#30036) (#30257) · 393a91f1
      Committed by guofei
      * Quantization supports 2.0 APIs
      
      * Fix the error of save_quantized_model
      393a91f1
    • [cherry-pick 2.0] optimize gradient merge (#30185) · e283dc6f
      Committed by WangXi
      * Optimize grad merge performance (#29784)
      
      * [fleet] combine amp and gradient merge, test=develop (#30086)
      
      * fix assign_op_xpu concat_op_xpu warning (#30120)
      Co-authored-by: liuyuhui <liuyuhui@baidu.com>
      e283dc6f
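The second commit above combines AMP with gradient merge under fleet. A hedged sketch of how the two are enabled together through DistributedStrategy follows; the gradient_merge_configs keys (k_steps, avg) are assumptions about the strategy interface.

```python
import paddle
import paddle.distributed.fleet as fleet

paddle.enable_static()
fleet.init(is_collective=True)

strategy = fleet.DistributedStrategy()
strategy.amp = True              # automatic mixed precision
strategy.gradient_merge = True   # accumulate gradients over several micro-batches
# Config keys below (k_steps, avg) are assumptions about the strategy interface.
strategy.gradient_merge_configs = {"k_steps": 4, "avg": True}

optimizer = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
# optimizer.minimize(loss) would then apply both the AMP and gradient-merge passes.
```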
  2. 08 Jan 2021, 1 commit
    • [Cherry-pick] amp related PR cherry pick into Release/2.0 (#30212) · 9f7c66b4
      Committed by huangxu96
      * Optimizer trans momentum (#29597)
      
      * merge amp-related functions in Momentum from paddle.fluid.contrib.optimizer into paddle.optimizer.
      
      * Add unittest for the 2.0 Momentum API.
      
      * fix some bugs in weight_decay.
      
      * add alias for fluid.contrib.mixed_precision (#29562)
      
      * add alias for fluid.contrib.mixed_precision
      
      * add static.amp into setup.py.in (#29621)
      
      * add static.amp into setup.py.in
      
      * add unittest for api
      
      * fix a bug in multi_precision_fp16 unittest. (#29756)
      9f7c66b4
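The Momentum changes above merge AMP support into paddle.optimizer.Momentum. A short illustrative construction follows; the multi_precision argument name is taken from related commits in this log and should be treated as an assumption.

```python
import paddle

# Momentum from paddle.optimizer with the AMP functionality merged from
# paddle.fluid.contrib.optimizer; multi_precision keeps an fp32 master copy
# of fp16 parameters (argument name assumed from this changelog).
opt = paddle.optimizer.Momentum(
    learning_rate=0.01,
    momentum=0.9,
    weight_decay=1e-4,
    multi_precision=True)
```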
  3. 07 Jan 2021, 1 commit
    • [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
      Committed by furnace
      * Layer norm fp16 (#29169)
      
      * add fp16 for layer_norm op
      
      * revert layernorm api
      
      * fix forward
      
      * fix forward
      
      * fix backward for layernorm with fp16
      
      * fix unit test for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
      
      * fix with_mkldnn compile error for layernorm with fp16
      
      * fix with_mkldnn compile error for layernorm with fp16
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * fix layer_norm accuracy (#29434)
      
      * Layernorm opt (#29522)
      
      * layernorm fw opt
      
      * layernorm bw opt
      
      * fix typo, test=develop
      
      * remove const dim3 for windows CI compatibility
      
      * merge develop
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
      
      * Fix compile problem when cuda_arch < 6000 (#29576)
      
      * fix compile problem when cuda_arch < 6000
      
      * refine code
      
      * refine code
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
      44b81e63
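With fp16 kernels now available for layer_norm (commits above), the op can be routed to fp16 under static-graph AMP. The sketch below assumes the AutoMixedPrecisionLists/custom_white_list interface; both names are assumptions for this exact release.

```python
import paddle

paddle.enable_static()

# Assumed list/class names: with fp16 forward/backward kernels in place,
# layer_norm can be force-listed to run in fp16 under static-graph AMP.
amp_lists = paddle.static.amp.AutoMixedPrecisionLists(
    custom_white_list={"layer_norm"})

optimizer = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
optimizer = paddle.static.amp.decorate(optimizer, amp_lists=amp_lists)
```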
  4. 05 Jan 2021, 1 commit
  5. 29 Dec 2020, 1 commit
    • [cherry-pick] clean redundant API alias in 2.0 - part 1 #29928 (#29960) · c9c835b5
      Committed by XiaoguangHu
      * [cherry-pick] cherry-pick of PR#29928
      
      * delete paddle.metric.chunk_eval and paddle.metric.mean_iou
      
      * delete paddle.nn.clip and paddle.nn.clip_by_norm
      
      * delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish
      
      * [cherry-pick] cherry-pick of PR#29928
      
      * fix extension import error
      c9c835b5
  6. 09 Dec 2020, 1 commit
  7. 03 Dec 2020, 1 commit
    • [Cherry-pick] Add pure fp16 training with master weights. (#29301) · d8ea8a06
      Committed by Zhen Wang
      * Add pure fp16 training with master weights. (#27712)
      
      * add the weight decay func for the momentum op
      
      * Add the multi_precision function in Momentum Optimizer.
      
      * Make sure that the initial values of the master weights are the same as the fp16 weights.
      
      * add static loss scaling.
      
      * add the rescale_grad function in the pure fp16 training.
      
      * use the original momentum updating method.
      
      * Polish some codes, such as variable names.
      
      * add docstring for apis.
      
      * update the var creation details of _create_master_weight.
      
      * do not modify code about imperative momentum updating.
      
      * Fix the error of test_dist_sparse_tensor_load_momentum UT.
      
      * add unit test for multi precision fp16 training.
      
      * add more unit tests for CI.
      
      * Use lower threshold values for allclose comparison in the test_multi_precision_fp16_train UT.
      d8ea8a06
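The commit above adds pure fp16 training with master weights. A framework-agnostic sketch of the master-weight pattern it implements follows: parameters stay in fp32, fp16 gradients are unscaled and applied to the fp32 copy, which is then cast back to fp16 for the next step. The function name and plain-SGD update are illustrative only.

```python
import numpy as np

def fp16_step_with_master_weights(master_w, grad_fp16, lr=0.01, loss_scale=128.0):
    """One illustrative update step: fp32 master weights, fp16 gradients."""
    # Unscale the gradient in fp32 to recover its true magnitude.
    grad_fp32 = grad_fp16.astype(np.float32) / loss_scale
    # Update the fp32 master copy (plain SGD here for brevity).
    master_w = master_w - lr * grad_fp32
    # Cast back to fp16 for the next forward/backward pass.
    return master_w, master_w.astype(np.float16)

master_w = np.random.randn(4).astype(np.float32)   # master weights stay fp32
grad_fp16 = np.random.randn(4).astype(np.float16)  # gradients computed in fp16
master_w, w_fp16 = fp16_step_with_master_weights(master_w, grad_fp16)
```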
  8. 01 Dec 2020, 1 commit
  9. 30 Nov 2020, 2 commits
  10. 27 Nov 2020, 1 commit
  11. 26 Nov 2020, 1 commit
  12. 25 Nov 2020, 1 commit
    • Quant nn2.0 (#28764) · 40f54537
      Committed by huangxu96
      * Implement 2.0 API version Conv2d and Linear layer quantization in imperative mode.
      
      * use cudnn softmax in static Lenet
      
      * Modified ChannelwiseQAT Unittest for 2.0 API.
      
      * For CI python coverage.
      40f54537
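The entry above implements imperative (dygraph) quantization for 2.0 Conv2D/Linear layers. A hedged sketch of that flow follows; ImperativeQuantAware, its constructor arguments, and save_quantized_model are taken from the contrib slim module and should be treated as assumptions for this exact release.

```python
import paddle
from paddle.fluid.contrib.slim.quantization import ImperativeQuantAware

class LeNetLike(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.conv = paddle.nn.Conv2D(1, 6, 3, padding=1)
        self.fc = paddle.nn.Linear(6 * 28 * 28, 10)

    def forward(self, x):
        x = paddle.nn.functional.relu(self.conv(x))
        return self.fc(paddle.flatten(x, 1))

model = LeNetLike()
# Insert fake quant/dequant ops into Conv2D and Linear layers (argument names assumed).
qat = ImperativeQuantAware(
    weight_quantize_type="abs_max",
    activation_quantize_type="moving_average_abs_max")
qat.quantize(model)
# ... train the quantization-aware model, then export it:
qat.save_quantized_model(
    layer=model,
    path="./lenet_quant",
    input_spec=[paddle.static.InputSpec(shape=[None, 1, 28, 28], dtype="float32")])
```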
  13. 24 Nov 2020, 1 commit
    • Upgrade string literals to raw string (#28989) · 3815d7aa
      Committed by Leo Chen
      * upgrade comment string to raw string
      
      * fix string in
      
      * fix string with ' '
      
      * revert update on comments
      
      * upgrade only necessary
      
      * fix sample code checker
      
      * fix comments with '''
      3815d7aa
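The raw-string upgrade above avoids invalid escape sequences in docstrings and comments that contain backslashes. A small before/after illustration:

```python
# Before: backslashes in a normal docstring form escape sequences, so "\d"
# triggers an invalid-escape warning and has to be doubled by hand.
def before(x):
    """Matches \\d+ digits in the input."""
    return x

# After: a raw string keeps backslashes literally, no escaping needed.
def after(x):
    r"""Matches \d+ digits in the input."""
    return x
```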
  14. 23 Nov 2020, 1 commit
  15. 18 Nov 2020, 3 commits
  16. 16 Nov 2020, 1 commit
  17. 08 Nov 2020, 1 commit
    • exec ut no more than 15s 1 (#28439) · ba075632
      Committed by YUNSHEN XIE
      * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
      
      * test for limiting ut exec time to 15s
      
      * fix an error caused by a ut that cannot be found
      
      * fix some error
      
      * cannot find test_transformer
      
      * fix error caused by ut not running on windows
      
      * fix error caused by Compiler Options
      
      * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
      
      * setting timeout value to 120s for old ut
      
      * add the timeout value setting
      
      * fix error caused by ut only running in coverage_ci
      
      * add analyzer_transformer_profile_tester
      
      * fix some error
      
      * fix some error
      
      * fix error with inference option
      
      * fix error with inference option setting as ON_INFER
      
      * add some ut to set timeout
      
      * modified some option
      
      * fix error
      
      * fix some timeout error
      
      * fix error
      
      * fix error
      
      * fix timeout for test_analyzer_bfloat16_resnet50
      
      * fix error
      
      * setting timeout property for some ut
      
      * first pr for new ut timeout of 15s
      ba075632
  18. 04 Nov 2020, 1 commit
  19. 21 Oct 2020, 2 commits
    • fix test_weight_decay_extend error (#28178) · 5d73bfdb
      Committed by Chen Weihang
      5d73bfdb
    • 2.0rc api rename (#28088) · 7c1aa0d6
      Committed by cnn
      * rename manual_seed to seed
      
      * rename xxx1d-->xxx1D, xxx2d-->xxx2D, xxx3d-->xxx3D
      
      * rename manual_seed --> seed
      
      * do not rename .cc, .cu and .h file
      
      * rename manual_seed --> seed
      
      * rename manual_seed --> seed
      
      * rename manual_seed --> seed
      
      * rename manual_seed --> seed
      
      * disable_static on doc example code
      
      * do not change manual_seed on generator
      
      * add enable_static on sample code
      
      * convert python/paddle/fluid/layers/nn.py to bak
      
      * fix typo
      
      * fix code style
      
      * fix seed to manual_seed when calling functions of Generator()
      
      * fix bug
      7c1aa0d6
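The rename above in practice: manual_seed becomes paddle.seed, and layer names use an upper-case dimensionality suffix. An illustrative snippet (layer names per the 2.0 API):

```python
import paddle

paddle.seed(2021)                                # previously paddle.manual_seed(2021)
conv = paddle.nn.Conv2D(3, 16, kernel_size=3)    # previously paddle.nn.Conv2d
pool = paddle.nn.MaxPool2D(kernel_size=2)        # previously paddle.nn.MaxPool2d
```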
  20. 14 Oct 2020, 1 commit
  21. 12 Oct 2020, 2 commits
  22. 11 Oct 2020, 1 commit
  23. 09 Oct 2020, 1 commit
  24. 01 Oct 2020, 1 commit
  25. 24 Sep 2020, 1 commit
  26. 23 Sep 2020, 2 commits
  27. 22 Sep 2020, 1 commit
    • Use dygraph mode by default (#27443) · 827ac36f
      Committed by pangyoki
      * enable dygraph mode by default
      
      * fix CI-Mac
      
      * fix other unittest files for Mac-CI
      
      * fix CI-Py3
      
      * fix test_communicator_geo and test_buffer_shared_memory_reuse_pass
      
      * add enable_static to fix CI-Py3
      
      * add enable_static to fix CI-coverage
      
      * delete try except
      827ac36f
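Since the commit above makes dygraph the default, static-graph code must opt in explicitly. A minimal illustration:

```python
import paddle

# Dygraph (imperative) mode is now the default: ops execute eagerly.
x = paddle.to_tensor([1.0, 2.0, 3.0])
print(x.mean())  # runs immediately

# Static-graph code has to switch modes explicitly.
paddle.enable_static()
data = paddle.static.data(name="x", shape=[None, 3], dtype="float32")
# ... build and run the program with paddle.static.Executor ...
paddle.disable_static()  # back to dygraph
```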
  28. 21 Sep 2020, 1 commit
    • Quant op dev (#25932) · 02606d45
      Committed by huangxu96
      * Finished ChannelWiseQuantDequantAbsMaxOp and passed unittests.
      
      * Finished channel-wise quantization strategy in imperative quantization.
      
      * Added CUDA code for ChannelWiseQuantDequantMaxAbsOp
      
      * Add quant_axis for channel_wise quant.
      
      * fixed a bug in unittests which would not trigger the axis = 1 case and could not meet the coverage rate requirement.
      
      * Added some assert information and fixed some coding style mistakes.
      02606d45
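The channel-wise abs-max scheme implemented by the ops above, sketched in NumPy for illustration: each slice along quant_axis gets its own scale from the per-channel absolute maximum. The helper name and defaults are illustrative.

```python
import numpy as np

def channel_wise_quant_dequant_abs_max(w, quant_axis=0, bits=8):
    """Quantize then dequantize w per channel along quant_axis (illustrative)."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit
    # abs-max over every axis except the quantization axis
    reduce_axes = tuple(i for i in range(w.ndim) if i != quant_axis)
    scale = np.abs(w).max(axis=reduce_axes, keepdims=True)
    scale = np.maximum(scale, 1e-8)     # avoid division by zero
    q = np.round(w / scale * qmax)      # integer codes in [-127, 127]
    return q * scale / qmax             # dequantized fp32 values

w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # e.g. conv weight, OIHW
w_dq = channel_wise_quant_dequant_abs_max(w, quant_axis=0)
```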
  29. 18 Sep 2020, 1 commit
  30. 15 Sep 2020, 1 commit
  31. 14 Sep 2020, 1 commit
    • Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for static graph amp training. (#26240) · d708b210
      Committed by Zhen Wang
      
      * update amp_check_finite_and_scale_op for static_amp.
      
      * use amp_check_finite_and_scale in static graph amp.
      
      * update grads to zero when grads contain infinite values (as for the amp_check_finite_and_scale op).
      
      * add update_loss_scaling op in cpp.
      
      * add update_loss_scaling_op unit test.
      
      * update the doc of the check_finite_and_unscale op
      
      * Update the process of skipping gradient updates when the gradients have infinite values.
      
      * update the way to zero grads.
      
      * update test_update_loss_scaling_op.py
      
      * add log info when finding infinite grads.
      
      * add the unit test for UpdateLossScaling Layer.
      d708b210
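The check_finite_and_unscale / update_loss_scaling pair above implements dynamic loss scaling: shrink the scale after repeated overflows, grow it after a long run of finite gradients. A plain-Python sketch follows; the default increment/decrement constants are assumptions.

```python
def update_loss_scaling(state, found_inf,
                        incr_every_n_steps=1000, decr_every_n_nan_or_inf=2,
                        incr_ratio=2.0, decr_ratio=0.5):
    """One step of dynamic loss scaling (illustrative, default values assumed)."""
    if found_inf:
        # Skip the parameter update; shrink the scale after repeated overflows.
        state["bad_steps"] += 1
        state["good_steps"] = 0
        if state["bad_steps"] >= decr_every_n_nan_or_inf:
            state["loss_scaling"] = max(state["loss_scaling"] * decr_ratio, 1.0)
            state["bad_steps"] = 0
    else:
        # Grow the scale after a long enough run of finite gradients.
        state["good_steps"] += 1
        state["bad_steps"] = 0
        if state["good_steps"] >= incr_every_n_steps:
            state["loss_scaling"] *= incr_ratio
            state["good_steps"] = 0
    return state

state = {"loss_scaling": 32768.0, "good_steps": 0, "bad_steps": 0}
state = update_loss_scaling(state, found_inf=False)
```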
  32. 10 Sep 2020, 1 commit