1. 18 1月, 2021 2 次提交
    • Z
      [cherry-pick] improve perfomance of cast and tril op (#30498) · de003cee
      Zhang Ting 提交于
      * add fp16 support for tril_triu op (#30186)
      
      * add VecCastCUDAKernel (#30296)
      Co-authored-by: Nfurnace <34057289+windstamp@users.noreply.github.com>
      de003cee
    • P
      Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in... · 27c2f1ea
      pangyoki 提交于
      Cherry-pick PR 30103. Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) (#30496)
      
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allacation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * add inplace strategy
      
      * add elementwise_add sub
      
      * let backward op not use inplace
      
      * grad op do not use inplace
      
      * fix memory increase error and add leaf error message
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      
      * add unittest and leaf error message
      
      * merge view error
      
      * optimize op_function_generator format and support sum inplace op
      
      * fix format of basic_engine
      
      * fix format for framework
      
      * little change of variable wrapper
      
      * add reshape, squeeze, unsqueeze, scatter api
      
      * add relu elu tanh softmax inplace api
      
      * fix test_squeeze_op unittest
      
      * fix test_relu_op unittest
      
      * fix comment problems
      
      * delete sample code of inplace api
      
      * add reference of grad_pending_nodes in basic_engine
      
      * fix unittest name
      
      * add inplace apis into wlist
      
      * fix error message
      
      * add PADDLE_ENFORCE for set grad op twice
      
      * fix head file error
      27c2f1ea
  2. 15 1月, 2021 6 次提交
  3. 14 1月, 2021 8 次提交
  4. 13 1月, 2021 6 次提交
  5. 12 1月, 2021 8 次提交
  6. 11 1月, 2021 10 次提交
    • L
      [Cherry-Pick] Support vector<double> as type of op attribute and op set_value... · d839761e
      liym27 提交于
      [Cherry-Pick] Support vector<double> as type of op attribute and op set_value suppport vector<double> as value (#30126) (#30305)
      
      Cherry-Pick #30126
      1. Support vector<float64> as type of op attribute.
      2. op set_value suppports float64 numpy.array
      d839761e
    • L
      [cherry-pick] Async drop scope in executor (#29714) #30285 · 93ce7f69
      Leo Chen 提交于
      [cherry-pick] Async drop scope in executor (#29714)
      93ce7f69
    • L
      [Cherry-Pick 2.0] Check the rank of input in kernel of set_value op (#30147) (#30301) · a2bbd06a
      liym27 提交于
      cherry-pick #30147,For op set_value, check input's rank < 7
      a2bbd06a
    • W
      [cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t,... · 04cc659c
      WeiXin 提交于
      [cherry pick] Add detailed error message for curandStatus_t, cublasStatus_t, cusolverStatus_t (#30161) (#30280)
      
      为curandStatus_t、cublasStatus_t、cusolverStatus_t添加详细的报错信息。
      原始PR:#30161
      04cc659c
    • P
      [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze,... · 7c943a65
      pangyoki 提交于
      [Cherry-pick PR 29913], add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) (#30258)
      
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allacation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      7c943a65
    • Z
      [cherry-pick]add cast cuda kernel (#29352) #30263 · afbc6367
      Zhang Ting 提交于
       add cast cuda kernel
      
      cherry-pick #29352
      afbc6367
    • W
      [cherry-pick]add support for place string representation #30264 · fb66355e
      wangchaochaohu 提交于
      cherry-pick #28769, add support for place string representation 
      fb66355e
    • W
      [cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8
      wangchaochaohu 提交于
      * elementwise_add_grad Op optimization  (#29575)
      
      * optimize for long width for elementwise (#29602)
      
      * refine (#29622)
      
      * delete the code for fp16 optimization because it is not faster than common template code (#29715)
      
      * fix the shape choose of vectorize for cuda
      
      * optimization for fp16 elementwise add (#29744)
      
      * Fix the compiler error for half type (#29799)
      
      * refine the compiler error for half2 operation (#29816)
      
      * fix the compiler error when gcc4 cuda9.0 (#29997)
      e59524f8
    • Z
      [Cherry-Pick] Support pure fp16 training for AMP API. (#29544) (#30241) · d8dfef54
      Zhen Wang 提交于
      * Support pure fp16 training for AMP API. (#29544)
      
      * add cast ops before and after unsupported fp16 ops.
      
      * Keep partial net in FP32 pattern.
      
      * Support check_finite_and_unscale and update_loss_scaling for FP16 calculation mode.
      
      * Add fp16 support for adam op.
      
      * add multi precision attr for adam.
      
      * Fix the bug of test_multi_precision_fp16_train UT.
      
      * Code format for CI.
      
      * Fix the redefine error about MPTypeTrait on windows.
      
      * fix bugs of the _create_accumulators func in Momentum.
      
      * fix bug when inserting post cast op.
      
      * Add the update_loss_scaling op in allow_set of UnusedVarCheck.
      
      * Update for ci coverage.
      
      * Add some doc for OptimizerWithMixedPrecision.
      
      * Fix the code style.
      
      * Imporve the doc of `amp_init`.
      
      * Change for fp16 testing if users have the infer program defined in separate way.
      
      * Remove tensor copy in the update_loss_scaling op. (#29426)
      
      * remove tensor copy in the update_loss_scaling op
      
      * not use thrust.
      
      * fix some cuda memory access error.
      d8dfef54
    • Z
      [Cherry pick] improve dropout (#30260) · b4931ab1
      Zhang Ting 提交于
      * improve dropout (#29465)
      
      * improve drop out
      
      * add VectorizedRandomGeneratorWithGenerator
      
      * fix bug
      
      * modify according to comments
      
      * improve dropout grad (#29605)
      
      * improve grad perf
      
      * fix the bug of dropout_grad (#29813)
      b4931ab1