1. 23 11月, 2021 2 次提交
    • 0
      [Dy2stat]Allow users to switch eval/train mode when using @to_static to... · eed736dc
      0x45f 提交于
      [Dy2stat]Allow users to switch eval/train mode when using @to_static to decorate a function (#37383) (#37432)
      
      本PR之前使用@to_static装饰一个单独的function时,对于生成的Program无法切换train/eval模式,只能运行在train模式下。这也就导致动转静后用户多次调用function显存会一直增长。
      本PR之后,使用@to_static装饰一个单独的function时,可以通过function.train()或者function.eval()的方式来切换train/eval模式。
      eed736dc
    • W
      fix shape api (#37412) · 2778fcd9
      Wilber 提交于
      2778fcd9
  2. 22 11月, 2021 2 次提交
  3. 19 11月, 2021 1 次提交
  4. 16 11月, 2021 2 次提交
  5. 15 11月, 2021 1 次提交
  6. 10 11月, 2021 1 次提交
  7. 08 11月, 2021 2 次提交
  8. 01 11月, 2021 1 次提交
  9. 30 10月, 2021 1 次提交
  10. 29 10月, 2021 1 次提交
  11. 28 10月, 2021 10 次提交
  12. 27 10月, 2021 4 次提交
  13. 26 10月, 2021 12 次提交
    • W
    • B
      fix wrong trt dim when input dim is 2 (#36614) (#36732) · da6e5143
      baoachun 提交于
      * fix wrong trt dim when input dim is 2
      
      * update leaky_relu and instance_norm converter unit test
      
      * add instance_norm input dim check
      da6e5143
    • W
      [Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul,... · 30ce925c
      Wangzheee 提交于
      [Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652) (#36737)
      
      30ce925c
    • S
      [Cherry-pick] Add FasterTokenizer Operator (#36716) · edff5b79
      Steffy-zxf 提交于
      * Add FasterTokenizer Operator (#34491)
      
      Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent.
      
      * support the text string as an input Tensor
      * support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens
      * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
      * It first applies basic tokenization, followed by wordpiece tokenization.
      
      * optimize fast tokenizer
      
      * remove const_cast
      Co-authored-by: Nzhoushunjie <zhoushunjie@baidu.com>
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
      edff5b79
    • Z
      [cherry pick] add op: fused_feedforward(backward) (#36730) · 76c1bae1
      zhangkaihuo 提交于
      * add op: fused_feedforward(backward) (#35611)
      
      这个PR是fused_feedforward反向的代码
      
      相关kernel实现:fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias
      
      fused_feedforward是一个融合算子,该算子对transformer模型的feed forward层的算子进行融合和封装,使得前端只呈现一个接口,通过融合减少部分访存和kernel launch的时间,以此提升性能。
      
      * Move fused_attention and fused_feedforward functional api path to incubate (#36704)
      
      将 #35905 和 #35843 PR中新增的的python api接口移到incubate目录下。
      76c1bae1
    • H
      [cherry-pick]Support FP16 in HybridParallel and Fix bugs in HybridOptimizer (#36707) · 5b357e02
      Haohongxiang 提交于
      * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer (#36237)
      
      * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer
      
      * update
      
      * update
      
      * fix bugs in mp_layers、pp_layers and HybridParallelClipGrad (#36144)
      
      * fix calling bug of HybridParallelClipGrad
      
      * fix bugs of HybridParallelClipGrad
      
      * add unittest of pp with HybridParallelClipGrad
      
      * fix bugs in mp_layers.py
      
      * update
      
      * fix bugs in pp_layers.py
      
      * update
      
      * [HybridParallel]Rebuild code for pipeline (#36396)
      
      * add no_sync for parameters sync
      
      * add pipeline for moe
      
      * [HybridParallel]Support fp16 in dygraph hybrid parallel (#36420)
      
      * [HybridParallel]Support fp16 in dygraph hybrid parallel
      
      * update
      
      * update
      
      * update for recompute
      
      * add unittest of pp+fp16
      
      * add unittest of recompute+fp16
      
      * update
      
      * modify ut
      
      * modify ut of cond (#36475)
      
      * fix bugs of ClipGradByGlobalNorm in HybridParallel (#36555)
      
      * fix bugs of ClipGradByGlobalNorm
      
      * add unittests
      
      * add unittests
      
      * [HybridParallel]fix bug of check_inf in fleet_base.py (#36651)
      
      * fix bug of check_inf
      
      * fix allreduce
      
      * support ClipGradByGlobalNorm in sharding (#36012)
      
      * support ClipGradByGlobalNorm in sharding
      
      * support ClipGradByGlobalNorm in sharding
      
      * test=allcase
      
      * Update test_linalg_cond.py
      
      * Update hybrid_parallel_util.py
      
      * Update hybrid_parallel_util.py
      Co-authored-by: NShenLiang <1422485404@qq.com>
      Co-authored-by: Nzhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
      5b357e02
    • Z
      [cherry-pick]add op: fused_feedforward(forward) (#36729) · 77034fc3
      zhangkaihuo 提交于
      This is a fusion operator to compute feed forward layer in transformer model architecture.
      77034fc3
    • F
      Pool3d 2.0 (#36545) (#36721) · dfda193f
      feng_shuai 提交于
      dfda193f
    • S
      Add bincount op (#36317) (#36709) · 610a810c
      smallv0221 提交于
      * Add bincount op
      
      * upload cpu version
      
      * fix unitest
      
      * fix unittest
      
      * fix unittest
      
      * fix en doc
      
      * add more test
      
      * fix en doc
      
      * add more test case
      
      * fix test
      
      * fix input vailidation
      
      * fix input check
      
      * fix unittest
      
      * fix test
      
      * fix en doc
      
      cherry-pick
      610a810c
    • Y
      [Cherry-pick] Add the forward QR operator (#36627) · 616ce203
      Yulong Ao 提交于
      616ce203
    • X
      Support various length support for SelectedRows in GLOO::AllGather (#36637) (#36722) · fced11bd
      xiongkun 提交于
      Support various length support for SelectedRows in GLOO::AllGather (#36637)
      
          In cpu parallel using gloo, add various length support for SelectedRows
      fced11bd
    • L
      [Amp] refine code of amp level (#36362) (#36726) · 1ee4fc32
      Leo Chen 提交于
      * refine amp level
      
      * fix typo
      
      * update tracer._amp_level
      1ee4fc32