  1. 14 Apr, 2023: 11 commits
  2. 13 Apr, 2023: 12 commits
  3. 12 Apr, 2023: 4 commits
    • Optimize performance of unique kernel (#52736) · 8cbeefea
      Committed by Zhang Zheng
      * Optimize performance of unique kernel
      
      * fix ci
    • [AMP OP&Test] add fp16/bf16 unittest for pool2d op (#52288) · f9b155f9
      Committed by Wei Shengyu
      * add bf16 support and bf16/fp16 unittest for pool2d
      
      * add include files
      
      * dbg
      
      * reformat
      
      * reformat
      
      * modify code according to review comment
      
      * remove duplicate code
      
      * remove dup code
      
      * remove useless include
      
      * dbg
    • Patch del (#52754) · 189e0d44
      Committed by wangzhen38
      * [DO NOT MERGE] adadelta lr support
      
      * [DO NOT MERGE] gpu support
      
      * [test] follow torch
      
      * fix acc update order
      
      * for ci
      
      * [bug fix] update master para
      
      * [bug fix] update test
      
      * [bug fix] for ci test
      
      * for ci
      
      * fix xpu
      
      * [adadelta fix] del fluid header file
      
      * for ci
      
      * del notes
    • [AMP OP&Test] support bf16 for batch norm (#52407) · 523f8a26
      Committed by Guoxia Wang
      * [AMP OP&Test] support bf16 for batchnorm
      
      * codestyle
      
      * Update batch_norm_grad_kernel.cu
      
      * Update batch_norm_kernel.cu
      
      * fix codestyle
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * Update batch_norm_kernel.cc
  4. 11 Apr, 2023: 7 commits
  5. 10 Apr, 2023: 6 commits
    • [Hackathon No57] add fp16 & bf16 for flip, fp16 for gaussian (#52380) · 2b0fffc2
      Committed by Difer
      * add_fp_bf_for_flip_gaussian_random
      
      * forgot to convert uint
      
      * fix some errors
      
      * fix some errors
    • 3ee2b237
    • [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc (#52573) · 3c0b1795
      Committed by HongyuJia
      * [enforce.h Decouple gflags.h] Move gflags.h from enforce.h to enforce.cc
      
      * Add gflags.h for other files
      
      * Add gflags.h for other files
      
      * Add gflags.h for blas_impl.hip.h
      
      * Add gflags.h for miopen_helper.h
    • [AMP OP&Test] Add fp16 and bf16 test to activation (#52521) · 6bd5fd75
      Committed by Vvsmile
      * adjust default tolerance of output and grad
      
      * fix a bug in the grad of OpTest
      
      * fix the type of setting default value in optest, both forward and backward
      
      * add default
      
      * fix test_sum_op
      
      * adjust tolerance
      
      * fix the tolerance of eager
      
      * add bf16 and fp16 to the activation tests
      
      * remove some fixes
      
      * fix activation
      
      * fix fp16
      
      * fix gelu
      
      * fix the activation tests
      
      * add bfloat16 specialization to singrad and cosgrad
      
      * fix bugs
      
      * fix bugs
      
      * add unittest
      
      * add skip
      
      * add fp/bf to rrelu/rrelu_grad
      
      * git add rrelu
      
      * fix bugs
    • [AMP OP&Test] instance_norm fp16 and bf16 support (#52241) · 7c98abd9
      Committed by qizhaoaoe
      * add fp16 and bf16 support for instance_norm
      
      * fix /= operator, which does not support bf16
      
      * fix instance_norm_grad kernel and unittests.
      
      * fix fp32 unittests.
      
      * fix instance_norm_kernel and unittests.
      
      * fix instance_norm_grad_kernel and unittest threshold.
      
      * add fp16/bf16 for instance_norm_grad_grad op.
      
      * add bf16 dtype check.
      
      * fix conflicts.
      
      * fix cpu support for fp32 op and fix type in instance_norm_grad_kernel.
      
      * fix type in instance_norm_kernel.
      
      * fix bf16 outputs in unittests and refine codes.
      
      * fix dx computation.
      
      * delete unused params and header includes.
      
      * add fp16/bf16 for static graph.
      
      * fix device condition for instance_norm op.
      
      * fix instance_norm_grad_grad and bf16 op tests.
      
      * fix op_test so that bf16 grads can be compared with fp32.
      
      * remove updates.
      
      * add self-defined grad.
    • [PaddlePaddle Hackathon 4 No.36] Optimize the GPU performance of the tile op in Paddle (#52482) · 61fe2198
      Committed by Zero Rains
      * fix divide zero bug for softmax_with_cross_entropy
      
      * change the single test way
      
      * runs, but slowly; the cause of the slowdown is not yet known
      
      * remove some useless comments
      
      * correct the copyright
      
      * remove some useless change
      
      * if repeat_times == 1, we will not use BroadcastKernel
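PR #52482 describes a fast path for the tile op: when repeat_times == 1, BroadcastKernel is not used. The following is a minimal pure-Python sketch of that idea, not Paddle's GPU kernel. It assumes the fast path applies when every entry of repeat_times is 1 (so the output is simply a copy of the input), and `tile` is a hypothetical helper operating on a flat, row-major list.

```python
from typing import List, Sequence, Tuple


def tile(data: List[float], shape: Sequence[int],
         repeat_times: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Hypothetical tile on a flat row-major buffer (illustration only)."""
    # Align ranks: pad the shorter of shape / repeat_times with leading 1s.
    rank = max(len(shape), len(repeat_times))
    in_shape = [1] * (rank - len(shape)) + list(shape)
    reps = [1] * (rank - len(repeat_times)) + list(repeat_times)
    out_shape = [s * r for s, r in zip(in_shape, reps)]

    # Assumed fast path: every repeat count is 1, so a plain copy suffices
    # and the broadcast-style gather below is skipped entirely.
    if all(r == 1 for r in reps):
        return list(data), out_shape

    total = 1
    for d in out_shape:
        total *= d

    # General path: map each output element back to its source element.
    out = []
    for flat in range(total):
        rem, src, stride = flat, 0, 1
        # Walk dimensions from last to first (row-major order).
        for d in range(rank - 1, -1, -1):
            coord = rem % out_shape[d]
            rem //= out_shape[d]
            src += (coord % in_shape[d]) * stride
            stride *= in_shape[d]
        out.append(data[src])
    return out, out_shape


print(tile([1, 2, 3, 4], [2, 2], [1, 2]))
```

The general path computes coord % input_dim in every dimension, the same index mapping a broadcast performs; the fast path avoids that per-element index arithmetic when nothing is actually repeated.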
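PR #52754 adds learning-rate support to Adadelta, notes "follow torch", and fixes the accumulator update order. Assuming the update matches the one documented for PyTorch's torch.optim.Adadelta (without weight decay), a single scalar step can be sketched as below. `adadelta_step` is a hypothetical name, and the defaults (lr=1.0, rho=0.9, eps=1e-6) are PyTorch's, not necessarily Paddle's.

```python
import math


def adadelta_step(param: float, grad: float, square_avg: float,
                  acc_delta: float, lr: float = 1.0, rho: float = 0.9,
                  eps: float = 1e-6):
    """Hypothetical scalar Adadelta step, assuming PyTorch-style ordering."""
    # 1. Update the running average of squared gradients.
    square_avg = rho * square_avg + (1 - rho) * grad * grad
    # 2. Compute the update using the *previous* acc_delta.
    delta = math.sqrt(acc_delta + eps) / math.sqrt(square_avg + eps) * grad
    # 3. Only afterwards fold the new update into acc_delta.
    acc_delta = rho * acc_delta + (1 - rho) * delta * delta
    # 4. Apply the step, scaled by the learning rate.
    param = param - lr * delta
    return param, square_avg, acc_delta


print(adadelta_step(1.0, 1.0, 0.0, 0.0))
```

The order matters: if acc_delta were updated before computing delta, the step would be scaled by a running average that already includes the current update.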
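Several commits above (#52288, #52407, #52380, #52521, #52241) add bf16 support and adjust unittest tolerances. bfloat16 keeps the top 16 bits of an fp32 value (sign, 8 exponent bits, 7 stored mantissa bits), so a rounded value carries a relative error of up to about 2^-8, which is one reason bf16 tests generally need looser tolerances than fp32. The sketch below uses generic round-to-nearest-even conversion; the helper names are hypothetical, and Paddle's own test utilities may convert differently.

```python
import math
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Round x (as fp32) to bfloat16 with round-to-nearest-even; return 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Keep NaN a quiet NaN rather than letting rounding alter it.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    lsb = (bits >> 16) & 1
    bits += 0x7FFF + lsb
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a bfloat16 bit pattern to fp32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def round_trip(x: float) -> float:
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


print(round_trip(math.pi))
```

For example, pi survives the round trip only as 3.140625, an error near 1e-3 that an fp32-level tolerance would reject.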