1. 08 Jun, 2022 1 commit
    • N
      Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e
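      niuliling123 committed
      The original Reduce amax/amin and frobenius_norm kernels were implemented with Eigen, which made these files slow to compile, so this PR replaces them with KP implementations.
      Remove the duplicated functionality in DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad OP.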
      e161979e
  2. 07 Jun, 2022 1 commit
  3. 06 Jun, 2022 1 commit
    • N
      cherry-pick 42645 (#43205) · 835a1888
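      niuliling123 committed
      Remove the rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
      Adapted from PR #42645 on the develop branch. Because the develop and release branches have diverged considerably, a direct cherry-pick was not possible, so the PR was resubmitted against release2.3.
      Instantiating Broadcast on rank causes heavy expansion of the underlying templates, which makes reduce_sum_grad_kernel.cu.o excessively large. This change reduces both the .o size and the compile time.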
      835a1888
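      The commit message does not show how the rank instantiation was removed. As a rough illustration only, the sketch below assumes one common approach: treat the rank as a runtime value, so that a single compiled function serves every rank instead of one template instantiation per rank. `BroadcastInputOffset` and its parameters are hypothetical names, not Paddle's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Paddle's API): maps a linear index into the
// broadcast output to the element offset in one input tensor.
// The rank is out_dims.size(), a runtime value, so this function is
// compiled once rather than once per rank.
// in_strides[d] must be 0 for every dimension the input is broadcast along.
std::size_t BroadcastInputOffset(std::size_t out_index,
                                 const std::vector<std::size_t>& out_dims,
                                 const std::vector<std::size_t>& in_strides) {
  assert(out_dims.size() == in_strides.size());
  std::size_t offset = 0;
  // Peel off coordinates from the innermost dimension outwards.
  for (std::size_t d = out_dims.size(); d-- > 0;) {
    const std::size_t coord = out_index % out_dims[d];
    out_index /= out_dims[d];
    offset += coord * in_strides[d];
  }
  return offset;
}
```

      For an output of shape {2, 3} and an input of shape {1, 3} (strides {0, 1}), output index 4 maps to input offset 1. The trade-off in this sketch is a runtime loop over dimensions in exchange for fewer instantiations.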
  4. 28 Apr, 2022 2 commits
  5. 26 Apr, 2022 1 commit
    • C
      [Cherry-pick] Optimize dygraph performance part2 (#42224) · ab24b9c0
      Chen Weihang committed
      * Add paddle::variant and replace paddle::any (#42139)
      
      * add variant and replace any
      
      * split attribute
      
      * Optimize dygraph GetExpectedKernelType perf (#42154)
      
      * opt dygraph scheduling
      
      * revert part impl
      
      * fix variant compile error (#42203)
      
      * replace any by variant in infermeta (#42181)
      ab24b9c0
  6. 19 Apr, 2022 1 commit
  7. 13 Apr, 2022 2 commits
  8. 07 Apr, 2022 1 commit
  9. 03 Apr, 2022 1 commit
    • F
      add maximum limit for grid of index_select (#41127) · af8d2482
      FlyingQianMM committed
      * limit grid dim for index select
      
      * mv LimitGridDim into gpu_launch_config.h
      
      * fix conflicts
      
      * fix conflicts
      
      * fix code style
      
      * set block to 256
      
      * fix grid setting
      
      * set dtype of block_dim to unsigned int
      af8d2482
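      The steps in this commit (limit the grid dimension, use a block of 256, keep the block size unsigned) can be sketched in plain C++ as below. This is a minimal sketch under stated assumptions, not the code from the commit: `GridFor` and `max_grid_x` are hypothetical names, and the real limit would be queried from the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Paddle's LimitGridDim): number of blocks to
// launch along x for `numel` elements, capped at the device limit.
// `max_grid_x` is an assumed parameter holding that limit.
inline unsigned int GridFor(std::int64_t numel,
                            unsigned int block_dim,
                            unsigned int max_grid_x) {
  assert(numel >= 0 && block_dim > 0 && max_grid_x > 0);
  // Ceiling division in 64-bit so very large tensors do not overflow.
  const std::int64_t blocks = (numel + block_dim - 1) / block_dim;
  // Cap at the limit, and launch at least one block.
  const std::int64_t capped =
      std::min<std::int64_t>(blocks, static_cast<std::int64_t>(max_grid_x));
  return static_cast<unsigned int>(std::max<std::int64_t>(capped, 1));
}
```

      With a block of 256 and an assumed limit of 65535, `GridFor(1000, 256u, 65535u)` gives 4 blocks, while a tensor with 2^40 elements is capped at 65535. Once the grid is capped, the kernel has to cover the remaining elements itself, typically with a grid-stride loop.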
  10. 02 Apr, 2022 2 commits
  11. 01 Apr, 2022 1 commit
    • C
      [Phi]Interploatd kernels into phi (#40855) · d65a7a46
      chentianyu03 committed
      * add interpolate cpu kernel
      
      * fix nullptr bug
      
      * add interpolate gpu kernel
      
      * fix unit test error
      
      * remove raw kernels
      
      * add cuda kernel impl
      
      * add infermeta
      
      * recover accidentally deleted kernels in interpolate op
      
      * fix grad x_grad name error
      
      * remove interpolate_v2_op.h
      
      * rm unused codes
      
      * fix xpu build error
      
      * fix build error
      
      * fix namespace error
      
      * add register header for npu
      
      * fix infermeta error
      
      * modify by review
      
      * add the missing args in test_trt_convert_nearest_interp_v2
      d65a7a46
  12. 31 Mar, 2022 2 commits
  13. 30 Mar, 2022 3 commits
  14. 29 Mar, 2022 3 commits
  15. 28 Mar, 2022 1 commit
    • H
      Move some activation to phi (#40727) · e77a947e
      hong committed
      * update
      
      * add forward case
      
      * update
      
      * update; test=develop
      
      * add some grad kernel; test=develop
      
      * move gpu kernel; test=develop
      
      * update
      
      * update;
      
      * update test;
      
      * fix selected rows bug;
      
      * add mixed vector include;
      
      * add mixed vector depen; test=develop
      
      * add logit grad signature;
      
      * polish code
      
      * fix bug;
      
      * add namespace for abs
      
      * revert code
      
      * not move softsign
      
      * remove duplicate register;
      
      * fix softsign bug
      
      * polish code
      
      * format
      
      * format
      
      * fix bug
      
      * remove cmake dep
      
      * add square sqrt selected rows support
      
      * update
      
      * remove clip norm
      
      * add standalone executor sqrt dep
      
      * standalone exec denp sqrt
      
      * remove sqrt op in cmakelist
      
      * open some case
      e77a947e
  16. 27 Mar, 2022 1 commit
    • H
      Move slice to phi (#40736) · b8236b7b
      hong committed
      * move slice to pten
      
      * merge develop; test=develop
      
      * fix slice bug;
      
      * update
      
      * update
      
      * fix error
      
      * update
      
      * fix bug
      
      * polish code
      
      * polish code
      
      * polish code
      
      * try to fix windows bug
      
      * add gpu compile flag;
      
      * try to fix
      
      * remove template;
      
      * polish code;
      
      * fix npu bug;
      
      * fix npu bug
      
      * fix npu bug; test=develop
      
      * fix slice bug;
      
      * remove no need dep
      b8236b7b
  17. 26 Mar, 2022 1 commit
  18. 25 Mar, 2022 5 commits
  19. 24 Mar, 2022 2 commits
  20. 23 Mar, 2022 5 commits
  21. 22 Mar, 2022 2 commits
    • H
      Change bn muable data to phi (#40748) · d9a41fc4
      hong committed
      * move mutable_data to context alloc
      
      * move mutable_data to context alloc
      
      * remove duplicate code
      d9a41fc4
    • H
      Move embedding to phi (#39901) · 0331cfda
      hong committed
      * move embedding to phi;
      
      * update sig; test=develop
      
      * move reset impl to phi; test=develop
      
      * remove old register; test=develop
      
      * fix cpu bf16 bug; test=develop
      
      * fix lookup speed error
      
      * polish code
      
      * fix paddle throw type
      0331cfda
  22. 21 Mar, 2022 1 commit