  1. 28 Apr 2022, 2 commits
    • Z
      [cherry-pick] Optimize performance of dygraph (#42196) (#42329) · 2ea56c90
      Committed by zyfncg
      * Optimize performance of dygraph (v4)  (#42196)
      
      * optimize performance of dygraph
      
      * optimize performance of dygraph and elementwise_add
      
      * optimize the trace op
      
      * fix bug
      
      * fix bug
      
      * fix unittest bug
      
      * fix code format
      
      * fix cherry-pick problem
    • Z
      [cherry-pick] Optimize performance of dygraph (#42231, #42253) (#42309) · 69a92b7b
      Committed by zyfncg
      * Optimize the performance of sum api (#42231)
      
      * optimize the performance of sum api
      
      * optimize IsDenseTensorInput
      
      * remove debug log
      
      * Add move construct for KernelSignature (#42253)
      
      * add move construct for KernelSignature
      
      * add noexcept
      
      * fix cherry-pick problem
  2. 27 Apr 2022, 3 commits
    • C
      [Cherry-pick] Optimize dygraph performance part4 (#42306) · 9bc423b1
      Committed by Chen Weihang
      * Remove std::type_index in AttributeArgDef (#42122)
      
      * polish some impl
      
      * add lost attr type
      
      * polish details
      
      * fix error type
      
      * polish in name lists
      
      * add double attr
      
      * adapt infrt attr parse
      
      * add attr type test (#42263)
      
      * opt attr equal perf (#42272)
    • A
      [Performance]Remove redundant op_type in RecordEvent (#42246) (#42288) · 16fd8f9c
      Committed by Aurelius84
      * [Performance]Remove redundant op_type in RecordEvent
      
      * [Performance]Remove redundant op_type in RecordEvent
      
      * [Performance]Remove redundant op_type in RecordEvent
    • C
      [Cherry-pick2.3] Optimize dygraph performance part3 (#42256) · 9495708a
      Committed by Chen Weihang
      * Change small vector size (#42202)
      
      * change small vector size
      
      * Update type_defs.h
      
      * Optimize dygraph InferShape perf (#42155)
      
      * init commit
      
      * remove two hash impl
      
      * fix bug
      
      * polish details
      
      * fix compile failed
      
      * fix compile failed
      
      * fix compile failed
      
      * add default kernel sig cache
      
      * fix get kernel arg defs error
      
      * remove kernel arg defs cache
      
      * fix origin op execute
  3. 26 Apr 2022, 1 commit
    • C
      [Cherry-pick] Optimize dygraph performance part2 (#42224) · ab24b9c0
      Committed by Chen Weihang
      * Add paddle::variant and replace paddle::any (#42139)
      
      * add variant and replace any
      
      * split attribute
      
      * Optimize dygraph GetExpectedKernelType perf (#42154)
      
      * opt dygraph scheduling
      
      * revert part impl
      
      * fix variant compile error (#42203)
      
      * replace any by variant in infermeta (#42181)
  4. 25 Apr 2022, 1 commit
  5. 21 Apr 2022, 1 commit
  6. 20 Apr 2022, 1 commit
  7. 19 Apr 2022, 1 commit
  8. 15 Apr 2022, 1 commit
  9. 12 Apr 2022, 1 commit
  10. 11 Apr 2022, 1 commit
  11. 05 Apr 2022, 1 commit
    • Z
      Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e
      Committed by Zhang Ting
      * switch autotune
      
      * implement AutoTuneCache
      
      * implement AutoTuneCache class
      
      * add pybind api
      
      * add dygraph test
      
      * support static mode and eager mode and improve unittests
      
      * rename the SwitchAutoTune Class and improve tests
      
      * improve AutoTuneStatus and reduce the cost of tests
  12. 02 Apr 2022, 1 commit
  13. 01 Apr 2022, 3 commits
    • C
      [Phi] Move softmax with cross entropy kernel into phi (#40832) · e6ec98fe
      Committed by Chen Weihang
      * add cross_entropy_with_softmax phi kernel
      
      * remove softmax_with_cross_entropy kernel
      
      * add softmax_with_cross_entropy grad kernel
      
      * remove original op kernel
      
      * refine cross entropy impl
      
      * fix pointer error
      
      * revert kernel cu change
      
      * fix xpu failed
      
      * fix cinn failed
      
      * fix npu failed
      
      * add forward sig
      
      * add check_nan_inf for pt kernel
      
      * remove repeat cmake item
      
      * fix unittest error
    • C
      [Phi]Interpolate kernels into phi (#40855) · d65a7a46
      Committed by chentianyu03
      * add interpolate cpu kernel
      
      * fix nullptr bug
      
      * add interpolate gpu kernel
      
      * fix unit test error
      
      * remove raw kernels
      
      * add cuda kernel impl
      
      * add infermeta
      
      * recover accidentally deleted kernels in interpolate op
      
      * fix grad x_grad name error
      
      * remove interpolate_v2_op.h
      
      * rm unused codes
      
      * fix xpu build error
      
      * fix build error
      
      * fix namespace error
      
      * add register header for npu
      
      * fix infermeta error
      
      * modify by review
      
      * add the missing args in test_trt_convert_nearest_interp_v2
    • L
      [KP] fix bug in activation xpu kp kernel (#41219) · 705776ca
      Committed by Liu-xiandong
      * fix bug in activation xpu kp kernel
      
      * delete useless comment
  14. 31 Mar 2022, 2 commits
  15. 28 Mar 2022, 1 commit
  16. 24 Mar 2022, 1 commit
  17. 23 Mar 2022, 2 commits
  18. 21 Mar 2022, 1 commit
  19. 18 Mar 2022, 1 commit
    • Z
      [Phi]Move hierarchical_sigmoid kernel to phi (#40553) · 64a7cbd3
      Committed by Zhang Zheng
      * first commit
      
      * fix compile error
      
      * support std::vector<std::string>
      
      * fix
      
      * fix op support on GPU by chenweihang
      
      * pass test
      
      * infershape
      
      * add set_dtype
      
      * fix order
      
      * fix
      
      * unify the impl of dt and sr
      
      * fix
  20. 17 Mar 2022, 2 commits
    • C
      [Phi] Move assign kernel into phi (#40022) · 1904572a
      Committed by Chen Weihang
      * move assign kernel init commit
      
      * change vec<tensor> to vec<tensor*>
      
      * support tensor array
      
      * support api declare
      
      * fix test_list failed
      
      * fix npu and xpu failed
      
      * fix infrt failed
      
      * remove assign array size in operator
      
      * move assign sr header into sr dir
      
      * add infermeta for assign
      
      * test op success
      
      * fix test_list failed
      
      * fix kunlun failed
      
      * add set host allocator in tests
      
      * support tensor array in arg ctx
      
      * open set layout in share_meta
      
      * fix meta tensor layout error
      
      * fix test failed
    • Q
      [ROCm] fix bfloat16 support, test=develop (#40401) · da558f0e
      Committed by Qi Li
  21. 16 Mar 2022, 3 commits
  22. 15 Mar 2022, 4 commits
    • X
      run python api in eager mode and filter the out in argument list (#40523) · 4d886f75
      Committed by xiongkun
      * run python api in eager mode and filter the out in argument list
      
      * fix code
    • F
      [NPU] add AMP O1 support (#40362) · 69dd43d1
      Committed by furnace
      * [NPU] add AMP O1 support
      
      * [NPU] fix NOTE and warnings
    • Z
      Added more profile signposts to dygraph (#40201) · 36db75b4
      Committed by Zhanlue Yang
      * Added more signposts to dygraph profiling
      
      * Fixed minor issues
      
      * Refactored signpost names
      
      * Fixed typo
      
      * Removed debug codes
      
      * Fixed typo
      
      * Adjusted signpost names
      
      * Fixed issues from branch merge
    • H
      Move one hot to phi (#39876) · 7701db37
      Committed by hong
      * move one hot to phi; test=develop
      
      * fix bugs; test=develop
      
      * fix bugs; test=develop
      
      * add infer meta; test=develop
      
      * fix bugs; test=develop
      
      * resolve conflict
      
      * resolve conflict
      
      * fix bug;
      
      * fix error; test=develop
      
      * update; test=develop
      
      * polish code; test=develop
      
      * add one api in eager mode; test=develop
      
      * add one hot test; test=develop
      
      * remove useless code; test=develop
      
      * fix bug; test=develop
      
      * polish code; test=develop
      
      * polish code; test=develop
  23. 14 Mar 2022, 1 commit
  24. 12 Mar 2022, 1 commit
  25. 11 Mar 2022, 2 commits
    • C
      [Phi] Remove needless deps in unittests (#40256) · 89ed57e2
      Committed by Chen Weihang
      * remove needless deps in unittests
      
      * add gpu macro
      
      * fix other unittests
      
      * fix kernel name error
      
      * fix test_prepare_op
      
      * fix failed dygraph unittests
      
      * fix gpu failed tests
      
      * fix cinn test failed
      
      * fix cinn test failed
      
      * fix dropout tests
    • C
      [Phi] Reduce grad (#40263) · f452ad5c
      Committed by chentianyu03
      * add reduce_sum grad kernel
      
      * add reduce_grad
      
      * modify reduce grad
      
      * update reduce grad functions
      
      * fix build error
      
      * add argument mapping
      
      * move cast input after grad
      
      * add dims.size=1 cpu reduce_sum grad compute method
      
      * update reduce grad GPU
      
      * remove raw reduce_sum_grad kernel
      
      * modify header files
      
      * add namespace funcs for reduce_grad_functions
  26. 10 Mar 2022, 1 commit