Commits · 193ab32cf0743c7b20e37c05bfb779655908ebba · PaddlePaddle / Paddle

14 Jun, 2022 10 commits
- [MLU] add mlu kernel for depthwise conv2d op (#43359) · 077f3788
  Committed by cambriconhsq on Jun 14, 2022
- [MLU]: add elementwise_max mlu kernel (#43365) · ceb6b3f1
  Committed by zhaoying9105 on Jun 14, 2022
```
* [MLU]: add elementwise_max mlu kernel

* [MLU]: add int32 support for elementwise maxk MLU kernel
```
- [MLU]: add log log10 log2 MLU kernel (#43360) · 4642e8c4
  Committed by zhaoying9105 on Jun 14, 2022
- fix update loss scaling (#43487) · 0e6462d6
  Committed by sneaxiy on Jun 14, 2022
- [cuda graph] partial program with cuda graph under static mode (#43440) · d83d59dd
  Committed by Yuang Liu on Jun 14, 2022
- fix compiling werror (#43337) · c6421019
  Committed by Zhang Jun on Jun 14, 2022
- [ Make FLAGS_einsum_opt as default ] Einsum memory optimization (#43397) · 83abec60
  Committed by xiongkun on Jun 14, 2022
```
* change logic for optimize

* modifty

* optimize the backward speed of EinsumOp

* add cache optimizer for einsum op

* EinsumOp: fix new dygraph mode error

* fix bug

* change Cache->InnerCache

* fix code

* fix

* add nan inf utils for einsum op

* add as_extra

* memory optimizer for einsum

* update code
```
- 【code format check upgrade】 step3：enable clang-format sort these infrt files's headers (#43333) · 403b127b
  Committed by Sing_chan on Jun 14, 2022
- fix cmake-lint problems. (#43406) · 59f89236
  Committed by Wilber on Jun 14, 2022
```
* cmake-lint

* update
```
- fix bug of infer shape for slice (#43443) · e0a01461
  Committed by zyfncg on Jun 14, 2022
13 Jun, 2022 4 commits
- [MLU]add lookup_table_v2 op and fix amp feature of bert with mlu device (#43366) · 67bd5d9c
  Committed by qipengh on Jun 13, 2022
- add mlu interp_v2(nearest&bilinear). (#43383) · affe25b7
  Committed by Chenxiao Niu on Jun 13, 2022
- Disable oneDNN adaptive pooling exhaustive check (#43236) · 4af7ebf4
  Committed by piotrekobi on Jun 13, 2022
- Fix cmakelint errors for some files (#43428) · edf69ae0
  Committed by Ruibiao Chen on Jun 13, 2022
12 Jun, 2022 1 commit
- Fix the bug of slice op and optimize the code style of generate_proposals_v2... · 2d96801a
  Committed by Leo Guo on Jun 12, 2022
```
Fix the bug of slice op and optimize the code style of generate_proposals_v2 op for kunlun. *test=kunlun (#43380)
```
10 Jun, 2022 9 commits
- optimize bwd layer_norm kernel with fast method (#42491) · b4a93884
  Committed by limingshu on Jun 10, 2022
- [MLU] add mlu kernel for clip (#43229) · 798e2e7e
  Committed by 光明和真理 on Jun 10, 2022
- [MLU]add mlu kernel for scatter op (#43292) · 9ad05afd
  Committed by fuyou765 on Jun 10, 2022
- fix nullptr (#43370) · acfd7129
  Committed by sneaxiy on Jun 10, 2022
- make all phi kernels to 2(host/device) static libraries directly (#43247) · 5781999d
  Committed by Leo Chen on Jun 10, 2022
```
* make all phi kernels to 2(host/device) static libraries directly

* fix calling kernel_declare

* fix compile

* fix cpu compile

* fix rocm compile

* fix xpu compile

* fix xpu kp compile

* fix inference compile
```
- [Hackathon No.28] implement logcumsumexp (#42267) · 19a7524f
  Committed by tiancaishaonvjituizi on Jun 10, 2022
- [MLU]add mlu kernel for sqrt op (#43326) · 6d3a68cb
  Committed by cambriconhsq on Jun 10, 2022
- Re-implemented check_finite_and_unscale_op with newly added xdnn api (#42960) · 6197fbf6
  Committed by enzodechine on Jun 10, 2022
```
* Re-implemented check_finite_and_unscale_op  with newly added xdnn api
*test=kunlun

* Re-implemented check_finite_and_unscale_op  with newly added xdnn api

*test=kunlun
```
- [MLU] add randperm kernel and reduce_prod kernel (#43357) · b07f469b
  Committed by fwenguang on Jun 10, 2022
09 Jun, 2022 6 commits
- [MLU] add mlu meshgrid kernel (#43271) · 0d719718
  Committed by fwenguang on Jun 09, 2022
- [MLU]add mlu kernel for range op (#43296) · 1a80b484
  Committed by fuyou765 on Jun 09, 2022
- add mlu gather_nd kernel (#43344) · 0454b777
  Committed by cifar10 on Jun 09, 2022
- Add nproc_per_node for DistributedFusedLamb (#43295) · 6678def9
  Committed by sneaxiy on Jun 09, 2022
```
* add nproc_per_node for DistributedFusedLamb

* fix nproc_per_node communicator bug

* fix ring_id = 1 init bug

* fix ci

* fix test_parallel_executor_mnist.py
```
- [MLU]add mlu kernel for conv2dtransposed op (#43233) · c96f7a29
  Committed by cambriconhsq on Jun 09, 2022
- Implement dropout_nd operator to optimize dropout with axis not None. (#42463) · caa57498
  Committed by crystal on Jun 09, 2022
```
Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
```
08 Jun, 2022 4 commits
- [NPU] fix reduce_max (#43230) · 07ede118
  Committed by Aganlengzi on Jun 08, 2022
- [Phi]Move group op kernel into PHI and add yaml / unittest (#43104) · 99c6497b
  Committed by YuanRisheng on Jun 08, 2022
```
* move_group_norm

* move group norm backward

* fix code format

* modify code according comment
```
- [MLU] add logical ops (#43286) · 8bd3514c
  Committed by fwenguang on Jun 08, 2022
- Fix wrong reduce_dims in fused_gate_attention and optimize the memory usage. (#43216) · 10f8637c
  Committed by Yiqun Liu on Jun 08, 2022
```
* Polish codes and memory usage for fused_gate_attention.

* Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.
```
07 Jun, 2022 6 commits
- Matmul post-ops for fuses (#43198) · 5434d663
  Committed by Sławomir Siwek on Jun 07, 2022
```
* add method for post ops

* format code

* change post-ops pattern

* code style
```
- Optimized the performance of activation op in XPU2 (#43187) · d5afc1ba
  Committed by shixingbo on Jun 07, 2022
- Add use_master_acc_grad for DistributedFusedLamb (#43266) · 601d7a35
  Committed by sneaxiy on Jun 07, 2022
```
* add use_master_acc_grad

* add ut
```
- [MLU]support cast double type (#43058) · 42dd0f1b
  Committed by qipengh on Jun 07, 2022
```
* [MLU]support cast double type

* [MLU]fix cast test
```
- Transpose optimization with assitant of Chengdu Supercomputing Center and... · 71a63f0a
  Committed by limingshu on Jun 07, 2022
```
Transpose optimization with assitant of  Chengdu Supercomputing Center and auto_tune operation (#42704)
```
- [XPU KP]Add xpu register, any, amax, amin op test (#43204) · aec49361
  Committed by niuliling123 on Jun 07, 2022
