Commits · 15577630c940bf94279f881ecc58d27d956fd620 · Crayon鑫 / Paddle

15 Jun, 2022 2 commits

Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to... · 15577630

Committed by Yiqun Liu on Jun 15, 2022

Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506)

* Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor.

* Use int64_t in ElementwiseKernel as index type to support large tensor.

15577630
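
To illustrate why this commit widens the index type, here is a minimal host-side sketch. It is not the Paddle implementation: the helper names `FitsInInt32`, `CeilDiv`, `GlobalIndex`, and `NarrowToInt32` and the block size of 512 are assumptions chosen for illustration. It shows that an element count above 2^31 - 1 cannot be represented in a 32-bit `int`, while the block count and the flat per-thread index stay correct when computed in `int64_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if an element count can be indexed with a 32-bit int.
constexpr bool FitsInInt32(int64_t numel) {
  return numel >= 0 && numel <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: number of blocks needed to cover `numel` elements with
// `threads_per_block` threads each, computed entirely in 64-bit arithmetic.
constexpr int64_t CeilDiv(int64_t numel, int64_t threads_per_block) {
  return (numel + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: flat element index of one thread. All operands are
// 64-bit, so the multiplication cannot wrap for large tensors.
constexpr int64_t GlobalIndex(int64_t block_id, int64_t threads_per_block,
                              int64_t thread_id) {
  return block_id * threads_per_block + thread_id;
}

// Hypothetical helper: narrow to 32 bits only after checking the value fits.
inline int32_t NarrowToInt32(int64_t value) {
  assert(FitsInInt32(value));
  return static_cast<int32_t>(value);
}

// A tensor with 3e9 elements exceeds the 32-bit signed range.
constexpr int64_t kLargeNumel = 3000000000LL;
static_assert(!FitsInInt32(kLargeNumel), "3e9 elements do not fit in int32");
static_assert(CeilDiv(kLargeNumel, 512) == 5859375, "block count for 3e9 elements");
// The last thread of the last block addresses the last element.
static_assert(GlobalIndex(5859374, 512, 511) == kLargeNumel - 1,
              "64-bit index reaches the final element");
```

In a real kernel the same idea applies to the grid-stride loop variable: it is only safe for large tensors if the index is declared as a 64-bit type before any multiplication takes place.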

Refactor dynload/port.h (#43431) · 332fdd1e
Committed by Ruibiao Chen on Jun 15, 2022
```
* Refactor port.h

* Remove some unnecessary code

* Fix CI errors
```
332fdd1e

14 Jun, 2022 16 commits
- [Eager] Fix edvr starganv2 (#43471) · c62a7e25
  Committed by Jiabin Yang on Jun 14, 2022
```
* fix starganv2

* fix starganv2 stop_gradient end error

* fix edvr_starganv2

* fix mul kernel to fix optional ddx

* fix typo
```
  c62a7e25
- Support sequential run GPU OPs for standalone executor (#43243) · 8cec1271
  Committed by Ruibiao Chen on Jun 14, 2022
```
* Support sequential run for standalone executor

* Add UTs

* Fix test_standalone_multiply_write

* Remove unnecessary UTs
```
  8cec1271
- [MLU] add mlu kernel for depthwise conv2d op (#43359) · 077f3788
  Committed by cambriconhsq on Jun 14, 2022
  
  077f3788
- [MLU]: add elementwise_max mlu kernel (#43365) · ceb6b3f1
  Committed by zhaoying9105 on Jun 14, 2022
```
* [MLU]: add elementwise_max mlu kernel

* [MLU]: add int32 support for elementwise maxk MLU kernel
```
  ceb6b3f1
- [MLU]: add log log10 log2 MLU kernel (#43360) · 4642e8c4
  Committed by zhaoying9105 on Jun 14, 2022
  
  4642e8c4
- fix whl check (#43415) · b1f77b4d
  Committed by tianshuo78520a on Jun 14, 2022
  
  b1f77b4d
- fix update loss scaling (#43487) · 0e6462d6
  Committed by sneaxiy on Jun 14, 2022
  
  0e6462d6
- [cuda graph] partial program with cuda graph under static mode (#43440) · d83d59dd
  Committed by Yuang Liu on Jun 14, 2022
  
  d83d59dd
- [windows CI]open inference_ut in windows-inference pipeline (#43446) · 058c52b6
  Committed by Sing_chan on Jun 14, 2022
```
* open inference_ut;test=windows_ci_inference

* inference_ut need onnx;test=windows_ci_inference

* disable trt_split_converter_test; use higher parallel level

* too high parallel will cause ut timeout
```
  058c52b6
- fix compiling werror (#43337) · c6421019
  Committed by Zhang Jun on Jun 14, 2022
  
  c6421019
- [ Make FLAGS_einsum_opt as default ] Einsum memory optimization (#43397) · 83abec60
  Committed by xiongkun on Jun 14, 2022
```
* change logic for optimize

* modifty

* optimize the backward speed of EinsumOp

* add cache optimizer for einsum op

* EinsumOp: fix new dygraph mode error

* fix bug

* change Cache->InnerCache

* fix code

* fix

* add nan inf utils for einsum op

* add as_extra

* memory optimizer for einsum

* update code
```
  83abec60
- 【code format check upgrade】 step3：enable clang-format sort these infrt files's headers (#43333) · 403b127b
  Committed by Sing_chan on Jun 14, 2022
  
  403b127b
- 【code format check upgrade】 step3：enable clang-format sort these cinn files's headers(#43329) · d14e3698
  Committed by Sing_chan on Jun 14, 2022
  
  d14e3698
- fix cmake-lint problems. (#43406) · 59f89236
  Committed by Wilber on Jun 14, 2022
```
* cmake-lint

* update
```
  59f89236
- fix bug of infer shape for slice (#43443) · e0a01461
  Committed by zyfncg on Jun 14, 2022
  
  e0a01461
- [Eager] Fix custom op error (#43463) · 42754088
  Committed by Jiabin Yang on Jun 14, 2022
```
* fix custom op error

* fix code error
```
  42754088
13 Jun, 2022 13 commits
- [MLU]add lookup_table_v2 op and fix amp feature of bert with mlu device (#43366) · 67bd5d9c
  Committed by qipengh on Jun 13, 2022
  
  67bd5d9c
- add mlu interp_v2(nearest&bilinear). (#43383) · affe25b7
  Committed by Chenxiao Niu on Jun 13, 2022
  
  affe25b7
- add serialization for new field in event node (#43405) · 360b8383
  Committed by chenjian on Jun 13, 2022
```
* add serialization for new field in event node

* fix a bug
```
  360b8383
- add only split (#43424) · 30b10630
  Committed by zhoutianzi666 on Jun 13, 2022
  
  30b10630
- [inference]add topk/topk_v2 trt convertor (#43368) · 65e86580
  Committed by 津 on Jun 13, 2022
  
  65e86580
- Disable oneDNN adaptive pooling exhaustive check (#43236) · 4af7ebf4
  Committed by piotrekobi on Jun 13, 2022
  
  4af7ebf4
- Enable bert model on CPU (#43244) · 5fcd8061
  Committed by Tomasz Socha on Jun 13, 2022
```
* Enable bert model on CPU

* Style
```
  5fcd8061
- Support DCU in ProcessGroup (#43356) · c92b3805
  Committed by ShenLiang on Jun 13, 2022
  
  c92b3805
- fix bug of strided_slice (#43388) · abc5d0c4
  Committed by zyfncg on Jun 13, 2022
```
* fix stride_slice bug

* fix bug
```
  abc5d0c4
- [Eager] Support set_grad_ivar for eager (#43378) · a0d0bb63
  Committed by Jiabin Yang on Jun 13, 2022
```
* support set_grad_ivar for eager

* support set_grad_ivar for eager

* support set_grad_ivar for eager
```
  a0d0bb63
- Fix cmakelint errors for some files (#43428) · edf69ae0
  Committed by Ruibiao Chen on Jun 13, 2022
  
  edf69ae0
- gradient add support SparseCooTensor (#43352) · 97af8516
  Committed by zhangkaihuo on Jun 13, 2022
  
  97af8516
- sparse convertion kernel support secondary dispatch (#43345) · 5752643b
  Committed by zhangkaihuo on Jun 13, 2022
```
* use GpuMemcpy and GpuMemset

* sparse convert kernel support double dispatch by indices dtype

* cudaMemcpyKind->gpuMemcpyKind
```
  5752643b
12 Jun, 2022 1 commit
- Fix the bug of slice op and optimize the code style of generate_proposals_v2... · 2d96801a
  Committed by Leo Guo on Jun 12, 2022
```
Fix the bug of slice op and optimize the code style of generate_proposals_v2 op for kunlun. *test=kunlun (#43380)
```
  2d96801a
11 Jun, 2022 1 commit
- fix add_n incompatible error (#43395) · 3800f192
  Committed by Chen Weihang on Jun 10, 2022
  
  3800f192
10 Jun, 2022 7 commits
- [Phi] Fix depthwise conv yaml error (#43379) · f551d9fe
  Committed by Chen Weihang on Jun 10, 2022
```
* fix depthwise conv yaml error

* fix depthwise conv double grad error
```
  f551d9fe
- optimize bwd layer_norm kernel with fast method (#42491) · b4a93884
  Committed by limingshu on Jun 10, 2022
  
  b4a93884
- [MLU] add mlu kernel for clip (#43229) · 798e2e7e
  Committed by 光明和真理 on Jun 10, 2022
  
  798e2e7e
- [MLU]add mlu kernel for scatter op (#43292) · 9ad05afd
  Committed by fuyou765 on Jun 10, 2022
  
  9ad05afd
- revert PR43039 (#43384) · ac75617a
  Committed by Wilber on Jun 10, 2022
  
  ac75617a
- [BugFix]Fix dims mismatch when run rec_svtrnet model in eager mode (#43373) · cdeb3167
  Committed by YuanRisheng on Jun 10, 2022
```
* change tensor name

* fix unittest bugs
```
  cdeb3167
- fix nullptr (#43370) · acfd7129
  Committed by sneaxiy on Jun 10, 2022
  
  acfd7129

Crayon鑫 / Paddle is in sync with the fork source project