Commits · 937e21a3b6aa2a794d4f05b2a65b44317bdbf1e6 · Crayon鑫 / Paddle

26 Jul, 2021 7 commits

Committed by furnace on Jul 26, 2021

* [NPU] add tril_triu

* [NPU] delete debug codes

* [NPU] add more test cases, and api test

* [NPU] optimize codes style

d5872192

【HETERPS】edit cuda remote_streams (#34276) · 539d7185
Committed by danleifeng on Jul 26, 2021
```
* psgpu:edit cuda remote_streams; test=develop
```
539d7185

fix bug for index_sample_op_npu (#34383) · ca174025
Committed by ronnywang on Jul 26, 2021

ca174025

Support getitem by None index in dynamic mode (#34338) · a0bbc992

Committed by zyfncg on Jul 26, 2021

* Support getitem by ellipsis index in dynamic mode

* change some code style

* Support getitem by none index in dynamic mode

* modify a comments style and remove useless code

a0bbc992

[NPU] fix logcial op on NPU, test=develop (#34371) · d3d174f7
Committed by Qi Li on Jul 26, 2021

d3d174f7

[NPU] add cumsum (#34188) · 6b20cb4e

Committed by furnace on Jul 26, 2021

* [NPU] add cumsum

* [NPU] delete debug codes

* [NPU] add attr flatten and unittests, and api tests

* [NPU] delete comment codes

* [NPU] add attr flatten and axis exclusive check

* [NPU] delete skipIf

6b20cb4e

[NPU] add hard_sigmoid (#34094) · b5d8f43e

Committed by furnace on Jul 26, 2021

* [NPU] add hard_sigmoid

* [NPU] delete check_dygraph=False and max_relative_error

* [NPU] delete debug codes

* [NPU] add more test cases

* [NPU] add api test TestHardsigmoidAPI

* [NPU] temp delete hard_sigmoid for resovle conficts

* [NPU] resolve conflicts

b5d8f43e

23 Jul, 2021 6 commits
- Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy... · 577fdde5
  Committed by Aurelius84 on Jul 23, 2021
```
Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181)" (#34348)

This reverts commit 609f8225.
```
  577fdde5
- Logical Ops support more data types (#34141) · 27417f1f
  Committed by will-jl944 on Jul 23, 2021
```
* logical ops support int8, int16, int32, int64, float, double

* update docs of logical ops

* fix npu and xpu logical ops

* fix npu and xpu logical ops

* fix bug in xpu logical op code

* update test_logical_op_npu and test_logical_op_xpu

* correct error type
```
  27417f1f
- [NPU] add index_sample_op_npu and tests (#34239) · 63f6ce7b
  Committed by ronnywang on Jul 23, 2021
```
* add index_sample_op_npu and tests

* update
```
  63f6ce7b
- [NPU] Refine variable name (#34330) · b436e5fa
  Committed by Leo Chen on Jul 23, 2021
```
* fix variable name

* fix variable name
```
  b436e5fa
- update gather tree error msg (#34322) · edb9aff5
  Committed by liu zhengxi on Jul 23, 2021
  
  edb9aff5
- add npu_sampling_id and tests (#34302) · 04288091
  Committed by ronnywang on Jul 23, 2021
  
  04288091
22 Jul, 2021 8 commits
- copy found_inf to cpu in advance to improve performance (#34274) · 781f4028
  Committed by Leo Chen on Jul 22, 2021
```
* copy found_inf to cpu in advance to improve performance

* add npu test

* add npu test

* refine code

* refine memcpy op

* fix adam
```
  781f4028
- fix concat bug (#34319) · c342651e
  Committed by wuhuachaocoding on Jul 22, 2021
  
  c342651e
- [Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181) · 609f8225
  Committed by Aurelius84 on Jul 22, 2021
```
* modify into program_id

* fix cache_info declare problem

* fix python int to C long problem

* modify point to reference

* add ENVS
```
  609f8225
- Added sigmoid BF16 FWD/BWD kernels and gelu BF16 BWD kernel (#34216) · 5d3c89cf
  Committed by jakpiase on Jul 22, 2021
```
* added sigmoid BF16 FWD/BWD and gelu BF16 BWD

* added newline at EOF

* switched from lambdas to local functions

* changed function names
```
  5d3c89cf
- enable amp unsupported_fp16_list for npu (#34314) · b0a2f005
  Committed by Leo Chen on Jul 22, 2021
  
  b0a2f005
- Add int16 kernel for lookup_talbe and dequantize_abs_max op (#34275) · 85e531a9
  Committed by cc on Jul 22, 2021
```
* add int16 kernel for lookup_talbe and dequantize_abs_max op
```
  85e531a9
- [pass_enhance]fix the positon error in map_matmuil_to_mul_pass, test=develop (#34303) · 5179853a
  Committed by 王明冬 on Jul 22, 2021
  
  5179853a
- Support getitem by ellipsis index in dynamic mode (#34267) · 82339ed1
  Committed by zyfncg on Jul 22, 2021
```
* Support getitem by ellipsis index in dynamic mode

* change some code style
```
  82339ed1
21 Jul, 2021 5 commits
- [Paddle-TRT] upgrade test_tensorrt to trt8 (#34294) · 0438b604
  Committed by zlsh80826 on Jul 21, 2021
```
* upgrade test_tensorrt to trt8

* format
```
  0438b604
- [pass_enhance]fix the error in conv_bias_mkldnn_fuse_pass, test=develop (#34292) · 9b63e7f2
  Committed by 王明冬 on Jul 21, 2021
  
  9b63e7f2
- solve slice inplace illegal memory address bug (#34265) · d8e238d1
  Committed by jiangcheng on Jul 21, 2021
  
  d8e238d1
- fix cuda Stream record_event bug (#34285) · d953f8a9
  Committed by chentianyu03 on Jul 21, 2021
  
  d953f8a9
- trt reduce_mean supported. (#34204) · aff14962
  Committed by wenbin on Jul 21, 2021
```
* reduce_mean supported. test=allcase

* ut. test=allcase

* test=develop

* ut.test=allcase

* correct name. test=allcase

* correct UT. test=allcase

* correct UT.test=develop

* remove op

* UT

* add convert

* fix timeout issue

* more uts

* more ut

* correct ut
```
  aff14962
20 Jul, 2021 8 commits

Fix cast op that can not cast the arrays that the size of arrays is beyond int32 (#34209) · 038883fd
Committed by 李季 on Jul 20, 2021
```
* fix cast
```
038883fd

optimize fusion pass logs to avoid duplication (#34261) · 52e2c83e
Committed by Pei Yang on Jul 20, 2021

52e2c83e

[Paddle-TRT] Add noexcept on methods inherited from TensorRT (#34157) · b5aab4f0

Committed by zlsh80826 on Jul 20, 2021

* add trt noexcept definition

* add trt noexcept on trt plugin

* add trt noexcept on trt int8 calibrator

* remove noexcept on base serialize

* add trt noexcept on split plugin

* add trt noexcept on elementwise plugin

* add trt noexcept on prelu plugin

* add trt noexcept on pool plugin

* add trt noexcept on swish plugin

* add trt noexcept on gelu plugin

* add trt noexcept on layer norm plugin

* add trt noexcept on instance norm plugin

* add trt noexcept on emb eltwise layernorm plugin

* add trt noexcept on qkv2context plugin

* add trt noexcept on skip layernorm plugin

* add trt noexcept on slice plugin

* add trt noexcept on hard swish plugin

* add trt noexcept on stack plugin

* add trt noexcept on special slice plugin

* add trt noexcept on anchor generator plugin

* add trt noexcept on yolobox plugin

* add trt noexcept on roi align plugin

* add trt noexcept on gather nd plugin

b5aab4f0

Add Dependency to Fix Random Compilation Failure (#34256) · c0133e01

Committed by Huihuang Zheng on Jul 20, 2021

Add boost as a dependency to fix a random compilation failure. The failure occurs because program_processing.cc uses boost, but boost was not listed in DEPS in the CMakeLists.txt.

c0133e01

fix cuda_stream missing mkldnn depending error (#34260) · c963a21d
Committed by chentianyu03 on Jul 20, 2021

c963a21d

optimization of index_select op backward (#32955) · 6883403f
Committed by crystal on Jul 20, 2021

6883403f

change strided_slice when step<0. (#34205) · 7f2b5be3

Committed by WeiXin on Jul 20, 2021

* change strided_slice when step<0.

* add unittest for paddle.strided_slice

* polish unittest

7f2b5be3

[hybrid parallel] Optimize pipeline memory (#34230) · a74208c1
Committed by WangXi on Jul 20, 2021

a74208c1

19 Jul, 2021 6 commits

[NPU] add is_empty_op_npu, test=develop (#34234) · d4fb5c68
Committed by Qi Li on Jul 19, 2021

d4fb5c68

Fix format in requantize mkldnn op (#34137) · 1dfd857b
Committed by joanna.wozna.intel on Jul 19, 2021

1dfd857b

Add Cuda event and stream API (#32460) · 9c7f6af5

Committed by chentianyu03 on Jul 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_cuttent_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device direction, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda direction from device/__init__.py

9c7f6af5

[NPU hybrid] Partial send /recv/ allgather for npu (#34189) · 0cd21fac
Committed by Roc on Jul 19, 2021

0cd21fac

set the fuse_all_reduce_ops defalut false (#34212) · 2d5d5f37
Committed by 李季 on Jul 19, 2021

2d5d5f37

[Inference] Add config.Summary api (#34122) · 831c1c6c
Committed by Wilber on Jul 19, 2021

831c1c6c

Crayon鑫 / Paddle is in sync with the upstream project it was forked from