Commits · 851637fd58d4a64961f67424af020579f74baff6 · 机器未来 / Paddle

29 Dec, 2021 · 4 commits

Make profiler better (#38280) · 851637fd

Committed by liutiexing on Dec 29, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update OS info

* split host_event_recorder

* split host_event_recorder

* update

* update

* update

* update

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>


add top k v2 operator, test=kunlun (#38434) · d22f92ad
Committed by ykkk2333 on Dec 29, 2021

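A top-k operator returns the k largest entries along an axis together with their indices. As a rough reference for those semantics only (this is not the Kunlun kernel added in the commit), here is a 1-D sketch in Python; `topk_1d` and its `largest` flag are hypothetical names chosen for illustration.

```python
import heapq
from typing import List, Tuple


def topk_1d(x: List[float], k: int, largest: bool = True) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: the k largest (or smallest) values of x and their indices.

    Results come back sorted (descending for largest, ascending for smallest);
    among equal values the earlier index wins, because heapq.nlargest/nsmallest
    behave like a stable sort followed by slicing.
    """
    if not 0 <= k <= len(x):
        raise ValueError("k must be in [0, len(x)]")
    pick = heapq.nlargest if largest else heapq.nsmallest
    # Rank indices by the value they point to, so values and indices stay paired.
    idx = pick(k, range(len(x)), key=x.__getitem__)
    return [x[i] for i in idx], idx
```

For example, `topk_1d([1, 3, 2, 5, 4], 2)` yields the values `[5, 4]` at indices `[3, 4]`.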

add argsort/scatter for kunlun (#38345) · 4643baa7

Committed by TTerror on Dec 29, 2021

* add argsort/scatter for kunlun

* update test_scatter

* update xpu.cmake

* update xpu.cmake

* fix scatter
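For context on what a scatter kernel has to reproduce: `paddle.scatter` documents an `overwrite` flag. With `True`, later updates to the same row replace earlier ones. With `False`, each targeted row is first zeroed and then every update aimed at it is summed. The pure-Python sketch below models only those row-wise semantics as I understand them from the Paddle docs. `scatter_reference` is a hypothetical helper, not the XPU implementation from this commit.

```python
from typing import List

Matrix = List[List[float]]


def scatter_reference(x: Matrix, index: List[int], updates: Matrix,
                      overwrite: bool = True) -> Matrix:
    """Hypothetical reference: write rows of `updates` into a copy of `x` at `index`."""
    if len(index) != len(updates):
        raise ValueError("index and updates must have the same length")
    out = [row[:] for row in x]  # leave the input untouched
    if overwrite:
        # Last write wins when the same row index appears more than once.
        for i, row in zip(index, updates):
            out[i] = row[:]
    else:
        # Zero every targeted row once, then accumulate all updates for it.
        for i in set(index):
            out[i] = [0.0] * len(out[i])
        for i, row in zip(index, updates):
            out[i] = [a + b for a, b in zip(out[i], row)]
    return out
```

With `x = [[1, 1], [2, 2], [3, 3]]`, `index = [2, 1, 0, 1]`, and `updates = [[1, 1], [2, 2], [3, 3], [4, 4]]`, overwriting gives `[[3, 3], [4, 4], [1, 1]]`, while accumulating gives `[[3, 3], [6, 6], [1, 1]]`.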


add nccl func of NCCL 2.11 (#38519) · 4853ab0a
Committed by sneaxiy on Dec 29, 2021


28 Dec, 2021 · 1 commit

add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7

Committed by houj04 on Dec 28, 2021

* add reduce_prod_xpu. fix reduce_mean_xpu bug.

* add reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun


27 Dec, 2021 · 2 commits
  add device-agnostic stream class (#38391) · 6b5e33b4
  Committed by Leo Chen on Dec 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
  
  refine float16 implementation (#38439) · 78375990
  Committed by sneaxiy on Dec 27, 2021
  
24 Dec, 2021 · 1 commit
  
  Add new API cholesky_solve (#38167) · 39f7c41f
  Committed by zhiboniu on Dec 24, 2021
  
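A Cholesky solve finds X in A X = B when A is supplied through its Cholesky factor rather than directly. Assuming a lower-triangular factor L with A = L L^T (an upper factor works symmetrically), the solve reduces to one forward substitution followed by one back substitution. The sketch below handles a single right-hand-side vector in plain Python; `cholesky_solve_lower` is a hypothetical name and this is not Paddle's batched kernel.

```python
from typing import List


def cholesky_solve_lower(L: List[List[float]], b: List[float]) -> List[float]:
    """Hypothetical helper: solve A x = b where A = L @ L^T and L is lower-triangular."""
    n = len(L)
    # Forward substitution: solve L z = b from the top row down.
    z = [0.0] * n
    for i in range(n):
        s = sum(L[i][k] * z[k] for k in range(i))
        z[i] = (b[i] - s) / L[i][i]
    # Back substitution: solve L^T x = z from the bottom row up (L^T[i][k] == L[k][i]).
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x
```

For `L = [[2, 0], [1, 3]]`, A is `[[4, 2], [2, 10]]`. Solving with `b = [8, 22]` recovers `x = [1, 2]`.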
23 Dec, 2021 · 3 commits
  Make GetBlob assuming elements are cached (#38336) · 7da5368d
  Committed by Jacek Czaja on Dec 23, 2021
```
* First set of fixes

* - Make more likely to GetBlob find a blobs

* - Lint
```
  Support external stream. (#38373) · 15ad7ee4
  Committed by Wilber on Dec 23, 2021
```
* support external stream.

* update

* update

* update
```
  
  add-leaky-relu-to-xpu2-op-list (#38366) · b7bafee8
  Committed by houj04 on Dec 23, 2021
  
20 Dec, 2021 · 1 commit
  
  [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
  
17 Dec, 2021 · 2 commits

Get base pointer from Allocation (#37978) · 431a2d6a

Committed by From00 on Dec 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
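Setting an allocator's address alignment to 32 bytes means rounding every address or block size up to the next multiple of 32. The standard power-of-two round-up is sketched below in Python. `align_up` is a hypothetical helper, and this is not the BuddyAllocator code itself.

```python
def align_up(addr: int, alignment: int = 32) -> int:
    """Hypothetical helper: round a non-negative addr up to a multiple of alignment.

    Works only for power-of-two alignments: adding (alignment - 1) pushes any
    non-aligned value past the next boundary, and masking off the low bits
    snaps it back down onto that boundary.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    if addr < 0:
        raise ValueError("addr must be non-negative")
    return (addr + alignment - 1) & ~(alignment - 1)
```

For example, a 33-byte request padded this way occupies 64 bytes, while a 32-byte request stays at 32.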


update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f
Committed by houj04 on Dec 17, 2021


16 Dec, 2021 · 2 commits

[psgpu]add checknan print and fix trainer device (#38131) · 092839d6
Committed by danleifeng on Dec 16, 2021
```
* trainer_device fix and checknan tool for psgpu;test=develop

* disable show_one_table;test=develop
```

Adapt host event recorder to profiler (#37766) · 5b6be4d7

Committed by liutiexing on Dec 16, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add os_info

* update

* update

* update

* update

* update

* update for bugfix

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>


13 Dec, 2021 · 1 commit
  
  add popart_canonicalization p4 (#37967) · 69252fd8
  Committed by jianghaicheng on Dec 13, 2021
  
10 Dec, 2021 · 3 commits
  
  make cuda graph thread local allocator (#37814) · 62b1f38c
  Committed by sneaxiy on Dec 10, 2021
  
  
  add popart_canonicalization p3 (#37966) · 3e7768d3
  Committed by jianghaicheng on Dec 10, 2021
  
  
  add popart_canonicalization p2 (#37965) · 8b30c1ec
  Committed by jianghaicheng on Dec 10, 2021
  
09 Dec, 2021 · 2 commits
  Refine CUDA atomicAdd for FP16 by CUDA primitive methods (#37895) · 033ebe7e
  Committed by sneaxiy on Dec 09, 2021
```
* fix cuda atomicAdd for FP16

* try to fix ci
```
  
  add ipu device p2 (#37840) · cb636a48
  Committed by jianghaicheng on Dec 09, 2021
  
08 Dec, 2021 · 2 commits

Fix host event recorder (#37944) · 20471de7

Committed by liutiexing on Dec 08, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Fix RecordEvent
Co-authored-by: Nliutiexing <liutiexing@google.com>

Fix CUDA Graph H2D bug by restore host memory (#37774) · a1ad3a63
Committed by sneaxiy on Dec 08, 2021
```
* fix CUDA Graph H2D bug again

* fix no return bug
```

07 Dec, 2021 · 2 commits
  add some op to xpu2 op list && format xpu op list (#37832) · efd7a229
  Committed by TTerror on Dec 07, 2021
```
* format xpu op list

* format xpu op list

* update xpu1 op list
```
  
  add ipu device p1 (#37841) · c9a3c669
  Committed by jianghaicheng on Dec 07, 2021
  
03 Dec, 2021 · 2 commits
  
  add ipu_backend (#36322) · a3b3ec68
  Committed by jianghaicheng on Dec 03, 2021
  
  refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
01 Dec, 2021 · 3 commits

HostEventRecorder (#37629) · feda7c1d

Committed by liutiexing on Dec 01, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostEventTracer

* update HostEventTracer

* fix c++17

* update

* update

* update

* update

* fix bug
Co-authored-by: Nliutiexing <liutiexing@google.com>
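A common way to keep host-side event recording cheap is to give each thread its own buffer, so recording an event never contends on a shared lock. The buffers are merged only when the profiler collects them. Whether Paddle's HostEventRecorder follows exactly this design is an assumption here. The Python sketch below only illustrates the idea, and `HostEventRecorderSketch`, `record`, and `gather` are hypothetical names.

```python
import threading
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple

Event = Tuple[str, int, int]  # (name, start_ns, end_ns)


class HostEventRecorderSketch:
    """Hypothetical recorder: one event buffer per thread, merged on demand."""

    def __init__(self) -> None:
        self._local = threading.local()
        self._buffers: List[Tuple[str, List[Event]]] = []
        self._registry_lock = threading.Lock()  # taken once per thread, not per event

    def _thread_buffer(self) -> List[Event]:
        buf = getattr(self._local, "events", None)
        if buf is None:
            # First event on this thread: create its buffer and register it once.
            buf = []
            self._local.events = buf
            with self._registry_lock:
                self._buffers.append((threading.current_thread().name, buf))
        return buf

    @contextmanager
    def record(self, name: str) -> Iterator[None]:
        """Time the enclosed block and append it to the calling thread's buffer."""
        buf = self._thread_buffer()
        start = time.perf_counter_ns()
        try:
            yield
        finally:
            buf.append((name, start, time.perf_counter_ns()))

    def gather(self) -> List[Tuple[str, List[Event]]]:
        """Snapshot every thread's events; call once worker threads have finished."""
        with self._registry_lock:
            return [(thread, list(events)) for thread, events in self._buffers]
```

In use, each worker wraps its work in `with rec.record("op"):` and the profiler calls `rec.gather()` after joining the workers.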

add prior_box for kunlun (#37697) · e0fc8937
Committed by TTerror on Dec 01, 2021
```
* add prior_box for kunlun

* update

* update CMakeLists
```
add angle_op (#37689) · 28b43111
Committed by Feiyu Chan on Dec 01, 2021
```
* add angle_op
```
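An angle op returns the argument (phase) of each element: atan2(imag, real) in radians, within [-pi, pi]. Assuming it follows the same convention as `numpy.angle`, here is a minimal element-wise sketch in Python. `angle_reference` is a hypothetical name, and real inputs are handled through Python's built-in `.real`/`.imag` attributes on numbers.

```python
import math
from typing import List, Union

Number = Union[int, float, complex]


def angle_reference(values: List[Number]) -> List[float]:
    """Hypothetical helper: element-wise phase angle in radians."""
    # atan2 picks the correct quadrant from the signs of both components,
    # so negative reals map to pi rather than 0.
    return [math.atan2(v.imag, v.real) for v in values]
```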

29 Nov, 2021 · 3 commits
  
  DLTP-40731 [Bug] xpu1+x86 environment, develop paddle package, nlp case glue_xpu1_dy_bert_bs32 (#37666) · 46c71f2c
  Committed by taixiurong on Nov 29, 2021
  
  add expand_v2/expand_as_v2 for kunlun (#37592) · dae4e7f2
  Committed by TTerror on Nov 29, 2021
```
* add expand_v2/expand_as_v2 for kunlun

* update expand_as_v2

* update expand_as_v2

* support float16/bool

* update xpu.cmake
```
  
  Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  Committed by piotrekobiIntel on Nov 29, 2021
  
27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enforce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict


24 Nov, 2021 · 2 commits

Changed second batch of deprecated mkldnn header and function names to new oneDNN names (#37351) · 7db7a0ec
Committed by piotrekobiIntel on Nov 24, 2021
```
* Add second batch of deprecated mkldnn namespace and macro changes

* Unlock CI

* Fix temporary namespace alias placing
```

[Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799

Committed by Wangzheee on Nov 24, 2021

* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor


23 Nov, 2021 · 2 commits
  [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  Committed by Qi Li on Nov 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
  
  [fleet_executor] Update with collective (#37462) · df14dbf0
  Committed by Yuang Liu on Nov 23, 2021
  
19 Nov, 2021 · 1 commit

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv
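The pool types named in this commit (sum, mean, min, max) reduce messages passed along graph edges. Each edge sends row `x[src]` to node `dst`, and each node then reduces everything it received. A plain-Python model of that send/recv pattern follows. `graph_send_recv_reference` is a hypothetical name. Two behaviors are assumptions on my part rather than documented guarantees of `paddle.incubate.graph_send_recv`: the output has one row per row of `x`, and nodes that receive no messages stay zero.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def graph_send_recv_reference(x: Matrix, src_index: Sequence[int],
                              dst_index: Sequence[int],
                              pool_type: str = "sum") -> Matrix:
    """Hypothetical reference: gather x[src] per edge, then reduce per dst node."""
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    reducers = {
        "sum": sum,
        "mean": lambda col: sum(col) / len(col),
        "max": max,
        "min": min,
    }
    if pool_type not in reducers:
        raise ValueError(f"unsupported pool_type: {pool_type}")
    reduce = reducers[pool_type]
    num_nodes, dim = len(x), len(x[0])
    # "Send": file the source row of every edge under its destination node.
    inbox: List[Matrix] = [[] for _ in range(num_nodes)]
    for src, dst in zip(src_index, dst_index):
        inbox[dst].append(x[src])
    # "Recv": reduce each inbox column by column; empty inboxes stay zero.
    out: Matrix = [[0.0] * dim for _ in range(num_nodes)]
    for node, msgs in enumerate(inbox):
        if msgs:
            out[node] = [reduce(col) for col in zip(*msgs)]
    return out
```

For `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]` with edges `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, the `sum` pool gives `[[0, 2, 3], [2, 8, 10], [1, 4, 5]]`.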


机器未来 / Paddle is in sync with its fork source project
