Commits · 76514a1f7ba4e233e733d1c7cf24896c06b1877b · Crayon鑫 / Paddle

20 Dec, 2021 · 1 commit

[MLU]add mlu backend (#38207) · 76514a1f
Committed by fwenguang on Dec 20, 2021

17 Dec, 2021 · 2 commits

Get base pointer from Allocation (#37978) · 431a2d6a

Committed by From00 on Dec 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy

update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f
Committed by houj04 on Dec 17, 2021

16 Dec, 2021 · 2 commits

[psgpu]add checknan print and fix trainer device (#38131) · 092839d6
Committed by danleifeng on Dec 16, 2021
```
* trainer_device fix and checknan tool for psgpu;test=develop

* disable show_one_table;test=develop
```

Adapt host event recorder to profiler (#37766) · 5b6be4d7

Committed by liutiexing on Dec 16, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add os_info

* update

* update

* update

* update

* update

* update for bugfix

* update

* update

* update

Co-authored-by: liutiexing <liutiexing@google.com>

13 Dec, 2021 · 1 commit

add popart_canonicalization p4 (#37967) · 69252fd8
Committed by jianghaicheng on Dec 13, 2021

10 Dec, 2021 · 3 commits

make cuda graph thread local allocator (#37814) · 62b1f38c
Committed by sneaxiy on Dec 10, 2021

add popart_canonicalization p3 (#37966) · 3e7768d3
Committed by jianghaicheng on Dec 10, 2021

add popart_canonicalization p2 (#37965) · 8b30c1ec
Committed by jianghaicheng on Dec 10, 2021

09 Dec, 2021 · 2 commits

Refine CUDA atomicAdd for FP16 by CUDA primitive methods (#37895) · 033ebe7e
Committed by sneaxiy on Dec 09, 2021
```
* fix cuda atomicAdd for FP16

* try to fix ci
```

add ipu device p2 (#37840) · cb636a48
Committed by jianghaicheng on Dec 09, 2021

08 Dec, 2021 · 2 commits

Fix host event recorder (#37944) · 20471de7

Committed by liutiexing on Dec 08, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Fix RecordEvent

Co-authored-by: liutiexing <liutiexing@google.com>

Fix CUDA Graph H2D bug by restore host memory (#37774) · a1ad3a63
Committed by sneaxiy on Dec 08, 2021
```
* fix CUDA Graph H2D bug again

* fix no return bug
```

07 Dec, 2021 · 2 commits

add some op to xpu2 op list && format xpu op list (#37832) · efd7a229
Committed by TTerror on Dec 07, 2021
```
* format xpu op list

* format xpu op list

* update xpu1 op list
```

add ipu device p1 (#37841) · c9a3c669
Committed by jianghaicheng on Dec 07, 2021

03 Dec, 2021 · 2 commits

add ipu_backend (#36322) · a3b3ec68
Committed by jianghaicheng on Dec 03, 2021

refine structure for cuda and rocm (#37202) · a6d2fddb
Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```

01 Dec, 2021 · 3 commits

HostEventRecorder (#37629) · feda7c1d

Committed by liutiexing on Dec 01, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostEventTracer

* update HostEventTracer

* fix c++17

* update

* update

* update

* update

* fix bug

Co-authored-by: liutiexing <liutiexing@google.com>

add prior_box for kunlun (#37697) · e0fc8937
Committed by TTerror on Dec 01, 2021
```
* add prior_box for kunlun

* update

* update CMakeLists
```

add angle_op (#37689) · 28b43111
Committed by Feiyu Chan on Dec 01, 2021
```
* add angle_op
```

29 Nov, 2021 · 3 commits

DLTP-40731 [Bug] xpu1+x86 environment, develop paddle package, nlp case glue_xpu1_dy_bert_bs32 (#37666) · 46c71f2c
Committed by taixiurong on Nov 29, 2021

add expand_v2/expand_as_v2 for kunlun (#37592) · dae4e7f2
Committed by TTerror on Nov 29, 2021
```
* add expand_v2/expand_as_v2 for kunlun

* update expand_as_v2

* update expand_as_v2

* support float16/bool

* update xpu.cmake
```

Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
Committed by piotrekobiIntel on Nov 29, 2021

27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enforce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

24 Nov, 2021 · 2 commits

Changed second batch of deprecated mkldnn header and function names to new oneDNN names (#37351) · 7db7a0ec
Committed by piotrekobiIntel on Nov 24, 2021
```
* Add second batch of deprecated mkldnn namespace and macro changes

* Unlock CI

* Fix temporary namespace alias placing
```

[Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799

Committed by Wangzheee on Nov 24, 2021

* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

23 Nov, 2021 · 2 commits

[XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
Committed by Qi Li on Nov 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```

[fleet_executor] Update with collective (#37462) · df14dbf0
Committed by Yuang Liu on Nov 23, 2021

19 Nov, 2021 · 1 commit

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

18 Nov, 2021 · 1 commit

Fix for wrong results in segmentation models (#37310) · c1802f91
Committed by jakpiase on Nov 18, 2021
```
* fix

* ci rerun

* ci rerun

* ci Rerun
```

17 Nov, 2021 · 1 commit

Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb

Committed by piotrekobiIntel on Nov 17, 2021

* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test

11 Nov, 2021 · 1 commit

add where/where_index/masked_select for kunlun (#37053) · f5e7b02a
Committed by TTerror on Nov 11, 2021
```
* add where/where_index/masked_select for kunlun

* fix where/where_index

* update where/masked_select
```

10 Nov, 2021 · 1 commit

Added stack FP32 FWD oneDNN kernel (#37002) · 99f9224c

Committed by jakpiase on Nov 10, 2021

* added stack oneDNN FP32 op

* minor change

* CI fix

* added skipping for gpus

* fix for stack op

* CI fix

* CI fix

* Added comment

* CI fix

09 Nov, 2021 · 3 commits

fix bugs when build in windows with_inference_api_test=on (#36973) · fd15477f
Committed by Sing_chan on Nov 09, 2021

Try to fix CUDA Graph H2D copy bug (#36987) · 2a143f84
Committed by Zeng Jinle on Nov 09, 2021
```
* try to fix CUDA Graph H2D copy bug

* remove useless code

* fix ci

* fix ROCM CI

* fix CUDA_VERSION

* improve CI coverage
```

add gather_nd/tile op for kunlun (#37029) · 819b9589
Committed by TTerror on Nov 09, 2021

08 Nov, 2021 · 1 commit

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

Committed by wanghuancoder on Nov 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test=develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutex, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

05 Nov, 2021 · 1 commit

Disable pool&conv_transpose&quantize caching (#36695) · db6c00c4

Committed by Jacek Czaja on Nov 05, 2021

* - WIP

- compilation fix

- fix

- fixes

- fix

- fix

- fix again

- fix

- another fix

- another compilation fix

- fix

- fix

- fix

- lint

* - pool2d partially stripped from cache

- pool2d partially stripped of caching

* - compilation fix

* - compilation fix

* - Fix to UT of caching

* - Enabling test_conv3d_mkldnn

* - conv_transpose stripped of cache

* - compilation fix

* - fix

* - fix

* - compilation fix

* - fix

* Reverted disabling caching of conv2d

* - compilation fix

* - ut reverted

03 Nov, 2021 · 1 commit

Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops for controlling op types used... · 2479664a

Committed by Zhen Wang on Nov 03, 2021

Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops for controlling op types used in training with CINN. (#36842)

* Update UT test_parallel_executor_run_cinn.py.

* Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops & FLAGS_cinn_ops_delim.

* Use the custom StringSplit function and remove the FLAGS_cinn_ops_delim flag.

* Add FlagController test.

* Apply lock to the cache_ only in CinnCompiler.

* Add VizGraph & ReadableKey method for CinnCompiler.

* Update the dot style of VizGraph in CinnCompiler.

02 Nov, 2021 · 1 commit

support different precision in kunlun (#36836) · e512aa9a
Committed by QingshuChen on Nov 02, 2021
```
* support different precision in kunlun

* minor

* minor

* minor
```

Crayon鑫 / Paddle is in sync with the source project it was forked from