Commits · 71e01d3f246bbd396c222e5521f0ee9376e6a5c2 · PaddlePaddle / Paddle

16 Sep, 2021 1 commit
- C
  
  Add CPU and GPU eigh op implementation (#34990) · 07d0b834
  Committed by crystal on Sep 16, 2021
  
  07d0b834
15 Sep, 2021 3 commits
- J
  
  added fix for matmul and support for 6 rank tensor (#35740) · e80acff3
  Committed by jakpiase on Sep 15, 2021
  
  e80acff3
- L
  add nvidia cusparse library, test=develop (#35675) · 7e18106a
  Committed by Liu-xiandong on Sep 15, 2021
```
Put Nvidia's cusparse library into paddle.
```
  7e18106a
- S
  Add paddle.cuda.device.stream_guard API (#35623) · 3218075d
  Committed by Siming Dai on Sep 15, 2021
```
Add paddle.cuda.device.stream_guard API 
```
  3218075d
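
A stream guard is typically a context manager that makes a given stream the current one inside a block and restores the previous stream on exit. The sketch below illustrates only that pattern in plain Python. The names `_state`, `current_stream` and the string stream labels are hypothetical stand-ins, and this is not Paddle's implementation.

```python
from contextlib import contextmanager

# Hypothetical stand-in for per-device state; real streams are device objects.
_state = {"stream": "default"}


def current_stream():
    """Return the label of the stream that new work would be queued on."""
    return _state["stream"]


@contextmanager
def stream_guard(stream):
    """Make `stream` current inside the block, then restore the previous one."""
    previous = _state["stream"]
    _state["stream"] = stream
    try:
        yield stream
    finally:
        # Restore even if the block raised, so the guard never leaks state.
        _state["stream"] = previous
```

Guards written this way nest naturally, because each one restores whatever was current when it was entered.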
14 Sep, 2021 2 commits

Y
Implement FunctionTraits to support two kinds of elementwise functor and... · 12bf0502
Committed by Yiqun Liu on Sep 14, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)
```
12bf0502

Add api paddle.device.cuda.empty_cache to release idle gpu memory held by allocator. (#35427) · 83932715

Committed by chenenquan on Sep 14, 2021

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Add empty_cache api to release idle gpu memory held by allocator, test=develop

* Fix test coverage problem for empty_cache

* delete redundant check for empty_cache

* fix the problem of empty_cache's doc

* delete the nvidia-smi comment in doc of empty_cache, test=document_fix

83932715
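
To show what "release idle memory held by the allocator" means, here is a toy caching allocator in plain Python. It is a sketch under simplifying assumptions (exact-size block reuse, byte counts instead of real device memory). The class `CachingAllocator` and its methods are hypothetical and do not mirror Paddle's allocator.

```python
class CachingAllocator:
    """Toy allocator that caches freed blocks instead of returning them."""

    def __init__(self):
        self.in_use = {}    # block id -> size in bytes
        self.idle = []      # sizes of freed blocks kept for reuse
        self.reserved = 0   # total bytes currently held from the "device"
        self._next_id = 0

    def alloc(self, size):
        """Reuse an idle block of the same size, else reserve new memory."""
        if size in self.idle:
            self.idle.remove(size)
        else:
            self.reserved += size
        block_id = self._next_id
        self._next_id += 1
        self.in_use[block_id] = size
        return block_id

    def free(self, block_id):
        """Return a block to the cache; reserved memory does not shrink."""
        self.idle.append(self.in_use.pop(block_id))

    def empty_cache(self):
        """Release every idle block and report how many bytes were given back."""
        released = sum(self.idle)
        self.reserved -= released
        self.idle.clear()
        return released
```

Blocks that are still in use are untouched by `empty_cache`; only memory the allocator was holding for future reuse is released.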

13 Sep, 2021 4 commits
- Y
  Revert "Implement FunctionTraits to support two kinds of elementwise functor... · 40d4a295
  Committed by Yiqun Liu on Sep 13, 2021
```
Revert "Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)" (#35686)
```
  40d4a295
- Y
  Implement FunctionTraits to support two kinds of elementwise functor and... · d4f84d46
  Committed by Yiqun Liu on Sep 13, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)
```
  d4f84d46
- T
  
  add xpu_wait & new implementation replace memcpy in adam, adamw (#35437) · 86a6be1a
  Committed by taixiurong on Sep 13, 2021
  
  86a6be1a
- J
  Added clip BF16/FP32 FWD/BWD kernels (#35601) · 4e233712
  Committed by jakpiase on Sep 12, 2021
```
* implemented clip op bf16/fp32

* added skipping if not cpu or bf16

* CI rerun after bf16 package change

* added parentheses to ensure formatting
```
  4e233712
11 Sep, 2021 1 commit
- A
  
  Clear VLOG in DeviceEvent (#35633) · cd5115f7
  Committed by Aurelius84 on Sep 11, 2021
  
  cd5115f7
09 Sep, 2021 1 commit

Add matrix_rank Op and its GPU and CPU kernel (#34823) · eb1fbf12

Committed by 0x45f on Sep 09, 2021

* init matrix_rank op, add matrix_rank CPU code and test

* add GPU kernel, remove svd_eigen.h

* add CPU kernel when tol is tensor

* add cpu and gpu code when tol is tensor

* fix CI-ROCM error

* add matrix_rank API description, fix PR-CI-Py3 error

* fix PR-CI-Windows error, add matrix_rank API test

* delete useless comments

* fix review

* add my code in svd_helper.h

* update doc comments

* remove spaces

eb1fbf12
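
The commit message ties matrix_rank to the SVD helper and a tolerance `tol`. As a minimal sketch of the usual rule, the rank is the number of singular values greater than `tol`. The function below takes already computed singular values, and its default tolerance follows the NumPy convention (largest singular value times the larger dimension times machine epsilon). Both the function name and that default are assumptions, not taken from the Paddle kernel.

```python
import sys


def rank_from_singular_values(singular_values, shape, tol=None):
    """Count singular values strictly greater than `tol`.

    `singular_values` is a list of non-negative floats and `shape` is the
    (rows, cols) of the original matrix, used only for the default tolerance.
    """
    if not singular_values:
        return 0
    if tol is None:
        # Assumed default, borrowed from NumPy's convention.
        tol = max(singular_values) * max(shape) * sys.float_info.epsilon
    return sum(1 for s in singular_values if s > tol)
```

Passing a separate `tol` per matrix in a batch corresponds to the "tol is tensor" case mentioned in the bullets above.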

08 Sep, 2021 2 commits

Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

5f369881

merge CMakeList.txt manual (#35378) · c4a3e8b4

Committed by feng_shuai on Sep 08, 2021

* merge CMakeList.txt manual

* add platform for changethreadnum

* repair some bugs according to make error

* do nothing just flush CI

* forget change thread num

* add inplace_atol param for check_output_with_place

* Windows

* std::min and std::max should be changed because of Windows

c4a3e8b4

07 Sep, 2021 2 commits
- Y
  
  support multi-node (#35396) · c6e0cedc
  Committed by yaoxuefeng on Sep 07, 2021
  
  c6e0cedc
- A
  Fix DryRun unittest failed from test_standalon_executor.py (#35433) · 071e8156
  Committed by Aurelius84 on Sep 07, 2021
```
* fix commit

* Open unittest

* fix unittest on Windows

* fix constructor
```
  071e8156
06 Sep, 2021 2 commits
- A
  Support Reset for DeviceEvent (#35443) · 8c73c1b5
  Committed by Aurelius84 on Sep 06, 2021
```
* Support Reset for DeviceEvent

* fix code

* add more unittest
```
  8c73c1b5
- Y
  
  Revert hccl check nan (#35438) · c3ad7775
  Committed by Yuang Liu on Sep 06, 2021
  
  c3ad7775
03 Sep, 2021 3 commits
- Y
  
  Unify the implementation of AlignedVector and simplify the codes of dropout and cast. (#35373) · c171eca2
  Committed by Yiqun Liu on Sep 03, 2021
  
  c171eca2
- T
  
  fix bn_infer and optimize momentum for kunlun (#35250) · 8305ba37
  Committed by TTerror on Sep 03, 2021
  
  8305ba37
- L
  
  [NPU] add 32 extra bytes for npu memory slot (#35347) · 668bfb35
  Committed by Leo Chen on Sep 03, 2021
  
  668bfb35
02 Sep, 2021 2 commits

Add SVD Op and its GPU and CPU kernel (#34953) · 7e5fb462

Committed by xiongkun on Sep 02, 2021

* Add SVD Op and its GPU and CPU kernel

* Remove CUDAPlace in test_svd_op, make the test available in CPU package

* modify the file

* fix windows bug/ fix ROCM / fix test timeout

* for pass the CIs

* improve error report

* for code review

* some modification to test_svd_op

* change python code style

* expose the svd interface for document

7e5fb462

B

[npu] add update_loss_scaling npu min value (#35270) · 280d7421
Committed by Baibaifan on Sep 02, 2021

280d7421

01 Sep, 2021 2 commits

Added slice BF16/FP32 FWD/BWD kernels (#34332) · 070cab11

Committed by jakpiase on Sep 01, 2021

* added slice FWD FP32

* added tests for slice FWD FP32

* added slice bwd

* added bf16 tests

* CI fix

* CI fix

* added reason to skip_if

* minor change

* temporary fix for failing test

* temporary fix

* changes after review

* CI rerun

070cab11

Q
support KL label smooth (#35177) · 7ca28bb6
Committed by QingshuChen on Sep 01, 2021
```
* support KL label smooth

* update UT for KL label_smooth
```
7ca28bb6

31 Aug, 2021 1 commit

Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a

Committed by Aurelius84 on Aug 31, 2021

* polish code

* fix unittest on windows

* refine pybind interface

* support statistic MemSize of AllocatorPool

* Replace mutex into atomic

572bad8a

30 Aug, 2021 2 commits
- J
  
  - candidate fix (#35231) · ca4d2fca
  Committed by Jacek Czaja on Aug 30, 2021
  
  ca4d2fca
- A
  Abstract GenerateDeviceEventFlag to shield platforms (#35219) · 20cfa8ba
  Committed by Aurelius84 on Aug 30, 2021
```
* Abstract GenerateDeviceEventFlag to shield platforms

* Remove get_cuda_flags
```
  20cfa8ba
27 Aug, 2021 1 commit
- A
  Polish DeviceEvent interface and Remove #ifdef in InterpreterCore (#35196) · 48bf7cbf
  Committed by Aurelius84 on Aug 27, 2021
```
* add CPUDeiveEvent

* Polish DeviceEvent code

* Add DEVICE_EVENT_LIBS
```
  48bf7cbf
26 Aug, 2021 1 commit

[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and... · 31f0221f

Committed by Jacek Czaja on Aug 26, 2021

[oneDNN] disable caching oneDNN primitives in  matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132)

* - grad caching disabled of matmul_v1

- compilation fix

- compilation fix

* - reduction removed

* - Matmul v2 disabled caching

* Draft of further changes

* - workaround for reducegrad

* - fixes to UT

* - fix to compilation

* - another fix

* - fix

31f0221f

25 Aug, 2021 1 commit
- T
  
  update elementwise api in kunlun (#35021) · ff96a7d5
  Committed by taixiurong on Aug 25, 2021
  
  ff96a7d5
24 Aug, 2021 1 commit
- G
  
  Add flags to control whether to check Nan value of hccl_allreduce_sum. (#35093) · 5b737834
  Committed by gongweibao on Aug 24, 2021
  
  5b737834
23 Aug, 2021 1 commit
- B
  
  [CPU] Enable barrier op upon gloo (#34671) · e8f146a9
  Committed by Bo Liu on Aug 23, 2021
  
  e8f146a9
19 Aug, 2021 1 commit

Abstract DeviceEvent to manage cross-platform Event implementation (#34922) · 22da1907

Committed by Aurelius84 on Aug 19, 2021

* add device_context

* add gtest for device_event_gpu

* Remove duplicate DeviceType

* push for test

* add unittest

* fix macros

* fix MSVC using usage

22da1907

18 Aug, 2021 1 commit

Add function to disable paddle signal handler (#34577) · dd533dd3

Committed by Zhanlue Yang on Aug 18, 2021

* Add function to disable paddle signal handler

Paddle uses google::InstallFaultSignalHandler to handle selected system signals,
mainly for debugging and bug-report purposes.

However, this can conflict with other Python packages that capture the same signals,
such as tvm.

To resolve this issue, we provide a function to disable the signal handler

* Remove signal test from WIN32 platform

* Remove redundant return from disable_signal_handler() function

* Add detailed messages to en_doc

dd533dd3
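
The conflict described in this commit can be illustrated with Python's standard `signal` module: only one handler can be registered per signal, so a framework that installs its own handler displaces whatever another package expects. The sketch below is not Paddle code. `framework_handler`, `install_framework_handler` and `disable_framework_handler` are hypothetical names, and SIGTERM is chosen only as an example signal. It must run in the main thread, since `signal.signal` is only allowed there.

```python
import signal


def framework_handler(signum, frame):
    """Stand-in for a framework's fault handler (would dump a stack trace)."""
    print(f"framework caught signal {signum}")


def install_framework_handler(signum=signal.SIGTERM):
    """Register the framework handler and return the handler it displaced."""
    return signal.signal(signum, framework_handler)


def disable_framework_handler(signum=signal.SIGTERM):
    """Restore the default disposition, but only if our handler is still active."""
    if signal.getsignal(signum) is framework_handler:
        signal.signal(signum, signal.SIG_DFL)
```

After `disable_framework_handler()` runs, another package is free to install its own handler for the same signal without being overridden by the framework's.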

17 Aug, 2021 2 commits

Copy boost optional to Paddle (#34780) · 9be41447

Committed by chentianyu03 on Aug 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directions

* del fluid/utils

* modify .hpp to .h

* move directions

* modify to paddle::optional

* add modification description

* format code style for the files in paddle/utils

* format code style

9be41447

[oneDNN] disabling more ops caching (#34830) · f1c1d9e0

Committed by Jacek Czaja on Aug 17, 2021

* - disabled caching of layer norm

- fix in compilation

- compilation fix

- transpose caching disabled

- compilation fix

- more compilation fixes

- sum caching disabled

- compilation fix

* - LRN with disabled cache

* lint fixes

f1c1d9e0

16 Aug, 2021 1 commit

[oneDNN] Fix to 34554 (same as previous PR but should build with GPU) (#34859) · 9cb65653

Committed by Jacek Czaja on Aug 16, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

* - hopefully compilation fix

- compilation fix

9cb65653

13 Aug, 2021 2 commits

New Einsum API (#33821) · 8c8667f0

Committed by Tongxin Bai on Aug 13, 2021

* OP dot: refactor CPU kernels and get better loop performance.

* Minor fix on code format.

* Fixed minor errors.

* Add new API: einsum

* Update the Einsum unit test.

One case failed with matmul_v2, where the dtype is int64:

a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
b = np.arange(1)
paddle.einsum("...i, ...i", a, b)

* Test cases in test_einsum test floating point dtypes only.

As of now Paddle only supports float/double dtypes in matmul, which is
one of building blocks of this Einsum implementation. We decide not to
test einsum against other dtypes.

* Polish format.

* More formatting.

* Format...

* Einsum: improve test coverage.

* Einsum: bug fixes and more testcases for testing error messages

* Einsum: fix format..

* Einsum: fixed typo and format.

* Einsum: format again...

* Einsum: applied suggested changes.

* Einsum API: improve API documentation.

* Einsum API: apply suggested changes.

* Einsum API: Add dygraph only note.

* Einsum API: Add dygraph only note.

* Einsum API: fixed unittest.

8c8667f0
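
The failing case quoted in the commit message uses the subscripts `"...i, ...i"`, which multiply the operands along their last axis and sum over it. As a rough, dependency-free illustration of that contraction, the sketch below works on nested lists. `contract_last_axis` is a hypothetical helper, not part of Paddle, and it covers only a 1-D second operand as in the quoted case, not general einsum broadcasting.

```python
def contract_last_axis(a, b):
    """Sum a[..., i] * b[i] over i, for nested-list `a` and flat list `b`."""
    if not a or not isinstance(a[0], list):
        # Innermost axis reached: this is an ordinary dot product.
        if len(a) != len(b):
            raise ValueError("last axis of a must match the length of b")
        return sum(x * y for x, y in zip(a, b))
    # Recurse over the leading ("...") axes and keep their structure.
    return [contract_last_axis(row, b) for row in a]


# Same values as np.arange(2 * 3 * 1).reshape(2, 3, 1) and np.arange(1).
a = [[[0], [1], [2]], [[3], [4], [5]]]
b = [0]
result = contract_last_axis(a, b)  # nested list of shape (2, 3)
```

Because `b` holds a single zero here, every entry of the 2 x 3 result is zero; the case is interesting for its int64 dtype and shapes, not for its values.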

B

add retry for gethostbyname (#34855) · e92f0388
Committed by Baibaifan on Aug 13, 2021

e92f0388
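
Retrying a host-name lookup is a common way to ride out transient DNS failures. The following is a minimal Python sketch of that idea using the standard `socket.gethostbyname`. It is not the code from this commit, and the function name `resolve_with_retry` and the parameters `retries`, `delay` and `resolver` are assumptions made for illustration.

```python
import socket
import time


def resolve_with_retry(hostname, retries=3, delay=0.0,
                       resolver=socket.gethostbyname):
    """Resolve `hostname`, retrying up to `retries` times on lookup errors."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return resolver(hostname)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            last_error = err
            if attempt + 1 < retries:
                time.sleep(delay)  # wait before the next attempt
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

The `resolver` parameter exists only so the retry logic can be exercised with a fake lookup function instead of real DNS.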

PaddlePaddle / Paddle: synced successfully about 1 year ago