Commits · 48bf7cbf13c41a67576237830f76e3806e8d6c12 · PaddlePaddle / Paddle

27 Aug, 2021 1 commit
- Polish DeviceEvent interface and Remove #ifdef in InterpreterCore (#35196) · 48bf7cbf
  Committed by Aurelius84 on Aug 27, 2021
```
* add CPUDeiveEvent

* Polish DeviceEvent code

* Add DEVICE_EVENT_LIBS
```
  48bf7cbf
26 Aug, 2021 1 commit

[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and... · 31f0221f

Committed by Jacek Czaja on Aug 26, 2021

[oneDNN] disable caching oneDNN primitives in  matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132)

* - grad caching disabled of matmul_v1

- compilation fix

- compilation fix

* - reduction removed

* - Matmul v2 disabled caching

* Draft of further changes

* - workaround for reducegrad

* - fixes to UT

* - fix to compilation

* - another fix

* - fix

31f0221f

25 Aug, 2021 1 commit
- update elementwise api in kunlun (#35021) · ff96a7d5
  Committed by taixiurong on Aug 25, 2021
  
  ff96a7d5
24 Aug, 2021 1 commit
- Add flags to control whether to check Nan value of hccl_allreduce_sum. (#35093) · 5b737834
  Committed by gongweibao on Aug 24, 2021
  
  5b737834
23 Aug, 2021 1 commit
- [CPU] Enable barrier op upon gloo (#34671) · e8f146a9
  Committed by Bo Liu on Aug 23, 2021
  
  e8f146a9
19 Aug, 2021 1 commit

Abstract DeviceEvent to manage cross-platform Event implementation (#34922) · 22da1907

Committed by Aurelius84 on Aug 19, 2021

* add device_context

* add gtest for device_event_gpu

* Remove duplicate DeviceType

* push for test

* add unittest

* fix macros

* fix MSVC using usage

22da1907

18 Aug, 2021 1 commit

Add function to disable paddle signal handler (#34577) · dd533dd3

Committed by Zhanlue Yang on Aug 18, 2021

* Add function to disable paddle signal handler

Paddle uses google::InstallFaultSignalHandler to handle selected system signals,
mainly for debugging and bug reporting.

However, this can conflict with other Python packages that capture the same signals,
such as tvm.

To resolve this, we add a function that disables the signal handler.

* Remove signal test from WIN32 platform

* Remove redundant return from disable_signal_handler() function

* Add detailed messages to en_doc

dd533dd3
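
The conflict described in this commit comes from the fact that a process keeps only one handler per signal, so whichever package installs its handler last silently displaces the other. The sketch below illustrates that behavior with Python's standard `signal` module only. It does not use Paddle or glog; the helper names `install_handler`, `restore_handler` and `demo`, and the choice of `SIGUSR1` (available on Unix-like systems), are assumptions made for illustration.

```python
import signal


def install_handler(signum, log, tag):
    """Install a handler that records (tag, signal) and return the handler it displaced."""
    def handler(received, frame):
        log.append((tag, received))
    return signal.signal(signum, handler)


def restore_handler(signum, previous):
    """Put a previously saved handler back; fall back to the default action if unknown."""
    signal.signal(signum, signal.SIG_DFL if previous is None else previous)


def demo():
    """Two hypothetical packages claim the same signal; only the last one is called."""
    log = []
    original = install_handler(signal.SIGUSR1, log, "framework")
    install_handler(signal.SIGUSR1, log, "other_package")  # displaces "framework"
    signal.raise_signal(signal.SIGUSR1)
    restore_handler(signal.SIGUSR1, original)  # leave the process as we found it
    return log
```

Calling `demo()` records a single entry tagged `other_package`, which is why a framework-level switch for giving up its own handler is useful when another package needs the same signals.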

17 Aug, 2021 2 commits

Copy boost optional to Paddle (#34780) · 9be41447

Committed by chentianyu03 on Aug 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directions

* del fluid/utils

* modify .hpp to .h

* move directions

* modify to paddle::optional

* add modification description

* format code style for the files in paddle/utils

* format code style

9be41447

[oneDNN ] disabling more ops caching (#34830) · f1c1d9e0

Committed by Jacek Czaja on Aug 17, 2021

* - disabled caching of layer norm

- fix in compilation

- compilation fix

- transpose caching disabled

- compilation fix

- more compilation fixes

- sum caching disabled

- compilation fix

* - LRN with disabled cache

* lint fixes

f1c1d9e0

16 Aug, 2021 1 commit

[oneDNN] Fix to 34554 (same as previous PR but should build with GPU) (#34859) · 9cb65653

Committed by Jacek Czaja on Aug 16, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

* - hopefully compilation fix

- compilation fix

9cb65653

13 Aug, 2021 2 commits

New Einsum API (#33821) · 8c8667f0

Committed by Tongxin Bai on Aug 13, 2021

* OP dot: refactor CPU kernels and get better loop performance.

* Minor fix on code format.

* Fixed minor errors.

* Add new API: einsum

* Update the Einsum unit test.

One case failed with matmul_v2, where the dtype is int64:

a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
b = np.arange(1)
paddle.einsum("...i, ...i", a, b)

* Test cases in test_einsum test floating point dtypes only.

As of now Paddle only supports float/double dtypes in matmul, which is
one of building blocks of this Einsum implementation. We decide not to
test einsum against other dtypes.

* Polish format.

* More formatting.

* Format...

* Einsum: improve test coverage.

* Einsum: bug fixes and more testcases for testing error messages

* Einsum: fix format..

* Einsum: fixed typo and format.

* Einsum: format again...

* Einsum: applied suggested changes.

* Einsum API: improve API documentation.

* Einsum API: apply suggested changes.

* Einsum API: Add dygraph only note.

* Einsum API: Add dygraph only note.

* Einsum API: fixed unittest.

8c8667f0
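
The failing test case quoted in the Einsum commit above uses an ellipsis subscript with operands of different rank. As an illustration of what that expression computes, the sketch below evaluates the same subscripts with NumPy's `np.einsum`, used here only as a stand-in. Whether Paddle's `einsum` matches NumPy in every detail is an assumption, and the helper name `ellipsis_inner_product` is hypothetical.

```python
import numpy as np


def ellipsis_inner_product(a, b):
    """Contract the last axis of a and b; leading (ellipsis) axes are broadcast."""
    return np.einsum("...i,...i", a, b)


# Same operands as the case quoted in the commit message.
a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
b = np.arange(1)

result = ellipsis_inner_product(a, b)
# Equivalent formulation: elementwise product followed by a sum over the last axis.
reference = (a * b).sum(axis=-1)
```

Here `result` has shape `(2, 3)`, because the shared index `i` is summed away and only the broadcast leading axes of `a` remain.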

add retry for gethostbyname (#34855) · e92f0388
Committed by Baibaifan on Aug 13, 2021

e92f0388
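
The commit above only names the idea of retrying a hostname lookup, and its actual change is not shown on this page. A minimal Python sketch of the general pattern follows, assuming a fixed number of attempts with a constant delay. The function name `resolve_with_retry` and its parameters are hypothetical; the `resolver` and `sleep` arguments are injectable so the logic can be exercised without network access.

```python
import socket
import time


def resolve_with_retry(hostname, retries=3, delay=0.1,
                       resolver=socket.gethostbyname, sleep=time.sleep):
    """Resolve hostname, retrying on lookup errors up to `retries` attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return resolver(hostname)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            last_error = err
            if attempt < retries - 1:
                sleep(delay)  # wait before the next attempt
    raise last_error
```

A transient DNS failure during cluster startup is then absorbed by the loop, while a persistent failure still surfaces as the original exception after the last attempt.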

12 Aug, 2021 2 commits
- Remove incorrect signal error stack trace (#34842) · 572adccd
  Committed by Chen Weihang on Aug 12, 2021
```
* remove unmatched signal error stack

* fix error writing for cond
```
  572adccd
- Revert "[oneDNN] Fix to issue #34554 (#34623)" (#34838) · dc62a227
  Committed by Chen Weihang on Aug 12, 2021
```
This reverts commit 0a5c99e8.
```
  dc62a227
11 Aug, 2021 1 commit

[oneDNN] Fix to issue #34554 (#34623) · 0a5c99e8

Committed by Jacek Czaja on Aug 11, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

0a5c99e8

10 Aug, 2021 1 commit
- add cudaEvent destructor function (#34734) · f30a5c42
  Committed by chentianyu03 on Aug 10, 2021
  
  f30a5c42
09 Aug, 2021 1 commit
- Revert "add CuddEvent destructor function (#34610)" (#34720) · bf545344
  Committed by chentianyu03 on Aug 09, 2021
```
This reverts commit 090c863a.
```
  bf545344
06 Aug, 2021 1 commit
- support kunlun black list and add kl1 op (#34605) · 21beef91
  Committed by QingshuChen on Aug 06, 2021
```
* support kunlun black list and add kl1 op

* xpu_op_list add device_context dependence
```
  21beef91
05 Aug, 2021 3 commits
- remove boost::algorithm::ends_with, boost macro and boost::lexical_cast apis (#34310) · bb7b4c0c
  Committed by chentianyu03 on Aug 05, 2021
```
* replace boost::algorithm::ends_with with self define ends_with function

* remove BOOST macro in certain operators

* remove boost::lexical_cast

* add test for string_helper

* add more test case for string_helper

* modify join_string func and test case

* fix build_strategy_test failed bug

* remove string_helper_test from parallel_UT_rule.py
```
  bb7b4c0c
- Support Ternary ops in elementwise and broadcast (#33976) · 1d7b75dd
  Committed by limingshu on Aug 05, 2021
  
  1d7b75dd
- add CuddEvent destructor function (#34610) · 090c863a
  Committed by chentianyu03 on Aug 05, 2021
  
  090c863a
04 Aug, 2021 1 commit
- Set Tensor Core MathType for bfloat16 in conv using cudnn (#34409) · c79fa1c3
  Committed by Lijunhui on Aug 04, 2021
  
  c79fa1c3
03 Aug, 2021 2 commits
- [hybrid] remove the using of global ring in hybrid parallel (#34525) · 56b7ebbc
  Committed by WangXi on Aug 03, 2021
  
  56b7ebbc
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
30 Jul, 2021 2 commits
- Added expand_v2 BF16/FP32 FWD/BWD kernels (#34284) · 41c4f723
  Committed by jakpiase on Jul 30, 2021
```
* added expand_v2 bf16/fp32 kernel

* minor change

* CI fix

* added missing test file

* added formatting

* reduced binary size

* CI fix
```
  41c4f723
- [NPU] support npu config on aclinit (#34500) · 6c09496a
  Committed by Leo Chen on Jul 30, 2021
  
  6c09496a
29 Jul, 2021 1 commit

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverage again and fix cpu test case

* follow some comments

79e758c6

20 Jul, 2021 2 commits
- Fix cast op that cannot cast arrays whose size is beyond int32 (#34209) · 038883fd
  Committed by 李季 on Jul 20, 2021
```
* fix cast
```
  038883fd
- fix cuda_stream missing mkldnn depending error (#34260) · c963a21d
  Committed by chentianyu03 on Jul 20, 2021
  
  c963a21d
19 Jul, 2021 2 commits

[NPU] add is_empty_op_npu, test=develop (#34234) · d4fb5c68
Committed by Qi Li on Jul 19, 2021

d4fb5c68

Add Cuda event and stream API (#32460) · 9c7f6af5

Committed by chentianyu03 on Jul 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_current_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device direction, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda direction from device/__init__.py

9c7f6af5

15 Jul, 2021 1 commit
- Upgrade Executor into ParallelExecutor to apply Graph Optimization in @to_static (#32283) · 2850391d
  Committed by Aurelius84 on Jul 15, 2021
```
* Refine Constructor logic of ParallelExecutor

* Replace executor into ParallelExecutor in run_program_op
```
  2850391d
13 Jul, 2021 1 commit
- change hccl_helper as commid helper (#34118) · 1edf4374
  Committed by LiuWei on Jul 13, 2021
  
  1edf4374
12 Jul, 2021 1 commit
- optimize performance of multiple-dimension reduce (#33761) · 2dde0eb0
  Committed by Zhang Zheng on Jul 12, 2021
  
  2dde0eb0
07 Jul, 2021 1 commit
- add no tensorrt warning (#33874) · 758dd7bb
  Committed by feng_shuai on Jul 07, 2021
  
  758dd7bb
29 Jun, 2021 1 commit
- xpu support amp (#33809) · 4d4fb660
  Committed by taixiurong on Jun 29, 2021
  
  4d4fb660
24 Jun, 2021 2 commits
- [oneDNN] Fix to #33282, added support of X input broadcasting to oneDNN elementwise ops (#33549) · 049dd853
  Committed by Jacek Czaja on Jun 24, 2021
```
* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* -disabled faulty UT

* - fix to approval
```
  049dd853
- Modify the search order of dynamic library (#33722) · 6801b6e2
  Committed by Zhou Wei on Jun 24, 2021
```
* Modify the search order of dynamic library

* Modify the search order of dynamic library
```
  6801b6e2
23 Jun, 2021 1 commit

Added split op bf16/fp32 oneDNN kernel (#33584) · 68106509

Committed by jakpiase on Jun 23, 2021

* base changes for split op

* 90% of split functionality added

* full fp32 functionality

* added bf16 test

* added submemory caching

* added bf test to static mode whitelist

* minor change

* enabled split op for inference

* minor fix

* minor fix

68106509

21 Jun, 2021 1 commit

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

Committed by Leo Chen on Jun 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo

c269a160

PaddlePaddle / Paddle: synced successfully about 1 year ago
