Commits · 9be41447baaa8c7d398189d8f98a574beec9c750 · BaiXuePrincess / Paddle

17 Aug, 2021 · 2 commits

Copy boost optional to Paddle (#34780) · 9be41447

Committed by chentianyu03 on Aug 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directories

* del fluid/utils

* modify .hpp to .h

* move directories

* modify to paddle::optional

* add modification description

* format code style for the files in paddle/utils

* format code style

9be41447

[oneDNN] disabling more ops caching (#34830) · f1c1d9e0

Committed by Jacek Czaja on Aug 17, 2021

* - disabled caching of layer norm

- fix in compilation

- compilation fix

- transpose caching disabled

- compilation fix

- more compilation fixes

- sum caching disabled

- compilation fix

* - LRN with disabled cache

* lint fixes

f1c1d9e0

16 Aug, 2021 · 1 commit

[oneDNN] Fix to 34554 (same as previous PR but should build with GPU) (#34859) · 9cb65653

Committed by Jacek Czaja on Aug 16, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

* - hopefully compilation fix

- compilation fix

9cb65653

13 Aug, 2021 · 2 commits

New Einsum API (#33821) · 8c8667f0

Committed by Tongxin Bai on Aug 13, 2021

* OP dot: refactor CPU kernels and get better loop performance.

* Minor fix on code format.

* Fixed minor errors.

* Add new API: einsum

* Update the Einsum unit test.

One case failed with matmul_v2, where the dtype is int64:

    a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
    b = np.arange(1)
    paddle.einsum("...i, ...i", a, b)

* Test cases in test_einsum test floating point dtypes only.

As of now Paddle only supports float/double dtypes in matmul, which is
one of the building blocks of this Einsum implementation. We decided not to
test einsum against other dtypes.

* Polish format.

* More formatting.

* Format...

* Einsum: improve test coverage.

* Einsum: bug fixes and more testcases for testing error messages

* Einsum: fix format..

* Einsum: fixed typo and format.

* Einsum: format again...

* Einsum: applied suggested changes.

* Einsum API: improve API documentation.

* Einsum API: apply suggested changes.

* Einsum API: Add dygraph only note.

* Einsum API: Add dygraph only note.

* Einsum API: fixed unittest.

8c8667f0
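The `"...i, ...i"` expression quoted in the Einsum commit message can be illustrated with NumPy's `einsum`, which accepts the same subscript notation. This is only a sketch of what the expression computes, not Paddle's implementation: it uses float inputs (the commit notes that only float/double dtypes are tested) and sets `b` to ones so the result is non-trivial, both of which are assumptions made for the example.

```python
import numpy as np

# Shapes mirror the case in the commit message; float64 is used because the
# commit states that einsum is tested against floating point dtypes only.
a = np.arange(2 * 3 * 1, dtype=np.float64).reshape(2, 3, 1)
b = np.ones(1, dtype=np.float64)  # assumption: ones instead of np.arange(1)

# "...i,...i": broadcast the leading dimensions, multiply elementwise,
# and sum over the shared trailing index i.
result = np.einsum("...i,...i", a, b)

# The same contraction written without einsum, for comparison.
expected = (a * b).sum(axis=-1)

print(result.shape)
print(np.allclose(result, expected))
```

The ellipsis stands for any number of leading dimensions, so the trailing axis is contracted while the leading `(2, 3)` shape is kept.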

add retry for gethostbyname (#34855) · e92f0388

Committed by Baibaifan on Aug 13, 2021

e92f0388
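The commit above carries no message body, so the actual change is not shown here. As a rough illustration of the general idea of retrying a host lookup, the following Python sketch wraps `socket.gethostbyname` in a bounded retry loop. The function name `resolve_with_retry` and the parameters `retries`, `delay`, and `resolver` are hypothetical and are not taken from the Paddle change.

```python
import socket
import time


def resolve_with_retry(hostname, retries=3, delay=1.0,
                       resolver=socket.gethostbyname):
    """Resolve `hostname`, retrying when the lookup fails.

    `retries`, `delay`, and `resolver` are hypothetical parameters chosen
    for this sketch; they do not come from the Paddle commit.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return resolver(hostname)
        except OSError as err:  # socket.gaierror and socket.herror are OSError subclasses
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay)  # pause before the next attempt
    # Every attempt failed: surface the last lookup error to the caller.
    raise last_error
```

Passing the resolver in as a parameter keeps the retry logic testable without touching the network; a caller would normally rely on the default.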

12 Aug, 2021 · 2 commits

Remove incorrect signal error stack trace (#34842) · 572adccd

Committed by Chen Weihang on Aug 12, 2021

* remove unmatched signal error stack

* fix error writing for cond

572adccd

Revert "[oneDNN] Fix to issue #34554 (#34623)" (#34838) · dc62a227

Committed by Chen Weihang on Aug 12, 2021

This reverts commit 0a5c99e8.

dc62a227

11 Aug, 2021 · 1 commit

[oneDNN] Fix to issue #34554 (#34623) · 0a5c99e8

Committed by Jacek Czaja on Aug 11, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

0a5c99e8

10 Aug, 2021 · 1 commit
  
add cudaEvent destructor function (#34734) · f30a5c42

Committed by chentianyu03 on Aug 10, 2021

f30a5c42

09 Aug, 2021 · 1 commit

Revert "add CuddEvent destructor function (#34610)" (#34720) · bf545344

Committed by chentianyu03 on Aug 09, 2021

This reverts commit 090c863a.

bf545344

06 Aug, 2021 · 1 commit

support kunlun black list and add kl1 op (#34605) · 21beef91

Committed by QingshuChen on Aug 06, 2021

* support kunlun black list and add kl1 op

* xpu_op_list add device_context dependence

21beef91

05 Aug, 2021 · 3 commits

remove boost::algorithm::ends_with, boost macro and boost::lexical_cast apis (#34310) · bb7b4c0c

Committed by chentianyu03 on Aug 05, 2021

* replace boost::algorithm::ends_with with a self-defined ends_with function

* remove BOOST macro in certain operators

* remove boost::lexical_cast

* add test for string_helper

* add more test cases for string_helper

* modify join_string func and test case

* fix build_strategy_test failed bug

* remove string_helper_test from parallel_UT_rule.py

bb7b4c0c
  
Support Ternary ops in elementwise and broadcast (#33976) · 1d7b75dd

Committed by limingshu on Aug 05, 2021

1d7b75dd
  
add CuddEvent destructor function (#34610) · 090c863a

Committed by chentianyu03 on Aug 05, 2021

090c863a

04 Aug, 2021 · 1 commit
  
Set Tensor Core MathType for bfloat16 in conv using cudnn (#34409) · c79fa1c3

Committed by Lijunhui on Aug 04, 2021

c79fa1c3

03 Aug, 2021 · 2 commits
  
[hybrid] remove the use of global ring in hybrid parallel (#34525) · 56b7ebbc

Committed by WangXi on Aug 03, 2021

56b7ebbc

support Kunlun2 (#34459) · 2d0f3d9b

Committed by QingshuChen on Aug 03, 2021

* support Kunlun2

* support KL2

* support KL2

2d0f3d9b

30 Jul, 2021 · 2 commits

Added expand_v2 BF16/FP32 FWD/BWD kernels (#34284) · 41c4f723

Committed by jakpiase on Jul 30, 2021

* added expand_v2 bf16/fp32 kernel

* minor change

* CI fix

* added missing test file

* added formatting

* reduced binary size

* CI fix

41c4f723
  
[NPU] support npu config on aclinit (#34500) · 6c09496a

Committed by Leo Chen on Jul 30, 2021

6c09496a

29 Jul, 2021 · 1 commit

add fix op run order pass (#34427) · 79e758c6

Committed by Zeng Jinle on Jul 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverage again and fix cpu test case

* follow some comments

79e758c6

20 Jul, 2021 · 2 commits

Fix cast op failing on arrays whose size exceeds the int32 range (#34209) · 038883fd

Committed by 李季 on Jul 20, 2021

* fix cast

038883fd
  
fix cuda_stream missing mkldnn depending error (#34260) · c963a21d

Committed by chentianyu03 on Jul 20, 2021

c963a21d

19 Jul, 2021 · 2 commits

[NPU] add is_empty_op_npu, test=develop (#34234) · d4fb5c68

Committed by Qi Li on Jul 19, 2021

d4fb5c68

Add Cuda event and stream API (#32460) · 9c7f6af5

Committed by chentianyu03 on Jul 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_current_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device directory, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda direction from device/__init__.py

9c7f6af5

15 Jul, 2021 · 1 commit

Upgrade Executor into ParallelExecutor to apply Graph Optimization in @to_static (#32283) · 2850391d

Committed by Aurelius84 on Jul 15, 2021

* Refine Constructor logic of ParallelExecutor

* Replace executor with ParallelExecutor in run_program_op

2850391d

13 Jul, 2021 · 1 commit
  
change hccl_helper as commid helper (#34118) · 1edf4374

Committed by LiuWei on Jul 13, 2021

1edf4374

12 Jul, 2021 · 1 commit
  
optimize performance of multiple-dimension reduce (#33761) · 2dde0eb0

Committed by Zhang Zheng on Jul 12, 2021

2dde0eb0

07 Jul, 2021 · 1 commit
  
add no tensorrt warning (#33874) · 758dd7bb

Committed by feng_shuai on Jul 07, 2021

758dd7bb

29 Jun, 2021 · 1 commit
  
xpu support amp (#33809) · 4d4fb660

Committed by taixiurong on Jun 29, 2021

4d4fb660

24 Jun, 2021 · 2 commits

[oneDNN] Fix to #33282, added support of X input broadcasting to oneDNN elementwise ops (#33549) · 049dd853

Committed by Jacek Czaja on Jun 24, 2021

* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* - disabled faulty UT

* - fix to approval

049dd853

Modify the search order of dynamic library (#33722) · 6801b6e2

Committed by Zhou Wei on Jun 24, 2021

* Modify the search order of dynamic library

* Modify the search order of dynamic library

6801b6e2

23 Jun, 2021 · 1 commit

Added split op bf16/fp32 oneDNN kernel (#33584) · 68106509

Committed by jakpiase on Jun 23, 2021

* base changes for split op

* 90% of split functionality added

* full fp32 functionality

* added bf16 test

* added submemory caching

* added bf test to static mode whitelist

* minor change

* enabled split op for inference

* minor fix

* minor fix

68106509

21 Jun, 2021 · 1 commit

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

Committed by Leo Chen on Jun 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo

c269a160

16 Jun, 2021 · 1 commit

[oneDNN] Further ops refactoring of oneDNN cache access (#33515) · f9ce1b1a

Committed by Jacek Czaja on Jun 16, 2021

* - Draft of implementation of refactoring

- compilation fix

* - Fixes after review

* - Removed unnecessary comment

f9ce1b1a

11 Jun, 2021 · 1 commit
  
add expm1_op (#33066) · 5cca9e4c

Committed by ronnywang on Jun 11, 2021

5cca9e4c

10 Jun, 2021 · 2 commits
  
fix unittest failure caused by an overly long path (#33447) · ab41a9ee

Committed by Zhou Wei on Jun 10, 2021

ab41a9ee
  
dp c_allreduce_sum_fusion op (#33169) · 003b4616

Committed by Baibaifan on Jun 10, 2021

003b4616

09 Jun, 2021 · 2 commits

[oneDNN] First fix to #33021 (#33174) · 1382cd22

Committed by Jacek Czaja on Jun 09, 2021

* - First fix to #33021

1382cd22

fix the bug of yolo_box which can't run on nano and tx2 (#33422) · 626c1edc

Committed by s.feng on Jun 09, 2021

626c1edc

02 Jun, 2021 · 1 commit
  
[ROCM] fix fused_fc_elementwise_layernorm, test=develop (#33281) · 3f366fee

Committed by Qi Li on Jun 02, 2021

3f366fee

BaiXuePrincess / Paddle is in sync with the fork's source project
