Commits · ff6a145011fd9d792133643c29a97ae5d1c1bfe9 · 机器未来 / Paddle

Dec 14, 2020 · 4 commits
- gen nccl id use socket (#29431) · 467c7169
  Committed by WangXi on Dec 14, 2020
- Fix compile problem when cuda_arch < 6000 (#29576) · c0163837
  Committed by Leo Chen on Dec 14, 2020
```
* fix compile problem when cuda_arch < 6000

* refine code

* refine code
```
- support roi_align & affine_channel for kunlun (#29561) · 79a41a9e
  Committed by QingshuChen on Dec 14, 2020
```
* support roi_align & affine_channel for kunlun

* minor
```
- [oneDNN] Making ThreadID info in caching key optional (#29272) · f6cca625
  Committed by Jacek Czaja on Dec 14, 2020
Dec 11, 2020 · 5 commits

- remove duplicated macro (#29563) · 1e72e032
  Committed by Leo Chen on Dec 11, 2020

- improve dropout (#29465) · 6702040e
  Committed by Zhang Ting on Dec 11, 2020

  * improve drop out
  * add VectorizedRandomGeneratorWithGenerator
  * fix bug
  * modify according to comments

- add cast cuda kernel (#29352) · 30d9589a
  Committed by Zhang Ting on Dec 11, 2020

- Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3
  Committed by LoveAn on Dec 11, 2020

  * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
  * fix if error with CI_SKIP_TEST, test=develop
  * fix add properties to test error on Linux/MAC, test=develop
  * fix set test properties of test_code_generator error, test=develop
  * remove test codes and advance judgment of file modification on Linux, test=develop
  * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
  * Add branch judgement on Linux, test=develop

- add xpu ops for training transformer in kunlun (#29539) · 760d015c
  Committed by taixiurong on Dec 11, 2020
```
* 1.fix matmul bug 2. add one hot

* add xpu error msg
```

Dec 10, 2020 · 4 commits
- fix p_norm with empty shape (#29500) · 60bfd308
  Committed by Zhong Hui on Dec 10, 2020
```
fix p_norm with empty shape (#29500)
```
- Layernorm opt (#29522) · 9f926eb7
  Committed by Leo Chen on Dec 10, 2020
```
* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
```
- fix error message of gather nd (#29521) · d8391a19
  Committed by ShenLiang on Dec 10, 2020
- Remove tensor copy in the update_loss_scaling op. (#29426) · 5ac71b36
  Committed by Zhen Wang on Dec 10, 2020
```
* remove tensor copy in the update_loss_scaling op

* not use thrust.

* fix some cuda memory access error.
```
Dec 9, 2020 · 4 commits
- Add tangent operator (#29207) · 87e75a77
  Committed by joejiong on Dec 9, 2020
```
As the title
```
- Softmax vectorization (#29404) · 95e33481
  Committed by zlsh80826 on Dec 9, 2020
```
* vec softmax fw

* vec softmax bw

* add a message argument for compiler compatibility
```
- support mobilenet for kunlun (#29458) · 3a055833
  Committed by procr on Dec 9, 2020
- make gelu fp16 computing more robust (#29484) · e5e52249
  Committed by Leo Chen on Dec 9, 2020
Dec 8, 2020 · 5 commits
- Revert "improve elementwise_add_grad perf (#29277)" (#29464) · 560b4323
  Committed by Zhang Ting on Dec 8, 2020
```
This reverts commit befd6d53.
```
- added internal and external reorders to profiler (#29443) · 57a4f16d
  Committed by jakpiase on Dec 8, 2020
```
* added external reorder to profiler

* added external and internal reorders to profiler

* added internal and external reorder to profiler

* added formatting to int/ext reorder commit

* removed unnecessary comment
```
- 1. fix elementwise ops'bug 2. fix softmax_with_cross_entropy_op 3. add biliner_interp_op (#29448) · ecca6585
  Committed by taixiurong on Dec 8, 2020
```
Co-authored-by: root <root@bjhw-sys-rpm0223.bjhw.baidu.com>
```
- update reduce_sum op on xpu (#29367) · a5fcc4b5
  Committed by TTerror on Dec 8, 2020
```
* update reduce_sum op on xpu

* update reduce_sum op on xpu

* support running on xpu
```
- Fix gru performace decline in 1.8.5 (#29455) · c7cada85
  Committed by Jack Zhou on Dec 8, 2020
Dec 7, 2020 · 4 commits

- revert cast eigen kernel (#29427) · 6296f4ed
  Committed by Zhang Ting on Dec 7, 2020
- fix layer_norm accuracy (#29434) · a040c055
  Committed by Leo Chen on Dec 7, 2020
- refine reshape grad and double grad kernel, use tensor copy async (#29128) · 4e19ce1d
  Committed by Leo Chen on Dec 7, 2020

- Compiling operator libraries with Unity build (#29130) · 671555ed
  Committed by LoveAn on Dec 7, 2020

  * Compiling operator libraries with Unity Build on Windows CPU.
  * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
  * Add option in windows ci script, no_test, test=windows_ci
  * Optimize parallel compiling, test=develop
  * remove limit of parallel compile and skip some ops in UB, test=develop
  * remove changes of header file, test=develop
  * remove changes of header file, test=develop
  * fix test_eye_op unittest failed, test=develop
  * Compiling operator libraries with Unity Build on Linux, test=develop
  * set default WITH_UNITY_BUILD=OFF, test=develop
  * Move unity build rules into a single file and add comment, test=develop
  * optimize parallel compilation, test=develop
  * fix undefined reference error on coverage ci, test=develop

Dec 4, 2020 · 4 commits

- Make transpose, trace, kron, reshape, sum op support complex type (#29321) · 879e913b
  Committed by chentianyu03 on Dec 4, 2020

  * add complex64 and complex128 type; add +-*/@ and slice opreator for complex types
  * add test cases for complex elementwise, matmul and getitem unittest
  * add test cases for complex types
  * add test cases for complex matmul unittest
  * kron, reshape, transpose support complex types
  * sum and trace op support complex types
  * add test case of sum and trace op
  * fix the bug of imag part of complex not initialized
  * format file
  * format code style
  * kron support type promotion; modify test cases

- fix expand/uniform_random && concat/transpose to new api on xpu (#29280) · 074065e5
  Committed by 卖鱼的哲学 on Dec 4, 2020
```
* fix expand && concat/transpose to new api

* update uniform_random_op

* update xpu_header
```
- support global pooling for kunlun (#29293) · 74bf3bed
  Committed by QingshuChen on Dec 4, 2020
```
* test=kunlun
```

- Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb
  Committed by Chen Weihang on Dec 4, 2020

  * basic impl of type promote
  * add comment & another testcase
  * fix complex bugs & support python op promote type
  * fix failed unittests & polish code
  * add unittest for coverage
  * change to only promote complex type
  * polish code details
  * polish several comments

Dec 3, 2020 · 5 commits
- fix gpu outofrange (#29238) · 83587916
  Committed by tangwei12 on Dec 3, 2020
```
* fix gpu emb out of range

Change-Id: I5794ac73bd634d5ea069a6fbbd914274b6d6b7bf

* fix doc

Change-Id: I5a3350b2930a9ab2f52116c192b087307faf8fdf
```
- improve elementwise_add_grad perf (#29277) · befd6d53
  Committed by Zhang Ting on Dec 3, 2020
```
* improve performance of elementwise_sum_grad
```
- fix tensorrt output shape error (#29308) · ebf68919
  Committed by Shang Zhizhou on Dec 3, 2020
```
* fix tensorrt output shape error

* fix unittest tensorrt_engine_op_test

* fix code style for unitest
```
- [Dy2Stat] Add cache for Executor and Context in run_program_op (#28421) · 67c700b4
  Committed by Aurelius84 on Dec 3, 2020
- polish the code of cumsum and remove some unused code (#29303) · c4be80f4
  Committed by wangchaochaohu on Dec 3, 2020
Dec 2, 2020 · 5 commits

- enforce the matmul_v2 error message (#29297) · 0fb18bc2
  Committed by ShenLiang on Dec 2, 2020
- Remove some useless log. (#29300) · 9b59a589
  Committed by Zhen Wang on Dec 2, 2020
- fix shape of tile_grad op (#29289) · 13a22a37
  Committed by Leo Chen on Dec 2, 2020

- Add pure fp16 training with master weights. (#27712) · be3777a5
  Committed by Zhen Wang on Dec 2, 2020

  * add the weight decay func for the momentum op
  * Add the multi_precision function in Momentum Optimizer.
  * Make sure that the initial value of master weights are same with the fp16 weights.
  * add static loss scaling.
  * add the rescale_grad function in the pure fp16 training.
  * use the original momentum updating method.
  * Polish some codes, such as variable names.
  * add docstring for apis.
  * update the var creation details of _create_master_weight.
  * not modify codes about imperative momentum updating.
  * Fix the error of test_dist_sparse_tensor_load_momentum UT.
  * add unit test for multi precision fp16 training.
  * add more unit tests for CI.
  * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
  * For CI Coverage Checking.

- Layer norm fp16 (#29169) · 7584bb50
  Committed by furnace on Dec 2, 2020

  * add fp16 for layer_norm op
  * revert layernorm api
  * fix forward
  * fix forward
  * fix backward for layernorm with fp16
  * fix unit test for layernorm with fp16
  * fix with_mkldnn compile error for layernorm with fp16
  * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
  * fix with_mkldnn compile error for layernorm with fp16
  * fix with_mkldnn compile error for layernorm with fp16

  Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

机器未来 / Paddle is up to date with the upstream project it was forked from.
