Commits · 4e62af80d57a4c7937a047629ccb130fbc070179 · PaddlePaddle / Paddle

Sep 08, 2021 · 2 commits

Add FP16 PRelu (#35532) · 4e62af80

Committed by cc on Sep 08, 2021

merge CMakeList.txt manual (#35378) · c4a3e8b4

Committed by feng_shuai on Sep 08, 2021

* merge CMakeList.txt manual

* add platform for changethreadnum

* repair some bugs according to make error

* do nothing just flush CI

* add the forgotten thread-num change

* add inplace_atol param for check_output_with_place

* Windows

* std::min and std::max had to be changed because of Windows

Sep 01, 2021 · 1 commit

Stabilize depthwise conv (#35161) · 3c21f26b

Committed by wangguanzhong on Sep 01, 2021

* stabilize depthwise conv

* clean comments

Aug 27, 2021 · 1 commit

Add unpool2d op & Expose max_unpool2d API (#35056) · ceee71a0

Committed by xiaoting on Aug 27, 2021

* add max_unpool2d op, test=develop

* fix typo, test=develop

* fix unpool unittest, test=develop

* fix unpool code-example, test=develop

* fix for unpool_op_unittest,test=develop

* fix example code, test=develop

* add noqa:F401, test=develop

* fix coverage, test=develop

* fix unittest for unpool, test=develop

* rename unpool2d to unpool, test=develop

* rename unpool2d to unpool, test=develop

Aug 17, 2021 · 1 commit

Align CTC grad scale same with ESPNet (#34729) · 10f9644c

Committed by Hui Zhang on Aug 16, 2021

* dygraph support more ctc grad scale

* scale for 1.x

* fix unittest

* fix unittest

* format code

* fix unittest

* fix log info

* unittest cov

* fix format;notest,test=cpu,coverage

* skip ctc_loss egs;test=cpu

* warpctc grad cov;test=coverage

* add dygraph test;test=coverage

* format;test=cpu,coverage

* format;test=cpu

* add api compat;test=cpu

* add cpu test

* rename

* rename

* fix

* fix test

* format

* eigen cpu

* eigen gpu grad pass

* cuda gpu pass

* format

* fix ci

Aug 12, 2021 · 1 commit

Fix safety-bug of functional.linear (#34696) · 0e28c8bb

Committed by zhulei on Aug 12, 2021

* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear

* Fix safety-bug of functional.linear

Aug 11, 2021 · 1 commit

miss format (#34771) · addd5fce

Committed by wenbin on Aug 11, 2021

Aug 09, 2021 · 1 commit

fix split on empty tensor (#34356) · 898acb1a

Committed by Leo Chen on Aug 09, 2021

Jul 22, 2021 · 2 commits

fix concat bug (#34319) · c342651e

Committed by wuhuachaocoding on Jul 22, 2021

Add int16 kernel for lookup_table and dequantize_abs_max op (#34275) · 85e531a9

Committed by cc on Jul 22, 2021

* add int16 kernel for lookup_table and dequantize_abs_max op

Jul 09, 2021 · 1 commit

Use CBLAS for SelectedRows elementwise add operation. (#34008) · 1412d3bc

Committed by arlesniak on Jul 09, 2021

* Use CBLAS for SelectedRows elementwise add operation. It's faster.

* template compilation fix

* reverted template compilation fix

* slimmed template compilation fix
Co-authored-by: Adam Osewski <adam.osewski@intel.com>

Jul 07, 2021 · 1 commit

[HIP] Fix hipMemcpy being unable to overlap; improves AMD GPU performance by more than 10% (#33982) · 20da7703

Committed by xiayanming on Jul 07, 2021

Jul 06, 2021 · 1 commit

Optimize the forward of log_softmax for the case when axis is not the last dimension. (#32396) · 69ffb386

Committed by Lijunhui on Jul 06, 2021

Jul 05, 2021 · 1 commit

Add fused elemwise gelu and optimize performance (#33480) · eae31856

Committed by WangXi on Jul 05, 2021

Jun 22, 2021 · 1 commit

fix gpt2 train loss NaN problem (#33658) · 687571f2

Committed by zhiboniu on Jun 22, 2021

Jun 21, 2021 · 1 commit

Add AXPY oneDNN handler (#33632) · 773aabc7

Committed by lidanqing on Jun 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is built with neither CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update
Co-authored-by: Adam Osewski <adam.osewski@intel.com>

Jun 05, 2021 · 1 commit

[Paddle-TRT] Add gather_nd and reduce_sum trt op. (#33324) · d194bd3a

Committed by Wilber on Jun 05, 2021

Jun 02, 2021 · 1 commit

fix iScan C++ problems, test=develop (#33274) · 1b10ccdb

Committed by wuhuanzhou on Jun 02, 2021

Jun 01, 2021 · 1 commit

replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0

Committed by chentianyu03 on Jun 01, 2021

* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug

May 26, 2021 · 2 commits

modify matmul Op to complex template types (#33130) · 6c07cd7e

Committed by chentianyu03 on May 26, 2021

* modify matmul Op to complex template types

* remove complex64/128 header file

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

May 25, 2021 · 1 commit

modify Ops to complex template (#33041) · 5fa44c34

Committed by chentianyu03 on May 25, 2021

* modify conj, real, imag OP to complex template

* replace with complex template to dot Op

* replace with complex template to Abs Op

* add support for complex64 and complex128

May 20, 2021 · 1 commit

Add complex template type (#32857) · 738bf20e

Committed by chentianyu03 on May 20, 2021

* add complex template file

* add numtraits for complex template

* add complex template type register

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* make TensorCheckerVisitor support complex type

* fix operator= error

* add complex template

* add complex template type

* add complex template type to pyarray transform

* add complex template type to pyarray transform

* remove complex type for dlpack register

* make dlpack support complex type

* make dlpack support complex type

* make dlpack support complex type

* remove explicit for complex constructor

* add complex unit test file

May 12, 2021 · 1 commit

[NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796

Committed by liym27 on May 12, 2021

May 06, 2021 · 2 commits

[ROCM] bugfix for unittest (#32392) · 31392627

Committed by ronnywang on May 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

Sum kernel for CPU supporting BF16 and SelectedRows (#32631) · 9599c3b3

Committed by Adam Osewski on May 06, 2021

Apr 27, 2021 · 1 commit

[OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) · 1afe1ac9

Committed by Zhong Hui on Apr 27, 2021

* [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.

Apr 19, 2021 · 1 commit

Add BF16 Constant Initializer and support for other initializer (#31935) · 76cb83e8

Committed by joanna.wozna.intel on Apr 19, 2021

Apr 14, 2021 · 1 commit

fix matrix_inverse_op with rocm (#32128) · 995b5f2c

Committed by zhulei on Apr 14, 2021

* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm

* fix matrix_inverse_op with rocm

Apr 13, 2021 · 1 commit

[ROCM] fix depth conv2d in rocm, test=develop (#32170) · 693c7629

Committed by Qi Li on Apr 13, 2021

Apr 09, 2021 · 2 commits

make high precision for avg_pool and adaptive_avg_pool when data_type is float16 (#31887) · ec2ffb68

Committed by niuliling123 on Apr 09, 2021

* make high precision for avg_pool

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

Apr 07, 2021 · 1 commit

improve performance of DepthwiseConv(NHWC) (#31677) · 363b25aa

Committed by Ouyang Chao on Apr 07, 2021

* improve performance of DepthwiseConv(NHWC)

Apr 02, 2021 · 1 commit

[ROCM] fix softmax_with_cross_entropy_op (#31982) · 9e06a641

Committed by ronnywang on Apr 02, 2021

Apr 01, 2021 · 2 commits

[ROCM] fix depthwise conv failure on ROCM, test=develop (#31998) · a4b30a12

Committed by Qi Li on Apr 01, 2021

Support uint8_t for fill_constant_op (#31911) · 980227f9

Committed by Zhang Zheng on Apr 01, 2021

Mar 31, 2021 · 1 commit

fix split core (#31892) · 393b3bd6

Committed by Thunderbrook on Mar 31, 2021

* fix split core

* format

Mar 19, 2021 · 1 commit

[oneDNN] lookup_table op with support for BF16 data type. (#31558) · a4a2b77d

Committed by Adam Osewski on Mar 19, 2021

Mar 08, 2021 · 1 commit

[ROCM] fix test_dist_op ci test, test=develop (#31468) · 133a914b

Committed by Qi Li on Mar 08, 2021

Mar 05, 2021 · 1 commit

Creating a CUDA function to find the minimum value in warp or block (#31191) · 8491ae9a

Committed by JamesLim on Mar 05, 2021
