Commits · c5bf09bb8ae4a7ca8a73a4ab7965cc8dadb4f0b0 · PaddlePaddle / Paddle

30 Dec, 2021 3 commits

add dirichlet random sample op in cpu and gpu kernel (#38244) · c5bf09bb

Committed by Xiaoxu Chen on Dec 30, 2021

* add dirichlet sample op and cpu backend kernel

* add Dirichlet op cuda kernel  (#6)

* add dirichlet op hip kernel
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>


Fix the bug of batch_norm and batch_norm_grad op. (#38288) · cc83c95f

Committed by Leo Guo on Dec 30, 2021

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list.

* Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun
Co-authored-by: Zibin <guozibin@baidu.com>


Add CUDA_ARCH_BIN (#38569) · 9e0a03ee
Committed by tianshuo78520a on Dec 30, 2021


29 Dec, 2021 23 commits
- add _nvprof_range interface (#38572) · ea01e790
  Committed by Leo Chen on Dec 29, 2021
- unify infermeta target (#38580) · 458365cf
  Committed by Chen Weihang on Dec 29, 2021
- Added copy_if_different for eager code generator (#38562) · ad78a21e
  Committed by Zhanlue Yang on Dec 29, 2021
- [BugFix] Fix bug in obtaining parameters_buffers in layers (#38563) · ecb8c184
  Committed by ShenLiang on Dec 29, 2021
  ```
  * fix bug of dp in pfp16

  * fix topo
  ```
- fix random OP failed (#38564) · 2fb1fc0d
  Committed by zhouweiwei2014 on Dec 29, 2021
- add hashtable dynamic mf support (#38493) · d9c174d1
  Committed by yaoxuefeng on Dec 29, 2021
  ```
  add hashtable dynamic mf support
  ```
- add dynamic mf size api (#38436) · 7411dab5
  Committed by yaoxuefeng on Dec 29, 2021
  ```
  add dynamic mf size api
  ```
- [AMP] Add BatchNorm_1D_2D_3D skip for paddle.amp.decorate (#38541) · 2ebc8f77
  Committed by zhangbo9674 on Dec 29, 2021
  ```
  * add bn_1d_2d_3d for fp16 decorate

  * add unittest
  ```
- [Auto Parallel] Sharding Pass (#38502) · e3faf345
  Committed by JZ-LIANG on Dec 29, 2021
  ```
  * auto parallel sharding base

  * chmod

  * add unittest

  * set unittest cmake dist label

  * revise code according to review

  * chmod
  ```
- [Approver Update] update check approver for New Hardware Integration, test=document_fix (#38429) · 9456170f
  Committed by Qi Li on Dec 29, 2021
- Make profiler better (#38280) · 851637fd
  Committed by liutiexing on Dec 29, 2021
  ```
  * add align for WorkQueue

  * add spinlock

  * merge develop

  * merge

  * Add EventsWaiter

  * Revert "Add EventsWaiter"

  This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

  * update OS info

  * split host_event_recorder

  * split host_event_recorder

  * update

  * update

  * update

  * update

  * update

  * update

  * update
  Co-authored-by: liutiexing <liutiexing@google.com>
  ```
- Fix buddy allocator random CI failure (#38545) · 14658d8f
  Committed by Huihuang Zheng on Dec 29, 2021
  ```
  Fix Buddy Allocator random CI failure due to machine environment.
  ```
- [infrt] fix infrt ci test. test=develop, test=infrt (#38533) · 9d5b665c
  Committed by 王明冬 on Dec 29, 2021
- fix extra_repr in _InstanceNormBase, test=develop (#38537) · 21366a92
  Committed by 小湉湉 on Dec 29, 2021
- add top k v2 operator, test=kunlun (#38434) · d22f92ad
  Committed by ykkk2333 on Dec 29, 2021
- fix reduce_max/reduce_min bug (#38476) · 995332ef
  Committed by Shang Zhizhou on Dec 29, 2021
- add timeout for matmul_xx_fuse_pass (#38544) · 20403fe9
  Committed by heliqi on Dec 29, 2021
  ```
  * del mkldnn options of baseline

  * add timeout for matmul_scale_fuse_pass

  * add timeout for matmul
  ```
- add argsort/scatter for kunlun (#38345) · 4643baa7
  Committed by TTerror on Dec 29, 2021
  ```
  * add argsort/scatter for kunlun

  * update test_scatter

  * update xpu.cmake

  * update xpu.cmake

  * fix scatter
  ```
- fix lamb beta1pow beta2pow update (#38518) · 3672480b
  Committed by sneaxiy on Dec 29, 2021
- reduce compile time of amax and amin (#38534) · 72a41e50
  Committed by Tao Luo on Dec 29, 2021
- add nccl func of NCCL 2.11 (#38519) · 4853ab0a
  Committed by sneaxiy on Dec 29, 2021
- code clean (#38550) · 206a8f6c
  Committed by limingshu on Dec 29, 2021
- [fleet_executor] remove SetCreatingFlag (#38539) · 9171aaa0
  Committed by WangXi on Dec 29, 2021

28 Dec, 2021 14 commits

Support multi-output feature for elementwise (#38410) · 48f061fb
Committed by limingshu on Dec 28, 2021
```
* first commit

* pass ctest of  elementwise_div_grad
```

add new API: paddle.cov (#38392) · 85f5d264
Committed by zhiboniu on Dec 28, 2021


update seq_concat_fc_fuse_pass ut (#38538) · 706d2c08
Committed by baoachun on Dec 28, 2021


Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4

Committed by From00 on Dec 28, 2021

* fix reshape move storage error

* remove needless set type

* alloc tensor by shared storage

* Utilize StreamSafeCUDAAllocator to support fast GC in new executor

* Fix compile error for Windows and ROCm

* Fix compile error for Windows

* Modify UT stream_safe_cuda_alloc_test

* Modify UT stream_safe_cuda_alloc_test

* Rewrite fast GC

* Rewrite fast GC

* Fix compile error for BOOST_GET_CONST

* Fix compile error for BOOST_GET_CONST

* Changes default stream for StreamSafeCUDAAllocator

* Fix a small CI error

* Remove some redundant code

* Fix conflict

* Fix compile error for ROCm

* Fix Windows CI error

* Fix CI error

* Remove some unnecessary code

* Fix CI error

* Add UT for fast GC

* Fix CI error

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

* Use RWLock in GetAllocator

* Fix CI error
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>


add matmul_to_mul matmul_v2_to_mul matmul_v2_to_matmul test case (#37645) · bed71992

Committed by heliqi on Dec 28, 2021

* add matmul_to_mul matmul_v2_to_mul matmul_v2_to_matmul test case

* modify skip func to ignore_pass_case func

* rebuild CI

* rebuild CI

* add test_map_xx_pass timeout

* add test_map_xx_pass timeout

* merge from develop

* add timeout notest;test=coverage

* Cmakelist add timeout

* add timeout

* add attr of matmul_v2

* add trt skip

* delete trt config

* add skip,  mul diff on 3080

refine amax/amin example (#38525) · 00a50af8
Committed by Tao Luo on Dec 28, 2021


Support test basic of Var and Layer (#38426) · 1fb80a6a

Committed by Jiabin Yang on Dec 28, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed

* support create varbase and fix retain grad error

* fix windows error

* support test code coverage

* support test code coverage

* support test code coverage
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>

fix ci problem (#38474) · 2e4cb279
Committed by Wilber on Dec 28, 2021

refactor matmul directory in pten (#38227) · 982bf444
Committed by zyfncg on Dec 28, 2021
```
* refactor matmul directory in pten

* fix merge conflict
```

Add API and op for take_along_axis (#38396) · 3310f519

Committed by huangxu96 on Dec 28, 2021

* add API and op for take_along_axis

* fix compile dependency problem and add example code and doc

* add unittest

* delete some code for CI coverage

* fix code style problem

* fix as review

fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6
Committed by Guoxia Wang on Dec 28, 2021

Add Amax and Amin API (#38417) · 340dfb26
Committed by Tao Luo on Dec 28, 2021
```
* add amax/amin

* support axis is list
```

[pten] remove in_type arg in cast kernel (#38486) · 0637b9a6

Committed by chentianyu03 on Dec 28, 2021

* remove intype arg in cast kernel

* modify conj config in api.yaml by dictionary order

* rm unused code in cast_kernel.cu


add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7

Committed by houj04 on Dec 28, 2021

* add reduce_prod_xpu. fix reduce_mean_xpu bug.

* add reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun


PaddlePaddle / Paddle: last synced successfully more than a year ago
