Commits · ff2fba3987ef9eac3bf240b92564e87832a331b4 · BaiXuePrincess / Paddle

09 Apr, 2022 8 commits
- modify the block size of the group_norm backward (#41570) · ff2fba39
  Committed by crystal on Apr 09, 2022
- add depthwise conv hip support (#41537) · b3b8d345
  Committed by hong on Apr 09, 2022
- [infrt] opt support input valid places by commondline. (#41544) · 96ced1a1
  Committed by 王明冬 on Apr 09, 2022
- Autotune the workspace_size_limit in conv. (#40338) · b937cdc5
  Committed by limingshu on Apr 09, 2022
```
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
```
- fix_ci_problem3 (#41484) · 9cb2287c
  Committed by Jiabin Yang on Apr 09, 2022
```
* fix_ci_problem3

* support windows no default error
```
- fix pylayer mem leak, test=develop (#41559) · be11648a
  Committed by wanghuancoder on Apr 09, 2022
- [new-exec] fix bug that no thread is waked up when adding task to threadpool (#41567) · f581f5bf
  Committed by Leo Chen on Apr 09, 2022
```
* fix bug that no thread is waked up when adding task to threadpool

* fix typo
```
- [fleet executor] Add sink interceptor and test (#41497) · b3e79731
  Committed by LiYuRio on Apr 09, 2022
08 Apr, 2022 10 commits
- Fix fake quant cuda kernel (#41305) · 330582e2
  Committed by whs on Apr 08, 2022
- fix group_norm (#41531) · 04a4bdf8
  Committed by crystal on Apr 08, 2022
```
fix group_norm vectorized address misalignment
```
- fix running error for ipu (#41481) · c2e12949
  Committed by Allen Guo on Apr 08, 2022
- Fix RNN OP multi-threads predict bug (#41529) · 09203e46
  Committed by Jack Zhou on Apr 08, 2022
- modify unittest of lstm forward, *test=kunlun (#41534) · d4710dfe
  Committed by z8hanghuan on Apr 08, 2022
```
* modify unittest of lstm forward, *test=kunlun

* modify unittest of lstm forward, *test=kunlun
```
- [Eager]Fix segment_pool/allclose/isclose/scale API bug (#41506) · 0a6fe699
  Committed by Aurelius84 on Apr 08, 2022
```
* [Eager]Fix segment_pool/allclose/isclose/scale API bug

* fix kernel register problem
```
- [ROCm] fix dcu error in device event base, test=develop (#41521) · 14dba636
  Committed by Qi Li on Apr 08, 2022
```
* [ROCm] fix dcu error in device event base, test=develop

* fix, test=develop
```
- xpu mul unittest *test=kunlun (#41140) · 770ce7cf
  Committed by taixiurong on Apr 08, 2022
- pybind support CustomPlace (#41136) · 0cd577cf
  Committed by ronnywang on Apr 08, 2022
- Add conj pixel shuffle yaml (#41499) · bc88fbb5
  Committed by hong on Apr 08, 2022
```
* ad conj flip yaml

* add flip conj pixel shuffle
```
07 Apr, 2022 22 commits
- [GPUPS] bind afs wrpper (#41227) · b3bcebbe
  Committed by Thunderbrook on Apr 07, 2022
```
* afs wrapper

* format

* format

* macro
```
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
- [Phi]Add hard_swish/kron/linspace/logit yaml file (#41298) · 90cb337e
  Committed by YuanRisheng on Apr 07, 2022
```
* add yaml

* perfect converage
```
- Add yaml for matrix rank op (#41466) · c77a263d
  Committed by Ruibiao Chen on Apr 07, 2022
```
* modify matrix_rank

* add matrix_rank shape

* add matrix_rank shape

* Add yaml for matrix_rank OP

* Add UT
Co-authored-by: zhoujianqian <15205085056@163.com>
```
- [Phi] Add unbind yaml and final state api (#41277) · 5516f180
  Committed by Chen Weihang on Apr 07, 2022
```
* add unbind yaml

* fix unittest
```
- use group id to differentiate keys for tcp store (#41496) · 75227c9e
  Committed by lilong12 on Apr 07, 2022
- Profile Executors (#41100) · dfb47986
  Committed by liutiexing on Apr 07, 2022
```
* Profile Executors

* update

* fix ut

* fix names

* update

* update
```
- fix compile bug of windows cuda11.5 (#41433) · eea85814
  Committed by zhouweiwei2014 on Apr 07, 2022
- add send/recv to/from switch module for PrcoessGroupHeter (#41285) · 633ac4e6
  Committed by lilong12 on Apr 07, 2022
- fix get tensor backend set bug (#41478) · ad4193fe
  Committed by Chen Weihang on Apr 07, 2022
- [infrt]Add gpu compile method (#41463) · aadeff53
  Committed by huzhiqiang on Apr 07, 2022
- Add Output(Step) to DistributedFusedLamb optimizer (#41249) · e4459a40
  Committed by sneaxiy on Apr 07, 2022
```
* add Output(Step) to distributed fused lamb op

* add _set_step
```
- Add Sparse API to_dense, to_sparse_coo and values (#41394) · f78cc3da
  Committed by zhangkaihuo on Apr 07, 2022
- [BugFix] Add error hint for one_hot gpu version (#41335) · 91266b96
  Committed by Siming Dai on Apr 07, 2022
```
* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest
```
- Fix dygraph record event position (#41445) · 8fba68d3
  Committed by chenjian on Apr 07, 2022
```
* no

* maintain old profiler

* fix old dygraph record event
```
- fix p_norm gpu nan bug while divide zero (#41359) · dfa63126
  Committed by zhiboniu on Apr 07, 2022
- ignore some failed test for KL2 (#41342) · 81389c51
  Committed by QingshuChen on Apr 07, 2022
```
* ignore some failed test for KL2
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun
```
- infrt-trt run resnet50 (#41442) · 0701160a
  Committed by Wilber on Apr 07, 2022
```
* add rewrite pattern form paddle op tp trt op

* infrt-trt run resnet50.
Co-authored-by: weishengying <1343838695@qq.com>
```
- modify inference model test build method to support multi version (#41027) · c9e0e10e
  Committed by Sing_chan on Apr 07, 2022
```
* change inference demo_test build method to ninja to choose visual studio version automaticly

* notest;test=windows_ci_inference

* set cuda of demo_ci by arg,fix bug of ninja compile,test=document_fix;test=windows_ci;test=windows_ci_inference

* fix bug;test=document_fix;test=windows_ci;test=windows_ci_inference

* fix bug;test=document_fix;test=windows_ci_inference"

* set lib_path according to generator
```
- remove cudnn_deterministic=True (#41341) · cefa91fd
  Committed by Zhang Jun on Apr 07, 2022
- [Phi] Polish truncated normal kernel and add yaml (#41280) · d39e7896
  Committed by Chen Weihang on Apr 07, 2022
```
* polish truncated normal kernel

* add yaml

* add truncated normal kernel and add yaml

* polish unittests and yaml

* import dygraph mehtod
```
- momentum support l2decay for xpu. test=kunlun (#41325) · 533c649f
  Committed by houj04 on Apr 07, 2022
```
* momentum support l2decay for xpu. test=kunlun

* fix include file. test=kunlun

* fix cmake for device_worker. test=kunlun
```

BaiXuePrincess / Paddle is in sync with the fork's source project