Commits · 7ee31a96b436de4b0701de2ba56bd0b2a653994c · BaiXuePrincess / Paddle

17 Apr, 2022 1 commit

[Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96

Committed by Chen Weihang on Apr 17, 2022

* split phi and fluid infermeta context

* resolve conflict

* fix type error

* optimize scheduling perf

* spec small vector size

* replace all grad var name

* fix test failed

* move init default signature

* polish details

* polish details

* fix no init bug

* init sig for tests

* add init sig for infer

* fix infrt error

* fix infrt failed

* fix kunlun error

* fix infrt failed

7ee31a96

16 Apr, 2022 1 commit
- move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
  Committed by 王明冬 on Apr 16, 2022
  
  21aa3adc
15 Apr, 2022 8 commits

solve brpc compile in arm-ubuntu18 (#41649) · 56dafc4f

Committed by ziyoujiyi on Apr 15, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* arm_brpc compile

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* only output is ok

* base is ok

* .

* .

* .

* .

* .

* .

* .

* .

* add switch server bin

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* adapt brpc ssl

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

56dafc4f

Moe ref (#41836) · c37af19c

Committed by Roc on Apr 15, 2022

* moe ref

* ref commit; test=document_fix

* update; test=document_fix

* update test=document_fix

c37af19c

fix a bug which will cause cuda address error when the input size is very large (#41824) · e25b75b6
Committed by huangxu96 on Apr 15, 2022
```
As the title
```
e25b75b6
[Dygraph] Refactor Model Parallel in eager mode (#41761) · e6fb6599
Committed by Haohongxiang on Apr 15, 2022
```
* refactor mp in eager mode

* update

* update

* add uts
```
e6fb6599

add fp16 for masked_select on kunlun, *test=kunlun (#41215) · ff818c77
Committed by TTerror on Apr 15, 2022

ff818c77

[MLU] add mlu softmax kernel (#41816) · 2d6b71a2
Committed by fwenguang on Apr 15, 2022

2d6b71a2

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

35acfeda


[MLU] add mlu activation kernels (#41751) · 10114859
Committed by fwenguang on Apr 15, 2022

10114859

14 Apr, 2022 6 commits

[KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
fbe2c311

Optimize the finding of max workspace size. (#41741) · 3ce879db
Committed by Yiqun Liu on Apr 14, 2022

3ce879db

FC+elementwise_add (residual connection) (#41776) · 92d8d0bc

Committed by Sławomir Siwek on Apr 14, 2022

* Change tensor name to match activation

* declare fc_eltwise_add pass

* merge conv_eltwise refactor PR

* first compilable draft

* unittest feedback tools

* Fuse pass tester

* Move IsReachable() to shared file

* 100% coverage of fuse_pass_tester.cc

* register pass

* Add bias node

* Improve unit tests / remove bias node from pattern

* improve fc_eltwiseadd_unittest

* cancel eltwise_add fuse if act is already fused

* Add elementwise_input scale

* Residual MVP

* Add new FC attrs

* Add more test cases

* Add missing op attrs

* Adapt code to new Elementwise pattern

* reuse existing fcpattern

* improve code style

* remove unused arguments

* fix typo

* remove whitespace

* remove int8 related code

* Remove attributes from base ops

* style

* style check

* Remove input from base op

* Set attribute during fuse

* ut timeout

* download and test model

* DRY

* apply feedback from review

* Style check

* fix typo

* cosmetic changes

* explicitly set residual as output

* VIT-OCR accuracy check

* trigger CI

* remove whitespaces

* fix missing data file

92d8d0bc

support multi layer and bidirection of lstm_grad, *test=kunlun (#41742) · 8b07ce0e

Committed by z8hanghuan on Apr 14, 2022

* support multi layer and bidirection of lstm_grad, *test=kunlun

* support multi layer and bidirection of lstm_grad, *test=kunlun

8b07ce0e


[fix bug] communication op support rccl (#41763) · e26e51ba
Committed by xiayanming on Apr 14, 2022

e26e51ba

Added shuffle_channel BF16/FP32 FWD oneDNN kernel (#39756) · c7623d72

Committed by jakpiase on Apr 14, 2022

* added shuffle_channel bf16/fp32 fwd kernel

* added missing files

* CI fix

* changed from pten to phi

* tmp save

* added reviewers suggestions

* fix for test

c7623d72

13 Apr, 2022 10 commits

Lml/add prim ops (#41201) · 97dec7ca

Committed by levi131 on Apr 13, 2022

* native commit for triple grad of sigmoid

* Updated unittests files

* init functional jacobian api

* Updated trible_test func

* Updated gradient_checker & test_script

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* fix dygraph grad to support high differential

* polish API docstring

* Updated gradient checker and some related files

* fix double grad strip error for high differential

* fix double grad strip error for high differential

* Add Sigmoid triple grad tests

* fix dygraph double grad dtype error when calling for high differential scenario

* Updated triple grad tests func

* Use np.random to initialize ddx

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* merge upstream

* add_p

* rm useless files

* add sub_p mul_p div_p

* add sqrt_p and tanh_p

* add reshape_p

* add broadcast_p

* add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p

* add split_p and concat_p

* add gather_p and scatter_add_p

* add slice_select_p and slice_assign_p

* add multi input check for add_p, sub_p, mul_p, div_p

* update concat_p

* refine gather_p and scatter_add_p

* refine slice_assign_p and slice_select_p

* add 9 test for prim ops

* add more test and fix some bug

* add more test

* register proto

* add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto

* support multi input and multi output for split_p and concat_p

* fix slice bug for slice_select_p and slice_assign_p

* dtype for axis attr should be long int

* update dtype for axis attr int64_t

* update for iscan CI

* add more shape and dtype check

* change IndexTensor into int32 dtype

97dec7ca


Use densetensor instead of Tensor for ProcessGroup (#41403) · 1e56ca8a
Committed by lilong12 on Apr 13, 2022

1e56ca8a
Add yaml and unittest for SGD (#41485) · 6d1e03a2
Committed by zyfncg on Apr 13, 2022
```
* add sgd yaml

* change python api

* open eager mode in sgd

* fix bug
```
6d1e03a2

Add expand equal all yaml (#41540) · e53d1837

Committed by hong on Apr 13, 2022

* add expand, poisson

* add poisson grad

* add expand equal_all poisson triangular solve yaml

e53d1837

Fix problem of infermeta with vector output (#41646) · b2390438
Committed by zyfncg on Apr 13, 2022
```
* remove stack_grad infershape

* fix bug of output with null

* fix bug
```
b2390438
use bilstm_train for rnn forward, * test=kunlun (#41671) · b1adde3d
Committed by z8hanghuan on Apr 13, 2022

b1adde3d

concat and relu support FP16 in XPU, test=kunlun (#41631) · c4d5a77f
Committed by zhangyikun02 on Apr 13, 2022

c4d5a77f

support bce_loss and bce_loss_grad in XPU, test=kunlun (#41610) · 468c1ad7
Committed by zhangyikun02 on Apr 13, 2022

468c1ad7
[Phi]fix split error when sections has 0 size and add test case (#41708) · 325e5712
Committed by chentianyu03 on Apr 13, 2022
```
* fix split error when sections has 0 size and add test case

* fix test case
```
325e5712
Update sign op xpu (#41685) · a4d4c116
Committed by houj04 on Apr 13, 2022
```
* update sign op on xpu. test=kunlun

* fix typo. test=kunlun
```
a4d4c116

11 Apr, 2022 3 commits
- fix for gaussian random (#41572) · 8fc9c412
  Committed by jakpiase on Apr 11, 2022
  
  8fc9c412
- fix arg_max for int type, *test=kunlun (#41522) · 368f1dda
  Committed by ykkk2333 on Apr 11, 2022
  
  368f1dda
- [Yaml] add yaml for Uniform random and add unit test. (#41517) · cd2a4cdf
  Committed by xiongkun on Apr 11, 2022
```
* gather op

* add mod

* [Yaml] final state for uniform and uniform_random
```
  cd2a4cdf
09 Apr, 2022 2 commits


modify the block size of the group_norm backward (#41570) · ff2fba39
Committed by crystal on Apr 09, 2022

ff2fba39

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

08 Apr, 2022 6 commits
- Fix fake quant cuda kernel (#41305) · 330582e2
  Committed by whs on Apr 08, 2022
  
  330582e2
- fix group_norm (#41531) · 04a4bdf8
  Committed by crystal on Apr 08, 2022
```
fix group_norm vectorized address misalignment
```
  04a4bdf8
- modify unittest of lstm forward, *test=kunlun (#41534) · d4710dfe
  Committed by z8hanghuan on Apr 08, 2022
```
* modify unittest of lstm forward, *test=kunlun

* modify unittest of lstm forward, *test=kunlun
```
  d4710dfe
- [Eager]Fix segment_pool/allclose/isclose/scale API bug (#41506) · 0a6fe699
  Committed by Aurelius84 on Apr 08, 2022
```
* [Eager]Fix segment_pool/allclose/isclose/scale API bug

* fix kernel register problem
```
  0a6fe699
- xpu mul unittest *test=kunlun (#41140) · 770ce7cf
  Committed by taixiurong on Apr 08, 2022
  
  770ce7cf
- Add conj pixel shuffle yaml (#41499) · bc88fbb5
  Committed by hong on Apr 08, 2022
```
* add conj flip yaml

* add flip conj pixel shuffle
```
  bc88fbb5
07 Apr, 2022 3 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  9714878c
- [Phi]Add hard_swish/kron/linspace/logit yaml file (#41298) · 90cb337e
  Committed by YuanRisheng on Apr 07, 2022
```
* add yaml

* perfect coverage
```
  90cb337e
- add send/recv to/from switch module for PrcoessGroupHeter (#41285) · 633ac4e6
  Committed by lilong12 on Apr 07, 2022
  
  633ac4e6

BaiXuePrincess / Paddle is up to date with the fork source project
