Commits · f6ee202fe4cd7499a628c4a5f7dbcdc60c9de2c8 · csdn_franckjun / Paddle

18 May 2022 · 1 commit

Add support for forward and reverse high-order automatic differentiation mechanism (#41919) · f6ee202f

Committed by WangZhen on May 18, 2022

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* merge upstream

* add priops.py

* add_p

* rm useless files

* add sub_p mul_p div_p

* add sqrt_p and tanh_p

* add reshape_p

* add broadcast_p

* Add python primitive wrappers.

* Jvp rules updated.

* JVP rules done for all the 17 primops.

* quick check and fixes.

* add jvp(op, *args)

* add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p

* add split_p and concat_p

* add gather_p and scatter_add_p

* add slice_select_p and slice_assign_p

* Add transpose rules.

* add multi input check for add_p, sub_p, mul_p, div_p

* update concat_p

* Linearize and transpose in progress..

* refine gather_p and scatter_add_p

* updated.

* update transpose.

* refine slice_assign_p and slice_select_p

* init commit for lower

* Merged with primitive ops.

* small update

* add rules for orig2prim and prim2orig

* add 9 test for prim ops

* add more test and fix some bug

* add more test

* register proto

* Adding primops test.

* add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto

* support multi input and multi output for split_p and concat_p

* Test updated.

* update

* fix slice bug for slice_select_p and slice_assign_p

* updated.

* Ops updated.

* Refactor and bug fixes.

* updated.

* finish orig2prim and prim2orig rules

* dtype for axis attr should be long int

* update dtype for axis attr int64_t

* update for iscan CI

* Update primx.

* Refactor vars in primx.

* update for lower transform

* add more shape and dtype check

* update primx.py

* change IndexTensor into int32 dtype

* update

* Fix linearize and transpose.

* Update is_dot

* Update is_dot

* Update is_dot

* add gradient aggregation, fix add_transpose.

* pass first linearize+transpose test.

* update test

* refactor op registration and primx.

* update rule for slice_assign

* try test lower

* update orig2prim and prim2orig

* pass simple lower pass

* update

* Update input types in the unit test.

* orig2prim segfault.

* 50% for adam.minimize

* test updated.

* temp fix errors in removing vars.

* primx updated.

* update for matmul_v2 and reshape2 orig2prim

* update for minimize

* Refine primrules

* Remove some code

* supporting unused and unreachable vars.

* update for use prim2orig in minimize

* fix gather and scatter_add transpose.

* Add rules UT

* update scatter_add

* Refine UT code

* fix nonetype check in topo

* Update gather_p pywrapper.

* remove useless print

* Merge tongxin PR and refine code

* readd some test

* rm useless print

* polish code.

* fix bug in minimize

* add get_input_var_list and get_output_var_list and use it in lower

* Fix scatter_add_p prim2orig

* Update code and fix orig2prim/prim2orig UT

* delete vars after block.desc._remove

* Improve ops and vars clean up logics.

* fix some bug in linearize and lower

* update tanh transpose.

* use set instead of list for var2remove

* test updated.

* polish code.

* fix dot2bar delete.

* merge tx/ad

* add indextensor_dot for gather and scatter_add

* add sorted for set

* Fix scale_orig2prim params

* fix some syntax bug

* add global_lower_update list

* Better handling of unused vars.

* update tests.

* Fix elementwise_sub orig2prim

* support none for transpose rule

* Merge and add transform UT

* fix a bug in transpose

* Fix transpose and UT

* a hacky fix for concat op

* Fix executor place

* Refine variable name

* Add elementwise_mul orig2prim and support p_norm when p=1

* Add sqrt orig2prim rule and UT

* merge wz test

* rename files, add enable_prim, disable_prim, prim_enabled, delete global_lower_update

* fix a bug in test_ad_transform_trans

* revert modify in framework.py

* add paddle.fluid.incubate.ad_transform to  python/setup.py.in

* Fix remove vars error

* Fix p_norm_orig2prim

* merge wz

* Modify the code directory

* Add utils.py and remove get_input/output_vars functions

* Update maolin code

* Rename UT and refine test_ad_transform_primops

* Fix div_p jvp rule

* Add higher derivatives UT

* Move UT to autograd dir

* Fix comments

* import paddle in primops.py

* Add some error message for assert

* Refine UT class name and refine some comments in primreg.py

* update minimize of paddle/optimizer for supporting new autograd

* resolve circular importing between backward.py and optimizer.py

* fill gradients and minimize unittest

* Replace `assert isinstance` with `raise TypeError`

* Add some assert message for primx.py

* Polish variable name

* Add some assert message

* add some docstring

* refine some name

* update the format of english documents

* Split test_transform.py to two files to avoid ci error

* fix the document format of enable_prim/disable_prim/prim2orig/prim_enabled

* polish test_gradients_and_minimize

* add default value for prim_enabled api doc

* Remove some UT to avoid windows ci error

* Enlarge test_gradients_and_minimize limit time

* Fix ut limit time
Co-authored-by: veyron95 <veyron_wu@163.com>
Co-authored-by: Jiabin Yang <360788950@qq.com>
Co-authored-by: levi131 <limaolin01@baidu.com>
Co-authored-by: Tongxin Bai <waffle.bai@gmail.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>

f6ee202f

12 May 2022 · 1 commit

Fix some typos in paddle/. (#42408) · 2012672c

Committed by Shuangchi He on May 12, 2022

2012672c

10 May 2022 · 1 commit

improve introduction of bfgs args (#42191) · 000edfd2

Committed by Sing_chan on May 10, 2022

* improve introduction of bfgs args; test=document_fix

* modify according to zhouwei's comment; test=document_fix

000edfd2

28 April 2022 · 1 commit

Add gradient merge for DistributedFusedLamb optimizer (#40177) · 108aeb28

Committed by sneaxiy on April 28, 2022

* add gradient merge for DistributedFusedLamb

* use master acc gradient

* fix CI ut

* polish

* remove math_function_impl.h change

* fix test_update_loss_scaling_op.py

* try to fix XPU/NPU CI

* add gm ut

108aeb28

27 April 2022 · 3 commits

implement autotune python API (#42299) · 2094a584

Committed by Zhang Ting on April 27, 2022

2094a584

Delete api from __all__ (#42220) · d1e01232

Committed by Zhang Zheng on April 27, 2022

d1e01232

Fix paddle setup (#42254) · 8395d660

Committed by Roc on April 27, 2022

* expose api

* ref clipgradbynorm

* update

* Update __init__.py

8395d660

26 April 2022 · 1 commit

Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df

Committed by WangXi on April 26, 2022

9dadf7df

25 April 2022 · 1 commit

fix recompute (#42128) · f21824d9

Committed by Roc on April 25, 2022

* fix recompute

* modify return

f21824d9

16 April 2022 · 1 commit

Moe ref (#41864) · e9a63237

Committed by Roc on April 16, 2022

* moe ref

* ref commit; test=document_fix

* update; test=document_fix

* update test=document_fix

* update; test=document_fix

e9a63237

15 April 2022 · 1 commit

Moe ref (#41836) · c37af19c

Committed by Roc on April 15, 2022

* moe ref

* ref commit; test=document_fix

* update; test=document_fix

* update test=document_fix

c37af19c

14 April 2022 · 1 commit

fix bfgs_doc (#41505) · 7f73ef2c

Committed by Sing_chan on April 14, 2022

* fix bfgs_doc; test=document_fix

* add parameter name; test=document_fix

* modify according to chenlong's comments;test=document_fix

7f73ef2c

13 April 2022 · 1 commit

fix moe apis (#41650) · 5f2c5b9e

Committed by Roc on April 13, 2022

5f2c5b9e

08 April 2022 · 2 commits

[Eager]Fix segment_pool/allclose/isclose/scale API bug (#41506) · 0a6fe699

Committed by Aurelius84 on April 08, 2022

* [Eager]Fix segment_pool/allclose/isclose/scale API bug

* fix kernel register problem

0a6fe699

Fix cv2 import error and some issues for lamb (#41500) · 1ed1a97b

Committed by sneaxiy on April 08, 2022

* fix image cv2 import

* fix lamb

1ed1a97b

07 April 2022 · 2 commits

add norm, segment_pool (#41465) · 0d642d3a

Committed by hong on April 07, 2022

0d642d3a

Add Output(Step) to DistributedFusedLamb optimizer (#41249) · e4459a40

Committed by sneaxiy on April 07, 2022

* add Output(Step) to distributed fused lamb op

* add _set_step

e4459a40

06 April 2022 · 1 commit

[Phi]Add graph_send_recv yaml file (#41206) · 6f4bd0ea

Committed by YuanRisheng on April 06, 2022

* add graph_send_recv yaml

* deal with conflict

* fix compile bugs

6f4bd0ea

04 April 2022 · 1 commit

cut off relation between xk and initial_position's graph (#41371) · afb56e8c

Committed by Sing_chan on April 04, 2022

* cut off relation between xk and initial_position's graph

* fix_bug

* add detach to cut off with original graph

afb56e8c

02 April 2022 · 2 commits

Add graph apis (#40809) · b0398c8e

Committed by Siming Dai on April 02, 2022

* Add graph_reindex API

* add graph_sample_neighbors api

* Add buffer

* delete VLOG

* delete thrust::copy for output

* add ShareDataWith

* delete graph_reindex hashtable output

* add graph_reindex dispensable

* add reindex unittest, move memset to cuda kernel, change api

* fix conflict

* add reindex buffer for gpu version note

* fix conflicts for op_func_generator

* Add fisher_yates sampling, add dispensable, change infermeta

* add dtype for edge_id

* fix rocm ci and static check ci

* add unittest

* fix unittest

* fix unittest

* fix bug

b0398c8e

Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and... · 9e764d82

Committed by Xiaoxu Chen on April 02, 2022

Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and batched, unbatched mode (#40692)

* modify vjp/jvp for both dynamic and static graph

* enforce jacobian class for supporting first/last batch

* add unittest for jvp, jacobian with last batch, jacobian with first batch

* fix the incorrect shape when multi-index Jacobian

* enforce Hessian class for supporting dynamic graph

* add Hessian class unittest

* bugfix, jvp double_backward_trick zeros_like return stop_gradient=True in static graph

* add API beta warnings

* add white_list for cuda11.x ci windows.

* optimize some code snippets and documents

* set unittest timeout to 100 seconds

* move vjp,jvp,Jacobian,Hessian to incubate

* fix vjp,vjp import path of sample code

* fix code style error of autograd/__init__ file

9e764d82

01 April 2022 · 3 commits

add final state python api (#41252) · ab8c33b1

Committed by hong on April 01, 2022

ab8c33b1

change vjp to paddle.grad (#41231) · 34241dd1

Committed by Sing_chan on April 01, 2022

* change vjp to paddle.grad

* use grad and gradients api

* fix preprocess for x

* fix a bug, val_and_grad should return a Tensor

* detach value and grad to avoid assign error
Co-authored-by: levi131 <limaolin01@baidu.com>

34241dd1

fix bug of bfgs example code;test=document_fix (#41195) · db948373

Committed by Sing_chan on April 01, 2022

db948373

31 March 2022 · 1 commit

[New API]: minimize_bfgs and minimize_lbfgs (#40710) · e7928a06

Committed by Sing_chan on March 31, 2022

* [New API]: minimize_bfgs and minimize_lbfgs

* modify for python module call correctly

* add functional package, add error raise in static_graph, change assign to set_value

* unify static_graph and dygraph, fix bug when x or H0 is float64

* now only accept input is tensor, put check args in utils.py, put exception test together

* temp

* add more detailed algorithm illustration and comment, reduce test case to limit test time in 15s

* change in_dygraph_mode to in_dynamic_mode

* fix bug of sample code; reduce test case to reduce test time

* change dir to incubate

e7928a06

30 March 2022 · 1 commit

[MoE] Moe apis (#41092) · aac7879a

Committed by Roc on March 30, 2022

* add random routing op

add _random_routing api in utils

add random routing ut

* # This is a combination of 10 commits.
# The first commit's message is:
add expert count op

add ut for expert_count

# This is the 2nd commit message:

update UT only for cuda

# This is the 3rd commit message:

fix for rocm

# This is the 4th commit message:

update ut

# This is the 5th commit message:

add moe module

# This is the 6th commit message:

add expert count op

add ut for expert_count

# This is the 7th commit message:

update UT only for cuda

# This is the 8th commit message:

update ut

# This is the 9th commit message:

add moe module

# This is the 10th commit message:

make expert count private

* add assign pos op

* fix upper num name

* add api _assign pos

* add ut for assign pos op

* update date

* add op about moe gate

update utils

add limit by capacity op

add ut for limit_by_capacity

add ut for prune_gate_by_capacity

add ut for limit_by_capacity

add ut for prune_gate_by_capacity

* fix for win

* fix bugs in test_limit_by_capacity_op

* update ut

* update for test (timeout)

* fix ut

* update

* update(fix) ut for win

* moe apis in incubate

* # This is a combination of 10 commits.
# The first commit's message is:
add expert count op

add ut for expert_count

# This is the 2nd commit message:

update UT only for cuda

# This is the 3rd commit message:

fix for rocm

# This is the 4th commit message:

update ut

# This is the 5th commit message:

add moe module

# This is the 6th commit message:

add expert count op

add ut for expert_count

# This is the 7th commit message:

update UT only for cuda

# This is the 8th commit message:

update ut

# This is the 9th commit message:

add moe module

# This is the 10th commit message:

make expert count private

* add assign pos op

* fix upper num name

* add api _assign pos

* add ut for assign pos op

* update date

* fix for win

* update for test (timeout)

* fix ut

* update

* fix ut for number count

* add apis and utils

* add gate apis

* add moe and grad clip apis

* update moe apis

* add ops for moe gate

* fix

* update for base moe layer api

* add random routing op

add _random_routing api in utils

add random routing ut

* fix for dygraph

* update with random routing

* update

* fix ut for limit by capacity

* update

* update limit by capacity to make it easy to switch to single thread mode

* update api docs
Co-authored-by: hlygit66666 <2570058140@qq.com>

aac7879a

29 March 2022 · 1 commit

[MoE] Moe apis (#40895) · aeade538

Committed by Roc on March 29, 2022

* add random routing op

add _random_routing api in utils

add random routing ut

* # This is a combination of 10 commits.
# The first commit's message is:
add expert count op

add ut for expert_count

# This is the 2nd commit message:

update UT only for cuda

# This is the 3rd commit message:

fix for rocm

# This is the 4th commit message:

update ut

# This is the 5th commit message:

add moe module

# This is the 6th commit message:

add expert count op

add ut for expert_count

# This is the 7th commit message:

update UT only for cuda

# This is the 8th commit message:

update ut

# This is the 9th commit message:

add moe module

# This is the 10th commit message:

make expert count private

* add assign pos op

* fix upper num name

* add api _assign pos

* add ut for assign pos op

* update date

* add op about moe gate

update utils

add limit by capacity op

add ut for limit_by_capacity

add ut for prune_gate_by_capacity

add ut for limit_by_capacity

add ut for prune_gate_by_capacity

* fix for win

* fix bugs in test_limit_by_capacity_op

* update ut

* update for test (timeout)

* fix ut

* update

* update(fix) ut for win

* moe apis in incubate

* # This is a combination of 10 commits.
# The first commit's message is:
add expert count op

add ut for expert_count

# This is the 2nd commit message:

update UT only for cuda

# This is the 3rd commit message:

fix for rocm

# This is the 4th commit message:

update ut

# This is the 5th commit message:

add moe module

# This is the 6th commit message:

add expert count op

add ut for expert_count

# This is the 7th commit message:

update UT only for cuda

# This is the 8th commit message:

update ut

# This is the 9th commit message:

add moe module

# This is the 10th commit message:

make expert count private

* add assign pos op

* fix upper num name

* add api _assign pos

* add ut for assign pos op

* update date

* fix for win

* update for test (timeout)

* fix ut

* update

* fix ut for number count

* add apis and utils

* add gate apis

* add moe and grad clip apis

* update moe apis

* add ops for moe gate

* fix

* update for base moe layer api

* add random routing op

add _random_routing api in utils

add random routing ut

* fix for dygraph

* update with random routing

* update

* fix ut for limit by capacity

* update
Co-authored-by: hlygit66666 <2570058140@qq.com>

aeade538

25 March 2022 · 1 commit

Refactor Dygraph Flags (#40786) · 3085d5e4

Committed by Jiabin Yang on March 25, 2022

* refactor eager flags

* fix flags error when we switch from eager to dygraph

* fix ci problem

* fix ci

* fix ci

* merge develop and fix code style

* merge develop and fix code style

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* merge develop

3085d5e4

22 March 2022 · 1 commit

[phi] Update graph_send_recv OP (#40509) · 67b46e45

Committed by Siming Dai on March 22, 2022

* add out_size shape for graph_send_recv

* fix bug in register kernel: no const int& support

* add out_size in infermeta

* change unittest

* fix unittest

* fix out_size default value

* fix doc

* delete arg mapping

* add sig

* move -1 to 0

* move -1 to 0

67b46e45

16 March 2022 · 1 commit

[Ops] segment pool op support for int int64 kernel. (#40577) · 6849d33b

Committed by Zhong Hui on March 16, 2022

* segment pool support for int int64 kernel.

* add support in python api

6849d33b

14 March 2022 · 1 commit

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

Committed by Zhong Hui on March 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

e553f758

11 March 2022 · 1 commit

[hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496

Committed by Yuang Liu on March 11, 2022

1882c496

01 March 2022 · 1 commit

Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed

Committed by sneaxiy on March 01, 2022

* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order

d17961ed

25 February 2022 · 1 commit

Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102

Committed by sneaxiy on February 25, 2022

* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smaller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS

d32a0102

24 February 2022 · 1 commit

fix 'invalid escape sequence' (#39842) · 4e26fa57

Committed by Leo Chen on February 24, 2022

* fix 'invalid escape sequence'

* fix assert error

4e26fa57

19 February 2022 · 1 commit

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on February 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* make compatible with pten changes

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

28 January 2022 · 1 commit

recovery code (#39287) · 45f9c9eb

Committed by zhangkaihuo on January 28, 2022

45f9c9eb

27 January 2022 · 2 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on January 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

35f949b5

Add SparseCooTensor and SparseCsrTensor (#38906) · a7edb3f3

Committed by zhangkaihuo on January 27, 2022

* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter

* for pure fp16

* Add a SparseCsrTensor

* remove unused functional

* remove const

* remove SetMemoberTensor

* remove non_zero_nums_, the number of non zero elements of each batch can be obtained from the crows

* SparseCooTensor

* add SetMember

* merge upstream; add SetMember

* merge upstream

* merge upstream; add newline at end of file

* add newline at end of file

* remove newline at end of file

* remove newline at end of file

* stash

* use pten::framework::make_ddim

* use pten::framework::make_ddim

* merge upstream; use the latest mutable_data

* merge upstream; use the latest mutable_data

* return mutable dense tensor

a7edb3f3

22 December 2021 · 1 commit

Replaced core.ops with _C_ops (#38337) · 242ef2b9

Committed by Zhanlue Yang on December 22, 2021

242ef2b9

csdn_franckjun / Paddle · In sync with the fork source project