Commits · 74d411e751f47655621a4f17bae35f7fe93c6af7 · PaddlePaddle / Paddle

28 Nov, 2022 1 commit

Fix bug of TransToFluidOpName (#48355) · d3f52efd

Committed by zyfncg on Nov 28, 2022

* add fluid_op_name_map

* rename some kernel name

* add comments for op-kernel map

* refine map name of op to kernel

d3f52efd

25 Nov, 2022 2 commits

[PROFILER] add flops for Profiler (#47766) · 3d1981ad

Committed by Chitsing KUI on Nov 25, 2022

* attr ready

* op ip ready

* start dynamic

* end2end ok

* input shape to map, stat by op

* layer wip

* first version ready

* fix proto depds

* fix profiler deps

* fix flops typo, rm tuple shape

3d1981ad

fix mac python link (#48317) · 00b3b4bd
Committed by wanghuancoder on Nov 25, 2022

00b3b4bd

24 Nov, 2022 4 commits

[PHI decoupling] simplify "convert_utils.h" in fluid (#48168) · de4310e6

Committed by huangjiyi on Nov 24, 2022

* rm dependence to "convert_utils.h" in some files

* fix bugs

* replace DataType2String with DataTypeToString

* replace framework::DataTypeSize with phi::SizeOf

* mv convert_function from fluid to phi and rm old map

* recommit with pre-commit

* replace ProtoVarType with ProtoDataType and update comment.

* fix error about include "dnnl.hpp"

* revert add dep mkldnn to convert_utils in phi

* add mkldnn deps in convert_utils.h in phi

* move deps to convert_utils.h in phi

de4310e6

[Phi Support CuDNN] Support ALL CuDNN (#47865) · 1623f1b4

Committed by HongyuJia on Nov 24, 2022

* support default use_gpudnn=True

* fully support cudnn in phi

* add header file

* add white_list, verify accuracy

* phi support all cudnn

* opt affine_grad

* try different arches of pretrained_model

* try different arches of pretrained_model

* add debug string

* debug eager_method

* add debug string, pass all local ctest

* polish all debug code

* delete use_cudnn relevant code autogen

* fix depthwise_conv2d

* Share all other members of Tensor except use_cudnn

* polish codes according to review opinion

* polish codes according to review opinion, fix bug

* polish codes according to review opinion, opt performance

* polish codes according to review opinion, fix pooling.py

1623f1b4

do not calc reduce_all in eager mode (#48199) · bcf75132

Committed by wanghuancoder on Nov 24, 2022

* do not calc reduce_all in eager mode

* refine python c cast list

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

bcf75132

dense tensor in eager mode support data_ptr (#48235) · 3f265815
Committed by wanghuancoder on Nov 24, 2022
```
* dense tensor in eager mode support data_ptr
```
3f265815

23 Nov, 2022 1 commit

Add nparray case for basic operator (#48229) · b7d3143f

Committed by Charles-hit on Nov 23, 2022

* add nparray case for basic operator

* fix unit test

* fix unit test

* add unit test

* fix unit test

b7d3143f

21 Nov, 2022 4 commits
- fix doc of NPUPlace (#48148) · 809516f6
  Committed by Leo Chen on Nov 21, 2022
```
* fix doc of NPUPlace

* fix doc of NPUPlace, test=document_fix
```
  809516f6
- Unify `ProcessGroupNCCL` APIs underlying implementation (#48163) · 88410225
  Committed by Wen Sun on Nov 21, 2022
```
* refactor: replace Collective & PointToPoint with NCCLEnv

* refactor: rename to RunFnInNCCLEnv

* refactor: pass std::function by value
```
  88410225
- add new map instance (#48145) · 2a47416c
  Committed by LiYuRio on Nov 21, 2022
  
  2a47416c
- return pointer rather than reference (#48152) · 403d58bb
  Committed by LiYuRio on Nov 21, 2022
  
  403d58bb
18 Nov, 2022 3 commits

Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
Committed by zyfncg on Nov 18, 2022
```
* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test
```
7f92e27e

Refactor collective communication reduce, scatter, reduce_scatter C++ API (#48115) · edda13cd
Committed by Wen Sun on Nov 18, 2022

edda13cd

fix device id issue for xpu eager mode (#48076) · 3b18d96b

Committed by james on Nov 18, 2022

* fix device id issue for xpu eager

xpu device id is not correctly set in eager mode, thus vars are on dev0 unless
XPUDeviceGuard is called, leading to this error message for all node rank != 0:
"NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."

* fix typo

* fix pybind error

3b18d96b

17 Nov, 2022 2 commits
- Refactor collective communication all_to_all, all_to_all_single C++ API (#48059) · 3f480af2
  Committed by Wen Sun on Nov 17, 2022
  
  3f480af2
- Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16... · e5ed5257
  Committed by Yuang Liu on Nov 17, 2022
```
Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)

* add bfloat16 for adamw

* set lr not to bfloat16 for pure bf16 training

* update the logic

* update the adamw optimizer

* support bfloat for adam
```
  e5ed5257
16 Nov, 2022 2 commits
- increase the level of some log (#47990) · 2f8901cb
  Committed by Leo Chen on Nov 16, 2022
  
  2f8901cb
- feat(ipu): add paddle inference support for model_runtime. (#47364) · 39c85064
  Committed by czr-gc on Nov 16, 2022
  
  39c85064
14 Nov, 2022 4 commits
- Refactor collective communication send_partial, recv_partial, all_gather_partial C++ API (#47863) · 25e63dca
  Committed by Wen Sun on Nov 14, 2022
```
* refactor: simplify send, recv interfaces

* refactor: rm send_partial, recv_partial, all_gather_partial
```
  25e63dca
- Remove place for process group (#47857) · 2d383b81
  Committed by LiYuRio on Nov 14, 2022
  
  2d383b81
- remove heter and hccl (#47918) · 9191e743
  Committed by LiYuRio on Nov 14, 2022
  
  9191e743
- add lite opencl support api (#47112) · 798ab3f9
  Committed by engineer1109 on Nov 14, 2022
  
  798ab3f9
10 Nov, 2022 4 commits

Get grads types from cpp for adam to speed up (#47769) · 5900129c
Committed by WangZhen on Nov 10, 2022
```
Get grads types from cpp for adam to speed up
```
5900129c

[PHI]Standardise some C++ API (Part4) (#47702) · 594bd723

Committed by YuanRisheng on Nov 10, 2022

* standard api

* fix sparse bugs

* fix xpu bugs, test=kunlun

* remove hard code for custom unittest

* open ci, test=kunlun

* deal with conflict

594bd723

XPU multi-card support eager mode (#47445) · 3b91f8f3

Committed by james on Nov 10, 2022

* XPU support eager mode

* add unittest for XPU eager mode

* minor bugfix

* minor bugfix, test=kunlun

* correct copyright info

* 1. remove unused vars/funcs
2. ProcessGroupBKCL inherit from ProcessGroupStream

* bugfix for fp16 in eager mode multi-card, test=kunlun

* rebase & fix a few issues

* use new processgroup interface, test=kunlun

* fix compile issue, test=kunlun

3b91f8f3

Refactor collective communication P2P C++ API (#47801) · d926c270
Committed by Wen Sun on Nov 10, 2022
```
* refactor: send, recv, send_partial, recv_partial

* refactor: rm useless const ref
```
d926c270

09 Nov, 2022 4 commits

Get grads from cpp for optimizer to avoid gpu idle time (#47709) · 261ebb0c

Committed by WangZhen on Nov 09, 2022

* Get params and grads in cpp to avoid gpu idle time

* Using python param instead of cpp return param to fix test_asp_optimize_dynamic.py

* Get grads from cpp and construct params_grads on python

* Check meta and remove comments

261ebb0c

Enable fc passes (#45704) · 7e914386

Committed by Paulina Gacek on Nov 09, 2022

* Analysis API interface for disabling fc passes

* Unit tests corrected

* Python API added

* test runs only when PADDLE_WITH_MKLDNN

* Fc op changed to relu in matmul_op_test

* Disable fc passes in tests where acc drops

* code formatting

* Unit test for analysisConf added

* Unit test gpu added

* fc passes disabled when iterations=0 in gru test

* style

* passes disabled when fp32 in gru test

* fc passes disabled in lstm test

* Import from inference, not fluid in doc

7e914386

refactor: ProcessGroupNCCL (#47740) · ae14bad1
Committed by Wen Sun on Nov 09, 2022

ae14bad1
refine python call error report (#47724) · 5c7fce47
Committed by wanghuancoder on Nov 09, 2022
```
* refine python call error report
```
5c7fce47

08 Nov, 2022 2 commits
- refine comm api implementation (#47713) · 84c9a0d6
  Committed by LiYuRio on Nov 08, 2022
  
  84c9a0d6
- [CustomDevice] fix undefined symbol GetCCLComm in the cpu version (#47717) · 97004f67
  Committed by ronnywang on Nov 08, 2022
  
  97004f67
07 Nov, 2022 4 commits
- remove hardcoded -Wunused-variable compiler flags (#47706) · 45bc4542
  Committed by Wang Xin on Nov 07, 2022
  
  45bc4542
- Get three grad lists in CPP to avoid gpu idle time (#47665) · 01bfe786
  Committed by WangZhen on Nov 07, 2022
```
* Get three grad lists in CPP to avoid gpu idle time

* Support legacy mode
```
  01bfe786
- [custom device] add python inference api, test=develop (#46460) · 6074c50a
  Committed by Qi Li on Nov 07, 2022
  
  6074c50a
- Refactor collective communication all_gather, all_reduce, broadcast & barrier C++ API (#47481) · e1a1c354
  Committed by Wen Sun on Nov 07, 2022
  
  e1a1c354
04 Nov, 2022 1 commit
- fix cc_library link python lib (#47605) · cd59c10c
  Committed by wanghuancoder on Nov 04, 2022
```
* fix cc_library link python lib
```
  cd59c10c
03 Nov, 2022 1 commit
- clean unused code: save_load_util.cc/.h (#47588) · 6d0f730d
  Committed by Leo Chen on Nov 03, 2022
  
  6d0f730d
01 Nov, 2022 1 commit
- [Paddle Inference] add RegisterOutputHook interface (#47050) · db323927
  Committed by Yuanle Liu on Nov 01, 2022
  
  db323927

PaddlePaddle / Paddle synced successfully about 1 year ago