Commits · 0a963ee9211174766dd4f718b43f9965b467cd4b · PaddlePaddle / Paddle

01 Nov, 2021 2 commits

add cinn_launch_op for using CINN to optimize graph (#36600) · 0a963ee9

Committed by CtfGo on Nov 01, 2021

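Add CinnLaunchOp, which executes the result of compiling a CINN subgraph. Key points:
1. In BuildCinnPass, which partitions the graph into subgraphs, each subgraph is replaced in the original graph by a CinnLaunchOp, which calls CINN to compile and execute that subgraph.
2. The inputs and outputs of CinnLaunchOp are the inputs and outputs of the subgraph. It also has a `compilation_key` attribute, the key used to fetch the subgraph object and its compilation result from the global cache. BuildCinnPass sets this attribute when it creates the op.
3. CinnLaunchOp works as follows:
        - Fetch the subgraph object from the global cache.
        - Fetch the subgraph's compilation result from the global cache; on a cache miss, compile it just in time.
        - Use the variable information in the compilation result (data type, shape) to initialize the runtime data and allocate host/device memory.
        - Pack the runtime data into arguments and call CINN's executable object, the runtime program, to run the computation.
        - Sync the subgraph's results to the Paddle-side tensors through the argument pointers.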

0a963ee9
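
The flow described in this commit message can be sketched in plain Python. This is only an illustrative model, not Paddle's or CINN's code: `Subgraph`, `CompiledResult`, `GlobalCache`, `compile_subgraph`, and `cinn_launch` are hypothetical names, the "compiled program" is a hard-coded add+relu stand-in, and Python lists stand in for tensors.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Dict, List, Tuple

Buffer = List[float]  # stand-in for a tensor's memory


@dataclass
class Subgraph:
    inputs: List[str]
    outputs: List[str]
    shape: Tuple[int, ...]


@dataclass
class CompiledResult:
    # variable name -> (dtype, shape), as reported by the compiler
    var_info: Dict[str, Tuple[str, Tuple[int, ...]]]
    # executable object that reads and writes the argument buffers in place
    runtime_program: Callable[[Dict[str, Buffer]], None]


def compile_subgraph(graph: Subgraph) -> CompiledResult:
    """Hypothetical stand-in for compilation: always emits add+relu."""
    def program(args: Dict[str, Buffer]) -> None:
        x, y = (args[name] for name in graph.inputs)
        out = args[graph.outputs[0]]
        for i in range(len(out)):
            out[i] = max(x[i] + y[i], 0.0)

    names = graph.inputs + graph.outputs
    return CompiledResult({n: ("float32", graph.shape) for n in names}, program)


class GlobalCache:
    """Maps a compilation key to a subgraph and its compilation result."""

    def __init__(self) -> None:
        self.subgraphs: Dict[str, Subgraph] = {}
        self.compiled: Dict[str, CompiledResult] = {}
        self.compile_count = 0

    def get_compiled(self, key: str) -> CompiledResult:
        if key not in self.compiled:  # cache miss: compile just in time
            self.compiled[key] = compile_subgraph(self.subgraphs[key])
            self.compile_count += 1
        return self.compiled[key]


def cinn_launch(cache: GlobalCache, compilation_key: str,
                inputs: Dict[str, Buffer]) -> Dict[str, Buffer]:
    graph = cache.subgraphs[compilation_key]        # 1. subgraph object
    compiled = cache.get_compiled(compilation_key)  # 2. compilation result
    args: Dict[str, Buffer] = {}
    for name, (_dtype, shape) in compiled.var_info.items():
        # 3. reuse input buffers, allocate the rest from the reported shape
        args[name] = inputs[name] if name in inputs else [0.0] * prod(shape)
    compiled.runtime_program(args)                  # 4. run the program
    return {name: args[name] for name in graph.outputs}  # 5. hand back outputs
```

Launching the same key twice compiles only once in this model, because the second call hits the cache.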

add googlenet (#36034) · 8937205b

Committed by Nyakku Shigure on Nov 01, 2021

* update AvgPool2D to AdaptiveAvgPool2D
* class_num -> num_classes
* add en doc
* add googlenet to pretrained test
* remove weights name
* add parameter with_pool
* update en doc
* fix googlenet out shape
* 2020 -> 2021
Co-authored-by: Ainavo <ainavo@163.com>
Co-authored-by: pithygit <pyg20200403@163.com>

8937205b

29 Oct, 2021 10 commits

add some ops support fp16 in kunlun2 (#36854) · 442688a8
Committed by taixiurong on Oct 29, 2021
```
* aaaa

* add some ops support fp16 in kunlun2
```
442688a8

Move the ASP training API to paddle.static.sparsity. (#36525) · 113816d8
Committed by Ming-Xu Huang on Oct 29, 2021

113816d8

fix matmul error when input's dim is 3 (#36849) · f6b4ed22
Committed by baoachun on Oct 29, 2021

f6b4ed22

Add io api and compute api for XPU (#36423) · 89a8989f
Committed by niuliling123 on Oct 29, 2021

89a8989f

add new API/OP: paddle.linalg.triangular_solve (#36714) · 92d6a048

Committed by zhouweiwei2014 on Oct 29, 2021

* add new API: paddle.linalg.triangular_solve

* add new API/OP: paddle.linalg.triangular_solve

* add new API/OP: paddle.linalg.triangular_solve

* fix comment

92d6a048

fix some bug in new executor (#36822) · b5af9575
Committed by wanghuancoder on Oct 29, 2021
```
* fix some bug in new executor, test=develop

* fix error message, test=develop
```
b5af9575

[new-exec] enable check_nan_inf (#36802) · be55bac3

Committed by Leo Chen on Oct 29, 2021

* enable check_nan_inf and fix variable scope

* add ut

* fix bug

* update ut

* revert doc change

* fix npu compile

be55bac3


fix dcnv2 trt8 compile error (#36850) · 82fb63eb
Committed by wangxinxin08 on Oct 29, 2021

82fb63eb
1. fix ifftshift(missing negative sign before shifts); (#36834) · f3ee5c99
Committed by Feiyu Chan on Oct 29, 2021
```
2. add complex data type support for paddle.shape at graph assembly.
```
f3ee5c99
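
To illustrate why the sign before the shifts matters, here is a minimal sketch on plain Python lists. It is not Paddle's implementation: `roll`, `fftshift`, and `ifftshift` below are illustrative helpers that follow the usual definition, where `fftshift` rolls by `n // 2` and `ifftshift` rolls by `-(n // 2)`.

```python
from typing import List, Sequence


def roll(xs: Sequence[int], shift: int) -> List[int]:
    """Circularly shift xs to the right by `shift` positions."""
    n = len(xs)
    if n == 0:
        return []
    k = shift % n
    return list(xs[-k:]) + list(xs[:-k]) if k else list(xs)


def fftshift(xs: Sequence[int]) -> List[int]:
    # Move the zero-frequency element to the centre.
    return roll(xs, len(xs) // 2)


def ifftshift(xs: Sequence[int]) -> List[int]:
    # Undo fftshift: same magnitude, negative sign.
    return roll(xs, -(len(xs) // 2))


odd = [0, 1, 2, 3, 4]
restored = ifftshift(fftshift(odd))         # round-trips back to `odd`
wrong_inverse = fftshift(fftshift(odd))     # what a missing minus sign gives
```

For even lengths the two shifts coincide, so an `ifftshift` that drops the minus sign only misbehaves on odd-length axes, which makes the bug easy to miss.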

[Auto Parallel] Improve the interface and the underlying mechanisms (#36617) · a02532b5

Committed by Yulong Ao on Oct 29, 2021

* default dist op

* add dist_attr for dist op

* add unitest

* update inputname

* update function name

* add unitest

* update CMakeLists.txt for CI

* fix dis_matmul

* fix compile error

* update matmul to matmul_v2

* unify api

* unify api

* todo

* update distop forward func

* update distop forward func

* auto parallel backward

* update dist op

* autoparallel backward

* add backward for embedding

* temp1

* temp2

* temp3

* temp4

* backward done1

* backward done2

* backward done3

* dist embedding remove mp mode

* dist matmul remove mp mode

* update dist embedding

* dist op init1

* dist op init 2

* update unitest

* context remove parallel mode

* partitioner remove parallel mode

* update unitest

* a more general method to support varying mesh in pipeline parallel

* support varying mesh in pipeline parallel

* embedding support varying mesh in pipeline parallel

* matmul support varying mesh in pipeline parallel

* default dist op support varying mesh in pipeline parallel

* dist attribute for startup program

* default dist op support varying mesh in pipeline parallel 2

* partitoner support varying mesh in pipeline parallel

* revise logic for auto compeletion

* revise framework.py

* revise reshard unitest

* revise unitest for parallelize

* chmod

* fixed bug for dist embedding name mapping

* Improve the interface and the underlying mechanisms of auto parallel

* revise completion for backward

* revise completion for update

* revise completion for update

* update unitest

* chmod

* bugfix for grad_op output var's mesh

* Modify codes for pr 36744

* Remove unnecessary comments in framework.py

* Remove unnecessary comments in completion.py
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: JZ-LIANG <38102074+JZ-LIANG@users.noreply.github.com>

a02532b5

28 Oct, 2021 20 commits
- Update ci reviewer (#36839) · 2e40cfb5
  Committed by Chen Long on Oct 28, 2021
```
* update readme test=document_fix

* update ci reviewer list of api docs

* add docs info for api docs change; test=document_fix
```
  2e40cfb5
- Expose paddle.version.show API and add doc for it (#36800) · d88c3e12
  Committed by pangyoki on Oct 28, 2021
```
* add doc for show() in paddle.version

* fix format

* print cuda and cudnn in show API
```
  d88c3e12
- Fix several bugs for enabling Paddle to train with CINN. (#36739) · c93331c5
  Committed by Zhen Wang on Oct 28, 2021
```
* Update the content of `test_parallel_executor_run_cinn.py`.

* Fix some bugs in the topological sort and `CreateNewSubGraph`.

* Update the CINN commit id used by Paddle.

* Update the unit test to `add+relu`.

* Update according to reviewers' suggestion.
```
  c93331c5
- [NPU] Add int64 supporting for expand_v2, reduce_max, scale and tests (#36582) · c038cc7a
  Committed by ronnywang on Oct 28, 2021
```
* add TypeAdapter method for npu_op_runner

* add int64 supporting for elementwise_mul and reduce_sum

* add int64 supporting and UT for expand_v2, scale and reduce_max

* fix bug
```
  c038cc7a
- polish _remove_no_value_return_var() function (#36826) · adb28d67
  Committed by 0x45f on Oct 28, 2021
  
  adb28d67
- lower cpu_parallel_job's parallel num to 10 to avoiding timeout (#36798) · d118c8b7
  Committed by Sing_chan on Oct 28, 2021
  
  d118c8b7
- support quantization of bert (#36593) · 6390b175
  Committed by XGZhang on Oct 28, 2021
  
  6390b175
- [fix-doc-bug] Fix fused_attention_op english doc test=document_fix (#36803) · 11c2874e
  Committed by Li Min on Oct 28, 2021
```
* Fix fused_attention english doc test=document_fix
```
  11c2874e
- ctc grad compute on gpu (#36756) · 54ef9d06
  Committed by Hui Zhang on Oct 28, 2021
```
* Revert "Align CTC grad scale same with ESPNet (#34729)"

This reverts commit 10f9644c.

* ctc grad compute on gpu
```
  54ef9d06
- save/load in ps runtime(the_one_ps) (#36097) · e7842ba6
  Committed by wangguanqun on Oct 28, 2021
```
* add trainer desc config to distributed strategy

* code style modified

* data_feed set lod

* fix bug

* code style

* fix bug

* save load

* save load

* save unittest

* add unittest of the_one_ps

* unittest

* add todo in communicator sendsparse
```
  e7842ba6
- Rewrite Softmax in Kernel Primitive API, test=develop (#36706) · ef76f664
  Committed by Liu-xiandong on Oct 28, 2021
  
  ef76f664
- support inference for quantized matmul_v2 (#36594) · b151a451
  Committed by XGZhang on Oct 28, 2021
```
* support inference for quantized matmul_v2

* undate code style

* code style
```
  b151a451
- fix MultiSlotDataGenerator error (#36773) · dc0178ef
  Committed by seemingwang on Oct 28, 2021
  
  dc0178ef
- Fix cancel (#36740) · 704e454f
  Committed by liutiexing on Oct 28, 2021
```
* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* update

* update

* update Error MSG

* update EventsWaiter

* Add Cancel For ThreadPool

* Add UT for Cancel

* fix Cancel
```
  704e454f
- Add lazy distributed launch with rank mapping (#36570) · 7de3f81c
  Committed by Bo Liu on Oct 28, 2021
  
  7de3f81c
- Fix fused_attention_op and fused_feedforward_op bug when pre_layer_norm is false. (#36793) · ff3018d7
  Committed by Li Min on Oct 28, 2021
```
* Fix bug when pre_layer_norm is false.
```
  ff3018d7
- Modify Struct into Class to improve encapsulation and Polish code exception (#36797) · 9516108a
  Committed by Aurelius84 on Oct 28, 2021
```
* Refactor InterpreterCore code

* make tuple
```
  9516108a
- change api to support trt8 in pool3d_op_convert (#36783) · a7d8837b
  Committed by feng_shuai on Oct 28, 2021
```
* change api for support trt8

* fix:change api
```
  a7d8837b
- fix device docs;test=document_fix (#36784) · d4b0d03b
  Committed by Ligoml on Oct 28, 2021
```
* fix device docs;test=document_fix

* update __init__.py
```
  d4b0d03b
- first commit (#36778) · 6edbdbfa
  Committed by limingshu on Oct 28, 2021
  
  6edbdbfa
27 Oct, 2021 8 commits
- add unittest (#36511) · 51a33962
  Committed by pangyoki on Oct 27, 2021
  
  51a33962
- [ROCM] add custom op support, test=develop (#36771) · dd1d3789
  Committed by Qi Li on Oct 27, 2021
```
* [ROCM] add custom op support, test=develop

* remove debug codes, test=develop
```
  dd1d3789
- GeneratePass support attr condition and mapping (#36747) · 5c569aef
  Committed by wuhuanzhou on Oct 27, 2021
```
* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop
```
  5c569aef
- add paddle.version.cuda and paddle.version.cudnn API (#36556) · d65f41db
  Committed by pangyoki on Oct 27, 2021
```
* add paddle.version.cuda and paddle.version.cudnn API

* fix little bug

* fix bug

* add doc string

* fix mkdir error

* fix windows path

* fix new paddle/version path

* fix unittest

* fix format
```
  d65f41db
- fix dygraph adamw (#36745) · b42a7370
  Committed by zhaoyingli on Oct 27, 2021
  
  b42a7370
- add dcnv2 trt plugin (#36612) · 8c3decd8
  Committed by wangxinxin08 on Oct 27, 2021
```
* add dcnv2 plugin
```
  8c3decd8
- fix ernie serialize problem (#36769) · d6b1beb0
  Committed by zlsh80826 on Oct 27, 2021
  
  d6b1beb0
- [Auto Parallel] Completion Dist Attribute for Backward & Update stage (#36744) · 5e9845b8
  Committed by JZ-LIANG on Oct 27, 2021
```
* revise completion for backward

* revise completion for update

* revise completion for update

* update unitest
```
  5e9845b8

PaddlePaddle / Paddle synced successfully about 1 year ago
