Commits · 5efc4146d3d3db1a7789364d1d04d444cabf5368 · Crayon鑫 / Paddle

May 20, 2022 · 3 commits
- add arg_max tensorrt converter, fix identity_scale_op_clean_pass (#42850) · 5efc4146
  Committed by zhupengyang on May 20, 2022
  
  5efc4146
- [MLU]support to spawn processes on mlu (#41787) · 5d1bbecb
  Committed by zn on May 20, 2022
  
  5d1bbecb
- merge dymf branch (#42714) · 3f619290
  Committed by yaoxuefeng on May 20, 2022
```
merge dymf branch
```
  3f619290
May 19, 2022 · 11 commits

[MLU] add lookup_table_v2 and unstack op (#42847) · e726960a
Committed by qipengh on May 19, 2022

e726960a

Fix PD_INFER_DECL redefine (#42731) · 313f5d01
Committed by Rui Li on May 19, 2022
```
Signed-off-by: KernelErr <me@lirui.tech>
```
313f5d01

OneDNN md-in-tensor refactoring part 3: Changes in quantize and dequantize (#42766) · b522ca52

Committed by jakpiase on May 19, 2022

* added md support inside (de)quantizes

* added missing file

* changed paddle enforce text

* another paddle enforce change

* same as before

* removed broken tests

b522ca52

【CI】run all demo ci before exit in windows (#42700) · 6d0e4e4a

Committed by Sing_chan on May 19, 2022

* run all demo ci before exit;test=document_fix;test=windows_ci_inference

* fix bug;test=document_fix;test=windows_ci_inference

* improve log

* comment test code

* modify according to zhouwei's comments

6d0e4e4a

[Phi] Change the output format of C++ backward api (Part2) (#42545) · 4427f1b1

Committed by zyfncg on May 19, 2022

* change the output format of C++ backward api

* fix merge conflict

* fix sparse api code auto-gen

* fix eager_gen bug

* fix bug of output is null

* fix bug of conv2d_grad_impl

* fix optional grad

* fix bug of eager-gen double_grad

* fix bug

* fix multiply_double_grad bug

* fix bug of higher order derivative

* fix bug of FillZeroForEmptyGradInput

* remove redundant vector in grad_node

* fix bug of test_deformable_conv_v1_op

* fix bug of test_deformable_conv_v1_op

* some refactor

4427f1b1

[NPU] minor changes for version control to support version without suffix (#42856) · 892f6850
Committed by Aganlengzi on May 19, 2022

892f6850

【GPUPS】add ctr_dymf_accessor for pscore (#42827) · 148582fe
Committed by danleifeng on May 19, 2022

148582fe

[Phi] Remove shared_storage (#42821) · 7a171e3c
Committed by zyfncg on May 19, 2022
```
* remove shared_storage

* fix bug

* fix rnn bug
```
7a171e3c

[CompileOpt] Refine enforce code and remove boost/variant include (#41093) · ca359fec
Committed by Chen Weihang on May 19, 2022
```
* refine enforce code

* refine enforce code

* fix compile failed

* fix infrt failed
```
ca359fec

distribute label evenly among partitions in graph engine (#42846) · 68babef1

Committed by seemingwang on May 19, 2022

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind

* optimize buffer allocation

* fix node transfer problem

* remove log

* support 32G+ graph on single gpu

* remove logs

* fix

* fix

* fix cpu query

* display info

* remove log

* remove empty file

* distribute labeled data evenly in graph engine

Co-authored-by: DesmonDay <908660116@qq.com>

68babef1

[TensorRT] Support yolov5s (#42688) · a7778930

Committed by shentanyue on May 19, 2022

* support yolov5s static/int8

* fix eltwise_sub and div weight compute

* fix delete_fill_constant_pass

a7778930

May 18, 2022 · 7 commits
- fix tensorrt dla int8 problem (#42826) · a51817d7
  Committed by csy0225 on May 18, 2022
  
  a51817d7
- Add Code Generation for operators, op makers and argument mapping functions (#41772) · e339d3c1
  Committed by Feiyu Chan on May 18, 2022
```
Add Code Generation for operators,  op makers and argument mapping functions (#41772)
```
  e339d3c1
- [Eager] Polish eager code generation (#42822) · b9342a80
  Committed by Weilong Wu on May 18, 2022
```
* [Eager] Polish eager code generation

* Remove useless code in codegen
```
  b9342a80
- matmul and matmul_v2 refactor (#42732) · 570d0322
  Committed by Sławomir Siwek on May 18, 2022
```
* matmul refactor

* remove UT which only check ENFORCE output

* code format

* improve memory usage
```
  570d0322
- [NPU] add take_along_axis and take_along_axis_grad kernels (#42773) · 6f0a28f5
  Committed by Aganlengzi on May 18, 2022
```
* [NPU] add take_along_axis and take_along_axis_grad ops

* [NPU] add take_along_axis and take_along_axis_grad ops

* fix ut because cpu kernel can not be fallbacked
```
  6f0a28f5
- [collective] dynamic shape for send_v2 and recv_v2 (#42765) · 1f64c42e
  Committed by Yuang Liu on May 18, 2022
  
  1f64c42e
- Fix graph hang (#42768) · 133d63fa
  Committed by Thunderbrook on May 18, 2022
```
* fix device_free

* fix hang
```
  133d63fa
May 17, 2022 · 8 commits
- polish kernel type str (#42791) · d3686376
  Committed by Chen Weihang on May 17, 2022
  
  d3686376
- [NPU] add multinomial op (#42613) · fd140696
  Committed by Aganlengzi on May 17, 2022
```
* [NPU] add multinomial op

* fix place

* deal with cann version

* fix for old operator

* change another way
```
  fd140696
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
  
  6b58de95
- [Eager] Add nan and inf check utils (#42763) · a51c492c
  Committed by Chen Weihang on May 17, 2022
```
* add nan_inf_utils for eager

* support check nan and inf

* add unittest for coverage
```
  a51c492c
- refine cpu query (#42803) · 9b15efce
  Committed by Siming Dai on May 17, 2022
  
  9b15efce
- [IPU] rm updateOptimizerFromHost for eval mode (#42800) · b2d8f6df
  Committed by Allen Guo on May 17, 2022
```
* rm updateOptimizerFromHost for eval mode (#742)

* rm updateOptimizerFromHost for eval mode

* fix ci

* clean files
```
  b2d8f6df
- [Eager] Adapt faster tokenizer op (#42718) · b189e83f
  Committed by Chen Weihang on May 17, 2022
```
* adapt faster tokenizer op

* add eager test

* add unittest
```
  b189e83f
- [NPU] add reduce_max_grad op (#42672) · 78d5cf7b
  Committed by Aganlengzi on May 17, 2022
  
  78d5cf7b
May 16, 2022 · 11 commits

Enable bfloat16 for VIT-OCR model. (#42758) · c714926d

Committed by Tomasz Socha on May 16, 2022

* Clean-up bfloat16 tester

* New blacklist mechanism for dequantization

* Style

* Style II

* Style III

c714926d

delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
Committed by niuliling123 on May 16, 2022

8501fb00

Fix statistics (#42232) · 5c811382
Committed by liutiexing on May 16, 2022
```
* WorkQueue supports always_spinning option

* update

* update

* fix stat
```
5c811382

fix trace op record event error (#42775) · 5d136510
Committed by Chen Weihang on May 16, 2022

5d136510

Add the new XDNN implementation. test=kunlun (#42683) · 87667c66

Committed by wbn on May 16, 2022

* Add the new XDNN implementation. test=kunlun

* Add the new XDNN implementation. test=kunlun

* Modify the code based on review, test=kunlun

87667c66

Optimize linspace to avoid GPU -> CPU copy. (#42750) · 34cda80b
Committed by Yiqun Liu on May 16, 2022

34cda80b

[Phi] Refactor format of inplace C++ api (#42735) · 5924458b
Committed by zyfncg on May 16, 2022
```
* update code

* change the return type for inplace dygraph api

* change the tuple construct
```
5924458b

[Eager] Fix test_model.py under eager in windows-openblas ci (#42756) · 566ccfef
Committed by Weilong Wu on May 16, 2022
```
* [Eager] Fix test_model.py under eager in windows-openblas ci

* Use || in this case

* recover _in_eager_mode_
```
566ccfef

fused_multi_transformer add fused softmax mask (#42636) · f9d5ae4e
Committed by WangXi on May 16, 2022

f9d5ae4e

optimize cinn find graph by graph address (#42697) · 661d0800

Committed by jiangcheng on May 16, 2022

* optimize cinn find graph by graph address

* graph_key use int64_t instead of program string

* fix framework _to_readable_code python code

* rename get_readable_comile_key to get_serialize_comile_key

661d0800

fix node transfer problem (#42674) · b61a6e71

Committed by seemingwang on May 16, 2022

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind

* optimize buffer allocation

* fix node transfer problem

* remove log

* support 32G+ graph on single gpu

* remove logs

* fix

* fix

* fix cpu query

* display info

* remove log

* remove empty file

Co-authored-by: DesmonDay <908660116@qq.com>

b61a6e71
