Commits · 2ffb337183f8d970c8b6eca002963061f48afba6 · 机器未来 / Paddle

22 May 2022 · 1 commit

Quantize elementwise sub (#42854) · 2ffb3371

Committed by Zuza Gawrysiak on May 22, 2022

* Add elementwise_sub quantization

* Remove unnecessary comments

* Specify names for tests

* Remove comments

* Remove comments leftovers

2ffb3371

21 May 2022 · 1 commit

delete PADDLE_WITH_TESTING in memory_block_desc (#41817) · 7b6bf281

Committed by pangyoki on May 21, 2022

* delete PADDLE_WITH_TESTING in memory_block_desc

* test FLAGS_allocator_strategy=naive_best_fit

* delete flag naive_best_fit

7b6bf281

20 May 2022 · 10 commits


fix fused_attention_op cacheKV InferShape (#42900) · 7306d1fb
Committed by WangXi on May 20, 2022

7306d1fb

use fp32 compute type for cublasGemmStridedBatchedEx with fp16 input/output (#42851) · f36a9464

Committed by Leo Chen on May 20, 2022

* use fp32 compute type for cublasGemmStridedBatchedEx with fp16 input/output

* add flags to control compute type

* default to false

* add unit test

* default to true

f36a9464


move activation kernel (#42880) · 191c441a
Committed by YuanRisheng on May 20, 2022

191c441a

fix hook mem leak (#42857) · 723c4ae7
Committed by Jiabin Yang on May 20, 2022

723c4ae7

[Hackathon No.5] tril_indices OP (#41639) · 75db5b86

Committed by xiaoguoguo626807 on May 20, 2022

* add tril_indices cpu kernel

* modify tril_indice cpu op

* modify bug

* modify bug

* add tril_indices python api

* add tril_indices python api

* resolve conflict

* add tril_indices test

* modify details

* add tril_indices.cu

* pythonapi pass

* save tril_indices

* CPU tril_indices pass

* delete vlog

* modify test_tril_indices_op.py

* delete tril_indices_kernel.cc.swp

* delete tril_indice.cu

* modify code style

* add newline in creation.py

* modify creation.py linux newline

* delete annotation

* check code style

* check .py style add final_state??

* modify code style

* add gpu_tril_indices

* modify gpu_compiled_juage

* modify gpu judge

* code style

* add test example

* modify english document

modify english document

modify english document

modify document

modify document

* modify param name

* modify param name

* modify param

* reduce test ex

75db5b86

fix Wtype-limits (#42676) · 1f76eabf
Committed by zhaocaibei123 on May 20, 2022
```
* fix Wtype-limits

* fix

* remove -Wno-error=type-limits
```
1f76eabf

add dymf accessor support (#42881) · 56a8b3e3
Committed by yaoxuefeng on May 20, 2022

56a8b3e3

add arg_max tensorrt converter, fix identity_scale_op_clean_pass (#42850) · 5efc4146
Committed by zhupengyang on May 20, 2022

5efc4146

[MLU]support to spawn processes on mlu (#41787) · 5d1bbecb
Committed by zn on May 20, 2022

5d1bbecb

merge dymf branch (#42714) · 3f619290
Committed by yaoxuefeng on May 20, 2022
```
merge dymf branch
```
3f619290

19 May 2022 · 11 commits


[MLU] add lookup_table_v2 and unstack op (#42847) · e726960a
Committed by qipengh on May 19, 2022

e726960a

Fix PD_INFER_DECL redefine (#42731) · 313f5d01
Committed by Rui Li on May 19, 2022
```
Signed-off-by: KernelErr <me@lirui.tech>
```
313f5d01

OneDNN md-in-tensor refactoring part 3: Changes in quantize and dequantize (#42766) · b522ca52

Committed by jakpiase on May 19, 2022

* added md support inside (de)quantizes

* added missing file

* changed paddle enforce text

* another paddle enforce change

* same as before

* removed broken tests

b522ca52

【CI】run all demo ci before exit in windows (#42700) · 6d0e4e4a

Committed by Sing_chan on May 19, 2022

* run all demo ci before exit;test=document_fix;test=windows_ci_inference

* fix bug;test=document_fix;test=windows_ci_inference

* improve log

* comment test code

* modify according to zhouwei's comments

6d0e4e4a

[Phi] Change the output format of C++ backward api (Part2) (#42545) · 4427f1b1

Committed by zyfncg on May 19, 2022

* change the output format of C++ backward api

* fix merge conflict

* fix sparse api code auto-gen

* fix eager_gen bug

* fix bug of output is null

* fix bug of conv2d_grad_impl

* fix optional grad

* fix bug of eager-gen double_grad

* fix bug

* fix multiply_double_grad bug

* fix bug of higher order derivative

* fix bug of FillZeroForEmptyGradInput

* remove redundant vector in grad_node

* fix bug of test_deformable_conv_v1_op

* fix bug of test_deformable_conv_v1_op

* some refactor

4427f1b1


[NPU] minor changes for version control to support version without suffix (#42856) · 892f6850
Committed by Aganlengzi on May 19, 2022

892f6850

【GPUPS】add ctr_dymf_accessor for pscore (#42827) · 148582fe
Committed by danleifeng on May 19, 2022

148582fe

[Phi] Remove shared_storage (#42821) · 7a171e3c
Committed by zyfncg on May 19, 2022
```
* remove shared_storage

* fix bug

* fix rnn bug
```
7a171e3c

[CompileOpt] Refine enforce code and remove boost/variant include (#41093) · ca359fec
Committed by Chen Weihang on May 19, 2022
```
* refine enforce code

* refine enforce code

* fix compile failed

* fix infrt failed
```
ca359fec

distribute label evenly among partitions in graph engine (#42846) · 68babef1

Committed by seemingwang on May 19, 2022

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind

* optimize buffer allocation

* fix node transfer problem

* remove log

* support 32G+ graph on single gpu

* remove logs

* fix

* fix

* fix cpu query

* display info

* remove log

* remove empty file

* distribute labeled data evenly in graph engine
Co-authored-by: DesmonDay <908660116@qq.com>

68babef1

[TensorRT] Support yolov5s (#42688) · a7778930

Committed by shentanyue on May 19, 2022

* support yolov5s static/int8

* fix eltwise_sub and div weight compute

* fix delete_fill_constant_pass

a7778930

18 May 2022 · 7 commits
- fix tensorrt dla int8 problem (#42826) · a51817d7
  Committed by csy0225 on May 18, 2022
  
  a51817d7
- Add Code Generation for operators, op makers and argument mapping functions (#41772) · e339d3c1
  Committed by Feiyu Chan on May 18, 2022
  ```
  Add Code Generation for operators, op makers and argument mapping functions (#41772)
  ```
  e339d3c1
- [Eager] Polish eager code generation (#42822) · b9342a80
  Committed by Weilong Wu on May 18, 2022
  ```
  * [Eager] Polish eager code generation

  * Remove useless code in codegen
  ```
  b9342a80
- matmul and matmul_v2 refactor (#42732) · 570d0322
  Committed by Sławomir Siwek on May 18, 2022
  ```
  * matmul refactor

  * remove UT which only check ENFORCE output

  * code format

  * improve memory usage
  ```
  570d0322
- [NPU] add take_along_axis and take_along_axis_grad kernels (#42773) · 6f0a28f5
  Committed by Aganlengzi on May 18, 2022
  ```
  * [NPU] add take_along_axis and take_along_axis_grad ops

  * [NPU] add take_along_axis and take_along_axis_grad ops

  * fix ut because cpu kernel can not be fallbacked
  ```
  6f0a28f5
- [collective] dynamic shape for send_v2 and recv_v2 (#42765) · 1f64c42e
  Committed by Yuang Liu on May 18, 2022
  
  1f64c42e
- Fix graph hang (#42768) · 133d63fa
  Committed by Thunderbrook on May 18, 2022
  ```
  * fix device_free

  * fix hang
  ```
  133d63fa

17 May 2022 · 8 commits
- polish kernel type str (#42791) · d3686376
  Committed by Chen Weihang on May 17, 2022
  
  d3686376
- [NPU] add multinomial op (#42613) · fd140696
  Committed by Aganlengzi on May 17, 2022
  ```
  * [NPU] add multinomial op

  * fix place

  * deal with cann version

  * fix for old operator

  * change another way
  ```
  fd140696
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
  
  6b58de95
- [Eager] Add nan and inf check utils (#42763) · a51c492c
  Committed by Chen Weihang on May 17, 2022
  ```
  * add nan_inf_utils for eager

  * support check nan and inf

  * add unittest for coverage
  ```
  a51c492c
- refine cpu query (#42803) · 9b15efce
  Committed by Siming Dai on May 17, 2022
  
  9b15efce
- [IPU] rm updateOptimizerFromHost for eval mode (#42800) · b2d8f6df
  Committed by Allen Guo on May 17, 2022
  ```
  * rm updateOptimizerFromHost for eval mode (#742)

  * rm updateOptimizerFromHost for eval mode

  * fix ci

  * clean files
  ```
  b2d8f6df
- [Eager] Adapt faster tokenizer op (#42718) · b189e83f
  Committed by Chen Weihang on May 17, 2022
  ```
  * adapt faster tokenizer op

  * add eager test

  * add unittest
  ```
  b189e83f
- [NPU] add reduce_max_grad op (#42672) · 78d5cf7b
  Committed by Aganlengzi on May 17, 2022

  78d5cf7b

16 May 2022 · 2 commits
- Enable bfloat16 for VIT-OCR model. (#42758) · c714926d
  Committed by Tomasz Socha on May 16, 2022
  ```
  * Clean-up bfloat16 tester

  * New blacklist mechanism for dequantization

  * Style

  * Style II

  * Style III
  ```
  c714926d
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
  
  8501fb00

机器未来 / Paddle is up to date with the source project it was forked from.
