Commits · 921c0917a37b6d5012f6290b6c061a1266d10a22 · Crayon鑫 / Paddle

21 Oct, 2021 · 5 commits

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) · 921c0917

Committed by niuliling123 on Oct 21, 2021

* Update the implementation of reduceAnyKernel according to the kernel primitive API
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1

921c0917

Graph engine4 (#36587) · 5eb640c6
Committed by seemingwang on Oct 21, 2021

5eb640c6

add ctr table depends (#36465) · d64f7b3b
Committed by zhaocaibei123 on Oct 21, 2021
```
* add ctr table depends

* code style

* fix

* fix

* fix naming

* rename

* rename
```
d64f7b3b

Fix flame graph (#36578) · 72533986

Committed by liutiexing on Oct 21, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* adjust multithread using, fix flame graph

* update

72533986

Support No DataTransform From GetKernelTypeForVar (#36571) · e82c3a5f
Committed by Aurelius84 on Oct 21, 2021
```
* Add kQueueSync.synchronize_run_ logic

* Support No DataTransform From GetKernelTypeForVar
```
e82c3a5f

20 Oct, 2021 · 13 commits

[heterps]fix heterps pipeline training (#36512) · ded3e705

Committed by danleifeng on Oct 20, 2021

* split into PreBuildTask and BuildPull; solve endpass bug;test=develop

* change buildcpu into prebuild and buildcpu into build;test=develop

ded3e705

Fix global gather and global scatter operators (#36517) · 17b4dd70
Committed by 李季 on Oct 20, 2021
```
* fix global gather and global scatter operators
```
17b4dd70

[NPU] Add kldiv_loss_op for npu (#36494) · 6a572a19
Committed by ronnywang on Oct 20, 2021

6a572a19

fix fc fuse proble (#36568) · fc5db55a
Committed by Wilber on Oct 20, 2021

fc5db55a

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

Committed by Steffy-zxf on Oct 20, 2021

Add Tokenizer-related functionality for Transformer models so that the training and prediction processes are consistent.

* support a text string as an input Tensor
* support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
* Tokenizer used for BERT. This tokenizer applies end-to-end tokenization, from text string to wordpiece.
* It first applies basic tokenization, followed by wordpiece tokenization.
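
The two-stage flow described above (basic tokenization, then wordpiece tokenization) can be illustrated with a minimal Python sketch. This is not the operator's implementation: `basic_tokenize`, `wordpiece_tokenize`, `tokenize`, and `TOY_VOCAB` are hypothetical names, the basic step is heavily simplified, and the wordpiece step assumes the greedy longest-match-first scheme commonly used by BERT-style tokenizers.

```python
# Toy vocabulary (assumption): maps tokens to ids; "##" marks a word-internal piece.
TOY_VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, ",": 5}


def basic_tokenize(text):
    """Simplified basic tokenization: lowercase, split on whitespace,
    and emit each non-alphanumeric character as its own token."""
    tokens, word = [], ""
    for ch in text.lower():
        if ch.isalnum():
            word += ch
            continue
        if word:
            tokens.append(word)
            word = ""
        if not ch.isspace():
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Basic tokenization followed by wordpiece lookup; returns token ids."""
    return [vocab[piece]
            for word in basic_tokenize(text)
            for piece in wordpiece_tokenize(word, vocab)]
```

For example, `wordpiece_tokenize("unaffable", TOY_VOCAB)` splits the word into `un`, `##aff`, `##able`, and a word with no matching pieces maps to `[UNK]`.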

3f2d6a3f

adapt to cann5.0.3_alpha3. (#36106) · 873ee4e3
Committed by wuhuachaocoding on Oct 20, 2021

873ee4e3

fix pow2 decay (#36559) · 605e7f08
Committed by Zeng Jinle on Oct 20, 2021

605e7f08

add unittest (#36371) · 7325c9fb
Committed by Wilber on Oct 20, 2021

7325c9fb

update for trt convert ut. (#36549) · 06bd348d
Committed by Wilber on Oct 20, 2021

06bd348d

fix SerializeSelectedRows (#36543) · 8ca5206b

Committed by zmx on Oct 20, 2021

* bug fix for DeserializeSelectedRows. test=develop

* fix bug for SerializeSelectedRows. test=develop

* update. test=develop

8ca5206b

Add CINN Compile Option (#36292) · 6524fa8d

Committed by Huihuang Zheng on Oct 20, 2021

Add CINN compile option in CMake.

You can now use CINN in Paddle by passing `-DWITH_CINN=ON` to `cmake`.

To test it, run `make cinn_lib_test -j` and `ctest -R cinn_lib_test`.

Note:
1. Set the following environment variable:
```
export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda 
```
When running the test, `${CINN_SOURCE_DIR}` should point to your CINN directory.

2. CINN is still under development, so you may have to change `CINN_GIT_TAG` to the git commit you need.

6524fa8d

fix (#36557) · 4bd19770
Committed by wenbin on Oct 20, 2021
```
* fix

* remove const
```
4bd19770

Add kQueueSync.synchronize_run_ logic (#36546) · 127488ba
Committed by Aurelius84 on Oct 20, 2021

127488ba

19 Oct, 2021 · 13 commits
- Support elementwise_add triple grad Kernel (#36508) · 51c97d9f
  Committed by Weilong Wu on Oct 19, 2021
```
* Support elementwise_add triple grad Kernel

* Change code-format to follow CI std
```
  51c97d9f
- [NPU] Add iou_similarity op (#36412) · 999242e3
  Committed by zhulei on Oct 19, 2021
```
* [NPU] Add iou_similarity op

* [NPU] Add iou_similarity op

* [NPU] Add iou_similarity op
```
  999242e3
- [NPU] update inference cmake, test=develop (#36505) · 49d7bd38
  Committed by Qi Li on Oct 19, 2021
```
* [NPU] update inference cmake, test=develop

* address review comments, test=develop

* fix compile error when WITH_ASCEND_CXX11 ON, test=develop
```
  49d7bd38
- [heterps]edit shrink and unseenday logit for pslib (#36194) · 9e494472
  Committed by danleifeng on Oct 19, 2021
  
  9e494472
- Inference add type check in copy_from_cpu (#36429) · be6a8330
  Committed by Wilber on Oct 19, 2021
```
* update

* fix ut error

* update ut
```
  be6a8330
- Optimize the subgraph generated by BuildCinnPass (#36503) · 6cdc5a4b
  Committed by jiangcheng on Oct 19, 2021
```
* add feed op and new var for the generated subgraph

* perfect the test script of build_cinn_pass 

* remove useless clear and perfect some annotation
```
  6cdc5a4b
- add nearest_interp_v2 trt plugin (#34126) · 7b67f398
  Committed by wangxinxin08 on Oct 19, 2021
```
* add nearest_interp_v2 trt plugin
```
  7b67f398
- [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (#36228) · 8cc8e411
  Committed by WangXi on Oct 19, 2021
  
  8cc8e411
- fix replicate pad when input size is 0 (#36510) · d89a759b
  Committed by littletomatodonkey on Oct 19, 2021
```
* fix replicate pad when input size is 0
* add unit test
```
  d89a759b
- [paddle.linalg.qr] Add the Qr Operator (#35742) · 34d785c2
  Committed by Yulong Ao on Oct 19, 2021
```
* Add QR decomposition op

* Change codes to adapt to new svd_helper

* Update linalg.py

Restore the deleted comma

* Restore the deleted line

* Update linalg.py

* Update linalg.py

* Improve the qr code by reviews

* Update QR based on CI results

* Update qr doc, test=document_fix

* Change unsafe and ill-formed codes
```
  34d785c2
- add rocm support for fft api (#36415) · 1d5746bd
  Committed by Xiaoxu Chen on Oct 19, 2021
  
  1d5746bd
- bug fix for DeserializeSelectedRows. test=develop (#36520) · a7830a29
  Committed by zmx on Oct 19, 2021
  
  a7830a29
- Add pow2_decay_with_linear_warmup op (#36421) · 305b99a0
  Committed by Zeng Jinle on Oct 19, 2021
```
* add pow2_warmup op

* remove contrib __all__

* add AttrT

* rename

* follow comments

* fix duplicate PADDLE_RESTRICT
```
  305b99a0

18 Oct, 2021 · 8 commits

Added softplus FP32 FWD OneDNN kernel (#36382) · bdac9ff6

Committed by jakpiase on Oct 18, 2021

* added softplus

* refactored softplus op

* deleted unnecessary file

* added missing file

* added formatting

* disabled tests if GPU is used

* added reviewer suggestion

* unified softplus kernel

bdac9ff6

Add quant axis (#36467) · b7f76647

Committed by xiaoxiaohehe001 on Oct 18, 2021

* add_quant_axis

* add_quant_axis

* --amend

* Update quant_conv2d_dequant_fuse_pass.cc

b7f76647

[NPU] add kernels for elementwise_add gather_nd tile, test=develop (#36464) · cbd15f7d
Committed by Qi Li on Oct 18, 2021

cbd15f7d

[NPU] fix dtype for arg_max, test=develop (#36457) · 8757fc5b
Committed by Qi Li on Oct 18, 2021

8757fc5b

Add operators for async read & async write (#36333) · 3845afff

Committed by Siming Dai on Oct 18, 2021

* fix async_read bug

* change index place to cpu

* add tensor size judge

* add async_read & async_write test

* fix bug in async_write

* fix mac py3 ci

* fix bug for cpu version paddle

* fix windows ci bug

* change input argument error type

* change const_cast to mutable_data

* add async_write out-of-bound check and improve error hint

* fix a small bug for dst_tensor

* add docs and refine codes

* refine docs

* notest,test=windows_ci

* fix windows ci

* fix require

* fix code-block

* add core.is_compiled_with_cuda()

3845afff

add IPluginV2Layer: AddPluginV2Ext (#36493) · 623e36b0
Committed by Wangzheee on Oct 18, 2021

623e36b0

[XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph... · d19a9b39
Committed by taixiurong on Oct 18, 2021
```
[XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph 3. xpu support update weight params in amp (#36439)
```
d19a9b39

Fix conv2d op_teller error (#36474) · d3c93942
Committed by JingZhuangzhuang on Oct 17, 2021

d3c93942

17 Oct, 2021 · 1 commit
- refine rescale_grad (#36490) · 4e036fa1
  Committed by Zeng Jinle on Oct 17, 2021
  
  4e036fa1

Crayon鑫 / Paddle is in sync with the source project it was forked from
