Commits · 2136bd42910d759f54dec111779dd3f1d2218db6 · 机器未来 / Paddle

Feb 24, 2022 · 3 commits

Fix a bug in IndexKernel out-of-memory (#39867) · 2136bd42
Committed by niuliling123 on Feb 24, 2022

optimize performance of lookup_table_v2_op (#39856) · d6038c22
Committed by Li Min on Feb 24, 2022
```
* optimize block config  and fp16 atomicAdd perf for lookup_table_v2_grad.
```

[PHi] Skip kernel declare for cuda only kernel on rocm (#39869) · 76a6b88d
Committed by Chen Weihang on Feb 24, 2022
```
* skip kernel declare for cuda only kernel on rocm

* fix error
```

Feb 23, 2022 · 26 commits

added paddle_bfloat to requirements (#39740) · 2457a7d1
Committed by jakpiase on Feb 23, 2022

2457a7d1
S
Add ProcessGroupNCCL for distributed training (#39737) · 0b205817
由 ShenLiang 提交于 2月 23, 2022
```
* add processgroup_nccl
```
0b205817
石
infrt runtime supports phi, test=develop (#39836) · 058e1d85
由石晓伟提交于 2月 23, 2022
```
* runtime supports pten kernels, test=develop

* fixes a bug, test=develop
```
058e1d85
Z

Support dispensable inputs for eager final state codegen (#39743) · ca11a0e5
Committed by Zhanlue Yang on Feb 23, 2022

move array_ref_test and small_vector_test into paddle/utils and format header macro define (#39831) · 96d530c1
Committed by chentianyu03 on Feb 23, 2022

move trunc_op's infere shape to phi (#39772) · 95280a36
Committed by Sing_chan on Feb 23, 2022
```
* move trunc_op's infere shape

* modify according to risheng's comment
```

[phi] move randperm to phi (#39816) · 30992ea0
Committed by Leo Chen on Feb 23, 2022
```
* move randperm to phi

* fix npu

* fix memory::Copy
```

[Phi] move flip op to phi kernel (#39822) · ad294a81
Committed by Yang on Feb 23, 2022

[Phi] Polish default signature attr and output select impl (#39810) · 64ed92bd
Committed by Chen Weihang on Feb 23, 2022
```
* polish default sig impl

* revert dispenable out
```

[MLU] add cncl parallel context and mlu resource pool (#39803) · 6241913b
Committed by mhhhh1 on Feb 23, 2022
```
* [MLU] add cncl parallel context and mlu resource pool

* [MLU] fix the cncl_context_test
```

change CUDA implementaion of bernoulli OP (#39732) · b9675acc
Committed by zhouweiwei2014 on Feb 23, 2022
```
* change CUDA implementaion of bernoulli OP

* fix CI
```

refactor range unittest for kunlun (#39800) · 69a04209
Committed by zhangxiaoci on Feb 23, 2022
```
*test=kunlun
```

[phi] migrate atan2_op into phi (#39806) · b089e7cd
Committed by ronnywang on Feb 23, 2022

[phi] move unbind to phi (#39789) · dba694f4
Committed by Leo Chen on Feb 23, 2022

* move unbind to phi

* revert infer shape

* add header file

* move concat_and_split to phi

[KP] Add elementwise add xpu after phi, test=develop (#39787) · 1a1a2ce8
Committed by Liu-xiandong on Feb 23, 2022

* [KP] Add elementwise add xpu, test=develop

* modify the File Permissions

* modify the copyright time

* modify code style

* modify code style

[Phi] Migrate lable_smooth_op into Phi (#39796) · b7bcd0f6
Committed by Aurelius84 on Feb 23, 2022
```
* [Phi] Migrate lable_smooth_op into Phi

* fix PT->PD
```

[IPU] update inference demos (#39792) · 24f55aed
Committed by Allen Guo on Feb 23, 2022
```
* update inference part

* restore white space
```

update gather_nd trt converter ut (#39584) · 4130b640
Committed by baoachun on Feb 23, 2022
```
* update gather_nd trt converter ut

* update ut
```

refactoring gather/masked_select/arg_max unittests for kunlun, *test=kunlun (#39711) · da492a13
Committed by TTerror on Feb 23, 2022

fix 'is with a literal' warning (#39798) · 22abb6b3
Committed by Leo Chen on Feb 23, 2022
```
* fix 'is with a literal'

* fix typo
```

fix activation ut typo xpu. test=kunlun (#39813) · 9880595a
Committed by houj04 on Feb 23, 2022

[Eager] Support Eager mode for some model testcase (#39248) · abe232d8
Committed by wanghuancoder on Feb 23, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>

[bf16] add bf16 kernel: elementwise_div (#39602) · ca4df333
Committed by zhangbo9674 on Feb 23, 2022

* add elementwise_div

* refine rocm

* refine code

* refine op register

* solve conflict

* refine unittest

* refine unittest precision

* add rocm

Update record interface using part3 (#39695) · 1fcaab45
Committed by chenjian on Feb 23, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* update part3

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

* fix profiler.h header

* fix merge buf

* update

* fix bug

* fix bug

Supported intermediate outputs for eager final state codegen (#39767) · 94243789
Committed by Zhanlue Yang on Feb 23, 2022
```
* Supported intermediate outputs for eager final state codegen

* Added validation check for intermediate tensors
```

[PHI] Remove fill_any_like kernel register in fluid (#39807) · 69e9e9d5
Committed by zyfncg on Feb 23, 2022

* remove fill_any_like kernel in fluid and fix data transform bug

* support scalar in infershpe

* recover infershape in fill_and_like

Feb 22, 2022 · 11 commits

[custom kernel]Delete useless and upgrade (#39791) · edc3ba13
Committed by Aganlengzi on Feb 22, 2022
```
* [custom kernel]Delete useless

* change RegType enum names

* mod notes

* merge

* update
```

import llvm::ArrayRef and add test (#39802) · a167a143
Committed by chentianyu03 on Feb 22, 2022

unset fluid in tensor (#35082) · 42eb56e2
Committed by zhiboniu on Feb 22, 2022

Auto Parallel support conditional block (#39612) · a08ee62a
Committed by JZ-LIANG on Feb 22, 2022

* add subblock logic for context and partitioner

* partitioner support sub blocks

* revise typos

* fixed param init bug for while

* chmod 644

* add unitest

* mv forward parser

* update unitest

* update dist op ctx

* update dist op ctx

* fixed bug in dist op ctx

* fixed bug for recompute subblock

disable some distribute test case when in CPU test env (#39801) · ae8c811a
Committed by YUNSHEN XIE on Feb 22, 2022

Move real and imag op to phi (#39777) · 345cc8fa
Committed by From00 on Feb 22, 2022

* Move Real OP to phi

* Move Imag OP to phi

* Move Real and Imag InferShape to phi

* Move Real and Imag to complex_kernel

* Change PT_REGISTER_XXX to PD_REGISTER_XXX

added round fwd onednn kernel (#39653) · 74c0bc1c
Committed by jakpiase on Feb 22, 2022

Add the implementation of TCP Store (#39384) · b95cd3b7
Committed by lilong12 on Feb 22, 2022
```
* add tcp_socket and tcp_store
```

delete gather_ut skip_case (#39657) · da43e065
Committed by feng_shuai on Feb 22, 2022
```
* delete gather_ut skip_case

* add trt version limit
```

Adapt to batch_norm_grad op and add align function in roi_align op for kunlun (#39685) · f33ae206
Committed by Leo Guo on Feb 22, 2022

* Adapt to batch_norm_grad op and add align function in roi_align op for kunlun, *test=kunlun

* Adapt to batch_norm, batch_norm_grad op api for kunlun, and add unit-tests of batch_norm, roi_align. *test=kunlun

change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624
Committed by xiongkun on Feb 22, 2022

* change Vector to std::vector and provide MixVector class as a helper wrapper class

* solve the multi-gpu hang problem

* remove the duplicate template instantialize

* Copy vector to cpu

* add CopyToCPU

* xxx

* final version: fix the problem of all reduce

* remove mixvector dependence

* fix

* merge

* fix code

* fix by CI

机器未来 / Paddle is in sync with the upstream project it was forked from.