Commits · 2bb5aae8fd6a521183565ade69dfc72ec8a83ce0 · 机器未来 / Paddle

Feb 21, 2022 · 12 commits

[pten]rm reduce_sum and reduce_mean raw kernel (#39484) · 2bb5aae8

Committed by chentianyu03 on Feb 21, 2022

* rm reduce_sum raw kernel

* remove reduce_mean kernel

* remove reduce_mean kernel

* reduce support int and int64_t

* mean support int and int64_t type

2bb5aae8

disable some distribute test case when in CPU test env (#39682) · 941bdb41

Committed by wanghuancoder on Feb 21, 2022

* disable some distribute test case when in CPU test env, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

941bdb41

fix fill_constant bug, *test=kunlun (#39681) · b1805727

Committed by z8hanghuan on Feb 21, 2022

* fix fill_constant bug, *test=kunlun

* fix fill_constant bug, *test=kunlun
b1805727

[bf16] add bf16 kernel: elementwise_max (#39461) · 93016331

Committed by zhangbo9674 on Feb 21, 2022

* add elementwise_max & unittest

* refine cuda register and unittest

* refine unittest

* refine unittest for bf16

* refine optest

* refine code

* refine unittest

* refine unittest

93016331
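
The bf16 kernels in this log (this commit and the GPU BF16 AMP change) use the bfloat16 format. bfloat16 is the upper 16 bits of an IEEE float32: 1 sign bit, 8 exponent bits and 7 mantissa bits. The following Python sketch shows a common float32 to bfloat16 conversion with round-to-nearest-even. It illustrates the format only and is not Paddle's kernel code; the function names are hypothetical.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 (round-to-nearest-even); return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: force the quiet bit so the NaN survives truncation
        return ((bits >> 16) | 0x0040) & 0xFFFF
    lsb = (bits >> 16) & 1          # lowest mantissa bit that is kept
    rounding_bias = 0x7FFF + lsb    # exact ties round toward an even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Widening back to float32 is exact, so any precision loss comes only from the rounding step.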

Update record interface using part2 (#39694) · c984cd85

Committed by chenjian on Feb 21, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

c984cd85

update unittests for activation ops on xpu test=kunlun (#39677) · 83dd7e47

Committed by houj04 on Feb 21, 2022

* update unittests for activation ops on xpu. test=kunlun

* update input data range. test=kunlun

* update input data range. test=kunlun

83dd7e47

[PTen]Remove infershape of Reshape OP (#39631) · 45dd4a5f

Committed by YuanRisheng on Feb 21, 2022

* remove infershape and Xshape

* add xshape

* fix bugs when run ci

* fix bugs when run ci

* fix bugs when run infrt test

* pass coverage

45dd4a5f

update unittest for reduce_prod op on xpu. test=kunlun (#39671) · f858b645

Committed by houj04 on Feb 21, 2022

* update unittest for reduce_prod op on xpu. test=kunlun

* update unittest for reduce_prod op on xpu. test=kunlun

* bugfix: use dtype instead of float32. test=kunlun

f858b645

[HeterPS]fix ut for heterps comm op (#39684) · d41836ef

Committed by zmxdream on Feb 21, 2022

* fix. test=develop

* fix. test=develop

* fix code style. test=develop

* fix. test=develop

* fix. test=develop

d41836ef

fix alignment bug (#39747) · 65ced1fa

Committed by sneaxiy on Feb 21, 2022

65ced1fa

fix bug: core when missing range XPU kernel in kunlun2 (#39673) · 496aadfb

Committed by ShiningZhang on Feb 21, 2022

496aadfb

[Pten] Add copy_to wrapped infermeta (#39703) · e16ab42b

Committed by zyfncg on Feb 21, 2022

* add copy_to wrapped infermeta

* test=allcases

* test=allcases

* test=allcases
e16ab42b

Feb 20, 2022 · 4 commits

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

add index initialization in the block loop for index_sample kernel when... · c6950ab2

Committed by FlyingQianMM on Feb 20, 2022

add index initialization in the block loop for index_sample kernel when dealing with an input tensor whose shape is larger than block_dim * grid_dim (#39736)

* add block and grid loop for index_sample kernel to deal with a large-shape tensor

* fix code format

* limit grid dim

* fix the missing initialization of index_i in the second cycle for index_sample kernel

* fix conflicts

c6950ab2
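
The index_sample fix concerns a block/grid loop: a kernel whose threads step through the output with stride block_dim * grid_dim when the tensor has more elements than there are threads. The following Python sketch emulates that pattern on the CPU. It assumes index_sample computes out[i][j] = x[i][index[i][j]] row by row. The function is hypothetical and is not Paddle's CUDA kernel. Its main point is that the row and column are recomputed from the flat position on every pass, which is the per-iteration initialization the commit message says was missing for index_i.

```python
def index_sample_grid_stride(x, index, block_dim, grid_dim):
    """Emulate a 1-D grid-stride loop computing out[i][j] = x[i][index[i][j]]."""
    rows = len(index)
    cols = len(index[0]) if rows else 0
    out = [[None] * cols for _ in range(rows)]
    total = rows * cols
    stride = block_dim * grid_dim            # total number of simulated threads
    for block_id in range(grid_dim):
        for thread_id in range(block_dim):
            pos = block_id * block_dim + thread_id
            while pos < total:
                # Re-derive (i, j) on every iteration; reusing the first
                # iteration's row index would read from the wrong row.
                i, j = divmod(pos, cols)
                out[i][j] = x[i][index[i][j]]
                pos += stride
    return out
```

With `block_dim=1, grid_dim=1`, a single simulated thread visits every element, so each thread runs more than one iteration whenever the output has more than one element.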

Rename the general elementwise and broadcast functions. (#39623) · 553afc07

Committed by Yiqun Liu on Feb 20, 2022

553afc07

Add int16 support for several ops (#39636) · 267275d9

Committed by sneaxiy on Feb 20, 2022

* add more op int16 support

* fix xpu ci
267275d9

Feb 19, 2022 · 9 commits

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

[Pten] Add selected_rows kernel for Full (#39465) · 79f8eeca

Committed by zyfncg on Feb 19, 2022

* Add selected_rows kernel for full

* remove fill_constant register in fluid

* fix bug without GPU

* add jit_kernel_helper dependency for fc

* do some refactor

* add unittest for ops signatures

* add coverage unittest

* fix merge conflict

* fix full selected_rows bug

79f8eeca

Update record interface using part1 (#39693) · eec6ef81

Committed by chenjian on Feb 19, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update operator.cc

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

eec6ef81

Enabled test_matmul_v2_op for final state Eager Dygraph (#39504) · 77625d7d

Committed by Zhanlue Yang on Feb 19, 2022

* Enabled test_matmul_v2_op for final state Eager Dygraph

* Fixed minor issue

* Fixed format issue
77625d7d

[PTen] Support parse cc file in gpu (#39691) · b29c05c7

Committed by Chen Weihang on Feb 19, 2022

* support parse cc in gpu

* change file name
b29c05c7

fix RecordEvent interface (#39675) · 019a552b

Committed by chenjian on Feb 19, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update operator.cc

019a552b

[Pten] Adjust the params of creation kernel for inference (#39573) · 4e5d6743

Committed by zyfncg on Feb 19, 2022

* remove manual_api

* change sig map of full and empty

* fix fill_any_like_xpu_op

* fix fill_any_like_xpu_op

* fix problem of fill_any_like_xpu_op

* fix conflict

* polish code

4e5d6743

[Eager Hook] Support ReduceHook in GradNodeAccumulation (#39674) · 06b177c0

Committed by Weilong Wu on Feb 19, 2022

* [Eager] Support GradientHook before running separate GradNode

* Fix CI issue

* Support eager ReduceHook in accumulation_node

* Fix CI issue

* Add some tests to fix coverage CI issue

06b177c0

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changes

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61
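
The DistributedFusedLamb commit mentions allreducing gradients, dividing by the number of ranks (`FLAGS_distributed_lamb_divide_nranks_when_allreduce`), and LAMB parameter norms. The following Python sketch shows the textbook LAMB update together with a sum-then-divide allreduce. It assumes that flag means "average gradients across ranks", which the log does not state. This is a generic illustration, not Paddle's fused implementation. Both function names are hypothetical, and the sketch omits clipping and master weights.

```python
import math

def allreduce_mean(per_rank_grads):
    """Sum gradients element-wise across ranks, then divide by the rank count."""
    nranks = len(per_rank_grads)
    return [sum(vals) / nranks for vals in zip(*per_rank_grads)]

def lamb_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One LAMB update for a single parameter (lists of floats). Returns (w, m, v)."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]      # bias-corrected moments
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Adam-style direction plus decoupled weight decay
    r = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    r_norm = math.sqrt(sum(ri * ri for ri in r))
    # Layer-wise trust ratio; fall back to 1.0 when either norm is zero
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return w, m, v
```

The trust ratio scales each step so that its norm is `lr * ||w||`. Each parameter tensor therefore moves by the same relative amount, whatever the raw gradient magnitude.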

Feb 18, 2022 · 15 commits

Shared selected rows (#39608) · 7fc04070

Committed by Jiabin Yang on Feb 18, 2022

* merge legacy to fluid

* Remove legacy code

* Remove legacy code

* Remove DataType test

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* add more test

* Support copiable selected rows and merge develop

7fc04070
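
The Shared selected rows commit adds copiable SelectedRows and gradient accumulation. A SelectedRows value is a sparse container: a list of row ids, a value block holding one row per id, and the height of the equivalent dense tensor. The following Python sketch assumes that layout and shows how two sparse gradients can be accumulated without densifying them. The class and function names are hypothetical. Summing duplicate row ids when densifying is this sketch's choice.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SelectedRowsSketch:
    """Sparse row-slice container: value[k] holds the contents of dense row rows[k]."""
    height: int                 # number of rows in the equivalent dense tensor
    rows: List[int]             # dense row ids; duplicates are allowed
    value: List[List[float]]    # one value row per entry in `rows`

    def to_dense(self, width: int) -> List[List[float]]:
        dense = [[0.0] * width for _ in range(self.height)]
        for r, vals in zip(self.rows, self.value):
            for c, x in enumerate(vals):
                dense[r][c] += x    # duplicate row ids are summed
        return dense

def accumulate(a: SelectedRowsSketch, b: SelectedRowsSketch) -> SelectedRowsSketch:
    """Accumulate two sparse gradients by concatenating their rows and values."""
    if a.height != b.height:
        raise ValueError("height mismatch")
    return SelectedRowsSketch(a.height, a.rows + b.rows, a.value + b.value)
```

Concatenation keeps accumulation cheap. Merging duplicate rows is deferred until a dense view is needed.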

bug fix (#39630) · bbf31a4e

Committed by zhaoyingli on Feb 18, 2022

bbf31a4e

[Pten] blas and lapack migration (#39587) · 8c7ee8c2

Committed by Feiyu Chan on Feb 18, 2022

* move blas related files

* move lapack related files
8c7ee8c2

Fix wrong inputs (#39700) · 1d6fd81d

Committed by zlsh80826 on Feb 18, 2022

1d6fd81d

cinn_instruction_run_op test (#39576) · fdc4fe3b

Committed by TeFeng Chen on Feb 18, 2022

* add cinn_instruction_run_op test code

* update several interfaces of CinnLaunchContext

* update several interfaces and add detail comments in CinnLaunchContext class

* to skip the bug of error message check

* fix ut test failed due to reliant interface updated

fdc4fe3b

[pten] trans diagonal kernel into pten (#39575) · 5c66338f

Committed by xiongkun on Feb 18, 2022

* trans diagonal kernel into pten

* fix by code review
5c66338f

[AMP] support GPU BF16 amp for dygraph (#39029) · 7d6d3848

Committed by zhangbo9674 on Feb 18, 2022

* support dtype param for auto_cast

* add amp_dtype for tracer

* add unsupported bf16 list

* support bf16 amp for O2

* refine python interface for bfloat16

* refine code

* refine code

* refine unittest

* refine code

* refine code

* add bf16 o1

* refine code by comment

* add gradient accumulator

* add recompute

7d6d3848
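
The BF16 AMP commit adds a dtype parameter for auto_cast, an "unsupported bf16" op list, and O1/O2 levels. The following Python sketch shows one simplified way such lists could pick an op's compute dtype. It assumes three rules: black-listed or unsupported ops stay in float32, O2 runs everything else in bfloat16, and O1 casts only white-listed ops while other ops follow their inputs. This is not Paddle's tracer logic, and the function name is hypothetical.

```python
FP32, BF16 = "float32", "bfloat16"

def amp_dtype_for_op(op, level, white_list, black_list, unsupported_bf16, input_dtypes):
    """Pick the compute dtype for one op under a simplified auto-cast policy."""
    # Numerically sensitive ops, or ops without a bf16 kernel, stay in float32.
    if op in black_list or op in unsupported_bf16:
        return FP32
    if level == "O2":
        return BF16                          # O2: everything else runs in bf16
    if level == "O1":
        if op in white_list:
            return BF16
        # Neither list: follow the inputs, keeping float32 if any input is float32.
        return FP32 if FP32 in input_dtypes else BF16
    raise ValueError(f"unsupported AMP level: {level}")
```

The unsupported list is checked first, so it overrides both levels. That ordering is why ops without a bf16 kernel stay safe even under O2.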

[CustomDevice]Improved custom device initialization (#39634) · 7e4ed848

Committed by ronnywang on Feb 18, 2022

7e4ed848

[CustomRuntime] add pten::Backend support (#39606) · d6d0820e

Committed by ronnywang on Feb 18, 2022

d6d0820e

[IPU] Update IpuStrategy (#39644) · 46161679

Committed by Allen Guo on Feb 18, 2022

* Update IpuStrategy

* fix ci

* rerun ci
46161679

Fix sharding group (#39668) · bc3ca678

Committed by Baibaifan on Feb 18, 2022

* fix_sharding_group

* fix_sharding_group
bc3ca678

refactor the forward implementation of shape npu op (#39613) · e674af23

Committed by baoachun on Feb 18, 2022

e674af23

new way of unit test, *test=kunlun (#39650) · c5179772

Committed by z8hanghuan on Feb 18, 2022

* new way of unit test, *test=kunlun

* new way of ut, *test=kunlun
c5179772

Infrt registers pten kernels (#39588) · dc39eb18

Committed by Wilber on Feb 18, 2022

* the mlir representation of pten, test=develop

* fixes an error, test=develop

* infrt registers pten kernels

Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

dc39eb18

[Pten] Support inplace and intermediate in C++ API (#39651) · 638aab6e

Committed by zyfncg on Feb 18, 2022

* support inplace and intermediate in yaml

* add cmake for dygraph_api
638aab6e

机器未来 / Paddle is in sync with the repository it was forked from
