Commits · f33ae2060320fe68a1aa0465de503bc882febc8c · BaiXuePrincess / Paddle

Feb 22, 2022 · 21 commits
- Adapt to batch_norm_grad op and add align function in roi_align op for kunlun (#39685) · f33ae206
  Committed by Leo Guo on Feb 22, 2022
```
* Adapt to batch_norm_grad op and add align function in
roi_align op for kunlun, *test=kunlun

* Adapt to batch_norm, batch_norm_grad op api for kunlun, and add unit-tests of batch_norm, roi_align. *test=kunlun
```
- change Vector to std::vector and provide MixVector class as a helper … (#39559) · 728c0624
  Committed by xiongkun on Feb 22, 2022
```
* change Vector to std::vector and provide MixVector class as a helper wrapper class

* solve the multi-gpu hang problem

* remove the duplicate template instantiation

* Copy vector to cpu

* add CopyToCPU

* xxx

* final version: fix the problem of all reduce

* remove mixvector dependence

* fix

* merge

* fix code

* fix by CI
```
- fix bug in new the_one_ps (#39505) · d56a0a1b
  Committed by wangguanqun on Feb 22, 2022
```
* fix benchmark and communicator config

* fix bugs of the_one_ps

* multi program and fix bug in optimizer

* multi program in the_one_ps

* public commcontext
```
- add pten convert pass.test=develop (#39664) · a6abb6e7
  Committed by 王明冬 on Feb 22, 2022
- [Phi] Migrate unfold_op into phi (#39778) · 1aa67778
  Committed by Aurelius84 on Feb 22, 2022
```
* [Phi] Migrate unfold_op into phi

* fix im2col CPUContext template instantiation

* fix unfold_op.h header include problem

* fix unittest

* fix PT->PD
```
- [CustomRuntime] fix CustomDeviceContext (#39766) · 60fc555e
  Committed by ronnywang on Feb 22, 2022
- Update profiler (#39779) · c5d15655
  Committed by liutiexing on Feb 22, 2022
```
* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

* update the profiler
Co-authored-by: liutiexing <liutiexing@google.com>
```
- build_cinn_pass: fix bug because of output control var (#39782) · 62ae5f62
  Committed by TeFeng Chen on Feb 22, 2022
- update unittests for nearest_interp_v2_op_xpu: 'sync' from gpu. test=kunlun (#39768) · e89bf25b
  Committed by houj04 on Feb 22, 2022
- [Paddle-Inference] fix pass and convert_op for preln_ernie (#39733) · 574f3402
  Committed by Wangzheee on Feb 22, 2022
```
* fix pass and convert_op for preln_ernie and add the preln_ernie flag in pass
```
- [GPUPS]Config fleet optimize 2 (#39783) · 0efa64c8
  Committed by zmxdream on Feb 22, 2022
```
* update. test=develop

* update. test=develop

* fix. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop
```
- Modify the implementation of BlockXReduce to fit more scenes (#39554) · 85a11c47
  Committed by Zhang Zheng on Feb 22, 2022
```
* Modify the implementation of BlockYReduce to fit more scenes

* fix

* fix
```
- [phi] add dtype fetcher for scalar (#39775) · 7fa29a6b
  Committed by Yuang Liu on Feb 22, 2022
- Support NoNeedBuffer for final state codegen (#39628) · 911cb2ea
  Committed by Zhanlue Yang on Feb 22, 2022
```
* Support NoNeedBuffer for final state codegen

* Replaced pten with phi
```
- add hard_swish in xpu2_op_list.h and update xpu.cmake,test=kunlun (#39586) · 8d1d0bdf
  Committed by zhangyikun02 on Feb 22, 2022
- sync recent changes (#39763) · d945e24c
  Committed by Allen Guo on Feb 22, 2022
- make enable_program_desc_tracing_ thread_local (#39776) · ec21bf98
  Committed by Leo Chen on Feb 22, 2022
- Modified RandomKernel with Kernel Primitive API (#39666) · 9f94821b
  Committed by niuliling123 on Feb 22, 2022
```
* Modified RandomKernel with Kernel Primitive API

* update pten.h to phi.h

* update

* update fullKernel
```
- Add Sort API for Kernel Primitive API (#39734) · f4e74887
  Committed by niuliling123 on Feb 22, 2022
```
* Add Sort API for Kernel Primitive API

* update & -> ptr
```
- [PTen->Phi PR2] Rename PT_REGISTER macro to PD_REGISTER (#39790) · 4a338796
  Committed by Chen Weihang on Feb 22, 2022
```
* unify register macro

* rename declare macro

* fix infrt error
```
- [fleet exe] support fp16 feed and fetch on cpp side (#39758) · 73bf9673
  Committed by Yuang Liu on Feb 22, 2022

Feb 21, 2022 · 14 commits
- [PluggableDevice]custom kernel to phi core structs (#39690) · 68631ed4
  Committed by Aganlengzi on Feb 21, 2022

* [PluggableDevice]custom kernel to pten core structs

* mod extension.h for custom op

* compatible python for CI

* support custom context

* refactor to pten

* fix windows and ut

- Move Abs InferShape to phi (#39762) · 9c51eee1
  Committed by From00 on Feb 21, 2022
```
* Move Abs InferShape to phi

* Fix CI error
```
- [Pten] Migrate huber_loss into phi (#39761) · 6aafb2fa
  Committed by Aurelius84 on Feb 21, 2022
```
* migrate huber_loss into phi

* migrate infershape

* modify pten into phi
```
- [Dy2St]Fix cond grad error when handle tensor array (#39689) · a863b32e
  Committed by 0x45f on Feb 21, 2022
```
* fix cond grad error when handle tensor array

* add UT
```
- gpu ps graph engine (#39699) · 05982c10
  Committed by seemingwang on Feb 21, 2022
```
* gpu ps graph engine

* remove logs
```
- [pten]rm reduce_sum and reduce_mean raw kernel (#39484) · 2bb5aae8
  Committed by chentianyu03 on Feb 21, 2022

* rm reduce_sum raw kernel

* remove reduce_mean kernel

* remove reduce_mean kernel

* reduce support int and int64_t

* mean support int and int64_t type

- fix fill_constant bug, *test=kunlun (#39681) · b1805727
  Committed by z8hanghuan on Feb 21, 2022
```
* fix fill_constant bug, *test=kunlun

* fix fill_constant bug,*test=kunlun
```
- [bf16] add bf16 kernel: elementwise_max (#39461) · 93016331
  Committed by zhangbo9674 on Feb 21, 2022

* add elementwise_max & unittest

* refine cuda register and unittest

* refine unittest

* refine unittest for bf16

* refine optest

* refine code

* refine unittest

* refine unittest

- Update record interface using part2 (#39694) · c984cd85
  Committed by chenjian on Feb 21, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

- [PTen]Remove infershape of Reshape OP (#39631) · 45dd4a5f
  Committed by YuanRisheng on Feb 21, 2022

* remove infershape and Xshape

* add xshape

* fix bugs when run ci

* fix bugs when run ci

* fix bugs when run infrt test

* pass coverage

- [HeterPS]fix ut for heteps comm op (#39684) · d41836ef
  Committed by zmxdream on Feb 21, 2022

* fix. test=develop

* fix. test=develop

* fix code style. test=develop

* fix. test=develop

* fix. test=develop

- fix alignment bug (#39747) · 65ced1fa
  Committed by sneaxiy on Feb 21, 2022
- fix bug: core when missing range XPU kernel in kunlun2 (#39673) · 496aadfb
  Committed by ShiningZhang on Feb 21, 2022
- [Pten] Add copy_to wrapped infermeta (#39703) · e16ab42b
  Committed by zyfncg on Feb 21, 2022
```
* add copy_to wrapped infermeta

* test=allcases

* test=allcases

* test=allcases
```

Feb 20, 2022 · 4 commits
- [PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986
  Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

- add index initialization in the block loop for index_sample kernel when... · c6950ab2
  Committed by FlyingQianMM on Feb 20, 2022

add index initialization in the block loop for index_sample kernel when dealing with an input tensor whose shape is larger than block_dim * grid_dim (#39736)

* add block and grid loop for index_sample kernel to deal with a large-shape tensor

* fix code format

* limit grid dim

* fix the missing initialization of index_i in the second cycle for index_sample kernel

* fix conflicts

- Rename the general elementwise and broadcast functions. (#39623) · 553afc07
  Committed by Yiqun Liu on Feb 20, 2022
- Add int16 support for several ops (#39636) · 267275d9
  Committed by sneaxiy on Feb 20, 2022
```
* add more op int16 support

* fix xpu ci
```

Feb 19, 2022 · 1 commit
- [Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264
  Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict
