Commits · 31f3f643668ac98afe76bcf9e95b752c4b872c29 · PaddlePaddle / Paddle

11 Nov, 2022 · 1 commit

Generate static graph code for some ops by yaml (part3) (#47803) · 31f3f643

Committed by zyfncg on Nov 11, 2022

* generate static graph code for some ops by yaml

* remove deleted files

* update cmake

31f3f643

09 Nov, 2022 · 1 commit

[PHI decoupling] Move fluid op generator into fluid (#47714) · f369b2b1

Committed by Chen Weihang on Nov 09, 2022

* move fluid op generator into fluid

* remove parsed op

* resolve sig undef error

* append python interp find logic

* remove dup code

f369b2b1

20 Oct, 2022 · 1 commit

Add infer prune function (#47046) · af9486fc

Committed by JingZhuangzhuang on Oct 20, 2022

* Add infer prune function

* Update phi.cmake

* Update operators.cmake

* add fusion op

af9486fc

18 Oct, 2022 · 1 commit

[code-gen] Support code-gen for opmaker of sparse op (#46993) · bdd3dde3

Committed by zyfncg on Oct 18, 2022

* support generating code of opmaker for backward op invoke forward op

* support code-gen of opmaker for sparse op

* refine logic of choosing phi kernel

* fix compile bug

* fix code_gen bug

* fix bug

* fix kernel signature code-gen

* fix compile bug of VarType

* fix test_sparse_conv_op

* fix test_sparse_norm_op

bdd3dde3

09 Sep, 2022 · 1 commit

[CustomDevice] add dy2static support (#45878) · abc85c50

Committed by ronnywang on Sep 09, 2022

* [CustomDevice] add dy2static support

* update

abc85c50

30 Aug, 2022 · 1 commit

Remove extra attribute in OpMaker (#44310) · fe321f9a

Committed by zyfncg on Aug 30, 2022

* add runtime config in phi

* add runtime attr for op desc and op

* fix no proto error

* adjust opdesc set_attr impl

* try to remove conv_op extra attrs

* add init runtime attr map

* change extra header path

* fix runtime_attr

* fix trace_op

* fix bug of pass

* fix merge conflict

* fix dygraph attrs

* fix bug of pass

* fix dygraph bug

* fix unittest module

* delete extra attr default

* fix dropout kernel

* polish code

* fix extra output of instance_norm

* fix merge conflict

* fix op_desc bug

* add extra attr in yaml for conv3d_transpose

* don't remove extra input and output

* fix save_inference_model

* fix bug of batch_norm

* revert some change

* polish log

* polish code

* add code comment

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

fe321f9a

25 Aug, 2022 · 1 commit

[NPU] add run_program_op_npu (#45349) · 64afa638

Committed by ronnywang on Aug 25, 2022

* [NPU] add run_program_op_npu

* add run_program_op_npu ut

64afa638

19 Aug, 2022 · 1 commit

Support beam search decode op in XPU environment (#44917) · adaffb7b

Committed by mengqingchun02 on Aug 19, 2022

* support beam_search operator on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

adaffb7b

05 Aug, 2022 · 1 commit

move fft kernels to phi (#44714) · 153f1138

Committed by Feiyu Chan on Aug 05, 2022

* move fft kernels to phi, done with cufft, pocketfft, mkl_cdft, hipfft
* make stft_op use fft from phi/kernels/funcs, clean code

153f1138

15 Jul, 2022 · 1 commit

Remove boost library (#44092) · d2e59e15

Committed by Ruibiao Chen on Jul 15, 2022

d2e59e15

14 Jul, 2022 · 1 commit

Compilation optimization (#44242) · 4baf0dbe

Committed by wanghuancoder on Jul 14, 2022

* Compilation optimization

4baf0dbe

12 Jul, 2022 · 1 commit

[MLU]add sync_batch_norm op (#44176) · f1be9cf1

Committed by qipengh on Jul 12, 2022

f1be9cf1

24 Jun, 2022 · 1 commit

add xpu support for new static alone executor. test=develop (#43076) · b2704837

Committed by 王明冬 on Jun 24, 2022

b2704837

14 Jun, 2022 · 1 commit

fix cmake-lint problems. (#43406) · 59f89236

Committed by Wilber on Jun 14, 2022

* cmake-lint

* update

59f89236

10 Jun, 2022 · 1 commit

make all phi kernels to 2(host/device) static libraries directly (#43247) · 5781999d

Committed by Leo Chen on Jun 10, 2022

* make all phi kernels to 2(host/device) static libraries directly

* fix calling kernel_declare

* fix compile

* fix cpu compile

* fix rocm compile

* fix xpu compile

* fix xpu kp compile

* fix inference compile

5781999d

02 Jun, 2022 · 1 commit

Support CUDA Graph for partial graph in dygraph mode (#42786) · d05b940a

Committed by sneaxiy on Jun 02, 2022

* support CUDAGraph for partial graph

* add ut

* fix ci

* fix ut again because of eager mode

* fix kunlun ci

* fix win ci

d05b940a

16 Apr, 2022 · 1 commit

move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc

Committed by 王明冬 on Apr 16, 2022

21aa3adc

13 Apr, 2022 · 1 commit

Lml/add prim ops (#41201) · 97dec7ca

Committed by levi131 on Apr 13, 2022

* native commit for triple grad of sigmoid

* Updated unittests files

* init functional jacobian api

* Updated triple_test func

* Updated gradient_checker & test_script

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* fix dygraph grad to support high differential

* polish API docstring

* Updated gradient checker and some related files

* fix double grad strip error for high differential

* fix double grad strip error for high differential

* Add Sigmoid triple grad tests

* fix dygraph double grad dtype error when calling for high differential scenario

* Updated triple grad tests func

* Use np.random to initialize ddx

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* merge upstream

* add_p

* rm useless files

* add sub_p mul_p div_p

* add sqrt_p and tanh_p

* add reshape_p

* add broadcast_p

* add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p

* add split_p and concat_p

* add gather_p and scatter_add_p

* add slice_select_p and slice_assign_p

* add multi input check for add_p, sub_p, mul_p, div_p

* update concat_p

* refine gather_p and scatter_add_p

* refine slice_assign_p and slice_select_p

* add 9 test for prim ops

* add more test and fix some bug

* add more test

* register proto

* add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto

* support multi input and multi output for split_p and concat_p

* fix slice bug for slice_select_p and slice_assign_p

* dtype for axis attr should be long int

* update dtype for axis attr int64_t

* update for iscan CI

* add more shape and dtype check

* change IndexTensor into int32 dtype

97dec7ca

05 Apr, 2022 · 1 commit

add new format of quantization (#41041) · b72a7ebb

Committed by Guanghua Yu on Apr 05, 2022

b72a7ebb

28 Mar, 2022 · 1 commit

[Phi] Move warpctc OP to phi (#40023) · cb183762

Committed by 0x45f on Mar 28, 2022

* moving OP

* move forward

* move grad and infershape

* code format

* format code

* fix code

* fix code

* fix CMakeLists.txt

* fix comments

* Refine CMakeLists for rocm ci

cb183762

10 Mar, 2022 · 1 commit

[PHI] Move segment_pool to phi. (#40099) · a07f19ee

Committed by Zhong Hui on Mar 10, 2022

* move segment_pool to phi.

* mark summed ids as optional tensor.

* fix as reviews.

a07f19ee

24 Feb, 2022 · 1 commit

[PTen->Phi PR3] Rename pten make target to phi (#39832) · f77019a0

Committed by Chen Weihang on Feb 24, 2022

* rename pten to phi

* fix infrt compile failed

* resolve conflict

f77019a0

16 Feb, 2022 · 1 commit

[Pten] move complex_functors.h (#39558) · 5b5656d0

Committed by Feiyu Chan on Feb 16, 2022

* move complex_functors.h and update all references to symbols within it

5b5656d0

26 Jan, 2022 · 2 commits

[pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1

Committed by Leo Chen on Jan 26, 2022

* update cmake file to remove fluid kernel

* add pten declaration.h to where pybind.h used

* fix sync_bn and tensorrt_engine

* refine detection_library

* fix interpreter_core

* support eager legacy

* fit eager legacy for pten

* fall back to cpu if not found kernel

* fix compile problem

* fix compile problem

* refine fallback logic

* fit operator.run()

* fix xpu compile

* fit for new_exec

* add REGISTER_OP_WITHOUT_GRADIENT

* un-cache pt_kernel_context

* fix compile

* fix cudnn

* fix compiling with on_infer

* fix mkldnn

* fix isfinite_v2

* fix xpu problem

* fix op_device

* refine fallback for xpu

* fix xpu compile

* merge develop

* refine code format

* fix compile

* fix compile

* add data_transfer

* fix PreparePtenData

* fix cpu context

* merge develop

* fix compile

* fix error device context

* fix xpu

* fix dev_ctx

3ab9aef1

[IPU] sync misc changes 02 (#39189) · 5df78366

Committed by Allen Guo on Jan 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors

Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* restore for split PR

Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

5df78366

24 Jan, 2022 · 1 commit

[Pten] Migration of eigen numeric extensions and functors in paddle/fluid/operatos/eigen (#39124) · a1e40dc6

Committed by Feiyu Chan on Jan 24, 2022

* migration of functors in paddle/fluid/operators/eigen and paddle/fluid/platform/eigen_ext.h
* update path of data types like float16.h in includes in extensions.h

a1e40dc6

21 Jan, 2022 · 1 commit

Renamed selected_rows.* -> selected_rows_utils.* (#39037) · 814e5ab4

Committed by Weilong Wu on Jan 21, 2022

814e5ab4

28 Dec, 2021 · 1 commit

Add API and op for take_along_axis (#38396) · 3310f519

Committed by huangxu96 on Dec 28, 2021

* add API and op for take_along_axis

* fix compile dependency problem and add example code and doc

* add unittest

* delete some code for CI coverage

* fix code style problem

* fix as review

3310f519

20 Dec, 2021 · 1 commit

[MLU]add mlu backend (#38207) · 76514a1f

Committed by fwenguang on Dec 20, 2021

76514a1f

08 Dec, 2021 · 1 commit

add a subdirectory named cinn in operators and move releated files into it (#37938) · 9cb637ed

Committed by CtfGo on Dec 08, 2021

1. add a subdirectory named `cinn` in the `paddle/fluid/operators` directory and move related files into it
2. separate the CinnLaunchContext class from `cinn_launch_op.h` and put it in a new independent file named `cinn_launch_context.h`, so that it can be included by others cleanly.

9cb637ed

06 Dec, 2021 · 1 commit

Update CINN tag (#37870) · 3e33ef5a

Committed by Huihuang Zheng on Dec 06, 2021

1. Modify git tag for CINN
2. Support compile option "-DWITH_CINN=ON, -DWITH_TESTING=OFF"

3e33ef5a

01 Dec, 2021 · 1 commit

Fix inplace addto pass by setting dtype correctly (#37717) · b0d580a2

Committed by sneaxiy on Dec 01, 2021

* fix inplace addto pass

* update

* fix ut

* improve ci coverage

* fix musl ci compile error

b0d580a2

27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enforce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

19 Nov, 2021 · 1 commit

fix cmake dependence error (#37304) · 6653ac5e

Committed by LiYuRio on Nov 19, 2021

6653ac5e

13 Nov, 2021 · 1 commit

cinn_launch_op: skip checking input variables must be used (#37119) · 228eb898

Committed by CtfGo on Nov 13, 2021

Modify several implementations of CinnLaunchOp:
1. Skip the check that input variables must be used
2. Move the current helper functions into a CinnLaunchContext

228eb898

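03 Nov, 2021 · 1 commit

improve CinnLaunchOpKernel implement (#36936) · 0590277a

Committed by CtfGo on Nov 03, 2021

1. Simplify the CinnLaunchOpKernel implementation without changing behavior: variable information is now obtained through the standard interface of the ExecutionContext parameter instead of directly from the Scope. This simplifies the implementation logic and, correspondingly, the helper functions. The former cinn_launch_op_helper was fairly redundant, so unnecessary interfaces are removed and the rest are moved into cinn_launch_op.cc.
2. Fix the check in CinnLaunchOp InferShape for whether the specified output exists: HasOutput -> HasOutputs
3. Add detailed comments and debug information to ease troubleshooting and code maintenance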

0590277a

02 Nov, 2021 · 1 commit

fix cusparse compile bug in CUDA11.2, test=develop (#36911) · dc08c187

Committed by Liu-xiandong on Nov 02, 2021

dc08c187

01 Nov, 2021 · 2 commits

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

Committed by Chen Weihang on Nov 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move

Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop

Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)

Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

add cinn_launch_op for using CINN to optimize graph (#36600) · 0a963ee9

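Committed by CtfGo on Nov 01, 2021

Add CinnLaunchOp, which is responsible for executing the result of CINN subgraph compilation. Key points:
1. In BuildCinnPass, which partitions the graph into subgraphs, each subgraph is replaced in the original graph by a CinnLaunchOp, which calls CINN to compile and execute the subgraph.
2. The inputs and outputs of CinnLaunchOp are those of the subgraph. It also has a `compilation_key` attribute, which is used as the key to fetch the subgraph object and its compilation result from the global cache; BuildCinnPass sets this attribute when it creates the op.
3. CinnLaunchOp works as follows:
        - Fetch the subgraph object from the global cache
        - Fetch the subgraph's compilation result from the global cache, compiling just in time on a cache miss
        - Initialize the runtime data from the variable information in the compilation result (data type, shape) and allocate host/device memory
        - Pack the runtime data as arguments and invoke CINN's executable object, the runtime program, to run the computation
        - Synchronize the subgraph's results to the Paddle-side tensors through the argument pointers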
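
The flow in the commit message above (look up by `compilation_key`, compile on a cache miss, run, write results back through the output buffers) can be sketched roughly as follows. This is only an illustration of the caching pattern, not Paddle or CINN code: `GlobalCache`, `get_or_compile`, `launch`, and the list-based "tensors" are all hypothetical names, and "compilation" is a no-op stand-in.

```python
from typing import Callable, Dict, List

Tensors = Dict[str, List[float]]
Program = Callable[[Tensors], Tensors]


class GlobalCache:
    """Hypothetical global cache keyed by compilation_key (not a Paddle API)."""

    def __init__(self) -> None:
        self.subgraphs: Dict[str, Program] = {}  # registered by the graph pass
        self.compiled: Dict[str, Program] = {}   # filled lazily on first launch
        self.compile_count = 0

    def get_or_compile(self, key: str) -> Program:
        # Cache miss: "compile" the subgraph just in time, then remember it.
        if key not in self.compiled:
            subgraph = self.subgraphs[key]
            self.compiled[key] = subgraph  # stand-in for real compilation
            self.compile_count += 1
        return self.compiled[key]


def launch(cache: GlobalCache, compilation_key: str,
           inputs: Tensors, outputs: Tensors) -> None:
    """Run the cached program and copy results into caller-owned buffers."""
    program = cache.get_or_compile(compilation_key)
    results = program(inputs)
    for name, values in results.items():
        outputs[name][:] = values  # in-place write, like syncing via pointers


cache = GlobalCache()
cache.subgraphs["key_0"] = lambda t: {"y": [2.0 * v for v in t["x"]]}

out: Tensors = {"y": [0.0, 0.0]}
launch(cache, "key_0", {"x": [1.0, 2.0]}, out)
launch(cache, "key_0", {"x": [3.0, 4.0]}, out)  # second call reuses the cache
```

Keying the cache on an attribute set at graph-rewrite time means the op itself stays stateless; repeated launches of the same subgraph pay the compilation cost only once.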

0a963ee9

21 Oct, 2021 · 1 commit

[NPU] Add sync_batch_norm and sync_batch_norm_grad NPU Kernel (#36320) · 0ca2807c

Committed by furnace on Oct 21, 2021

* add sync_batch_norm (support train, infer, and fp32, fp16, and NCHW, NHWC)

* [NPU] Delete debug codes

* [NPU] Remove FP16

0ca2807c

PaddlePaddle / Paddle · synced successfully about 1 year ago
