Commits · 07dad6d6ec415758d520e33960a0c53e50ef2ab5 · 机器未来 / Paddle

02 Mar, 2022 · 24 commits
- [Infrt]add phi kernel dialect (#39726) · 07dad6d6
  Committed by huzhiqiang on Mar 02, 2022
- [bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
  Committed by zhangbo9674 on Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
- [Auto Parallel] Adapt Partitioner & DistOp for ERNIE3.0 Inference and cache (#39895) · c9cd47d9
  Committed by JZ-LIANG on Mar 02, 2022
```
* adapt dist op

* add dist_fill_constant_batch_size_like

* remove print

* update compatible

* add unittest
```
- 【phi】migrate gather_tree,reduce_prod to phi (#39844) · 6af2729e
  Committed by crystal on Mar 02, 2022
```
* move to phi

* migrate gather_tree_op into phi

* move reduce_prod to phi

* optimize code
```
- [IPU] update ipu unittests p0 (#39707) · 1db188f3
  Committed by Allen Guo on Mar 02, 2022
```
* update ipu UTs part0

* rename UT

* sync api changes

* update uts for new api

* use_ipumodel() as classmethod
```
- Upgrade new profiler (#39984) · 0c3f7fbc
  Committed by chenjian on Mar 02, 2022
```
* add new profiler components

* fix bug

* upgrade new profiler

* fix operator.cc

* fix operator.cc

* fix cmakelists.txt

* fix bug

* fix according to pr

* fix bug

* fix cmake

* fix bug

* fix a bug

* fix bug

* fix bug
```
- add logic kernel for mlu (#39940) · bc113e10
  Committed by joeqiao12 on Mar 02, 2022
- [fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992) · 244ae318
  Committed by Yuang Liu on Mar 02, 2022
- [KP] Activation op registration for XPU2. part 1/2 (#40002) · 90ab7403
  Committed by Lijunhui on Mar 02, 2022
- [Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
  Committed by Chen Weihang on Mar 02, 2022
```
* unify complex type trait and fix real imag bug

* add unittest for type traits
```
- [MLU] adapt matmul op (#39727) · b4d931e8
  Committed by qipengh on Mar 02, 2022
```
* [MLU] adapt matmul op

* [MLU] fix phi namespace
```
- test=document_fix;record py3 case time (#40018) · 9070d5c5
  Committed by zhangchunle on Mar 02, 2022
- [infrt] speed up the infrt ci. test=devvelop (#40032) · 36660d4c
  Committed by 王明冬 on Mar 02, 2022
- [MLU] add transpose2 mlu kernel (#39994) · 4cab812e
  Committed by fwenguang on Mar 02, 2022
- add_new_comm_primitive (#40040) · 4e00d2bb
  Committed by Baibaifan on Mar 02, 2022
- fix unittests for eignvalsh (#39841) · aa47297a
  Committed by lkylkylky on Mar 02, 2022
- optimize CUDA implementation of randint OP (#39952) · fb635089
  Committed by zhouweiwei2014 on Mar 02, 2022
```
* change CUDA implementation of randint OP, move distribution common func to phi

* fix CI

* fix CI
```
- [Eager] open eager when WITH_PYTHON (#39979) · 9af72957
  Committed by wanghuancoder on Mar 02, 2022
```
* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop
```
- ernie: revert skip_layernorm_fp16 (#39991) · 26e2b918
  Committed by Wangzheee on Mar 02, 2022
- add share external data interface (#39809) · 1ff1c1e0
  Committed by JingZhuangzhuang on Mar 02, 2022
- [Pten] Gru lstm migration (#39729) · e4dba69a
  Committed by Feiyu Chan on Mar 02, 2022
```
* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.
```
- [Eager] Support gnn ptb_rnn in eager mode (#39993) · dbcf8797
  Committed by Weilong Wu on Mar 02, 2022
- Fix bug for prepare phi OP (#40033) · fb0cadfd
  Committed by From00 on Mar 02, 2022
- update pd_2_trt lower pass (#40019) · acdf0663
  Committed by Shang Zhizhou on Mar 02, 2022
```
* update pd_2_trt lower pass

* update pd_2_trt lower pass

* update style

* update

* change trt.graph to trt.create_engine

* update comments

* update comments

* add test
```
01 Mar, 2022 · 16 commits
- Added attr & tensor type mapping for final state codegen (#39997) · 852a872f
  Committed by Zhanlue Yang on Mar 01, 2022
- [ROCM] fix to get rocm number in script, test=develop (#39938) · 72e462cd
  Committed by Qi Li on Mar 01, 2022
- fix bug of paddle.to_tensor and paddle.moveaxis (#39662) · 4617c1b2
  Committed by zhouweiwei2014 on Mar 01, 2022
```
* fix bug of paddle.to_tensor and paddle.moveaxis

* fix CI
```
- fix compiling and running with ipu (#39920) · 69ab2700
  Committed by Allen Guo on Mar 01, 2022
- [Phi]rm reduce infershape (#39820) · 09039636
  Committed by chentianyu03 on Mar 01, 2022
```
* modify infershape utils and rm reduce infershape

* merge develop

* fix infermeta bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims
```
- [phi] transfer the selu_op and pass the CI (#39819) · 197da15a
  Committed by xiongkun on Mar 01, 2022
```
* transfer the selu_op and pass the CI

* add sig files

* fix code

* fix by code review

* remove TODO

* change the include position

* change the head position
```
  197da15a
- Add function description for Kernel Primitive API (#39884) · 255bf609
  Committed by niuliling123 on Mar 01, 2022
```
* Add function description for Kernel Primitive API
1. Set cumsum and sort shared memory size = 1024
2. sort and cumsum API limitation: blockDim.x must not exceed 512 (blockDim.x <= 512)
```
- Fixed auto codegen for intermediate tensors (#39797) · 2592805b
  Committed by Zhanlue Yang on Mar 01, 2022
```
* Refactored GradNodeAccumulation data structure and behaviour

* Fixed CI issues

* Fix compilation issues

* Fixed minor issues

* Reverted changes for intermediate and OverwriteOutput

* fixed minor issue

* Fixed auto codegen for intermediate tensors

* Removed restriction on AccumulationNode modification

* Fixed CI Coverage issues

* Adjusted Log contents

* Fixed CI issues
```
- Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a
  Committed by joanna.wozna.intel on Mar 01, 2022
```
* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout
```
- [bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978
  Committed by zhangbo9674 on Mar 01, 2022
```
* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest
```
- remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
  Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
- add test_warpctc_op in mac (#39983) · 25650774
  Committed by zhangchunle on Mar 01, 2022
- [bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332
  Committed by zhangbo9674 on Mar 01, 2022
```
* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale unittest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest
```
- add MasterParam and MasterParamOut for sparse_momentum op (#39969) · 9de79892
  Committed by Guoxia Wang on Mar 01, 2022
- change tests_v2 to dynamic_tests_v2 in CI op benchmark (#39995) · 4204b97a
  Committed by pangyoki on Mar 01, 2022
- update error_string when target is out of bound (#40001) · a7acfc5b
  Committed by HydrogenSulfate on Mar 01, 2022

机器未来 / Paddle · In sync with the fork's source project
