Commits · 6f439e5a5d5cc653316616dde43475d4b2b5dea8 · PaddlePaddle / Paddle

Dec 17, 2021 · 26 commits
- Support multi place constructor (#38171) · 6f439e5a
  Committed by Jiabin Yang on Dec 17, 2021
```
* support more eager tensor api

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* refine test in pure cpu

* refine test in pure cpu
```
- fit CI_SKIP_CPP_TEST (#38242) · b613c31e
  Committed by Leo Chen on Dec 17, 2021
  
- Add _compile_dir argument for custom ops compilation (#38211) · 411d64ad
  Committed by sneaxiy on Dec 17, 2021
```
* add compile_dir

* follow comments
```
- add scale lost deps (#38237) · 66a9d71a
  Committed by Chen Weihang on Dec 17, 2021
  
- Refine some AMP operators for BERT (#37923) · d80fe268
  Committed by sneaxiy on Dec 17, 2021
```
* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci
```
- write includes.txt (#38210) · cff03734
  Committed by sneaxiy on Dec 17, 2021
  
- add test for conv_transpose_bn_fuse_pass (#38203) · 672d94b2
  Committed by feng_shuai on Dec 17, 2021
  
- fix bug when build inference lib without tensorrt (#38156) · 6d1b8c52
  Committed by Sing_chan on Dec 17, 2021
  
- [pten] modify reduce_sum reduce_mean args (#38216) · eaa2363e
  Committed by chentianyu03 on Dec 17, 2021
```
* modify sum mean args

* add GetExpectedPtenKernelArgs for redcue_op

* modify kernel args number

* modify kernel args number
```
- [fleet_executor] Fix the problem in fleet executor stop (#38114) · 843435ff
  Committed by LiYuRio on Dec 17, 2021
  
- Generated CoreOpsInfos for potential use in append_op API (#38085) · e3b033f9
  Committed by Zhanlue Yang on Dec 17, 2021
```
* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* Generated CoreOpsInfos for potential use in append_op API

* Fixed CI problem
```
- add op/api repeat/interleave (#37981) · a7de0e66
  Committed by kuizhiqing on Dec 17, 2021
  
- Increase timeout for test_adaptive_pool2d_convert_global_pass (#38220) · 885767e3
  Committed by heliqi on Dec 17, 2021
```
* add timeout

* add timeout
```
- [Paddle-TRT] Use TRT inspector to show the information inside an engine to verbose log (#38200) · 237c1fe6
  Committed by Leo Chen on Dec 17, 2021
```
* Inspect the information inside a TRT engine.

* Follow up the google code style.

* Fix code error.
```
- [CustomOp]Add RWLock to protect loading module under multi-thread and multi-process (#38128) · 8bc27015
  Committed by Aurelius84 on Dec 17, 2021
```
* Add RWLock to protect loading module under multi-thread

* refine code

* remove import statement
```
- add launch bound to limit the registers usage for volta architecture (#38113) · 18a59822
  Committed by zlsh80826 on Dec 17, 2021
```
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block.
There are two ways to resolve this problem:
    Reduce the threads per block launch configuration
    add __launch_bound__ to give information to nvcc compiler for reducing registers usage
this PR chooses __launch_bound__ solution because changing gpu_launch_config may affect other ops.
```
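The register pressure described in the commit message above can be checked with simple arithmetic. The sketch below is illustrative only and makes several assumptions that are not stated in the commit: that the 66 registers are a per-thread count (which is how ptxas reports them), that the launch uses a 1024-thread block, and that the relevant limit is Volta's (compute capability 7.0) 65,536 32-bit registers per thread block. The function names are hypothetical and not part of Paddle, and the hardware's register allocation granularity is ignored.

```python
# Assumed Volta (compute capability 7.0) limits; not taken from the commit itself.
REGISTERS_PER_BLOCK = 65536  # max 32-bit registers available to one thread block
WARP_SIZE = 32


def max_threads_per_block(regs_per_thread: int) -> int:
    """Largest warp-aligned block size whose register demand fits the block limit."""
    threads = REGISTERS_PER_BLOCK // regs_per_thread
    return threads // WARP_SIZE * WARP_SIZE


def max_regs_per_thread(threads_per_block: int) -> int:
    """Per-thread register budget the compiler must meet for a given block size."""
    return REGISTERS_PER_BLOCK // threads_per_block


if __name__ == "__main__":
    # Option 1 from the commit message: shrink the launch configuration.
    print(max_threads_per_block(66))
    # Option 2: keep the (assumed) 1024-thread block and cap registers instead,
    # which is the kind of hint a launch-bounds qualifier gives to nvcc.
    print(max_regs_per_thread(1024))
```

Under these assumptions, 66 × 1024 = 67,584 registers exceeds the 65,536 limit, so either the block shrinks to 992 threads or the kernel has to fit in 64 registers per thread. In CUDA C++ the qualifier that conveys the second option to the compiler is spelled `__launch_bounds__`.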
- [AutoParallel] add gpt model for unittest (#38202) · 76eb371e
  Committed by zhaoyingli on Dec 17, 2021
```
* add gpt modeling

* update file name
```
- Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
  Committed by niuliling123 on Dec 17, 2021
  
- Get base pointer from Allocation (#37978) · 431a2d6a
  Committed by From00 on Dec 17, 2021
```
* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
```
- Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
  Committed by From00 on Dec 17, 2021
  
- fix detail error for scale (#38213) · 20b7c99c
  Committed by Chen Weihang on Dec 16, 2021
  
- [fleet_executor] run time graph on python side (#38164) · fc701369
  Committed by Yuang Liu on Dec 17, 2021
  
- [BugFix]: Elementwise branch selection and Broadcast dimension merge (#38204) · e097a748
  Committed by limingshu on Dec 17, 2021
```
* fix_bugs_for_elementwise_branch_selection

* fix merge_dims bugs

* fix all influenced file
```
- update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f
  Committed by houj04 on Dec 17, 2021
  
- ipu add dockerfile (#37792) · 9075a0fd
  Committed by jianghaicheng on Dec 17, 2021
```
* ipu add dockerfile

* resolve comments
```
- fix bind failed with Address already in use (#38174) · 446a62e8
  Committed by WangXi on Dec 17, 2021
  
Dec 16, 2021 · 14 commits

- modify according to zhouwei's comment (#38166) · a37be82f
  Committed by Sing_chan on Dec 16, 2021
- [new-exec] skip add_dependency when cc_test skipped because of CI_SKIP_CPP_TEST=ON (#38191) · 30973183
  Committed by Leo Chen on Dec 16, 2021
```
* fix cmake

* not check execution time
```
- block warning: overriding D9025 (#38034) · 672dba1b
  Committed by Sing_chan on Dec 16, 2021

- add grad maker debug log (#38183) · a43d8e59
  Committed by chentianyu03 on Dec 16, 2021

- Faster implementation of CPU kernel for ROI Align operator (#37848) · 023ff4f5
  Committed by Tomasz Socha on Dec 16, 2021

* Faster implementation of CPU kernel for ROI_ALIGN Operator

* Add missing variable to CUDA roi_align_op

* Style

* Fix boundaries

* Rename variables for indexes calculation

* Remove unnecessary emplace

* Revert "Remove unnecessary emplace"

This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a.

* Style

- pylayer support HIP (#38184) · 2e76d5ad
  Committed by chentianyu03 on Dec 16, 2021

- Fixed LD_LIBRARY_PATH for eager_code_generator (#38160) · af30f545
  Committed by Zhanlue Yang on Dec 16, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators

* Fixed CI issues

* Fixed LD_LIBRARY_PATH for eager_code_generator

- remove nightly ut from parallel ut list (#38163) · 3d7de712
  Committed by YUNSHEN XIE on Dec 16, 2021

- Add arc hyperbolic function op (#37076) · 36b7368d
  Committed by xiaoting on Dec 16, 2021

* add activation

* update activation_op

* add unitest for activation

* fix acosh for init, test=develop

- Conv transpose eltwiseadd bn fuse pass (#37800) · e64f0997
  Committed by feng_shuai on Dec 16, 2021

* conv_transpose_eltwiseadd_bn_fuse_pass

* change timeout

* add TIMEOUT

* add random num for group and dilation

* change PassCompat

- Revert "modify the fix_seed attribute in dropout op is a def... · 464f2af8
  Committed by 王明冬 on Dec 16, 2021

Revert "modify the fix_seed attribute in dropout op is a def attribute.test=develop (#38100)" (#38127)

This reverts commit f44add7b.

- Add tests for PaddleInference Pass (#37676) · 96597a85
  Committed by yeliang2258 on Dec 16, 2021

* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass

* Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass

* add tests for conv_act_mkldnn_fuse_pass

* add test for conv_bias_mkldnn_fuse_pass

* update code

* add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu

* update test

* update

* update bug

* update

* update pattern_detector

* fix test_conv_eltwiseadd_bn_fuse_pass

* add diff display notest;test=windows_ci_inference

* fix

* remove test_conv_act_mkldnn_fuse_pass.py

* ifix

- [PTen] Unify device context entrance in pten part 2 (#38182) · e02537f9
  Committed by Chen Weihang on Dec 16, 2021

* unify device context entrance

* move all_context include to header

* polish cmake relay for device_context

* fix npu compile failed

* fix npu compile failed

- add defaults value for disable_ut (#38110) · 55509ae7
  Committed by YUNSHEN XIE on Dec 16, 2021

PaddlePaddle / Paddle · synced successfully more than 1 year ago
