Commits · 327e5050eb8bdf2548979c39e2a6637beaecadac · PaddlePaddle / Paddle

19 Dec, 2021 1 commit
- B
  
  Integration sharding stage2 function (#38151) · 327e5050
  Committed by Baibaifan on Dec 19, 2021
  
  327e5050
18 Dec, 2021 5 commits
- N
  
  [pnorm] fix bug in pnorm (#38215) · 9e42fe9a
  Committed by Noel on Dec 18, 2021
  
  9e42fe9a
- G
  
  fix seed for class_center_sample using paddle.seed (#38248) · 59be8e0e
  Committed by Guoxia Wang on Dec 18, 2021
  
  59be8e0e
- Y
  add test_conv_act_mkldnn_fuse_pass (#38153) · 6418bc75
  Committed by yeliang2258 on Dec 18, 2021
```
* add test_conv_act_mkldnn_fuse_pass

* update cmakelist

* fix cmakelist

* fix timeout

* fix timeout

* fix timeout

* fix
```
  6418bc75
- F
  add complex op (#37918) · 31e874b1
  Committed by Feiyu Chan on Dec 18, 2021
```
* add complex op and `paddle.complex`.
```
  31e874b1
- 王
  
  [infrt] add unit test script for infrt. test=develop (#38232) · a3bd6fc0
  Committed by 王明冬 on Dec 18, 2021
  
  a3bd6fc0
17 Dec, 2021 27 commits
- C
  Add mcmc of planner, of update cost model and relaunch (#38177) · 1bb2c68a
  Committed by caozhou on Dec 17, 2021
```
* add planner

* add planner

* add cost model update

* add relaunch updation

* update process_group

* fix error

* add unitest

* update unitest

* update cost model

* avoid api problem
```
  1bb2c68a
- J
  Support multi place constructor (#38171) · 6f439e5a
  Committed by Jiabin Yang on Dec 17, 2021
```
* support more eager tensor api

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* refine test in pure cpu

* refine test in pure cpu
```
  6f439e5a
- L
  
  fit CI_SKIP_CPP_TEST (#38242) · b613c31e
  Committed by Leo Chen on Dec 17, 2021
  
  b613c31e
- S
  Add _compile_dir argument for custom ops compilation (#38211) · 411d64ad
  Committed by sneaxiy on Dec 17, 2021
```
* add compile_dir

* follow comments
```
  411d64ad
- C
  
  add scale lost deps (#38237) · 66a9d71a
  Committed by Chen Weihang on Dec 17, 2021
  
  66a9d71a
- S
  Refine some AMP operators for BERT (#37923) · d80fe268
  Committed by sneaxiy on Dec 17, 2021
```
* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci
```
  d80fe268
- S
  
  write includes.txt (#38210) · cff03734
  Committed by sneaxiy on Dec 17, 2021
  
  cff03734
- F
  
  add test for conv_transpose_bn_fuse_pass (#38203) · 672d94b2
  Committed by feng_shuai on Dec 17, 2021
  
  672d94b2
- S
  
  fix bug when build inference lib without tensorrt (#38156) · 6d1b8c52
  Committed by Sing_chan on Dec 17, 2021
  
  6d1b8c52
- C
  [pten] modify reduce_sum reduce_mean args (#38216) · eaa2363e
  Committed by chentianyu03 on Dec 17, 2021
```
* modify sum mean args

* add GetExpectedPtenKernelArgs for reduce_op

* modify kernel args number

* modify kernel args number
```
  eaa2363e
- L
  
  [fleet_executor] Fix the problem in fleet executor stop (#38114) · 843435ff
  Committed by LiYuRio on Dec 17, 2021
  
  843435ff
- Z
  Generated CoreOpsInfos for potential use in append_op API (#38085) · e3b033f9
  Committed by Zhanlue Yang on Dec 17, 2021
```
* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* Generated CoreOpsInfos for potential use in append_op API

* Fixed CI problem
```
  e3b033f9
- K
  
  add op/api repeat/interleave (#37981) · a7de0e66
  Committed by kuizhiqing on Dec 17, 2021
  
  a7de0e66
- H
  Increase the timeout for test_adaptive_pool2d_convert_global_pass (#38220) · 885767e3
  Committed by heliqi on Dec 17, 2021
```
* add timeout

* add timeout
```
  885767e3
- L
  [Paddle-TRT] Use TRT inspector to show the information inside an engine to verbose log (#38200) · 237c1fe6
  Committed by Leo Chen on Dec 17, 2021
```
* Inspect the information inside a TRT engine.

* Follow up the google code style.

* Fix code error.
```
  237c1fe6
- A
  [CustomOp]Add RWLock to protect loading module under multi-thread and multi-process (#38128) · 8bc27015
  Committed by Aurelius84 on Dec 17, 2021
```
* Add RWLock to protect loading module under multi-thread

* refine code

* remove import statement
```
  8bc27015
- Z
  add launch bound to limit the registers usage for volta architecture (#38113) · 18a59822
  Committed by zlsh80826 on Dec 17, 2021
```
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block.
There are two ways to resolve this problem:
    Reduce the threads per block in the launch configuration
    Add __launch_bounds__ to tell the nvcc compiler to reduce register usage
This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
```
  18a59822
- Z
  [AutoParallel] add gpt model for unittest (#38202) · 76eb371e
  Committed by zhaoyingli on Dec 17, 2021
```
* add gpt modeling

* update file name
```
  76eb371e
- N
  
  Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
  Committed by niuliling123 on Dec 17, 2021
  
  9a8a4c77
- F
  Get base pointer from Allocation (#37978) · 431a2d6a
  Committed by From00 on Dec 17, 2021
```
* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
```
  431a2d6a
- F
  
  Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
  Committed by From00 on Dec 17, 2021
  
  b0d12d99
- C
  
  fix detail error for scale (#38213) · 20b7c99c
  Committed by Chen Weihang on Dec 16, 2021
  
  20b7c99c
- Y
  
  [fleet_executor] run time graph on python side (#38164) · fc701369
  Committed by Yuang Liu on Dec 17, 2021
  
  fc701369
- L
  [BugFix]: Elementwise branch selection and Broadcast dimension merge (#38204) · e097a748
  Committed by limingshu on Dec 17, 2021
```
* fix_bugs_for_elementwise_branch_selection

* fix merge_dims bugs

* fix all influenced file
```
  e097a748
- H
  
  update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f
  Committed by houj04 on Dec 17, 2021
  
  3a0e0b6f
- J
  ipu add dockerfile (#37792) · 9075a0fd
  Committed by jianghaicheng on Dec 17, 2021
```
* ipu add dockerfile

* resolve comments
```
  9075a0fd
- W
  
  fix bind failed with Address already in use (#38174) · 446a62e8
  Committed by WangXi on Dec 17, 2021
  
  446a62e8
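
A note on commit 18a59822 in the 17 Dec group: the register pressure it describes can be checked with simple arithmetic. The sketch below is only illustrative. It assumes that the reported 66 registers are a per-thread count, that the kernel was launched with 1024 threads per block (the commit message does not state the block size), and it uses the documented Volta limit of 65,536 32-bit registers per thread block. The helper names are hypothetical and are not part of Paddle.

```python
# Back-of-the-envelope register check for a CUDA kernel launch.
# Assumptions (not from the commit message): 66 is a per-thread count and
# the block size is 1024 threads. The helper names are hypothetical.

VOLTA_REGISTERS_PER_BLOCK = 65536  # 32-bit registers one block may use (compute capability 7.0)


def registers_needed(regs_per_thread: int, threads_per_block: int) -> int:
    """Total registers a block requests, ignoring allocation granularity."""
    return regs_per_thread * threads_per_block


def launch_fits(regs_per_thread: int, threads_per_block: int) -> bool:
    """True if the block's register demand stays within the per-block limit."""
    return registers_needed(regs_per_thread, threads_per_block) <= VOLTA_REGISTERS_PER_BLOCK


def register_cap(threads_per_block: int) -> int:
    """Largest per-thread register count that still fits for this block size."""
    return VOLTA_REGISTERS_PER_BLOCK // threads_per_block


if __name__ == "__main__":
    print(launch_fits(66, 1024))   # the assumed original configuration
    print(launch_fits(66, 512))    # option 1: fewer threads per block
    print(register_cap(1024))      # option 2: per-thread cap the compiler must respect
```

Under these assumptions the original configuration requests more registers than one block may use, so either the block has to shrink or the compiler has to be held to a lower per-thread register count. The second option is roughly what annotating the kernel with CUDA's `__launch_bounds__` qualifier asks nvcc to do, and it leaves the shared launch configuration untouched.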
16 Dec, 2021 7 commits

- S
  
  modify according to zhouwei's comment (#38166) · a37be82f
  Committed by Sing_chan on Dec 16, 2021
  
  a37be82f
- L
  [new-exec] skip add_dependency when cc_test skipped because of CI_SKIP_CPP_TEST=ON (#38191) · 30973183
  Committed by Leo Chen on Dec 16, 2021
```
* fix cmake

* not check execution time
```
  30973183
- S
  
  block warning: overriding D9025 (#38034) · 672dba1b
  Committed by Sing_chan on Dec 16, 2021
  
  672dba1b
- C
  
  add grad maker debug log (#38183) · a43d8e59
  Committed by chentianyu03 on Dec 16, 2021
  
  a43d8e59

- T
  Faster implementation of CPU kernel for ROI Align operator (#37848) · 023ff4f5
  Committed by Tomasz Socha on Dec 16, 2021

  * Faster implementation of CPU kernel for ROI_ALIGN Operator

  * Add missing variable to CUDA roi_align_op

  * Style

  * Fix boundaries

  * Rename variables for indexes calculation

  * Remove unnecessary emplace

  * Revert "Remove unnecessary emplace"

    This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a.

  * Style

  023ff4f5

- C
  
  pylayer support HIP (#38184) · 2e76d5ad
  Committed by chentianyu03 on Dec 16, 2021
  
  2e76d5ad

- Z
  Fixed LD_LIBRARY_PATH for eager_code_generator (#38160) · af30f545
  Committed by Zhanlue Yang on Dec 16, 2021

  * Rearranged Eager AutoCodeGen directory structure

  * Removed USE_OP in Eager AutoCodeGen

  * Enabled generation for Operators without Grad/Inputs/Outputs

  * Resolved operators without input

  * Fixed merge conflicts

  * Enabled Eager AutoCodeGen for 10+ more operators

  * Refactored Eager AutoCodeGen with more organized helper objects

  * Enabled Eager AutoCodeGen for operators with multiple OpBases

  * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

  * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

  * Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators

  * Fixed CI issues

  * Fixed LD_LIBRARY_PATH for eager_code_generator

  af30f545

PaddlePaddle / Paddle synced successfully more than 1 year ago
