Commits · 479c8834f7acf4ef78872e9f3cd70b37eb21719e · BaiXuePrincess / Paddle

17 Jun, 2020 1 commit

[Paddle-TRT] Fixes #24731, opt for SoftmaxKernelWithEltadd kernel, test=develop (#24834) · 479c8834

Committed by zlsh80826 on Jun 17, 2020

* blockReduce opt

* launch threads align to warpSize

* reduce unnecessary shared memory for broadcast reduced value

* vectorize SoftmaxKernelWithEltadd

* add fp16 constraint

* test=develop

479c8834
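
The "launch threads align to warpSize" item in the commit above can be illustrated with a small arithmetic sketch. This is not the kernel launch code from the commit, which is not shown here. The helper name `align_to_warp`, the warp size of 32, and the 1024-thread cap per block are assumptions made for illustration.

```python
WARP_SIZE = 32  # assumed warp width; hypothetical constant for this sketch


def align_to_warp(num_threads: int, warp_size: int = WARP_SIZE,
                  max_threads: int = 1024) -> int:
    """Round num_threads up to the next multiple of warp_size.

    The result is capped at max_threads (an assumed per-block limit).
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Integer ceiling division, then scale back up to a whole number of warps.
    aligned = (num_threads + warp_size - 1) // warp_size * warp_size
    return min(aligned, max_threads)
```

Launching whole warps means no warp is only partially populated, which keeps warp-level reductions free of special cases for inactive lanes.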

16 Jun, 2020 2 commits

Monitor Framework (#24079) · 5822862d

Committed by hutuxian on Jun 16, 2020

* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* For the sake of code neatness, we only support type of int and float, which can cover most of the scenarios.

5822862d
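
The design described in the commit above (a stat value class plus a singleton registry, restricted to int and float) might look roughly like the following sketch. It is written in Python for brevity and is not the backend implementation from the commit. The method names `increase`, `get`, `register`, and `instance` are hypothetical.

```python
import threading
from typing import Dict, Union

Number = Union[int, float]


class StatValue:
    """A single named counter holding an int or a float."""

    def __init__(self, initial: Number) -> None:
        self._value = initial
        self._lock = threading.Lock()

    def increase(self, delta: Number) -> Number:
        # Guard the read-modify-write so concurrent updates are not lost.
        with self._lock:
            self._value += delta
            return self._value

    def get(self) -> Number:
        with self._lock:
            return self._value


class StatRegistry:
    """Process-wide collection of stats, accessed through instance()."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        self._stats: Dict[str, StatValue] = {}
        self._lock = threading.Lock()

    @classmethod
    def instance(cls) -> "StatRegistry":
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def register(self, name: str, initial: Number) -> StatValue:
        # Only int and float are accepted, mirroring the stated restriction.
        if type(initial) not in (int, float):
            raise TypeError("only int and float stats are supported")
        with self._lock:
            if name in self._stats:
                raise KeyError(f"stat {name!r} is already registered")
            stat = StatValue(initial)
            self._stats[name] = stat
            return stat

    def get(self, name: str) -> StatValue:
        with self._lock:
            return self._stats[name]
```

A caller would register a stat once, for example `StatRegistry.instance().register("steps", 0)`, and then update it from anywhere through `StatRegistry.instance().get("steps").increase(1)`.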

fix dtype error of compare op, test=develop (#25059) · 028de857

Committed by Leo Chen on Jun 16, 2020

028de857

15 Jun, 2020 2 commits

bugfix for unique_ptr of IOptimizationProfile (#23917) · bef4afa6

Committed by Jeng Bai-Cheng on Jun 15, 2020

This commit fixes the compilation bug regarding unique_ptr of IOptimizationProfile.

IOptimizationProfile has a protected destructor and its lifetime is controlled by TensorRT
internally, so the application shouldn't delete the pointer to an IOptimizationProfile.
See the TensorRT documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_builder.html#a9ac47e100454151d8206ac91d543299a
test=develop

bef4afa6

[Paddle-TRT] slice kernel optimization (#24783) · 49e4ee27
Committed by zlsh80826 on Jun 15, 2020
```
* parallel move shared data test=develop

* test=develop
```
49e4ee27

14 Jun, 2020 1 commit
- fix make device_context error (#25045) · 770c11a1
  Committed by tianshuo78520a on Jun 14, 2020
```
* test=develop

* test=develop

* fix bug

* test=develop

* test=develop
```
  770c11a1
12 Jun, 2020 2 commits
- Fix/sync barrier (#25016) · be6a315f
  Committed by tangwei12 on Jun 12, 2020
```
* fix sync barrier with barrier monitor, test=develop
```
  be6a315f
- fix cos_sim, test=develop (#25017) · 8db66fc3
  Committed by ceci3 on Jun 12, 2020
  
  8db66fc3
11 Jun, 2020 1 commit
- Use allow list instead of white list (#25002) · 25a4dac4
  Committed by Leo Chen on Jun 11, 2020
```
* use allow list instead of white list, test=develop

* reduce include, test=develop
```
  25a4dac4
10 Jun, 2020 6 commits
- improve performance of instance_norm, test=develop (#25005) · 621b6385
  Committed by Zhang Ting on Jun 10, 2020
  
  621b6385
- support CMatchAuc (#24990) · 1c224e26
  Committed by hutuxian on Jun 10, 2020
```
Support CMatchAucCalculator based on CMatchRankAucCalculator with a new parameter ignore_rank
```
  1c224e26
- windows publish package scripts (#24851) · ff8ca52f
  Committed by Zhou Wei on Jun 10, 2020
```
* windows publish package scripts,test=develop

* windows publish package scripts,test=develop

* windows publish package scripts,test=develop
```
  ff8ca52f
- bn supports reverse_space, test=develop (#24988) · bfa46c38
  Committed by Leo Chen on Jun 10, 2020
  
  bfa46c38
- refine the slice Op to improve the performance of xlnet for fp16 training (#24967) · 613303db
  Committed by wangchaochaohu on Jun 10, 2020
  
  613303db
- test=develop, add log message in the function UpdateDllFlag (#24937) · 37bdb526
  Committed by silingtong123 on Jun 10, 2020
```
* test=develop, add log message in the function UpdateDllFlag

* test=develop, add the test
```
  37bdb526
09 Jun, 2020 7 commits
- clear old var in scope, test=develop (#24976) · d152d723
  Committed by Chen Weihang on Jun 09, 2020
  
  d152d723
- Reshape transpose matmul coverage (#24970) · 53d563a0
  Committed by Sylwester Fraczek on Jun 09, 2020
```
* remove gmock from ut

test=develop

* coverage enabled for r+t+m fuse pass

test=develop
```
  53d563a0
- Add 5D and 6D tensor support for the reduce ops · 0eb1b0bc
  Committed by wawltor on Jun 09, 2020
```
Add 5D and 6D tensor support for the reduce ops.
At the same time, the compile time went from 22 minutes to 21 minutes after the fix.
```
  0eb1b0bc
- fix random hang issue of PaddleDetection training task on windows (#24977) · 8603b5fb
  Committed by liuwei1031 on Jun 09, 2020
  
  8603b5fb
- test=develop, remove the tensorrt dll file from windows package (#24922) · 640196c4
  Committed by silingtong123 on Jun 09, 2020
  
  640196c4
- fix the segmentation fault error of profiler in seqseq model test=develop (#24952) · feba1318
  Committed by wangchaochaohu on Jun 09, 2020
  
  feba1318
- fix WARNING: ThreadSanitizer: heap-use-after-free (#24929) · a7ee634b
  Committed by Sylwester Fraczek on Jun 09, 2020
```
test=develop
```
  a7ee634b
08 Jun, 2020 4 commits

fixes the place info in the Print op (#24934) · 24e24987
Committed by mapingshuo on Jun 08, 2020
```
fixes the CUDAPlace info in the Print op
```
24e24987

Support LoDTensorArray in reverse_op (#24797) · 6be0ee15

Committed by Aurelius84 on Jun 08, 2020

* Support LoDTensorArray in reverse_op test=develop

* polish en doc and unittest code test=develop

* refine sample code test=develop

* add example of LoDTensorArray test=develop

* fix typo test=develop

6be0ee15

Refine error message in pybind folder (#24886) · 6190023a

Committed by Leo Chen on Jun 08, 2020

* refine err_msg of pybind.cc, test=develop

* refine err_msg in tensor_py.h, test=develop

* refine error msg, test=develop

* fix test_exception, test=develop

* follow comments, test=develop

6190023a

temporarily disable the unittests that fail on windows (#24942) · 4058e736
Committed by Zhou Wei on Jun 08, 2020

4058e736

05 Jun, 2020 6 commits

Fix/isfinite on windows (#24927) · a7cb97a1

Committed by Leo Chen on Jun 05, 2020

* refine isfinite, test=develop

* use namespace std of isfinite, test=develop, test=win_gpu

a7cb97a1

test=develop, remove the gflags/gflags.h from paddle_api.h (#24921) · ef9b3687
Committed by silingtong123 on Jun 05, 2020

ef9b3687

Enhance checking in some operator. (#24473) · 4c01d6d5
Committed by whs on Jun 05, 2020

4c01d6d5

Support SelectedRows allreduce in multi-cards imperative mode (#24690) · 4a702ef3

Committed by Chen Weihang on Jun 05, 2020

* support selectedrows allreduce in multi-cards dygraph, test=develop

* remove useless import modules in unittests, test=develop

* add nccl cmake to get nccl version, test=develop

* add if-condition to compile correctly, test=develop

* add detailed version parsing for old nccl, test=develop

* polish cmake details, test=develop

* fix remove test cmake error, test=develop

* fix cmake condition, test=develop

* change unittest cmake list, test=develop

* fix unittest cmake rule, test=develop, test=framep0

4a702ef3

add default ctor for AnalysisConfig python api. test=develop (#24924) · 14b85405
Committed by Pei Yang on Jun 05, 2020

14b85405

test=develop, fix the bug that the tensorrt package can't compile on windows (#24860) · fc443517
Committed by silingtong123 on Jun 05, 2020
```
* test=develop, fix a bug

* test=develop, remove the macro of PADDLE_DLL_INFERENCE
```
fc443517

04 Jun, 2020 6 commits
- add the support to specify device index for device_guard (#24555) · 29de0d97
  Committed by lilong12 on Jun 04, 2020
```
* add the support of device index for device_guard.
```
  29de0d97
- add queue_generator_op, dequeue_op, enqueue_op and ut (#24481) · 6e100227
  Committed by lilong12 on Jun 04, 2020
```
* add queue_generator_op, dequeue_op, enqueue_op and ut, test=develop
```
  6e100227
- fix problem in dump and add log (#24891) · b8f17a04
  Committed by hutuxian on Jun 04, 2020
```
* Fix the field length in the LoD scenario
* Fix the missing LoD info when copying a tensor in dump field
* Add some logs to make debugging easier
```
  b8f17a04
- ignore warnings of external libraries, test=develop (#24193) · 76cdbb84
  Committed by 石晓伟 on Jun 04, 2020
  
  76cdbb84
- Feature/add amp_checkout_finite_and_scale op (#24875) · 1e818158
  Committed by Leo Chen on Jun 04, 2020
```
* add amp_check_finite_and_scale op, test=develop

* add cpu kernel, test=develop

* use bool, test=develop

* follow comments, test=develop
```
  1e818158
- generate ci index (#24792) · 576d6808
  Committed by zhangchunle on Jun 04, 2020
  
  576d6808
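
For the device_guard change listed above (#24555), the idea of attaching a device index to a device name can be sketched as a small parser. This is not Paddle's implementation. The function name `parse_device_spec`, the `"gpu:1"` string form, and the rule that `cpu` takes no index are assumptions made for illustration.

```python
from typing import Optional, Tuple


def parse_device_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split a spec such as "gpu:1" into ("gpu", 1); "gpu" gives ("gpu", None)."""
    device, sep, index_text = spec.partition(":")
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unknown device type: {device!r}")
    if not sep:
        # No index given: leave the choice of device to the caller.
        return device, None
    if device == "cpu":
        # Assumption of this sketch: an index is only meaningful for gpu.
        raise ValueError("cpu does not take a device index")
    if not index_text.isdecimal():
        raise ValueError(f"invalid device index: {index_text!r}")
    return device, int(index_text)
```

Returning `None` for a missing index keeps "any GPU" distinguishable from "GPU 0", which a default of `0` would hide.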
03 Jun, 2020 2 commits
- FTRL with sparse update, test=develop (#22092) · a6beb96d
  Committed by leesusu on Jun 03, 2020
  
  a6beb96d
- add dep for fs.cc, test=develop, test=document_fix (#24881) · 6aae034f
  Committed by Chen Weihang on Jun 03, 2020
  
  6aae034f

BaiXuePrincess / Paddle is in sync with the upstream project it was forked from