Commits · 47a82e38e3b52fe05fcc6665b57ef0fa607c6694 · BaiXuePrincess / Paddle

27 Nov, 2019 5 commits

Support data_norm gpu kernel (#21325) · 47a82e38

Committed by hutuxian on Nov 27, 2019

* support data_norm_op run in CUDA
* add two parameters sync_stats & summary_decay_rate
* add UT

47a82e38

Support numpy bridge (enabled by default in dygraph mode) (#20983) · d5ff79e5

Committed by Youwei Song on Nov 27, 2019

* add numpy bridge

* fix template compile

* add unittest, add default
test=develop

* fix unittest
test=develop

* fix unittest
test=develop

* zero_copy=True for to_variable,
test=develop

* bug fix
test=develop

* disable deprecated NumPy API
test=develop

* use better design of NumpyAllocator
test=develop

* fix Py_None check
test=develop

* reset c++ tracer when jump out dygraph guard
test=develop

* refine PADDLE_ENFORCE_xx format
test=develop

* bug fix of tracer switch
test=develop

* update decref
test=develop

d5ff79e5

Polish the codes of fc when needs padding (#21378) · 8493f20e

Committed by GaoWei8 on Nov 27, 2019
```
test=develop
```
8493f20e

INT8 Fully-connected (#17641) · 5d7d5482

Committed by Michał Gallus on Nov 27, 2019

* Implement Int8 FC

* Integrate FC into INT8v2

test=develop

* int8 FC: transpose weights before computing scales

test=develop

* Add support for activation_type string in FC

test=develop

* Disable MKL-DNN's FC in VGG16 and 19

test=develop

* Disable FC quantization when mkldnn FC is disabled

test=develop

* Solve PADDLE_ENFORCES in FC int8

* Fix Paddle enforces and remove const cast

test=develop

* Fix style changes

test=develop

* Fix quantizer_tester test and add fc quantization

test=develop

* Fix FC test fail on CUDA

* Remove unnecessary log from quantize placement pass

test=develop

* Add Thread ID to FC hash key

test=develop

* Add comments to MKL-DNN FC Kernel

test=develop

* Refactor quantizer

test=develop

* Fix linter issues

test=develop

* Fix crash in slim googlenet

test=develop

* Fix PADDLE_ENFORCE messages

test=develop

5d7d5482

fix syn bn grad maker, test=develop, test=document_fix (#21317) · b639a882

Committed by Zeng Jinle on Nov 27, 2019

b639a882

26 Nov, 2019 16 commits
- add axis check for concat op (#21288) · 4d0f5ab1
  Committed by Youwei Song on Nov 26, 2019
```
* add axis check for concat op
test=develop

* fix PADDLE_ENFORCE format
test=develop

* move to ComputeAxis for InferShape check
test=develop
```
  4d0f5ab1
- paddleslim quantization skip pattern support list of string (#21141) · 07e6a942
  Committed by itminner on Nov 26, 2019
  
  07e6a942
- make CUDA_ARCH_NAME default Auto (#21352) · d8e7d252
  Committed by Tao Luo on Nov 26, 2019
```
* make CUDA_ARCH_NAME default Auto

test=develop

* refine warning

test=develop
```
  d8e7d252
- Fix some typos in AMP. (#21354) · be2e3e67
  Committed by Zhen Wang on Nov 26, 2019
```
* fix some typos in AMP. test=develop

* delete useless codes. test=develop
```
  be2e3e67
- Fix ernie python infer diff (#21311) · afb13484
  Committed by zhaoyuchen2018 on Nov 26, 2019
```
* Fix ernie python infer diff
* Refine mask

test=develop
```
  afb13484
- Fix mistake of batch norm op (#21237) · b6ce4f8b
  Committed by Lv Mengsi on Nov 26, 2019
```
* fix_bn

* revert unittest,test=develop
```
  b6ce4f8b
- add the framework support for distfc (#21197) · 41d13209
  Committed by lilong12 on Nov 26, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
  41d13209
- polish global_value_getter_setter, test=develop (#21332) · dbba9c7e
  Committed by Zeng Jinle on Nov 26, 2019
  
  dbba9c7e
- change download log format (#21290) · a214a308
  Committed by hong on Nov 26, 2019
```
* change download log format; test=develop

* add unittest for data download; test=develop

* remove cache before download; test=develop
```
  a214a308
- Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128. (#20972) · 234060f8
  Committed by GaoWei8 on Nov 26, 2019
```
* Add fc padding to solve mkl performance
test=develop

* fix gpu pass and error information
test=develop

* fix fc_fuse_pass_test
test=develop

* fix error information
test=develop

* fix error information
test=develop

* fix name and add fc op padding test
test=develop

* fix attributes
test=develop

* optimize fc padding
test=develop

* fix test
test=develop
```
  234060f8
- reduce interp op input size to pass CI, test=develop (#21341) · 6cfcbe05
  Committed by ruri on Nov 26, 2019
  
  6cfcbe05
- add prediction demo and script on windows (#21248) · 45c1e7bb
  Committed by silingtong123 on Nov 26, 2019
  
  45c1e7bb
- package the CAPI inference library and third_party (#21299) · 4b429c19
  Committed by silingtong123 on Nov 26, 2019
  
  4b429c19
- [MKL-DNN] Error throwing for NHWC layout for MKL-DNN ops (#21207) · f4cf028a
  Committed by Jacek Czaja on Nov 26, 2019
  
  f4cf028a
- Refactor MKL-DNN ElementwiseMul (#21061) · ed9ceb9f
  Committed by Michał Gallus on Nov 26, 2019
```
* Refactor MKL-DNN ElementwiseMul

remove manual fallback, remove format attrs
test=develop

* Refine PADDLE_ENFORCEs in eltwise_mul_op.h

test=develop

* Make ElementwiseMulOp inherit from ElementwiseOp

* Change type of simd_width to int

test=develop

* Remove Constructor extensions in ElementwiseOp and ElementwiseMulOp

test=develop

* Restore attributes

test=develop

* Fix test coverage for mkldnn eltwise mul

test=develop

* Conform to new is_run_common_broadcast API

test=develop

* Add UT for AreDimsAndFormatCorrect

test=develop
```
  ed9ceb9f
- fix logger problem (#21342) · 0a93635b
  Committed by Dong Daxiang on Nov 26, 2019
```
* fix logger problem
test=develop

* refine logger
test=develop
```
  0a93635b
25 Nov, 2019 9 commits
- remove warning LNK4006 and warning LNK4221 (#21226) · 345b67b5
  Committed by zhouwei25 on Nov 25, 2019
  
  345b67b5
- fix the fill_constant op precision problem (#21322) · 6514f52e
  Committed by wangchaochaohu on Nov 25, 2019
```
* fix the fill_constant op precision problem test=develop
```
  6514f52e
- Improve argsort performance. (#21267) · 08c19c58
  Committed by zhaoyuchen2018 on Nov 25, 2019
```
* Improve argsort performance.

- Running argsort on 200000 elements on a V100
is ~190x faster
before optimization: 0.53s
after optimization: 0.0027s

- Add fp16 support

* Refine error message
* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
  08c19c58
- fix Print_op input dtype list error test=develop (#21326) · 7fcaa39b
  Committed by lijianshe02 on Nov 25, 2019
  
  7fcaa39b
- add resnet50 test for post training quantization, test=develop (#21272) · 84865b80
  Committed by juncaipeng on Nov 25, 2019
  
  84865b80
- print table stat info for pslib (#21296) · 9a7832f8
  Committed by Thunderbrook on Nov 25, 2019
```
* print table stat
test=develop

* notes
test=develop

* notes
test=develop
```
  9a7832f8
- Cache 3rd source code, improve stability, reduce the compilation time (#21190) · 341dee06
  Committed by zhouwei25 on Nov 25, 2019
  
  341dee06
- Fix dgc accuracy by mv regularization to local (#21278) · 8ac7687e
  Committed by WangXi on Nov 25, 2019
  
  8ac7687e
- Add global value getter setter (#21285) · b9f8ae84
  Committed by Zeng Jinle on Nov 25, 2019
```
* add global value getter setter, test=develop

* fix error messages, test=develop
```
  b9f8ae84
24 Nov, 2019 5 commits

use prefetch to load next mem into cache (#21206) · b19e1a1b

Committed by Leo Zhao on Nov 24, 2019

* use prefetch to load next mem into cache

test=develop

* remove hard-coded memcpy in pyramid_hash_ff

test=develop

b19e1a1b

Refactor fetch handler (#21264) · 691ced87

Committed by Dong Daxiang on Nov 24, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, the handler should be initialized
with a variable dict. Each key of the variable dict is a user-defined name,
and each value is a Variable generated from the Python API.

For each fetch, the user implements a handler function that receives
fetched_result_dict, from which the fetched values can be read
using the user-defined keys.

691ced87
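
The handler pattern described in the commit message above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Paddle's implementation: `FetchHandlerSketch`, `CollectLoss`, `run_fetch`, and the variable name `"mean_0.tmp_0"` are all hypothetical, and only the ideas of a variable dict and a handler receiving `fetched_result_dict` come from the commit message.

```python
class FetchHandlerSketch:
    """Hypothetical stand-in for the FetchHandler idea (not Paddle's class)."""

    def __init__(self, var_dict):
        # var_dict maps a user-defined name to a variable object.
        if not isinstance(var_dict, dict):
            raise TypeError("var_dict must be a dict of {user_name: variable}")
        self.var_dict = dict(var_dict)

    def handler(self, fetched_result_dict):
        # Subclasses override this; keys are the user-defined names.
        raise NotImplementedError


class CollectLoss(FetchHandlerSketch):
    """Example subclass that records the value fetched under the key 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        self.seen.append(fetched_result_dict["loss"])


def run_fetch(handler, fetch_fn):
    """Simulate one fetch: resolve every variable, then call the handler."""
    result = {name: fetch_fn(var) for name, var in handler.var_dict.items()}
    handler.handler(result)
    return result


if __name__ == "__main__":
    h = CollectLoss({"loss": "mean_0.tmp_0"})  # hypothetical variable name
    run_fetch(h, lambda var: 0.25)             # fetch_fn fakes a fetched value
    print(h.seen)
```

The point of the design is that user code only ever sees its own keys (`"loss"` here), so the handler stays independent of the internal variable names.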

adapt test_collective_base.py for only two GPU cards available. (#21307) · f1b09ba3

Committed by Yi Liu on Nov 24, 2019
```
* adapt test_collective_base.py for only two GPU cards available.
test=develop

* fix bug of issue #21259
test=develop
```
f1b09ba3
optimize nhwc for tensor core in ConvOp and ConvGradOp (#20597) · ed2a1852

Committed by gongweibao on Nov 24, 2019

ed2a1852

Disable fusion_group pass for windows and mac. We will do some experiments on Linux first. (#21310) · c918788b

Committed by Yiqun Liu on Nov 24, 2019

* Disable fusion_group pass for windows and mac. We will do some experiments on Linux first.
test=develop

* Print the subgraph when check failed.
test=develop

c918788b

22 Nov, 2019 5 commits

Fix the crash issue when scale or bias was null-pointer. (#21284) · 69dd5152

Committed by Yihua Xu on Nov 22, 2019

* Fix the crash issue when scale or bias was null-pointer.

test=develop

* Add the error message for passing CI.

test=develop

69dd5152

optimize lod_reset op to avoid data transform · 698b8b73

Committed by Zhang Ting on Nov 22, 2019

698b8b73

add dequantize_abs_max op and modify lookup_table op (#20899) · f0b15184

Committed by Liufang Sang on Nov 22, 2019

* add int8 kernel to lookup_table op and add dequantize op test=develop

* change paddle_enforce to paddle_enforce_eq test=develop

* change copyright and change some not suitable code test=develop

* remove debug log test=develop

* replace GetInputType with IndicateVarDataType test=develop

* fix EmptyGradMaker test=develop

* fix diff between cpu and gpu test=develop

* use memcopy when int8_t test=develop

f0b15184

support cvm_op run in gpu (#21300) · a6ce2306

Committed by hutuxian on Nov 22, 2019

Previously, the CVM op could only run on CPU. This PR implements its GPU kernel
and also improves the unit tests for the CVM op.

a6ce2306

Avoid the string as the key of map to improve the jit performance (#21292) · b085ecc2

Committed by Yihua Xu on Nov 22, 2019

* Avoid the string as the key of map to improve the jit performance.

test=develop

* Use map to replace unordered_map.

test=develop

b085ecc2
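
The general idea of avoiding string keys in a kernel cache can be illustrated with a small Python sketch. It is an assumption-laden analogy, not the C++ change in this commit: `KernelType`, `make_key`, and `get_kernel` are hypothetical names, and the 32-bit packing layout is chosen only for illustration. The sketch packs a kernel type and a size into a single integer so that a lookup does not need to build or hash a string.

```python
from enum import IntEnum


class KernelType(IntEnum):
    """Hypothetical kernel identifiers used only for this sketch."""
    VMUL = 0
    VADD = 1


def make_key(kernel_type, size):
    # Pack (type, size) into one integer: type in the high bits, size in the low 32 bits.
    if not 0 <= size < (1 << 32):
        raise ValueError("size must fit in 32 bits")
    return (int(kernel_type) << 32) | size


_cache = {}


def get_kernel(kernel_type, size, build):
    """Return the cached kernel for (kernel_type, size), building it once."""
    key = make_key(kernel_type, size)
    kernel = _cache.get(key)
    if kernel is None:
        kernel = build(kernel_type, size)
        _cache[key] = kernel
    return kernel


if __name__ == "__main__":
    first = get_kernel(KernelType.VADD, 8, lambda t, n: object())
    second = get_kernel(KernelType.VADD, 8, lambda t, n: object())
    print(first is second)
```

An integer key is cheap to compare and hash, whereas a string key has to be constructed and scanned on every lookup, which matters when the lookup sits on a hot path.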

BaiXuePrincess / Paddle is in sync with the fork source project
