Commits · de009152a7c99115cd1681a38a6a3960119e267f · BaiXuePrincess / Paddle

10 Feb, 2020 1 commit

Compile without nccl deps. [2/2] (#22484) · de009152

Authored by Wilber on Feb 10, 2020

Compile without nccl deps. [1/2]
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

de009152

07 Feb, 2020 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Authored by Yiqun Liu on Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shapes of all inputs are the same.

* Add debug information.

* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE messages.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

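05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Authored by Wilber on Feb 05, 2020

Added a WITH_NCCL option to cmake that explicitly specifies whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

NCCL compilation can be disabled for single-machine, single-GPU use. Multi-GPU use requires NCCL, which stays enabled by default; if NCCL is disabled, only a single GPU can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>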
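
The option rules described in this commit message can be modeled as a small truth table. The sketch below is not the project's cmake code; it only restates the stated rules in Python, and the function names `resolve_with_nccl` and `usable_gpu_count` are hypothetical.

```python
def resolve_with_nccl(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Return the effective WITH_NCCL value.

    Per the commit message: WITH_NCCL defaults to ON, but it is
    forced OFF whenever WITH_GPU is OFF.
    """
    return with_nccl and with_gpu


def usable_gpu_count(with_gpu: bool, with_nccl: bool, visible_gpus: int) -> int:
    """Hypothetical helper: how many GPUs such a build could use.

    Without NCCL only a single GPU can be used; without GPU support, none.
    """
    if not with_gpu:
        return 0
    if not resolve_with_nccl(with_gpu, with_nccl):
        return min(visible_gpus, 1)
    return visible_gpus
```

For example, a GPU build with NCCL disabled on an 8-GPU machine would be limited to one device under these rules.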

7bc4b095

17 Jan, 2020 1 commit

Implement a common python unittest to test the ir passes. (#22209) · b7cac50b

Authored by Yiqun Liu on Jan 17, 2020

* Implement a common python unittest to test the ir passes.
test=develop

* Save the results in np.array and support to startup on CPU.
test=develop

* Fix the unittest.
test=develop

* Add check_program to check whether the optimized program is different from the origin one.
test=develop

* Remove the interface all_ops.
test=develop

* Add exception test in pass_test.
test=develop

b7cac50b

14 Jan, 2020 1 commit

add collective communication library in fleet (#22211) · e3a457d3

Authored by xujiaqi01 on Jan 14, 2020

* add collective communication library in fleet to replace mpi
* test=develop

e3a457d3

10 Jan, 2020 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Authored by Zhen Wang on Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

06 Jan, 2020 1 commit

Add ParallelExecutor Test for Cond API and Fix PE Checks Shape Bug (#22029) · dd436156

Authored by Huihuang Zheng on Jan 06, 2020

dd436156

18 Dec, 2019 1 commit

Fix Backward Bugs in Conditional Block (#21809) · 557bce77

Authored by Huihuang Zheng on Dec 18, 2019

The fixed bugs:

1. The condition sub-graph is not pruned.
2. When the backward graph is extremely simple, all of the backward ops are pruned.

557bce77

11 Dec, 2019 1 commit

add `no_need_buffer_slots` interface to pybind (#21575) · 686f0ecb

Authored by mapingshuo on Dec 11, 2019

* add no_need_buffer_slots interface to pybind

686f0ecb

06 Dec, 2019 1 commit

add file check_op_desc.py and add interface to get default value. (#21530) · 9da7e6b4

Authored by liym27 on Dec 06, 2019

* add file check_op_desc.py and add interface to get default value. test=develop

* add test for c++ coverage rate. test=develop

* Correct typo. test=develop

9da7e6b4

05 Dec, 2019 2 commits

add grad maker assert, test=develop (#21564) · 3a7caf48

Authored by Zeng Jinle on Dec 05, 2019

3a7caf48

Split VarBase from Python Variable for Dygraph (#21359) · cdd46d7e

Authored by Leo Chen on Dec 05, 2019

* test=develop, fix docker with paddle nccl problem

* don't expose numerous Tensor.set(), test=develop

* fix condition, test=develop

* fix float16 bug, test=develop

* feed should be Tensor or np.array, not Variable or number, test=develop

* use forcecast to copy numpy slice to new array, test=develop

* remove float16-uint16 hacking, test=develop

* add variable method to varbase and refactor to_variable to support return varbase

* support kwargs in varbase constructor

* add VarBase constructor to support default python args

* refine varbase initial method

* reset branch

* fix ut for change VarBase error info to PaddleEnforce

* cherry is parameter change before

* overload isinstance to replace too many change of is_variable

* rm useless files

* rm useless code merged by git

* test=develop, fix some ut failed error

* test=develop, fix test_graph_wrapper

* add some tests, test=develop

* refine __getitem__, test=develop

* add tests, test=develop

* fix err_msg, test=develop

cdd46d7e

04 Dec, 2019 1 commit

Add get_all_kernels api of registered data_type in pybind.cc (#21499) · 54382ce4

Authored by Aurelius84 on Dec 04, 2019

* add _get_all_register_op_kernels api test=develop

* refine usage of check_op_register_type test=develop

* add import in core test=develop

54382ce4

27 Nov, 2019 1 commit

Support numpy bridge (enabled by default in dygraph mode) (#20983) · d5ff79e5

Authored by Youwei Song on Nov 27, 2019

* add numpy bridge

* fix template compile

* add unittest, add default
test=develop

* fix unittest
test=develop

* fix unittest
test=develop

* zero_copy=True for to_variable,
test=develop

* bug fix
test=develop

* disable deprecated NumPy API
test=develop

* use better design of NumpyAllocator
test=develop

* fix Py_None check
test=develop

* reset c++ tracer when jump out dygraph guard
test=develop

* refine PADDLE_ENFORCE_xx format
test=develop

* bug fix of tracer switch
test=develop

* update decref
test=develop

d5ff79e5

25 Nov, 2019 1 commit

Add global value getter setter (#21285) · b9f8ae84

Authored by Zeng Jinle on Nov 25, 2019

* add global value getter setter, test=develop

* fix error messages, test=develop

b9f8ae84

24 Nov, 2019 1 commit

Refactor fetch handler (#21264) · 691ced87

Authored by Dong Daxiang on Nov 24, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name, and
each value is a Variable generated from the Python API.

For each fetch, the user should implement a handler function in which
fetched_result_dict is available, so the fetched values can be accessed
with the user-defined keys.
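
The usage pattern described above can be illustrated with a minimal, framework-free sketch. The `FetchHandler` base class here is a stand-in written only for this example and is not Paddle's actual class; `LossRecorder`, `run_fetch`, and the `"loss"` key are likewise hypothetical names.

```python
class FetchHandler:
    """Stand-in base class (an assumption, not Paddle's implementation)."""

    def __init__(self, var_dict):
        # Keys are user-defined names; values are the variables to fetch.
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        raise NotImplementedError


class LossRecorder(FetchHandler):
    """User-defined handler that records the value fetched under 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.history = []

    def handler(self, fetched_result_dict):
        # Fetched values are accessed with the same user-defined keys.
        self.history.append(fetched_result_dict["loss"])


def run_fetch(fetch_handler, fetch_value):
    """Hypothetical driver: fetch every variable, then call the handler."""
    results = {
        name: fetch_value(var) for name, var in fetch_handler.var_dict.items()
    }
    fetch_handler.handler(results)
```

In this sketch, `fetch_value` stands in for whatever the framework does to read a variable's current value; only the keying of results by user-defined names is taken from the commit message.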

691ced87

14 Nov, 2019 1 commit

Add friendly dygraph trace API (#21091) · 5fdfbe34

Authored by Zeng Jinle on Nov 14, 2019

* friendly trace interface, test=develop

* refine TracedLayer, test=develop

* add some docs, test=develop

5fdfbe34

01 Nov, 2019 1 commit

Update Tensor.set() to support float16 (#19964) · 9974e407

Authored by Leo Chen on Nov 01, 2019

* don't expose numerous Tensor.set(), test=develop

* fix condition, test=develop

* fix float16 bug, test=develop

* feed should be Tensor or np.array, not Variable or number, test=develop

* use forcecast to copy numpy slice to new array, test=develop

* remove float16-uint16 hacking, test=develop

9974e407

31 Oct, 2019 1 commit

Refine the cache of program, context and scope in executor. (#18483) · 16e4d026

Authored by Yiqun Liu on Oct 31, 2019

* Refine the cache of program, context and scope in executor.
test=develop

* Refine the unittest test_executor_and_use_program_cache.

* Add the test of PaddingRNN with use_program_cache=True.
test=develop

* Remove a check.
test=develop

* Refine the unittest to check whether it is correct when setting use_program_cache=True.
test=develop

16e4d026

29 Oct, 2019 1 commit

save load problem fix and new feature add (#20823) · ff0886a9

Authored by hong on Oct 29, 2019

* fix persistable;

* fix save load bugs; test=develop

* fix bug; test=develop

* add example for new io api; test=develop

* add example; test=develop

ff0886a9

18 Oct, 2019 1 commit

Fix dgc nan by stripping nccl from sparseReduce. (#20630) · 507afa8a

Authored by WangXi on Oct 17, 2019

507afa8a

14 Oct, 2019 2 commits

Dlpack support (#20039) · 12e4be03

Authored by 633WHU on Oct 14, 2019

* support dlpack to tensor and implement python interface test=develop

* add unittest for _to_dlpack and from_dlpack test=develop

12e4be03

Refine py_reader exit (#20331) · 40effc61

Authored by Zeng Jinle on Oct 14, 2019

* refine py_reader exit, test=develop

* fix multiprocess_reader exception unittest, test=develop

* increase code coverage for legacy fluid.layers.py_reader, test=develop

40effc61

11 Oct, 2019 3 commits

update the api en doc of BuildStrategy (#20445) · f855a86c

Authored by liu zhengxi on Oct 11, 2019

* update the api en doc of BuildStrategy and its setting, test=develop, test=document_fix

* update api.spec, test=develop, test=document_fix

* update the en doc of fuse_relu_depthwise_conv, test=develop, test=document_fix

f855a86c

doc fix, test=develop, test=document_fix (#20239) · a010d883

Authored by tangwei12 on Oct 11, 2019

* doc fix, test=develop, test=document_fix

a010d883

Update en APIs of LoDTensor (#20115) · 5a7142ac

Authored by Leo Chen on Oct 11, 2019

* polish en APIs of LodTensor, test=develop, test=document_dix

* polish en APIs of LoDTensor, test=develop, test=document_fix

* follow comments, test=develop, test=document_dix

5a7142ac

10 Oct, 2019 2 commits

New save load interface (#20148) · fa43e80e

Authored by hong on Oct 10, 2019

* add new save load interface; test=develop

* add new save interface; test=develop

* add save load interface ;

* fix save load error;

* fix dygraph set dict bug;

* add save load unit test; test=develop

* fix test_imperative_optimizer bug; test=develop

* fix unittest optimizer bug; test=develop

* fix code coverage; test=develop

* fix coverage; test=develop

* add document for apis; test=develop

* fix unittest error; test=develop

* fix save load unit test error; test=develop

* fix error message; test=develop

* change set_parameter set_optimizer to save_dygraph; test=develop

* add load_graph check; test=develop

* fix api spec; test=develop

fa43e80e

Polish en doc of LoDTensorArray, test=document_fix (#19972) · f4c56e9f

Authored by Leo Chen on Oct 10, 2019

* Polish en doc of LoDTensorArray, test=develop, test=document_fix

* follow comments, test=develop, test=document_dix

f4c56e9f

09 Oct, 2019 1 commit

refine CUDA CPU places en doc (#20243) · 20f68916

Authored by Youwei Song on Oct 09, 2019

* fix CUDA CPU places, test=document_fix, test=develop

* fix CUDAPlace param doc, test=document_fix, test=develop

* fix CUDAPlace param doc, test=document_fix, test=develop

20f68916

07 Oct, 2019 1 commit

trainer from dataset fetch targets (#19760) · c9139c3d

Authored by tangwei12 on Oct 07, 2019

add executor.FetchHandler for train/infer from the dataset

c9139c3d

28 Sep, 2019 1 commit

Enable users to create custom cpp op outside framework. (#19256) · 1a3eef02

Authored by qingqing01 on Sep 28, 2019

* Writing a custom op must follow the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.

1a3eef02

27 Sep, 2019 1 commit

update operator compatible info, test=develop (#19978) · 01b9d079

Authored by 石晓伟 on Sep 27, 2019

* update operator compatible info, test=develop

* revert cmake/version.cmake, test=develop

* add unit_tests and fix bugs, test=develop

* update ../paddle/fluid/framework/framework.proto, test=develop

* fix bug of paddle/fluid/inference/api/analysis_predictor.cc, test=develop

* update paddle/fluid/framework/version_test.cc, test=develop

* add comments and rename interfaces, test=develop

01b9d079

26 Sep, 2019 1 commit

Expose `mutable_data` as python binding (#19932) · cde73a7b

Authored by Yang Zhang on Sep 26, 2019

* Expose `mutable_data` as python binding

test=develop

* Add test for device pointer binding

test=develop

* Make test compatible with python 2

cde73a7b

25 Sep, 2019 1 commit

Add support for new QAT models (#18970) · 4286a627

Authored by Wojciech Uss on Sep 25, 2019

* Add support for new QAT models

test=develop
Co-Authored-By: Michał Gallus <michal.gallus@intel.com>
Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com>

* fixed fps results

test=develop

* fix top5 accuracy drop problem

* updated for new QAT models

* skip quantizing average pooling - dirty but working

* add missing pass

* added missing conv+brelu fuse pass

* removed a call to non-existent pass

test=develop

* renamed pass

test=develop

* Adjust finding pooling scale to newest QAT models

* Remove unnecessary code from quantization_mkldnn_pass

* Copy Pooling input scale to output scale in QAT

* Refactor & remove unused code in QAT

* Incorporate fp32 FC into QAT

test=develop

* Enable graph drawing with debug flag

test=develop

* Add tests for QATv2

* Fix paths for QATv2 models

test=develop

* Add option to save transformed int8 qat model

test=develop

* Remove redundant lines from qat mkldnn pass

test=develop

* Delegate disablement of avg pooling to qat

test=develop

* fix CI bug, test=develop

* Follow Wangzhen's Review, test=develop

* Update API.spec

test=develop

* Name False in (is_unsigned, TensorScale) tuple

test=develop

4286a627

16 Sep, 2019 1 commit

Add prune_backward function to cover complicated test_program.clone situation (#19772) · 00d5375e

Authored by Chen Weihang on Sep 16, 2019

00d5375e

13 Sep, 2019 1 commit

Open fuse all reduce option (#19765) · 056fdedd

Authored by chengduo on Sep 13, 2019

* Open fuse all reduce op
test=develop

* Add Fuse optimization op log

* Add log in fuse_optimizer op pass and fuse all_reduce op pass

* replace with boost::optional<bool>
test=develop

* Polish code
test=develop

* fix code coverage
test=develop

056fdedd

05 Sep, 2019 1 commit

add feed_var_names to Prune interface (#19589) · dca9b6c5

Authored by mapingshuo on Sep 05, 2019

* Fix bug: add feed_vars to the prune function

dca9b6c5

31 Aug, 2019 1 commit

Paddlebox Framework (#18982) · c756b5d2

Authored by hutuxian on Aug 31, 2019

* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op; for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add UT.
* For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982

c756b5d2

26 Aug, 2019 1 commit

Fix bug of getting bool Flags from os.environ (#19349) · 6fb310ae

Authored by Leo Chen on Aug 26, 2019

* fix bug of getting bool Flags from os.environ, test=develop

* add empty loss_name in CompiledProgram for inplace grad test, test=develop
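
The commit message does not say what the bug was. A common pitfall when reading boolean flags from `os.environ` is that every non-empty string, including `"False"`, is truthy in Python. The sketch below shows one way to parse such values explicitly; it is an illustration under that assumption, not the code from this commit, and `env_flag` and `FLAGS_example` are hypothetical names.

```python
import os

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment by parsing its text.

    bool("False") is True in Python, so the string must be compared
    against known spellings instead of being passed to bool().
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError("cannot interpret %r as a boolean for %s" % (raw, name))
```

Passing `environ` explicitly keeps the helper easy to test without mutating the real process environment.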

6fb310ae

22 Aug, 2019 1 commit

Enhance OpTest to check the consistency of operators when using and not using inplace (#19101) · a9d5fc51

Authored by Leo Chen on Aug 22, 2019

* add pybind interface to get all inplace ops, test=develop

* enhance OpTest to check the consistency of operators when using and not using inplace, test=develop

* handle corner cases in op_test, test=develop

* support outputs without tensor holder_, like XShape in reshape_op, test=develop

* fix bug, some op has GradOpMaker, but actually no grad_op in OpInfoMap, test=develop

* use reshape_grad instead of reshape in FlattenGradOp, test=develop

* fix error debug dims info for variables like XShape, test=develop

* change computational order in sum_op to relieve computation difference using inplace, test=develop

* add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop

* follow sneaxiy's comments, test=develop

* remove unused DefaultGradOpDescMaker in mkldnn op, test=develop

a9d5fc51

BaiXuePrincess / Paddle is in sync with the fork source project
