- 22 Feb, 2020: 1 commit

Committed by tangwei12
* add sync communicator and implement

- 18 Feb, 2020: 1 commit

Committed by wangchaochaohu
* add python flag to control profile level test=develop

- 12 Feb, 2020: 1 commit

Committed by tangwei12
* add thread barrier for the compiled program

- 11 Feb, 2020: 1 commit

Committed by yaoxuefeng
* update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * update style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop

- 10 Feb, 2020: 1 commit

Committed by Wilber
Compile without nccl deps. [1/2] Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

- 07 Feb, 2020: 1 commit

Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation of fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearrangement of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shapes of all inputs are the same. * Add debug information. 
* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop

- 05 Feb, 2020: 1 commit

Committed by Wilber
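Added a WITH_NCCL option to CMake to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. Single-machine, single-GPU builds can disable NCCL compilation. Multi-GPU use requires NCCL to stay ON (the default); if NCCL is disabled, only a single GPU can be used. Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>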

- 04 Feb, 2020: 2 commits
- 02 Feb, 2020: 1 commit

Committed by xujiaqi01
* add GeneralRoleMaker which is for general usage * test=develop

- 21 Jan, 2020: 1 commit

Committed by Leo Chen
remove unnecessary template.

- 19 Jan, 2020: 1 commit

Committed by Leo Chen
* use function instead of lambda, test=develop * follow comments, test=develop

- 17 Jan, 2020: 2 commits

Committed by Yiqun Liu
* Implement a common python unittest to test the ir passes. test=develop * Save the results in np.array and support starting up on CPU. test=develop * Fix the unittest. test=develop * Add check_program to check whether the optimized program is different from the original one. test=develop * Remove the interface all_ops. test=develop * Add exception test in pass_test. test=develop

Committed by tangwei12
* add half_async in the communicator * fix DistributedStrategy

- 16 Jan, 2020: 1 commit

Committed by Chen Weihang
* add multiprocess for dygraph data loader, test=develop * polish code & add safe guard, test=develop * refactor dygraph dataloader & add signal handler, test=develop * fix member initializer compile error on ci, test=develop * fix member initializer compile error once more, test=develop * remove useless config, test=develop * skip windows incompatible problem, test=develop * add unittest for coverage, test=coverage * add more exception unittest case, test=develop * deal with signal handler coverage, test=develop * polish code & add signal handler tests, test=develop * deal with coverage ci problem, test=develop * split data loader test & coverage ci fix, test=develop * remove test_imperative_data_loader_with_exception, test=develop * remove signal process except test case, test=develop * add exception tests again & remove sample list test, test=develop * split normal and exception unittests to diff class, test=develop * polish doc for use_multiprocess effect in static mode, test=develop

- 14 Jan, 2020: 1 commit

Committed by xujiaqi01
* add collective communication library in fleet to replace mpi * test=develop

- 10 Jan, 2020: 1 commit

Committed by Zhen Wang
* add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

- 08 Jan, 2020: 1 commit

Committed by zhongpu
* modify fc to linear in sample code, test=develop * remove FC, test=develop * remove warnings, test=develop * drop fluid/imperative/README.md , test=develop * change fc to linear, test=develop * polish code style, test=develop

- 06 Jan, 2020: 2 commits

Committed by Huihuang Zheng

Committed by 123malin
* add distributed_strategy

- 26 Dec, 2019: 1 commit

Committed by zhouwei25
* Fix openblas to support compile on Windows when WITH_MKL=OFF

- 25 Dec, 2019: 1 commit

Committed by flame
* python zero copy inference * support delete inference pass

- 19 Dec, 2019: 1 commit

Committed by Zeng Jinle
* add some debug flags to auto growth allocator, test=develop * add comments about auto growth, test=develop

- 18 Dec, 2019: 2 commits

Committed by Huihuang Zheng
Fixed bugs: 1. The condition sub-graph is not pruned. 2. When the backward graph is extremely simple, all backward ops are pruned.

Committed by xujiaqi01
* fix compile error of butil when with_pslib=on and with_testing=on * test=develop

- 12 Dec, 2019: 1 commit

Committed by Leo Chen
* polish cmake, test=develop * add current directory to LD_LIBRARY_PATH, test=develop

- 11 Dec, 2019: 2 commits

Committed by mapingshuo
* add no_need_buffer_slots interface to pybind

Committed by Zeng Jinle

- 10 Dec, 2019: 2 commits

Committed by Chen Weihang
* refine dygraph dataloader & polish related code, test=develop * refine code based on review comments, test=develop

Committed by Leo Chen
* add op function generator, test=develop * add unittest, test=develop * follow comments, test=develop * fix windows compilation problem, test=develop

- 09 Dec, 2019: 1 commit

Committed by Leo Chen
* refine init function, test=develop * add tests, test=develop * remove extern, which may cause symbol error in gcc-4.8, test=develop

- 06 Dec, 2019: 2 commits
- 05 Dec, 2019: 2 commits

Committed by Zeng Jinle

Committed by Leo Chen
* test=develop, fix docker with paddle nccl problem * don't expose numerous Tensor.set(), test=develop * fix condition, test=develop * fix float16 bug, test=develop * feed should be Tensor or np.array, not Variable or number, test=develop * use forcecast to copy numpy slice to new array, test=develop * remove float16-uint16 hacking, test=develop * add variable method to varbase and refactor to_variable to support return varbase * support kwargs in varbase constructor * add VarBase constructor to support default python args * refine varbase initial method * reset branch * fix ut for change VarBase error info to PaddleEnforce * cherry is parameter change before * overload isinstance to replace too many change of is_variable * rm useless files * rm useless code merged by git * test=develop, fix some ut failed error * test=develop, fix test_graph_wrapper * add some tests, test=develop * refine __getitem__, test=develop * add tests, test=develop * fix err_msg, test=develop

- 04 Dec, 2019: 1 commit

Committed by Aurelius84
* add _get_all_register_op_kernels api test=develop * refine usage of check_op_register_type test=develop * add import in core test=develop

- 03 Dec, 2019: 1 commit

Committed by zhongpu
* support SelectedRows in dygraph, test=develop * fix bug of _grad_ivar interface, test=develop * add optest for SelectedRows support, test=develop * fix bug for gradient_accumulator in GPU mode, test=develop * fix error when adding SelectedRows to LoDTensor in sorted_gradient mode in dygraph, test=develop * refine and simplify gradient accumulator code, test=develop * add optest, test=develop * add optest and simplify code, test=develop * fix bug for test_imperative_selected_rows, test=develop * add optest for Coverage, test=develop * fix gradient interface and simplify code, test=develop * update api for gradient, test=develop * fix ShareDim's bug in DygraphExecutionContext class, test=develop * add optest, test=develop

- 28 Nov, 2019: 1 commit

Committed by Zeng Jinle
* use system allocator in unittests, test=develop * fix op bugs, test=develop * fix tensor copy bug when src and dst are the same, test=develop

- 27 Nov, 2019: 1 commit

Committed by Youwei Song
* add numpy bridge * fix template compile * add unittest, add default test=develop * fix unittest test=develop * fix unittest test=develop * zero_copy=True for to_variable, test=develop * bug fix test=develop * disable deprecated NumPy API test=develop * use better design of NumpyAllocator test=develop * fix Py_None check test=develop * reset c++ tracer when jump out dygraph guard test=develop * refine PADDLE_ENFORCE_xx format test=develop * bug fix of tracer switch test=develop * update decref test=develop

- 26 Nov, 2019: 1 commit

Committed by Zeng Jinle