- 19 Jul, 2019 2 commits
-
-
Committed by Huihuang Zheng
Test PaddingRNN on V100 GPU device.
Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.
GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
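For reference, the relative change implied by the figures above can be worked out as follows. This is only a small arithmetic sketch; the helper name `relative_change` is hypothetical and not part of the commit.

```python
def relative_change(new: float, old: float) -> float:
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


# Figures quoted in the commit message (this PR vs. without this PR).
memory_change = relative_change(6414, 6837)   # GPU memory in MiB
speed_change = relative_change(10.28, 9.89)   # speed in steps/s

print(f"GPU memory: {memory_change:+.2f}%")
print(f"Speed:      {speed_change:+.2f}%")
```

A negative memory figure means the PR uses less GPU memory; a positive speed figure means it runs more steps per second.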
-
Committed by Adam
test=develop
-
- 18 Jul, 2019 2 commits
-
-
Committed by hutuxian
* hash_op support int64 hash_size
* add corresponding UT
-
Committed by guru4elephant
* remove ctr reader, all functions are satisfied in dataset
-
- 17 Jul, 2019 3 commits
-
-
Committed by Yang Zhang
* Add GPU implementation for `prelu` backward pass test=develop
* Fix logic error in `prelu` GPU backward and simplify a bit test=develop
* Fix `prelu` backward CUDA implementation test=develop
  The CPU version was not actually used, so the test passed
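As background for the `prelu` backward pass mentioned above, here is a minimal reference sketch of the gradient math in plain Python. It assumes a single shared `alpha` and flat lists of values; it is not the CUDA kernel from this commit, and the function name `prelu_backward` is hypothetical.

```python
def prelu_backward(x, dy, alpha):
    """Reference gradients for y = x if x > 0 else alpha * x.

    x     : forward inputs (list of floats)
    dy    : upstream gradients, same length as x
    alpha : single shared slope for non-positive inputs (assumption)
    """
    # dL/dx passes the gradient through for positive inputs and
    # scales it by alpha otherwise.
    dx = [g if v > 0 else alpha * g for v, g in zip(x, dy)]
    # dL/dalpha accumulates dy * x over the non-positive inputs only.
    dalpha = sum(g * v for v, g in zip(x, dy) if v <= 0)
    return dx, dalpha


dx, dalpha = prelu_backward([1.0, -2.0, 3.0, -4.0], [0.5, 0.5, 1.0, 2.0], 0.25)
print(dx, dalpha)
```

A reference like this is the kind of check that catches the situation the last bullet describes, where a test passes only because the code path under test never ran.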
-
Committed by Yihua Xu
-
Committed by baojun
-
- 16 Jul, 2019 2 commits
-
-
Committed by Jacek Czaja
* - Added partial draft of pooling acquire
  - Workspace support
  - compilation fix
  - Added draft of pooling backward reimplementation
  - Segfault fix
  - reverted 'any' for diff_dst creation in pooling
  - Lint fixes test=develop
  - lint fixes test=develop
  - Further lint fixes test=develop
* - Fixes after review test=develop
* - Lint fixes test=develop
* - Even more lint fixes test=develop
-
Committed by chengduo
test=develop
-
- 15 Jul, 2019 1 commit
-
-
Committed by guru4elephant
* make auc op compatible with 1 dim
-
- 11 Jul, 2019 2 commits
-
-
Committed by Hongyu Liu
-
Committed by Zeng Jinle
* feature/buffer_shared_inplace, test=develop
* refine code, test=develop
* fix elementwise_add op cpu inplace and sum inplace bug, test=develop
* add unittest and debug log, test=develop
* fix parallel_executor scope bug, polish code, test=develop
* fix sum op, activation op, single_in_place_inference bug, test=develop
* remove kLocalExecScopeName, test=develop
* fix unittest, test=develop
* fix out_var first version bug, test=develop
* follow comments, test=develop
-
- 10 Jul, 2019 4 commits
-
-
Committed by Zeng Jinle
* clean code of dim and place, test=develop
* fix failed unittests, test=develop
-
Committed by Jacek Czaja
-
Committed by Yibing Liu
-
Committed by Physher
-
- 09 Jul, 2019 3 commits
-
-
Committed by Jiabin Yang
* test=develop, fix docker with paddle nccl problem
* test=develop, fix/gcc_4.8_ubt_link_error
* test=develop, fix code format
-
Committed by Physher
-
Committed by LielinJiang
* fix transform matrix bug, test=develop
* modify API.spec
-
- 08 Jul, 2019 1 commit
-
-
Committed by Zhaolong Xing
* Fix Mask rcnn predictor
  1. refine memory optim algorithm to support the model with the block op.
  2. output diff: modify the affine channel fuse
  3. add condition_block_infer op
  add interface for setting trt calib table dir
  test=develop
* add the missing files. test=develop
-
- 05 Jul, 2019 1 commit
-
-
Committed by zhaoyuchen2018
* Fix bug where topk cannot handle a 1D vector
  Add path to handle 1D vector
  test=develop
  Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
* refine code
  test=develop
  Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
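To illustrate the idea of a dedicated path for 1D input, here is a minimal pure-Python sketch that treats a 1D vector as a single row and unwraps the result afterwards. It is an illustration under that assumption, not the operator code from this commit; the function name `topk` and its return layout are hypothetical.

```python
import heapq


def topk(data, k):
    """Return (values, indices) of the k largest entries along the last axis.

    Accepts either a 1D list of numbers or a 2D list of rows.
    """
    # A 1D vector is handled by wrapping it as one row (assumed approach).
    is_1d = bool(data) and not isinstance(data[0], (list, tuple))
    rows = [data] if is_1d else data

    values, indices = [], []
    for row in rows:
        # Pick the indices of the k largest elements in this row.
        best = heapq.nlargest(k, range(len(row)), key=lambda i: row[i])
        indices.append(best)
        values.append([row[i] for i in best])

    # Unwrap so that 1D input yields 1D output.
    if is_1d:
        return values[0], indices[0]
    return values, indices


print(topk([3, 1, 4, 1, 5], 2))
print(topk([[1, 2], [4, 3]], 1))
```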
-
- 04 Jul, 2019 2 commits
-
-
Committed by qingqing01
* Refine Infershape in activation_op for double_grad.
-
Committed by chengduo
-
- 03 Jul, 2019 6 commits
-
-
Committed by zhoukunsheng
-
Committed by zhoukunsheng
* test=develop support Tensor input for chunk_eval op
* test=develop fix testcase for chunk_eval op
* test=develop fix typos in nn.py
-
Committed by zhoukunsheng
-
Committed by zhoukunsheng
-
Committed by zhoukunsheng
-
Committed by zhoukunsheng
-
- 02 Jul, 2019 3 commits
-
-
Committed by Leo Zhao
* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop
* update session id definition and adjust logic for default behavior test=develop
* reset logic in mkldnn reuse as most of cases work in default. test=develop
-
Committed by Yi Liu
1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter from the template, since the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
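The split described in point 1 can be pictured with a small pure-Python simulation. The commit message does not name the four reduce types; this sketch assumes they are sum, max, min, and prod, and every function name here is hypothetical rather than the operator names used in the repository.

```python
from functools import reduce
import operator


def _make_allreduce(combine):
    """Build one allreduce function dedicated to a single reduce type."""
    def allreduce(per_rank_values):
        # per_rank_values[r] is the list of values held by rank r.
        # Combine element-wise across ranks.
        result = [reduce(combine, column) for column in zip(*per_rank_values)]
        # Every rank receives its own copy of the combined result.
        return [list(result) for _ in per_rank_values]
    return allreduce


# Four separate ops instead of one op with a reduce-type attribute
# (the choice of these four types is an assumption).
allreduce_sum = _make_allreduce(operator.add)
allreduce_max = _make_allreduce(max)
allreduce_min = _make_allreduce(min)
allreduce_prod = _make_allreduce(operator.mul)

ranks = [[1, 2], [3, 4]]
print(allreduce_sum(ranks))
print(allreduce_prod(ranks))
```

With one function per reduce type, a caller picks the operation by name and no branch on a reduce-type attribute is needed inside the kernel.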
-
Committed by chengduo
* add not_been_used_vars to no_grad_set test=develop
-
- 01 Jul, 2019 2 commits
-
-
Committed by LielinJiang
* modify roi_perspective_transform_op to output mask and transform matrix
* modify comment
* modify comment
* modify API.spec
* update API.spec
* remove no use header, test=develop
* resolve conflict
-
Committed by Brian Liu
* Fix bug in quantize kernel which causes a crash in vgg16/19 model test=develop
* refine the code to reduce verbose code; test=develop
* remove useless code; test=develop
-
- 28 Jun, 2019 2 commits
-
-
Committed by Leo Zhao
1. some key generation methods are not aligned with PR#17965
2. extend ptr lifetime to avoid memory release if SetBlob fails; otherwise it will core dump.
test=develop
-
Committed by Zeng Jinle
* add_elementwise_add_inplace_test, test=develop
* rename file, test=develop
-
- 27 Jun, 2019 4 commits
-
-
Committed by tangwei12
* add is_runnning in communicator, test=develop
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream
  update atol and rtol to 1e-05
  rm shard_index op and test
  modify read input from file to read from memory
  remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
-
Committed by Sylwester Fraczek
add prior_box quantization code
add scale algo rules for prior box
test=develop
-
Committed by Jacek Czaja
* - Reusing of reorder used in elementwise_add_mkldnn
  - Added MKL-DNN sum prim reusing test=develop
  - Compilation fixes test=develop
  - Yet another compilation fix test=develop
  - Yet another compilation fix test=develop
  - Yet another linking fix test=develop
  - Final compilation fix test=develop
  - lint fixes test=develop
  - Lint fixes test=develop
* - Fixes after review test=develop
-