- 02 Apr, 2020 1 commit
-
-
Committed by Kaipeng Deng
* add inplace_abn_op. test=develop
-
- 30 Mar, 2020 1 commit
-
-
Committed by Jacek Czaja
-
- 26 Mar, 2020 1 commit
-
-
Committed by Zhaolong Xing
* add dynamic plugin support. test=develop * change emb eltwise layernorm to math function test=develop * add emb eltwise layernorm test=develop * can run dynamic shape ernie test=develop * fix ci test=develop * add ut for trt ernie dynamic test=develop * refine dynamic shape c++ interface. test=develop * fix comments test=develop * fix comments test=develop
-
- 05 Feb, 2020 1 commit
-
-
Committed by Wilber
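Added a WITH_NCCL cmake option to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned OFF when WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. A single-machine, single-GPU build can disable NCCL compilation; multi-GPU use needs NCCL left ON (the default), and with NCCL disabled only a single GPU can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>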
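The option rule in this commit message can be sketched as a small truth-table model. This is only an illustration in Python, not Paddle's CMake code: the function names `resolve_with_nccl` and `max_usable_gpus` are hypothetical, and the only rules assumed are the ones the message states (WITH_NCCL defaults to ON, is forced OFF when WITH_GPU is OFF, and a build without NCCL is limited to one GPU).

```python
def resolve_with_nccl(with_gpu: bool, with_nccl: bool = True) -> bool:
    """Hypothetical model of the effective WITH_NCCL value.

    WITH_NCCL defaults to ON, but is forced OFF when WITH_GPU is OFF.
    """
    return with_nccl and with_gpu


def max_usable_gpus(with_gpu: bool, with_nccl: bool, gpus_present: int) -> int:
    """Hypothetical helper: how many GPUs a build with these options can use."""
    if not with_gpu:
        return 0  # CPU-only build
    if not resolve_with_nccl(with_gpu, with_nccl):
        return min(1, gpus_present)  # without NCCL, single GPU only
    return gpus_present


if __name__ == "__main__":
    for gpu in (True, False):
        for nccl in (True, False):
            print(f"WITH_GPU={gpu!s:5} WITH_NCCL={nccl!s:5} -> "
                  f"effective WITH_NCCL={resolve_with_nccl(gpu, nccl)}")
```

The point of the sketch is that requesting WITH_NCCL=ON has no effect on a CPU-only build, so the two options do not need to be kept in sync by hand.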
-
- 04 Feb, 2020 1 commit
-
-
Committed by 石晓伟
-
- 09 Jan, 2020 1 commit
-
-
Committed by 石晓伟
-
- 28 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix ref_cnt pass, test=develop * add cpp unittests to reference_count_pass, test=develop * follow comments, test=develop
-
- 07 Nov, 2019 1 commit
-
-
Committed by Huihuang Zheng
These ops are useful in control flow.
-
- 05 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* support no need buffer vars in dygraph, test=develop * fix inference compilation error, test=develop * update no_need_buffer_vars_inference, test=develop * add unittests for no_need_buffer_vars_context, test=develop * refine no_need_buffer_vars by return ref, test=develop * polish some codes, test=develop
-
- 30 Oct, 2019 1 commit
-
-
Committed by Yiqun Liu
* Move the codes of fused operators to operators/fused directory. test=develop * Correct the op name in cmake. * Change the use of PADDLE_ENFORCE. test=develop
-
- 28 Oct, 2019 1 commit
-
-
Committed by Aurelius84
-
- 24 Oct, 2019 1 commit
-
-
Committed by Tao Luo
* make search_compute support avx only * clean search_compute.h * rename sse_axpy to avx_axpy test=develop * update CMakeLists.txt test=develop
-
- 18 Oct, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 02 Oct, 2019 1 commit
-
-
Committed by zhaoyuchen2018
* Add multihead op for ernie opt test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine softmax test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine kernel. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine cuda kernel test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine cuda version test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com> * Refine cmake test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
- 30 Sep, 2019 1 commit
-
-
Committed by Wilber
* fix compile with anakin bug * remove useless deps test=develop - Fixed bugs encountered when building together with anakin. - test_anakin_activate failed to compile - test_anakin_engine failed to compile
-
- 17 Sep, 2019 1 commit
-
-
Committed by chengjuntao
* add deformable conv v1 op, test=develop
-
- 11 Sep, 2019 2 commits
-
-
Committed by Zeng Jinle
* make leaky relu inplacable, test=develop * force add unittests to pass coverage, test=develop
-
Committed by Yiqun Liu
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop
-
- 08 Sep, 2019 1 commit
-
-
Committed by hutuxian
fix cmakelist deps: remove unnecessary deps and add proper op deps
-
- 19 Aug, 2019 1 commit
-
-
Committed by Aurelius84
* add matrch_matrix_tensor op test=develop * fix ignore unittest if with_mkl=off test=develop * clean code and rm is_test param test=develop * modify API.spec test=develop * rm useless code in search_compute.h test=develop * modify api.spec test=develop * modify default_grad.spec test=develop * Add API test code test=develop * clean code in search_computer.h * modify PADDLE_ENFORCE and clean search_compute.h test=develop * fix code style test=develop
-
- 06 Aug, 2019 1 commit
-
-
Committed by Kevin
* fix overflow by int32 mul test=develop * fix reference nullptr * fix codestyle test=develop * modify to point in ContextProjectFunctor test=develop * modify to point in ContextProjectFunctor test=develop * modify . to -> test=develop * add var_conv_2d op test=develop * edit api.spec test=develop * ignore unittest if with_mkl=off test=develop * fix python3 division test=develop * fix ignore unittest bug test=develop * remove useless code test=develop * modify api.spec test=develop * modify default_grad.spec test=develop
-
- 23 Jul, 2019 1 commit
-
-
Committed by chengduo
* support sparse gradients test=develop
-
- 27 Jun, 2019 1 commit
-
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-
- 11 Jun, 2019 1 commit
-
-
Committed by 石晓伟
* update anakin-engine interfaces for content-dnn test=develop * support only-gpu mode of Anakin modify eltwise parse test=develop * modification for thread-safe test=develop * Integrated template instance test=develop * increase template parameters test=develop * support MLU predictor test=develop * update anakin cmake files test=develop * update TargetWrapper::set_device * update the initialization of anakin subgraph test=develop * use the default constructor of base class test=develop
-
- 30 May, 2019 1 commit
-
-
Committed by Bai Yifan
* unit commits, test=develop * update API.spec, test=develop
-
- 17 May, 2019 1 commit
-
-
Committed by chengduo
* add record_event test=develop * remove csp test=develop
-
- 18 Apr, 2019 1 commit
-
-
Committed by gongweibao
-
- 28 Mar, 2019 1 commit
-
-
Committed by gongweibao
-
- 22 Mar, 2019 1 commit
-
-
Committed by nhzlx
2. refine trt code test=develop
-
- 20 Mar, 2019 1 commit
-
-
Committed by nhzlx
-
- 19 Mar, 2019 1 commit
-
-
Committed by zhhsplendid
test=develop
-
- 16 Mar, 2019 1 commit
-
-
Committed by qingqing01
test=develop
-
- 15 Mar, 2019 1 commit
-
-
Committed by qingqing01
* Support Sync Batch Norm. * Note, do not enable it in one device. Usage:
    build_strategy = fluid.BuildStrategy()
    build_strategy.sync_batch_norm = True
    binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name, build_strategy=build_strategy)
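As background for this commit: synchronized batch norm generally means that the batch mean and variance are computed over the samples on all devices rather than per device. The sketch below shows that idea in plain Python under stated assumptions. It is not Paddle's kernel, and `per_device_stats` and `synced_stats` are hypothetical names introduced only for illustration.

```python
from typing import List, Tuple


def per_device_stats(shard: List[float]) -> Tuple[float, float]:
    """Mean and (biased) variance of one device's shard, as in plain batch norm."""
    n = len(shard)
    mean = sum(shard) / n
    var = sum((v - mean) ** 2 for v in shard) / n
    return mean, var


def synced_stats(shards: List[List[float]]) -> Tuple[float, float]:
    """Mean and (biased) variance over all shards together.

    Each device only needs to contribute its count, sum, and sum of squares;
    adding those up stands in for the cross-device reduction.
    """
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(v * v for s in shards for v in s)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0, 4.0]]  # two devices, two samples each
    print("per device:", [per_device_stats(s) for s in shards])
    print("synced:    ", synced_stats(shards))
```

With a single shard the two functions agree, so synchronization brings no benefit on one device.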
-
- 22 Feb, 2019 1 commit
-
-
Committed by Yiqun Liu
* Initialize the benchmark tester for operator. test=develop * Rearrange the codes. test=develop
-
- 30 Jan, 2019 1 commit
-
-
Committed by xuezhong
-
- 25 Jan, 2019 1 commit
-
-
Committed by baojun
* enable ngraph_engine_op test=develop * merge develop test=develop * avoid const_cast test=develop * rm ngraph_operator test=develop * Added TODO to move EnableNgraph test=develop * Add TODO to remove const_cast test=develop
-
- 24 Jan, 2019 1 commit
-
-
Committed by Yiqun Liu
* Refine the beam_search op and test. * A basic CUDA implementation of beam_search for small batch_size. * Implement CUDA kernel for beam_search_op. * Use multiple CUDA threads in the same block to select the top beam. * Update the python api of beam_search op. * Enable extend function in CPU kernel of beam_search op. * Unify the CUDA codes. test=develop * Unify the CPU kernel of beam_search op. * Ensure the seletced items of beam_search_op's CPU kernel sorted by scores. * Update the description of beam_search in API.spec. * Enable the use of CUDA kernel in beam_search op. * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements. test=develop * Follow comments. test=develop * Call the CPU kernel for beam_search op when batch_size > 4. test=develop * Remove the except of is_empty op in PrepareData. test=develop
-
- 18 Jan, 2019 1 commit
-
-
Committed by zhaozhehao
* refactor tree2col operator with new memory mechanism test=develop * test=develop * test=develop * Modified API according to panyx0718 test=develop * fix API change according to heavengate test=develop * Modify API comment test=develop
-
- 29 Dec, 2018 1 commit
-
-
Committed by peizhilin
test=develop
-
- 26 Dec, 2018 1 commit
-
-
Committed by peizhilin
test=develop
-