- 08 Jul, 2019 (3 commits)
Committed by Zhaolong Xing
* Fix Mask rcnn predictor
  1. refine memory optim algorithm to support the model with the block op.
  2. output diff: modify the affine channel fuse
  3. add condition_block_infer op
  add interface for setting trt calib table dir
  test=develop
* add the missing files. test=develop
Committed by Leo Zhao
Committed by gongweibao
- 04 Jul, 2019 (1 commit)
Committed by chengduo
- 03 Jul, 2019 (2 commits)
- 02 Jul, 2019 (1 commit)
Committed by Yi Liu
1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, as the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
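
The split described in point 1 can be illustrated with a small single-process sketch. It is not the code from this commit: the commit message does not name the four reduce types, so sum, prod, max and min are assumed here, and every function name below is hypothetical. Each "rank" is modelled as one entry in a list.

```python
from functools import reduce
import operator


def _allreduce(rank_values, combine):
    """Combine one value per rank, then give every rank the same result."""
    total = reduce(combine, rank_values)
    return [total for _ in rank_values]


# One thin entry point per reduce type, instead of a single op that
# switches on a reduce-type attribute. The four types are an assumption.
def allreduce_sum(rank_values):
    return _allreduce(rank_values, operator.add)


def allreduce_prod(rank_values):
    return _allreduce(rank_values, operator.mul)


def allreduce_max(rank_values):
    return _allreduce(rank_values, max)


def allreduce_min(rank_values):
    return _allreduce(rank_values, min)
```

With separate entry points, each op has a fixed meaning, so callers and graph passes do not need to inspect an attribute to learn which reduction is performed.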
- 01 Jul, 2019 (1 commit)
Committed by Michał Gallus
* Int8: Fix Pooling output scale test=develop
* Update scales quantization for certain operators. These include: concat, transpose, pool and reshape. test=develop
* Move concat minimum scale finding to quantizer test=develop
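
As a rough illustration of "concat minimum scale finding", the sketch below assumes that a scale is the multiplier mapping a float tensor into the int8 range, and that the concatenated output takes the smallest input scale so no input overflows. Both assumptions and both function names are hypothetical and are not taken from the commit.

```python
def concat_output_scale(input_scales):
    """Pick a single scale for a concat output (assumed rule: the minimum)."""
    if not input_scales:
        raise ValueError("concat needs at least one input scale")
    return min(input_scales)


def quantize(values, scale):
    """Scale floats, round, and clamp to the signed int8 range."""
    return [max(-128, min(127, round(v * scale))) for v in values]


# Three inputs with different scales share the smallest one.
scale = concat_output_scale([63.5, 127.0, 31.75])
quantized = quantize([1.0, -4.0], scale)
```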
- 29 Jun, 2019 (1 commit)
Committed by jiaqi
Fix data feed ptr runtime error: the pipeline trainer core-dumps in some cases, so the pointer is now set to nullptr by default.
- 27 Jun, 2019 (4 commits)
Committed by chengduo
* update pe reduce config test=develop
* drop the local_exe_scopes of the previous parallel_executor test=develop
Committed by tangwei12
* add is_runnning in communicator, test=develop
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream; update atol and rtol to 1e-05; rm shard_index op and test; modify read input from file to read from memory; remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
Committed by Sylwester Fraczek
add prior_box quantization code
add scale algo rules for prior box
test=develop
- 26 Jun, 2019 (1 commit)
Committed by chengduo
test=develop
- 24 Jun, 2019 (2 commits)
- 21 Jun, 2019 (1 commit)
Committed by jiaqi
(1) Use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset single output channel or multi output channel). One previous memory out of limit problem was caused by not releasing memory after training.
(2) Add Record because MultiSlotType costs too much memory (80B); this fixes the memory out of limit problem.
(3) Add Channel and Archive in paddle/fluid/framework.
(4) Change dataset from shared_ptr to unique_ptr in pybind.
(5) Move create/destroy readers from trainer to dataset.
(6) Move shuffle from datafeed to dataset. The dataset holds memory; datafeed only loads data and feeds it to the network.
(7) Fix thread num bug of Dataset when filelist size < thread num.
(8) Support set_queue_num in InMemoryDataset.
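
Points (1) and (7) can be sketched together in a few lines. This is an illustrative Python model, not the C++ Channel added in paddle/fluid/framework: the `Channel` class, `effective_thread_num`, and `load` are hypothetical names, and clamping the thread count to the file count is an assumed fix, since the commit message does not say how the bug was resolved.

```python
import queue
import threading


class Channel:
    """Minimal closable FIFO channel shared by reader threads."""

    _CLOSED = object()

    def __init__(self):
        self._q = queue.Queue()

    def put(self, item):
        self._q.put(item)

    def close(self):
        self._q.put(self._CLOSED)

    def __iter__(self):
        while True:
            item = self._q.get()
            if item is self._CLOSED:
                # Re-post the marker so any other consumer also stops.
                self._q.put(self._CLOSED)
                return
            yield item


def effective_thread_num(filelist, thread_num):
    """Never start more readers than there are files (assumed fix)."""
    return max(1, min(thread_num, len(filelist)))


def load(filelist, thread_num):
    channel = Channel()
    n = effective_thread_num(filelist, thread_num)
    shards = [filelist[i::n] for i in range(n)]  # one file shard per reader

    def reader(files):
        for name in files:
            channel.put("record from " + name)

    threads = [threading.Thread(target=reader, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    channel.close()
    return sorted(channel)
```

Because the readers write into a channel rather than a shared vector, the consumer side does not need to know how many readers exist.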
- 19 Jun, 2019 (1 commit)
Committed by chengduo
* update execution_strategy option default value test=develop
* fix doc error test=develop
- 18 Jun, 2019 (1 commit)
Committed by chengduo
* remove nccl dep when the number of GPU is 1 test=develop
- 15 Jun, 2019 (1 commit)
Committed by chengduo
* fix code bug test=develop
- 14 Jun, 2019 (1 commit)
Committed by gongweibao
- 13 Jun, 2019 (1 commit)
Committed by chengduo
* update CPU_NUM config test=develop
- 12 Jun, 2019 (1 commit)
Committed by hutuxian
- 11 Jun, 2019 (3 commits)
Committed by gongweibao
Committed by 石晓伟
* update anakin-engine interfaces for content-dnn test=develop
* support only-gpu mode of Anakin, modify eltwise parse test=develop
* modification for thread-safe test=develop
* Integrated template instance test=develop
* increase template parameters test=develop
* support MLU predictor test=develop
* update anakin cmake files test=develop
* update TargetWrapper::set_device
* update the initialization of anakin subgraph test=develop
* use the default constructor of base class test=develop
Committed by hutuxian
Add Pipeline Concurrency Train Mode:
  - Cpp: pipeline_trainer & section_worker
  - Python: PipelineOptimizer
  - Add a new data_feed type: PrivateInstantDataFeed
  - Add a test demo of pipeline trainer; the test model is gnn
  - Does not support win32 for now
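
The general idea of a pipeline of section workers can be sketched as threads connected by queues, so that different sections work on different batches at the same time. This is a generic illustration under that assumption, not the pipeline_trainer or section_worker code from this commit; `section_worker` and `run_pipeline` below are hypothetical Python helpers.

```python
import queue
import threading

_STOP = object()  # marker that tells a section there is no more input


def section_worker(fn, inbox, outbox):
    """Apply one section's function to every item, then forward the stop marker."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)
            return
        outbox.put(fn(item))


def run_pipeline(section_fns, batches):
    # queues[i] feeds section i; the last queue collects final results.
    queues = [queue.Queue() for _ in range(len(section_fns) + 1)]
    workers = [
        threading.Thread(target=section_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(section_fns)
    ]
    for w in workers:
        w.start()
    for batch in batches:
        queues[0].put(batch)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results
```

With one worker per section and FIFO queues, batch order is preserved while section 2 processes batch 1 and section 1 already works on batch 2.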
- 10 Jun, 2019 (2 commits)
Committed by Zeng Jinle
* remove attribute in Allocator::Allocate, test=develop
* fix travis ci error, test=develop
Committed by gongweibao
- 08 Jun, 2019 (1 commit)
Committed by gongweibao
- 06 Jun, 2019 (2 commits)
Committed by gongweibao
Committed by wopeizl
* fix the ParallelExecutor on Windows test=develop
* restrict to use one GPU only under windows
- 05 Jun, 2019 (1 commit)
Committed by baojun
* delay infershape test=develop
* fall back subblock to paddle test=develop
* fix edge cases test=develop
* remove output duplicates test=develop
* handle reshape2_grad infershape test=develop
- 04 Jun, 2019 (2 commits)
- 03 Jun, 2019 (1 commit)
Committed by chengduo
test=develop
- 31 May, 2019 (1 commit)
Committed by guru4elephant
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* cache sub_scope, program, var when use_program_cache=True is set
* make fetch_list runnable with variables, add more unittest for use_program_cache
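
The caching behaviour described for use_program_cache=True can be pictured with a toy executor. This is a sketch only: `CachingExecutor`, its `_prepare` step, and the use of the program object itself as the cache key are all hypothetical and do not reproduce the real executor; only the `use_program_cache` flag name comes from the commit message.

```python
class CachingExecutor:
    """Toy executor that reuses prepared state when use_program_cache is set."""

    def __init__(self):
        self._cache = {}
        self.prepare_calls = 0  # counts how often the costly setup ran

    def _prepare(self, program):
        # Stand-in for the costly setup (context preparation, scope and
        # variable creation) that the cache is meant to avoid repeating.
        self.prepare_calls += 1
        return {"program": program, "scope": {}}

    def run(self, program, feed, use_program_cache=False):
        if use_program_cache:
            ctx = self._cache.get(program)
            if ctx is None:
                ctx = self._cache[program] = self._prepare(program)
        else:
            ctx = self._prepare(program)
        return ctx["program"](feed)


def double(feed):
    """A 'program' is modelled as a plain function of the feed dict."""
    return feed["x"] * 2


exe = CachingExecutor()
outputs = [exe.run(double, {"x": i}, use_program_cache=True) for i in range(3)]
```

In this model, repeated runs of the same program with the cache enabled pay the setup cost once; runs without the flag prepare from scratch every time.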
- 30 May, 2019 (2 commits)
Committed by chengduo
* add event for fast executor and add threads for scopebuffer executor test=develop
Committed by Yiqun Liu
* Enhance fused_elementwise_activation op. test=develop
* Move the api fused_elementwise_activation to contrib. test=develop
* Add including files. test=develop
* Add the support of sigmoid in fused_elementwise_activation op.
* Update API.spec. test=develop
- 29 May, 2019 (2 commits)
Committed by gongweibao
Committed by mozga-intel