- 23 Sep, 2019 1 commit
-
-
Committed by hong
* add op compatible information; test=develop
* add enum type
* add enum type; test=develop
-
- 19 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
Add boost as a dependency of prune; fixes #19862
-
- 11 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Removing the singleton improves memory usage. This change replaces TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed. It also adds data_feed_proto to the operator dependencies to fix CI for CPU compilation.
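The stream-callback idea described above can be illustrated with a small simulation. This is only a sketch in plain Python, not the Paddle implementation: `FakeStream` and `StreamScopedAllocator` are hypothetical names, and the "stream" is just a FIFO queue standing in for a CUDA stream. It shows a buffer staying alive until all work queued before the release callback has run.

```python
from collections import deque


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: a FIFO of pending tasks."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, task):
        # Queue a unit of work (a zero-argument callable).
        self._pending.append(task)

    def add_callback(self, callback):
        # A callback runs only after everything queued before it.
        self._pending.append(callback)

    def synchronize(self):
        # Run all pending tasks and callbacks in submission order.
        while self._pending:
            self._pending.popleft()()


class StreamScopedAllocator:
    """Hypothetical allocator whose buffers are freed by stream callbacks."""

    def __init__(self, stream):
        self._stream = stream
        self.live = {}  # buffer id -> size in bytes
        self._next_id = 0

    def allocate(self, nbytes):
        buf_id = self._next_id
        self._next_id += 1
        self.live[buf_id] = nbytes
        return buf_id

    def release_after_stream(self, buf_id):
        # Defer the free until the stream reaches this point.
        self._stream.add_callback(lambda: self.live.pop(buf_id, None))


stream = FakeStream()
allocator = StreamScopedAllocator(stream)
buf = allocator.allocate(1024)

seen_alive = []
stream.enqueue(lambda: seen_alive.append(buf in allocator.live))
allocator.release_after_stream(buf)

alive_before_sync = buf in allocator.live
stream.synchronize()
alive_after_sync = buf in allocator.live
```

Because the release is tied to the stream rather than to a process-wide object, the allocator needs no global state: the buffer is still live while the queued work runs and is gone once the stream has drained.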
-
- 08 Sep, 2019 1 commit
-
-
Committed by hutuxian
Fix CMakeLists deps: remove unnecessary deps and add proper op deps
-
- 31 Aug, 2019 1 commit
-
-
Committed by hutuxian
* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op; for now this op is not exposed to users.
* Add a BoxHelper class providing functions such as 'BeginPass', 'EndPass', and 'FeedPass'.
* Add 'BoxPSDataset' in the Python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add unit tests.
* For more details, see: https://github.com/PaddlePaddle/Paddle/pull/18982
-
- 19 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 09 Aug, 2019 1 commit
-
-
Committed by chengduo
* Add call stack info during runtime and compile time test=develop
* Rename operator_call_stack test=develop
* Add unit test test=develop
* follow review comments test=develop
-
- 02 Aug, 2019 1 commit
-
-
Committed by Zeng Jinle
* open gc by default, test=develop
* fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop
* fix conditional_block op eager deletion bug, test=develop
* add some comments for reviewers, test=develop
-
- 29 Jul, 2019 1 commit
-
-
Committed by Zeng Jinle
* remove legacy memory optimization code, test=develop
* follow huihuang's comments, test=develop
* follow luotao's comments, test=develop
-
- 19 Jul, 2019 1 commit
-
-
Committed by Huihuang Zheng
Test PaddingRNN on a V100 GPU. Test configuration: large model, padding mode (the mode that uses recurrentOp), one GPU.
GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
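As a quick check on what these figures mean in relative terms, the arithmetic can be worked through as below. The helper name `relative_change` is made up for illustration; the four input numbers are taken directly from the commit message, and nothing else is measured here.

```python
def relative_change(new: float, old: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Figures quoted in the commit message (this PR vs without this PR).
memory_change = relative_change(6414, 6837)   # GPU memory in MiB
speed_change = relative_change(10.28, 9.89)   # speed in steps/s

print(f"GPU memory: {memory_change:+.2f}%")
print(f"Speed:      {speed_change:+.2f}%")
```

By this calculation the PR uses roughly 6.2% less GPU memory and runs roughly 3.9% faster in the reported configuration.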
-
- 17 Jul, 2019 1 commit
-
-
Committed by guru4elephant
* remove async executor and add data_feed.proto to the deps of train demo
-
- 27 Jun, 2019 1 commit
-
-
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_variables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plug in more strategies
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream; update atol and rtol to 1e-05; rm shard_index op and test; modify read input from file to read from memory; remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
-
- 12 Jun, 2019 1 commit
-
-
Committed by hutuxian
-
- 11 Jun, 2019 1 commit
-
-
Committed by hutuxian
Add Pipeline Concurrency Train Mode:
- Cpp: pipeline_trainer & section_worker
- Python: PipelineOptimizer
- Add a new data_feed type: PrivateInstantDataFeed
- Add a test demo of the pipeline trainer; the test model is gnn
- win32 is not supported yet
-
- 23 May, 2019 1 commit
-
-
Committed by Zeng Jinle
* Revert "Revert "Fix allocator bug"". This reverts commit 174d0d0b.
* Revert "fix travis ci". This reverts commit 5656fa9f. test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test, test=develop
-
- 29 Mar, 2019 22 commits
-
-
Committed by liuwei1031
* fix comments of 16410, test=develop
* modify inplace_op_inference_test according to pass interface change, test=develop
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
-
Committed by xujiaqi01
-
Committed by dongdaxiang
-
Committed by xjqbest
-
Committed by dongdaxiang
-
Committed by heqiaozhi
-
Committed by xjqbest
-
Committed by dongdaxiang
-
Committed by xjqbest
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
-
Committed by dongdaxiang
-
Committed by dongdaxiang
-
Committed by dongdaxiang
-
Committed by dongdaxiang
test=develop
-
Committed by dongdaxiang
-
- 27 Mar, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 26 Mar, 2019 2 commits