Commits · 4ce272ed9307ffcb68e59bda390a92f3f6f7ebbb · PaddlePaddle / Paddle

20 Aug, 2021 1 commit

use spin lock in auto growth allocator (#34910) · 6bacfb0e

Committed by wanghuancoder on Aug 20, 2021

* use spin lock in auto growth allocator, test=develop

* use pthread spin lock, test=develop

* use lock guard, test=develop

* use malloc spin lock, test=develop

* use lock_guard, test=develop

6bacfb0e
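
The commit messages above mention guarding the allocator with a spin lock held through a lock guard. As a rough illustration of that general pattern only (not Paddle's actual code), the sketch below builds a spin lock on `std::atomic_flag` and holds it with `std::lock_guard`. The names `SpinLock` and `GuardedCounter` are hypothetical, and the counter merely stands in for whatever bookkeeping an allocator would protect.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical spin lock for illustration; not Paddle's implementation.
// It provides lock()/unlock(), so it works with std::lock_guard.
class SpinLock {
 public:
  void lock() {
    // Busy-wait until the flag was previously clear.
    while (flag_.test_and_set(std::memory_order_acquire)) {
    }
  }

  // Returns true if the lock was acquired without waiting.
  bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }

  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical shared state standing in for allocator bookkeeping.
class GuardedCounter {
 public:
  int Increment() {
    // The guard releases the spin lock when it goes out of scope.
    std::lock_guard<SpinLock> guard(lock_);
    return ++value_;
  }

 private:
  SpinLock lock_;
  int value_ = 0;
};
```

A spin lock of this kind tends to pay off only when the critical section is very short, since waiting threads burn CPU instead of sleeping.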

21 Apr, 2021 1 commit

【NPU】Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

07 Apr, 2021 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

8c7c53b3

18 Jan, 2021 1 commit
- Ascend Framework Part2: pybind files (#30410) · e207fe63
  Committed by hutuxian on Jan 18, 2021

  e207fe63
25 Feb, 2020 1 commit

PaddleBox Framework Part2 (#22466) · 175954d8

Committed by hutuxian on Feb 25, 2020

* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to indicate whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse; it will be added back once it has been fully tested.
* Fix some known issues, such as copying persistable vars after one epoch of running.

175954d8

31 Aug, 2019 1 commit

Paddlebox Framework (#18982) · c756b5d2

Committed by hutuxian on Aug 31, 2019

* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op; for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS.
* Add UT.
* For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982

c756b5d2
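
The commit above pairs a build option (`WITH_BOX_PS`) with a preprocessor macro (`PADDLE_WITH_BOX_PS`). A build option like this usually surfaces in C++ as a macro guard around the optional code path. The sketch below only illustrates that pattern: `PADDLE_WITH_BOX_PS` is the macro named in the commit, while the function `EmbeddingBackend` and the strings it returns are hypothetical and do not come from the Paddle source.

```cpp
#include <string>

// Reports which code path this translation unit was compiled with.
// Only the macro name comes from the commit message; the function
// name and return values are made up for illustration.
inline std::string EmbeddingBackend() {
#ifdef PADDLE_WITH_BOX_PS
  // Compiled only when the build defines PADDLE_WITH_BOX_PS.
  return "boxps";
#else
  // Fallback when the optional feature is not compiled in.
  return "default";
#endif
}
```

Keeping the guard inside a small function, rather than scattering `#ifdef` blocks through callers, limits how much code has to know about the build option.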

29 Mar, 2019 2 commits
- add DataSet and InMemoryDataFeed, support load data into memory and shuffle data · 824b84d1
  Committed by xjqbest on Mar 06, 2019

  824b84d1
- add pybind for fleet · be757096
  Committed by dongdaxiang on Feb 28, 2019

  be757096
30 Nov, 2018 1 commit

AsyncExecutor (#14627) · 41e19eb4

Committed by Wang Guibao on Nov 30, 2018

* AsyncExecutor: C++ side

* Google naming conventions

* Rename MultiExecutor to AsyncExecutor

* pybind with async_executor

* Naming convention

* remove some flags and unused code

* add refactored file of async_executor and data_feed

* clear async executor interface and add data feed factory

* split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto

* Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch

* refine async_executor_refactor.cc

* add some files about datafeed

* Revert "add some files about datafeed"

This reverts commit 8ee8133a.

* Interface rework

* add MultiSlotDataFeed

* Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side

* update data_feed for add MultiSlotDataFeed

* update datafeed and async_executor to run bow_net demo

* fix bug that finish_set_filelist failed in multithread

* delete finish_binding_memory_(flag), because it can not be marked under the current interface

* Fix bug

* update async_executor.py for support set_use_slots

* update async_executor.py for support set_use_slots and set set_dense_slots

* fix bug that when the number of files is less than the number of threads, it will fetch nan

* remove redundant code, and make executor exit when an illegal queue size is set

* add batch_size check

* add MultiSlotDesc

* Revert "add MultiSlotDesc"

This reverts commit 2e72ebfa.

* add some checkpoint in DataFeedDesc

* add CheckFile function in MultiSlotDataFeed

* update some error info

* fix deadlock bug

* Fix fetch variable

* Merge error

* fix code style in async_executor

* use a one-lock blocking queue instead of a two-lock blocking queue because of some bugs

* update code style

* add utest for data_feed

* Fix fetch var

* update utest for data_feed for multithread

* update SetFileList info

* fix bug in utest of data_feed

* Add comments for python

* Add comments for python code

* Fix pybind.cc with new pybind11 version

* add note for DataFeedDesc's set_use_slots function

* Add save_model

* update data_feed_test for multi-type

* add comment for executor_thread_worker

* Remove unused code

* update data_feed_test for generate test data file

* removed unnecessary interfaces and add comments

* c++ style check

* update data_feed.cc

* AsyncExecutor: C++ side

Google naming conventions

Rename MultiExecutor to AsyncExecutor

pybind with async_executor

Naming convention

remove some flags and unused code

add refactored file of async_executor and data_feed

clear async executor interface and add data feed factory

split async executor into executor_thread_worker and async_executor, refactor pybind, add datafeed and corresponding proto

Fix async_executor interfaces: 1) Remove all protobufs; 2) Stop after each epoch

refine async_executor_refactor.cc

add some files about datafeed

Revert "add some files about datafeed"

This reverts commit 8ee8133a.

add MultiSlotDataFeed

Interface rework

Creating DataFeedDesc from .proto file, then manipulate it (add/del fields etc) from python side

update datafeed and async_executor to run bow_net demo

update async_executor.py for support set_use_slots

Fix bug

update async_executor.py for support set_use_slots and set set_dense_slots

fix bug that when the number of files is less than the number of threads, it will fetch nan

remove redundant code, and make executor exit when an illegal queue size is set

add MultiSlotDesc

Revert "add MultiSlotDesc"

This reverts commit 2e72ebfa.

add some checkpoint in DataFeedDesc

Fix fetch variable

fix code style in async_executor

Fix fetch var

add utest for data_feed

Add comments for python

update utest for data_feed for multithread

fix bug in utest of data_feed

Add comments for python code

Fix pybind.cc with new pybind11 version

add note for DataFeedDesc's set_use_slots function

update data_feed_test for multi-type

Add save_model

update data_feed_test for generate test data file

removed unnecessary interfaces and add comments

add comment for executor_thread_worker

Remove unused code

update data_feed.cc

c++ style check

* commit for code style

* commit for code style

* commit for code style

* commit for code style

* Comment away __init__ in async_executor.py

* clang-format fix test=develop

* use PADDLE_THROW instead of exit(-1); use unique_ptr to manage scope var in data_feed_test.cc

* commit for update code style

* commit for update code style

* Add async_executor demo; Remove some methods
test=develop

* commit for update code style

* commit for update code style

* commit for update code style

* update API.spec

* AsyncExecutor
test=develop

* AsyncExecutor
test=develop

* AsyncExecutor
test=develop

* AsyncExecutor
test=develop

* Fix API.spec
test=develop

* Fix API.spec
test=develop

* Fix windows build error
test=develop

* Fix windows build error
test=develop

* Fix windows build error
test=develop

* Fix windows build error
test=develop

* Fix Windows Build
test=develop

* Fix Windows Build
test=develop

* Fix Windows Build
test=develop

* Fix code style
test=develop

* Fix code style
test=develop

* update datafeed

* Fix code style
test=develop

* update data_feed_test for test Tensor test=develop

* Fix code style
test=develop

* Fix windows build failure
test=develop

* Fix code style and windows build failure
test=develop

* Fix PYTHON3.5 build failure
test=develop

* AsyncExecutor API
test=develop

41e19eb4
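
One bullet in the commit above mentions switching to a blocking queue protected by a single lock. The sketch below shows what a bounded single-mutex blocking queue can look like in standard C++. It illustrates the general technique, not the queue used in Paddle; the class name `BlockingQueue` and its method names are assumptions made for this example.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded queue guarded by one mutex.
// The capacity must be greater than zero, otherwise Push blocks forever.
template <typename T>
class BlockingQueue {
 public:
  explicit BlockingQueue(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the queue is full, then appends the value.
  void Push(T value) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push_back(std::move(value));
    not_empty_.notify_one();
  }

  // Blocks while the queue is empty, then removes the oldest value.
  T Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !items_.empty(); });
    T value = std::move(items_.front());
    items_.pop_front();
    not_full_.notify_one();
    return value;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.size();
  }

 private:
  const std::size_t capacity_;
  mutable std::mutex mutex_;           // the single lock for all state
  std::condition_variable not_full_;   // signalled after a Pop
  std::condition_variable not_empty_;  // signalled after a Push
  std::deque<T> items_;
};
```

With one mutex, producers and consumers serialize on the same lock. That costs some throughput compared with separate head and tail locks, but it is much easier to reason about.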

07 Apr, 2018 1 commit
- Fix cpplint errors of paddle/fluid/pybind and add some tests (#9694) · 1543c4cf
  Committed by Yi Wang on Apr 06, 2018

  * cpplint test and add tesnor_py_test.cc

  * Update

  * Update

  1543c4cf
07 Mar, 2018 2 commits
- Complete RecordIO reader op · 72be7a61
  Committed by Yu Yang on Mar 07, 2018

  72be7a61
- fix compile errors · af64f39b
  Committed by fengjiayi on Mar 07, 2018

  af64f39b
06 Mar, 2018 2 commits
- init double buffer · 3fcd16ed
  Committed by fengjiayi on Mar 06, 2018

  3fcd16ed
- Extract create_reader_op to three files · 4d8345e3
  Committed by Yu Yang on Mar 06, 2018

  4d8345e3
15 Feb, 2018 1 commit

Update tensor_util.h (#8422) · cfffb1a3

Committed by Yi Wang on Feb 14, 2018

* Update tensor_util.h

* Update with moved TensorDesc

* Fix tensur_utils.cu

* Update

* Update

* Update

* Update

* Make tensor_util.cu a symbolic link

cfffb1a3

10 Feb, 2018 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018

  fc374821
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018

  90648f33
07 Feb, 2018 1 commit
- fix compile errors · c1349d98
  Committed by fengjiayi on Feb 07, 2018

  c1349d98
06 Feb, 2018 2 commits
- refine code and add unit tests · 0bb9c80e
  Committed by fengjiayi on Feb 06, 2018

  0bb9c80e
- Add ReadOp · 1010e39b
  Committed by fengjiayi on Feb 06, 2018

  1010e39b
01 Feb, 2018 1 commit
- refine inheritance relationship · d8cc21da
  Committed by fengjiayi on Feb 01, 2018

  d8cc21da
31 Jan, 2018 1 commit
- draft of Reader classes · f32ca636
  Committed by fengjiayi on Jan 31, 2018

  f32ca636
30 Jan, 2018 1 commit
- init reader.h and reader.cc files · 1acad21b
  Committed by fengjiayi on Jan 30, 2018

  1acad21b
