提交 · b09f4d7fb52ba3d20d43f270269dc1b1dc3a4c5e · BaiXuePrincess / Paddle

18 8月, 2021 2 次提交

code refactoring for new executor (#34970) · 40d4d834

由 wanghuancoder 提交于 8月 18, 2021

* code refactoring, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

40d4d834

Add function to disable paddle signal handler (#34577) · dd533dd3

由 Zhanlue Yang 提交于 8月 18, 2021

* Add function to disable paddle signal handler

Paddle used google::InstallFaultSignalHandler to handle selected system signals,
mainly for debugging and bug report purposes.

However, this can be conflicted with other python packages whoever captures similar signals.
Such python package involves tvm and more

To resolve this issue, we support a function to disable signal handler

* Remove signal test from WIN32 platform

* Remove redundant return from disable_signal_handler() function

* Add detailed messages to en_doc

dd533dd3

17 8月, 2021 2 次提交

Copy boost optional to Paddle (#34780) · 9be41447

由 chentianyu03 提交于 8月 17, 2021

* copy boost optional.hpp to paddle

* copy boost optional.hpp to paddle

* move directions

* del fluid/utils

* modify .hpp to .h

* move directions

* modify to paddle::optional

* add modification description

* format code stype for the files in paddle/utils

* format code stype

9be41447

Add some passes which can be applied to Program (#34730) · 8046e33d

由 Zeng Jinle 提交于 8月 17, 2021

* add inplace passes and tests

* update

* fix use_cuda undefined
fix compile error of op compat

* add more ut

* fix CPU CI error

* check adam unique

* fix mac/windows ci, improve coverage

* fix ci error

* follow weihang's comment

* fix BlockDesc::MoveFrom

* follow qiuliang's comment

* update

* follow huihuang's comments

8046e33d

13 8月, 2021 1 次提交
- R
  
  fix npu_finalize (#34857) · 17a99760
  由 ronnywang 提交于 8月 13, 2021
  
  17a99760
11 8月, 2021 2 次提交
- R
  [NPU] add momentum_op_npu and test (#34082) · 9e3e08f0
  由 ronnywang 提交于 8月 11, 2021
```
* add momentum_op_npu and test

* update

* fix hang
```
  9e3e08f0
- L
  add the basic apis for auto_parallel (#33804) · 3f962e77
  由 lilong12 提交于 8月 11, 2021
```
* add auto_parallel apis
```
  3f962e77
06 8月, 2021 1 次提交
- T
  
  add get xpu version api (#34594) · 8a9dc5dc
  由 TTerror 提交于 8月 06, 2021
  
  8a9dc5dc
05 8月, 2021 1 次提交

New executor dev (#34407) · 012d12b5

由 hong 提交于 8月 05, 2021

* first test version

* add test exec;

* add data transfer; test=develop

* add new exec head;

* add memcpy; test=develop

* add python fetch

* add new test

* add graph node; test=develop

* remove useless new executor test; test=develop

* remove gperf dependency; test=develop

* fix compile bugs; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* add uni test; test=develop

* polish code; test=develop

* polish code; test=develop

* add interpreter cmakefile; test=develop

* remove useless code; test=develop

012d12b5

03 8月, 2021 1 次提交
- Q
  support Kunlun2 (#34459) · 2d0f3d9b
  由 QingshuChen 提交于 8月 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
02 8月, 2021 1 次提交

Add basic functions of Program Pass (#34524) · 145cdb5a

由 Zeng Jinle 提交于 8月 02, 2021

* add basic APIs

* add attr_types

* follow comments

* change pass attr types

* add set pass attribute codes

* refine PADDLE_THROW

145cdb5a

29 7月, 2021 1 次提交

add fix op run order pass (#34427) · 79e758c6

由 Zeng Jinle 提交于 7月 29, 2021

* add fix op run order pass

* add ut for fix_op_run_order

* fix ci error

* improve coverage

* improve coverge again and fix cpu test case

* follow some comments

79e758c6

27 7月, 2021 1 次提交

Revert "Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support... · 0dd6a44a

由 Aurelius84 提交于 7月 27, 2021

Revert "Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181)" (#34348)" (#34384)

This reverts commit 577fdde5.

0dd6a44a

23 7月, 2021 1 次提交

Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy... · 577fdde5

由 Aurelius84 提交于 7月 23, 2021

Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181)" (#34348)

This reverts commit 609f8225.

577fdde5

22 7月, 2021 2 次提交
- A
  [Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181) · 609f8225
  由 Aurelius84 提交于 7月 22, 2021
```
* modify into program_id

* fix cache_info declare problem

* fix python int to C long problem

* modify point to reference

* add ENVS
```
  609f8225
- L
  
  enable amp unsupported_fp16_list for npu (#34314) · b0a2f005
  由 Leo Chen 提交于 7月 22, 2021
  
  b0a2f005
19 7月, 2021 1 次提交

Add Cuda event and stream API (#32460) · 9c7f6af5

由 chentianyu03 提交于 7月 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_cuttent_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device direction, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda direction from device/__init__.py

9c7f6af5

15 7月, 2021 1 次提交
- A
  Upgrade Executor into ParallelExcutor to apply Graph Optimization in @to_static (#32283) · 2850391d
  由 Aurelius84 提交于 7月 15, 2021
```
* Refine Constructor logic of ParallelExecutor

* Replace executor into ParallelExecutor in run_program_op
```
  2850391d
13 7月, 2021 1 次提交
- Z
  
  expose gc analysis interface (#34092) · 2b557da0
  由 Zeng Jinle 提交于 7月 13, 2021
  
  2b557da0
06 7月, 2021 1 次提交

【HETERPS】pipeline adaptive for heterps (#33159) · bfef7feb

由 danleifeng 提交于 7月 06, 2021

* pipeline adaptive for heterps;test=develop
* fix finalize hang;test=develop
* add is_compiled_with_heterps for dataset;test=develop
* fix hashtable core when pass ins_num=0;test=develop

bfef7feb

30 6月, 2021 1 次提交
- H
  [NPU] support set_device (#33815) · 8225a6a1
  由 houj04 提交于 6月 30, 2021
```
* support set_device for NPU.

* minor update doc and add more unit test.
```
  8225a6a1
29 6月, 2021 1 次提交
- T
  
  xpu support amp (#33809) · 4d4fb660
  由 taixiurong 提交于 6月 29, 2021
  
  4d4fb660
23 6月, 2021 1 次提交

optimize attr default value (#33357) · 5d2eb678

由 wanghuancoder 提交于 6月 23, 2021

* optimize attr default value, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* fix bug in AttrReader, test=develop

* fix bug, test=develop

* fix double_grad, test=develop

* refine, test=develop

* refine, test=develop

* fix checker null, test=develop

* for test, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

5d2eb678

09 6月, 2021 1 次提交

paddle.save support object save to memory. (#32999) · cdd6437a

由 WeiXin 提交于 6月 09, 2021

* support state_dict save to memory.

* Perfect unittest

* perfect unittest.

* suport saving binary var to memory

* polish code.

* packag save/load files into pybind/io.py

* polish code .

* add example for save to memory; remove useless save load function(_load_static_dict,_save_dygraph_dict)

* delete _load_static/dygraph_dict;_save_static/dygraph_dict

* edit example of paddle.save/load

cdd6437a

25 5月, 2021 1 次提交

[Other] SparseShardingMerge Tool (#32887) · 09bc0f59

由 tangwei12 提交于 5月 25, 2021

* fix save/load with unexpected value
* fix save and user interface
* add save sparse sharding to selected rows

09bc0f59

22 4月, 2021 2 次提交

support save/load binary format tensor. (#32211) · f4d9adc7

由 WeiXin 提交于 4月 22, 2021

* support save/load binary format tensor

* Fix error when create cudaplace

* Fix error when create cudaplace

* Fix error when create cudaplace

* get devive context from pool.

* move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.

* improve coverage.

* improve coverage.

* polish API

* deal with conflict

* disable save/load large file in unnittest

* split unnittest.

f4d9adc7

T

Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
由 tianshuo78520a 提交于 4月 22, 2021

e58c705b

21 4月, 2021 1 次提交
- L
  [NPU] register npu finalize on exit (#32390) · 8e4c1936
  由 Leo Chen 提交于 4月 21, 2021
```
* [NPU] register finalize on exit

* fix
```
  8e4c1936
19 4月, 2021 1 次提交

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

15 4月, 2021 2 次提交

1
tree-based-model (#31696) · a8c3a902
由 123malin 提交于 4月 15, 2021
```
* add index_dataset and index_sampler for tree-based model
```
a8c3a902

heterps support pscore (#32093) · 9f8c8f96

由 Thunderbrook 提交于 4月 15, 2021

* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unitest

* paddle enforce

* unitest

* unitest

* unitest

9f8c8f96

09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

08 4月, 2021 1 次提交

The unsupported_fp16_list using in AMP will be created automatically during the runtime. (#32102) · 6e65fe02

由 Zhen Wang 提交于 4月 08, 2021

* Use the runtime to create the unsupported_fp16_list using in AMP.

* Add more infos about supported ops.

* Add some comments for the function of OpSupportedInfos.

* Fix the unit test of test_multi_precision_fp16_train.

6e65fe02

07 4月, 2021 1 次提交

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

由 zhang wenhui 提交于 4月 07, 2021

* Ascend rc (#30483)

* Fix compilcation on CANN20.1 and older (#30494)

Fix compilcation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build praser for Hcom* operators (#30627)

Build praser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Ndingsiyu <18369187719@163.com>
Co-authored-by: NOleNet <olenet@126.com>

8c7c53b3

02 4月, 2021 1 次提交

graph engine (#31226) · 94736d60

由 seemingwang 提交于 4月 02, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions
Co-authored-by: NHuang Zhengjie <270018958@qq.com>
Co-authored-by: NWeiyue Su <weiyue.su@gmail.com>
Co-authored-by: Nsuweiyue <suweiyue@baidu.com>
Co-authored-by: Nluobin06 <luobin06@baidu.com>
Co-authored-by: Nliweibin02 <liweibin02@baidu.com>

94736d60

30 3月, 2021 1 次提交
- Z
  [Custom OP]Remove old custom OP and reduce whl package volume (#31813) · 04a49b09
  由 Zhou Wei 提交于 3月 30, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume
```
  04a49b09
26 3月, 2021 1 次提交
- T
  delete include framework.pb.h (#31859) · e804f085
  由 tianshuo78520a 提交于 3月 26, 2021
```
* delete include framework.pb.h

* fix error
```
  e804f085
05 3月, 2021 1 次提交
- L
  [Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn... · 9ebf05b0
  由 liuyuhui 提交于 3月 05, 2021
```
[Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn support for multi xpu and some bug-fixes (#31130)
```
  9ebf05b0
04 3月, 2021 1 次提交
- Q
  
  [ROCM] update fluid platform for rocm (part5), test=develop (#31315) · 4d647ec1
  由 Qi Li 提交于 3月 04, 2021
  
  4d647ec1
26 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid framework for rocm (part6), test=develop (#31015) · 28b356b9
  由 Qi Li 提交于 2月 26, 2021
  
  28b356b9

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致