提交 · 0c037d2d5918a4533764cc17dcd1cafd25aefa81 · 机器未来 / Paddle

14 4月, 2021 1 次提交
- C
  Add inner register backward hook method for Tensor (#32171) · 7ba85aca
  由 Chen Weihang 提交于 4月 14, 2021
```
* add register backward hook method

* add leaf grad accumullated test
```
  7ba85aca
13 4月, 2021 1 次提交

add layer.to api (#32040) · 6e946e9d

由 chentianyu03 提交于 4月 13, 2021

* add layer.to api

* add layer.to api

* add layer.to api

* add the doc for Layer.to

* add input type checking

* modify assert and import bug

* format code style

* format code style

* make place support str type

* add SetGradVarBase method to set the gradient after conversion

* modify argument palce to device

* modify argument palce to device

* modify doc of layers.to API

* add xpuplace to device argument

6e946e9d

09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

08 4月, 2021 1 次提交

The unsupported_fp16_list using in AMP will be created automatically during the runtime. (#32102) · 6e65fe02

由 Zhen Wang 提交于 4月 08, 2021

* Use the runtime to create the unsupported_fp16_list using in AMP.

* Add more infos about supported ops.

* Add some comments for the function of OpSupportedInfos.

* Fix the unit test of test_multi_precision_fp16_train.

6e65fe02

07 4月, 2021 1 次提交

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

由 zhang wenhui 提交于 4月 07, 2021

* Ascend rc (#30483)

* Fix compilcation on CANN20.1 and older (#30494)

Fix compilcation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build praser for Hcom* operators (#30627)

Build praser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Ndingsiyu <18369187719@163.com>
Co-authored-by: NOleNet <olenet@126.com>

8c7c53b3

02 4月, 2021 1 次提交

graph engine (#31226) · 94736d60

由 seemingwang 提交于 4月 02, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions
Co-authored-by: NHuang Zhengjie <270018958@qq.com>
Co-authored-by: NWeiyue Su <weiyue.su@gmail.com>
Co-authored-by: Nsuweiyue <suweiyue@baidu.com>
Co-authored-by: Nluobin06 <luobin06@baidu.com>
Co-authored-by: Nliweibin02 <liweibin02@baidu.com>

94736d60

01 4月, 2021 3 次提交

add custom init grad for backward function (#31540) · 83b953f5

由 chentianyu03 提交于 4月 01, 2021

* add custom init grad for backward function

* add custom init grad for backward function

* handle when the grad_tensor is none

* handle when the grad_tensor is none

* fix the args type error on windows platform

* modify the args order and doc

* format code

* add grad_tensor to xpu

* modify the grad_tensor type check

* add paddle.backward api to support multi tensors gradient compute

* add paddle.backward api to support multi tensors gradient compute

* add paddle.atuograd module and backward api

* change tensor.backward func args

* modify tensor backward api

* remove create_graph intputs args

* add doc and examplex code for backward api

* when have the same tensor, throw error

* modify test Init func args

* modify the execute.Init func args in test files

* add paddle.autograd package in setup.py.in

* modify error msg, remove _run_backward method in class Tensor

* add test cases for backward api

83b953f5

K
new group (#31682) · 07741593
由 kuizhiqing 提交于 4月 01, 2021
```
* new group

* ci compatible fix

* assert nccl
```
07741593

Refactor and simplify hook design & add Tensor.register_hook API (#31775) · dbeb3ea4

由 Chen Weihang 提交于 3月 31, 2021

* refactor and simplify hook design

* fix reducer add hook error

* add Tensor.register_hook basic impl

* refine prepare data impl

* revert prepare data change

* support register_hook for Tensor

* add hook test in model

* polish tests and doc example

* fix double grad test failed

* remove reduce hook func

* fix set empty error

* polish code by comments

* change reduce_hook to mutable_hook

* remove useless tmp_ins

* fix shape code format error

* fix shape code format error

dbeb3ea4

31 3月, 2021 2 次提交

[Parallel UT]Improve Parallel UT level on Windows/Linux (#31377) · b05f6142

由 Zhou Wei 提交于 3月 31, 2021

* [Parallel UT]improve Parallel UT level on Windows/Linux

* [Parallel UT]improve Parallel UT level on Windows/Linux

* [Parallel UT]Improve Parallel UT level on Windows/Linux

* [Parallel UT]Improve Parallel UT level on Windows/Linux

* fix CI

b05f6142

K
Polish tensor pipeline (#31701) · e973bd73
由 Kaipeng Deng 提交于 3月 31, 2021
```
* polish tensor pipeline. test=develop
```
e973bd73

30 3月, 2021 2 次提交
- L
  
  [dynamic setitem] Fix bug of dynamic setitem: Decerease axes to do right broadcast (#31960) · 57d4288a
  由 liym27 提交于 3月 30, 2021
  
  57d4288a
- Z
  [Custom OP]Remove old custom OP and reduce whl package volume (#31813) · 04a49b09
  由 Zhou Wei 提交于 3月 30, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume
```
  04a49b09
29 3月, 2021 1 次提交
- R
  
  [ROCM] added a cudnn switch of conv2d for rocm platform (#31836) · 123949eb
  由 ronnywang 提交于 3月 29, 2021
  
  123949eb
26 3月, 2021 2 次提交
- C
  [dygraph qat] Use layer to calculate output scale (#31861) · b47478ef
  由 cc 提交于 3月 26, 2021
```
* Use layer to calculate output scale
* add backward for moving_average_abs_max_scale and save output scales to op's attr
```
  b47478ef
- T
  delete include framework.pb.h (#31859) · e804f085
  由 tianshuo78520a 提交于 3月 26, 2021
```
* delete include framework.pb.h

* fix error
```
  e804f085
23 3月, 2021 1 次提交
- Z
  Update windows compiler and CI from VS2015 to VS2017 (#31652) · a70de87d
  由 Zhou Wei 提交于 3月 23, 2021
```
* modify windows CI to VS2017

* modify windows CI to VS2017

* modify windows CI to VS2017
```
  a70de87d
18 3月, 2021 1 次提交
- C
  [CustomOp] Support complex dtype in custom op (#31657) · 87852616
  由 Chen Weihang 提交于 3月 18, 2021
```
* support custom complex op

* fix detail error

* add inference support

* fix setup windows failed
```
  87852616
15 3月, 2021 1 次提交
- K
  DataLoader supprot dict str (#31481) · a32e8bf1
  由 Kaipeng Deng 提交于 3月 15, 2021
```
* add dict/str/list supprot for DataLoader. test=develop
```
  a32e8bf1
05 3月, 2021 1 次提交
- L
  [Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn... · 9ebf05b0
  由 liuyuhui 提交于 3月 05, 2021
```
[Kunlun]Multi xpu dygraph performance optimization , add distributed.spawn support for multi xpu and some bug-fixes (#31130)
```
  9ebf05b0
04 3月, 2021 2 次提交
- Q
  
  [ROCM] update fluid platform for rocm (part5), test=develop (#31315) · 4d647ec1
  由 Qi Li 提交于 3月 04, 2021
  
  4d647ec1
- W
  
  Windows system supports Ninja compilation (#31161) · 4d6d2db8
  由 wuhuanzhou 提交于 3月 04, 2021
  
  4d6d2db8
26 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid framework for rocm (part6), test=develop (#31015) · 28b356b9
  由 Qi Li 提交于 2月 26, 2021
  
  28b356b9
22 2月, 2021 1 次提交
- T
  support save multi sparse table in one path (#31108) · 565354f6
  由 Thunderbrook 提交于 2月 22, 2021
```
* save multi table one path

* format
```
  565354f6
20 2月, 2021 2 次提交

1
test=develop, save/load, shrink (#30625) · 16b4260b
由 123malin 提交于 2月 20, 2021
```
* test=develop, save/load, shrink
Co-authored-by: NseiriosPlus <tangwei12@baidu.com>
```
16b4260b

[static setitem] Support the index is Tensor; step>1; step<0 .(#30949) · 5b367dab

由 liym27 提交于 2月 20, 2021

* [static setitem] support the index step > 1. tensor_a[::3] = value

* [static setitem] support the index step < 0. Eg: tensor_a[::-3] = value

* [static setitem] support the index is Tensor. eg: tensor_a[tensor_3:0:-1] = value

* Add op version.

5b367dab

19 2月, 2021 1 次提交
- W
  
  fix python pass builder error. (#30946) · 0020d915
  由 Wilber 提交于 2月 18, 2021
  
  0020d915
10 2月, 2021 1 次提交

New custom operator extension mechanism (#30690) · f649442d

由 Chen Weihang 提交于 2月 09, 2021

* initial commit: simple demo

* polish copyright format

* add grap op simple demo

* adapt uncertain number of argument

* change trait marco name

* add place & dtype support for add kernel

* add dispath and infershape func

* poish code & add notes

* add dynamic_loader dep for paddle_framework

* add new custom op test dir

* polish impl details

* add unittest for new custom op

* fix failed unittest

* Costum op (#1)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* Remove ShareData from user && Change CustomTensor to Tensor && Support more data type (#2)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* refactor register design & add test

* change op_funtion to op_meta_info

* split op meta info into .h and .cc

* move get methods into friend class

* move OpMetaInfoHelper into framework space

* move CustomTensorUtils into framework space

* change pybind api name

* move PD C API into op meta info

* add register custom op api

* remove inference cmake change

* refactor copy to api && change Reshape to lowercase && support more dtype && add more test (#3)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* polish detail & error message

* polish test details

* Add cast api && Change copy related api to copy_to && add more test (#4)

* fix compile error

* wrap framework tensor with LoDTensor

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* add CustomTensor default constructor

* add size() for CustomTensor

* make size const for CustomTensor

* refactor place related api to circle the concept

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* fix compile error

* make place const

* make Tensor copy

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* debug CustomTensor core

* remove additional head of framework

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* use back to shared ptr for custom tensor

* add gpu test

* merge latest cwh code in

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* adjust ut code of custom op

* hid share data from and to

* rename CustomTensor to Tensor

* support multi dtype

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* remove lod, make reshape lowercase, add copy test and refactor copy api

* fix copy to error

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add more test

* add type cast

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* add cast and make copy to api

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* merge cwh code

* add more error log

* add more error log

* polish code

* used for test

* remove test comment

* remove test comment

* fix uint8 type error

* fix lost uint8 type error

* add test for coverage

* polish details by reviewer comments

* add prefix for DISABLE_COPY_AND_ASSIGN
Co-authored-by: NJiabin Yang <360788950@qq.com>

f649442d

05 2月, 2021 1 次提交

Performance optimization for dynamic setitem: Call op set_value to speed up... · 39f41cb4

由 liym27 提交于 2月 05, 2021

Performance optimization for dynamic setitem: Call op set_value to speed up because the original call to TensorToPyArray will introduce unnecessary data copy. (#30817)

39f41cb4

04 2月, 2021 2 次提交
- J
  Add bf16 fast performance verification (#30551) · 73cdea01
  由 joanna.wozna.intel 提交于 2月 04, 2021
```
* Update Xbyak and add bf16 fast performance verification

* Fix formating

* Change LOG message

* Trigger an update of a new tag
```
  73cdea01
- W
  
  fix xpu dygraph place (#30868) · 6e3856d3
  由 WangXi 提交于 2月 04, 2021
  
  6e3856d3
03 2月, 2021 2 次提交
- 石
  support xpu with analysis predictor, test=develop (#30832) · 2ac4143b
  由石晓伟提交于 2月 03, 2021
```
* support xpu inference with analysis predictor, test=develop

* merge the cmake of the xpu toolchain, test=develop

* add c-apis, test=develop

* fix a bug in extern_xpu, test=develop
```
  2ac4143b
- W
  
  【kunlun】dygraph supports multi xpu card training (#30671) · b1026f64
  由 WangXi 提交于 2月 03, 2021
  
  b1026f64
01 2月, 2021 1 次提交
- T
  dump to cpu (#30750) · cb66c53c
  由 Thunderbrook 提交于 2月 01, 2021
```
* dump to cpu

* format

* format

* format
```
  cb66c53c
29 1月, 2021 1 次提交
- S
  
  rm Singleton of reducer (#30775) · 3858f458
  由 ShenLiang 提交于 1月 29, 2021
  
  3858f458
25 1月, 2021 1 次提交

add DLA support：C++&&Python api (#30165) · ae0f88a9

由 Shang Zhizhou 提交于 1月 25, 2021

* add dla

* add dla done

* add python api
Co-authored-by: Nshangzhizhou <root@szth-rp-fanyi-opera49.szth.baidu.com>

ae0f88a9

21 1月, 2021 1 次提交
- T
  solve build gpu task core (#30626) · 1bebc092
  由 Thunderbrook 提交于 1月 21, 2021
```
* build gpu task core

* format
```
  1bebc092
20 1月, 2021 2 次提交

use nvtx push pop in timeline (#30567) · 90773473

由 wanghuancoder 提交于 1月 20, 2021

* delete empty line of pybing.cc, test=develop

* use nvtx push pop in timeline, test=develop

* change year, test=develop

* add #ifdef PADDLE_WITH_CUDA, test=develop

* add #ifndef WIN32, test=develop

* is_pushed to is_pushed_, test=develop

90773473

add some RecordEvent, for dygraph timeline (#30299) · d1b25ed9

由 wanghuancoder 提交于 1月 20, 2021

* add some RecordEvent, for dygraph timeline, test=develop

* change GpuMemcpySync to memory::Copy, test=develop

* fix compile problem, test=develop

* fix compile problem, test=develop

* fix, test=develop

* fix, test=develop

d1b25ed9

19 1月, 2021 1 次提交
- L
  support layer_norm fp16 in dygraph amp (#30430) · 7043b8cf
  由 Leo Chen 提交于 1月 19, 2021
```
* support layer_norm fp16 in dygraph amp

* add ut

* refine code
```
  7043b8cf

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致