Commits · 99a4ff8fe4be92c982177b735b176aa8f55fae71 · 机器未来 / Paddle

30 Jun, 2022 · 2 commits
- [new-exec] support runing with different scope and the same program using scope_guard (#43962) · 99a4ff8f
  Committed by Leo Chen on Jun 30, 2022
```
* support scope_guard

* fix test
```
- Remove boost::variant for FetchResultType (#43932) · f720e231
  Committed by Ruibiao Chen on Jun 30, 2022
```
* Remove boost::variant for FetchResultType

* Fix pybind errors
```
29 Jun, 2022 · 1 commit
- fix custom_device log (#43890) · fb1a93a8
  Committed by ronnywang on Jun 29, 2022
28 Jun, 2022 · 1 commit

Remove boost::variant (#43100) · b3cf28f8

Committed by Ruibiao Chen on Jun 28, 2022

* boost::variant -> paddle::variant

* boost::variant.apply_visit -> paddle::visit

* Update pybind_boost_hraders.h

* Fix CINN compilation errors

* Revert FetchResultType

27 Jun, 2022 · 2 commits
- [CustomDevice]add custom place supports (#43813) · 7f22ef54
  Committed by Aganlengzi on Jun 27, 2022
```
* [CustomDevice]add custom place supports

* sync format
```
- Add get_op_names method for counting ops (#43831) · 4b74178b
  Committed by Chen Weihang on Jun 27, 2022
```
* add get_op_names api

* Update pybind.cc
```
24 Jun, 2022 · 1 commit

record memory and op supplement info (#43550) · 8dd0a3b9

Committed by chenjian on Jun 24, 2022

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* Revert "fix a bug"

This reverts commit c1d4df52762ba9ae7c7e27cd2ba4fc3a7ed9c7a5.

* fix a bug

* fix format

* fix

16 Jun, 2022 · 1 commit

[CustomKernel] add custom kernel c api (#42986) · 6fe10181

Committed by ronnywang on Jun 16, 2022

* [CustomKernel] add custom kernel c api

* update

* update

* fix unable to export capi
Co-authored-by: ronny1996 <524019753@qq.com>

05 Jun, 2022 · 1 commit
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
02 Jun, 2022 · 1 commit

Support CUDA Graph for partial graph in dygraph mode (#42786) · d05b940a

Committed by sneaxiy on Jun 02, 2022

* support CUDAGraph for partial graph

* add ut

* fix ci

* fix ut again because of eager mode

* fix kunlun ci

* fix win ci

27 May, 2022 · 1 commit
- Support memory stats for CPU (#42945) · 21f11d35
  Committed by Ruibiao Chen on May 27, 2022
```
* Support memory stats for CPU

* Add UTs

* Fix typos

* Fix typos
```
16 May, 2022 · 1 commit

optimize cinn find graph by graph address (#42697) · 661d0800

Committed by jiangcheng on May 16, 2022

* optimize cinn find graph by graph address

* graph_key use int64_t instead of program string

* fix framework _to_readable_code python code

* rename get_readable_comile_key to get_serialize_comile_key

11 May, 2022 · 1 commit
- [IPU] update to popart v2.5.0 (#42552) · 27acc6c3
  Committed by Allen Guo on May 11, 2022
```
* update to popart v2.5.0

* use a specific version of sdk2.5.0
```
05 May, 2022 · 2 commits
- [IPU] merge recent changes (#42078) · 6ec89eeb
  Committed by Allen Guo on May 05, 2022
```
* merge recent changes

* fix setting pipline
```
- fix the v100 cuda11.2 matmul_v2 and elementwise_div bug (#42477) · 98c3f85e
  Committed by wawltor on May 05, 2022
27 Apr, 2022 · 1 commit

[CustomDevice] op_test supports custom device (#42227) · 4df02fdf

Committed by Aganlengzi on Apr 27, 2022

* [DO NOT MERGE] test op_test

* update with more related modifications

* split op_test.py to use test=allcases for testing

* split op_test.py to use test=allcases for testing

26 Apr, 2022 · 2 commits

optimize graph_engine pybind (#42192) · 1bf08eca

Committed by seemingwang on Apr 26, 2022

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind

* optimize pybind

* test

* fix pybind

* fix
Co-authored-by: DesmonDay <908660116@qq.com>

fit for printing cinn_launch op (#42141) · ee56906e
Committed by Leo Chen on Apr 26, 2022
```
* fit for printing cinn_launch op

* update boost::variant caster for bytes
```
24 Apr, 2022 · 2 commits

[CustomDevice] add eager mode support (#42034) · ccafd2e5
Committed by ronnywang on Apr 24, 2022

combine graph_table and feature_table in graph_engine (#42134) · 0e0f7da6

Committed by seemingwang on Apr 24, 2022

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind
Co-authored-by: DesmonDay <908660116@qq.com>

19 Apr, 2022 · 1 commit
- Implement Amp Layout AutoTune (#41884) · c2bcb141
  Committed by Zhang Ting on Apr 19, 2022
17 Apr, 2022 · 1 commit

[Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96

Committed by Chen Weihang on Apr 17, 2022

* split phi and fluid infermeta context

* resolve conflict

* fix type error

* optimize scheduling perf

* spec small vector size

* replace all grad var name

* fix test failed

* move init defalut signature

* polish details

* polish details

* fix no init bug

* init sig for tests

* add init sig for infer

* fix infrt error

* fix infrt failed

* fix kunlun error

* fix infrt failed

15 Apr, 2022 · 3 commits

Add eager string tensor (#41039) · a22b68b8

Committed by Jack Zhou on Apr 15, 2022

* Add core.eager.StringTensor __init__ which pyarray args can be passed

* Add the numpy method of core.eager.StringTensor

* revert tensor.to_string modification

* Add ToPyObject for core.eager.StringTensor

* Add debug string for core.eager.StringTensor

* Remove place args of core.eager.StringTensor temporarily

* Fix check string_tensor error

* remove dtype of core.eager.StringTensor

* add core.eager.StringTensor unittest

* remove pstring from VarDesc

* Add InitStringTensorWithStringTensor

* Remove to_string modification

* Remove zero_copy arg from StringTensor creator

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

[MLU] add mlu new profiler (#41138) · fc208b7e
Committed by fwenguang on Apr 15, 2022
```
* [MLU] add mlu new profiler

* fix format
```
14 Apr, 2022 · 1 commit
- executor perf statistics (#41648) · cbe7466f
  Committed by liutiexing on Apr 14, 2022
```
* executor perf statistics

* fix ut

* fix ut

* fix ut

* add ut

* add ut
```
09 Apr, 2022 · 1 commit

Unittest recover (#41431) · 7a07c4a5

Committed by zhaocaibei123 on Apr 09, 2022

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

08 Apr, 2022 · 1 commit
- pybind support CustomPlace (#41136) · 0cd577cf
  Committed by ronnywang on Apr 08, 2022
07 Apr, 2022 · 2 commits
- [GPUPS] bind afs wrpper (#41227) · b3bcebbe
  Committed by Thunderbrook on Apr 07, 2022
```
* afs wrapper

* format

* format

* macro
```
- Profile Executors (#41100) · dfb47986
  Committed by liutiexing on Apr 07, 2022
```
* Profile Executors

* update

* fix ut

* fix names

* update

* update
```
05 Apr, 2022 · 1 commit

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Committed by Zhang Ting on Apr 05, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests
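
The bullets above mention an `AutoTuneCache` without describing it. As a rough illustration only, the sketch below assumes such a cache memoizes a chosen algorithm per kernel configuration and counts lookups so a hit rate can be reported. `AutoTuneCacheSketch`, `find`, `set`, `hit_rate`, and the key layout are hypothetical names for this example, not Paddle's actual classes or API.

```python
class AutoTuneCacheSketch:
    """Hypothetical cache: maps a kernel configuration key to a chosen algorithm."""

    def __init__(self):
        self._table = {}
        self.hits = 0
        self.misses = 0

    def find(self, key):
        """Return the cached algorithm for key, or None on a miss."""
        if key in self._table:
            self.hits += 1
            return self._table[key]
        self.misses += 1
        return None

    def set(self, key, algo):
        """Remember the algorithm picked for this configuration."""
        self._table[key] = algo

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = AutoTuneCacheSketch()
# Assumed key: operator name, input shape, and dtype.
key = ("conv2d", (1, 3, 224, 224), "float32")

first = cache.find(key)        # miss: nothing tuned yet
cache.set(key, "algo_b")       # pretend tuning selected "algo_b"
second = cache.find(key)       # hit: reuse the earlier choice
print(first, second, cache.hit_rate())
```

A real implementation would also need to decide when tuning is enabled and how keys are hashed; this sketch leaves both out.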

01 Apr, 2022 · 1 commit

[Eager] Support pinned (#41035) · f3270fc8

Committed by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

30 Mar, 2022 · 1 commit

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

Committed by From00 on Mar 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocater

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors
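
The commit above names paired counters (`memory_allocated` with `max_memory_allocated`, `memory_reserved` with `max_memory_reserved`). As a minimal sketch of how a current value and its peak counterpart usually relate, assuming the peak is simply the highest value the current counter has reached: `MemoryStat` and `update` are hypothetical names for this example and are not taken from Paddle's implementation.

```python
import threading


class MemoryStat:
    """Hypothetical stat: a current byte count plus the peak it has reached."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self._peak = 0

    def update(self, delta):
        """Apply an allocation (positive delta) or a free (negative delta)."""
        with self._lock:
            self._current += delta
            if self._current > self._peak:
                self._peak = self._current

    @property
    def current(self):
        with self._lock:
            return self._current

    @property
    def peak(self):
        with self._lock:
            return self._peak


stat = MemoryStat()
stat.update(1024)     # allocate 1 KiB
stat.update(2048)     # allocate 2 KiB more
stat.update(-1024)    # free the first block
print(stat.current, stat.peak)
```

The current value drops after the free while the peak stays at its high-water mark, which is the distinction the `max_` prefix suggests.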

23 Mar, 2022 · 2 commits

Support sharding (#40637) · fe291daf

Committed by Jiabin Yang on Mar 23, 2022

* suppor sharding api

* support multi api for sharding in eager

* support multi api for sharding in eager

* fix test

* fix test coverage

Add profiler features (#40357) · c15e3823

Committed by chenjian on Mar 23, 2022

* add event record for model profiling

* fix format

* fix format

* fix code example bug

* no

* add profiler statistic

* add profiler feature

* fix bug

* fix bug

* fix bug

* fix bug

* required: gpu

* required: gpu

* fix bug

* required: gpu

* fix ci bug

* fix ci error

* fix ci error

* upgrade document

* fix doc

* fix ci bug

* add doc and fix bug

* nothing

* fix bug

* fix format bug

* modify format

* add deprecated description for old profiler

* fix bug

* fix bug

* fix

* add load_profiler_reuslt doc

* add load_profiler_reuslt doc

* add load_profiler_reuslt doc

* help fix old profiler sample code

* add api doc

* fix format

* fix api doc

* fix api doc format

* fix api doc format

* fix api doc c format

* fix api doc format

22 Mar, 2022 · 1 commit

[new-exec] async prepare deps (#40713) · 814f7211

Committed by Leo Chen on Mar 22, 2022

* async prepare deps

* fix bug that std::future is not set

* add ut

* refine code

* fix standalone ut

* disable prof

21 Mar, 2022 · 1 commit

[IPU] update ipu_backend (#40685) · d67fe921

Committed by Allen Guo on Mar 21, 2022

* sync changes

* copy sOpNamescope

* fix UTs

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* fix code-format

* fix compile error

* add comments for feed_op
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

16 Mar, 2022 · 1 commit
- clean up DeviceManager in advance manually (#40504) · 23c036d6
  Committed by ronnywang on Mar 16, 2022
14 Mar, 2022 · 2 commits

Support custom op and paddle.autograd.bacward in eager (#40423) · 227fa408

Committed by Jiabin Yang on Mar 14, 2022

* eager, test=develop

* fix bug, test=develop

* eager, test=develop

* merge legacy to fluid

* eager, test=develop

* eager, test=develop

* Refactor TensorAdd func by template and remove gradient_accumulation in eager

* Remove needless target name

* eager, test=develop

* eager, test=develop

* Use overload instead of template

* Remove legacy code

* Remove legacy code

* selectedrows, test=develop

* Remove DataType test

* eager, test=develop

* eager, test=develop

* support gan, test=develop

* Using Tensor directly instead of using EagerTensor

* support gradient_accumulation

* make test_imperative_lod_tensor_to_selected_rows longer

* make test_imperative_lod_tensor_to_selected_rows longer

* refine code

* ptb, test=develop

* Rename all EagerTensor to Tensor

* Rename some EagerTensor to Tensor

* rename EagerTensor to EagerVariable

* eager, test=develop

* eager, test=develop

* eager, test=develop

* eager, test=develop

* add more test

* eager, test=develop

* Support copiable selected rows and merge develop

* save load, eager, test=develop

* save load, eager, test=develop

* refine, test=develop

* remove useless _set_value method

* refine, test=develop

* refine, test=develop

* revert static_runner, test=develop

* EagerTensor to Tensor, test=develop

* refine, test=develop

* refine, test=develop

* clear grad, test=develop

* merge, develop

* merge, develop

* merge, test=develop

* merge, test=develop

* Support quant and part of slice

* support legacy static save

* extend slim tests time

* remove imperative on inference

* remove imperative on inference

* merge develop

* fix typo

* fix typo

* split slice related code into 2 part for imperative and eager

* split slice from inference

* split slice from inference

* fix test_tensor_register_hook

* support custom op in eager mode

* fix inference deps error

* split eager utils from custom operator

* fix type match

* fix typo
Co-authored-by: Wang Huan <wanghuan29@baidu.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: wanghuancoder <wanghuancoder@163.com>

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors ... · e553f758

Committed by Zhong Hui on Mar 14, 2022

[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors  between python processes. (#37302)

* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

机器未来 / Paddle · in sync with the fork's source project