Commits · efcd108dd330e0cea2ad5afb366b0e5817557c59 · 机器未来 / Paddle

24 Sep, 2021 · 1 commit

Basic PR on Cost Model (#35774) (#35915) · efcd108d

Committed by Huihuang Zheng on Sep 24, 2021

Add a basic Cost Model. It uses the executor to run a program and profiles the run to obtain per-op time.

This is an early, basic version; we will add more functions in the future.

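The Cost Model commit above gets op time by running a program through the executor and profiling the run. The measurement loop behind that idea can be sketched in plain Python as below. This is a hypothetical illustration, not Paddle's executor or Cost Model API: `profile_op_times`, the `(op_name, callable)` program representation and the toy ops are assumptions made for the example. Averaging over several runs smooths out one-off noise such as warm-up.

```python
import time
from collections import defaultdict


def profile_op_times(program, repeat=3):
    """Run every op in `program` `repeat` times and return the mean
    wall-clock time per op name, in seconds.

    `program` is a hypothetical list of (op_name, callable) pairs that
    stands in for an executor's op list; it is not Paddle's Program type.
    """
    totals = defaultdict(float)  # op name -> accumulated seconds
    counts = defaultdict(int)    # op name -> number of timed runs
    for _ in range(repeat):
        for name, op in program:
            start = time.perf_counter()
            op()  # "execute" the op
            totals[name] += time.perf_counter() - start
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Usage with a toy two-op "program"; the op bodies are placeholders.
toy_program = [
    ("matmul", lambda: sum(i * i for i in range(10_000))),
    ("relu", lambda: [max(0, x) for x in range(-500, 500)]),
]
op_time = profile_op_times(toy_program)
for name, seconds in sorted(op_time.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds * 1e3:.3f} ms")
```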

26 May, 2021 · 1 commit
- Marker op for profiling (#33034) · 5c79dbb2
  Committed by Yuang Liu on May 26, 2021
19 Apr, 2021 · 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* [NPU] fix bug of using temp vector (#31963)

* fix bug when beta1_pow is on CPU (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and add wait before sync operation (#31956)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: pangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and add wait before sync operation

* remove unnecessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>

04 Feb, 2021 · 1 commit
- use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  Committed by wanghuancoder on Feb 04, 2021
```
* use iwyu clean include second time, test=develop
```
03 Nov, 2020 · 1 commit
- Support compiling Paddle on sw (#27858) · 09fd2b2a
  Committed by Wilber on Nov 03, 2020
07 Jul, 2020 · 1 commit
- Refine PADDLE_ENFORCE (#25369) · ea7e5325
  Committed by GaoWei8 on Jul 07, 2020
```
* refine PADDLE_ENFORCE
test=develop
```
03 Jul, 2020 · 1 commit
- fix PADDLE_ENFORCE (#25297) · fb70682f
  Committed by GaoWei8 on Jul 03, 2020
```
* fix PADDLE_ENFORCE and refine the description
test=develop
```
09 Jun, 2020 · 1 commit
- fix the segmentation fault of the profiler in the seq2seq model test=develop (#24952) · feba1318
  Committed by wangchaochaohu on Jun 09, 2020
26 May, 2020 · 1 commit
- fix the print error of PE record_event and framework overhead in profiler test=develop (#24744) · 79caed66
  Committed by wangchaochaohu on May 26, 2020
25 May, 2020 · 1 commit
- Add pe profiler Event (#24611) · dbfe5333
  Committed by wangchaochaohu on May 25, 2020
11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

24 Feb, 2020 · 1 commit
- Fusion group profile support (#22718) · 611411b9
  Committed by wangchaochaohu on Feb 24, 2020
```
* add support for the driver api callback and fix the profiler name show bug
```
23 Feb, 2020 · 1 commit
- fix typo words (#22653) · d2ba91aa
  Committed by tianshuo78520a on Feb 23, 2020
09 Jan, 2020 · 1 commit
- add support for nested profiling event and printing in different level (#22061) · c3876cf8
  Committed by wangchaochaohu on Jan 09, 2020
```
* add support for nested profiling event and printing in different level
```
28 Nov, 2019 · 1 commit
- Profile refine (#21258) · 8293f21a
  Committed by wangchaochaohu on Nov 28, 2019
```
* fix profile api high version test=develop
```
13 Mar, 2019 · 1 commit
- Add memory profiler (#16137) · 09799566
  Committed by chengduo on Mar 12, 2019
```
test=develop
```
11 Mar, 2019 · 1 commit

Revert "Revert "Add Event for TensorCopy"" (#16035) · ad80bde8

Committed by chengduo on Mar 11, 2019

* Revert "Revert "Add Event for TensorCopy" (#16022)"

This reverts commit e2da3a5b.

* use default stream
test=develop

04 Mar, 2019 · 3 commits
- Revert "Add Event for TensorCopy" (#16022) · 92438f61
  Committed by chengduo on Mar 03, 2019
```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```
- Add Event for TensorCopy (#15953) · 06f3c857
  Committed by chengduo on Mar 01, 2019
```
Add Event for TensorCopy 
```
- Revert "Add Event for TensorCopy" (#16022) · e2da3a5b
  Committed by chengduo on Mar 03, 2019
```
* Revert "Add Event for TensorCopy (#15953)"

This reverts commit 7235fd66.
test=develop

* fix CI
test=develop
```
01 Mar, 2019 · 1 commit
- Add Event for TensorCopy (#15953) · 7235fd66
  Committed by chengduo on Mar 01, 2019
```
Add Event for TensorCopy 
```
24 Feb, 2019 · 1 commit
- add memset CUPTI && test=develop (#15868) · c6bd434f
  Committed by Dun on Feb 24, 2019
22 Feb, 2019 · 1 commit
- enhance profiler (#15842) · 3b08c9ab
  Committed by chengduo on Feb 22, 2019
```
test=develop
```
21 Feb, 2019 · 1 commit

Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704

Committed by Dun on Feb 21, 2019

* refine profiler && add runtime tracer

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* fix bug && test=develop

* add thread id map && test=develop

* test=develop

* testing

* bug fix

* remove cuda event && refine code && test=develop

* test=develop

* test=develop

* test=develop

* fix windows temp file && test=develop

* test=develop

* fix windows bug && test=develop

* fix start up issue && test=develop

* code polish &&  test=develop

* remove unused code && test=develop

* add some cupti cbid && test=develop

* add FLAGS_multiple_of_cupti_buffer_size && test=develop

* fix compile error && test=develop

* add keyword && test=develop

* fix && test=develop

* code polish && test=develop

04 Dec, 2018 · 1 commit
- test=develop · deb04809
  Committed by ZongwuYang on Dec 04, 2018
```
Fix the bug that profiler cannot trace the nccl allreduce operator
```
26 Nov, 2018 · 1 commit
- Revert the changes of VLOG · 53433d7f
  Committed by minqiyang on Nov 26, 2018
```
test=develop
```
08 Nov, 2018 · 1 commit
- Change the original VLOG levels to 10 times their values · 0c3227a5
  Committed by minqiyang on Nov 08, 2018
```
Fix code to support cpplint syntax check

test=develop
```
13 Aug, 2018 · 1 commit
- fix profiler deadlock · 5a6c3cd9
  Committed by qiaolongfei on Aug 13, 2018
10 Aug, 2018 · 1 commit
- optimize code · e008600b
  Committed by qiaolongfei on Aug 10, 2018
31 Jul, 2018 · 1 commit
- make profiler use thread_id from g_thread_id · caf10b47
  Committed by Xin Pan on Jul 31, 2018
```
Add a few more RecordEvent.
Cleanup
```
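Commit caf10b47 makes the profiler take its thread id from `g_thread_id`. The general technique of handing each thread a small, stable integer id (rather than a raw OS thread handle) can be sketched in Python with a shared counter and a thread-local cache. This is an illustrative assumption about the idea only; `current_thread_id` and the helper names are hypothetical and do not reproduce Paddle's `g_thread_id`.

```python
import itertools
import threading

_next_id = itertools.count()  # shared source of small integer ids
_id_lock = threading.Lock()
_local = threading.local()    # per-thread cache of the assigned id


def current_thread_id():
    """Return a small, stable integer id for the calling thread.

    The first call on a thread takes the next value from the shared
    counter; later calls on that thread return the cached value.
    """
    tid = getattr(_local, "tid", None)
    if tid is None:
        with _id_lock:
            tid = next(_next_id)
        _local.tid = tid
    return tid


# Usage: three worker threads each ask for their id twice.
seen = []
seen_lock = threading.Lock()


def worker():
    first, second = current_thread_id(), current_thread_id()
    with seen_lock:
        seen.append((first, second))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
main_id = current_thread_id()  # the main thread asks last, so it gets 3
print(seen, main_id)
```

Small dense ids make convenient keys for per-thread event buffers and for labeling timeline rows.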
30 Jul, 2018 · 3 commits
- clean up · ff97c709
  Committed by typhoonzero on Jul 30, 2018
- clean up · b7b60022
  Committed by typhoonzero on Jul 30, 2018
- fix_tests_on_gcc482 · f628b1df
  Committed by typhoonzero on Jul 30, 2018
23 Jul, 2018 · 1 commit
- profiler support cpu · a6d30a86
  Committed by qiaolongfei on Jul 23, 2018
14 Jun, 2018 · 1 commit

Remove cuptiFinalize. · d2afd210

Committed by Xin Pan on Jun 14, 2018

In the CUPTI samples, only cuptiFlush is used.
I can't find any place that calls cuptiFinalize, and
this API can fail with a not_implemented error in some
CUDA installations.

08 Jun, 2018 · 2 commits
- Update device_tracer.cc · 310598f9
  Committed by guochaorong on Jun 08, 2018
- fix some bugs introduced by unfreed memory · 0fec9469
  Committed by guochaorong on Jun 08, 2018
22 May, 2018 · 1 commit

multi-thread handlerequest · b4dd4c04

Committed by Xin Pan on May 21, 2018

    Experiment on VGG with the flowers dataset, 2 trainers, 1 pserver.
    More trainers could yield a larger speedup.

    After:
    Pass = 0, Iters = 327, Speed = (7.52) img/s
    Before:
    Pass = 0, Iters = 385, Speed = (6.77) img/s

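For scale, the two throughput figures in the b4dd4c04 message correspond to roughly an 11% gain for this 2-trainer, 1-pserver run. The quick check below assumes, as the message implies, that "Before" is the run without multi-threaded request handling; the iteration counts are not used.

```python
# Throughput figures quoted in the b4dd4c04 commit message (img/s).
before = 6.77  # without multi-threaded request handling
after = 7.52   # with multi-threaded request handling

speedup = after / before
gain_pct = (speedup - 1) * 100
print(f"speedup: {speedup:.3f}x, throughput gain: {gain_pct:.1f}%")
```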

10 Apr, 2018 · 1 commit
- Fix part of the cpplint errors in fluid/platform (#9802) · 8dbd9c39
  Committed by Yi Wang on Apr 09, 2018
14 Mar, 2018 · 1 commit
- Better timeline · 4840c49b
  Committed by Xin Pan on Mar 14, 2018

机器未来 / Paddle is identical to the upstream project it was forked from
