Commit 6dc5b34e in PaddlePaddle/Paddle
Authored on December 01, 2017 by Abhinav Arora
Committed by Yi Wang on November 30, 2017

Polishing the cpu profiling doc (#6116)

Parent: 0d40a4db
1 changed file with 11 additions and 11 deletions (+11, -11)

doc/howto/optimization/cpu_profiling.md
-This tutorial introduces techniques we used to profile and tune the
+This tutorial introduces techniques we use to profile and tune the
 CPU performance of PaddlePaddle. We will use Python packages
-`cProfile` and `yep`, and Google `perftools`.
+`cProfile` and `yep`, and Google's `perftools`.

-Profiling is the process that reveals the performance bottlenecks,
+Profiling is the process that reveals performance bottlenecks,
 which could be very different from what's in the developers' mind.
-Performance tuning is to fix the bottlenecks. Performance optimization
+Performance tuning is done to fix these bottlenecks. Performance optimization
 repeats the steps of profiling and tuning alternatively.

-PaddlePaddle users program AI by calling the Python API, which calls
+PaddlePaddle users program AI applications by calling the Python API, which calls
 into `libpaddle.so.` written in C++. In this tutorial, we focus on
 the profiling and tuning of
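As context for the `cProfile` workflow the patched tutorial describes, a minimal self-contained sketch is below; `train_step` is an illustrative stand-in, not a PaddlePaddle function:

```python
import cProfile
import io
import pstats

def train_step(n):
    # Stand-in for a real training step; any Python workload profiles the same way.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    train_step(10000)
profiler.disable()

# Sort the collected statistics by tottime, as the tutorial does,
# and print the five most expensive functions.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(5)
print(out.getvalue())
```

In a real run one would profile the whole script with `python -m cProfile -o profile.out main.py` and inspect the resulting file the same way.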
...

@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
 We can see that the most time-consuming function is the `built-in
 method run`, which is a C++ function in `libpaddle.so`. We will
-explain how to profile C++ code in the next section. At the right
+explain how to profile C++ code in the next section. At this
 moment, let's look into the third function `sync_with_cpp`, which is a
 Python function. We can click it to understand more about it:
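The drill-down step in this hunk (clicking a function in `cprofilev`) can also be reproduced with the standard `pstats` API; the functions below are placeholders, not PaddlePaddle's `sync_with_cpp`:

```python
import cProfile
import io
import pstats

def helper():
    # Placeholder for a hot Python function such as sync_with_cpp.
    return [i ** 2 for i in range(5000)]

def caller():
    return helper()

prof = cProfile.Profile()
prof.enable()
caller()
prof.disable()

out = io.StringIO()
stats = pstats.Stats(prof, stream=out).sort_stats("tottime")
# print_callers shows who invoked the matching function, much like
# drilling down on a function in the cprofilev web UI.
stats.print_callers("helper")
print(out.getvalue())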
...

@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
 `main.py.prof`.

 Please be aware of the `-v` command line option, which prints the
-analysis results after generating the profiling file. By taking a
-glance at the print result, we'd know that if we stripped debug
+analysis results after generating the profiling file. By examining
+the print result, we'd know that if we stripped debug
 information from `libpaddle.so` at build time. The following hints
 help make sure that the analysis results are readable:
...

@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
 variable `OMP_NUM_THREADS=1` to prevents OpenMP from automatically
 starting multiple threads.

-### Look into the Profiling File
+### Examining the Profiling File

-The tool we used to look into the profiling file generated by
+The tool we used to examine the profiling file generated by
 `perftools` is [`pprof`](https://github.com/google/pprof), which
 provides a Web-based GUI like `cprofilev`.
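The `OMP_NUM_THREADS=1` advice in the hunk above can also be applied from Python, provided the variable is set before any OpenMP-backed library loads; a minimal sketch:

```python
import os

# Must be set before importing libraries that initialize OpenMP
# (e.g. libpaddle.so, or numpy builds that use it), otherwise the
# thread pool may already have started.
os.environ["OMP_NUM_THREADS"] = "1"

import subprocess
import sys

# Child processes inherit the setting, so profiled runs stay single-threaded.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → 1
```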
...

@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
 optimize `MomentumOp`.

 `pprof` would mark performance critical parts of the program in
-red. It's a good idea to follow the hint.
+red. It's a good idea to follow the hints.