From 8d2ccd5b0a5b2606de267f8701758e1cd8ab17a7 Mon Sep 17 00:00:00 2001
From: Tao Luo <luotao02@baidu.com>
Date: Fri, 27 Sep 2019 12:25:13 +0800
Subject: [PATCH] refine profiler document (#1418)

---
 .../api_cn/profiler_cn/cuda_profiler_cn.rst   | 20 +++-----
 doc/fluid/api_cn/profiler_cn/profiler_cn.rst  | 49 ++++++++++++++-----
 .../api_cn/profiler_cn/reset_profiler_cn.rst  | 10 +---
 .../api_cn/profiler_cn/start_profiler_cn.rst  | 20 ++------
 .../api_cn/profiler_cn/stop_profiler_cn.rst   | 16 ++----
 5 files changed, 51 insertions(+), 64 deletions(-)
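
Reviewer note (not part of the patch): the result example added to profiler_cn.rst states that the sample rows are already in descending order for all five sort columns, so only the "Sorted by ..." line changes with sorted_key. The stdlib-only sketch below checks that claim against the sample rows. `Event`, `sort_events`, and `order` are hypothetical names introduced for illustration; this is not Paddle's implementation, only a model of the documented behaviour, including the ValueError for values outside [None, 'calls', 'total', 'max', 'min', 'ave'].

```python
from collections import namedtuple

# Hypothetical record type; field names follow the report columns in this patch.
Event = namedtuple("Event", ["name", "calls", "total", "min", "max", "ave"])

# Accepted values as listed in the ValueError entries of the patched docs.
VALID_SORTED_KEYS = (None, "calls", "total", "max", "min", "ave")


def sort_events(events, sorted_key=None):
    """Illustrative only: order rows as the docs describe, not Paddle's code."""
    if sorted_key not in VALID_SORTED_KEYS:
        raise ValueError(
            "sorted_key must be None or one of "
            "'calls', 'total', 'max', 'min', 'ave'"
        )
    if sorted_key is None:
        # Assumption: None keeps the recorded order of the input rows.
        return list(events)
    # sorted() is stable, so ties (e.g. equal call counts) keep input order.
    return sorted(events, key=lambda e: getattr(e, sorted_key), reverse=True)


# Rows copied from the sorted_key='total' sample report in profiler_cn.rst.
SAMPLE = [
    Event("thread0::conv2d", 8, 129.406, 0.304303, 127.076, 16.1758),
    Event("thread0::elementwise_add", 8, 2.11865, 0.193486, 0.525592, 0.264832),
    Event("thread0::feed", 8, 0.076649, 0.006834, 0.024616, 0.00958112),
]


def order(sorted_key):
    """Return event names in the order produced for the given sorted_key."""
    return [e.name for e in sort_events(SAMPLE, sorted_key)]


if __name__ == "__main__":
    for key in ("calls", "total", "max", "min", "ave"):
        print(key, order(key))
```

Under this model the singular spelling 'call' is rejected, since the ValueError entries in the patched docs list 'calls'.
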
diff --git a/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst b/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
index 810b1da2d..d10f39cea 100644
--- a/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
@@ -6,17 +6,18 @@ cuda_profiler
 .. py:function:: paddle.fluid.profiler.cuda_profiler(output_file, output_mode=None, config=None)
 
 
-CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过配置参数设置计数器/选项。默认配置是[' gpustarttimestamp '， ' gpuendtimestamp '， ' gridsize3d '， ' threadblocksize '， ' streamid '， ' enableonstart 0 '， ' conckerneltrace ']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visual-profiler>`_ 工具来加载这个输出文件以可视化结果。
+CUDA性能分析器。该分析器通过调用CUDA运行时编程接口，对CUDA程序进行性能分析，并将分析结果写入输出文件output_file。输出格式由output_mode参数控制，性能分析配置选项由config参数控制。得到输出文件后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visual-profiler>`_ 工具来加载这个输出文件以获得可视化结果。
 
 
 参数:
-  - **output_file** (string) – 输出文件名称, 输出结果将会写入该文件
-  - **output_mode** (string) – 输出格式是有 key-value 键值对 和 逗号的分割的格式。格式应该是' kvp '或' csv '
-  - **config** (list of string) – 参考"Compute Command Line Profiler User Guide" 查阅 profiler options 和 counter相关信息
+  - **output_file** (str) – 输出文件名称, 输出结果将会写入该文件。
+  - **output_mode** (str，可选) – 输出格式，可选值为键值对格式'kvp'和逗号分隔格式'csv'，默认为'csv'。
+  - **config** (list<str>, 可选) – NVIDIA性能分析配置列表，默认值为None时会选择以下配置：['gpustarttimestamp', 'gpuendtimestamp', 'gridsize3d', 'threadblocksize', 'streamid', 'enableonstart 0', 'conckerneltrace']。上述每个配置的含义和更多配置选项，请参考 `Compute Command Line Profiler User Guide <https://developer.download.nvidia.cn/compute/DevZone/docs/html/C/doc/Compute_Command_Line_Profiler_User_Guide.pdf>`_ 。
 
 抛出异常:
-    - ``ValueError`` -  如果 ``output_mode`` 不在 ['kvp', 'csv'] 中
+    - ``ValueError`` -  如果输出格式output_mode不是'kvp'、'csv'两者之一，会抛出异常。
 
+返回: 无
 
 **代码示例**
 
@@ -43,12 +44,3 @@ CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行
             exe.run(fluid.default_main_program(), feed={'data': input})
 
     # 之后可以使用 NVIDIA Visual Profile 可视化结果
-
-
-
-
-
-
-
-
-
diff --git a/doc/fluid/api_cn/profiler_cn/profiler_cn.rst b/doc/fluid/api_cn/profiler_cn/profiler_cn.rst
index b63593a6f..72901c5d0 100644
--- a/doc/fluid/api_cn/profiler_cn/profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/profiler_cn.rst
@@ -5,18 +5,15 @@ profiler
 
 .. py:function:: paddle.fluid.profiler.profiler(state, sorted_key=None, profile_path='/tmp/profile')
 
-profile interface 。与cuda_profiler不同，此profiler可用于分析CPU和GPU程序。默认情况下，它记录CPU和GPU kernel，如果想分析其他程序，可以参考教程来在c++代码中添加更多代码。
-
-
-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/advanced_usage/development/profiling/timeline_cn.html>`_
+通用性能分析器。与 :ref:`cn_api_fluid_profiler_cuda_profiler` 不同，此分析器可用于分析CPU和GPU程序。
 
 参数:
-  - **state** (string) –  profiling state, 取值为 'CPU' 或 'GPU',  profiler 使用 CPU timer 或GPU timer 进行 profiling. 虽然用户可能在开始时指定了执行位置(CPUPlace/CUDAPlace)，但是为了灵活性，profiler不会使用这个位置。
-  - **sorted_key** (string) – 如果为None，prfile的结果将按照事件的第一次结束时间顺序打印。否则，结果将按标志排序。标志取值为"call"、"total"、"max"、"min" "ave"之一，根据调用着的数量进行排序。total表示按总执行时间排序，max 表示按最大执行时间排序。min 表示按最小执行时间排序。ave表示按平均执行时间排序。
-  - **profile_path** (string) –  如果 state == 'All', 结果将写入文件 profile proto.
+  - **state** (str) –  性能分析状态, 取值为 'CPU' 或 'GPU' 或 'All'。'CPU'表示只分析CPU上的性能；'GPU'表示同时分析CPU和GPU上的性能；'All'表示除了同时分析CPU和GPU上的性能外，还将生成 `性能分析的时间轴信息 <../../advanced_usage/development/profiling/timeline_cn.html>`_ 。
+  - **sorted_key** (str，可选) – 性能分析结果的打印顺序，取值为None、'calls'、'total'、'max'、'min'、'ave'之一。默认值为None，表示按照第一次结束时间顺序打印；'calls'表示按调用次数排序；'total'表示按总执行时间排序；'max'表示按最大执行时间排序；'min'表示按最小执行时间排序；'ave'表示按平均执行时间排序。
+  - **profile_path** (str，可选) –  如果性能分析状态为'All', 将生成的时间轴信息写入profile_path，默认输出文件为 ``/tmp/profile`` 。
 
 抛出异常：
-  - ``ValueError`` – 如果state 取值不在 ['CPU', 'GPU', 'All']中. 如果 sorted_key 取值不在 ['calls', 'total', 'max', 'min', 'ave']
+  - ``ValueError`` – 如果state取值不在 ['CPU', 'GPU', 'All']中，或sorted_key取值不在 [None, 'calls', 'total', 'max', 'min', 'ave']中，则抛出异常。
 
 **代码示例**
 
@@ -40,9 +37,37 @@ profile interface 。与cuda_profiler不同，此profiler可用于分析CPU和GP
             input = np.random.random(dshape).astype('float32')
             exe.run(fluid.default_main_program(), feed={'data': input})
 
+**结果示例**
 
+.. code-block:: python
 
-
-
-
-
+    #### sorted_key = 'total', 'calls', 'max', 'min', 'ave' 结果 ####
+    # 示例结果中，除了Sorted by number of xxx in descending order in the same thread 这句随着sorted_key变化而不同，其余均相同。
+    # 原因是，示例结果中，上述5列都已经按从大到小排列了。
+    ------------------------->     Profiling Report     <-------------------------
+
+    Place: CPU
+    Time unit: ms
+    Sorted by total time in descending order in the same thread
+    #Sorted by number of calls in descending order in the same thread
+    #Sorted by number of max in descending order in the same thread
+    #Sorted by number of min in descending order in the same thread
+    #Sorted by number of avg in descending order in the same thread
+
+    Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
+    thread0::conv2d             8           129.406     0.304303    127.076     16.1758     0.983319
+    thread0::elementwise_add    8           2.11865     0.193486    0.525592    0.264832    0.016099
+    thread0::feed               8           0.076649    0.006834    0.024616    0.00958112  0.000582432
+
+    #### sorted_key = None 结果 ####
+    # 示例结果中，是按照Op结束时间顺序打印，因此打印顺序为feed->conv2d->elementwise_add
+    ------------------------->     Profiling Report     <-------------------------
+
+    Place: CPU
+    Time unit: ms
+    Sorted by event first end time in descending order in the same thread
+
+    Event                       Calls       Total       Min.        Max.        Ave.        Ratio.
+    thread0::feed               8           0.077419    0.006608    0.023349    0.00967738  0.00775934
+    thread0::conv2d             8           7.93456     0.291385    5.63342     0.99182     0.795243
+    thread0::elementwise_add    8           1.96555     0.191884    0.518004    0.245693    0.196998
diff --git a/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst b/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst
index bd7ba19a9..5d92153fe 100644
--- a/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/reset_profiler_cn.rst
@@ -5,7 +5,7 @@ reset_profiler
 
 .. py:function:: paddle.fluid.profiler.reset_profiler()
 
-清除之前的时间记录。此接口不适用于 ``fluid.profiler.cuda_profiler`` ，它只适用于 ``fluid.profiler.start_profiler`` , ``fluid.profiler.stop_profiler`` , ``fluid.profiler.profiler`` 。
+清除之前的性能分析记录。此接口不能和 :ref:`cn_api_fluid_profiler_cuda_profiler` 一起使用，但可以和 :ref:`cn_api_fluid_profiler_start_profiler` 、:ref:`cn_api_fluid_profiler_stop_profiler` 和 :ref:`cn_api_fluid_profiler_profiler` 一起使用。
 
 **代码示例**
 
@@ -18,11 +18,3 @@ reset_profiler
         if iter == 2:
             profiler.reset_profiler()
         # ...
-
-
-
-
-
-
-
-
diff --git a/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst b/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst
index d99ee7db1..b082a003a 100644
--- a/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/start_profiler_cn.rst
@@ -5,17 +5,13 @@ start_profiler
 
 .. py:function:: paddle.fluid.profiler.start_profiler(state)
 
-激活使用 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler``
-
-
-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/advanced_usage/development/profiling/timeline_cn.html>`_
+激活性能分析器。除了 :ref:`cn_api_fluid_profiler_profiler` 外，用户还可以使用 :ref:`cn_api_fluid_profiler_start_profiler` 和 :ref:`cn_api_fluid_profiler_stop_profiler` 来激活和停止性能分析器。
 
 参数:
-  - **state** (string) – profiling state, 取值为 'CPU' 或 'GPU' 或 'All', 'CPU' 代表只分析 cpu. 'GPU' 代表只分析 GPU . 'All' 会产生 timeline.
+  - **state** (str) –  性能分析状态, 取值为 'CPU' 或 'GPU' 或 'All'。'CPU'表示只分析CPU上的性能；'GPU'表示同时分析CPU和GPU上的性能；'All'表示除了同时分析CPU和GPU上的性能外，还将生成性能分析的时间轴信息 :ref:`fluid_timeline` 。
 
 抛出异常:
-  - ``ValueError`` – 如果state 取值不在 ['CPU', 'GPU', 'All']中
+  - ``ValueError`` – 如果state取值不在 ['CPU', 'GPU', 'All']中，则抛出异常。
 
 **代码示例**
 
@@ -30,13 +26,3 @@ start_profiler
             profiler.reset_profiler()
         # except each iteration
     profiler.stop_profiler('total', '/tmp/profile')
-
-                # ...
-
-
-
-
-
-
-
-
diff --git a/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst b/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst
index 4445342e1..e16cd2ef7 100644
--- a/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/stop_profiler_cn.rst
@@ -5,16 +5,15 @@ stop_profiler
 
 .. py:function:: paddle.fluid.profiler.stop_profiler(sorted_key=None, profile_path='/tmp/profile')
 
-停止 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler``
+停止性能分析器。除了 :ref:`cn_api_fluid_profiler_profiler` 外，用户还可以使用 :ref:`cn_api_fluid_profiler_start_profiler` 和 :ref:`cn_api_fluid_profiler_stop_profiler` 来激活和停止性能分析器。
 
 参数:
-  - **sorted_key** (string) – 如果为None，prfile的结果将按照事件的第一次结束时间顺序打印。否则，结果将按标志排序。标志取值为"call"、"total"、"max"、"min" "ave"之一，根据调用着的数量进行排序。total表示按总执行时间排序，max 表示按最大执行时间排序。min 表示按最小执行时间排序。ave表示按平均执行时间排序。
-  - **profile_path** (string) - 如果 state == 'All', 结果将写入文件 profile proto.
+  - **sorted_key** (str，可选) – 性能分析结果的打印顺序，取值为None、'calls'、'total'、'max'、'min'、'ave'之一。默认值为None，表示按照第一次结束时间顺序打印；'calls'表示按调用次数排序；'total'表示按总执行时间排序；'max'表示按最大执行时间排序；'min'表示按最小执行时间排序；'ave'表示按平均执行时间排序。
+  - **profile_path** (str，可选) –  如果性能分析状态为'All', 将生成的时间轴信息写入profile_path，默认输出文件为 ``/tmp/profile`` 。
 
 
 抛出异常:
-  - ``ValueError`` – 如果state 取值不在 ['CPU', 'GPU', 'All']中
+  - ``ValueError`` – 如果sorted_key取值不在 [None, 'calls', 'total', 'max', 'min', 'ave']中，则抛出异常。
 
 **代码示例**
 
@@ -29,10 +28,3 @@ stop_profiler
             profiler.reset_profiler()
             # except each iteration
     profiler.stop_profiler('total', '/tmp/profile')
-
-
-
-
-
-
-
-- 
GitLab