Commit d3d6748f, authored by liangjianzhong

revise the timeline tool document for fleet training

Parent 3d298e3c
@@ -60,9 +60,19 @@ python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=time
## Distributed usage
Generally speaking, a distributed training job runs two kinds of programs: pserver and trainer. We provide a way to display the profile logs of both the pserver and the trainer with timeline.
1. Enabling the profiler on a trainer is basically the same as step 1 in [Local usage](#local), but because there are multiple trainers, each trainer must write to a distinguishable profile path. For example:
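```python
import os

import paddle.fluid.profiler as profiler

# Read the unique id of the current trainer from PADDLE_TRAINER_ID
# (any other way of obtaining a unique per-trainer id works as well).
trainer_id = int(os.environ.get('PADDLE_TRAINER_ID', '0'))

# Inside the training loop (pass_id / batch_id are the current loop indices):
if pass_id == 0 and batch_id == 5:
    profiler.start_profiler("All")
elif pass_id == 0 and batch_id == 10:
    # Each trainer writes its own file, e.g. /tmp/profile_0, /tmp/profile_1
    profiler.stop_profiler("total", "/tmp/profile_" + str(trainer_id))
```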
2. The pserver profiler can be enabled by adding two environment variables, for example:
```
FLAGS_rpc_server_profile_period=10 FLAGS_rpc_server_profile_path=./tmp/pserver python train.py
```
......
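When the pserver is started from a Python launcher instead of a shell, the two environment variables from step 2 can be passed through `subprocess`. The following is a minimal sketch. `pserver_profile_env` and `launch_pserver` are hypothetical helper names, and the values `10` and `./tmp/pserver` are simply the ones from the command above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def pserver_profile_env(period: int, path: str,
                        base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the base environment and add the two pserver profiling flags.

    Hypothetical helper: the flag names come from the shell command above;
    their exact semantics are defined by Paddle, not by this function.
    """
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["FLAGS_rpc_server_profile_period"] = str(period)
    env["FLAGS_rpc_server_profile_path"] = path
    return env


def launch_pserver(script: str = "train.py", period: int = 10,
                   path: str = "./tmp/pserver") -> int:
    """Run the training script with pserver profiling enabled; return its exit code."""
    completed = subprocess.run([sys.executable, script],
                               env=pserver_profile_env(period, path))
    return completed.returncode
```

Because the helper starts from a copy of the current environment, any variables the launcher already sets (for example, ones that select the process role) are preserved. `sys.executable` is used instead of a bare `python` so that the child process runs under the same interpreter as the launcher.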
@@ -62,7 +62,17 @@ python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=time
## Distributed
This tool also supports distributed training programs (pserver and trainer).
1. Open the trainer profiler the same way as in [local](#local), but remember to give each trainer its own profile path, since there may be more than one trainer on the same node.
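```python
import os

import paddle.fluid.profiler as profiler

# Read the unique id of the current trainer from PADDLE_TRAINER_ID
# (any other way of obtaining a unique per-trainer id works as well).
trainer_id = int(os.environ.get('PADDLE_TRAINER_ID', '0'))

# Inside the training loop (pass_id / batch_id are the current loop indices):
if pass_id == 0 and batch_id == 5:
    profiler.start_profiler("All")
elif pass_id == 0 and batch_id == 10:
    # Each trainer writes its own file, e.g. /tmp/profile_0, /tmp/profile_1
    profiler.stop_profiler("total", "/tmp/profile_" + str(trainer_id))
```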
2. Open the pserver profiler by adding two environment variables, e.g.:
```
......
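The per-trainer profile path and the profiling window from step 1 can also be factored into small, Paddle-independent helpers that are easy to unit-test. This is a minimal sketch, assuming every trainer process receives a distinct integer in `PADDLE_TRAINER_ID` (as in the snippet above). `trainer_profile_path` and `profiler_action` are hypothetical names, not Paddle APIs.

```python
import os
from typing import Mapping, Optional


def trainer_profile_path(prefix: str = "/tmp/profile_",
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Build a profile output path that is unique per trainer.

    Hypothetical helper: assumes the launcher exports PADDLE_TRAINER_ID
    with a distinct integer for every trainer process; falls back to 0.
    """
    env = os.environ if env is None else env
    trainer_id = int(env.get("PADDLE_TRAINER_ID", "0"))
    return prefix + str(trainer_id)


def profiler_action(pass_id: int, batch_id: int,
                    start_batch: int = 5, stop_batch: int = 10) -> Optional[str]:
    """Return "start", "stop" or None for the current training step.

    Mirrors the window used above: profile pass 0 from batch 5 to batch 10.
    """
    if pass_id == 0 and batch_id == start_batch:
        return "start"
    if pass_id == 0 and batch_id == stop_batch:
        return "stop"
    return None


if __name__ == "__main__":
    # Two trainers on the same node get different files, so their logs do not collide.
    print(trainer_profile_path(env={"PADDLE_TRAINER_ID": "0"}))  # /tmp/profile_0
    print(trainer_profile_path(env={"PADDLE_TRAINER_ID": "1"}))  # /tmp/profile_1
```

Inside the training loop, call `profiler.start_profiler("All")` when `profiler_action(pass_id, batch_id)` returns `"start"`, and call `profiler.stop_profiler("total", trainer_profile_path())` when it returns `"stop"`.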