diff --git a/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_cn.md b/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_cn.md
old mode 100644
new mode 100755
index e40afcf3f4cc311747de9be5cbe9eacc2ca44175..a924943907fd208fbe864ddd5f54c2805124e35c
--- a/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_cn.md
+++ b/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_cn.md
@@ -60,9 +60,19 @@ python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=time
 ## Distributed usage
 Generally speaking, a distributed training job involves two kinds of programs: pserver and trainer. We provide a way to display the profile logs of both the pserver and the trainer on the timeline.
-1. The trainer is enabled the same way as in step 1 of the [local usage](#local) section.
+1. The trainer is enabled in much the same way as in step 1 of the [local usage](#local) section, but because there are multiple trainers, each trainer must be distinguished by its own profile path. For example:
+    ```python
+    # read this trainer's unique id (or obtain it by any other method)
+    import os
+    trainer_id = int(os.environ.get('PADDLE_TRAINER_ID'))
+
+    if pass_id == 0 and batch_id == 5:
+        profiler.start_profiler("All")
+    elif pass_id == 0 and batch_id == 10:
+        profiler.stop_profiler("total", "/tmp/profile_" + str(trainer_id))
+    ```
-1. The pserver can enable profiling by adding two environment variables, for example:
+2. The pserver can enable profiling by adding two environment variables, for example:
 ```
 FLAGS_rpc_server_profile_period=10 FLAGS_rpc_server_profile_path=./tmp/pserver python train.py
 ```
diff --git a/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_en.md b/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_en.md
old mode 100644
new mode 100755
index fb51802a168452a0649ebbcd0a6f4d37c07ea823..3ba1a8295840ae72563a0edd1bce2a22ac269941
--- a/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_en.md
+++ b/doc/fluid/advanced_guide/performance_improving/analysis_tools/timeline_en.md
@@ -62,7 +62,17 @@ python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=time
 ## Distributed
 This tool can support distributed training programs (pserver and trainer) too.
-1. Open the trainer profiler just as in [local](#local).
+1. Open the trainer profiler just as in [local](#local), but remember to give each trainer its own profile path, since there may be more than one trainer on the same node.
+    ```python
+    # read this trainer's unique id (or obtain it by any other method)
+    import os
+    trainer_id = int(os.environ.get('PADDLE_TRAINER_ID'))
+
+    if pass_id == 0 and batch_id == 5:
+        profiler.start_profiler("All")
+    elif pass_id == 0 and batch_id == 10:
+        profiler.stop_profiler("total", "/tmp/profile_" + str(trainer_id))
+    ```
 2. Open the pserver profiler: add two environment variables, e.g.:
 ```
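
For context on the snippet added in both hunks above, here is a minimal sketch (not part of the patch) of where those profiler calls typically sit inside a trainer's loop. The `train_reader`, `feeder`, `exe`, `fetch_list`, and `num_passes` names are hypothetical placeholders for an existing Fluid training setup; only the `profiler` calls, the `PADDLE_TRAINER_ID` lookup, and the per-trainer output path come from the patch itself.

```python
# Hypothetical surrounding training loop; only the profiler usage
# mirrors the patch above.
import os

import paddle.fluid.profiler as profiler

# Each trainer writes to its own file: /tmp/profile_0, /tmp/profile_1, ...
trainer_id = int(os.environ.get('PADDLE_TRAINER_ID'))

for pass_id in range(num_passes):
    for batch_id, data in enumerate(train_reader()):
        if pass_id == 0 and batch_id == 5:
            # start recording both CPU and GPU events
            profiler.start_profiler("All")
        elif pass_id == 0 and batch_id == 10:
            # sort the report by total time and dump this trainer's
            # profile data under its unique path
            profiler.stop_profiler("total", "/tmp/profile_" + str(trainer_id))
        exe.run(feed=feeder.feed(data), fetch_list=fetch_list)
```

Each trainer then produces its own `/tmp/profile_<trainer_id>` file, which can be converted to a viewable timeline with the same `python Paddle/tools/timeline.py --profile_path=... --timeline_path=...` command used in the local case.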