Created by: gavin1332
PR types
Others
PR changes
Others
Describe
Print the log of local rank 0 to stdout while using paddle.distributed.launch with --log_dir set.
paddle.distributed.launch creates multiple sub-processes, and these sub-processes redirect their stdout to log_dir when it is set. The main process then waits for all sub-processes without printing any training log to stdout, which gives users the false impression that the job is hanging and makes training progress hard to monitor for anyone who is not aware of the workerlog files.
So this PR prints the log of local rank 0 to stdout, which makes the training status easy to inspect.
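
The behavior described above can be illustrated with a small standalone sketch. This is not the code in this PR or in paddle.distributed.launch; it only shows the general idea using the Python standard library. The function name launch_and_mirror_rank0, the poll_interval parameter, and the workerlog.<rank> file naming are assumptions made for the illustration.

```python
import os
import subprocess
import sys
import time


def launch_and_mirror_rank0(cmds, log_dir, out=sys.stdout, poll_interval=0.1):
    """Hypothetical helper: start one sub-process per command, redirect each
    one's output to log_dir/workerlog.<rank> (assumed naming), and copy what
    rank 0 writes to `out` while waiting for all sub-processes to finish."""
    os.makedirs(log_dir, exist_ok=True)
    procs, log_files = [], []
    for rank, cmd in enumerate(cmds):
        log_file = open(os.path.join(log_dir, f"workerlog.{rank}"), "w")
        log_files.append(log_file)
        # Each sub-process writes straight into its own log file.
        procs.append(
            subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
        )

    # Follow the rank 0 log with a separate read handle, similar to `tail -f`.
    with open(os.path.join(log_dir, "workerlog.0"), "r") as rank0_log:
        while True:
            # Check liveness before reading, so the last read happens after
            # every sub-process has exited and nothing is missed.
            alive = any(p.poll() is None for p in procs)
            chunk = rank0_log.read()
            if chunk:
                out.write(chunk)
                out.flush()
            if not alive:
                break
            time.sleep(poll_interval)

    for log_file in log_files:
        log_file.close()
    return [p.returncode for p in procs]
```

Calling the helper with a list of commands (one per local rank) and a log directory leaves the per-rank files in place and additionally echoes the rank 0 output to the caller's stdout, so the main process no longer appears idle while the workers run.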