Use different clusters
PaddlePaddle supports running jobs on several platforms, including:

- Kubernetes: an open-source system from Google for automating deployment, scaling, and management of containerized applications.
- OpenMPI: a mature high-performance parallel computing framework.
- Fabric: a cluster management tool. You write scripts to submit jobs or manage the cluster.
We'll introduce cluster job management on these platforms. The examples can be found under cluster_train_v2.
When a job is dispatched to different nodes, these cluster platforms provide APIs or environment variables to the training processes, exposing information such as the node ID, the node IP, or the total number of nodes.
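As a rough illustration of how a training process might pick up this information, here is a minimal sketch in Python. It is not taken from cluster_train_v2. `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE` are set by Open MPI for each launched process (they give the process rank, which matches the node index only when one process runs per node). The names `NODE_ID`, `NUM_NODES`, and `NODE_IP` are hypothetical placeholders for whatever your platform or submit script actually exports.

```python
import os
from typing import Mapping, NamedTuple, Optional


class NodeInfo(NamedTuple):
    node_id: int       # index of this node (or process) within the job
    num_nodes: int     # total number of nodes (or processes) in the job
    ip: Optional[str]  # this node's IP address, if the platform exports it


def read_node_info(env: Mapping[str, str] = os.environ) -> NodeInfo:
    """Read node identity from environment variables.

    Assumption: NODE_ID, NUM_NODES and NODE_IP are hypothetical names used
    only for illustration; substitute the variables your platform provides.
    """
    # Open MPI exports the rank and world size to every process it launches.
    if "OMPI_COMM_WORLD_RANK" in env:
        return NodeInfo(
            node_id=int(env["OMPI_COMM_WORLD_RANK"]),
            num_nodes=int(env["OMPI_COMM_WORLD_SIZE"]),
            ip=env.get("NODE_IP"),
        )
    # Fallback: hypothetical variables, defaulting to a single-node job.
    return NodeInfo(
        node_id=int(env.get("NODE_ID", "0")),
        num_nodes=int(env.get("NUM_NODES", "1")),
        ip=env.get("NODE_IP"),
    )


if __name__ == "__main__":
    info = read_node_info()
    print(f"node {info.node_id} of {info.num_nodes}, ip={info.ip}")
```

Passing the environment in as a mapping keeps the lookup easy to test without launching a real cluster job. On Kubernetes, values like these are typically injected through the pod specification, so the variable names depend on how the job is defined.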