diff --git a/docs/parallel_training/cluster.rst b/docs/design.rst
similarity index 89%
rename from docs/parallel_training/cluster.rst
rename to docs/design.rst
index ce688ce2e912621e89329c10c6032371545cad96..b2a76a898707b86d890671d8dae9dedfbd07b862 100644
--- a/docs/parallel_training/cluster.rst
+++ b/docs/design.rst
@@ -1,11 +1,8 @@
 Parl Cluster
 ============
 
-Get Started
-###########
-
 Cluster Structure Overview
---------------------------
+##########################
 
 | There are three core concepts in a Parl cluster: master, worker and client.
 
@@ -19,12 +16,13 @@ Cluster Structure Overview
 - **Client:** For each training program, there is a unique global client
   which submits tasks to the master node.
 
-.. image:: ./cluster_structure.png
+.. image:: ./parallel_training/cluster_structure.png
    :width: 600px
    :align: center
 
 Master
-------
+######
+
 | There is only one master node in each parl cluster, we can start a master
   by calling ``xparl start --port 1234`` with a assigned port number. This
   command will also simultaneously start a local worker which connects to the new
@@ -43,14 +41,14 @@ Master
   status of the cluster (i.e. total cpu number, used cpu number, load average
   ...) to the monitor.
 
-.. image:: ./master.png
+.. image:: ./parallel_training/master.png
    :width: 600px
    :align: center
 
 Worker
-------
+######
 
-| We can add more computation resources to a existed cluster by calling
+| We can add more computation resources to an existing cluster by calling
   ``xparl --connect master_address`` command. This command will create a local
   **Worker** object and then connect to the cluster.
 
@@ -62,27 +60,27 @@ Worker
   job from the job buffer, start a new job and update worker information to
   the master node.
 
-.. image:: ./worker.png
+.. image:: ./parallel_training/worker.png
    :width: 600px
    :align: center
 
 Client
-------
+######
 
 | We have a global client for each training program, it submits training tasks
   to the master node. User do not need to interact with client object directly.
-  We can create a new global client or get an existed global client by calling
+  We can create a new global client and connect it to the cluster by calling
   ``parl.connect(master_address)``.
 
 | The global client will read local python scripts and configuration files,
   which will later be sent to remote jobs.
 
-.. image:: ./client.png
+.. image:: ./parallel_training/client.png
    :width: 600px
    :align: center
 
 Actor
------
+#####
 
 | **Actor** is an object defined by users which aims to solve a specific
   task. We use ``@parl.remote_class`` decorator to convert an actor to a
@@ -108,6 +106,6 @@ Actor
 | When the actor call a function, the real computation will be executed in
   the job process by job's local actor.
 
-.. image:: ./actor.png
+.. image:: ./parallel_training/actor.png
    :width: 600px
    :align: center
diff --git a/docs/images/parl.graffle b/docs/images/parl.graffle
index 021d34a2b2e3381e97379ec4fb81e6832ddf0037..b0b0d57b3465a8836be03b87ecd8b086a3e7d590 100644
Binary files a/docs/images/parl.graffle and b/docs/images/parl.graffle differ
diff --git a/docs/index.rst b/docs/index.rst
index 9e92ec79d9b499315ded8026891e8d60c988b0b2..3235aec000f3c817a2de1689786b02d67250d280 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -66,7 +66,6 @@ Abstractions
    :caption: Parallel Training
 
    parallel_training/overview.rst
-   parallel_training/cluster.rst
    parallel_training/setup.rst
    parallel_training/recommended_practice.rst
 
@@ -75,6 +74,7 @@ Abstractions
    :caption: High-quality Implementations
 
    implementations.rst
+   design.rst
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/parallel_training/client.png b/docs/parallel_training/client.png
index 5d06abe4498c8cd8225296479674616b887f5603..688538310447f9701e5132f72880ac2c7baf624b 100644
Binary files a/docs/parallel_training/client.png and b/docs/parallel_training/client.png differ
diff --git a/docs/parallel_training/cluster_structure.png b/docs/parallel_training/cluster_structure.png
index 3c24a2d721b8afed374e901b4b36cd63e83d2bb1..ce4829bc44f8e1656ed5487d7e35c4263a2ef76e 100644
Binary files a/docs/parallel_training/cluster_structure.png and b/docs/parallel_training/cluster_structure.png differ
diff --git a/docs/parallel_training/master.png b/docs/parallel_training/master.png
index 21d51cabac641c4e9a24fa410fbd33b1637f933a..093da84a55d6d1c2806f5c7094bc2c21d48dc21d 100644
Binary files a/docs/parallel_training/master.png and b/docs/parallel_training/master.png differ
diff --git a/docs/parallel_training/worker.png b/docs/parallel_training/worker.png
index 93944ec5b98d45a16b225de534caaa88bfcc0a81..ebe2a9bbb8e3232843315f51bd8e1f2f36ae66ad 100644
Binary files a/docs/parallel_training/worker.png and b/docs/parallel_training/worker.png differ
diff --git a/parl/remote/client.py b/parl/remote/client.py
index 9b20e404c237896a7f8c0cc2d9ec45910edf17f9..eb09b109812c91d9cb1b5b11c3190e3be099da72 100644
--- a/parl/remote/client.py
+++ b/parl/remote/client.py
@@ -32,8 +32,8 @@ class Client(object):
     connect to the same global client in a training task.
 
     Attributes:
-        submit_task_socket (zmq.Context.socket): A socket which submits job to
-            the master node.
+        submit_task_socket (zmq.Context.socket): A socket which submits tasks to
+            the master node.
         pyfiles (bytes): A serialized dictionary containing the code of python
             files in local working directory.
         executable_path (str): File path of the executable python script.
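
Note: the renamed design page above describes the master/worker/client/actor workflow only in prose. A minimal sketch of how the pieces fit together, based on the ``parl.connect`` and ``@parl.remote_class`` APIs mentioned in the docs; it assumes a cluster master is already running (e.g. started with ``xparl start --port 1234``), and the ``Actor``/``double`` names are illustrative, not from the documentation:

```python
import parl


# The decorator converts this user-defined class into a remote object:
# instantiating it submits a task, and the master hands the task to an
# idle job on one of the workers.
@parl.remote_class
class Actor(object):
    def double(self, x):
        return x * 2


# Creates the global client for this training program and connects it to
# the cluster master; local python scripts are shipped to remote jobs.
parl.connect("localhost:1234")

actor = Actor()
# The call below runs in the remote job's local actor, not in this process.
result = actor.double(21)
```

This sketch cannot run without a live cluster, so the address and port are placeholders to adapt to your own deployment.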