# Design Doc: The Client Library of Parameter Server

For an overview of the trainer's role, please refer to the [distributed training design doc](README.md). This design doc discusses the parameter server's client library, which manages communication with parameter servers. The library will be implemented in [Go](https://golang.org/) and made available as a static or dynamic library with a C header file.

## Parameter Initialization

The parameters on the parameter servers need to be initialized. For maximum flexibility, the trainers are allowed to initialize the parameters. Only one trainer does the initialization; the other trainers wait for it to complete and then get the parameters from the parameter servers.

### Trainer Selection

To select the trainer for initialization, every trainer tries to acquire a distributed lock. Whichever trainer acquires the lock does the initialization, as illustrated below:

<img src="./src/init_lock.png">
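
The election can be pictured with a process-local stand-in for the lock. The sketch below is only an illustration: `init_lock` and `try_acquire_init_lock` are hypothetical names, and a C11 `atomic_flag` takes the place of the distributed lock, whose backing service is not specified in this section.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the distributed lock: a process-local flag.
 * A real deployment would use a lock shared by all trainers. */
typedef struct {
  atomic_flag held;
} init_lock;

/* Returns true for exactly one caller, the first to set the flag.
 * Every later caller sees the flag already set and gets false. */
static bool try_acquire_init_lock(init_lock* lock) {
  return !atomic_flag_test_and_set(&lock->held);
}
```

The trainer that gets `true` would go on to initialize the parameters; the others would wait for initialization to finish.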

### Selection Process

The selection process is encapsulated in the C API function:
```c
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);
```
The selected trainer's call to `paddle_begin_init_params` returns 1. The other trainers' calls to `paddle_begin_init_params` block until initialization is done and then return 0, as illustrated below:

<img src="./src/pserver_init.png">
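
From the trainer's side, the flow might look like the sketch below. The types and signatures follow the C interface in this document, but the function bodies are hypothetical in-memory mocks added only so that the example compiles and runs; they are not the Go implementation. `init_or_fetch_params` is likewise an illustrative name.

```c
#include <stddef.h>

typedef struct {
  char* name;
  int   element_type;
  void* content;
  int   content_len;
} paddle_parameter;

typedef struct paddle_pserver_client paddle_pserver_client;

/* Mock client (assumption): records which calls were made. */
struct paddle_pserver_client {
  int selected;     /* value the mock paddle_begin_init_params returns */
  int params_sent;  /* number of paddle_init_param calls */
  int finished;     /* set by paddle_finish_init_params */
  int fetched;      /* number of parameters requested via paddle_get_params */
};

/* Mock bodies (assumptions); the real calls would talk to parameter servers. */
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto) {
  (void)config_proto;
  return client->selected;
}

int paddle_init_param(paddle_pserver_client* client, paddle_parameter param) {
  (void)param;
  client->params_sent++;
  return 0;
}

int paddle_finish_init_params(paddle_pserver_client* client) {
  client->finished = 1;
  return 0;
}

int paddle_get_params(paddle_pserver_client* client, const char** names,
                      paddle_parameter* dst, int total) {
  (void)names;
  (void)dst;
  client->fetched += total;
  return 0;
}

/* Trainer-side flow: the selected trainer uploads its parameters and
 * signals completion; every other trainer fetches them instead.
 * Returns 0 on success and -1 on failure, in which case the caller is
 * expected to restart initialization or exit. */
int init_or_fetch_params(paddle_pserver_client* client, const char* config_proto,
                         paddle_parameter* params, const char** names, int total) {
  if (paddle_begin_init_params(client, config_proto) == 1) {
    for (int i = 0; i < total; i++) {
      if (paddle_init_param(client, params[i]) != 0) {
        return -1;
      }
    }
    return paddle_finish_init_params(client);
  }
  /* Not selected: the call above returned only after initialization
   * was done, so the parameters can now be fetched. */
  return paddle_get_params(client, names, params, total);
}
```

Only the selected trainer reaches `paddle_init_param` and `paddle_finish_init_params`; the others go straight to `paddle_get_params` once their blocking call returns.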

## C Interface

```c
#define PADDLE_ELEMENT_TYPE_INT32   0
#define PADDLE_ELEMENT_TYPE_UINT32  1
#define PADDLE_ELEMENT_TYPE_INT64   2
#define PADDLE_ELEMENT_TYPE_UINT64  3
#define PADDLE_ELEMENT_TYPE_FLOAT32 4
#define PADDLE_ELEMENT_TYPE_FLOAT64 5

typedef struct {
  char* name;
  int   element_type;
  void* content;
  int   content_len;
} paddle_parameter, paddle_gradient;

typedef struct paddle_pserver_client paddle_pserver_client;

paddle_pserver_client* paddle_new_pserver_client();
void paddle_pserver_client_release(paddle_pserver_client* client);

/**
 * @brief paddle_begin_init_params begins to initialize parameters on
 * parameter servers.
 *
 * paddle_begin_init_params will be called from multiple trainers, but
 * only one trainer will be selected to initialize the parameters on
 * parameter servers. Other trainers will be blocked until the
 * initialization is done, and they need to get the initialized
 * parameters from parameter servers using @paddle_get_params.
 *
 * @param config_proto serialized parameter server configuration in
 * Protocol Buffers format.
 * @return 1 if the trainer is selected to initialize parameter
 * servers, otherwise 0.
 */
int paddle_begin_init_params(paddle_pserver_client* client, const char* config_proto);

/**
 * @brief paddle_init_param initializes the parameter on parameter
 * servers.
 *
 * @param param the parameter to initialize.
 * @return 0 if successful, otherwise -1. On failure, the trainer
 * needs to restart the entire initialization process (starting from
 * @paddle_begin_init_params) or simply exit the program and wait for
 * the cluster management system to restart the trainer.
 */
int paddle_init_param(paddle_pserver_client* client, paddle_parameter param);

/**
 * @brief paddle_finish_init_params tells parameter servers that the
 * client has sent all parameters needed for initialization.
 *
 * @return 0 if successful, otherwise -1. On failure, the trainer
 * needs to restart the entire initialization process (starting from
 * @paddle_begin_init_params) or simply exit the program and wait for
 * the cluster management system to restart the trainer.
 */
int paddle_finish_init_params(paddle_pserver_client* client);

/**
 * @brief paddle_send_grads sends gradients to parameter servers for
 * updating parameters.
 *
 * @param grads the array of gradients to send.
 * @param total the total number of gradients in the gradient array.
 * @param learning_rate the learning rate for the gradients.
 * @return 0 if successful, otherwise -1.
 */
int paddle_send_grads(paddle_pserver_client* client, const paddle_gradient* grads, int total, double learning_rate);

/**
 * @brief paddle_set_params sets parameters on parameter servers.
 *
 * @param params the array of parameters to set on parameter servers.
 * @param total the total number of parameters inside the parameter
 * array.
 * @return 0 if successful, otherwise -1.
 */
int paddle_set_params(paddle_pserver_client* client, const paddle_parameter* params, int total);

/**
 * @brief paddle_get_params gets parameters from parameter servers.
 *
 * @param names the array of names of the parameters to get.
 * @param dst the destination array of parameters to save to.
 * @param total the total number of parameters to get.
 * @return 0 if successful, otherwise -1.
 */
int paddle_get_params(paddle_pserver_client* client, const char** names, paddle_parameter* dst, int total);

/**
 * @brief paddle_save_model instructs parameter servers to save the
 * parameters to the given path.
 *
 * @param path the path to save parameters.
 * @return 0 if successful, otherwise -1.
 */
int paddle_save_model(paddle_pserver_client* client, const char* path);
```
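
As a rough illustration of how a trainer might fill in these structs, the sketch below wraps a float buffer as a `paddle_gradient`. It assumes that `content_len` counts bytes, which this document does not state. `element_size`, `make_float32_gradient`, and `element_count` are hypothetical helpers, not part of the proposed API.

```c
#include <stddef.h>

#define PADDLE_ELEMENT_TYPE_INT32   0
#define PADDLE_ELEMENT_TYPE_UINT32  1
#define PADDLE_ELEMENT_TYPE_INT64   2
#define PADDLE_ELEMENT_TYPE_UINT64  3
#define PADDLE_ELEMENT_TYPE_FLOAT32 4
#define PADDLE_ELEMENT_TYPE_FLOAT64 5

typedef struct {
  char* name;
  int   element_type;
  void* content;
  int   content_len;
} paddle_parameter, paddle_gradient;

/* The float32 helper below relies on float being 32 bits wide. */
_Static_assert(sizeof(float) == 4, "float must be 32 bits");

/* Hypothetical helper: bytes per element for a type tag, 0 if unknown. */
static size_t element_size(int element_type) {
  switch (element_type) {
    case PADDLE_ELEMENT_TYPE_INT32:
    case PADDLE_ELEMENT_TYPE_UINT32:
    case PADDLE_ELEMENT_TYPE_FLOAT32:
      return 4;
    case PADDLE_ELEMENT_TYPE_INT64:
    case PADDLE_ELEMENT_TYPE_UINT64:
    case PADDLE_ELEMENT_TYPE_FLOAT64:
      return 8;
    default:
      return 0;
  }
}

/* Hypothetical helper: wraps an existing float buffer without copying it.
 * Assumption: content_len is the buffer size in bytes. */
static paddle_gradient make_float32_gradient(char* name, float* data, int count) {
  paddle_gradient g;
  g.name = name;
  g.element_type = PADDLE_ELEMENT_TYPE_FLOAT32;
  g.content = data;
  g.content_len = count * (int)element_size(PADDLE_ELEMENT_TYPE_FLOAT32);
  return g;
}

/* Hypothetical helper: number of elements in the buffer, or -1 if the
 * type tag is unknown or the length is not a multiple of the element size. */
static int element_count(const paddle_gradient* g) {
  size_t size = element_size(g->element_type);
  if (size == 0 || g->content_len < 0 || (size_t)g->content_len % size != 0) {
    return -1;
  }
  return (int)((size_t)g->content_len / size);
}
```

An array of gradients built this way could then be passed to `paddle_send_grads` together with its length and a learning rate.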