From 191a3268da8c22bab5bbdeced5e684fefcf7c8d3 Mon Sep 17 00:00:00 2001
From: Helin Wang
Date: Mon, 8 May 2017 17:42:10 -0700
Subject: [PATCH] fix according to comments

---
 doc/design/cluster_train/data_dispatch.md  | 4 ++--
 doc/design/cluster_train/master_process.md | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/design/cluster_train/data_dispatch.md b/doc/design/cluster_train/data_dispatch.md
index f60c3b843d..241902cca4 100644
--- a/doc/design/cluster_train/data_dispatch.md
+++ b/doc/design/cluster_train/data_dispatch.md
@@ -23,8 +23,8 @@
 
 在数据集可以被训练之前,文件需要预先被转换成PaddlePaddle集群内部的存储格式(RecordIO)。我们提供两个转换方式:
 
-- 提供给用户本地转换的库,用户可以编写程序完成转换。
-- 用户可以上传自己的数据集,在集群运行MapReduce job完成转换。
+1. 用户在本地转换好再上传
+1. 用户上传数据后,在机群上运行转换程序
 
 转换生成的文件名会是以下格式:
 
diff --git a/doc/design/cluster_train/master_process.md b/doc/design/cluster_train/master_process.md
index 949811b4f7..e0be8df634 100644
--- a/doc/design/cluster_train/master_process.md
+++ b/doc/design/cluster_train/master_process.md
@@ -1,12 +1,12 @@
 # Design Doc: Master Process
 
-For an overview of master process' role, please refer to [distributed training design doc](./README.md). In this design doc we will discuss the master process in more details. The master will be implemented in [golang](https://golang.org/).
+For an overview of master process' role, please refer to [distributed training design doc](./README.md). In this design doc we will discuss the master process in more details. The master will be implemented in [Go](https://golang.org/).
 
 ## Dataset
 
-A dataset is represented by a list of files in *RecordIO* format on the distributed filesystem, each RecordIO file consists of multiple *blocks*, and each block has multiple data instances.
+A dataset is a list of files in *RecordIO* format. A RecordIO file consists of chunks, whereas each chunk consists some records.
 
 ## Task Queue
 
@@ -14,7 +14,7 @@ As mentioned in [distributed training design doc](./README.md), a *task* is a da
 
 ### Task Queue Creation
 
-1. Each trainer will make an RPC call (using [golang rpc](https://golang.org/pkg/net/rpc/)) to the master process, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master process the same dataset, only the first RPC call will be honored.
+1. Each trainer will make an RPC call (using Go's [rpc](https://golang.org/pkg/net/rpc/) package) to the master process, telling it the RecordIO files representing the dataset specified by the user. Since every trainer will tell the master process the same dataset, only the first RPC call will be honored.
 The RPC interface is:
 
 ```go
--
GitLab