Commit 9b914cb5 authored by Yancey1989

update by comment

Parent 29153a7b
# Distributed Image Classification Models Training
This folder contains implementations of **Image Classification Models**, which are designed to support
large-scale distributed training in two distributed modes: parameter server mode and NCCL2 (Nvidia NCCL2 communication library) collective mode.
## Getting Started
Before getting started, please make sure you have gone through the imagenet [Data Preparation](../README.md#data-preparation).
1. The entrypoint file is `dist_train.py`, some important flags are as follows:
- `model`, the model to run, such as `ResNet50`, `ResNet101`, etc.
- `batch_size`, the batch_size per device.
- `update_method`, specify the update method; choose from `local`, `pserver`, or `nccl2`.
- `device`, use CPU or GPU device.
- `gpus`, the number of GPU devices used by the process.
@@ -29,7 +29,7 @@ Before getting started, please make sure you have finished the imagenet [Data Pr
- `PADDLE_PSERVER_PORT`, the port that the parameter server listens on.
- `PADDLE_TRAINER_IPS`, the trainer IP list, separated by ",", only used when `update_method` is `nccl2` (see the sketch below).
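As a rough illustration of how the flags and environment variables above fit together, here is a minimal, hypothetical launch sketch (not taken from this repository); the port, IP addresses, and flag values are placeholders, not recommended settings.

```python
# Hypothetical launch sketch: builds a dist_train.py command from the flags
# documented above. All values below are placeholders.
import os
import subprocess

env = dict(os.environ)
env["PADDLE_PSERVER_PORT"] = "6174"                     # port the parameter server listens on
env["PADDLE_TRAINER_IPS"] = "192.168.0.1,192.168.0.2"   # only used when update_method is nccl2

cmd = [
    "python", "dist_train.py",
    "--model", "ResNet50",       # model to run
    "--batch_size", "32",        # batch size per device
    "--update_method", "local",  # local, pserver, or nccl2
    "--device", "GPU",
    "--gpus", "1",               # number of GPU devices used by the process
]
subprocess.check_call(cmd, env=env)
```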
### Parameter Server Mode
In this example, we launch 4 parameter server instances and 4 trainer instances in the cluster:
@@ -66,7 +66,7 @@ In this example, we launched 4 parameter server instances and 4 trainer instance
```
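The concrete launch commands are given in the block above. Purely as a hedged illustration of the 4-pserver / 4-trainer topology, the sketch below builds the endpoint list that such a cluster implies; the host names and port value are assumptions, not part of this repository.

```python
# Hypothetical topology sketch for the parameter server mode example:
# 4 pserver instances and 4 trainer instances. Host names are placeholders.
PSERVER_PORT = "6174"  # corresponds to PADDLE_PSERVER_PORT
pserver_hosts = ["ps0", "ps1", "ps2", "ps3"]
trainer_hosts = ["tr0", "tr1", "tr2", "tr3"]

# Endpoint list that every instance in the cluster needs to know about.
pserver_endpoints = ",".join("%s:%s" % (h, PSERVER_PORT) for h in pserver_hosts)
print("pserver endpoints:", pserver_endpoints)
print("trainer count:", len(trainer_hosts))
```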
### NCCL2 Collective Mode
1. Launch the trainer process
@@ -83,9 +83,9 @@ In this example, we launched 4 parameter server instances and 4 trainer instance
--data_dir=../data/ILSVRC2012
```
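Since `PADDLE_TRAINER_IPS` is a comma-separated list, each trainer can derive the full peer list from it. The following is a minimal sketch of that step; the port value and the printed format are assumptions, not part of `dist_train.py`.

```python
# Hypothetical sketch: derive the NCCL2 peer list from PADDLE_TRAINER_IPS.
import os

trainer_ips = os.environ["PADDLE_TRAINER_IPS"].split(",")  # e.g. "ip1,ip2,ip3,ip4"
trainer_port = "6170"                                      # assumed port, not from the docs
endpoints = ["%s:%s" % (ip, trainer_port) for ip in trainer_ips]

print("trainer count:", len(trainer_ips))
print("endpoints:", ",".join(endpoints))
```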
### Visualize the Training Process
It's easy to draw the learning curve according to the training logs; for example,
the logs of ResNet50 are as follows:
``` text
......
```
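Below is a minimal plotting sketch. It assumes each log line contains a pass index and a training accuracy in a `pass=<n> ... train_acc=<x>` style; adjust the regular expression and the log file name (`train.log` here is a placeholder) to match the actual logs printed by `dist_train.py`.

```python
# Hypothetical sketch: parse the training log and draw the learning curve.
# The assumed log line format is "pass=<int> ... train_acc=<float>";
# change the pattern to match the real log format shown above.
import re
import matplotlib.pyplot as plt

passes, accs = [], []
pattern = re.compile(r"pass=(\d+).*?train_acc=([0-9.]+)")
with open("train.log") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            passes.append(int(m.group(1)))
            accs.append(float(m.group(2)))

plt.plot(passes, accs)
plt.xlabel("pass")
plt.ylabel("train accuracy")
plt.title("ResNet50 learning curve")
plt.savefig("learning_curve.png")
```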
@@ -22,7 +22,7 @@ BENCHMARK_MODELS = [
def parse_args():
parser = argparse.ArgumentParser('Distributed Image Classification Training.')
parser.add_argument(
'--model',
type=str,
@@ -74,8 +74,6 @@ def parse_args():
default='flowers',
choices=['cifar10', 'flowers', 'imagenet'],
help='Optional dataset for benchmark.')
parser.add_argument(
'--infer_only', action='store_true', help='If set, run forward only.')
parser.add_argument(
'--no_test',
action='store_true',
@@ -84,10 +82,6 @@ def parse_args():
'--memory_optimize',
action='store_true',
help='If set, optimize runtime memory before start.')
parser.add_argument(
'--use_fake_data',
action='store_true',
help='If set, omit the actual data-reading operators.')
parser.add_argument(
'--update_method',
type=str,
@@ -104,19 +98,10 @@ def parse_args():
action='store_true',
default=False,
help='Whether to start the pserver in async mode to support ASGD.')
parser.add_argument(
'--use_reader_op',
action='store_true',
help='Whether to use the reader op; the data path must be specified when this is set.'
)
parser.add_argument(
'--no_random',
action='store_true',
help='If set, keep the random seed and do not shuffle the data.')
parser.add_argument(
'--use_lars',
action='store_true',
help='If set, use LARS for the optimizer; ONLY the ResNet model is supported.')
parser.add_argument(
'--reduce_strategy',
type=str,
......