diff --git a/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
index 77ac2ee61a5bc09984ec0ed2ace5fbf9865654ad..62228dbac2b726fadd4394bee87d4902d18b25a9 100644
--- a/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
+++ b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
@@ -96,7 +96,7 @@ There are two modes in term of memory management in `PaddleBuf` :

In the two modes, the first is more convenient while the second strictly controls memory management to facilitate integration with `tcmalloc` and other libraries.

-### Upgrade performance based on contrib::AnalysisConfig (Prerelease)
+### Upgrade performance based on contrib::AnalysisConfig
AnalysisConfig is at the stage of pre-release and protected by `namespace contrib`, which may be adjusted in the future.

@@ -106,9 +106,11 @@ The usage of `AnalysisConfig` is similiar with that of `NativeConfig` but the fo

```c++
AnalysisConfig config;
-config.model_dir = xxx;
-config.use_gpu = false; // GPU optimization is not supported at present
-config.specify_input_name = true; // it needs to set name of input
+config.SetModel(dirname); // set the directory of the model
+config.EnableUseGpu(100, 0 /*gpu id*/); // use GPU with an initial 100 MB GPU memory pool on device 0, or
+config.DisableGpu(); // use CPU
+config.SwitchSpecifyInputNames(true); // the name of each input needs to be specified
+config.SwitchIrOptim(); // turn on the optimization switch; a sequence of optimizations will be applied during inference
```

Note that input PaddleTensor needs to be allocated. Previous examples need to be revised as follows:

@@ -147,7 +149,7 @@ For more specific examples, please refer to[LoD-Tensor Instructions](../../../us

1. If the CPU type permits, it's best to use the versions with support for AVX and MKL.
2. Reuse input and output `PaddleTensor` to avoid frequent memory allocation resulting in low performance
-3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU inference
+3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU or GPU inference

## Code Demo

diff --git a/doc/fluid/beginners_guide/install/install_Windows_en.md b/doc/fluid/beginners_guide/install/install_Windows_en.md
index 4002f824f995603b18d15dd3b0e195e555ed7a59..2f60fff40466869665f4ecc5389fd3f46a1c42f1 100644
--- a/doc/fluid/beginners_guide/install/install_Windows_en.md
+++ b/doc/fluid/beginners_guide/install/install_Windows_en.md
@@ -9,9 +9,8 @@ This instruction will show you how to install PaddlePaddle on Windows. The foll

**Note** :

-* The current version does not support NCCL, distributed training, AVX, warpctc and MKL related functions.
+* The current version does not support NCCL or distributed training related functions.

-* Currently, only PaddlePaddle for CPU is supported on Windows.

@@ -30,14 +29,20 @@ Version of pip or pip3 should be equal to or above 9.0.1 .

* Install PaddlePaddle

+* ***CPU version of PaddlePaddle***:
Execute `pip install paddlepaddle` or `pip3 install paddlepaddle` to download and install PaddlePaddle.

-
+* ***GPU version of PaddlePaddle***:
+Execute `pip install paddlepaddle-gpu` (python2.7) or `pip3 install paddlepaddle-gpu` (python3.x) to download and install PaddlePaddle.
+
## ***Verify installation***

After completing the installation, you can use `python` or `python3` to enter the python interpreter and then use `import paddle.fluid` to verify that the installation was successful.
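For a quick check from a terminal, a minimal sketch (assuming `python`/`python3` are on the `PATH`; the printed message is arbitrary) is:

```bash
# Import the fluid package from the command line; a clean exit means the install works.
python -c "import paddle.fluid; print('PaddlePaddle Fluid imported successfully')"
# Or, for a Python 3 installation:
python3 -c "import paddle.fluid; print('PaddlePaddle Fluid imported successfully')"
```

If the import raises no error, the installation is usable.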
## ***How to uninstall***

+* ***CPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle : `pip uninstall paddlepaddle` or `pip3 uninstall paddlepaddle`

+* ***GPU version of PaddlePaddle***:
+Use the following command to uninstall PaddlePaddle : `pip uninstall paddlepaddle-gpu` or `pip3 uninstall paddlepaddle-gpu`

diff --git a/doc/fluid/user_guides/howto/training/cluster_howto_en.rst b/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
index 5610ee3cdab7d9102673a80e35a75ab3906f5615..2d329b6ca96e0dd8643f2cf882c7ea3eac10aab4 100644
--- a/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
+++ b/doc/fluid/user_guides/howto/training/cluster_howto_en.rst
@@ -205,6 +205,25 @@ For example:

Currently, distributed training using NCCL2 only supports synchronous training. The distributed training using NCCL2 mode is more suitable for the model which is relatively large and needs \
synchronous training and GPU training. If the hardware device supports RDMA and GPU Direct, this can achieve high distributed training performance.

+Start NCCL2 Distributed Training in Multi-Process Mode
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Usually you can get better multi-GPU training performance by starting the NCCL2 distributed training job in multi-process mode. Paddle provides the :code:`paddle.distributed.launch` module to start such a job, after which each training process uses an independent GPU device.
+
+Notes on usage:
+
+* Number of nodes: set the number of nodes of the job through the environment variable :code:`PADDLE_NUM_TRAINERS`; this variable is also set in every training process.
+* Number of devices per node: set the number of GPU devices on each node with the :code:`--gpus` argument; the rank of each process is set in the environment variable :code:`PADDLE_TRAINER_ID` automatically.
+* Data sharding: multi-process mode means one process per device. Generally, each process reads its own shard of the training data, so that all processes together cover the whole data set.
+* Entry file: the entry file is the training script that is actually launched.
+* Logs: the log of each training process is saved in the :code:`./mylog` directory by default; you can change the directory with the :code:`--log_dir` argument.
+
+Startup example (replace the angle-bracket placeholders with your node and GPU counts):
+
+.. code-block:: bash
+
+   > PADDLE_NUM_TRAINERS=<num_nodes> python -m paddle.distributed.launch train.py --gpus <num_gpus_per_node> --arg1 --arg2 ...
+
Important Notes on NCCL2 Distributed Training
++++++++++++++++++++++++++++++++++++++++++++++

@@ -215,7 +234,7 @@ exit at the final iteration.

There are two common ways:

- Each node only trains fixed number of batches per pass, which is controlled by python codes. If a node has more data than this fixed amount, then these marginal data will not be trained.

-**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2.
+**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2. 
Assuming you need to use :code:`eth2` as the communication device, you need to set the following environment variables:

diff --git a/doc/fluid/user_guides/index_cn.rst b/doc/fluid/user_guides/index_cn.rst
index c64a97f166866009d506503186ac40524fe6b189..9696955c255c8bbea5123634e5773e24a1a2116f 100644
--- a/doc/fluid/user_guides/index_cn.rst
+++ b/doc/fluid/user_guides/index_cn.rst
@@ -15,7 +15,7 @@

- `训练神经网络 <../user_guides/howto/training/index_cn.html>`_:介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量

- - `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.md>`_:介绍在 Fluid 下使用DyGraph
+ - `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.html>`_:介绍在 Fluid 下使用DyGraph

- `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_:介绍在 Fluid 下进行模型评估和调试的方法,包括:
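For the `eth2` note in the cluster_howto_en.rst hunk above, the network interface is normally selected through NCCL's own environment variable; a minimal sketch, assuming the standard `NCCL_SOCKET_IFNAME` variable is what that sentence refers to:

```bash
# Tell NCCL2 which network device to use for communication (eth2, per the note above).
# NCCL_SOCKET_IFNAME is NCCL's documented interface-selection variable; adjust to your device name.
export NCCL_SOCKET_IFNAME=eth2
```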