Unverified commit e14a084c, authored by Cheerego, committed by GitHub

add translation and fix deadlink (#814)

* hotfix deadlink (#811)

* Update native_infer_en.md (#787)

* Update install_Windows_en.md (#790)

* Update install_Windows_en.md

* Update install_Windows_en.md

* Update cluster_howto_en.rst (#791)

* Update cluster_howto_en.rst

* Update cluster_howto_en.rst

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update cluster_howto_en.rst

* Update index_cn.rst (#813)
Parent e9fa88b4
@@ -96,7 +96,7 @@ There are two modes in terms of memory management in `PaddleBuf` :
In the two modes, the first is more convenient while the second strictly controls memory management to facilitate integration with `tcmalloc` and other libraries.
### Upgrade performance based on contrib::AnalysisConfig
AnalysisConfig is at the stage of pre-release and protected by `namespace contrib`, which may be adjusted in the future.
@@ -106,9 +106,11 @@ The usage of `AnalysisConfig` is similar to that of `NativeConfig` but the fo
```c++
AnalysisConfig config;
config.SetModel(dirname);  // set the directory of the model
config.EnableUseGpu(100, 0 /*gpu id*/);  // use GPU; or
config.DisableGpu();  // use CPU
config.SwitchSpecifyInputNames(true);  // you need to specify the names of your inputs
config.SwitchIrOptim();  // turn on IR optimization; a sequence of graph optimizations will be executed during inference
```
Note that input PaddleTensor needs to be allocated. Previous examples need to be revised as follows:
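The revised example itself is truncated in this diff. As a rough sketch of what the allocated-input version might look like with the pre-release C++ API (the input name `"image"`, the shape, and the header path are illustrative assumptions, not taken from the original demo):

```c++
#include <vector>
#include "paddle_inference_api.h"  // assumed header for the C++ inference API

// Create the predictor from the AnalysisConfig configured above.
auto predictor = paddle::CreatePaddlePredictor(config);

// Explicitly allocate and fill the input tensor.
std::vector<float> input_data(1 * 3 * 224 * 224, 0.f);  // hypothetical input size
paddle::PaddleTensor tensor;
tensor.name = "image";                                   // hypothetical input name
tensor.shape = {1, 3, 224, 224};
tensor.data =
    paddle::PaddleBuf(input_data.data(), input_data.size() * sizeof(float));
tensor.dtype = paddle::PaddleDType::FLOAT32;

// Run inference; the predictor fills the output tensors.
std::vector<paddle::PaddleTensor> inputs{tensor};
std::vector<paddle::PaddleTensor> outputs;
predictor->Run(inputs, &outputs, /*batch_size=*/1);
```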
@@ -147,7 +149,7 @@ For more specific examples, please refer to [LoD-Tensor Instructions](../../../us
1. If the CPU type permits, it's best to use the versions with support for AVX and MKL.
2. Reuse input and output `PaddleTensor` to avoid frequent memory allocation, which hurts performance (see the sketch after this list).
3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU or GPU inference.
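For tip 2, a minimal sketch of reusing the same input/output `PaddleTensor` objects across repeated `Run()` calls so the underlying `PaddleBuf` is allocated only once (the loop structure, sizes, and `FillInputBuffer` helper are illustrative assumptions):

```c++
// Allocate the input buffer once and reuse it for every batch.
std::vector<paddle::PaddleTensor> inputs(1);
std::vector<paddle::PaddleTensor> outputs;

inputs[0].name = "image";                        // hypothetical input name
inputs[0].shape = {1, 3, 224, 224};              // hypothetical shape
inputs[0].dtype = paddle::PaddleDType::FLOAT32;
inputs[0].data.Resize(1 * 3 * 224 * 224 * sizeof(float));  // owned buffer, allocated once

for (int i = 0; i < num_batches; ++i) {
  // Overwrite the existing buffer in place instead of creating a new PaddleBuf.
  FillInputBuffer(static_cast<float*>(inputs[0].data.data()));  // hypothetical helper
  predictor->Run(inputs, &outputs, /*batch_size=*/1);
  // ... consume outputs ...
}
```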
## Code Demo
......
@@ -9,9 +9,8 @@ This instruction will show you how to install PaddlePaddle on Windows. The foll
**Note** :
* The current version does not support NCCL and distributed training related functions.
@@ -30,14 +29,20 @@ Version of pip or pip3 should be equal to or above 9.0.1.
* Install PaddlePaddle
* ***CPU version of PaddlePaddle***:
Execute `pip install paddlepaddle` or `pip3 install paddlepaddle` to download and install PaddlePaddle.
* ***GPU version of PaddlePaddle***:
Execute `pip install paddlepaddle-gpu` (python2.7) or `pip3 install paddlepaddle-gpu` (python3.x) to download and install PaddlePaddle.
## ***Verify installation***
After completing the installation, you can use `python` or `python3` to enter the python interpreter and then use `import paddle.fluid` to verify that the installation was successful.
## ***How to uninstall***
* ***CPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle` or `pip3 uninstall paddlepaddle`
* ***GPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle-gpu` or `pip3 uninstall paddlepaddle-gpu`
@@ -205,6 +205,25 @@ For example:
Currently, distributed training using NCCL2 only supports synchronous training. Distributed training in NCCL2 mode is more suitable for models that are relatively large and need \
synchronous training and GPU training. If the hardware devices support RDMA and GPU Direct, this can achieve high distributed training performance.
Start Up NCCL2 Distributed Training in Multi-Process Mode
++++++++++++++++++++++++++++++++++++++++++++++
Usually you can get better multi-node training performance by starting the NCCL2 distributed training job in multi-process mode. Paddle provides the :code:`paddle.distributed.launch` module to start a multi-process job, after which each training process uses an independent GPU device.
Notes on usage:
* Set the number of nodes: set the number of nodes of a job through the environment variable :code:`PADDLE_NUM_TRAINERS`; this variable is also set in every training process.
* Set the number of devices on each node: the parameter :code:`--gpus` sets the number of GPU devices on each node, and the rank of each process is set in the environment variable :code:`PADDLE_TRAINER_ID` automatically.
* Data sharding: multi-process mode means one process per device. Generally, each process should handle one shard of the training data so that all processes together cover the whole data set.
* Entry file: the entry file is the training script that is actually launched.
* Logs: the logs of each training process are saved in the :code:`./mylog` directory by default; you can change this with the parameter :code:`--log_dir`.
Startup example:

.. code-block:: bash

   > PADDLE_NUM_TRAINERS=<TRAINER_COUNT> python -m paddle.distributed.launch train.py --gpus <NUM_GPUS_ON_HOSTS> <ENTRYPOINT_SCRIPT> --arg1 --arg2 ...
Important Notes on NCCL2 Distributed Training
++++++++++++++++++++++++++++++++++++++++++++++
@@ -215,7 +234,7 @@ exit at the final iteration. There are two common ways:
- Each node only trains a fixed number of batches per pass, which is controlled by Python code. If a node has more data than this fixed amount, then the
  extra data will not be trained.
**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2.
Assuming you need to use :code:`eth2` as the communication device, you need to set the following environment variables:
......
@@ -15,7 +15,7 @@
- `Train Neural Networks <../user_guides/howto/training/index_cn.html>`_: introduces how to use Fluid for single-node training, multi-node training, and saving and loading model variables
- `DyGraph Mode <../user_guides/howto/dygraph/DyGraph.html>`_: introduces how to use DyGraph with Fluid
- `Model Evaluation and Debugging <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_: introduces how to evaluate and debug models with Fluid, including:
......