Unverified commit e14a084c, authored by Cheerego, committed by GitHub

add translation and fix deadlink (#814)

* hotfix deadlink (#811)

* Update native_infer_en.md (#787)

* Update install_Windows_en.md (#790)

* Update install_Windows_en.md

* Update install_Windows_en.md

* Update cluster_howto_en.rst (#791)

* Update cluster_howto_en.rst

* Update cluster_howto_en.rst

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update cluster_howto_en.rst

* Update index_cn.rst (#813)
Parent e9fa88b4
@@ -96,7 +96,7 @@ There are two modes in terms of memory management in `PaddleBuf` :
In the two modes, the first is more convenient while the second strictly controls memory management to facilitate integration with `tcmalloc` and other libraries.
### Upgrade performance based on contrib::AnalysisConfig
AnalysisConfig is at the stage of pre-release and protected by `namespace contrib`, which may be adjusted in the future.
@@ -106,9 +106,11 @@ The usage of `AnalysisConfig` is similar to that of `NativeConfig` but the fo
```c++
AnalysisConfig config;
config.SetModel(dirname);  // set the directory of the model
config.EnableUseGpu(100, 0 /*gpu id*/);  // use GPU; or
config.DisableGpu();  // use CPU
config.SwitchSpecifyInputNames(true);  // you need to specify the names of your inputs
config.SwitchIrOptim();  // turn on IR optimization; a sequence of graph optimizations will be executed during inference
```
Note that input PaddleTensor needs to be allocated. Previous examples need to be revised as follows:
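The revised example itself is truncated in this diff. As a rough sketch of what the allocated-input version might look like with the pre-release C++ API (the input name `"image"`, the shape, and the header path are illustrative assumptions, not taken from the original demo):

```c++
#include <vector>
#include "paddle_inference_api.h"  // assumed header for the C++ inference API

// Create the predictor from the AnalysisConfig configured above.
auto predictor = paddle::CreatePaddlePredictor(config);

// Explicitly allocate and fill the input tensor.
std::vector<float> input_data(1 * 3 * 224 * 224, 0.f);  // hypothetical input size
paddle::PaddleTensor tensor;
tensor.name = "image";                                   // hypothetical input name
tensor.shape = {1, 3, 224, 224};
tensor.data =
    paddle::PaddleBuf(input_data.data(), input_data.size() * sizeof(float));
tensor.dtype = paddle::PaddleDType::FLOAT32;

// Run inference; the predictor fills the output tensors.
std::vector<paddle::PaddleTensor> inputs{tensor};
std::vector<paddle::PaddleTensor> outputs;
predictor->Run(inputs, &outputs, /*batch_size=*/1);
```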
@@ -147,7 +149,7 @@ For more specific examples, please refer to [LoD-Tensor Instructions](../../../us
1. If the CPU type permits, it's best to use the versions with support for AVX and MKL.
2. Reuse input and output `PaddleTensor` to avoid frequent memory allocation, which hurts performance (see the sketch after this list).
3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU or GPU inference.
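For tip 2, a minimal sketch of reusing the same input/output `PaddleTensor` objects across repeated `Run()` calls so the underlying `PaddleBuf` is allocated only once (the loop structure, sizes, and `FillInputBuffer` helper are illustrative assumptions):

```c++
// Allocate the input buffer once and reuse it for every batch.
std::vector<paddle::PaddleTensor> inputs(1);
std::vector<paddle::PaddleTensor> outputs;

inputs[0].name = "image";                        // hypothetical input name
inputs[0].shape = {1, 3, 224, 224};              // hypothetical shape
inputs[0].dtype = paddle::PaddleDType::FLOAT32;
inputs[0].data.Resize(1 * 3 * 224 * 224 * sizeof(float));  // owned buffer, allocated once

for (int i = 0; i < num_batches; ++i) {
  // Overwrite the existing buffer in place instead of creating a new PaddleBuf.
  FillInputBuffer(static_cast<float*>(inputs[0].data.data()));  // hypothetical helper
  predictor->Run(inputs, &outputs, /*batch_size=*/1);
  // ... consume outputs ...
}
```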
## Code Demo
......
@@ -9,9 +9,8 @@ This instruction will show you how to install PaddlePaddle on Windows. The foll
**Note** :
* The current version does not support NCCL and distributed training related functions.
@@ -30,14 +29,20 @@ Version of pip or pip3 should be equal to or above 9.0.1.
* Install PaddlePaddle
* ***CPU version of PaddlePaddle***:
Execute `pip install paddlepaddle` or `pip3 install paddlepaddle` to download and install PaddlePaddle.
* ***GPU version of PaddlePaddle***:
Execute `pip install paddlepaddle-gpu` (python2.7) or `pip3 install paddlepaddle-gpu` (python3.x) to download and install PaddlePaddle.
## ***Verify installation***
After completing the installation, you can use `python` or `python3` to enter the python interpreter and then use `import paddle.fluid` to verify that the installation was successful.
## ***How to uninstall***
* ***CPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle` or `pip3 uninstall paddlepaddle`
* ***GPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle-gpu` or `pip3 uninstall paddlepaddle-gpu`
@@ -205,6 +205,25 @@ For example:
Currently, distributed training using NCCL2 only supports synchronous training. Distributed training in NCCL2 mode is more suitable for models that are relatively large and need \
synchronous training and GPU training. If the hardware devices support RDMA and GPU Direct, this can achieve high distributed training performance.
Start Up NCCL2 Distributed Training in Multi-Process Mode
++++++++++++++++++++++++++++++++++++++++++++++
Usually you can get better multi-node training performance by starting the NCCL2 distributed training job in multi-process mode. Paddle provides the :code:`paddle.distributed.launch` module to start a multi-process job, after which each training process uses an independent GPU device.
Notes on usage:
* Set the number of nodes: set the number of nodes of a job through the environment variable :code:`PADDLE_NUM_TRAINERS`; this variable is also set in every training process.
* Set the number of devices on each node: the parameter :code:`--gpus` sets the number of GPU devices on each node, and the rank of each process is set in the environment variable :code:`PADDLE_TRAINER_ID` automatically.
* Data sharding: multi-process mode means one process per device. Generally, each process should handle one shard of the training data so that all processes together cover the whole data set.
* Entry file: the entry file is the training script that is actually launched.
* Logs: the logs of each training process are saved in the :code:`./mylog` directory by default; you can change this with the parameter :code:`--log_dir`.
Startup example:

.. code-block:: bash

   > PADDLE_NUM_TRAINERS=<TRAINER_COUNT> python -m paddle.distributed.launch train.py --gpus <NUM_GPUS_ON_HOSTS> <ENTRYPOINT_SCRIPT> --arg1 --arg2 ...
Important Notes on NCCL2 Distributed Training
++++++++++++++++++++++++++++++++++++++++++++++
@@ -215,7 +234,7 @@ exit at the final iteration. There are two common ways:
- Each node only trains a fixed number of batches per pass, which is controlled by Python code. If a node has more data than this fixed amount, then the
  extra data will not be trained.
**Note** : If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2.
Assuming you need to use :code:`eth2` as the communication device, you need to set the following environment variables:
......
@@ -15,7 +15,7 @@
- `Train Neural Networks <../user_guides/howto/training/index_cn.html>`_: introduces how to use Fluid for single-node training, multi-node training, and saving and loading model variables
- `DyGraph Mode <../user_guides/howto/dygraph/DyGraph.html>`_: introduces how to use DyGraph with Fluid
- `Model Evaluation and Debugging <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_: introduces how to evaluate and debug models with Fluid, including:
......