Unverified commit e14a084c, authored by Cheerego and committed by GitHub

add translation and fix deadlink (#814)

* hotfix deadlink (#811)

* Update native_infer_en.md (#787)

* Update install_Windows_en.md (#790)

* Update install_Windows_en.md

* Update install_Windows_en.md

* Update cluster_howto_en.rst (#791)

* Update cluster_howto_en.rst

* Update cluster_howto_en.rst

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update doc/fluid/user_guides/howto/training/cluster_howto_en.rst
Co-Authored-By: acosta123 <42226556+acosta123@users.noreply.github.com>

* Update cluster_howto_en.rst

* Update index_cn.rst (#813)
Parent e9fa88b4
......@@ -96,7 +96,7 @@ There are two modes in terms of memory management in `PaddleBuf`:
Of the two modes, the first is more convenient, while the second strictly controls memory management to facilitate integration with `tcmalloc` and other libraries.
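As a rough illustration of the two modes, here is a minimal, hedged C++ sketch. It assumes only what is described above: a `PaddleBuf` constructor taking a caller-owned pointer plus a length, and a size-only constructor that lets `PaddleBuf` allocate and manage the memory itself. The header path, buffer sizes, and variable names are illustrative, not taken from this document.

```c++
#include <vector>

#include "paddle_inference_api.h"  // assumed header exposing paddle::PaddleBuf

void PaddleBufModesSketch() {
  // Mode 1 (more convenient): PaddleBuf allocates and manages the memory itself.
  paddle::PaddleBuf owned_buf(1024 * sizeof(float));
  owned_buf.Resize(2048 * sizeof(float));  // can be resized later if needed

  // Mode 2 (strict control): wrap memory allocated by the caller, e.g. via
  // tcmalloc or another allocator; PaddleBuf does not free this memory.
  std::vector<float> user_data(1024);
  paddle::PaddleBuf external_buf(user_data.data(),
                                 user_data.size() * sizeof(float));
}
```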
### Improve performance with contrib::AnalysisConfig (Prerelease)
### Improve performance with contrib::AnalysisConfig
AnalysisConfig is at the pre-release stage and protected by `namespace contrib`, which may be adjusted in the future.
......@@ -106,9 +106,11 @@ The usage of `AnalysisConfig` is similar to that of `NativeConfig` but the fo
```c++
AnalysisConfig config;
config.model_dir = xxx;
config.use_gpu = false; // GPU optimization is not supported at present
config.specify_input_name = true; // the input name needs to be set
config.SetModel(dirname); // set the directory of the model
config.EnableUseGpu(100, 0 /*gpu id*/); // use GPU, or
config.DisableGpu(); // use CPU
config.SwitchSpecifyInputNames(true); // the name of each input needs to be specified
config.SwitchIrOptim(); // turn on the IR optimization switch; a sequence of optimization passes will be executed during inference
```
Note that the input `PaddleTensor` needs to be allocated. The previous examples need to be revised as follows:
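The revised example itself is collapsed in this diff view. Purely as a hedged sketch of what pre-allocating an input `PaddleTensor` and running a predictor built from the `AnalysisConfig` above might look like — the tensor name, shape, dtype, and data are placeholders, and `CreatePaddlePredictor` / `Run` are assumed to behave as in the native API shown elsewhere in this guide:

```c++
#include <vector>

#include "paddle_inference_api.h"  // assumed header exposing the C++ inference API

void RunSketch(const paddle::contrib::AnalysisConfig& config) {
  auto predictor =
      paddle::CreatePaddlePredictor<paddle::contrib::AnalysisConfig>(config);

  // Pre-allocate the input tensor: name, shape, dtype and data buffer.
  std::vector<float> input(1 * 3 * 224 * 224, 0.f);  // placeholder input data
  paddle::PaddleTensor tensor;
  tensor.name = "image";            // placeholder, must match a model input
  tensor.shape = {1, 3, 224, 224};  // placeholder shape
  tensor.dtype = paddle::PaddleDType::FLOAT32;
  tensor.data.Reset(input.data(), input.size() * sizeof(float));

  std::vector<paddle::PaddleTensor> inputs{tensor};
  std::vector<paddle::PaddleTensor> outputs;
  predictor->Run(inputs, &outputs, /*batch_size=*/1);
}
```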
......@@ -147,7 +149,7 @@ For more specific examples, please refer to [LoD-Tensor Instructions](../../../us
1. If the CPU type permits, it's best to use the versions with support for AVX and MKL.
2. Reuse the input and output `PaddleTensor` objects to avoid frequent memory allocations that hurt performance
3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU inference
3. Try to replace `NativeConfig` with `AnalysisConfig` to perform optimization for CPU or GPU inference (a short sketch combining suggestions 2 and 3 follows this list)
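As a hedged sketch of suggestions 2 and 3 combined — reusing the same input/output `PaddleTensor` vectors across repeated `Run` calls with an `AnalysisConfig`-based predictor — the loop count, shapes, and names below are illustrative only, not part of the original examples:

```c++
#include <vector>

#include "paddle_inference_api.h"  // assumed header exposing the C++ inference API

void ReuseTensorsSketch(const paddle::contrib::AnalysisConfig& config) {
  auto predictor =
      paddle::CreatePaddlePredictor<paddle::contrib::AnalysisConfig>(config);

  // Build the input/output tensors once and reuse them for every batch,
  // instead of re-creating them (and their buffers) inside the loop.
  std::vector<float> buffer(1 * 3 * 224 * 224);  // placeholder input buffer
  std::vector<paddle::PaddleTensor> inputs(1);
  std::vector<paddle::PaddleTensor> outputs;
  inputs[0].name = "image";            // placeholder input name
  inputs[0].shape = {1, 3, 224, 224};  // placeholder shape
  inputs[0].dtype = paddle::PaddleDType::FLOAT32;

  for (int step = 0; step < 100; ++step) {
    // Refill the same buffer with the next batch; no new allocation is made.
    inputs[0].data.Reset(buffer.data(), buffer.size() * sizeof(float));
    predictor->Run(inputs, &outputs, /*batch_size=*/1);
  }
}
```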
## Code Demo
......
......@@ -9,9 +9,8 @@ These instructions will show you how to install PaddlePaddle on Windows. The foll
**Note**:
* The current version does not support NCCL, distributed training, AVX, warpctc and MKL related functions.
* The current version does not support NCCL or distributed training related functions.
* Currently, only PaddlePaddle for CPU is supported on Windows.
......@@ -30,14 +29,20 @@ The version of pip or pip3 should be 9.0.1 or above.
* Install PaddlePaddle
* ***CPU version of PaddlePaddle***:
Execute `pip install paddlepaddle` or `pip3 install paddlepaddle` to download and install PaddlePaddle.
* ***GPU version of PaddlePaddle***:
Execute `pip install paddlepaddle-gpu` (Python 2.7) or `pip3 install paddlepaddle-gpu` (Python 3.x) to download and install PaddlePaddle.
## ***Verify installation***
After completing the installation, you can use `python` or `python3` to enter the python interpreter and then use `import paddle.fluid` to verify that the installation was successful.
## ***How to uninstall***
* ***CPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle` or `pip3 uninstall paddlepaddle`
* ***GPU version of PaddlePaddle***:
Use the following command to uninstall PaddlePaddle: `pip uninstall paddlepaddle-gpu` or `pip3 uninstall paddlepaddle-gpu`
......@@ -205,6 +205,25 @@ For example:
Currently, distributed training using NCCL2 only supports synchronous training. NCCL2 mode is more suitable for models that are relatively large and need
synchronous training on GPUs. If the hardware supports RDMA and GPU Direct, it can achieve high distributed training performance.
Start Up NCCL2 Distributed Training in Multi-Process Mode
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Usually you can get better multi-GPU training performance by starting the NCCL2 distributed training job in multi-process mode. Paddle provides the :code:`paddle.distributed.launch` module to launch multi-process jobs, after which each training process uses an independent GPU device.
Points to note during usage:
* Set the number of nodes: the number of nodes of a job is set through the environment variable :code:`PADDLE_NUM_TRAINERS`, and this variable is also set in every training process.
* Set the number of devices on each node: the parameter :code:`--gpus` sets the number of GPU devices on each node, and the sequence number of each process is set automatically in the environment variable :code:`PADDLE_TRAINER_ID`.
* Data sharding: multi-process mode means one process per device. Generally, each process handles one shard of the training data, so that all processes together cover the whole data set.
* Entry file: the entry file is the training script that is actually launched.
* Logs: the log of each training process is saved in the :code:`./mylog` directory by default; you can change this with the parameter :code:`--log_dir`.
Startup example:

.. code-block:: bash

   > PADDLE_NUM_TRAINERS=<TRAINER_COUNT> python -m paddle.distributed.launch train.py --gpus <NUM_GPUS_ON_HOSTS> <ENTRYPOINT_SCRIPT> --arg1 --arg2 ...
Important Notes on NCCL2 Distributed Training
++++++++++++++++++++++++++++++++++++++++++++++
......@@ -215,7 +234,7 @@ exit at the final iteration. There are two common ways:
- Each node only trains a fixed number of batches per pass, which is controlled by Python code. If a node has more data than this fixed amount, the
  extra data will not be trained.
**Note**: If there are multiple network devices in the system, you need to manually specify the devices used by NCCL2.
Assuming you need to use :code:`eth2` as the communication device, you need to set the following environment variables:
......
......@@ -15,7 +15,7 @@
- `Training Neural Networks <../user_guides/howto/training/index_cn.html>`_: introduces how to use Fluid for single-machine training, multi-machine training, and saving and loading model variables
- `DyGraph Mode <../user_guides/howto/dygraph/DyGraph.md>`_: introduces how to use DyGraph with Fluid
- `DyGraph Mode <../user_guides/howto/dygraph/DyGraph.html>`_: introduces how to use DyGraph with Fluid
- `Model Evaluation and Debugging <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_: introduces how to evaluate and debug models under Fluid, including:
......