# Train with DALI

---

## Contents
* [1. Preface](#1)
* [2. Installing DALI](#2)
* [3. Using DALI](#3)
* [4. Train with FP16](#4)

<a name='1'></a>

## 1. Preface

[The NVIDIA Data Loading Library](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html) (DALI) is a library for data loading and pre-processing that accelerates deep learning applications. It can be used to build a DataLoader for PaddlePaddle.

Deep learning training relies on large amounts of data, which must be loaded and preprocessed. These operations are usually executed on the CPU, which limits further improvement of the training speed; especially when the batch size is large, data loading can become the bottleneck. DALI uses the GPU to accelerate these operations, thereby further improving the training speed.
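
For intuition, below is a minimal sketch of a DALI pipeline feeding PaddlePaddle. PaddleClas wires this up internally when DALI is enabled (see Section 3); the directory `./data/train` and all parameter values here are illustrative assumptions, not PaddleClas defaults.

```python
# Minimal, illustrative DALI pipeline for PaddlePaddle.
# Assumptions: DALI is installed, a GPU is visible, and ./data/train holds
# images in an ImageNet-style layout (one sub-folder per class).
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.plugin.paddle import DALIGenericIterator

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def image_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")       # JPEG decoding on the GPU
    images = fn.resize(images, resize_x=224, resize_y=224)  # GPU resize
    images = fn.crop_mirror_normalize(                      # HWC uint8 -> CHW float
        images, dtype=types.FLOAT, output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels.gpu()

pipe = image_pipeline("./data/train")
pipe.build()
loader = DALIGenericIterator(pipe, ["data", "label"], reader_name="Reader")
for batch in loader:
    images, labels = batch[0]["data"], batch[0]["label"]
    # feed images/labels into the PaddlePaddle training step here
```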

<a name='2'></a>

## 2. Installing DALI

DALI only supports Linux x64, and requires CUDA 10.2 or later.

* For CUDA 10:

    ```shell
    pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
    ```

* For CUDA 11.0:

    ```shell
    pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
    ```

For more information about installing DALI, please refer to [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html).
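
After installation, a quick sanity check (an informal step, not from the official docs) is to confirm that DALI imports and reports its version:

```python
# DALI should import and print a version string without errors.
import nvidia.dali
print(nvidia.dali.__version__)
```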

<a name='3'></a>

## 3. Using DALI

PaddleClas supports training with DALI. Since DALI only supports GPU training, `CUDA_VISIBLE_DEVICES` needs to be set, and because DALI occupies GPU memory, some GPU memory must be reserved for it. To train with DALI, simply set `use_dali` to `True` in the training config, or start training with the following command:

```shell
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0"

python ppcls/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Global.use_dali=True
```

You can also train with multiple GPUs:

```shell
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0,1,2,3"

# set the fraction of GPU memory used for neural network training, generally 0.8 or 0.7;
# the remaining GPU memory is reserved for DALI
export FLAGS_fraction_of_gpu_memory_to_use=0.80

python -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    ppcls/train.py \
        -c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
        -o Global.use_dali=True
```

<a name='4'></a>

## 4. Train with FP16

Building on the above, FP16 half-precision training can further improve the training speed; you can refer to the following command.

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_fraction_of_gpu_memory_to_use=0.8

python -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    ppcls/train.py \
    -c ./ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
```
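
For reference, here is a minimal sketch of what FP16 (automatic mixed precision) training looks like in plain PaddlePaddle code. PaddleClas configures all of this through the YAML file above; the model, batch shapes, and hyperparameters below are illustrative assumptions.

```python
# Minimal AMP training loop sketch; real FP16 speedups require a GPU build
# of PaddlePaddle and a GPU with Tensor Cores.
import paddle
from paddle.vision.models import resnet50

model = resnet50()
optimizer = paddle.optimizer.Momentum(learning_rate=0.1, parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

for step in range(10):
    # dummy batch; in PaddleClas this would come from the (DALI) dataloader
    images = paddle.randn([8, 3, 224, 224])
    labels = paddle.randint(0, 1000, [8])
    with paddle.amp.auto_cast():  # run the forward pass in FP16 where safe
        loss = paddle.nn.functional.cross_entropy(model(images), labels)
    scaled = scaler.scale(loss)   # scale the loss to avoid FP16 gradient underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)  # unscale gradients and apply the update
    optimizer.clear_grad()
```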