Deep learning relies on a large amount of data in the training stage, and these data need to be loaded and preprocessed. These operations are usually executed on the CPU, which limits further improvement of the training speed; especially when the batch size is large, data loading can become the speed bottleneck. DALI can use the GPU to accelerate these operations and thereby further improve the training speed.
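As an illustration of what DALI moves onto the GPU, below is a minimal pipeline sketch using DALI's `pipeline_def` API. It is not part of this document's original example; the data directory, batch size, and image size are placeholder assumptions.

```python
# Minimal DALI pipeline sketch: JPEG decoding and resizing run on the GPU
# instead of the CPU. Paths and sizes below are placeholders.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def
def train_pipe(data_dir):
    # read image files and labels from class subdirectories
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    # "mixed" decodes on the GPU (nvJPEG), removing CPU decode work
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

# pipeline parameters are passed at call time via the decorator
pipe = train_pipe("./train", batch_size=64, num_threads=4, device_id=0)
pipe.build()
images, labels = pipe.run()
```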
## Installing DALI
DALI only supports Linux x64, and the CUDA version must be 10.2 or later.
* For CUDA 10:
...
...
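For reference, DALI wheels are typically installed from NVIDIA's package index, with the package name depending on the CUDA version; the commands below are a sketch based on that convention, so verify them against the official DALI installation guide.

```shell
# CUDA 10.x build (package name per NVIDIA's redist index)
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100

# CUDA 11.x build
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
```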
PaddleClas supports training with DALI in static graph mode. Since DALI only supports GPU training, the GPUs used for training must be set, and part of the GPU memory must be reserved for DALI:
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0"
# set the GPU memory used for neural network training, generally 0.8 or 0.7, and the remaining GPU memory is reserved for DALI
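# PaddlePaddle's memory-fraction flag is the usual way to apply the setting above;
# the exact value (0.80 here) is an assumed example
export FLAGS_fraction_of_gpu_memory_to_use=0.80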
When launching training, `-c` is used to specify the path of the configuration file, and `-o` is used to specify the parameters that need to be modified or added.
Of course, you can also modify the configuration file directly. For specific configuration parameters, please refer to the [Configuration Document](config_en.md).
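For reference, a full launch command might look like the following sketch. The script path, config file, and the `use_dali` option name are assumptions here; check them against your PaddleClas checkout.

```shell
# placeholders throughout: adjust the script path, config file, and option name
export CUDA_VISIBLE_DEVICES="0"
export FLAGS_fraction_of_gpu_memory_to_use=0.80
python tools/static/train.py \
    -c configs/ResNet/ResNet50.yaml \
    -o use_dali=True
```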
* The output log examples are as follows:
* If mixup or cutmix is used in training, only loss, lr (learning rate) and the training time of the minibatch are printed in the log; top-1 and top-k (default 5) accuracy will not be printed:
* If mixup or cutmix is not used during training, then in addition to loss, lr (learning rate) and the training time of the minibatch, top-1 and top-k (default 5) accuracy will also be printed in the log: