VisualDL, a visualization analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters, and visualizes model structures, data samples, histograms of tensors, PR curves , ROC curves and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently. For more information, please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/).
## Use VisualDL in PaddleClas
Now PaddleClas support use VisualDL to visualize the changes of learning rate, loss, accuracy in training.
### Set config and start training
You only need to set the field `Global.use_visualdl` to `True` in train config:
```yaml
# config.yaml
Global:
...
use_visualdl:True
...
```
PaddleClas will save the VisualDL logs to subdirectory `vdl/` under the output directory specified by `Global.output_dir`. And then you just need to start training normally:
```shell
python3 tools/train.py -c config.yaml
```
### Start VisualDL
After starting the training program, you can start the VisualDL service in a new terminal session:
```shell
visualdl --logdir ./output/vdl/
```
In the above command, `--logdir` specify the directory of the VisualDL logs produced in training. VisualDL will traverse and iterate to find the subdirectories of the specified directory to visualize all the experimental results. You can also use the following parameters to set the IP and port number of the VisualDL service:
*`--host`:ip, default is 127.0.0.1
*`--port`:port, default is 8040
More information about the command,please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/blob/develop/README.md#2-launch-panel).
Then you can enter the address `127.0.0.1:8840` and view the training process in the browser:
[PaddleHub](https://github.com/PaddlePaddle/PaddleHub) is a pre-trained model application tool for PaddlePaddle.
Developers can conveniently use the high-quality pre-trained model combined with Fine-tune API to quickly complete the whole process from model migration to deployment.
All the pre-trained models of [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) have been collected by PaddleHub.
For further details, please refer to [PaddleHub website](https://www.paddlepaddle.org.cn/hub).
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a set of lightweight inference engine which is fully functional, easy to use and then performs well. Lightweighting is reflected in the use of fewer bits to represent the weight and activation of the neural network, which can greatly reduce the size of the model, solve the problem of limited storage space of the mobile device, and the inference speed is better than other frameworks on the whole.
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we uses Paddle-Lite to [evaluate the performance on the mobile device](../models/Mobile_en.md), in this section we uses the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use `Paddle-Lite` to evaluate the model speed on the mobile terminal (evaluated on SD855)
## Evaluation Steps
### Export the Inference Model
* First you should transform the saved model during training to the special model which can be used to inference, the special model can be exported by `tools/export_model.py`, the specific way of transform is as follows.
Finally the `model` and `parmas` can be saved in `inference/MobileNetV1`.
### Download Benchmark Binary File
* Use the adb (Android Debug Bridge) tool to connect the Android phone and the PC, then develop and debug. After installing adb and ensuring that the PC and the phone are successfully connected, use the following command to view the ARM version of the phone and select the pre-compiled library based on ARM version.
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
```
sh deploy/lite/benchmark/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
Where `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the path of all the models that need to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. Eventually, the evaluation result file of `result_armv8.txt` will be saved in the current folder. The specific performances are as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
Here is the model inference speed under different number of threads, the unit is FPS, taking model on one threads as an example, the average speed of MobileNetV1 on SD855 is `30.79750FPS`.
### Model Optimization and Speed Evaluation
* In II.III section, we mention that the model will be optimized before evaluation, here you can first optimize the model, and then directly load the optimized model for speed evaluation
* Paddle-Lite
In Paddle-Lite, we provides multiple strategies to automatically optimize the original training model, which contain Quantify, Subgraph fusion, Hybrid scheduling, Kernel optimization and so on. In order to make the optimization more convenient and easy to use, we provide opt tools to automatically complete the optimization steps and output a lightweight, optimal and executable model in Paddle-Lite, which can be downloaded on [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `MacOS` as our development environment, download[opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tools and use the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir${opt_models_dir}
./opt_mac --model_file=${model_file}\
--param_file=${param_file}\
--valid_targets=arm \
--optimize_out_type=naive_buffer \
--prefer_int8_kernel=false\
--optimize_out=${opt_models_dir}/MobileNetV1
```
Where the `model_file` and `param_file` are exported model file and the file address respectively, after transforming successfully, the `MobileNetV1.nb` will be saved in `opt_models`
Use the benchmark_bin file to load the optimized model for evaluation. The commands are as follows.
Int8 quantization is one of the key features in [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).
It supports two kinds of training aware, **Dynamic strategy** and **Static strategy**,
layer-wise and channel-wise quantization,
and using PaddleLite to deploy models generated by PaddleSlim.
By using this toolkit, [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) quantized the mobilenet_v3_large_x1_0 model whose accuracy is 78.9% after distilled.
After quantized, the prediction speed is accelerated from 19.308ms to 14.395ms on SD855.
The storage size is reduced from 21M to 10M.
The top1 recognition accuracy rate is 75.9%.
For specific training methods, please refer to [PaddleSlim quant aware](../../../deploy/slim/README_en.md)。
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep-learning researchers to easily deploy online inference services, supporting one-click deployment of industry, high concurrency and efficient communication between client and server and supporting multiple programming languages to develop clients.
Taking HTTP inference service deployment as an example to introduce how to use PaddleServing to deploy model services in PaddleClas.
## Serving Install
It is recommends to use docker to install and deploy the Serving environment in the Serving official website, first, you need to pull the docker environment and create Serving-based docker.
`9292` is the port for sending the request, which is consistent with the Serving starting port, and `./docs/images/logo.png` is the test image, the final top1 label and probability are returned.
* For more Serving deployment, such RPC inference service, you can refer to the Serving official website: [https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet)
[The NVIDIA Data Loading Library](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html)(DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It can build Dataloader of Paddle.
Since the Deep learning relies on a large amount of data in the training stage, these data need to be loaded and preprocessed. These operations are usually executed on the CPU, which limits the further improvement of the training speed, especially when the batch_size is large, which become the bottleneck of speed. DALI can use GPU to accelerate these operations, thereby further improve the training speed.
## Installing DALI
DALI only support Linux x64 and version of CUDA is 10.2 or later.
For more information about installing DALI, please refer to [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html).
## Using DALI
Paddleclas supports training with DALI in static graph. Since DALI only supports GPU training, `CUDA_VISIBLE_DEVICES` needs to be set, and DALI needs to occupy GPU memory, so it needs to reserve GPU memory for Dali. To train with DALI, just set the fields in the training config `use_dali = True`, or start the training by the following command:
```shell
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0"
# set the GPU memory used for neural network training, generally 0.8 or 0.7, and the remaining GPU memory is reserved for DALI
At present, **PaddleClas** requires **PaddlePaddle** version **>=2.0**. Docker is recomended to run Paddleclas, for more detailed information about docker and nvidia-docker, you can refer to the [tutorial](https://docs.docker.com/get-started/). If you do not want to use docker, you can skip section [2. (Recommended) Prepare a docker environment](#2), and go into section [3. Install PaddlePaddle using pip](#3).
At present, **PaddleClas** requires **PaddlePaddle** version `>=2.0`. Docker is recomended to run Paddleclas, for more detailed information about docker and nvidia-docker, you can refer to the [tutorial](https://docs.docker.com/get-started/). If you do not want to use docker, you can skip section [2. (Recommended) Prepare a docker environment](#2), and go into section [3. Install PaddlePaddle using pip](#3).
* Make sure the compiled source code is later than PaddlePaddle2.0.
* Indicate **WITH_DISTRIBUTE=ON** when compiling, Please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
* Indicate `WITH_DISTRIBUTE=ON` when compiling, Please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
* When running in docker, in order to ensure that the container has enough shared memory for dataloader acceleration of Paddle, please set the parameter `--shm-size=8g` at creating a docker container, if conditions permit, you can set it to a larger value.