diff --git a/docs/en/tutorials/getting_started_en.md b/docs/en/tutorials/getting_started_en.md index b535d4c77df464d958c991b67380685c514243a2..dccaec8f9d03d898b7a541011200bfc29e48fd12 100644 --- a/docs/en/tutorials/getting_started_en.md +++ b/docs/en/tutorials/getting_started_en.md @@ -2,37 +2,105 @@ --- Please refer to [Installation](install.md) to setup environment at first, and prepare ImageNet1K data by following the instruction mentioned in the [data](data.md) -## Setup +## 1. Training and Evaluation on Windows or CPU -**Setup PYTHONPATH:** +If training and evaluation are performed on Windows or on a CPU, we recommend the `tools/train_multi_platform.py` and `tools/eval_multi_platform.py` scripts. + + +### 1.1 Model training + +After preparing the configuration file, the training process can be started as follows. + +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o model_save_dir=./output/ \ + -o use_gpu=True +``` + +Here, `-c` specifies the path of the configuration file and `-o` specifies parameters to modify or add. `-o model_save_dir=./output/` sets `model_save_dir` in the configuration file to `./output/`, and `-o use_gpu=True` enables GPU training. To train on the CPU, set `use_gpu` to `False`. + + +You can also update the configuration by editing the configuration file directly. For the available parameters, see the [Configuration Document](config.md). + +* Example output logs: + * If mixup or cutmix is used in training, only the loss, lr (learning rate) and training time of the minibatch are printed in the log. 
+ + ``` + train step:890 loss: 6.8473 lr: 0.100000 elapse: 0.157s + ``` + + * If mixup or cutmix is not used during training, then in addition to the loss, lr (learning rate) and training time of the minibatch, top-1 and top-k (k defaults to 5) accuracy are also printed in the log. + + ``` + epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193s + ``` + +During training, you can monitor the loss in real time with `VisualDL`, using the following command. ```bash -export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH +visualdl --logdir ./scalar --host --port ``` -## Training and validating +### 1.2 Model finetuning -PaddleClas support `tools/train.py` and `tools/eval.py` to start training and validating. +* After preparing the configuration file, you can finetune the model by loading pretrained weights, as shown below. -### Training +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o pretrained_model="./pretrained/ResNet50_pretrained" +``` + +Here, `pretrained_model` sets the path from which the pretrained weights are loaded. Replace it with the path to your own pretrained weights, or set the path directly in the configuration file. + +### 1.3 Resume Training + +* If training is interrupted for any reason, you can resume it by loading a checkpoint. + +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o checkpoints="./output/ResNet/0/ppcls" +``` + +The configuration file does not need to be modified. Just add the `checkpoints` parameter when launching training; it gives the checkpoint path, from which the model weights, learning rate, optimizer state and other information are loaded. + + +### 1.4 Model evaluation + +* The model evaluation process can be started as follows. ```bash -# PaddleClas use paddle.distributed.launch to start multi-cards and multiprocess training. 
-# Set FLAGS_selected_gpus to indicate GPU cards +python tools/eval_multi_platform.py \ + -c ./configs/eval.yaml \ + -o ARCHITECTURE.name="ResNet50_vd" \ + -o pretrained_model=path_to_pretrained_models +``` +You can modify the `ARCHITECTURE.name` and `pretrained_model` fields in `configs/eval.yaml` to configure the model to evaluate, or update the configuration with the `-o` parameter. + + +**Note:** When loading a pretrained model, specify the prefix of the model path. For example, if the pretrained model directory is `output/ResNet50_vd/19` and the model file is `output/ResNet50_vd/19/ppcls.pdparams`, set `pretrained_model` to `output/ResNet50_vd/19/ppcls`; PaddleClas automatically appends the `.pdparams` suffix. + +## 2. Training and evaluation on Linux+GPU + +On Linux with a GPU, we strongly recommend the training and evaluation scripts provided by PaddleClas: `tools/train.py` and `tools/eval.py`. + +### 2.1 Model training + +After preparing the configuration file, the training process can be started as follows. + +```bash +# PaddleClas starts multi-card and multi-process training through launch +# Specify the GPU card numbers to use by setting FLAGS_selected_gpus python -m paddle.distributed.launch \ --selected_gpus="0,1,2,3" \ tools/train.py \ -c ./configs/ResNet/ResNet50_vd.yaml ``` -- log: - -``` -epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193 -``` - -add -o params to update configuration +The configuration can be updated by adding the `-o` parameter. ```bash python -m paddle.distributed.launch \ @@ -40,48 +108,62 @@ python -m paddle.distributed.launch \ tools/train.py \ -c ./configs/ResNet/ResNet50_vd.yaml \ -o use_mix=1 \ - --vdl_dir=./scalar/ - + --vdl_dir=./scalar/ ``` -- log: +The output log format is the same as above. 
+ + + +### 2.2 Model finetuning + +* After preparing the configuration file, you can finetune the model by loading pretrained weights, as shown below. ``` -epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210 +python -m paddle.distributed.launch \ + --selected_gpus="0,1,2,3" \ + tools/train.py \ + -c configs/ResNet/ResNet50.yaml \ + -o pretrained_model="./pretrained/ResNet50_pretrained" ``` -or modify configuration directly to config fileds, please refer to [config](config.md) for more details. +Here, `pretrained_model` sets the path from which the pretrained weights are loaded. Replace it with the path to your own pretrained weights, or set the path directly in the configuration file. -use visuldl to visulize training loss in the real time +* [The quick start tutorial](./quick_start_en.md) contains many model finetuning examples. You can follow it to finetune a model on a specific dataset. -```bash -visualdl --logdir ./scalar --host --port +### 2.3 Resume Training -``` +* If training is interrupted for any reason, you can resume it by loading a checkpoint. +``` +python -m paddle.distributed.launch \ + --selected_gpus="0,1,2,3" \ + tools/train.py \ + -c configs/ResNet/ResNet50.yaml \ + -o checkpoints="./output/ResNet/0/ppcls" +``` -### finetune +The configuration file does not need to be modified. Just add the `checkpoints` parameter when launching training; it gives the checkpoint path, from which the model weights, learning rate, optimizer state and other information are loaded. -* please refer to [Trial](./quick_start.md) for more details. +### 2.4 Model evaluation -### validation +* The model evaluation process can be started as follows. 
```bash -python tools/eval.py \ +python tools/eval_multi_platform.py \ -c ./configs/eval.yaml \ -o ARCHITECTURE.name="ResNet50_vd" \ -o pretrained_model=path_to_pretrained_models +``` -modify `configs/eval.yaml filed: `ARCHITECTURE.name` and filed: `pretrained_model` to config valid model or add -o params to update config directly. +You can modify the `ARCHITECTURE.name` and `pretrained_model` fields in `configs/eval.yaml` to configure the model to evaluate, or update the configuration with the `-o` parameter. -**NOTE: ** when loading the pretrained model, should ignore the suffix ```.pdparams``` +## 3. Model inference -## Predict +PaddlePaddle provides three ways to perform model inference. This section describes how to run inference with the inference engine. -PaddlePaddle supprot three predict interfaces -Use predicator interface to predict -First, export inference model +First, export the inference model using `tools/export_model.py`. ```bash python tools/export_model.py \ @@ -90,7 +172,8 @@ python tools/export_model.py \ --output_path=save_inference_dir ``` -Second, start predicator enginee: + +Second, start the inference engine with the following command. ```bash python tools/infer/predict.py \ @@ -100,4 +183,4 @@ python tools/infer/predict.py \ --use_gpu=1 \ --use_tensorrt=True ``` -please refer to [inference](../extension/paddle_inference.md) for more details. +Please refer to [inference](../extension/paddle_inference_en.md) for more details. diff --git a/docs/en/tutorials/install_en.md b/docs/en/tutorials/install_en.md index b79f397b3052cc1bdb514c499c0c5219edf22f52..8f646a45ae23a7187d6b6842025ea39a7f74737f 100644 --- a/docs/en/tutorials/install_en.md +++ b/docs/en/tutorials/install_en.md @@ -51,3 +51,12 @@ git clone https://github.com/PaddlePaddle/PaddleClas.git ``` pip install --upgrade -r requirements.txt ``` + +If the installation of visualdl fails, try the following command. 
+ +``` +pip3 install --upgrade visualdl==2.0.0b3 -i https://mirror.baidu.com/pypi/simple + +``` + +What's more, visualdl is just supported in python3, so python3 is needed if you want to use visualdl. diff --git a/docs/zh_CN/tutorials/getting_started.md b/docs/zh_CN/tutorials/getting_started.md index 8790faf9c037d9f5b7902eff96eabc922759bba8..a104e261ab5e0b9cf305cb9f02efcfa5ab4b0a7a 100644 --- a/docs/zh_CN/tutorials/getting_started.md +++ b/docs/zh_CN/tutorials/getting_started.md @@ -2,17 +2,89 @@ --- 请事先参考[安装指南](install.md)配置运行环境,并根据[数据说明](./data.md)文档准备ImageNet1k数据,本章节下面所有的实验均以ImageNet1k数据集为例。 -## 一、设置环境变量 +## 1. Windows或者CPU上训练与评估 -**设置PYTHONPATH环境变量:** +如果在windows系统或者CPU上进行训练与评估,推荐使用`tools/train_multi_platform.py`与`tools/eval_multi_platform.py`脚本。 + +### 1.1 模型训练 + +准备好配置文件之后,可以使用下面的方式启动训练。 + +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o model_save_dir=./output/ \ + -o use_gpu=True +``` + +其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,`-o model_save_dir=./output/`表示将配置文件中的`model_save_dir`修改为`./output/`。`-o use_gpu=True`表示使用GPU进行训练。如果希望使用CPU进行训练,则需要将`use_gpu`设置为`False`。 + +也可以直接修改模型对应的配置文件更新配置。具体配置参数参考[配置文档](config.md)。 + +* 输出日志示例如下: + + * 如果在训练使用了mixup或者cutmix的数据增广方式,那么日志中只会打印出loss(损失)、lr(学习率)以及该minibatch的训练时间。 + + ``` + train step:890 loss: 6.8473 lr: 0.100000 elapse: 0.157s + ``` + + * 如果训练过程中没有使用mixup或者cutmix的数据增广,那么除了loss(损失)、lr(学习率)以及该minibatch的训练时间之外,日志中也会打印出top-1与top-k(默认为5)的信息。 + + ``` + epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193s + ``` + +训练期间可以通过VisualDL实时观察loss变化,启动命令如下: ```bash -export PYTHONPATH=path_to_PaddleClas:$PYTHONPATH +visualdl --logdir ./scalar --host --port ``` -## 二、模型训练与评估 +### 1.2 模型微调 -PaddleClas 提供模型训练与评估脚本:`tools/train.py`和`tools/eval.py` +* 根据自己的数据集配置好配置文件之后,可以通过加载预训练模型进行微调,如下所示。 + +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o pretrained_model="./pretrained/ResNet50_pretrained" +``` + 
+其中`pretrained_model`用于设置加载预训练权重的地址,使用时需要换成自己的预训练模型权重路径,也可以直接在配置文件中修改该路径。 + +### 1.3 模型恢复训练 + +* 如果训练任务因为其他原因被终止,也可以加载断点权重继续训练。 + +``` +python tools/train_multi_platform.py \ + -c configs/ResNet/ResNet50.yaml \ + -o checkpoints="./output/ResNet/0/ppcls" +``` + +其中配置文件不需要做任何修改,只需要在训练时添加`checkpoints`参数即可,表示加载的断点权重路径,使用该参数会同时加载保存的断点权重和学习率、优化器等信息。 + + +### 1.4 模型评估 + +* 可以通过以下命令完成模型评估。 + +```bash +python tools/eval_multi_platform.py \ + -c ./configs/eval.yaml \ + -o ARCHITECTURE.name="ResNet50_vd" \ + -o pretrained_model=path_to_pretrained_models +``` + +可以更改`configs/eval.yaml`中的`ARCHITECTURE.name`字段和`pretrained_model`字段来配置评估模型,也可以通过-o参数更新配置。 + +**注意:** 加载预训练模型时,需要指定预训练模型的前缀,例如预训练模型参数所在的文件夹为`output/ResNet50_vd/19`,预训练模型参数的名称为`output/ResNet50_vd/19/ppcls.pdparams`,则`pretrained_model`参数需要指定为`output/ResNet50_vd/19/ppcls`,PaddleClas会自动补齐`.pdparams`的后缀。 + + +## 2. 基于Linux+GPU的模型训练与评估 + +如果机器环境为Linux+GPU,那么推荐使用PaddleClas 提供的模型训练与评估脚本:`tools/train.py`和`tools/eval.py`,可以更快地完成训练与评估任务。 ### 2.1 模型训练 @@ -28,12 +100,6 @@ python -m paddle.distributed.launch \ -c ./configs/ResNet/ResNet50_vd.yaml ``` -- 输出日志示例如下: - -``` -epoch:0 train step:13 loss:7.9561 top1:0.0156 top5:0.1094 lr:0.100000 elapse:0.193 -``` - 可以通过添加-o参数来更新配置: ```bash @@ -42,41 +108,58 @@ python -m paddle.distributed.launch \ tools/train.py \ -c ./configs/ResNet/ResNet50_vd.yaml \ -o use_mix=1 \ - --vdl_dir=./scalar/ - + --vdl_dir=./scalar/ ``` -- 输出日志示例如下: +输出日志信息的格式同上。 + +### 2.2 模型微调 + +* 根据自己的数据集配置好配置文件之后,可以通过加载预训练模型进行微调,如下所示。 ``` -epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210 +python -m paddle.distributed.launch \ + --selected_gpus="0,1,2,3" \ + tools/train.py \ + -c configs/ResNet/ResNet50.yaml \ + -o pretrained_model="./pretrained/ResNet50_pretrained" ``` -也可以直接修改模型对应的配置文件更新配置。具体配置参数参考[配置文档](config.md)。 +其中`pretrained_model`用于设置加载预训练权重的地址,使用时需要换成自己的预训练模型权重路径,也可以直接在配置文件中修改该路径。 -训练期间可以通过VisualDL实时观察loss变化,启动命令如下: +* [30分钟玩转PaddleClas教程](./quick_start.md)中包含大量模型微调的示例,可以参考该章节在特定的数据集上进行模型微调。 -```bash 
-visualdl --logdir ./scalar --host --port + +### 2.3 模型恢复训练 + +* 如果训练任务,因为其他原因被终止,也可以加载断点权重继续训练。 ``` +python -m paddle.distributed.launch \ + --selected_gpus="0,1,2,3" \ + tools/train.py \ + -c configs/ResNet/ResNet50.yaml \ + -o checkpoints="./output/ResNet/0/ppcls" +``` +其中配置文件不需要做任何修改,只需要在训练时添加`checkpoints`参数即可,表示加载的断点权重路径,使用该参数会同时加载保存的模型参数权重和学习率、优化器等信息。 -### 2.2 模型微调 -* [30分钟玩转PaddleClas](./quick_start.md)中包含大量模型微调的示例,可以参考该章节在特定的数据集上进行模型微调。 +### 2.4 模型评估 -### 2.3 模型评估 +* 可以通过以下命令完成模型评估。 ```bash -python tools/eval.py \ - -c ./configs/eval.yaml \ - -o ARCHITECTURE.name="ResNet50_vd" \ - -o pretrained_model=path_to_pretrained_models +python -m paddle.distributed.launch \ + --selected_gpus="0" \ + tools/eval.py \ + -c ./configs/eval.yaml \ + -o ARCHITECTURE.name="ResNet50_vd" \ + -o pretrained_model=path_to_pretrained_models ``` + 可以更改configs/eval.yaml中的`ARCHITECTURE.name`字段和pretrained_model字段来配置评估模型,也可以通过-o参数更新配置。 -**注意:** 加载预训练模型时,需要指定预训练模型的前缀,例如预训练模型参数所在的文件夹为`output/ResNet50_vd/19`,预训练模型参数的名称为`output/ResNet50_vd/19/ppcls.pdparams`,则`pretrained_model`参数需要指定为`output/ResNet50_vd/19/ppcls`,PaddleClas会自动补齐`.pdparams`的后缀。 ## 三、模型推理 @@ -97,6 +180,6 @@ python tools/infer/predict.py \ -p params文件路径 \ -i 图片路径 \ --use_gpu=1 \ - --use_tensorrt=True + --use_tensorrt=False ``` 更多使用方法和推理方式请参考[分类预测框架](../extension/paddle_inference.md)。 diff --git a/docs/zh_CN/tutorials/install.md b/docs/zh_CN/tutorials/install.md index 861c40041083181239922a3e876f9d302f5618dc..449b8238e80b4ec3b75dc76c1cd87f6186b97319 100644 --- a/docs/zh_CN/tutorials/install.md +++ b/docs/zh_CN/tutorials/install.md @@ -67,3 +67,5 @@ visualdl可能出现安装失败,请尝试 pip3 install --upgrade visualdl==2.0.0b3 -i https://mirror.baidu.com/pypi/simple ``` + +此外,visualdl目前只支持在python3下运行,因此如果希望使用visualdl,需要使用python3。 diff --git a/tools/eval.py b/tools/eval.py index 7aa370c62bf7ac271f30c955a4b1978a93e51586..768dc4b374a1a871282f3b42b76cf0f30f950dcb 100644 --- a/tools/eval.py +++ b/tools/eval.py @@ -17,6 +17,11 @@ from __future__ 
import division from __future__ import print_function import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + import argparse import paddle.fluid as fluid diff --git a/tools/eval.sh b/tools/eval.sh index 7da3ad4c9411a532966d3544558f87e027050492..a86f56c8f0d4e456560c0495a525e2b36783ceb5 100644 --- a/tools/eval.sh +++ b/tools/eval.sh @@ -1,5 +1,3 @@ -export PYTHONPATH=$PWD:$PYTHONPATH - python -m paddle.distributed.launch \ --selected_gpus="0" \ tools/eval.py \ diff --git a/tools/eval_multi_platform.py b/tools/eval_multi_platform.py index 90f87f7e45aa42feaa3e08047a61903fb4b758cb..4083de6e2fdb442ebd999f704478206844a6a218 100644 --- a/tools/eval_multi_platform.py +++ b/tools/eval_multi_platform.py @@ -17,6 +17,11 @@ from __future__ import division from __future__ import print_function import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + import argparse import paddle.fluid as fluid diff --git a/tools/export_model.py b/tools/export_model.py index 763ce473c45d60ba577d65782f5d4737f38a10bc..c04c5018e0965b0e1f5aa5f0d94133a1974ac8db 100644 --- a/tools/export_model.py +++ b/tools/export_model.py @@ -12,6 +12,12 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + import argparse from ppcls.modeling import architectures diff --git a/tools/export_serving_model.py b/tools/export_serving_model.py index e6e4472cdbf8dfd1738dede98b5aa61121f8191a..10632ecd31fe0cf8734edef816d130af2c0738c6 100644 --- a/tools/export_serving_model.py +++ b/tools/export_serving_model.py @@ -14,6 +14,11 @@ import argparse import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + from ppcls.modeling import architectures import paddle.fluid as fluid diff --git a/tools/infer/infer.py b/tools/infer/infer.py index 83c1fb4b475d67c116c884319d462558de34caca..ec768fbabf6cefca0d5c82ff7eb80e07b4e2162a 100644 --- a/tools/infer/infer.py +++ b/tools/infer/infer.py @@ -13,13 +13,17 @@ # limitations under the License. 
import os -import utils +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../../'))) + import argparse import numpy as np - import paddle.fluid as fluid from ppcls.modeling import architectures +import utils def parse_args(): diff --git a/tools/run.sh b/tools/run.sh index 5e8043b1205f99dada7448d172f84cd0661df509..ad1e5448904815dc35c689b02f6b7a0094eddd22 100755 --- a/tools/run.sh +++ b/tools/run.sh @@ -1,7 +1,5 @@ #!/usr/bin/env bash -export PYTHONPATH=$PWD:$PYTHONPATH - python -m paddle.distributed.launch \ --selected_gpus="0,1,2,3" \ tools/train.py \ diff --git a/tools/train.py b/tools/train.py index 43c399fa6e0bc5466347e4eaa6d181e8354c3e76..a9c5855799684f6990adfebecabd8df88dc5fa6e 100644 --- a/tools/train.py +++ b/tools/train.py @@ -18,6 +18,11 @@ from __future__ import print_function import argparse import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + from sys import version_info import paddle.fluid as fluid diff --git a/tools/train_multi_platform.py b/tools/train_multi_platform.py index 0b1ea2c1beb19ba03b3be39e9a759f45800233c8..4477039301ec547c13716e5f561abf83a3fa7949 100644 --- a/tools/train_multi_platform.py +++ b/tools/train_multi_platform.py @@ -18,6 +18,11 @@ from __future__ import print_function import argparse import os +import sys +__dir__ = os.path.dirname(os.path.abspath(__file__)) +sys.path.append(__dir__) +sys.path.append(os.path.abspath(os.path.join(__dir__, '../'))) + from sys import version_info import paddle.fluid as fluid
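
Note on the `sys.path` change repeated across the `tools/` scripts above: each script now appends its own directory and the repository root to `sys.path`, which appears to be why the `export PYTHONPATH=...` lines could be dropped from `tools/run.sh` and `tools/eval.sh`. A minimal sketch of the same idea as a reusable helper follows. The names `repo_paths` and `bootstrap` and the `levels_up` parameter are hypothetical and not part of this change, and unlike the diff the helper skips paths that are already on `sys.path`.

```python
import os
import sys


def repo_paths(script_file, levels_up=1):
    """Return (script directory, directory `levels_up` levels above it).

    Hypothetical helper: mirrors the `__dir__` / `'../'` logic in the diff.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    parents = [os.pardir] * levels_up
    root = os.path.abspath(os.path.join(script_dir, *parents))
    return script_dir, root


def bootstrap(script_file, levels_up=1):
    """Append both paths to sys.path, skipping entries already present.

    Assumption: de-duplication is an addition of this sketch; the diff
    calls sys.path.append unconditionally.
    """
    for path in repo_paths(script_file, levels_up):
        if path not in sys.path:
            sys.path.append(path)
```

Under these assumptions, a script directly in `tools/` would call `bootstrap(__file__)`, while `tools/infer/infer.py` would need `bootstrap(__file__, levels_up=2)`, matching the `'../../'` used in that hunk.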