add docs of train_infer on windows (#5462)

* add docs of train_infer on windows
* modify according to comments
* modify according to comments2

e7b2fdcd · Sing_chan · GitHub · 7e86f81d · e7b2fdcd · e7b2fdcd
6 changed files
--- a/tutorials/mobilenetv3_prod/Step6/README.md
+++ b/tutorials/mobilenetv3_prod/Step6/README.md
@@ -71,7 +71,7 @@ cd models/tutorials/mobilenetv3_prod/Step6
 * Install paddlepaddle: if you already have paddlepaddle 2.2 or later installed, you can skip the installation commands below.

 ```bash
-# Requires Paddle 2.2 or later, if
+# Requires Paddle 2.2 or later
 # Install the GPU version of Paddle
 pip install paddlepaddle-gpu==2.2.0
 # Install the CPU version of Paddle

--- a/tutorials/mobilenetv3_prod/Step6/docs/windows_train_infer_python.md
+++ b/tutorials/mobilenetv3_prod/Step6/docs/windows_train_infer_python.md
+# MobileNetV3
+
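+## Contents
+
+
+- [1. Prepare the environment and data](#1)
+    - [1.1 Prepare the environment](#1.1)
+    - [1.2 Prepare the data](#1.2)
+    - [1.3 Prepare the model](#1.3)
+- [2. Getting started](#2)
+    - [2.1 Model training](#2.1)
+    - [2.2 Model evaluation](#2.2)
+    - [2.3 Model prediction](#2.3)
+- [3. Model inference deployment](#3)
+- [4. TIPC automated test scripts](#4)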
+
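+This document describes the training and inference workflow for the MobileNetV3 model on Windows. For an introduction to the model and the dataset, see the [home page](../README.md). Note that some commands differ slightly between Windows and Linux, mainly for downloading and extracting data, setting environment variables, and loading data. In addition, Windows supports only single-GPU training and prediction.
+
+<a name="1"></a>
+
+## 1. Prepare the environment and data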
+
+
+<a name="1.1"></a>
+
+### 1.1 Prepare the environment
+
+* Download the code
+
+```bash
+git clone https://github.com/PaddlePaddle/models.git
+cd models/tutorials/mobilenetv3_prod/Step6
+```
+
+* Install paddlepaddle: if you already have paddlepaddle 2.2 or later installed, you can skip the installation commands below.
+
+```bash
+# Requires Paddle 2.2 or later
+# Install the GPU version of Paddle
+pip install paddlepaddle-gpu==2.2.0
+# Install the CPU version of Paddle
+pip install paddlepaddle==2.2.0
+```
+
+After installation, run the following commands to verify that it succeeded.
+
+```python
+import paddle
+paddle.utils.run_check()
+```
+
+If the output contains `PaddlePaddle is installed successfully!`, as shown below, the installation succeeded.
+
+```
+W0119 16:25:14.953202  7104 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.4, Runtime API Version: 10.2
+W0119 16:25:14.953202  7104 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+PaddlePaddle works well on 1 GPU.
+PaddlePaddle works well on 1 GPUs.
+PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
+```
+
+For more installation options, see the [Paddle installation guide](https://www.paddlepaddle.org.cn/).
+
+* Install the requirements
+
+```bash
+pip install -r requirements.txt
+```
+
+<a name="1.2"></a>
+
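+### 1.2 Prepare the data
+
+If you have already downloaded the ImageNet1k dataset, skip this step. Otherwise, you can apply to download it from the [ImageNet website](https://image-net.org/download.php).
+
+To quickly try out model training, you can instead extract `test_images/lite_data.tar`, which contains 16 training images and 16 validation images.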
+
+```bash
+python -c "import shutil;shutil.unpack_archive('test_images/lite_data.tar', extract_dir='./',format='tar')"
+```
+
+After running this command, the dataset folder `lite_data` is extracted into the current directory.
+
+
+<a name="1.3"></a>
+
+### 1.3 Prepare the model
+
+To try evaluation, prediction, or inference deployment without training first, download the MobileNetV3 pretrained model with the commands below.
+
+```bash
+# Download the pretrained model
+pip install wget
+python -c "import wget;wget.download('https://paddle-model-ecology.bj.bcebos.com/model/mobilenetv3_reprod/mobilenet_v3_small_pretrained.pdparams')"
+# Download the inference model
+# coming soon!
+```
+
+
+<a name="2"></a>
+
+## 2. Getting started
+
+<a name="2.1"></a>
+
+### 2.1 Model training
+
+* Single-machine, single-GPU training
+
+```bash
+# On Windows, DataLoader only supports single-process mode, so workers must be set to 0
+set CUDA_VISIBLE_DEVICES=0
+python train.py --data-path=./ILSVRC2012 --lr=0.1 --batch-size=256 --workers=0
+```
+
+Part of the training log is shown below.
+
+```
+[Epoch 1, iter: 4780] top1: 0.10312, top5: 0.27344, lr: 0.01000, loss: 5.34719, avg_reader_cost: 0.03644 sec, avg_batch_cost: 0.05536 sec, avg_samples: 64.0, avg_ips: 1156.08863 images/sec.
+[Epoch 1, iter: 4790] top1: 0.08750, top5: 0.24531, lr: 0.01000, loss: 5.28853, avg_reader_cost: 0.05164 sec, avg_batch_cost: 0.06852 sec, avg_samples: 64.0, avg_ips: 934.08427 images/sec.
+```
+
+**Note**: Windows currently supports only single-GPU training and prediction.
+
+For more configuration options, see the `get_args_parser` function in [train.py](../train.py).
+
+<a name="2.2"></a>
+
+### 2.2 Model evaluation
+
+In this project the training and evaluation scripts are the same; pass the `--test-only` flag to run evaluation.
+
+```bash
+# On Windows, DataLoader only supports single-process mode, so workers must be set to 0
+python train.py --test-only --data-path=./ILSVRC2012 --pretrained=./mobilenet_v3_small_pretrained.pdparams --workers=0
+```
+
+The expected output is as follows.
+
+```
+Test:  [   0/1563]  eta: 1:14:20  loss: 1.0456 (1.0456)  acc1: 0.7812 (0.7812)  acc5: 0.9062 (0.9062)  time: 2.8539  data: 2.8262
+...
+Test:  [1500/1563]  eta: 0:00:05  loss: 1.2878 (1.9196)  acc1: 0.7344 (0.5639)  acc5: 0.8750 (0.7893)  time: 0.0623  data: 0.0534
+Test: Total time: 0:02:05
+ * Acc@1 0.564 Acc@5 0.790
+```
+
+<a name="2.3"></a>
+
+### 2.3 Model prediction
+
+* Predict on GPU
+
+```bash
+python tools/predict.py --pretrained=./mobilenet_v3_small_pretrained.pdparams --img-path=images/demo.jpg
+```
+
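+The command runs prediction on the following image:
+
+<div align="center">
+    <img src="./images/demo.jpg" width="300">
+</div>
+
+The final output is `class_id: 8, prob: 0.9091238975524902`, meaning the predicted class ID is `8` with a confidence of about `0.909`.
+
+* Predict on CPU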
+
+```bash
+python tools/predict.py --pretrained=./mobilenet_v3_small_pretrained.pdparams --img-path=images/demo.jpg --device=cpu
+```
+
+<a name="3"></a>
+
+## 3. Model inference deployment
+
+coming soon!
+
+<a name="4"></a>
+
+## 4. TIPC automated test scripts
+
+coming soon!
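
Reviewer sketch: the Windows-specific steps in the document above (extracting the archive without a `tar` command, forcing `workers` to 0, and pinning a single GPU) could also be collected into a small Python helper. The function names below (`unpack_tar`, `dataloader_workers`, `select_single_gpu`, `demo`) are hypothetical and not part of the repository. This is a minimal standard-library sketch under those assumptions, not the project's implementation.

```python
import os
import platform
import shutil
import tarfile
import tempfile
from pathlib import Path


def unpack_tar(archive, dest="."):
    """Extract a .tar archive with the standard library (no `tar` command needed)."""
    shutil.unpack_archive(str(archive), extract_dir=str(dest), format="tar")


def dataloader_workers(requested, system=None):
    """Return 0 on Windows (single-process data loading), else the requested count."""
    system = system or platform.system()
    return 0 if system == "Windows" else requested


def select_single_gpu(env=None, device_id="0"):
    """Python-side equivalent of `set CUDA_VISIBLE_DEVICES=0`.

    Assumption: this runs before the framework initializes the GPU.
    """
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = device_id
    return env


def demo():
    """Build a tiny archive in a temp dir, unpack it, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "lite_data"
        src.mkdir()
        (src / "label.txt").write_text("demo.jpg 8\n")
        archive = Path(tmp) / "lite_data.tar"
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="lite_data")
        out = Path(tmp) / "out"
        out.mkdir()
        unpack_tar(archive, out)
        return sorted(p.name for p in (out / "lite_data").iterdir())


print(demo())
print(dataloader_workers(8, system="Windows"))
```

For instance, the value given to `--workers` could be passed through `dataloader_workers` before the data loader is built, so the same command line works on both platforms.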

--- a/tutorials/mobilenetv3_prod/Step6/train.py
+++ b/tutorials/mobilenetv3_prod/Step6/train.py
@@ -45,13 +45,13 @@ def train_one_epoch(model,
        if amp_level is not None:
            with paddle.amp.auto_cast(level=amp_level):
                output = model(image)
-                loss = criterion(output, target)
+                loss = criterion(output, target.astype("int64"))
            scaled = scaler.scale(loss)
            scaled.backward()
            scaler.minimize(optimizer, scaled)
        else:
            output = model(image)
-            loss = criterion(output, target)
+            loss = criterion(output, target.astype("int64"))
            loss.backward()
            optimizer.step()

@@ -94,10 +94,10 @@ def evaluate(model, criterion, data_loader, print_freq=100, amp_level=None):
            if amp_level is not None:
                with paddle.amp.auto_cast(level=amp_level):
                    output = model(image)
-                    loss = criterion(output, target)
+                    loss = criterion(output, target.astype("int64"))
            else:
                output = model(image)
-                loss = criterion(output, target)
+                loss = criterion(output, target.astype("int64"))

            acc1, acc5 = utils.accuracy(output, target, topk=(1, 5))
            # FIXME need to take into account that the datasets
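
Reviewer note on the `astype("int64")` casts in the hunks above: the width of a platform-default integer can differ between Windows and Linux, which is why the explicit cast makes the label type predictable. The sketch below illustrates the idea with the standard-library `array` module rather than Paddle tensors; `to_int64` is a hypothetical helper, and the analogy to the tensor cast is an assumption for illustration only.

```python
from array import array


def to_int64(labels):
    """Copy integer labels into an array of explicit 64-bit signed ints ('q')."""
    return array("q", [int(x) for x in labels])


# 'l' follows the platform's C long: 4 bytes on Windows, 8 on most 64-bit Linux.
platform_labels = array("l", [3, 1, 2])
labels64 = to_int64(platform_labels)

# The first number depends on the platform; the second is always 8.
print(platform_labels.itemsize, labels64.itemsize)
```

The values are unchanged by the conversion; only the storage width becomes fixed, which is the property the loss function expects of its labels.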

--- a/tutorials/mobilenetv3_prod/Step6/utils.py
+++ b/tutorials/mobilenetv3_prod/Step6/utils.py
@@ -144,7 +144,7 @@ def accuracy(output, target, topk=(1, )):

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
-        correct = pred.equal(target)
+        correct = pred.equal(target.astype("int64"))

        res = []
        for k in topk:
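
Reviewer sketch of what the `accuracy` helper computes: for each `k`, the fraction of samples whose true label is among the `k` highest-scoring classes. This is a pure-Python illustration, not the repository's Paddle code; `topk_accuracy` and the sample data are hypothetical. Comparing labels as plain Python ints sidesteps the int32/int64 mismatch that the cast in the hunk above addresses.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """Fraction of samples whose true label is among the k highest scores."""
    results = []
    for k in topk:
        hits = 0
        for row, label in zip(scores, targets):
            # Class indices sorted by score, highest first, truncated to k.
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
            # Compare as plain ints so the label's storage width is irrelevant.
            hits += int(int(label) in ranked)
        results.append(hits / len(targets))
    return results


# Three samples, three classes; labels are the true class indices.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
print(topk_accuracy(scores, targets, topk=(1, 2)))
```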

--- a/tutorials/tipc/README.md
+++ b/tutorials/tipc/README.md
@@ -36,4 +36,4 @@
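    - Linux XPU2 basic training and inference development guide (coming soon)
    - [Linux DCU basic training and inference development guide](./linux_dcu_train_infer_python/README.md)
    - Linux NPU basic training and inference development guide (coming soon)
-    - Windows GPU basic training and inference development guide (coming soon)
+    - [Windows GPU/CPU basic training and inference development guide](./windows_train_infer_python/README.md)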
--- a/tutorials/tipc/windows_train_infer_python/README.md
+++ b/tutorials/tipc/windows_train_infer_python/README.md
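-# Windows basic training and inference development guide
+# Windows GPU/CPU basic training and inference development guide

 # Contents

 - [1. Introduction](#1)
- [2. Windows basic training and inference feature development and specifications](#2)
-    - [2.1 Development workflow](#2.1)
-    - [2.2 Verification points](#2.2)
- [3. Windows basic training and inference test development and specifications](#3)
-    - [3.1 Development workflow](#3.1)
-    - [3.2 Verification points](#3.2)
+- [2. Windows GPU/CPU basic training and inference feature development and specifications](#2)
+- [3. Windows GPU/CPU basic training and inference test development and specifications](#3)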
+
+<a name="1"></a>
+
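+## 1. Introduction
+
+The main steps of Windows GPU/CPU basic training and inference development are the same as in [Linux GPU/CPU basic training and inference development](../train_infer_python/README.md). For the corresponding mobilenet_v3_small example, see [MobileNetV3](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/docs/windows_train_infer_python.md).
+
+**Notes:**
+* Windows supports only single-GPU training and prediction, so set the environment variable with `set CUDA_VISIBLE_DEVICES=0`.
+* On Windows, integer data defaults to int32 (for example, NumPy arrays built from Python ints in NumPy versions before 2.0), so some function calls fail with an error such as "it holds int, but desires to be int64_t". In that case, call `astype("int64")` explicitly to convert the input to int64.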
+
+<a name="2"></a>
+
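+## 2. Windows GPU/CPU basic training and inference feature development and specifications
+
+See the [Linux GPU/CPU basic training and inference development guide](../train_infer_python/README.md).
+
+<a name="3"></a>
+
+## 3. Windows GPU/CPU basic training and inference test development and specifications
+
+See the [Linux GPU/CPU basic training and inference test development specification](../train_infer_python/test_train_infer_python.md).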
\ No newline at end of file