diff --git a/model_zoo/official/cv/inceptionv3/README.md b/model_zoo/official/cv/inceptionv3/README.md
index 0d84497ac5b62f911b208df181c2740e15ee7eda..5ebbef82302e7b2c6adca1dd45fda129db25a2b4 100644
--- a/model_zoo/official/cv/inceptionv3/README.md
+++ b/model_zoo/official/cv/inceptionv3/README.md
@@ -1,23 +1,77 @@
-# Inception-v3 Example
+# Contents
 
-## Description
+- [InceptionV3 Description](#inceptionv3-description)
+- [Model Architecture](#model-architecture)
+- [Dataset](#dataset)
+- [Features](#features)
+    - [Mixed Precision](#mixed-precision-ascend)
+- [Environment Requirements](#environment-requirements)
+- [Script Description](#script-description)
+    - [Script and Sample Code](#script-and-sample-code)
+    - [Script Parameters](#script-parameters)
+    - [Training Process](#training-process)
+    - [Eval Process](#eval-process)
+- [Model Description](#model-description)
+    - [Performance](#performance)
+        - [Training Performance](#training-performance)
+        - [Inference Performance](#inference-performance)
+- [Description of Random Situation](#description-of-random-situation)
+- [ModelZoo Homepage](#modelzoo-homepage)
 
-This is an example of training Inception-v3 in MindSpore.
+# [InceptionV3 Description](#contents)
 
-## Requirements
+InceptionV3 by Google is the third version in a series of Deep Learning Convolutional Architectures.
 
-- Install [Mindspore](http://www.mindspore.cn/install/en).
-- Downlaod the dataset.
+[Paper](https://arxiv.org/pdf/1512.00567.pdf) Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision[J]. 2015.
 
-## Structure
+# [Model architecture](#contents)
+
+The overall network architecture of InceptionV3 is shown in the paper:
+
+[Link](https://arxiv.org/pdf/1512.00567.pdf)
+
+# [Dataset](#contents)
+
+The dataset used is the 1000-class ImageNet dataset described in the paper.
+
+- Dataset size: ~125G, 1.2M colorful images in 1000 classes
+    - Train: 120G, 1.2M images
+    - Test: 5G, 50000 images
+- Data format: RGB images.
+    - Note: Data will be processed in src/dataset.py
+
+# [Features](#contents)
+
+## [Mixed Precision (Ascend)](#contents)
+
+The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both single-precision and half-precision data formats, while maintaining the network accuracy achieved with single-precision training. Mixed precision training accelerates computation, reduces memory usage, and enables a larger model or batch size to be trained on specific hardware.
+For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically run them at reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for 'reduce precision'.
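+
+As a hedged illustration only (this block is not taken from this repository's train.py), MindSpore exposes mixed precision through the `amp_level` argument of `Model`; the sketch below assumes generic `net`, `loss`, and `opt` objects built elsewhere:
+
+```python
+from mindspore.train.model import Model
+
+# net/loss/opt are placeholders for an InceptionV3 network, a softmax
+# cross-entropy loss and an RMSProp optimizer, respectively.
+model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
+              amp_level="O3")  # "O3" runs the network in FP16; "O0" keeps pure FP32
+```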
+
+# [Environment Requirements](#contents)
+
+- Hardware (Ascend/GPU)
+    - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
+- Framework
+    - [MindSpore](https://www.mindspore.cn/install/en)
+- For more information, please check the resources below:
+    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
+    - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
+
+# [Script description](#contents)
+
+## [Script and sample code](#contents)
 
 ```shell
 .
 └─Inception-v3
   ├─README.md
   ├─scripts
+    ├─run_standalone_train.sh         # launch standalone training with ascend platform(1p)
     ├─run_standalone_train_for_gpu.sh # launch standalone training with gpu platform(1p)
+    ├─run_distribute_train.sh         # launch distributed training with ascend platform(8p)
     ├─run_distribute_train_for_gpu.sh # launch distributed training with gpu platform(8p)
+    ├─run_eval.sh                     # launch evaluating with ascend platform
     └─run_eval_for_gpu.sh             # launch evaluating with gpu platform
   ├─src
     ├─config.py                       # parameter configuration
@@ -30,12 +84,10 @@ This is an example of training Inception-v3 in MindSpore.
   └─train.py                          # train net
 ```
 
+## [Script Parameters](#contents)
 
-## Parameter Configuration
-
-Parameters for both training and evaluating can be set in config.py
-
-```
+```python
+# Major parameters in train.py and config.py are:
 'random_seed': 1,                # fix random seed
 'rank': 0,                       # local rank of distributed
 'group_size': 1,                 # world size of distributed
@@ -59,14 +111,22 @@
 'is_save_on_master': 1           # save checkpoint on rank0, distributed parameters
 ```
 
+## [Training process](#contents)
+
+### Usage
 
-## Running the example
-### Train
+You can start training using python or shell scripts. The usage of shell scripts is as follows:
 
-#### Usage
+- Ascend:
+
+```
+# distribute training example(8p)
+sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
+# standalone training example
+sh run_standalone_train.sh DEVICE_ID DATA_PATH
+```
+
+- GPU:
+
 ```
 # distribute training example(8p)
 sh run_distribute_train_for_gpu.sh DATA_DIR
@@ -74,42 +134,94 @@ sh run_distribute_train_for_gpu.sh DATA_DIR
 # standalone training example
 sh run_standalone_train_for_gpu.sh DEVICE_ID DATA_DIR
 ```
 
-#### Launch
+### Launch
+
+```
+# training example
+  python:
+      Ascend: python train.py --dataset_path /dataset/train --platform Ascend
+      GPU: python train.py --dataset_path /dataset/train --platform GPU
 
-```bash
-# distributed training example(8p) for GPU
-sh scripts/run_distribute_train_for_gpu.sh /dataset/train
-# standalone training example for GPU
-sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train
+  shell:
+      # distributed training example(8p) for GPU
+      sh scripts/run_distribute_train_for_gpu.sh /dataset/train
+      # standalone training example for GPU
+      sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train
 ```
 
-#### Result
+### Result
 
-You can find checkpoint file together with result in log.
+Training results will be stored in the example path. Checkpoints are stored at `./checkpoint` by default, and the training log is redirected to `./log.txt`, which looks like the following.
 
-### Evaluation
+```
+epoch: 0 step: 1251, loss is 5.7787247
+Epoch time: 360760.985, per step time: 288.378
+epoch: 1 step: 1251, loss is 4.392868
+Epoch time: 160917.911, per step time: 128.631
+```
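+
+For reference, a minimal sketch (assuming MindSpore's standard communication API; not a verbatim excerpt from train.py) of how the `rank` and `group_size` parameters in src/config.py are typically filled in for the distributed runs above:
+
+```python
+from mindspore.communication.management import init, get_rank, get_group_size
+
+init()                         # set up the collective communication backend
+rank = get_rank()              # this device's rank, maps to config 'rank'
+group_size = get_group_size()  # number of devices, maps to config 'group_size'
+```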
+
+## [Eval process](#contents)
 
-#### Usage
+### Usage
+
+You can start evaluation using python or shell scripts. The usage of shell scripts is as follows:
+
+- Ascend: sh run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+- GPU: sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+
+### Launch
+
+```
+# eval example
+  python:
+      Ascend: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform Ascend
+      GPU: python eval.py --dataset_path DATA_DIR --checkpoint PATH_CHECKPOINT --platform GPU
+
+  shell:
+      Ascend: sh run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
+      GPU: sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
 ```
-# Evaluation
-sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
-```
 
-#### Launch
+> Checkpoints can be produced during the training process.
+
+### Result
 
-```bash
-# Evaluation with checkpoint
-sh scripts/run_eval_for_gpu.sh 0 /dataset/val ./checkpoint/inceptionv3-rank3-247_1251.ckpt
+Evaluation results will be stored in the example path. You can find results like the following in `log.txt`.
+
+```
+metric: {'Loss': 1.778, 'Top1-Acc': 0.788, 'Top5-Acc': 0.942}
 ```
 
-> checkpoint can be produced in training process.
+# [Model description](#contents)
+
+## [Performance](#contents)
+
+### Training Performance
+
+| Parameters                 | Ascend                                          | GPU                       |
+| -------------------------- | ----------------------------------------------- | ------------------------- |
+| Model Version              | InceptionV3                                     | InceptionV3               |
+| Resource                   | Ascend 910, CPU 2.60GHz 56 cores, memory 314G   | NV SMX2 V100-32G          |
+| Uploaded Date              | 08/21/2020                                      | 08/21/2020                |
+| MindSpore Version          | 0.6.0-beta                                      | 0.6.0-beta                |
+| Training Parameters        | src/config.py                                   | src/config.py             |
+| Optimizer                  | RMSProp                                         | RMSProp                   |
+| Loss Function              | SoftmaxCrossEntropy                             | SoftmaxCrossEntropy       |
+| Outputs                    | probability                                     | probability               |
+| Loss                       | 1.98                                            | 1.98                      |
+| Accuracy                   | ACC1[78.8%] ACC5[94.2%]                         | ACC1[78.7%] ACC5[94.1%]   |
+| Total time                 | 11h                                             | 72h                       |
+| Params (M)                 | 103M                                            | 103M                      |
+| Checkpoint for Fine tuning | 313M                                            | 312.41M                   |
+| Model for inference        |                                                 |                           |
 
-#### Result
+### Inference Performance
 
-Evaluation result will be stored in the scripts path. Under this, you can find result like the followings in log.
+To be added.
+
+# [Description of Random Situation](#contents)
+
+In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py.
+
+# [ModelZoo Homepage](#contents)
 
-```
-acc=78.75%(TOP1)
-acc=94.07%(TOP5)
-```
\ No newline at end of file
+Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
\ No newline at end of file
diff --git a/model_zoo/official/cv/inceptionv3/src/config.py b/model_zoo/official/cv/inceptionv3/src/config.py
index 262aa20b7f1ffdd28bf866347b3d236b82d645b4..7c765e733bffea90029c594b9ef7af2c9d054acc 100644
--- a/model_zoo/official/cv/inceptionv3/src/config.py
+++ b/model_zoo/official/cv/inceptionv3/src/config.py
@@ -64,7 +64,7 @@ config_ascend = edict({
     'weight_decay': 0.00004,
     'momentum': 0.9,
     'opt_eps': 1.0,
-    'keep_checkpoint_max': 100,
+    'keep_checkpoint_max': 10,
     'ckpt_path': './checkpoint/',
     'is_save_on_master': 0,
     'dropout_keep_prob': 0.8,
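
A minimal sketch of how `keep_checkpoint_max` is consumed, assuming the standard MindSpore `CheckpointConfig`/`ModelCheckpoint` API and the 1251 steps per epoch seen in the training log above (not a verbatim excerpt from train.py):

```python
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig

# Keep at most 10 checkpoint files on disk; older ones are deleted automatically.
ckpt_config = CheckpointConfig(save_checkpoint_steps=1251, keep_checkpoint_max=10)
ckpt_cb = ModelCheckpoint(prefix="inceptionv3", directory="./checkpoint/",
                          config=ckpt_config)
# ckpt_cb would then be passed in the callback list of model.train(...).
```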