Commit 6149c506 authored by: Wei Shengyu

Merge branch 'PaddlePaddle:develop_reg' into develop_reg

Global:
  infer_imgs: "./dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg"
  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/"
  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"
  batch_size: 1
@@ -28,6 +28,8 @@ from vector_search import Graph_Index
from utils import logger
from utils import config
from utils.get_image_list import get_image_list
from utils.draw_bbox import draw_bbox_results
class SystemPredictor(object):
    def __init__(self, config):
@@ -40,7 +42,8 @@ class SystemPredictor(object):
        self.return_k = self.config['IndexProcess']['return_k']
        self.search_budget = self.config['IndexProcess']['search_budget']

        self.Searcher = Graph_Index(
            dist_type=config['IndexProcess']['dist_type'])
        self.Searcher.load(config['IndexProcess']['index_path'])
    def predict(self, img):
@@ -53,13 +56,16 @@ class SystemPredictor(object):
        rec_results = self.rec_predictor.predict(crop_img)
        #preds["feature"] = rec_results
        preds["bbox"] = [xmin, ymin, xmax, ymax]
        scores, docs = self.Searcher.search(
            query=rec_results,
            return_k=self.return_k,
            search_budget=self.search_budget)
        preds["rec_docs"] = docs
        preds["rec_scores"] = scores

        output.append(preds)
        return output
def main(config):
    system_predictor = SystemPredictor(config)
@@ -69,6 +75,7 @@ def main(config):
    for idx, image_file in enumerate(image_list):
        img = cv2.imread(image_file)[:, :, ::-1]
        output = system_predictor.predict(img)
        draw_bbox_results(img[:, :, ::-1], output, image_file)
        print(output)
    return
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
import cv2
def draw_bbox_results(image, results, input_path, save_dir=None):
    for result in results:
        [xmin, ymin, xmax, ymax] = result["bbox"]
        image = cv2.rectangle(image, (xmin, ymin), (xmax, ymax),
                              (0, 255, 0), 2)

    image_name = os.path.basename(input_path)
    if save_dir is None:
        save_dir = "output"
    os.makedirs(save_dir, exist_ok=True)

    output_path = os.path.join(save_dir, image_name)
    cv2.imwrite(output_path, image)
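As a quick sanity check, `draw_bbox_results` can be exercised on a synthetic image; the file name and box below are made up for illustration:

```python
import numpy as np

# Hypothetical usage: a blank 300x300 image with one fake detection box.
dummy_image = np.zeros((300, 300, 3), dtype=np.uint8)
fake_results = [{"bbox": [50, 60, 200, 220]}]
draw_bbox_results(dummy_image, fake_results, "demo.jpg", save_dir="output")
# The annotated image is written to output/demo.jpg
```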
CXX=g++
ifeq ($(OS),Windows_NT)
postfix=dll
else
postfix=so
endif
all : index

index : src/config.h src/graph.h src/data.h interface.cc
	${CXX} -shared -fPIC interface.cc -o index.${postfix} -std=c++11 -Ofast -march=native -g -flto -funroll-loops -DOMP -fopenmp

clean :
	rm index.${postfix}
# Vector Search

## 1. Introduction

Some vertical-domain recognition tasks (such as vehicles and products) involve a very large number of classes, so a retrieval-based approach is usually adopted: the query vector is matched against the gallery vectors with a fast nearest-neighbor search to obtain the predicted class. The vector search module provides the basic approximate nearest neighbor search algorithm. It is based on Möbius, a graph-based approximate nearest neighbor algorithm developed by Baidu for maximum inner product search (MIPS). The module provides a Python interface, supports numpy and tensor inputs, and supports L2 and inner-product distance computation.
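For intuition, exact MIPS is simply an argmax of inner products over the whole gallery; the graph index approximates the following brute-force computation at far lower cost (a minimal numpy sketch, not the module's actual implementation):

```python
import numpy as np


def brute_force_mips(gallery, query, return_k=10):
    """Exact maximum inner product search, for reference only."""
    scores = gallery @ query              # inner product with every gallery vector
    top_ids = np.argsort(-scores)[:return_k]
    return scores[top_ids], top_ids


gallery = np.random.rand(1000, 128).astype(np.float32)
query = np.random.rand(128).astype(np.float32)
scores, ids = brute_force_mips(gallery, query)
```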
@@ -10,35 +9,67 @@ Details of the Möbius algorithm are described in the paper ([Möbius Transformation for Fast Inner Produc
## 2. Installation
### 2.1 Use the prebuilt library files directly

This folder contains a prebuilt `index.so` (compiled with gcc 8.2.0, for Linux) and `index.dll` (compiled with gcc 10.3.0, for Windows), so you can skip sections 2.2 and 2.3 and use them directly.

If the library files cannot be used because your gcc version is too old or your environment is incompatible, you need to build the library manually on your platform.

**Note:**

Make sure your C++ compiler supports the C++11 standard.
### 2.2 Build the library on Linux

Run the following commands to install gcc and g++.

```shell
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install build-essential gcc g++
```

You can check the gcc version with `gcc -v`.

Enter this folder and simply run `make`. To regenerate the `index.so` file, first run `make clean` to remove the previously built artifacts, then run `make` again to produce an updated library.
### 2.3 Build the library on Windows

On Windows you first need a gcc toolchain. We recommend [TDM-GCC](https://jmeubank.github.io/tdm-gcc/articles/2020-03/9.2.0-release); pick a suitable version on the official site. We recommend downloading [tdm64-gcc-10.3.0-2.exe](https://github.com/jmeubank/tdm-gcc/releases/download/v10.3.0-tdm64-2/tdm64-gcc-10.3.0-2.exe).

After the download finishes, follow the default installation steps. Three points deserve attention:

1. The vector search module depends on OpenMP, so when the installer reaches the `choose components` step, make sure the `openmp` option is checked; otherwise compilation will later fail with `libgomp.spec: No such file or directory` ([reference](https://github.com/dmlc/xgboost/issues/1027)).
2. The installer asks whether to add the toolchain to the system environment variables; we recommend checking this option, otherwise you will have to add the variables manually later.
3. The build command is `make` on Linux but `mingw32-make` on Windows; keep this distinction in mind.

After installation, open a terminal and check the gcc version with `gcc -v`.

In this folder, run `mingw32-make` to produce the `index.dll` library. To regenerate `index.dll`, first run `mingw32-make clean` to remove the previously built artifacts, then run `mingw32-make` again.
## 3. Quick Start

```python
import numpy as np
from interface import Graph_Index

# Generate random samples
index_vectors = np.random.rand(100000, 128).astype(np.float32)
query_vector = np.random.rand(128).astype(np.float32)
index_docs = ["ID_" + str(i) for i in range(100000)]

# Initialize the index structure
indexer = Graph_Index(dist_type="IP")  # supports "IP" and "L2"
indexer.build(gallery_vectors=index_vectors, gallery_docs=index_docs, pq_size=100, index_path='test')

# Search
scores, docs = indexer.search(query=query_vector, return_k=10, search_budget=100)
print(scores)
print(docs)

# Save and load
indexer.dump(index_path="test")
indexer.load(index_path="test")
```
@@ -18,48 +18,77 @@ import numpy.ctypeslib as ctl
import numpy as np
import os
import json
import platform
from ctypes import *
from numpy.ctypeslib import ndpointer
__dir__ = os.path.dirname(os.path.abspath(__file__))
if platform.system() == "Windows":
    lib_filename = "index.dll"
else:
    lib_filename = "index.so"
so_path = os.path.join(__dir__, lib_filename)
lib = ctypes.cdll.LoadLibrary(so_path)
class IndexContext(Structure):
    _fields_ = [("graph", c_void_p), ("data", c_void_p)]
# for mobius IP index
build_mobius_index = lib.build_mobius_index
build_mobius_index.restype = None
build_mobius_index.argtypes = [
    ctl.ndpointer(
        np.float32, flags='aligned, c_contiguous'), ctypes.c_int, ctypes.c_int,
    ctypes.c_int, ctypes.c_double, ctypes.c_char_p
]
search_mobius_index = lib.search_mobius_index
search_mobius_index.restype = None
search_mobius_index.argtypes = [
    ctl.ndpointer(
        np.float32, flags='aligned, c_contiguous'), ctypes.c_int, ctypes.c_int,
    ctypes.c_int, POINTER(IndexContext), ctl.ndpointer(
        np.uint64, flags='aligned, c_contiguous'), ctl.ndpointer(
        np.float64, flags='aligned, c_contiguous')
]
load_mobius_index_prefix = lib.load_mobius_index_prefix
load_mobius_index_prefix.restype = None
load_mobius_index_prefix.argtypes = [
    ctypes.c_int, ctypes.c_int, POINTER(IndexContext), ctypes.c_char_p
]
save_mobius_index_prefix = lib.save_mobius_index_prefix
save_mobius_index_prefix.restype = None
save_mobius_index_prefix.argtypes = [POINTER(IndexContext), ctypes.c_char_p]
# for L2 index
build_l2_index = lib.build_l2_index
build_l2_index.restype = None
build_l2_index.argtypes = [
    ctl.ndpointer(
        np.float32, flags='aligned, c_contiguous'), ctypes.c_int, ctypes.c_int,
    ctypes.c_int, ctypes.c_char_p
]
search_l2_index = lib.search_l2_index
search_l2_index.restype = None
search_l2_index.argtypes = [
    ctl.ndpointer(
        np.float32, flags='aligned, c_contiguous'), ctypes.c_int, ctypes.c_int,
    ctypes.c_int, POINTER(IndexContext), ctl.ndpointer(
        np.uint64, flags='aligned, c_contiguous'), ctl.ndpointer(
        np.float64, flags='aligned, c_contiguous')
]
load_l2_index_prefix = lib.load_l2_index_prefix
load_l2_index_prefix.restype = None
load_l2_index_prefix.argtypes = [
    ctypes.c_int, ctypes.c_int, POINTER(IndexContext), ctypes.c_char_p
]
save_l2_index_prefix = lib.save_l2_index_prefix
save_l2_index_prefix.restype = None
@@ -70,51 +99,68 @@ release_context.restype = None
release_context.argtypes = [POINTER(IndexContext)]
class Graph_Index(object):
    """
    graph index
    """

    def __init__(self, dist_type="IP"):
        self.dim = 0
        self.total_num = 0
        self.dist_type = dist_type
        self.mobius_pow = 2.0
        self.index_context = IndexContext(0, 0)
        self.gallery_doc_dict = {}
        self.with_attr = False
        assert dist_type in ["IP", "L2"], "Only support IP and L2 distance ..."
    def build(self,
              gallery_vectors,
              gallery_docs=[],
              pq_size=100,
              index_path='graph_index/'):
"""
build index
"""
if paddle.is_tensor(gallery_vectors):
gallery_vectors = gallery_vectors.numpy()
gallery_vectors = gallery_vectors.numpy()
assert gallery_vectors.ndim == 2, "Input vector must be 2D ..."
self.total_num = gallery_vectors.shape[0]
self.dim = gallery_vectors.shape[1]
assert (len(gallery_docs) == self.total_num if len(gallery_docs)>0 else True)
print("training index -> num: {}, dim: {}, dist_type: {}".format(self.total_num, self.dim, self.dist_type))
assert (len(gallery_docs) == self.total_num
if len(gallery_docs) > 0 else True)
print("training index -> num: {}, dim: {}, dist_type: {}".format(
self.total_num, self.dim, self.dist_type))
        if not os.path.exists(index_path):
            os.makedirs(index_path)

        if self.dist_type == "IP":
            build_mobius_index(
                gallery_vectors, self.total_num, self.dim, pq_size,
                self.mobius_pow,
                create_string_buffer((index_path + "/index").encode('utf-8')))
            load_mobius_index_prefix(
                self.total_num, self.dim,
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))
        else:
            build_l2_index(
                gallery_vectors, self.total_num, self.dim, pq_size,
                create_string_buffer((index_path + "/index").encode('utf-8')))
            load_l2_index_prefix(
                self.total_num, self.dim,
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))

        self.gallery_doc_dict = {}
        if len(gallery_docs) > 0:
            self.with_attr = True
            for i in range(gallery_vectors.shape[0]):
                self.gallery_doc_dict[str(i)] = gallery_docs[i]

        self.gallery_doc_dict["total_num"] = self.total_num
        self.gallery_doc_dict["dim"] = self.dim
@@ -134,15 +180,19 @@ class Graph_Index(object):
        ret_score = np.zeros(return_k, dtype=np.float64)

        if paddle.is_tensor(query):
            query = query.numpy()
        if self.dist_type == "IP":
            search_mobius_index(query, self.dim, search_budget, return_k,
                                ctypes.byref(self.index_context), ret_id,
                                ret_score)
        else:
            search_l2_index(query, self.dim, search_budget, return_k,
                            ctypes.byref(self.index_context), ret_id,
                            ret_score)
        ret_id = ret_id.tolist()
        ret_doc = []
        if self.with_attr:
            for i in range(return_k):
                ret_doc.append(self.gallery_doc_dict[str(ret_id[i])])
            return ret_score, ret_doc
@@ -155,28 +205,35 @@ class Graph_Index(object):
            os.makedirs(index_path)

        if self.dist_type == "IP":
            save_mobius_index_prefix(
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))
        else:
            save_l2_index_prefix(
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))

        with open(index_path + "/info.json", "w") as f:
            json.dump(self.gallery_doc_dict, f)
    def load(self, index_path):
        self.gallery_doc_dict = {}

        with open(index_path + "/info.json", "r") as f:
            self.gallery_doc_dict = json.load(f)

        self.total_num = self.gallery_doc_dict["total_num"]
        self.dim = self.gallery_doc_dict["dim"]
        self.dist_type = self.gallery_doc_dict["dist_type"]
        self.with_attr = self.gallery_doc_dict["with_attr"]

        if self.dist_type == "IP":
            load_mobius_index_prefix(
                self.total_num, self.dim,
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))
        else:
            load_l2_index_prefix(
                self.total_num, self.dim,
                ctypes.byref(self.index_context),
                create_string_buffer((index_path + "/index").encode('utf-8')))
# Installation
---
This document introduces how to install PaddleClas and its requirements.

## 1. Install PaddlePaddle

`PaddlePaddle 2.0` or later is required for PaddleClas. You can use the following steps to install PaddlePaddle.
### 1.1 Environment requirements
- python 3.x
- cuda >= 10.1 (necessary if you want to use paddlepaddle-gpu)
- cudnn >= 7.6.4 (necessary if you want to use paddlepaddle-gpu)
- nccl >= 2.1.2 (necessary if you want to use distributed training/eval)
- gcc >= 8.2

Docker is recommended to run PaddleClas; for more detailed information about docker and nvidia-docker, you can refer to the [tutorial](https://www.runoob.com/docker/docker-tutorial.html).
When you use cuda10.1, the driver version needs to be no less than 418.39. When you use cuda10.2, the driver version needs to be no less than 440.33. For more cuda versions and specific driver versions, you can refer to the [link](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
If you do not want to use docker, you can skip section 1.2 and go to section 1.3 directly.
### 1.2 (Recommended) Prepare a docker environment

The first time you use the docker image, it will be downloaded automatically; please be patient.
```
# Switch to the working directory
cd /home/Projects
# You need to create a docker container for the first run, and do not need to run the current command when you run it again
# Create a docker container named ppcls and map the current directory to the /paddle directory of the container
# It is recommended to set a shared memory greater than or equal to 8G through the --shm-size parameter
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
# Use the following command to create a container if you want to use GPU in the container
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get more docker images.
```
# Use ctrl+P+Q to exit docker; to re-enter the container, use the following command:
sudo docker container exec -it ppcls /bin/bash
```
### 1.3 Install PaddlePaddle using pip
If you want to use PaddlePaddle on GPU, you can use the following command to install PaddlePaddle.
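For example, to install the latest GPU build from the Baidu pypi mirror (the same mirror used below for the CPU build):

```bash
pip3 install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
```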
@@ -25,11 +58,11 @@ If you want to use PaddlePaddle on CPU, you can use the following command to ins
pip3 install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple
```
**Note:**

* If you have already installed the CPU version of PaddlePaddle and want to use the GPU version now, you should uninstall the CPU version of PaddlePaddle before installing the GPU version, to avoid package confusion.
* You can also compile PaddlePaddle from source code; please refer to the [PaddlePaddle Installation tutorial](http://www.paddlepaddle.org.cn/install/quick) for more compilation options.
### 1.4 Verify Installation
```python
import paddle
@@ -43,14 +76,15 @@ python3 -c "import paddle; print(paddle.__version__)"
```
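As an additional check, PaddlePaddle ships a built-in self-test; `paddle.utils.run_check()` is available in PaddlePaddle 2.0 and later (this snippet is an optional extra, not part of the steps above):

```python
import paddle

# Verifies that PaddlePaddle is installed and can run on the current device.
paddle.utils.run_check()
```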
Note:
- Make sure the compiled source code is later than PaddlePaddle 2.0.
- Indicate **WITH_DISTRIBUTE=ON** when compiling. Please refer to [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
- When running in docker, in order to ensure that the container has enough shared memory for dataloader acceleration of Paddle, please set the parameter `--shm-size=8g` when creating the docker container; if conditions permit, you can set it to a larger value.
## 2. Install PaddleClas

### 2.1 Clone PaddleClas source code
```
git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
@@ -62,16 +96,10 @@ If it is too slow for you to download from github, you can download PaddleClas f
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
```
### 2.2 Install requirements

PaddleClas dependencies are listed in file `requirements.txt`. You can use the following command to install the dependencies.

```
pip3 install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
```
# Quick Start of Recognition
This tutorial contains 3 parts: Environment Preparation, Image Recognition Experience, and Unknown Category Image Recognition Experience.
If the image category already exists in the image index database, you can take a reference to chapter [Image Recognition Experience](#image_recognition_experience) to complete the process of image recognition. If you wish to recognize images of unknown categories, which are not included in the index database, you can take a reference to chapter [Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience) to complete the process of creating an index and recognizing them.
## Catalogue
* [1. Environment Preparation](#enviroment_preparation)
* [2. Image Recognition Experience](#image_recognition_experience)
* [2.1 Download and Unzip the Inference Model and Demo Data](#download_and_unzip_the_inference_model_and_demo_data)
* [2.2 Logo Recognition and Retrieval](#Logo_recognition_and_retrival)
* [2.2.1 Single Image Recognition](#recognition_of_single_image)
* [2.2.2 Folder-based Batch Recognition](#folder_based_batch_recognition)
* [3. Unknown Category Image Recognition Experience](#unkonw_category_image_recognition_experience)
* [3.1 Build the Base Library Based on Your Own Dataset](#build_the_base_library_based_on_your_own_dataset)
* [3.2 Recognize the Unknown Category Images](#Image_differentiation_based_on_the_new_index_library)
<a name="enviroment_preparation"></a>
## 1. Environment Preparation
* Installation: Please take a reference to [Quick Installation](./installation.md) to configure the PaddleClas environment.
* Use the following command to enter the `deploy` folder. All content and commands in this section need to be run in the `deploy` folder.
```
cd deploy
```
<a name="image_recognition_experience"></a>
## 2. Image Recognition Experience
For each of the 4 directions (Logo, Cartoon Face, Vehicle, Product), the table below lists the detection model and recognition inference model, the download address of the test data, and the corresponding configuration files.
| Models Introduction | Recommended Scenarios | Test Data Address | inference Model | Predict Config File | Config File to Build Index Database |
| ------------ | ------------- | ------- | -------- | ------- | -------- |
| Generic mainbody detection model | General Scenarios | - |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - |
| Logo Recognition Model | Logo Scenario | [Data Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar) | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | [build_logo.yaml](../../../deploy/configs/build_logo.yaml) |
| Cartoon Face Recognition Model| Cartoon Face Scenario | [Data Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/cartoon_demo_data_v1.0.tar) | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | [build_cartoon.yaml](../../../deploy/configs/build_cartoon.yaml) |
| Vehicle Subclassification Model | Vehicle Scenario | [Data Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/vehicle_demo_data_v1.0.tar) | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | [build_vehicle.yaml](../../../deploy/configs/build_vehicle.yaml) |
| Product Recognition Model | Product Scenario | [Data Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar) | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_Inshop_v1.0_infer.tar) | [inference_inshop.yaml](../../../deploy/configs/) | [build_inshop.yaml](../../../deploy/configs/build_inshop.yaml) |
**Attention**: If wget is not installed on Windows, you can download the model by copying the link into your browser and unzip it in the appropriate folder; for Linux or macOS users, you can right-click to copy the download link and download it via the `wget` command.
* You can download and unzip the data and models by following the command below
```shell
mkdir dataset
cd dataset
# Download the demo data and unzip
wget {Data download link} && tar -xf {Name of the tar archive}
cd ..
mkdir models
cd models
# Download and unzip the inference model
wget {Models download link} && tar -xf {Name of the tar archive}
cd ..
```
<a name="download_and_unzip_the_inference_model_and_demo_data"></a>
### 2.1 Download and Unzip the Inference Model and Demo Data
Take the Logo recognition as an example, download the detection model, recognition model and Logo recognition demo data with the following commands.
```shell
mkdir models
cd models
# Download the generic detection inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# Download and unpack the inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
cd ..
mkdir dataset
cd dataset
# Download the demo data and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
cd ..
```
Once unpacked, the `dataset` folder should have the following file structure.
```
├── logo_demo_data_v1.0
│ ├── data_file.txt
│ ├── gallery
│ ├── index
│ └── query
├── ...
```
The `data_file.txt` is the image list used to build the index database, the `gallery` folder contains all the original images used to build the index database, the `index` folder contains the index files generated by building the index database, and the `query` folder contains the demo images used to test the recognition effect.
The `models` folder should have the following file structure.
```
├── logo_rec_ResNet50_Logo3K_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
<a name="Logo_recognition_and_retrival"></a>
### 2.2 Logo Recognition and Retrieval
Take the Logo recognition demo as an example to show the recognition and retrieval process. (If you wish to try recognition and retrieval in other scenarios, download and unzip the corresponding demo data and model, and replace the corresponding configuration file to complete the prediction.)
<a name="recognition_of_single_image"></a>
#### 2.2.1 Single Image Recognition
Run the following command to perform recognition and retrieval on the image `./dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg`:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" />
</div>
The final output is shown below.
```
[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
```
where `bbox` indicates the location of the detected subject, `rec_docs` indicates the labels of the images in the index database that are most similar to the detected subject, and `rec_scores` indicates the corresponding similarity scores.
<a name="folder_based_batch_recognition"></a>
#### 2.2.2 Folder-based Batch Recognition
If you want to predict the images in the folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or you can also modify the corresponding configuration through the following `-o` parameter.
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
```
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index database can be changed by modifying the `IndexProcess.index_path` field.
<a name="unkonw_category_image_recognition_experience"></a>
## 3. Recognize Images of Unknown Category
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" />
</div>
The output is as follows:
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
```
Since the corresponding index database does not contain this category, the recognition result is wrong. In this case, we can recognize images of unknown categories by building a new index database.

When the index database does not cover the scenes we actually want to recognize, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index database. This enables recognition of images of unknown categories without retraining.
<a name="build_the_base_library_based_on_your_own_dataset"></a>
### 3.1 Build the Base Library Based on Your Own Dataset
First, you need to obtain the original images to store in the database (stored in `./dataset/logo_demo_data_v1.0/gallery`) and the corresponding label information recording the category of each original image (stored in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`).
Then use the following command to build the index to accelerate the retrieval process after recognition.
```shell
python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
```
Finally, the new index information is stored in the folder `./dataset/logo_demo_data_v1.0/index_update`. Use the new index database for the retrieval below.
<a name="Image_differentiation_based_on_the_new_index_library"></a>
### 3.2 Recognize the Unknown Category Images
To recognize the image `./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`, run the command as follows.
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
```
The output is as follows:
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
```
The recognition result is correct.
# Feature Learning

This section describes the training mode of `RecModel`. The `RecModel` training mode mainly supports feature-learning applications such as vehicle recognition (fine-grained vehicle classification, ReID), logo recognition, cartoon character recognition, and product recognition. Unlike training an ordinary classification network on `ImageNet`, this training mode has the following characteristics:

- The output of the `backbone` can be truncated, i.e. features can be extracted from any intermediate layer
- Configurable network layers (the `Neck` part) can be added after the feature output layer of the `backbone`
- `ArcMargin` and other `metric learning` loss functions are supported to improve the feature-learning capability
## Description of the yaml file

The `RecModel` training mode is configured similarly to ordinary classification training. The configuration file mainly consists of the following parts:

### 1 Global settings

```yaml
Global:
  # if null, train from scratch; specify the path of a saved training state to resume training
  checkpoints: null
  # path of the pretrained model, or a bool
  pretrained_model: null
  # model save path
  output_dir: "./output/"
  device: "gpu"
  class_num: 30671
  # model save granularity: save once per epoch
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  # number of training epochs
  epochs: 160
  # log frequency
  print_batch_step: 10
  # whether to use the visualdl library
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: "./inference"
  # evaluate in retrieval mode
  eval_mode: "retrieval"
```
### 2 Data

```yaml
DataLoader:
  Train:
    dataset:
      # name of the Dataset class to use
      name: "VeriWild"
      # dataset-specific arguments
      image_root: "./dataset/VeRI-Wild/images/"
      cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
      # data augmentation strategies: ResizeImage, RandFlipImage, etc.
      transform_ops:
        - ResizeImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AugMix:
            prob: 0.5
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 0.5
            sl: 0.02
            sh: 0.4
            r1: 0.3
            mean: [0., 0., 0.]
    sampler:
      name: DistributedRandomIdentitySampler
      batch_size: 128
      num_instances: 2
      drop_last: False
      shuffle: True
    loader:
      num_workers: 6
      use_shared_memory: False
```
The `val dataset` settings are basically the same as the `train dataset`, except for the data augmentation strategies; a minimal sketch of the Eval section is given below.
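A hedged sketch of the corresponding Eval section, reusing the dataset class and normalization from the Train section above (the evaluation list file name is illustrative, not taken from the original config):

```yaml
  Eval:
    dataset:
      name: "VeriWild"
      image_root: "./dataset/VeRI-Wild/images/"
      # illustrative file name; substitute your own evaluation list
      cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_list.txt"
      transform_ops:
        - ResizeImage:
            size: 224
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
```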
### 3 Backbone settings

```yaml
Arch:
  # train in RecModel mode
  name: "RecModel"
  # settings for exporting the inference model
  infer_output_key: "features"
  infer_add_softmax: False
  # the Backbone to use
  Backbone:
    name: "ResNet50"
    pretrained: True
  # use this layer as the feature output of the Backbone; name is the full_name of a ResNet50 layer
  BackboneStopLayer:
    name: "adaptive_avg_pool2d_0"
  # extra layers added on top of the Backbone; this model adds a 1x1 convolution (embedding) layer
  Neck:
    name: "VehicleNeck"
    in_channels: 2048
    out_channels: 512
  # add ArcMargin, the implementation of ArcLoss
  Head:
    name: "ArcMargin"
    embedding_size: 512
    class_num: 431
    margin: 0.15
    scale: 32
```
The `Neck` part consists of network layers added on top of the `backbone` and can be extended as needed. For example, in a ReID task, an `embedding` layer with output length 512 can be added here, as sketched below. Note that the `Neck` part must match the output dimension of the `BackboneStopLayer`. In general, the `Neck` part is the final feature output layer of the network.
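For intuition, a 1x1-convolution embedding neck can be sketched in a few lines of Paddle code (`EmbeddingNeck` is a hypothetical name for illustration; the actual `VehicleNeck` implementation in the repository may differ):

```python
import paddle.nn as nn


class EmbeddingNeck(nn.Layer):
    """Maps the backbone's NCHW feature map (e.g. 2048 channels after
    adaptive average pooling) to a flat 512-d embedding via a 1x1 conv."""

    def __init__(self, in_channels=2048, out_channels=512):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=1)
        self.flatten = nn.Flatten(start_axis=1)

    def forward(self, x):
        # x: [N, in_channels, 1, 1] -> [N, out_channels]
        return self.flatten(self.conv(x))
```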
The `Head` part mainly supports specific `metric learning` loss functions such as `ArcMargin` (the fc-layer implementation of [ArcFace Loss](https://arxiv.org/abs/1801.07698)); this part is usually removed after training is complete.
### 4 Loss settings

```yaml
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - SupConLoss:
        weight: 1.0
        # SupConLoss-specific parameter
        views: 2
  Eval:
    - CELoss:
        weight: 1.0
```
During training, `CELoss` and `SupConLoss` are used together with a weight ratio of `1:1`; during evaluation, only `CELoss` is used.
### 5 Optimizer settings

```yaml
Optimizer:
  # optimizer name
  name: Momentum
  # optimizer parameters
  momentum: 0.9
  lr:
    # name of the learning rate schedule
    name: MultiStepDecay
    # parameters of the schedule
    learning_rate: 0.01
    milestones: [30, 60, 70, 80, 90, 100, 120, 140]
    gamma: 0.5
    verbose: False
    last_epoch: -1
  regularizer:
    name: 'L2'
    coeff: 0.0005
```
### 6 Eval Metric settings

```yaml
Metric:
  Eval:
    # use Recallk and mAP as evaluation metrics
    - Recallk:
        topk: [1, 5]
    - mAP: {}
```
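For intuition, Recall@k checks whether any of the top-k retrieved gallery items carries the query's label. A minimal numpy sketch of the metric (for illustration only, not PaddleClas's actual implementation):

```python
import numpy as np


def recall_at_k(ranked_labels, query_labels, k):
    """ranked_labels: [num_queries, gallery_size] array of gallery labels,
    sorted by similarity for each query; query_labels: [num_queries]."""
    hits = [
        query_labels[i] in ranked_labels[i, :k]
        for i in range(len(query_labels))
    ]
    return float(np.mean(hits))


# Toy example: 2 queries, each ranked against a gallery of 4 items.
ranked = np.array([[3, 1, 2, 0], [0, 2, 1, 3]])
labels = np.array([1, 1])
print(recall_at_k(ranked, labels, k=2))  # 0.5
```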
@@ -8,15 +8,15 @@
All hyperparameters and the full configuration: [ResNet50_ReID.yaml](../../../ppcls/configs/Logo/ResNet50_ReID.yaml)
## 1 Dataset and preprocessing

### 1.1 The LogoDet-3K dataset

<img src="../../images/logo/logodet3k.jpg" style="zoom:50%;" />

LogoDet-3K is a fully annotated logo dataset with 3000 logo categories, about 200,000 high-quality manually annotated logo objects, and 158,652 images. For more information, refer to the [original paper](https://arxiv.org/abs/2008.05359).
### 1.2 Data preprocessing

In the original dataset, the images contain annotated detection boxes, and only the logo regions cropped by the detector are considered in the recognition stage. Therefore, the training set is built by cropping out the logo regions using the original annotation boxes, which removes the influence of the background during recognition. The dataset is split into 155,427 training images covering 3000 logo categories (also used as the gallery at test time) and 3225 test images used as the query set. The cropped training set can be [downloaded here](https://arxiv.org/abs/2008.05359). The data augmentation steps include:

- `Resize` images to 224
@@ -25,44 +25,7 @@
- Normalize: normalize to 0~1
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
This is set in the configuration file as follows; see the `transform_ops` section for details:
```yaml
DataLoader:
  Train:
    dataset:
      # name of the Dataset class to use
      name: "LogoDataset"
      # dataset-specific arguments
      image_root: "dataset/LogoDet-3K-crop/train/"
      cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
      # data augmentation strategies: ResizeImage, RandFlipImage, etc.
      transform_ops:
        - ResizeImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - AugMix:
            prob: 0.5
        - NormalizeImage:
            scale: 0.00392157
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - RandomErasing:
            EPSILON: 0.5
    sampler:
      name: DistributedRandomIdentitySampler
      batch_size: 128
      num_instances: 2
      drop_last: False
      shuffle: True
    loader:
      num_workers: 6
      use_shared_memory: False
```
## 2 Backbone settings

`ResNet50` is used as the backbone, with the following modifications:
@@ -74,111 +37,12 @@ DataLoader:
Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)

The Backbone is set in the configuration file as follows:
```yaml
Arch:
  # train with the RecModel mode; ordinary ImageNet and RecModel modes are currently supported
  name: "RecModel"
  # settings for exporting the inference model
  infer_output_key: "features"
  infer_add_softmax: False
  # the Backbone to use
  Backbone:
    name: "ResNet50_last_stage_stride1"
    pretrained: True
  # use this layer as the feature output of the Backbone; name is the full_name of the layer
  BackboneStopLayer:
    name: "adaptive_avg_pool2d_0"
  # extra layers added on top of the Backbone; this model adds a 1x1 convolution (embedding) layer
  Neck:
    name: "VehicleNeck"
    in_channels: 2048
    out_channels: 512
  # add a CircleMargin head
  Head:
    name: "CircleMargin"
    margin: 0.35
    scale: 64
    embedding_size: 512
```
## 3 Loss settings

For logo recognition, [Pairwise Cosface + CircleMargin](https://arxiv.org/abs/2002.10857) are trained jointly with a weight ratio of 1:1.

See the code: [PairwiseCosface](../../../ppcls/loss/pairwisecosface.py) and [CircleMargin](../../../ppcls/arch/gears/circlemargin.py).

The losses are set in the configuration file as follows:
```yaml
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - PairwiseCosface:
        margin: 0.35
        gamma: 64
        weight: 1.0
  Eval:
    - CELoss:
        weight: 1.0
```
## Other settings

### Optimizer settings

```yaml
Optimizer:
  # optimizer name
  name: Momentum
  # optimizer parameters
  momentum: 0.9
  lr:
    # name of the learning rate schedule
    name: Cosine
    # parameters of the schedule
    learning_rate: 0.01
  regularizer:
    name: 'L2'
    coeff: 0.0001
```
### Eval Metric settings

```yaml
Metric:
  Eval:
    # use Recallk and mAP as evaluation metrics
    - Recallk:
        topk: [1, 5]
    - mAP: {}
```
### Other hyperparameters

```yaml
Global:
  # if null, train from scratch; specify the path of a saved training state to resume training
  checkpoints: null
  pretrained_model: null
  output_dir: "./output/"
  device: "gpu"
  class_num: 3000
  # model save granularity: save once per epoch
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  # number of training epochs
  epochs: 120
  # log frequency
  print_batch_step: 10
  # whether to use the visualdl library
  use_visualdl: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: "./inference"
  # evaluate in retrieval mode
  eval_mode: "retrieval"
```
For the remaining parameters, see the [configuration file](../../../ppcls/configs/Logo/ResNet50_ReID.yaml).
# Fine-Grained Vehicle Classification

Fine-grained classification assigns sub-categories within one basic category, e.g. different species of birds, different kinds of flowers, or different types of minerals. As the name suggests, fine-grained vehicle classification distinguishes different sub-categories of vehicles.

Compared with vehicle ReID, its training process differs in the following aspects:

- Different dataset
- Different Loss settings

For the remaining parts, see [Vehicle ReID](./vehicle_reid.md).

Full configuration file: [ResNet50.yaml](../../../ppcls/configs/Vehicle/ResNet50.yaml)
## 1 Dataset

This demo uses [CompCars](http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html) as the training dataset.

![](../../images/recognotion/vehicle/CompCars.png)

The images come mainly from the web and from surveillance data. The web data covers 163 car makers and 1716 car models, with **136,726** whole-car images and **27,618** part-level car images; it also includes bounding boxes, viewpoints, and 5 attributes (max speed, displacement, number of doors, number of seats, car type). The surveillance data contains **50,000** front-view images.

Note that labels need to be generated from this dataset according to your own needs. In this demo, vehicles of the same model produced in different years are treated as the same class, giving 431 classes in total.
## 2 Loss settings

Unlike vehicle ReID, this classification task uses [Triplet Loss](../../../ppcls/loss/triplet.py) + [ArcLoss](../../../ppcls/arch/gears/arcmargin.py), with a weight ratio of 1:1.
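For reference, the core of a triplet loss is the hinge term max(0, d(a,p) - d(a,n) + margin), which pulls the positive sample closer to the anchor than the negative by at least the margin. A minimal numpy sketch (for illustration only, not the repository's implementation):

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.5):
    """anchor/positive/negative: 1-D embedding vectors."""
    d_ap = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)


a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0: the negative is already far enough away
```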
# Vehicle ReID

ReID (re-identification) uses algorithms to find a target in an image gallery, so it is a sub-problem of image retrieval. Vehicle ReID is the task of, given a vehicle image, finding images of the same vehicle taken at different times by the same camera or by different cameras. Extracting robust features is particularly important in this process, so this document focuses on training the feature extraction network for vehicle ReID, covering:

- Dataset and preprocessing
- Backbone settings
- Loss settings

All hyperparameters and the full configuration: [ResNet50_ReID.yaml](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml)
## 1 Dataset and preprocessing

### 1.1 The VERI-Wild dataset

<img src="../../images/recognotion/vehicle/cars.JPG" style="zoom:50%;" />

This dataset was captured over one month (30*24 hours) in a large CCTV surveillance system under unconstrained conditions. The system consists of 174 cameras distributed over an area of more than 200 km². The raw vehicle image set contains 12 million images; after data cleaning and annotation, 416,314 images of 40,671 distinct vehicles were collected. [See the paper for details](https://github.com/PKU-IMRE/VERI-Wild).
### 1.2 Data preprocessing

In the original dataset, the vehicle images have already been cropped by a detector, so the crop operation used when training on `ImageNet` is not needed. The overall data augmentation pipeline, in order, is:

- `Resize` images to 224
- Random horizontal flip
- [AugMix](https://arxiv.org/abs/1912.02781v1)
- Normalize: normalize to 0~1
- [RandomErasing](https://arxiv.org/pdf/1708.04896v2.pdf)
## 2 Backbone settings

`ResNet50` is used as the backbone, with the following modifications on top of `ResNet50`:

- An embedding layer (a 1x1 convolution with feature dimension 512) is appended at the end

Code: [ResNet50_last_stage_stride1](../../../ppcls/arch/backbone/variant_models/resnet_variant.py)
## 3 Loss settings

Vehicle ReID uses [SupConLoss](https://arxiv.org/abs/2004.11362) + [ArcLoss](https://arxiv.org/abs/1801.07698), with a weight ratio of 1:1.

See the code: [SupConLoss](../../../ppcls/loss/supconloss.py) and [ArcLoss](../../../ppcls/arch/gears/arcmargin.py).

For the remaining settings, see the [configuration file](../../../ppcls/configs/Vehicle/ResNet50_ReID.yaml).
@@ -22,13 +22,13 @@ PaddleClas currently supports the following training/evaluation environments:
After preparing the configuration file, you can start training with the command below.
```
python3 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Arch.pretrained=False \
    -o Global.device=gpu
```
Here, `-c` is used to specify the path of the configuration file, and `-o` to specify the parameters that need to be modified or added, where `-o Arch.pretrained=False` means not using a pretrained model and `-o Global.device=gpu` means training with GPU. If you want to train with CPU, set `Global.device` to `cpu`.
For more detailed training configuration, you can also directly modify the configuration file of the model. See the [configuration document](config.md) for specific parameters.
@@ -37,9 +37,9 @@ python tools/train.py \
* If mixup or cutmix data augmentation is used in training, top-1 and top-k (default 5) information will not be printed in the log:
```
...
[Train][Epoch 3/20][Avg]CELoss: 6.46287, loss: 6.46287
...
[Eval][Epoch 3][Avg]CELoss: 5.94309, loss: 5.94309, top1: 0.01961, top5: 0.07941
...
```
@@ -47,9 +47,9 @@ python tools/train.py \
```
...
[Train][Epoch 3/20][Avg]CELoss: 6.12570, loss: 6.12570, top1: 0.01765, top5: 0.06961
...
[Eval][Epoch 3][Avg]CELoss: 5.40727, loss: 5.40727, top1: 0.07549, top5: 0.20980
...
```
@@ -60,13 +60,13 @@ python tools/train.py \
After setting the configuration file according to your own dataset path, you can fine-tune by loading the pretrained model, as shown below.
```
python3 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Arch.pretrained=True \
    -o Global.device=gpu
```
Setting `Arch.pretrained` to `True` loads the ImageNet pretrained model. `Arch.pretrained` can also specify the path of a specific model weight file, in which case replace it with the path of your own pretrained model weights.
We also provide a large number of pretrained models based on the `ImageNet-1k` dataset. See [Model Zoo Overview](../models/models_intro.md) for the model list and download addresses.
@@ -76,28 +76,29 @@ python tools/train.py \
If the training task is terminated for some reason, you can also load the checkpoint weight file to continue training:
```
python3 tools/train.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
    -o Optimizer.lr.last_epoch=5 \
    -o Global.device=gpu
```
The configuration file does not need any modification. Just set the `Global.checkpoints` parameter when resuming training; it indicates the path of the checkpoint weights to load, and using it loads both the saved checkpoint weights and information such as the learning rate and optimizer state.
**注意**
* 参数`-o last_epoch=5`表示将上一次训练轮次数记为`5`,即本次训练轮次数从`6`开始计算,该值默认为-1,表示本次训练轮次数从`0`开始计算。
* 参数`-o Optimizer.lr.last_epoch=5`表示将上一次训练轮次数记为`5`,即本次训练轮次数从`6`开始计算,该值默认为-1,表示本次训练轮次数从`0`开始计算。
* `-o Global.checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`Global.checkpoints`参数只需设置为`"../output/MobileNetV3_large_x1_0/epoch_5"`,PaddleClas会自动补充后缀名。
* `-o checkpoints`参数无需包含断点权重文件的后缀名,上述训练命令会在训练过程中生成如下所示的断点权重文件,若想从断点`5`继续训练,则`checkpoints`参数只需设置为`"./output/MobileNetV3_large_x1_0_gpupaddle/5/ppcls"`,PaddleClas会自动补充后缀名。
```shell
output
├── MobileNetV3_large_x1_0
│   ├── best_model.pdopt
│   ├── best_model.pdparams
│   ├── best_model.pdstates
│   ├── epoch_1.pdopt
│   ├── epoch_1.pdparams
│   ├── epoch_1.pdstates
│   .
│   .
│   .
```
@@ -109,20 +110,18 @@ python tools/train.py \
You can evaluate the model with the following command.
```bash
python3 tools/eval.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
The above command uses `./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model` obtained by the training above. You can also set the evaluation by changing the parameters in the configuration file, or update the configuration with the `-o` parameter, as shown above.
Some of the configurable evaluation parameters are described as follows:

* `Arch.name`: model name
* `Global.pretrained_model`: path of the model file to be evaluated

**Note:** When loading the model to be evaluated, you need to specify the path of the model file, but it does not need to include the file suffix; PaddleClas automatically appends `.pdparams`, as in [1.3 Resume Training](#1.3).
<a name="2"></a>
## 2. Model Training and Evaluation on Linux+GPU
@@ -138,23 +137,11 @@ python tools/eval.py \
```
export CUDA_VISIBLE_DEVICES=0,1,2,3

python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
```
The format of the output log information is the same as above; see [1.1 Model Training](#1.1) for details.
@@ -165,14 +152,14 @@ python -m paddle.distributed.launch \
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
        -o Arch.pretrained=True
```
`Arch.pretrained` can be `True` or `False`; it can also be set to the path of pretrained weight files, in which case replace it with the path of your own pretrained model weights, or modify the path directly in the configuration file.
The [beginner edition](./quick_start_new_user.md) and [advanced edition](./quick_start_professional.md) of "Try PaddleClas in 30 minutes" contain a large number of model fine-tuning examples; you can refer to those chapters to fine-tune a model on a specific dataset.
@@ -185,16 +172,16 @@ python -m paddle.distributed.launch \
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
        -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
        -o Optimizer.lr.last_epoch=5 \
        -o Global.device=gpu
```
The configuration file does not need any modification. Just set the `Global.checkpoints` and `Optimizer.lr.last_epoch` parameters during training; they indicate the path of the checkpoint weights to load, and using them loads both the saved model weights and information such as the learning rate and optimizer state. See [1.3 Resume Training](#1.3) for details.
### 2.4 Model evaluation
@@ -202,10 +189,11 @@ python -m paddle.distributed.launch \
You can evaluate the model with the following command.
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    tools/eval.py \
        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
        -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
See [1.4 Model Evaluation](#1.4) for the parameter descriptions.
@@ -217,26 +205,16 @@ python tools/eval.py \
After model training is finished, you can load the trained model for prediction. A complete example is provided in `tools/infer.py`; just run the following command:
```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
Parameter descriptions:

+ `Infer.infer_imgs`: the path of the image file to be predicted, or the folder of images for batch prediction.
+ `Global.pretrained_model`: the path of the model weight file, e.g. `./output/MobileNetV3_large_x1_0/best_model`.
<a name="model_inference"></a>
@@ -246,42 +224,39 @@ python tools/infer/infer.py \
First, convert the trained model:
```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
    -o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
```
Here, `Global.pretrained_model` is used to specify the path of the model file; as before, this path does not need to include the model file suffix (see [1.3 Resume Training](#1.3)).
The above command generates the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), which can then be used for inference with the prediction engine:

Enter the deploy directory:
```bash
cd deploy
```
Run the following command to predict. Since the default `class_id_map_file` is the mapping file of the ImageNet dataset, it needs to be set to None here.

**Note**: If you use a `Transformer`-series model such as `DeiT_***_384` or `ViT_***_384`, please pay attention to the input data size of the model: you need to set `resize_short=384` and `resize=384`.
```bash
python3 python/predict_cls.py \
    -c configs/inference_cls.yaml \
    -o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
    -o Global.inference_model_dir=../inference/ \
    -o PostProcess.class_id_map_file=None
```
Here:
+ `Global.infer_imgs`: the path of the image file to be predicted.
+ `Global.inference_model_dir`: the directory of the inference model files, e.g. `../inference/`.
+ `Global.use_tensorrt`: whether to use the TensorRT prediction engine. Default: `False`.
+ `Global.use_gpu`: whether to use GPU for prediction. Default: `True`.
+ `Global.enable_mkldnn`: whether to enable `MKL-DNN` acceleration. Default: `False`. Note that when `enable_mkldnn` and `use_gpu` are both `True`, `enable_mkldnn` is ignored and the GPU is used.
+ `Global.use_fp16`: whether to enable `FP16`. Default: `False`.

* If you want to speed up model evaluation, it is recommended to enable TensorRT acceleration when evaluating on GPU, and MKL-DNN acceleration when evaluating on CPU.
# Installation

---

This chapter introduces how to install PaddleClas and its dependencies.

## 1. Install PaddlePaddle

`PaddlePaddle 2.0` or later is required to run PaddleClas. You can follow the steps below to install PaddlePaddle.
### 1.1 Environment requirements

- python 3.x
- cuda >= 10.1 (if paddlepaddle-gpu is used)
- cudnn >= 7.6.4 (if paddlepaddle-gpu is used)
- nccl >= 2.1.2 (if distributed training/evaluation is used)
- gcc >= 8.2

We recommend running PaddleClas in the docker image we provide; for docker and nvidia-docker usage, refer to the [tutorial](https://www.runoob.com/docker/docker-tutorial.html).

With cuda 10.1, the GPU driver version should be no less than 418.39; with cuda 10.2, the driver version should be greater than 440.33. For more cuda versions and the required driver versions, refer to the [link](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
If you do not use docker, you can skip section 1.2 and start directly from section 1.3.

### 1.2 (Recommended) Prepare a docker environment

The first time you use the docker image, it will be downloaded automatically; please be patient.
```
# Switch to the working directory
cd /home/Projects
# You need to create a docker container for the first run; this command is not needed for later runs
# Create a docker container named ppcls and map the current directory to the /paddle directory of the container

# To use docker in a CPU environment, create the container with docker instead of nvidia-docker; set the container shared memory with --shm-size to 8G or more
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash

# To use a GPU container, run the following command to create it
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
```
You can also visit [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to get an image suited to your machine.
```
# Use ctrl+P+Q to exit the docker container; use the following command to re-enter it
sudo docker container exec -it ppcls /bin/bash
```
### 1.3 Install PaddlePaddle using pip

Run the following command to install the latest GPU version of PaddlePaddle via pip:
```bash
pip3 install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
@@ -25,10 +61,12 @@ pip3 install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
pip3 install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple
```
**Note:**

* If you installed the CPU version of paddlepaddle first and want to switch to the GPU version later, uninstall the CPU version of paddle before installing the GPU version; otherwise the paddle versions in use can easily get mixed up.
* You can also compile and install PaddlePaddle from source; please follow the instructions in the [PaddlePaddle installation documentation](http://www.paddlepaddle.org.cn/install/quick).
### 1.4 Verify the installation

Use the following command to verify whether PaddlePaddle is installed successfully.
@@ -47,18 +85,12 @@ python3 -c "import paddle; print(paddle.__version__)"
- The version number of PaddlePaddle compiled from source is 0.0.0; please make sure the source code is from PaddlePaddle 2.0 or later.
- PaddleClas relies on PaddlePaddle's high-performance distributed training capability. If you compile from source, make sure the compile option **WITH_DISTRIBUTE=ON** is enabled; see the [compile options table](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for details.
- When running in docker, to ensure the container has enough shared memory for Paddle's data-loading acceleration, set the parameter `--shm-size=8g` when creating the docker container, or a larger value if conditions permit.
## 2. Install PaddleClas

### 2.1 Clone the PaddleClas repository
```bash
git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
@@ -70,20 +102,10 @@ git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
```
### 2.2 Install Python dependencies

The Python dependencies are listed in `requirements.txt` and can be installed with the following command:

```bash
pip3 install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
```
@@ -51,34 +51,30 @@ train/n01440764/n01440764_10027.JPEG 0
The raw image data that is read in needs to be transformed before use. During training, the standard data preprocessing consists of `DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`. This is reflected in the configuration file as follows: the preprocessing is mainly contained in the `transform_ops` field, presented as a list, and the transformations are applied to the data in this order.
```yaml
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
```
PaddleClas also provides data augmentation methods such as `AutoAugment` and `RandAugment`, which can likewise be added to the training preprocessing through the config file. Each transform is implemented as a class, which makes them easy to migrate and reuse; for more implementation details see the code under `ppcls/data/preprocess/ops/`.

For the data making up a batch, augmentation methods such as mixup and cutmix can also be applied. PaddleClas integrates `MixupOperator`, `CutmixOperator`, `FmixOperator`, and other batch-based augmentation methods, which can be enabled via the mix settings in the config file; see `ppcls/data/preprocess/batch_ops/batch_operators.py` for details. A minimal sketch of the mixup idea follows.
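A minimal numpy sketch of the idea behind `MixupOperator`, assuming a batch of images and integer labels (illustrative only; the actual operator lives in `ppcls/data/preprocess/batch_ops/batch_operators.py`):

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2):
    """Mix a batch with a shuffled copy of itself; returns both label sets."""
    lam = np.random.beta(alpha, alpha)
    idx = np.random.permutation(images.shape[0])
    mixed = lam * images + (1 - lam) * images[idx]
    # the training loss is then lam * CE(pred, labels) + (1 - lam) * CE(pred, labels[idx])
    return mixed, labels, labels[idx], lam
```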
For image classification, data post-processing is mainly an `argmax` over the output probabilities, which we will not expand on here.
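For completeness, a minimal numpy sketch of the top-k variant of this post-processing (illustrative only):

```python
import numpy as np

def topk_postprocess(probs, k=5):
    """probs: (batch_size, num_classes) class probabilities; returns top-k ids and scores."""
    class_ids = probs.argsort(axis=1)[:, ::-1][:, :k]     # top-k class ids per sample
    scores = np.take_along_axis(probs, class_ids, axis=1)  # the matching probabilities
    return class_ids, scores
```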
......@@ -87,69 +83,47 @@ PaddleClas中也包含了`AutoAugment`, `RandAugment`等数据增广方法,也
The model architecture is defined in the config file as follows:
```yaml
Arch:
  name: ResNet50
  pretrained: False
  use_ssld: False
```
`Arch.name` is the model name, and `Arch.pretrained` controls whether pretrained weights are loaded. All model names are defined in `ppcls/arch/backbone/__init__.py`.
Correspondingly, the model object is created by the `build_model` function in `ppcls/arch/__init__.py`:
```python
def build_model(config):
    config = copy.deepcopy(config)
    model_type = config.pop("name")
    mod = importlib.import_module(__name__)
    arch = getattr(mod, model_type)(**config)
    return arch
```
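A hypothetical usage, assuming `build_model` is imported from `ppcls.arch` and the `Arch` section above has been parsed into a plain dict:

```python
from ppcls.arch import build_model

arch_config = {"name": "ResNet50", "pretrained": False, "use_ssld": False}
model = build_model(arch_config)  # calls ResNet50(pretrained=False, use_ssld=False)
```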
* Loss functions

PaddleClas provides loss functions such as `CELoss`, `JSDivLoss`, `TripletLoss`, and `CenterLoss`, all defined under `ppcls/loss`.
In `ppcls/loss/__init__.py`, `CombinedLoss` is used to build and merge the loss functions. Different training strategies require different losses and computation methods; PaddleClas mainly considers the following factors when building the loss:
1. whether label smoothing is used
2. whether mixup or cutmix is used
3. whether knowledge distillation is used for training
4. whether metric learning is being trained
Users can specify the loss types and their weights in the config file. For example, to add TripletLossV2 during training:
```yaml
Loss:
Train:
- CELoss:
weight: 1.0
- TripletLossV2:
weight: 1.0
margin: 0.5
```
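As a rough sketch of how such a weighted combination behaves (names here are illustrative; the actual implementation is `CombinedLoss` in `ppcls/loss/__init__.py`):

```python
def combined_loss(loss_funcs, weights, out, target):
    """Weighted sum of the losses listed under Loss.Train above."""
    total = 0.0
    for fn, weight in zip(loss_funcs, weights):
        total += weight * fn(out, target)
    return total
```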
* Optimizer, learning-rate decay, and weight decay
......@@ -158,48 +132,53 @@ def create_loss(feeds,
Weight decay is a common regularization method used mainly to prevent overfitting. PaddleClas provides two weight decay strategies: `L1Decay` and `L2Decay`.
Learning-rate decay is an essential training technique for improving accuracy in image classification. PaddleClas currently supports `Cosine`, `Piecewise`, `Linear`, and other learning-rate decay strategies.
In the config file, the optimizer, weight decay strategy, and learning-rate decay strategy are configured with the following fields:
```yaml
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
```
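As a minimal sketch of the `Piecewise` schedule configured above (illustrative, not the library implementation): the learning rate steps through `values` at the epoch boundaries in `decay_epochs`.

```python
def piecewise_lr(epoch, decay_epochs=(30, 60, 90),
                 values=(0.1, 0.01, 0.001, 0.0001)):
    """Return the learning rate for a given epoch."""
    for i, boundary in enumerate(decay_epochs):
        if epoch < boundary:
            return values[i]
    return values[-1]

assert piecewise_lr(0) == 0.1
assert piecewise_lr(45) == 0.01
assert piecewise_lr(95) == 0.0001
```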
The optimizer and learning-rate objects are created by `build_optimizer` in `ppcls/optimizer/__init__.py`:
```python
def build_optimizer(config, epochs, step_each_epoch, parameters):
config = copy.deepcopy(config)
# step1 build lr
lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
logger.debug("build lr ({}) success..".format(lr))
# step2 build regularization
if 'regularizer' in config and config['regularizer'] is not None:
reg_config = config.pop('regularizer')
reg_name = reg_config.pop('name') + 'Decay'
reg = getattr(paddle.regularizer, reg_name)(**reg_config)
else:
reg = None
logger.debug("build regularizer ({}) success..".format(reg))
# step3 build optimizer
optim_name = config.pop('name')
if 'clip_norm' in config:
clip_norm = config.pop('clip_norm')
grad_clip = paddle.nn.ClipGradByNorm(clip_norm=clip_norm)
else:
grad_clip = None
optim = getattr(optimizer, optim_name)(learning_rate=lr,
weight_decay=reg,
grad_clip=grad_clip,
**config)(parameters=parameters)
logger.debug("build optimizer ({}) success..".format(optim))
return optim, lr
```
Each optimizer and weight decay strategy is implemented as a class; see `ppcls/optimizer/optimizer.py` for the implementations and `ppcls/optimizer/learning_rate.py` for the learning-rate decay strategies. A hypothetical call is sketched below.
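This sketch assumes `build_optimizer` is imported from `ppcls.optimizer`, that `model` is the network built earlier, and that the config loader turns the `Optimizer` section above into the dict shown (the exact dict layout is an assumption):

```python
from ppcls.optimizer import build_optimizer

optim_config = {
    "name": "Momentum",
    "momentum": 0.9,
    "lr": {"name": "Piecewise", "learning_rate": 0.1,
           "decay_epochs": [30, 60, 90],
           "values": [0.1, 0.01, 0.001, 0.0001]},
    "regularizer": {"name": "L2", "coeff": 0.0001},
}
optim, lr = build_optimizer(optim_config, epochs=120, step_each_epoch=5005,
                            parameters=model.parameters())
```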
......@@ -210,27 +189,22 @@ def create_optimizer(config, parameter_list=None):
During training you can set the checkpoint-saving interval, and optionally evaluate on the validation set every few epochs so that the best-performing model on the validation set is saved. This is configured with the following fields:
```yaml
Global:
  save_interval: 1        # epoch interval for saving checkpoints
  eval_during_train: True # whether to evaluate during training
  eval_interval: 1        # epoch interval for evaluation
```
Model saving is implemented with the Paddle framework's `paddle.save()` function, which stores the persistable version of the model so training can be resumed. The implementation is as follows:
```python
def save_model(net, optimizer, model_path, epoch_id, prefix='ppcls'):
    # only save the model on trainer_id = 0
    if paddle.distributed.get_rank() != 0:
        return
    model_path = os.path.join(model_path, str(epoch_id))
    _mkdir_if_not_exist(model_path)
    model_prefix = os.path.join(model_path, prefix)
    # save the student model separately during distillation
    _save_student_model(net, model_prefix)
    paddle.save(net.state_dict(), model_prefix + ".pdparams")
    paddle.save(optimizer.state_dict(), model_prefix + ".pdopt")
    logger.info("Already save model in {}".format(model_path))
```
Two points to note when saving:
......
......@@ -75,20 +75,6 @@ cd ../../
### Train the model

#### Train on CPU
......@@ -99,20 +85,19 @@ Windows操作如上提示,在PaddleClas根目录下创建相应文件夹,并
```shell
# On Windows, open cmd in the PaddleClas root directory and run:
python tools/train.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml
```
- The `-c` option specifies the path of the training config file; the concrete hyperparameters can be inspected in the `yaml` file
- Setting the `Global.device` parameter in the `yaml` file to `cpu` makes training run on CPU (if not set, it defaults to `gpu`)
- The `epochs` parameter in the `yaml` file is set to 20, i.e., 20 passes over the whole dataset, which takes roughly 20 minutes on CPU (times vary across CPUs); the model is under-trained at that point. To improve accuracy, increase this value, e.g. to **40**, at the cost of longer training
##### Train with a pretrained model

```shell
python tools/train.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml -o Arch.pretrained=True
```

- The `-o` option can be set to True or False, or to the path of pretrained weights; when set to True, the pretrained weights are downloaded automatically. Note: if you pass a weights path, do not append the `.pdparams` suffix.

You can compare training with and without the pretrained model and observe how the loss decreases.
......@@ -137,7 +122,7 @@ python tools/train.py -c ./configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml
##### Train without a pretrained model
```shell
python3 tools/train.py -c ./ppcls/configs/quick_start/ResNet50_vd.yaml
```
After training, the `Top1 Acc` curve on the validation set is shown below; the best accuracy is 0.2735. The training accuracy curve is shown in the figure below.
......@@ -149,13 +134,12 @@ python tools/train.py -c ./configs/quick_start/ResNet50_vd.yaml
Fine-tune from the ImageNet1k-pretrained classification model with the following script:
```shell
python3 tools/train.py -c ./ppcls/configs/quick_start/ResNet50_vd.yaml -o Arch.pretrained=True
```
**Notes**:
- This script uses GPU; to run on CPU, modify it as described in [Train on CPU](#train-on-cpu) above
- The only difference from [training without a pretrained model](#train-without-a-pretrained-model) is the `-o Arch.pretrained=True` option, which loads the ImageNet-pretrained weights

The `Top1 Acc` curve on the validation set is shown below. The best accuracy is 0.9402; loading the pretrained model improves accuracy on the flowers102 dataset dramatically, with an absolute gain of over 65%.
......@@ -167,35 +151,16 @@ python tools/train.py -c ./configs/quick_start/ResNet50_vd_finetune.yaml
```shell
cd $path_to_PaddleClas
python3 tools/infer.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg -o Global.pretrained_model=output/ShuffleNetV2_x0_25/best_model
```
The main options are:
- `Infer.infer_imgs`: the path of an image file, or a directory of images
- `Global.pretrained_model`: the path of the model weights. For the CPU training above, the best model is saved at `output/ShuffleNetV2_x0_25/best_model.pdparams`, so this option is set to `output/ShuffleNetV2_x0_25/best_model`, dropping the `.pdparams` suffix

When the input is a single image path, a successful run prints results such as:

`[{'class_ids': [76, 65, 34, 9, 69], 'scores': [0.91762, 0.01801, 0.00833, 0.0071, 0.00669], 'file_name': 'dataset/flowers102/jpg/image_00001.jpg', 'label_names': []}]`
When the input is a directory of images, a successful run prints results such as:
```txt
[{'class_ids': [76, 65, 34, 9, 69], 'scores': [0.91762, 0.01801, 0.00833, 0.0071, 0.00669], 'file_name': 'dataset/flowers102/jpg/image_00001.jpg', 'label_names': []}, {'class_ids': [76, 69, 34, 28, 9], 'scores': [0.77122, 0.06295, 0.02537, 0.02531, 0.0251], 'file_name': 'dataset/flowers102/jpg/image_00002.jpg', 'label_names': []}, {'class_ids': [99, 76, 81, 85, 16], 'scores': [0.26374, 0.20423, 0.07818, 0.06042, 0.05499], 'file_name': 'dataset/flowers102/jpg/image_00003.jpg', 'label_names': []}, {'class_ids': [9, 37, 34, 24, 76], 'scores': [0.17784, 0.16651, 0.14539, 0.12096, 0.04816], 'file_name': 'dataset/flowers102/jpg/image_00004.jpg', 'label_names': []}, {'class_ids': [76, 66, 91, 16, 13], 'scores': [0.95494, 0.00688, 0.00596, 0.00352, 0.00308], 'file_name': 'dataset/flowers102/jpg/image_00005.jpg', 'label_names': []}, {'class_ids': [76, 66, 34, 8, 43], 'scores': [0.44425, 0.07487, 0.05609, 0.05609, 0.03667], 'file_name': 'dataset/flowers102/jpg/image_00006.jpg', 'label_names': []}, {'class_ids': [86, 93, 81, 22, 21], 'scores': [0.44714, 0.13582, 0.07997, 0.0514, 0.03497], 'file_name': 'dataset/flowers102/jpg/image_00007.jpg', 'label_names': []}, {'class_ids': [13, 76, 81, 18, 97], 'scores': [0.26771, 0.1734, 0.06576, 0.0451, 0.03986], 'file_name': 'dataset/flowers102/jpg/image_00008.jpg', 'label_names': []}, {'class_ids': [34, 76, 8, 5, 9], 'scores': [0.67224, 0.31896, 0.00241, 0.00227, 0.00102], 'file_name': 'dataset/flowers102/jpg/image_00009.jpg', 'label_names': []}, {'class_ids': [76, 34, 69, 65, 66], 'scores': [0.95185, 0.01101, 0.00875, 0.00452, 0.00406], 'file_name': 'dataset/flowers102/jpg/image_00010.jpg', 'label_names': []}]
```
The length of the returned list equals the batch size.
......@@ -25,36 +25,6 @@ tar -xf CIFAR100.tar
cd ../
```
## 2. Model training
......@@ -69,8 +39,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR"
```
......@@ -86,8 +56,9 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True
```
The best validation accuracy is about 0.718. Loading the pretrained model boosts accuracy on the CIFAR100 dataset substantially, an absolute gain of about 30%.
......@@ -99,8 +70,10 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True \
-o Arch.use_ssld=True
```
The final CIFAR100 validation accuracy is 0.73. Compared with fine-tuning from the 79.12% pretrained model, the SSLD pretrained model lifts the accuracy on the new dataset by another 1.2%.
......@@ -112,31 +85,14 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV3_large_x1_0_CIFAR100_finetune.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True
```
The best validation accuracy is about 0.601, nearly 12% lower than ResNet50_vd.
## 3. Data augmentation

PaddleClas provides many data augmentation methods, such as Mixup, Cutout, and RandomErasing; see the [data augmentation chapter](../advanced_tutorials/image_augmentation/ImageAugment.md) for details.
......@@ -150,8 +106,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_mixup_CIFAR100_finetune.yaml \
-o Global.output_dir="output_CIFAR"
```
......@@ -161,7 +117,7 @@ python3 -m paddle.distributed.launch \
* **Notes**
* Config files for other augmentation methods can be found under `ppcls/configs/DataAugment`.
* CIFAR100 is trained for relatively few epochs, so the validation accuracy may fluctuate by about 1%.
......@@ -172,18 +128,42 @@ python3 -m paddle.distributed.launch \
PaddleClas includes the self-developed SSLD knowledge distillation scheme; see the [knowledge distillation chapter](../advanced_tutorials/distillation/distillation.md) for details. This section trains MobileNetV3_large_x1_0 with knowledge distillation, using the ResNet50_vd model trained in section 2.1.2 as the teacher. First copy that ResNet50_vd model to a dedicated directory:
```shell
mkdir pretrained
cp -r output_CIFAR/ResNet50_vd/best_model.pdparams ./pretrained/
```
In the config file, the model name, the teacher and student model settings, the pretrained weight paths, and freeze_params are configured as follows. The two values in freeze_params_list indicate whether the teacher and the student, respectively, have their parameters frozen during training.
```yaml
Arch:
name: "DistillationModel"
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models
freeze_params_list:
- True
- False
models:
- Teacher:
name: ResNet50_vd
pretrained: "./pretrained/best_model"
- Student:
name: MobileNetV3_large_x1_0
pretrained: True
```
The Loss is configured as follows: the training loss is the cross entropy between the student's output and the teacher's output, and the evaluation loss is the cross entropy between the student's output and the ground-truth labels.
```yaml
Loss:
Train:
- DistillationCELoss:
weight: 1.0
model_name_pairs:
- ["Student", "Teacher"]
Eval:
- DistillationGTCELoss:
weight: 1.0
model_names: ["Student"]
```
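A minimal paddle sketch of what the two configured losses compute, assuming raw logits for the student and teacher (illustrative only; the real classes live under `ppcls/loss`):

```python
import paddle.nn.functional as F

def distillation_ce(student_logits, teacher_logits):
    """DistillationCELoss: CE between student output and teacher soft labels."""
    soft_target = F.softmax(teacher_logits, axis=-1)
    return F.cross_entropy(student_logits, soft_target, soft_label=True)

def distillation_gt_ce(student_logits, labels):
    """DistillationGTCELoss: CE between student output and ground-truth labels."""
    return F.cross_entropy(student_logits, labels)
```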
The final training script is shown below.
......@@ -193,8 +173,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/R50_vd_distill_MV3_large_x1_0_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR"
```
......@@ -217,20 +197,19 @@ python3 -m paddle.distributed.launch \
```bash
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.pretrained_model="output_CIFAR/ResNet50_vd/best_model"
```
#### 5.1.2 Prediction with the single-label classification model
After training, the trained model can be loaded for prediction. A complete example is provided in `tools/infer.py`; simply run the following command:
```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
    -o Infer.infer_imgs=./dataset/CIFAR100/test/0/0001.png \
    -o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
```
......@@ -241,53 +220,41 @@ python3 tools/infer/infer.py \
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
```
* By default, the export generates the `inference.pdiparams`, `inference.pdmodel`, and `inference.pdiparams.info` files in the `inference` folder.

The command above produces the model structure file (`inference.pdmodel`) and the model weights file (`inference.pdiparams`), which can then be used for inference with the prediction engine.
Enter the `deploy` directory:
```bash
cd deploy
```
Then modify the `inference_cls.yaml` file. Since CIFAR100 is trained at 32x32 resolution, the related resolutions must be changed; the image preprocessing in the final config file is as follows:
```yaml
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 36
- CropImage:
size: 32
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
Run the prediction command. Because the default `class_id_map_file` is the mapping file for the ImageNet dataset, it must be set to None here:
```bash
python3 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.infer_imgs=../dataset/CIFAR100/test/0/0001.png \
-o PostProcess.class_id_map_file=None
```
# Image Recognition Quick Start

Image recognition consists of three steps: mainbody detection to obtain bounding boxes, feature extraction for recognition, and retrieval based on the extracted features.

This document covers three parts: environment setup, an image recognition walkthrough, and recognizing images of unknown categories.
If the image's category already exists in the index, go straight to the [image recognition walkthrough](#图像识别体验) section. To recognize images of unknown categories, i.e., categories not yet in the index, see the [recognizing images of unknown categories](#未知类别的图像识别体验) section to build an index and then recognize.
## Contents

* [1. Environment setup](#环境配置)
* [2. Image recognition walkthrough](#图像识别体验)
  * [2.1 Download and extract the inference models and demo data](#下载、解压inference_模型与demo数据)
  * [2.2 Logo recognition and retrieval](#Logo识别与检索)
    * [2.2.1 Recognize a single image](#识别单张图像)
    * [2.2.2 Batch recognition over a folder](#基于文件夹的批量识别)
* [3. Recognizing images of unknown categories](#未知类别的图像识别体验)
  * [3.1 Build an index from your own dataset](#基于自己的数据集构建索引库)
  * [3.2 Image recognition with the new index](#基于新的索引库的图像识别)
<a name="环境配置"></a>
## 1. Environment setup

* Installation: first follow the [quick installation guide](./installation.md) to set up the PaddleClas runtime environment.

* Enter the `deploy` directory. **Everything in this document must be run from the `deploy` folder**, which you can enter from the PaddleClas root directory as follows:

```shell
cd deploy
```

<a name="图像识别体验"></a>
## 2. Image recognition walkthrough
The detection model, the recognition inference models for four scenarios (Logo, cartoon characters, vehicles, products), the test data, and the corresponding config files can be downloaded from the links below.

| Model | Recommended scenario | Test data | inference model | Prediction config file | Index-building config file |
| ------------ | ------------- | ------- | -------- | ------- | -------- |
| General mainbody detection model | general | - | [model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - |
| Logo recognition model | Logo | [data download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar) | [model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | [build_logo.yaml](../../../deploy/configs/build_logo.yaml) |
| Cartoon character recognition model | cartoon characters | [data download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/cartoon_demo_data_v1.0.tar) | [model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | [build_cartoon.yaml](../../../deploy/configs/build_cartoon.yaml) |
| Vehicle fine-grained classification model | vehicles | [data download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/vehicle_demo_data_v1.0.tar) | [model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | [build_vehicle.yaml](../../../deploy/configs/build_vehicle.yaml) |
| Product recognition model | products | [data download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/product_demo_data_v1.0.tar) | [model download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_Inshop_v1.0_infer.tar) | [inference_inshop.yaml](../../../deploy/configs/) | [build_inshop.yaml](../../../deploy/configs/build_inshop.yaml) |
**Note**: if wget is not installed in your Windows environment, copy the link into a browser to download, then extract it into the corresponding directory; on Linux or macOS, right-click the link, copy it, and download it with `wget`.
* You can download and extract the data and models with the following commands:

```shell
mkdir dataset
cd dataset
# download and extract the demo data
wget {data download link} && tar -xf {archive name}
cd ..
mkdir models
cd models
# download and extract the recognition inference model
wget {model download link} && tar -xf {archive name}
cd ..
```
<a name="下载、解压inference_模型与demo数据"></a>
### 2.1 Download and extract the inference models and demo data

Taking Logo recognition as an example, download the general detection model, the recognition model, and the Logo demo data with the following commands.

```shell
mkdir models
cd models
# download and extract the general detection inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# download and extract the recognition inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar && tar -xf logo_rec_ResNet50_Logo3K_v1.0_infer.tar
cd ..

mkdir dataset
cd dataset
# download and extract the demo data
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/logo_demo_data_v1.0.tar && tar -xf logo_demo_data_v1.0.tar
cd ..
```
After extraction, the `dataset` folder should contain the following structure:
......@@ -89,6 +94,8 @@ cd ..
├── ...
```
Here `data_file.txt` is the image list used to build the index, the `gallery` folder holds all raw images used to build the index, the `index` folder holds the generated index files, and `query` contains the demo images for testing recognition.
The `models` folder should contain the following structure:
```
......@@ -102,80 +109,102 @@ cd ..
│ └── inference.pdmodel
```
<a name="Logo识别与检索"></a>
### 2.2 Logo recognition and retrieval

Taking the Logo recognition demo as an example, this section walks through recognition and retrieval. (To try another recognition scenario, download and extract the corresponding demo data and models, then switch to the matching config file.)
The key fields of the config file are explained below:

```yaml
Global:
  infer_imgs: "./dataset/logo_demo_data_v1.0/query/"                            # image path or directory to predict
  det_inference_model_dir: "./models/ppyolov2_r50vd_dcn_mainbody_v1.0_infer/"   # detection inference model directory
  rec_inference_model_dir: "./models/logo_rec_ResNet50_Logo3K_v1.0_infer/"      # recognition inference model directory
  batch_size: 1               # batch size for prediction
  image_shape: [3, 640, 640]  # input image shape for detection
  threshold: 0.5              # detection threshold; only boxes whose score exceeds it are kept
  max_det_results: 1          # of the boxes above the threshold, at most the top max_det_results (by score) are passed on to recognition

# indexing engine config
IndexProcess:
  index_path: "./dataset/logo_demo_data_v1.0/index/"  # index folder used to search the features extracted for recognition
  search_budget: 100
  return_k: 5                 # return the return_k most similar entries from the gallery
  dist_type: "IP"
```

<a name="识别单张图像"></a>
#### 2.2.1 Recognize a single image

Run the following command to recognize and retrieve the image `./dataset/logo_demo_data_v1.0/query/logo_auxx-1.jpg`:

```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_auxx-1.jpg" width = "400" />
</div>
The final output is as follows.
```
[{'bbox': [129, 219, 230, 253], 'rec_docs': ['auxx-2', 'auxx-1', 'auxx-2', 'auxx-1', 'auxx-2'], 'rec_scores': array([3.09635019, 3.09635019, 2.83965826, 2.83965826, 2.64057827])}]
```
Here `bbox` gives the location of the detected object, `rec_docs` lists the labels of the most similar images in the index, and `rec_scores` gives the corresponding similarity scores. The `rec_docs` field shows that all returned results are auxx, so the recognition is correct.
<a name="基于文件夹的批量识别"></a>
#### 2.2.2 Batch recognition over a folder

To predict all images in a folder, either modify the `Global.infer_imgs` field in the config file directly, or override it with the `-o` option as below:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query"
```
Furthermore, you can change the recognition inference model path via the `Global.rec_inference_model_dir` field, and the index path via the `IndexProcess.index_path` field.
<a name="未知类别的图像识别体验"></a>
## 3. 未知类别的图像识别体验
对图像`./dataset/logo_demo_data_v1.0/query/logo_cola.jpg`进行识别,命令如下
```shell
python3.7 python/build_gallery.py -c configs/build_logo.yaml
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg"
```
The image to be retrieved is shown below.
<div align="center">
<img src="../../images/recognition/logo_demo/query/logo_cola.jpg" width = "400" />
</div>
The output is as follows:
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['Arcam', 'univox', 'univox', 'Arecont Vision', 'univox'], 'rec_scores': array([0.47730467, 0.47625482, 0.46496609, 0.46296868, 0.45239362])}]
```
Because the default index does not contain the corresponding entries, the recognition result here is wrong. In this case we can recognize images of unknown categories by building a new index.

When the images in the index do not cover the actual recognition scenario, i.e., when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index; this enables recognition of unknown categories without any retraining.

<a name="基于自己的数据集构建索引库"></a>
### 3.1 Build an index from your own dataset
The index-related configuration in `build_logo.yaml` is as follows:

```yaml
# indexing engine config
IndexProcess:
  index_path: "./dataset/logo_demo_data_v1.0/index/"        # where the generated index is saved
  image_root: "./dataset/logo_demo_data_v1.0/"              # root directory of the gallery images
  data_file: "./dataset/logo_demo_data_v1.0/data_file.txt"  # image list file; each line holds an image filename and its label
  delimiter: "\t"
  dist_type: "IP"
  pq_size: 100
  embedding_size: 512                                       # feature dimension
```

First, collect the raw images to be added to the gallery (stored in the `./dataset/logo_demo_data_v1.0/gallery` folder) and the corresponding label information, recording each image filename with its label in the text file `./dataset/logo_demo_data_v1.0/data_file_update.txt`.

Then build the index with the following command to speed up retrieval after recognition:
```shell
python3.7 python/build_gallery.py -c configs/build_logo.yaml -o IndexProcess.data_file="./dataset/logo_demo_data_v1.0/data_file_update.txt" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
```
What needs to change:
1. Add the new images under the image root directory (they may also live in subfolders, as long as joining the root with the filename from the list file yields an existing image).
2. Append the new entries to the image list file, one line per image with its filename and label.

The new index is saved in the folder `./dataset/logo_demo_data_v1.0/index_update`.
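As a side note, a minimal numpy sketch of what retrieval with `dist_type: "IP"` computes, assuming the gallery features are stored as a matrix (illustrative only; the real engine uses the graph-based Möbius index rather than brute force):

```python
import numpy as np

def ip_search(query_feat, gallery_feats, gallery_labels, return_k=5):
    """Brute-force inner-product search over rows of gallery features."""
    scores = gallery_feats @ query_feat            # one inner product per gallery entry
    top = np.argsort(scores)[::-1][:return_k]      # indices of the return_k best scores
    return scores[top], [gallery_labels[i] for i in top]
```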
<a name="基于新的索引库的图像识别"></a>
### 3.2 Image recognition with the new index

Recognize the image above again, this time using the new index:
```shell
python3.7 python/predict_system.py -c configs/inference_logo.yaml -o Global.infer_imgs="./dataset/logo_demo_data_v1.0/query/logo_cola.jpg" -o IndexProcess.index_path="./dataset/logo_demo_data_v1.0/index_update"
```
The output is as follows.
```
[{'bbox': [635, 0, 1382, 1043], 'rec_docs': ['coca cola', 'coca cola', 'coca cola', 'coca cola', 'coca cola'], 'rec_scores': array([0.57111013, 0.56019932, 0.55656564, 0.54122502, 0.48266801])}]
```
The recognition result is now correct.
mode: 'train'
ARCHITECTURE:
name: "AlexNet"
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.01
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0001
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DPN107'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DPN131'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DPN68'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DPN92'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DPN98'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: "DarkNet53"
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 256, 256]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0001
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 256
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 256
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_base_distilled_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_base_distilled_patch16_384'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 384, 384]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_base_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_base_patch16_384'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 384, 384]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_small_distilled_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_small_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_tiny_distilled_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DeiT_tiny_patch16_224'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.000100
TRAIN:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 248
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DenseNet121'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DenseNet161'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DenseNet169'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DenseNet201'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'DenseNet264'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: "EfficientNetB0"
params:
padding_type : "SAME"
override_params:
drop_connect_rate: 0.1
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]
use_ema: True
ema_decay: 0.9999
use_aa: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'ExponentialWarmup'
params:
lr: 0.032
OPTIMIZER:
function: 'RMSProp'
params:
momentum: 0.9
rho: 0.9
epsilon: 0.001
regularizer:
function: 'L2'
factor: 0.00001
TRAIN:
batch_size: 512
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
interpolation: 2
- RandFlipImage:
flip_code: 1
- AutoAugment:
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 128
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
interpolation: 2
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
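The EfficientNetB0 recipe above enables `use_ema` with `ema_decay: 0.9999`, i.e. an exponential moving average of the weights, which is typically the copy used for evaluation. A minimal sketch of the standard EMA update over NumPy arrays; the repo's own implementation may differ in details such as how parameters are registered or whether bias correction is applied.

```python
import numpy as np

class EMA:
    """Track shadow = decay * shadow + (1 - decay) * param for each tensor."""

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = {name: np.array(p, dtype="float32")
                       for name, p in params.items()}

    def update(self, params):
        d = self.decay
        for name, p in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * p
```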
mode: 'train'
ARCHITECTURE:
name: 'GhostNet_x0_5'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: 0.1
LEARNING_RATE:
function: 'CosineWarmup'
params:
lr: 0.8
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0000400
TRAIN:
batch_size: 2048
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'GhostNet_x1_0'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: 0.1
LEARNING_RATE:
function: 'CosineWarmup'
params:
lr: 0.4
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0000400
TRAIN:
batch_size: 1024
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
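Note how the learning rate tracks the global batch size across the two GhostNet recipes above (0.8 at batch 2048 for x0_5, 0.4 at batch 1024 for x1_0): the ratio lr/batch is held constant, i.e. the linear scaling rule. A one-line check:

```python
def scaled_lr(base_lr=0.4, base_batch=1024, batch_size=2048):
    # Linear scaling rule: keep lr / batch_size constant.
    return base_lr * batch_size / base_batch

assert scaled_lr() == 0.8  # recovers the GhostNet_x0_5 setting from the x1_0 one
```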
mode: 'train'
ARCHITECTURE:
name: 'GhostNet_x1_3'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 360
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: 0.1
LEARNING_RATE:
function: 'CosineWarmup'
params:
lr: 0.4
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0000400
TRAIN:
batch_size: 1024
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- AutoAugment:
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W18_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W30_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W32_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W40_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W44_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W48_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: 'HRNet_W64_C'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 5
image_shape: [3, 224, 224]
use_mix: False
ls_epsilon: -1
LEARNING_RATE:
function: 'Piecewise'
params:
lr: 0.1
decay_epochs: [30, 60, 90]
gamma: 0.1
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mode: 'train'
ARCHITECTURE:
name: "GoogLeNet"
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 224, 224]
LEARNING_RATE:
function: 'CosineWarmup'
params:
lr: 0.01
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.0001
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
VALID:
batch_size: 64
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
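`CosineWarmup`, used by the GoogLeNet and GhostNet recipes, is cosine annealing preceded by a short linear warmup. The warmup length is fixed inside the repo, so it appears here as an explicit, assumed parameter; this sketch only mirrors the shape of the schedule.

```python
import math

def cosine_warmup_lr(epoch, base_lr=0.01, total_epochs=200, warmup_epochs=5):
    # warmup_epochs=5 is an assumption; the repo fixes its own default.
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs  # linear ramp up to base_lr
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```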
mode: 'train'
ARCHITECTURE:
name: 'InceptionV3'
pretrained_model: ""
model_save_dir: "./output/"
classes_num: 1000
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 200
topk: 5
image_shape: [3, 299, 299]
use_mix: True
ls_epsilon: 0.1
LEARNING_RATE:
function: 'Cosine'
params:
lr: 0.045
OPTIMIZER:
function: 'Momentum'
params:
momentum: 0.9
regularizer:
function: 'L2'
factor: 0.00010
TRAIN:
batch_size: 256
num_workers: 4
file_list: "./dataset/ILSVRC2012/train_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 299
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
mix:
- MixupOperator:
alpha: 0.2
VALID:
batch_size: 16
num_workers: 4
file_list: "./dataset/ILSVRC2012/val_list.txt"
data_dir: "./dataset/ILSVRC2012/"
shuffle_seed: 0
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 320
- CropImage:
size: 299
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
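The InceptionV3 recipe is the only one in this stretch with `use_mix: True`: the `MixupOperator` blends each batch with a shuffled copy of itself using a Beta(0.2, 0.2)-distributed coefficient, and the loss is mixed with the same coefficient. A hedged NumPy sketch of that operator (not the repo's exact class):

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    # images: float32 NCHW array; labels: int array of class ids.
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(len(images))
    mixed = lam * images + (1.0 - lam) * images[perm]
    # Train with: lam * loss(pred, labels) + (1 - lam) * loss(pred, labels[perm])
    return mixed, labels, labels[perm], lam
```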
......@@ -96,7 +96,7 @@ DataLoader:
dataset:
name: LogoDataset
image_root: "dataset/LogoDet-3K-crop/val/"
cls_label_path: "LogoDet-3K-crop/LogoDet-3K+query.txt"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
transform_ops:
- DecodeImage:
to_rgb: True
......