Unverified commit eb60742f, authored by Rosun, committed by GitHub

add SemSegPaddle (#4192)

* add SemSegPaddle

* update README.md

Parent: 2b8f904e
# SemSegPaddle: A Paddle-based Framework for Deep Learning in Semantic Segmentation
This is a Paddle implementation of semantic segmentation models on multiple datasets, including Cityscapes, Pascal Context, and ADE20K.
## Updates
- [**2020/01/08**] We release ***PSPNet-ResNet101*** and ***GloRe-ResNet101*** models on Pascal Context and Cityscapes datasets.
## Highlights
Synchronized Batch Normalization is important for segmentation.
- The implementation is easy to use: it is pure Python, with no extra C++ extension libraries.
- Paddle provides sync_batch_norm (see the sketch below).
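In the Paddle 1.6 static-graph API, synchronized BN is switched on through the build strategy rather than per layer. A minimal sketch, assuming `train_prog` is your `fluid.Program` and `avg_loss` the loss variable built inside it (placeholder names, not this repo's actual variables):
```
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True  # synchronize BN statistics across GPUs

compiled_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
    loss_name=avg_loss.name, build_strategy=build_strategy)
```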
## Supported models
We split each model into a backbone and a decoder network, where the backbone networks are transferred from classification networks. A sketch of how the two parts are wired together follows the lists below.
Backbone:
- ResNet
- ResNeXt
- HRNet
- EfficientNet
Decoder:
- PSPNet: [Pyramid Scene Parsing Network](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf)
- DeepLabv3: [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587)
- GloRe: [Graph-Based Global Reasoning Networks](http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Graph-Based_Global_Reasoning_Networks_CVPR_2019_paper.pdf)
- GINet: [GINet: Graph Interaction Network for Scene Parsing]()
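The configuration files below select the pair through `MODEL.BACKBONE` and `MODEL.MODEL_NAME`. A minimal runnable sketch of that dispatch, using stand-in functions (the real wiring lives in `src/models/model_builder.py`; all names here are hypothetical):
```
# Stand-ins so the sketch runs on its own; the repo's real backbones and
# decoder heads are full Paddle networks.
def resnet_backbone(image):
    return ("features", image)

def pspnet_head(features, num_classes):
    return ("logits", features, num_classes)

BACKBONES = {"resnet": resnet_backbone}
DECODERS = {"pspnet": pspnet_head}

def build_segmentor(backbone_name, decoder_name, image, num_classes):
    feats = BACKBONES[backbone_name](image)            # classification-style feature extractor
    return DECODERS[decoder_name](feats, num_classes)  # segmentation decoder

print(build_segmentor("resnet", "pspnet", "img", 19))
```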
## Performance
- Performance on the Cityscapes validation set.
**Method** | **Backbone** | **lr** | **BatchSize** | **epoch** | **mean IoU (Single-scale)** | **Trained weights** |
------------|:------------:|:----------:|:--------------:|:------------:|:---------------------------:|------------------------|
PSPNet | resnet101 | 0.01 | 8 | 80 | 78.1 | [pspnet_resnet_cityscapes_epoch_80.pdparams](https://pan.baidu.com/s/1adfvtq2JnLKRv_j7lOmW1A)|
GloRe | resnet101 | 0.01 | 8 | 80 | 78.4 | [pspnet_resnet_pascalcontext_epoch_80.pdparams](https://pan.baidu.com/s/1r4SbrYKbVk38c0dXZLAi9w) |
- Performance on the Pascal-Context validation set.
**Method** | **Backbone** | **lr** | **BatchSize** | **epoch** | **mean IoU (Single-scale)** | **Trained weights** |
------------|:------------:|:----------:|:--------------:|:------------:|:---------------------------:|:----------------------:|
PSPNet | resnet101 | 0.005 | 16 | 80 | 48.9 | [glore_resnet_cityscapes_epoch_80.pdparams](https://pan.baidu.com/s/1l7-sqt2DsUunD9l4YivgQw) |
GloRe | resnet101 | 0.005 | 16 | 80 | 48.4 | [glore_resnet_pascalcontext_epoch_80.pdparams](https://pan.baidu.com/s/1rVuk7OfSj-AXR3ZCFGNmKg) |
## Environment
This repo is developed under the following configurations:
- Hardware: 4 GPUs for training, 1 GPU for testing
- Software: CentOS 6.10, ***CUDA>=9.2, Python>=3.6, Paddle>=1.6*** (see the quick check below)
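A quick way to confirm the Paddle side of this environment (a sketch; `paddle.__version__` and `fluid.is_compiled_with_cuda()` are standard Paddle 1.x attributes):
```
import paddle
import paddle.fluid as fluid

print(paddle.__version__)             # expect >= 1.6
print(fluid.is_compiled_with_cuda())  # expect True for GPU training
```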
## Quick start: training and testing models
### 1. Preparing data
Download the [Cityscapes](https://www.cityscapes-dataset.com/) dataset. It should have this basic structure:
```
cityscapes/
├── cityscapes_list
│   ├── test.lst
│   ├── train.lst
│   ├── train+.lst
│   ├── train++.lst
│   ├── trainval.lst
│   └── val.lst
├── gtFine
│   ├── test
│   ├── train
│   └── val
├── leftImg8bit
│   ├── test
│   ├── train
│   └── val
├── license.txt
└── README
```
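Each `.lst` file pairs one image path with one label path per line, relative to `DATA_DIR` (see `TRAIN_FILE_LIST`/`VAL_FILE_LIST` and `DATASET.SEPARATOR` in the configs below). A minimal sketch for regenerating a Cityscapes list; the function name, the tab separator, and the `labelIds` choice are assumptions, so match them to your config and annotation variant:
```
import glob
import os

def make_cityscapes_list(root="data/cityscapes", split="train", sep="\t"):
    # Pair each left image with its gtFine labelIds map (POSIX paths assumed).
    pattern = os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png")
    lines = []
    for img in sorted(glob.glob(pattern)):
        lab = img.replace("/leftImg8bit/", "/gtFine/").replace(
            "_leftImg8bit.png", "_gtFine_labelIds.png")
        lines.append(os.path.relpath(img, root) + sep + os.path.relpath(lab, root))
    with open(os.path.join(root, "cityscapes_list", split + ".lst"), "w") as f:
        f.write("\n".join(lines))
```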
Download the Pascal-Context dataset. It should have this basic structure:
```
pascalContext/
├── GroundTruth_trainval_mat
├── GroundTruth_trainval_png
├── JPEGImages
├── pascal_context_train.txt
├── pascal_context_val.txt
├── README.md
└── VOCdevkit
```
Then, create symlinks for the Cityscapes and Pascal-Context datasets:
```
cd SemSegPaddle/data
ln -s $cityscapes ./
ln -s $pascalContext ./
```
### 2. Download pretrained weights
Download the pretrained [resnet-101](https://pan.baidu.com/s/1niXBDZnLlUIulB7FY068DQ) weights file and put it into the directory ***./pretrained_model***.
Then, run the following command:
```
tar -zxvf ./pretrained_model/resnet101_v2.tgz -C pretrained_model
```
### 3. Training
Select the configuration file for training according to the DECODER\_NAME, BACKBONE\_NAME, and DATASET\_NAME.
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch train.py --use_gpu --use_mpio \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
### 4. Testing
Select the configuration file for testing according to the DECODER\_NAME, BACKBONE\_NAME, and DATASET\_NAME.
Single-scale testing:
```
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--use_mpio \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
Multi-scale testing:
```
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--use_mpio \
--multi_scales \
--cfg ./configs/pspnet_res101_cityscapes.yaml
```
## Contact
If you have any questions regarding the repo, please create an issue.
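# ---- separate config file; likely configs/deeplabv3_res101_cityscapes.yaml (name taken from the run scripts below) ----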
DATAAUG:
    RAND_SCALE_MIN: 0.75
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "deeplabv3"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    DEEPLABv3:
        DEPTH_MULTIPLIER: 1
        ASPP_WITH_SEP_CONV: True
        AuxHead: True
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    MODEL_SAVE_DIR: "./snapshots/deeplabv3_resnet_cityscapes/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "./snapshots/deeplabv3_resnet_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
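# ---- separate config file; likely configs/deeplabv3_res101_pascalcontext.yaml (per the run scripts below) ----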
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "deeplabv3"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    DEEPLABv3:
        DEPTH_MULTIPLIER: 1
        ASPP_WITH_SEP_CONV: True
        AuxHead: True
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    MODEL_SAVE_DIR: "./snapshots/deeplabv3_resnet_pascalcontext/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "./snapshots/deeplabv3_resnet_pascalcontext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
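# ---- separate config file; likely configs/glore_res101_cityscapes.yaml (per the run scripts below) ----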
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "glore"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    GLORE:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/glore_res101_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/glore_res101_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
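# ---- separate config file; likely configs/glore_res101_pascalcontext.yaml (per the run scripts below) ----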
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "glore"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    GLORE:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/glore_res101_pascalContext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/glore_res101_pascalContext/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
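# ---- separate config file; likely configs/pspnet_hrnet_cityscapes.yaml (name inferred from MODEL_SAVE_DIR; not referenced by the run scripts) ----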
DATAAUG:
    RAND_SCALE_MIN: 0.75
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
TRAIN_BATCH_SIZE_PER_GPU: 2
EVAL_BATCH_SIZE: 1
NUM_TRAINERS: 4
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0]
    BACKBONE: "hrnet"
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: False
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_hrnet_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/HRNet_W40_C_pretrained/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/pspnet_hrnet_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 240
    LOSS: "['softmax_loss']"
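# ---- separate config file; likely configs/pspnet_res101_ade.yaml (name inferred from MODEL_SAVE_DIR; not referenced by the run scripts) ----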
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "ade"
    DATA_DIR: "./data/ade/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 150
    TEST_FILE_LIST: "./data/ade/ade_val.lst"
    TRAIN_FILE_LIST: "./data/ade/ade_train.lst"
    VAL_FILE_LIST: "./data/ade/ade_val.lst"
    IGNORE_INDEX: -1
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_ade/"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_ade/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 10
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 120
    LOSS: "['softmax_loss']"
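# ---- separate config file; likely configs/pspnet_res101_cityscapes.yaml (per the run scripts below) ----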
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 1024
    CROP_SIZE: 769
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 2
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "cityscapes"
    DATA_DIR: "./data/cityscapes/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 19
    TEST_FILE_LIST: "./data/cityscapes/cityscapes_list/test.lst"
    TRAIN_FILE_LIST: "./data/cityscapes/cityscapes_list/train.lst"
    VAL_FILE_LIST: "./data/cityscapes/cityscapes_list/val.lst"
    IGNORE_INDEX: 255
    DATA_DIM: 3
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_cityscapes/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_cityscapes"
    BASE_SIZE: 2048
    CROP_SIZE: 769
    SLIDE_WINDOW: True
SOLVER:
    LR: 0.01
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
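# ---- separate config file; likely configs/pspnet_res101_pascalcontext.yaml (per the run scripts below) ----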
DATAAUG:
    RAND_SCALE_MIN: 0.5
    RAND_SCALE_MAX: 2.0
    BASE_SIZE: 520
    CROP_SIZE: 520
    EXTRA: True
TRAIN_BATCH_SIZE_PER_GPU: 4
NUM_TRAINERS: 4
EVAL_BATCH_SIZE: 1
DATASET:
    DATASET_NAME: "pascalContext"
    DATA_DIR: "./data/pascalContext/"
    IMAGE_TYPE: "rgb"  # choice rgb or rgba
    NUM_CLASSES: 59
    TEST_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    TRAIN_FILE_LIST: "./data/pascalContext/pascal_context_train.txt"
    VAL_FILE_LIST: "./data/pascalContext/pascal_context_val.txt"
    IGNORE_INDEX: -1
    DATA_DIM: 3
    SEPARATOR: ' '
MODEL:
    MODEL_NAME: "pspnet"
    DEFAULT_NORM_TYPE: "bn"
    MULTI_LOSS_WEIGHT: [1.0, 0.4]
    BACKBONE: "resnet"
    BACKBONE_LAYERS: 101
    BACKBONE_MULTI_GRID: True
    PSPNET:
        DEPTH_MULTIPLIER: 1
        AuxHead: True
TEST:
    TEST_MODEL: "snapshots/pspnet_res101_pascalContext"
    BASE_SIZE: 520
    CROP_SIZE: 520
    SLIDE_WINDOW: True
TRAIN:
    MODEL_SAVE_DIR: "snapshots/pspnet_res101_pascalContext/"
    PRETRAINED_MODEL_DIR: "./pretrained_model/resnet101_v2/"
    SNAPSHOT_EPOCH: 1
SOLVER:
    LR: 0.005
    LR_POLICY: "poly"
    OPTIMIZER: "sgd"
    NUM_EPOCHS: 80
    LOSS: "['softmax_loss']"
Please create symlinks for the datasets here.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
# GPU memory garbage collection optimization flags
os.environ['FLAGS_eager_delete_tensor_gb'] = "0.0"
import sys
import time
import argparse
import functools
import pprint
import cv2
import numpy as np
import paddle
import paddle.fluid as fluid
import math
from src.utils.config import cfg
from src.utils.timer import Timer, calculate_eta
from src.models.model_builder import build_model
from src.models.model_builder import ModelPhase
from src.datasets import build_dataset
from src.utils.metrics import ConfusionMatrix
def parse_args():
parser = argparse.ArgumentParser(description='SemSegPaddle')
parser.add_argument(
'--cfg',
dest='cfg_file',
help='Config file for training (and optionally testing)',
default=None,
type=str)
parser.add_argument(
'--use_gpu',
dest='use_gpu',
help='Use gpu or cpu',
action='store_true',
default=False)
parser.add_argument(
'--use_mpio',
dest='use_mpio',
help='Use multiprocess IO or not',
action='store_true',
default=False)
parser.add_argument(
'opts',
help='See utils/config.py for all options',
default=None,
nargs=argparse.REMAINDER)
parser.add_argument(
'--multi_scales',
dest='multi_scales',
help='Use multi_scales for eval',
action='store_true',
default=False)
parser.add_argument(
'--flip',
dest='flip',
help='flip the image or not',
action='store_true',
default=False)
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def evaluate(cfg, ckpt_dir=None, use_gpu=False, use_mpio=False, multi_scales=False, flip=False, **kwargs):
np.set_printoptions(precision=5, suppress=True)
num_classes = cfg.DATASET.NUM_CLASSES
base_size = cfg.TEST.BASE_SIZE
crop_size = cfg.TEST.CROP_SIZE
startup_prog = fluid.Program()
test_prog = fluid.Program()
dataset = build_dataset(cfg.DATASET.DATASET_NAME,
file_list=cfg.DATASET.VAL_FILE_LIST,
mode=ModelPhase.EVAL,
data_dir=cfg.DATASET.DATA_DIR)
def data_generator():
# TODO: check whether the batch reader is compatible with Windows
if use_mpio:
data_gen = dataset.multiprocess_generator(
num_processes=cfg.DATALOADER.NUM_WORKERS,
max_queue_size=cfg.DATALOADER.BUF_SIZE)
else:
data_gen = dataset.generator()
for b in data_gen:
yield b[0], b[1], b[2]
py_reader, avg_loss, out, grts, masks = build_model(
test_prog, startup_prog, phase=ModelPhase.EVAL)
py_reader.decorate_sample_generator(
data_generator, drop_last=False, batch_size=cfg.EVAL_BATCH_SIZE, places=fluid.cuda_places())
# Get device environment
places = fluid.cuda_places() if use_gpu else fluid.cpu_places()
place = places[0]
dev_count = len(places)
print("#Device count: {}".format(dev_count))
exe = fluid.Executor(place)
exe.run(startup_prog)
test_prog = test_prog.clone(for_test=True)
ckpt_dir = cfg.TEST.TEST_MODEL if not ckpt_dir else ckpt_dir
if ckpt_dir is not None:
filename= '{}_{}_{}_epoch_{}.pdparams'.format(str(cfg.MODEL.MODEL_NAME),
str(cfg.MODEL.BACKBONE), str(cfg.DATASET.DATASET_NAME), cfg.SOLVER.NUM_EPOCHS)
print("loading testing model file: {}/{}".format(ckpt_dir, filename))
fluid.io.load_params(exe, ckpt_dir, main_program=test_prog, filename=filename)
# Use streaming confusion matrix to calculate mean_iou
np.set_printoptions(
precision=4, suppress=True, linewidth=160, floatmode="fixed")
conf_mat = ConfusionMatrix(cfg.DATASET.NUM_CLASSES, streaming=True)
#fetch_list: return of the model
fetch_list = [avg_loss.name, out.name]
num_images = 0
step = 0
all_step = cfg.DATASET.VAL_TOTAL_IMAGES // cfg.EVAL_BATCH_SIZE
timer = Timer()
timer.start()
for data in py_reader():
mask = np.array(data[0]['mask'])
label = np.array(data[0]['label'])
image_org = np.array(data[0]['image'])
image = np.transpose(image_org, (0, 2, 3, 1)) # BCHW->BHWC
image = np.squeeze(image)
if cfg.TEST.SLIDE_WINDOW:
if not multi_scales:
scales = [1.0]
else:
scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25] if cfg.DATASET.DATASET_NAME == 'cityscapes' else [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
#scales = [0.75, 1.0, 1.25] # fast multi-scale testing
# sliding-window stride; the original note ranks ratios 1/3 > 2/3 > 1/2 for 769 x 769 inputs
stride = int(crop_size * 1.0 / 3)
h, w = image.shape[0:2]
scores = np.zeros(shape=[num_classes, h, w], dtype='float32')
for scale in scales:
long_size = int(math.ceil(base_size * scale))
if h > w:
height = long_size
width = int(1.0 * w * long_size / h + 0.5)
short_size = width
else:
width = long_size
height = int(1.0 * h * long_size / w + 0.5)
short_size = height
# print('org_img_size: {}x{}, rescale_img_size: {}x{}'.format(h, w, height, width))
cur_img = image_resize(image, height, width)
# padding
if long_size <= crop_size:
pad_img = pad_single_image(cur_img, crop_size)
label_feed, mask_feed = get_feed(pad_img)
pad_img = mapper_image(pad_img)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.array(pred1)
outputs = pred1[:, :, :height, :width]
if flip:
pad_img_flip = flip_left_right_image(cur_img)
pad_img_flip = pad_single_image(pad_img_flip, crop_size)
label_feed, mask_feed = get_feed(pad_img_flip)
pad_img_flip = mapper_image(pad_img_flip)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img_flip, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.flip(pred1, 3)
outputs += pred1[:, :, :height, :width]
else:
if short_size < crop_size:
pad_img = pad_single_image(cur_img, crop_size)
else:
pad_img = cur_img
ph, pw = pad_img.shape[0:2]
# sliding window
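# Each axis gets ceil((padded_size - crop_size) / stride) + 1 window positions so the
# crops tile the whole padded image; overlapping predictions are averaged via count_norm.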
h_grids = int(math.ceil(1.0 * (ph - crop_size) / stride)) + 1
w_grids = int(math.ceil(1.0 * (pw - crop_size) / stride)) + 1
outputs = np.zeros(shape=[1, num_classes, ph, pw], dtype='float32')
count_norm = np.zeros(shape=[1, 1, ph, pw], dtype='int32')
for idh in range(h_grids):
for idw in range(w_grids):
h0 = idh * stride
w0 = idw * stride
h1 = min(h0 + crop_size, ph)
w1 = min(w0 + crop_size, pw)
#print('(h0,w0,h1,w1):({},{},{},{})'.format(h0, w0, h1, w1))
crop_img = crop_image(pad_img, h0, w0, h1, w1)
pad_crop_img = pad_single_image(crop_img, crop_size)
label_feed, mask_feed = get_feed(pad_crop_img)
pad_crop_img = mapper_image(pad_crop_img)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_crop_img, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy=True)
pred1 = np.array(pred1)
outputs[:, :, h0:h1, w0:w1] += pred1[:, :, 0:h1-h0, 0:w1-w0]
count_norm[:, :, h0:h1, w0:w1] += 1
if flip:
pad_img_flip = flip_left_right_image(crop_img)
pad_img_flip = pad_single_image(pad_img_flip, crop_size)
label_feed, mask_feed = get_feed(pad_img_flip)
pad_img_flip = mapper_image(pad_img_flip)
loss, pred1 = exe.run(
test_prog,
feed={'image':pad_img_flip, 'label':label_feed, 'mask':mask_feed},
fetch_list = fetch_list,
return_numpy = True)
pred1 = np.flip(pred1, 3)
outputs[:, :, h0:h1, w0:w1] += pred1[:, :, 0:h1-h0, 0:w1-w0]
count_norm[:, :, h0:h1, w0:w1] += 1
outputs = 1.0 * outputs / count_norm
outputs = outputs[:, :, :height, :width]
with fluid.dygraph.guard():
outputs = fluid.dygraph.to_variable(outputs)
outputs = fluid.layers.resize_bilinear(outputs, out_shape=[h, w])
score = outputs.numpy()[0]
scores += score
else:
# taking the original image as the model input
loss, pred = exe.run(
test_prog,
feed={'image':image_org, 'label':label, 'mask':mask},
fetch_list = fetch_list,
return_numpy = True)
scores = pred[0]
# computing IoU with all scale result
pred = np.argmax(scores, axis=0).astype('int64')
pred = pred[np.newaxis, :, :, np.newaxis]
step += 1
num_images += pred.shape[0]
conf_mat.calculate(pred, label, mask)
_, iou = conf_mat.mean_iou()
_, acc = conf_mat.accuracy()
print("[EVAL] step={}/{} acc={:.4f} IoU={:.4f}".format(step, all_step, acc, iou))
category_iou, avg_iou = conf_mat.mean_iou()
category_acc, avg_acc = conf_mat.accuracy()
print("[EVAL] #image={} acc={:.4f} IoU={:.4f}".format(num_images, avg_acc, avg_iou))
print("[EVAL] Category IoU:", category_iou)
print("[EVAL] Category Acc:", category_acc)
print("[EVAL] Kappa:{:.4f}".format(conf_mat.kappa()))
print("flip = ", flip)
print("scales = ", scales)
return category_iou, avg_iou, category_acc, avg_acc
def image_resize(image, height, width):
if image.shape[0] == 3:
image = np.transpose(image, (1, 2, 0))
image = cv2.resize(image, (width, height), interpolation=cv2.INTER_LINEAR)
return image
def pad_single_image(image, crop_size):
h, w = image.shape[0:2]
pad_h = crop_size - h if h < crop_size else 0
pad_w = crop_size - w if w < crop_size else 0
image = cv2.copyMakeBorder(image, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT,value=0)
return image
def mapper_image(image):
# HxWx3 -> 3xHxW -> 1x3xHxW
image_array = np.transpose(image, (2, 0, 1))
image_array = image_array.astype('float32')
image_array = image_array[np.newaxis, :]
return image_array
def flip_left_right_image(image):
return cv2.flip(image, 1)
def get_feed(image):
h, w = image.shape[0:2]
return np.zeros([1, 1, h, w], dtype='int32'), np.zeros([1, 1, h, w], dtype='int32')
def crop_image(image, h0, w0, h1, w1):
return image[h0:h1, w0:w1, :]
def main():
args = parse_args()
if args.cfg_file is not None:
cfg.update_from_file(args.cfg_file)
if args.opts:
cfg.update_from_list(args.opts)
cfg.check_and_infer()
print(pprint.pformat(cfg))
evaluate(cfg, **args.__dict__)
if __name__ == '__main__':
main()
#!/bin/bash
# Deeplabv3_Res101_Cityscapes
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/deeplabv3_res101_cityscapes.yaml
#!/bin/bash
# Deeplabv3_Res101_PascalContext
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/deeplabv3_res101_pascalcontext.yaml
#!/bin/bash
# GloRe_Res101_Cityscapes
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/glore_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/glore_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/glore_res101_cityscapes.yaml
#!/bin/bash
# GloRe_Res101_PascalContext
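# NOTE: the ':<<!' ... '!' pair feeds the enclosed lines to the no-op ':' builtin as a
# heredoc, i.e. it block-comments them, so only the single-scale testing step runs here.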
:<<!
# 1.1 Training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--use_mpio \
--cfg ./configs/glore_res101_pascalcontext.yaml | tee -a train.log 2>&1
!
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/glore_res101_pascalcontext.yaml
:<<!
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/glore_res101_pascalcontext.yaml
!
#!/bin/bash
#PSPNet_Res101_Cityscapes
# 1.1 training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--cfg ./configs/pspnet_res101_cityscapes.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/pspnet_res101_cityscapes.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/pspnet_res101_cityscapes.yaml
#!/bin/bash
#PSPNet_Res101_PascalContext
# 1.1 training
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --use_gpu \
--cfg ./configs/pspnet_res101_pascalcontext.yaml | tee -a train.log 2>&1
# 1.2 single-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--cfg ./configs/pspnet_res101_pascalcontext.yaml
# 1.3 multi-scale testing
CUDA_VISIBLE_DEVICES=0 python eval.py --use_gpu \
--multi_scales \
--cfg ./configs/pspnet_res101_pascalcontext.yaml
Please put the pretrained backbone weights here.
from . import datasets, models, utils
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cityscapes import CityscapesSeg
from .pascal_context import PascalContextSeg
from .ade import AdeSeg
datasets = {
'cityscapes': CityscapesSeg,
'pascalcontext': PascalContextSeg,
'adechallengedata2016': AdeSeg,
}
def build_dataset(name, **kwargs):
return datasets[name.lower()](**kwargs)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class AdeSeg(BaseSeg):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN, base_size=520, crop_size=520, rand_scale=True):
super(AdeSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def _mask_transform(self, mask):
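# ADE20K annotations label classes 1..150 and use 0 for ignore; subtracting 1 maps
# classes to 0..149 and ignore to -1, matching IGNORE_INDEX: -1 in the ADE config.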
target = np.array(mask).astype('int32') - 1
return target
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
#print("line: ", line)
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
if img is None:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
img_height = img.shape[0]
img_width = img.shape[1]
#print('img.shape',img.shape)
if grt is not None:
grt_height = grt.shape[0]
grt_width = grt.shape[1]
if img_height != grt_height or img_width != grt_width:
raise Exception(
"source img and label img must has the same size")
else:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception(
"Empty image, src_dir: {}, img: {} & lab: {}".format(
src_dir, img_path, grt_path))
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
grt = self._mask_transform(grt)
return img, grt, img_name, grt_name
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
import copy
from PIL import Image, ImageOps, ImageFilter, ImageEnhance
from src.models.model_builder import ModelPhase
from src.utils.config import cfg
from .data_utils import GeneratorEnqueuer
class BaseSeg(object):
def __init__(self, file_list, data_dir, shuffle=False, mode=ModelPhase.TRAIN, base_size=1024, crop_size=769, rand_scale=True):
self.mode = mode
self.shuffle = shuffle
self.data_dir = data_dir
self.shuffle_seed = 0
self.crop_size = crop_size
self.base_size = base_size # short edge when training
self.rand_scale = rand_scale
# NOTE: Please ensure the file list was saved in UTF-8 encoding
with codecs.open(file_list, 'r', 'utf-8') as flist:
self.lines = [line.strip() for line in flist]
self.all_lines = copy.deepcopy(self.lines)
if shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
elif shuffle:
np.random.shuffle(self.lines)
self.num_trainers= cfg.NUM_TRAINERS
self.trainer_id=cfg.TRAINER_ID
def generator(self):
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
num_lines = len(self.all_lines) // cfg.NUM_TRAINERS
self.lines = self.all_lines[num_lines * cfg.TRAINER_ID: num_lines * (cfg.TRAINER_ID + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
for line in self.lines:
yield self.process_image(line, self.data_dir, self.mode)
def sharding_generator(self, pid=0, num_processes=1):
"""
Use line id as shard key for multiprocess io
It's a normal generator if pid=0, num_processes=1
"""
for index, line in enumerate(self.lines):
# Use index and pid to shard file list
if index % num_processes == pid:
yield self.process_image(line, self.data_dir, self.mode)
def batch_reader(self, batch_size):
br = self.batch(self.reader, batch_size)
for batch in br:
yield batch[0], batch[1], batch[2]
def multiprocess_generator(self, max_queue_size=32, num_processes=8):
# Re-shuffle file list
if self.shuffle and cfg.NUM_TRAINERS > 1:
np.random.RandomState(self.shuffle_seed).shuffle(self.all_lines)
num_lines = len(self.all_lines) // self.num_trainers
self.lines = self.all_lines[num_lines * self.trainer_id: num_lines * (self.trainer_id + 1)]
self.shuffle_seed += 1
elif self.shuffle:
np.random.shuffle(self.lines)
# Create multiple sharding generators according to num_processes for multiple processes
generators = []
for pid in range(num_processes):
generators.append(self.sharding_generator(pid, num_processes))
try:
enqueuer = GeneratorEnqueuer(generators)
enqueuer.start(max_queue_size=max_queue_size, workers=num_processes)
while True:
generator_out = None
while enqueuer.is_running():
if not enqueuer.queue.empty():
generator_out = enqueuer.queue.get(timeout=5)
break
else:
time.sleep(0.01)
if generator_out is None:
break
yield generator_out
finally:
if enqueuer is not None:
enqueuer.stop()
def batch(self, reader, batch_size, is_test=False, drop_last=False):
def batch_reader(is_test=False, drop_last=drop_last):
if is_test:
imgs, grts, img_names, valid_shapes, org_shapes = [], [], [], [], []
for img, grt, img_name, valid_shape, org_shape in reader():
imgs.append(img)
grts.append(grt)
img_names.append(img_name)
valid_shapes.append(valid_shape)
org_shapes.append(org_shape)
if len(imgs) == batch_size:
yield np.array(imgs), np.array(
grts), img_names, np.array(valid_shapes), np.array(
org_shapes)
imgs, grts, img_names, valid_shapes, org_shapes = [], [], [], [], []
if not drop_last and len(imgs) > 0:
yield np.array(imgs), np.array(grts), img_names, np.array(
valid_shapes), np.array(org_shapes)
else:
imgs, labs, ignore = [], [], []
bs = 0
for img, lab, ig in reader():
imgs.append(img)
labs.append(lab)
ignore.append(ig)
bs += 1
if bs == batch_size:
yield np.array(imgs), np.array(labs), np.array(ignore)
bs = 0
imgs, labs, ignore = [], [], []
if not drop_last and bs > 0:
yield np.array(imgs), np.array(labs), np.array(ignore)
return batch_reader(is_test, drop_last)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
raise NotImplementedError
def pil_imread(self, file_path):
"""read pseudo-color label"""
im = Image.open(file_path)
return np.asarray(im)
def cv2_imread(self, file_path, flag=cv2.IMREAD_COLOR):
# Work around cv2.imread failing on non-ASCII (e.g. Chinese) file paths on Windows.
return cv2.imdecode(np.fromfile(file_path, dtype=np.uint8), flag)
def normalize_image(self, img):
img = img.transpose((2, 0, 1)).astype('float32') / 255.0
img_mean = np.array(cfg.MEAN).reshape((len(cfg.MEAN), 1, 1))
img_std = np.array(cfg.STD).reshape((len(cfg.STD), 1, 1))
img -= img_mean
img /= img_std
return img
def process_image(self, line, data_dir, mode):
""" process_image """
img, grt, img_name, grt_name = self.load_image( line, data_dir, mode=mode) # img.type: numpy.array, grt.type: numpy.array
if mode == ModelPhase.TRAIN:
# numpy.array convert to PIL.Image
img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
grt = Image.fromarray(grt.astype('uint8')).convert('L')
crop_size = self.crop_size
# random scale
if self.rand_scale:
short_size = random.randint(int(self.base_size * cfg.DATAAUG.RAND_SCALE_MIN), int(self.base_size * cfg.DATAAUG.RAND_SCALE_MAX))
else:
short_size = self.base_size
w, h = img.size
if h > w:
out_w = short_size
out_h = int(1.0 * h / w * out_w)
else:
out_h = short_size
out_w = int(1.0 * w / h * out_h)
img = img.resize((out_w, out_h), Image.BILINEAR)
grt = grt.resize((out_w, out_h), Image.NEAREST)
# rand flip
if random.random() > 0.5:
img = img.transpose(Image.FLIP_LEFT_RIGHT)
grt = grt.transpose(Image.FLIP_LEFT_RIGHT)
# padding
if short_size < crop_size:
pad_h = crop_size - out_h if out_h < crop_size else 0
pad_w = crop_size - out_w if out_w < crop_size else 0
img = ImageOps.expand(img, border=(pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2), fill=0)
grt = ImageOps.expand(grt, border=(pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2), fill=cfg.DATASET.IGNORE_INDEX)
# random crop
w, h = img.size
x = random.randint(0, w - crop_size)
y = random.randint(0, h - crop_size)
img = img.crop((x, y, x + crop_size, y + crop_size))
grt = grt.crop((x, y, x + crop_size, y + crop_size))
# gaussian blur
if cfg.DATAAUG_EXTRA:
if random.random() > 0.7:
img = img.filter(ImageFilter.GaussianBlur(radius=random.random()))
# PIL.Image -> cv2
img = cv2.cvtColor(np.asarray(img),cv2.COLOR_RGB2BGR)
grt = np.array(grt)
elif ModelPhase.is_eval(mode):
org_shape = [img.shape[0], img.shape[1]] # 1024 x 2048 for cityscapes
elif ModelPhase.is_visual(mode):
org_shape = [img.shape[0], img.shape[1]]
#img, grt = resize(img, grt, mode=mode)
valid_shape = [img.shape[0], img.shape[1]]
#img, grt = rand_crop(img, grt, mode=mode)
else:
raise ValueError("Dataset mode={} Error!".format(mode))
# Normalize image
img = self.normalize_image(img)
if ModelPhase.is_train(mode) or ModelPhase.is_eval(mode):
grt = np.expand_dims(np.array(grt).astype('int32'), axis=0)
ignore = (grt != cfg.DATASET.IGNORE_INDEX).astype('int32')
if ModelPhase.is_train(mode):
return (img, grt, ignore)
elif ModelPhase.is_eval(mode):
return (img, grt, ignore)
elif ModelPhase.is_visual(mode):
return (img, grt, img_name, valid_shape, org_shape)
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class CityscapesSeg(BaseSeg):
def __init__(self, file_list, data_dir, shuffle=False, mode=ModelPhase.TRAIN, base_size=1024, crop_size=769, rand_scale=True):
super(CityscapesSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be image_name {} label_name\\n".format(cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
img_height = img.shape[0]
img_width = img.shape[1]
if grt is not None:
grt_height = grt.shape[0]
grt_width = grt.shape[1]
id_to_trainid = [255, 255, 255, 255, 255,
255, 255, 255, 0, 1,
255, 255, 2, 3, 4,
255, 255, 255, 5, 255,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
255, 255, 16, 17, 18]
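# Maps raw Cityscapes label IDs to the 19 train IDs (255 = ignore); the +1 in the
# lookup below shifts the raw ID range -1..33 to list indices 0..34.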
grt_ = np.zeros([grt_height, grt_width])
for h in range(grt_height):
for w in range(grt_width):
grt_[h][w] = id_to_trainid[int(grt[h][w])+1]
if img_height != grt_height or img_width != grt_width:
raise Exception("source img and label img must has the same size")
else:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("Empty image, src_dir: {}, img: {} & lab: {}".format(src_dir, img_path, grt_path))
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
return img, grt_, img_name, grt_name
"""
This code is based on https://github.com/fchollet/keras/blob/master/keras/utils/data_utils.py
"""
import time
import numpy as np
import threading
import multiprocessing
try:
import queue
except ImportError:
import Queue as queue
class GeneratorEnqueuer(object):
"""
Multiple generators
Args:
generators (list): generators that the worker processes draw data from.
wait_time (float): time to sleep in-between calls to `put()`.
"""
def __init__(self, generators, wait_time=0.05):
self.wait_time = wait_time
self._generators = generators
self._threads = []
self._stop_events = []
self.queue = None
self._manager = None
self.workers = 1
def start(self, workers=1, max_queue_size=16):
"""
Start worker threads which add data from the generator into the queue.
Args:
workers (int): number of worker threads
max_queue_size (int): queue size
(when full, threads could block on `put()`)
"""
self.workers = workers
def data_generator_task(pid):
"""
Data generator task.
"""
def task(pid):
if (self.queue is not None
and self.queue.qsize() < max_queue_size):
generator_output = next(self._generators[pid])
self.queue.put((generator_output))
else:
time.sleep(self.wait_time)
while not self._stop_events[pid].is_set():
try:
task(pid)
except Exception:
self._stop_events[pid].set()
break
try:
self._manager = multiprocessing.Manager()
self.queue = self._manager.Queue(maxsize=max_queue_size)
for pid in range(self.workers):
self._stop_events.append(multiprocessing.Event())
thread = multiprocessing.Process(
target=data_generator_task, args=(pid, ))
thread.daemon = True
self._threads.append(thread)
thread.start()
except:
self.stop()
raise
def is_running(self):
"""
Returns:
bool: Whether the worker threads are running.
"""
# If the queue is not empty, the enqueuer is still running; wait for the consumer
if not self.queue.empty():
return True
for pid in range(self.workers):
if not self._stop_events[pid].is_set():
return True
return False
def stop(self, timeout=None):
"""
Stops running threads and wait for them to exit, if necessary.
Should be called by the same thread which called `start()`.
Args:
timeout(int|None): maximum time to wait on `thread.join()`.
"""
if self.is_running():
for pid in range(self.workers):
self._stop_events[pid].set()
for thread in self._threads:
if thread.is_alive():
thread.join(timeout)
if self._manager:
self._manager.shutdown()
self._threads = []
self._stop_events = []
self.queue = None
from __future__ import print_function
import sys
import os
import math
import random
import functools
import io
import time
import codecs
import numpy as np
import paddle
import paddle.fluid as fluid
import cv2
from PIL import Image
import copy
from src.utils.config import cfg
from src.models.model_builder import ModelPhase
from .baseseg import BaseSeg
class PascalContextSeg(BaseSeg):
def __init__(self,
file_list,
data_dir,
shuffle=False,
mode=ModelPhase.TRAIN, base_size=520, crop_size=520, rand_scale=True):
super(PascalContextSeg, self).__init__(file_list, data_dir, shuffle, mode, base_size, crop_size, rand_scale)
def _mask_transform(self, mask):
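# Pascal-Context annotations use 0 for background/ignore; subtracting 1 maps it to -1
# (matching IGNORE_INDEX: -1 in the config) and the 59 classes to 0..58.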
target = np.array(mask).astype('int32') - 1
return target
def load_image(self, line, src_dir, mode=ModelPhase.TRAIN):
# original image cv2.imread flag setting
cv2_imread_flag = cv2.IMREAD_COLOR
if cfg.DATASET.IMAGE_TYPE == "rgba":
# If the RGBA 4-channel image type is used, use the IMREAD_UNCHANGED flag to
# preserve the alpha channel
cv2_imread_flag = cv2.IMREAD_UNCHANGED
parts = line.strip().split(cfg.DATASET.SEPARATOR)
if len(parts) != 2:
if mode == ModelPhase.TRAIN or mode == ModelPhase.EVAL:
raise Exception("File list format incorrect! It should be"
" image_name{}label_name\\n".format(
cfg.DATASET.SEPARATOR))
img_name, grt_name = parts[0], None
else:
img_name, grt_name = parts[0], parts[1]
img_path = os.path.join(src_dir, img_name)
img = self.cv2_imread(img_path, cv2_imread_flag)
if grt_name is not None:
grt_path = os.path.join(src_dir, grt_name)
grt = self.pil_imread(grt_path)
else:
grt = None
if len(img.shape) < 3:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
grt = self._mask_transform(grt)
return img, grt, img_name, grt_name
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#import models.modeling
#import models.libs
#import models.backbone
from . import modeling, libs, backbone
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from src.utils.config import cfg
class HRNet():
"""
Reference:
Sun, Ke, et al. "Deep High-Resolution Representation Learning for Human Pose Estimation.", In CVPR 2019
"""
def __init__(self, stride=4, seg_flag=False):
self.stride= stride
self.seg_flag=seg_flag
def conv_bn_layer(self, input, filter_size, num_filters, stride=1, padding=1, num_groups=1, if_act=True, name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(input=conv,
param_attr=ParamAttr(name=bn_name + "_scale",
initializer=fluid.initializer.Constant(1.0)),
bias_attr=ParamAttr(name=bn_name + "_offset",
initializer=fluid.initializer.Constant(0.0)),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
bn = fluid.layers.relu(bn)
return bn
def basic_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input, filter_size=3, num_filters=num_filters, stride=stride, name=name + '_conv1')
conv = self.conv_bn_layer(input=conv, filter_size=3, num_filters=num_filters, if_act=False, name=name + '_conv2')
if downsample:
residual = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters, if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def bottleneck_block(self, input, num_filters, stride=1, downsample=False, name=None):
residual = input
conv = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters, name=name + '_conv1')
conv = self.conv_bn_layer(input=conv, filter_size=3, num_filters=num_filters, stride=stride, name=name + '_conv2')
conv = self.conv_bn_layer(input=conv, filter_size=1, num_filters=num_filters * 4, if_act=False,
name=name + '_conv3')
if downsample:
residual = self.conv_bn_layer(input=input, filter_size=1, num_filters=num_filters * 4, if_act=False,
name=name + '_downsample')
return fluid.layers.elementwise_add(x=residual, y=conv, act='relu')
def fuse_layers(self, x, channels, multi_scale_output=True, name=None):
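# For each output branch i, fuse all branches: lower-resolution inputs (j > i) are
# channel-matched with a 1x1 conv and upsampled bilinearly, higher-resolution inputs
# (j < i) are downsampled with stride-2 3x3 convs, then everything is summed and ReLU'd.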
out = []
for i in range(len(channels) if multi_scale_output else 1):
residual = x[i]
shape = residual.shape
width = shape[-1]
height = shape[-2]
for j in range(len(channels)):
if j > i:
y = self.conv_bn_layer(x[j], filter_size=1, num_filters=channels[i], if_act=False,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1))
y = fluid.layers.resize_bilinear(input=y, out_shape=[height, width])
residual = fluid.layers.elementwise_add(x=residual, y=y, act=None)
elif j < i:
y = x[j]
for k in range(i - j):
if k == i - j - 1:
y = self.conv_bn_layer(y, filter_size=3, num_filters=channels[i], stride=2, if_act=False,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1) + '_' + str(k + 1))
else:
y = self.conv_bn_layer(y, filter_size=3, num_filters=channels[j], stride=2,
name=name + '_layer_' + str(i + 1) + '_' + str(j + 1) + '_' + str(k + 1))
residual = fluid.layers.elementwise_add(x=residual, y=y, act=None)
residual = fluid.layers.relu(residual)
out.append(residual)
return out
def branches(self, x, block_num, channels, name=None):
out = []
for i in range(len(channels)):
residual = x[i]
for j in range(block_num):
residual = self.basic_block(residual, channels[i],
name=name + '_branch_layer_' + str(i + 1) + '_' + str(j + 1))
out.append(residual)
return out
def high_resolution_module(self, x, channels, multi_scale_output=True, name=None):
residual = self.branches(x, 4, channels, name=name)
out = self.fuse_layers(residual, channels, multi_scale_output=multi_scale_output, name=name)
return out
def transition_layer(self, x, in_channels, out_channels, name=None):
num_in = len(in_channels)
num_out = len(out_channels)
out = []
for i in range(num_out):
if i < num_in:
if in_channels[i] != out_channels[i]:
residual = self.conv_bn_layer(x[i], filter_size=3, num_filters=out_channels[i],
name=name + '_layer_' + str(i + 1))
out.append(residual)
else:
out.append(x[i])
else:
residual = self.conv_bn_layer(x[-1], filter_size=3, num_filters=out_channels[i], stride=2,
name=name + '_layer_' + str(i + 1))
out.append(residual)
return out
def stage(self, x, num_modules, channels, multi_scale_output=True, name=None):
out = x
for i in range(num_modules):
if i == num_modules - 1 and multi_scale_output == False:
out = self.high_resolution_module(out, channels, multi_scale_output=False, name=name + '_' + str(i + 1))
else:
out = self.high_resolution_module(out, channels, name=name + '_' + str(i + 1))
return out
def layer1(self, input, name=None):
conv = input
for i in range(4):
conv = self.bottleneck_block(conv, num_filters=64, downsample=True if i == 0 else False,
name=name + '_' + str(i + 1))
return conv
#def highResolutionNet(input, num_classes):
def net(self, input, num_classes=1000):
channels_2 = cfg.MODEL.HRNET.STAGE2.NUM_CHANNELS
channels_3 = cfg.MODEL.HRNET.STAGE3.NUM_CHANNELS
channels_4 = cfg.MODEL.HRNET.STAGE4.NUM_CHANNELS
num_modules_2 = cfg.MODEL.HRNET.STAGE2.NUM_MODULES
num_modules_3 = cfg.MODEL.HRNET.STAGE3.NUM_MODULES
num_modules_4 = cfg.MODEL.HRNET.STAGE4.NUM_MODULES
x = self.conv_bn_layer(input=input, filter_size=3, num_filters=64, stride=2, if_act=True, name='layer1_1')
x = self.conv_bn_layer(input=x, filter_size=3, num_filters=64, stride=2, if_act=True, name='layer1_2')
la1 = self.layer1(x, name='layer2')
tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
st2 = self.stage(tr1, num_modules_2, channels_2, name='st2')
tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
st3 = self.stage(tr2, num_modules_3, channels_3, name='st3')
tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
st4 = self.stage(tr3, num_modules_4, channels_4, name='st4')
# upsample
shape = st4[0].shape
height, width = shape[-2], shape[-1]
st4[1] = fluid.layers.resize_bilinear(st4[1], out_shape=[height, width])
st4[2] = fluid.layers.resize_bilinear(st4[2], out_shape=[height, width])
st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=[height, width])
out = fluid.layers.concat(st4, axis=1)
if self.seg_flag and self.stride==4:
return out
last_channels = sum(channels_4)
out = self.conv_bn_layer(input=out, filter_size=1, num_filters=last_channels, stride=1, if_act=True, name='conv-2')
out= fluid.layers.conv2d(
input=out,
num_filters=num_classes,
filter_size=1,
stride=1,
padding=0,
act=None,
param_attr=ParamAttr(initializer=MSRA(), name='conv-1_weights'),
bias_attr=False)
out = fluid.layers.resize_bilinear(out, input.shape[2:])
return out
def hrnet():
model = HRNet(stride=4, seg_flag=True)
return model
if __name__ == '__main__':
image_shape = [3, 769, 769]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = hrnet()
logit = model.net(image, num_classes=19)  # hrnet() returns a model; call .net on it (num_classes here is illustrative)
print("logit:", logit.shape)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
__all__ = [
'MobileNetV2', 'MobileNetV2_x0_25', 'MobileNetV2_x0_5', 'MobileNetV2_x1_0',
'MobileNetV2_x1_5', 'MobileNetV2_x2_0', 'MobileNetV2_scale'
]
class MobileNetV2():
def __init__(self, scale=1.0, change_depth=False, output_stride=None):
self.scale = scale
self.change_depth = change_depth
self.bottleneck_params_list = [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 3, 2),
(6, 64, 4, 2),
(6, 96, 3, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
] if change_depth == False else [
(1, 16, 1, 1),
(6, 24, 2, 2),
(6, 32, 5, 2),
(6, 64, 7, 2),
(6, 96, 5, 1),
(6, 160, 3, 2),
(6, 320, 1, 1),
]
self.modify_bottle_params(output_stride)
def modify_bottle_params(self, output_stride=None):
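# Force stride 1 on every stage once the cumulative stride reaches output_stride,
# so the backbone's final feature map keeps the requested spatial resolution.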
if output_stride is not None and output_stride % 2 != 0:
raise Exception("output stride must to be even number")
if output_stride is None:
return
else:
stride = 2
for i, layer_setting in enumerate(self.bottleneck_params_list):
t, c, n, s = layer_setting
stride = stride * s
if stride > output_stride:
s = 1
self.bottleneck_params_list[i] = (t, c, n, s)
def net(self, input, class_dim=1000, end_points=None, decode_points=None):
scale = self.scale
change_depth = self.change_depth
# if change_depth is True, the network is roughly 1.4x as deep as before
bottleneck_params_list = self.bottleneck_params_list
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
#conv1
input = self.conv_bn_layer(
input,
num_filters=int(32 * scale),
filter_size=3,
stride=2,
padding=1,
if_act=True,
name='conv1_1')
layer_count = 1
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = input
if check_points(layer_count, end_points):
return input, decode_ends
# bottleneck sequences
i = 1
in_c = int(32 * scale)
for layer_setting in bottleneck_params_list:
t, c, n, s = layer_setting
i += 1
input, depthwise_output = self.invresi_blocks(
input=input,
in_c=in_c,
t=t,
c=int(c * scale),
n=n,
s=s,
name='conv' + str(i))
in_c = int(c * scale)
layer_count += n
#print("node test:", layer_count, input.shape)
if check_points(layer_count, decode_points):
decode_ends[layer_count] = depthwise_output
if check_points(layer_count, end_points):
return input, decode_ends
#last_conv
input = self.conv_bn_layer(
input=input,
num_filters=int(1280 * scale) if scale > 1.0 else 1280,
filter_size=1,
stride=1,
padding=0,
if_act=True,
name='conv9')
input = fluid.layers.pool2d(
input=input,
pool_size=7,
pool_stride=1,
pool_type='avg',
global_pooling=True)
output = fluid.layers.fc(
input=input,
size=class_dim,
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
if_act=True,
name=None,
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + '_weights'),
bias_attr=False)
bn_name = name + '_bn'
bn = fluid.layers.batch_norm(
input=conv,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
if if_act:
return fluid.layers.relu6(bn)
else:
return bn
def shortcut(self, input, data_residual):
return fluid.layers.elementwise_add(input, data_residual)
def inverted_residual_unit(self,
input,
num_in_filter,
num_filters,
ifshortcut,
stride,
filter_size,
padding,
expansion_factor,
name=None):
num_expfilter = int(round(num_in_filter * expansion_factor))
channel_expand = self.conv_bn_layer(
input=input,
num_filters=num_expfilter,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=True,
name=name + '_expand')
bottleneck_conv = self.conv_bn_layer(
input=channel_expand,
num_filters=num_expfilter,
filter_size=filter_size,
stride=stride,
padding=padding,
num_groups=num_expfilter,
if_act=True,
name=name + '_dwise',
use_cudnn=False)
depthwise_output = bottleneck_conv
linear_out = self.conv_bn_layer(
input=bottleneck_conv,
num_filters=num_filters,
filter_size=1,
stride=1,
padding=0,
num_groups=1,
if_act=False,
name=name + '_linear')
if ifshortcut:
out = self.shortcut(input=input, data_residual=linear_out)
return out, depthwise_output
else:
return linear_out, depthwise_output
def invresi_blocks(self, input, in_c, t, c, n, s, name=None):
first_block, depthwise_output = self.inverted_residual_unit(
input=input,
num_in_filter=in_c,
num_filters=c,
ifshortcut=False,
stride=s,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_1')
last_residual_block = first_block
last_c = c
for i in range(1, n):
last_residual_block, depthwise_output = self.inverted_residual_unit(
input=last_residual_block,
num_in_filter=last_c,
num_filters=c,
ifshortcut=True,
stride=1,
filter_size=3,
padding=1,
expansion_factor=t,
name=name + '_' + str(i + 1))
return last_residual_block, depthwise_output
def MobileNetV2_x0_25():
model = MobileNetV2(scale=0.25)
return model
def MobileNetV2_x0_5():
model = MobileNetV2(scale=0.5)
return model
def MobileNetV2_x1_0():
model = MobileNetV2(scale=1.0)
return model
def MobileNetV2_x1_5():
model = MobileNetV2(scale=1.5)
return model
def MobileNetV2_x2_0():
model = MobileNetV2(scale=2.0)
return model
def MobileNetV2_scale():
model = MobileNetV2(scale=1.2, change_depth=True)
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = MobileNetV2_x1_0()
    # without end_points/decode_points, net() returns a single tensor, so
    # request them explicitly to obtain (feature, decode_ends)
    logit, decode_ends = model.net(image, end_points=18, decode_points=4)
#print("logit:", logit.shape)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.utils.config import cfg
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
class ResNet():
def __init__(self, layers=50, scale=1.0):
self.layers = layers
self.scale = scale
def net(self,
input,
class_dim=1000,
end_points=None,
decode_points=None,
resize_points=None,
dilation_dict=None):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
decode_ends = dict()
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
def get_dilated_rate(dilation_dict, idx):
if dilation_dict is None or idx not in dilation_dict:
return 1
else:
return dilation_dict[idx]
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
# stage_1: 3 3x3_Conv
conv = self.conv_bn_layer(
input=input,
num_filters=int(64 * self.scale),
filter_size=3,
stride=2,
act='relu',
name="conv1_1")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(64 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_2")
conv = self.conv_bn_layer(
input=conv,
num_filters=int(128 * self.scale),
filter_size=3,
stride=1,
act='relu',
name="conv1_3")
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
layer_count = 1
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]): #depth = [3, 4, 23, 3]
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
dilation_rate = get_dilated_rate(dilation_dict, block)
                    # added by Rosun: employ multi-grid dilation rates in the last stage
                    if cfg.MODEL.BACKBONE_MULTI_GRID == True and block == 3:
                        if i == 0:
                            dilation_rate = dilation_rate * (i + 1)
                        else:
                            dilation_rate = dilation_rate * (2 * i)  # x2, x4
                        print("employ multi-grid for the resnet backbone: dilation_rate={}\n".format(dilation_rate))
conv = self.bottleneck_block(
input=conv,
num_filters=int(num_filters[block] * self.scale),
stride=2
if i == 0 and block != 0 and dilation_rate == 1 else 1,
name=conv_name,
dilation=dilation_rate)
layer_count += 3
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
if check_points(layer_count, resize_points):
conv = self.interp(
conv,
np.ceil(
np.array(conv.shape[2:]).astype('int32') / 2))
pool = fluid.layers.pool2d(input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name)
layer_count += 2
if check_points(layer_count, decode_points):
decode_ends[layer_count] = conv
if check_points(layer_count, end_points):
return conv, decode_ends
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
def zero_padding(self, input, padding):
return fluid.layers.pad(
input, [0, 0, 0, 0, padding, padding, padding, padding])
def interp(self, input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
dilation=1,
groups=1,
act=None,
name=None):
bias_attr=False
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2 if dilation == 1 else 0,
dilation=dilation,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=bias_attr,
name=name + '.conv2d.output.1')
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
name=bn_name + '.output.1',
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance', )
def shortcut(self, input, ch_out, stride, is_first, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1 or is_first == True:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, dilation=1):
if self.layers == 101:
strides = [1, stride]
else:
strides = [stride, 1]
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
dilation=1,
stride=strides[0],
act='relu',
name=name + "_branch2a")
if dilation > 1:
conv0 = self.zero_padding(conv0, dilation)
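            # padding the feature map by `dilation` pixels and running the 3x3
            # dilated conv below with padding=0 is equivalent to 'same' padding,
            # since a 3x3 kernel with dilation d has an effective extent of 2d + 1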
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
dilation=dilation,
stride=strides[1],
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
dilation=1,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short, y=conv2, act='relu', name=name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self.shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def ResNet18():
model = ResNet(layers=18)
return model
def ResNet34():
model = ResNet(layers=34)
return model
def ResNet50():
model = ResNet(layers=50)
return model
def ResNet101():
model = ResNet(layers=101)
return model
def ResNet152():
model = ResNet(layers=152)
return model
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import math
import paddle.fluid as fluid
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import bn, bn_relu, relu
from src.models.libs.model_libs import conv
from src.models.libs.model_libs import separate_conv
__all__ = ['xception_65', 'xception_41', 'xception_71']
def check_data(data, number):
if type(data) == int:
return [data] * number
assert len(data) == number
return data
def check_stride(s, os):
if s <= os:
return True
else:
return False
def check_points(count, points):
if points is None:
return False
else:
if isinstance(points, list):
return (True if count in points else False)
else:
return (True if count == points else False)
class Xception():
def __init__(self, backbone="xception_65"):
self.bottleneck_params = self.gen_bottleneck_params(backbone)
self.backbone = backbone
def gen_bottleneck_params(self, backbone='xception_65'):
if backbone == 'xception_65':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_41':
bottleneck_params = {
"entry_flow": (3, [2, 2, 2], [128, 256, 728]),
"middle_flow": (8, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
elif backbone == 'xception_71':
bottleneck_params = {
"entry_flow": (5, [2, 1, 2, 1, 2], [128, 256, 256, 728, 728]),
"middle_flow": (16, 1, 728),
"exit_flow": (2, [2, 1], [[728, 1024, 1024], [1536, 1536,
2048]])
}
else:
raise Exception(
"xception backbont only support xception_41/xception_65/xception_71"
)
return bottleneck_params
def net(self,
input,
output_stride=32,
num_classes=1000,
end_points=None,
decode_points=None):
self.stride = 2
self.block_point = 0
self.output_stride = output_stride
self.decode_points = decode_points
self.short_cuts = dict()
with scope(self.backbone):
# Entry flow
data = self.entry_flow(input)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Middle flow
data = self.middle_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
# Exit flow
data = self.exit_flow(data)
if check_points(self.block_point, end_points):
return data, self.short_cuts
data = fluid.layers.reduce_mean(data, [2, 3], keep_dim=True)
data = fluid.layers.dropout(data, 0.5)
stdv = 1.0 / math.sqrt(data.shape[1] * 1.0)
with scope("logit"):
out = fluid.layers.fc(
input=data,
size=num_classes,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
name='weights',
initializer=fluid.initializer.Uniform(-stdv, stdv)),
bias_attr=fluid.param_attr.ParamAttr(name='bias'))
return out
def entry_flow(self, data):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.09))
with scope("entry_flow"):
with scope("conv1"):
data = bn_relu(
conv(
data, 32, 3, stride=2, padding=1,
param_attr=param_attr))
with scope("conv2"):
data = bn_relu(
conv(
data, 64, 3, stride=1, padding=1,
param_attr=param_attr))
# get entry flow params
block_num = self.bottleneck_params["entry_flow"][0]
strides = self.bottleneck_params["entry_flow"][1]
chns = self.bottleneck_params["entry_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("entry_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
data, short_cuts = self.xception_block(
data, chns[i], [1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def middle_flow(self, data):
block_num = self.bottleneck_params["middle_flow"][0]
strides = self.bottleneck_params["middle_flow"][1]
chns = self.bottleneck_params["middle_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("middle_flow"):
for i in range(block_num):
block_point = block_point + 1
with scope("block" + str(i + 1)):
stride = strides[i] if check_stride(s * strides[i],
output_stride) else 1
                    data, short_cuts = self.xception_block(
                        data, chns[i], [1, 1, stride], skip_conv=False)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def exit_flow(self, data):
block_num = self.bottleneck_params["exit_flow"][0]
strides = self.bottleneck_params["exit_flow"][1]
chns = self.bottleneck_params["exit_flow"][2]
strides = check_data(strides, block_num)
chns = check_data(chns, block_num)
assert (block_num == 2)
# params to control your flow
s = self.stride
block_point = self.block_point
output_stride = self.output_stride
with scope("exit_flow"):
with scope('block1'):
block_point += 1
stride = strides[0] if check_stride(s * strides[0],
output_stride) else 1
data, short_cuts = self.xception_block(data, chns[0],
[1, 1, stride])
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
with scope('block2'):
block_point += 1
stride = strides[1] if check_stride(s * strides[1],
output_stride) else 1
data, short_cuts = self.xception_block(
data,
chns[1], [1, 1, stride],
dilation=2,
has_skip=False,
activation_fn_in_separable_conv=True)
s = s * stride
if check_points(block_point, self.decode_points):
self.short_cuts[block_point] = short_cuts[1]
self.stride = s
self.block_point = block_point
return data
def xception_block(self,
input,
channels,
strides=1,
filters=3,
dilation=1,
skip_conv=True,
has_skip=True,
activation_fn_in_separable_conv=False):
repeat_number = 3
channels = check_data(channels, repeat_number)
filters = check_data(filters, repeat_number)
strides = check_data(strides, repeat_number)
data = input
results = []
for i in range(repeat_number):
with scope('separable_conv' + str(i + 1)):
if not activation_fn_in_separable_conv:
data = relu(data)
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation)
else:
data = separate_conv(
data,
channels[i],
strides[i],
filters[i],
dilation=dilation,
act=relu)
results.append(data)
if not has_skip:
return data, results
if skip_conv:
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(
loc=0.0, scale=0.09))
with scope('shortcut'):
skip = bn(
conv(
input,
channels[-1],
1,
strides[-1],
groups=1,
padding=0,
param_attr=param_attr))
else:
skip = input
return data + skip, results
def xception_65():
model = Xception("xception_65")
return model
def xception_41():
model = Xception("xception_41")
return model
def xception_71():
model = Xception("xception_71")
return model
if __name__ == '__main__':
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
model = xception_65()
logit = model.net(image)
# coding: utf8
# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle
import paddle.fluid as fluid
from src.utils.config import cfg
import contextlib
bn_regularizer = fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0)
name_scope = ""
@contextlib.contextmanager
def scope(name):
global name_scope
bk = name_scope
name_scope = name_scope + name + '/'
yield
name_scope = bk
def max_pool(input, kernel, stride, padding):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='max',
pool_stride=stride,
pool_padding=padding)
return data
def avg_pool(input, kernel, stride, padding=0):
data = fluid.layers.pool2d(
input,
pool_size=kernel,
pool_type='avg',
pool_stride=stride,
pool_padding=padding)
return data
def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape
if C % G != 0:
# print "group can not divide channle:", C, G
for d in range(10):
for t in [d, -d]:
if G + t <= 0: continue
if C % (G + t) == 0:
G = G + t
break
if C % G == 0:
# print "use group size:", G
break
assert C % G == 0
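    # e.g. C=720 with G=32: 32 does not divide 720, so the search above tries
    # 33/31, then 34/30, and settles on G=30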
x = fluid.layers.group_norm(
input,
groups=G,
param_attr=param_attr,
bias_attr=bias_attr,
name=name_scope + 'group_norm')
return x
def bn(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNorm'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
elif cfg.MODEL.DEFAULT_NORM_TYPE == 'gn':
with scope('GroupNorm'):
return group_norm(
args[0],
cfg.MODEL.DEFAULT_GROUP_NUMBER,
eps=cfg.MODEL.DEFAULT_EPSILON,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer))
else:
raise Exception("Unsupport norm type:" + cfg.MODEL.DEFAULT_NORM_TYPE)
def bn_zero(*args, **kargs):
if cfg.MODEL.DEFAULT_NORM_TYPE == 'bn':
with scope('BatchNormZeroInit'):
return fluid.layers.batch_norm(
*args,
epsilon=cfg.MODEL.DEFAULT_EPSILON,
momentum=cfg.MODEL.BN_MOMENTUM,
param_attr=fluid.ParamAttr(
name=name_scope + 'gamma', regularizer=bn_regularizer,
initializer=fluid.initializer.ConstantInitializer(value=0.0)),
bias_attr=fluid.ParamAttr(
name=name_scope + 'beta', regularizer=bn_regularizer,
initializer=fluid.initializer.ConstantInitializer(value=0.0)),
moving_mean_name=name_scope + 'moving_mean',
moving_variance_name=name_scope + 'moving_variance',
**kargs)
def bn_relu(data):
return fluid.layers.relu(bn(data))
def relu(data):
return fluid.layers.relu(data)
def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = fluid.ParamAttr(
name=name_scope + 'biases',
regularizer=None,
initializer=fluid.initializer.ConstantInitializer(value=0.0))
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d(*args, **kargs)
def deconv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights'
if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases'
else:
kargs['bias_attr'] = False
return fluid.layers.conv2d_transpose(*args, **kargs)
def separate_conv(input, channel, stride, filter, dilation=1, act=None):
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.33))
with scope('depthwise'):
input = conv(
input,
input.shape[1],
filter,
stride,
groups=input.shape[1],
padding=(filter // 2) * dilation,
dilation=dilation,
use_cudnn=False,
param_attr=param_attr)
input = bn(input)
if act: input = act(input)
param_attr = fluid.ParamAttr(
name=name_scope + 'weights',
regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('pointwise'):
input = conv(
input, channel, 1, 1, groups=1, padding=0, param_attr=param_attr)
input = bn(input)
if act: input = act(input)
return input
def FCNHead(input, mid_feat_channel, num_classes, output_shape):
# Arch: Conv_3x3 + BN + ReLU + Dropout + Conv_1x1
# Conv_3x3 + BN + ReLU
aux_seg_name= "Aux_layer1"
with scope(aux_seg_name):
conv_feat= conv(input, mid_feat_channel, filter_size=3, padding=1, bias_attr=False, name=aux_seg_name + '_conv')
bn_feat = bn(conv_feat, act='relu')
# Dropout
dropout_out = fluid.layers.dropout(bn_feat, dropout_prob=0.1, name="Aux_dropout")
# Conv_1x1 + bilinear_upsample
aux_seg_name= "Aux_layer2"
with scope(aux_seg_name):
aux_logit = conv(dropout_out, num_classes, filter_size=1, bias_attr=True, name= aux_seg_name + '_logit_conv')
aux_logit_interp = fluid.layers.resize_bilinear(aux_logit, out_shape=output_shape, name= aux_seg_name + '_logit_interp')
return aux_logit_interp
def conv1d(x, output_channels, name_scope, bias_attr=False):
    '''
    x: a 3-D tensor of shape (B, C, N)
    reshape to 4-D --> conv2d --> reshape back to 3-D
    '''
B, C, N = x.shape
with scope(name_scope):
x = fluid.layers.reshape(x, shape=[B, C, N, 1])
if bias_attr:
x = conv(x, output_channels, filter_size=1, name=name_scope, bias_attr=bias_attr)
else:
x = conv(x, output_channels, filter_size=1, name=name_scope)
        # the channel dim after the 1x1 conv is output_channels, not the input C
        x = fluid.layers.reshape(x, shape=[B, output_channels, N])
return x
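# Example (illustrative): conv1d(x, 64, 'proj') maps a (B, C, N) tensor to
# (B, 64, N) by treating the point axis as a 1-pixel-wide spatial dim for conv2d.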
import sys
import struct
import importlib
import paddle.fluid as fluid
import numpy as np
from paddle.fluid.proto.framework_pb2 import VarType
from src.utils import solver
from src.utils.config import cfg
from src.utils.loss import multi_softmax_with_loss, multi_dice_loss, multi_bce_loss
class ModelPhase(object):
"""
Standard name for model phase in PaddleSeg
The following standard keys are defined:
* `TRAIN`: training mode.
* `EVAL`: testing/evaluation mode.
* `PREDICT`: prediction/inference mode.
* `VISUAL` : visualization mode
"""
TRAIN = 'train'
EVAL = 'eval'
PREDICT = 'predict'
VISUAL = 'visual'
@staticmethod
def is_train(phase):
return phase == ModelPhase.TRAIN
@staticmethod
def is_predict(phase):
return phase == ModelPhase.PREDICT
@staticmethod
def is_eval(phase):
return phase == ModelPhase.EVAL
@staticmethod
def is_visual(phase):
return phase == ModelPhase.VISUAL
@staticmethod
def is_valid_phase(phase):
""" Check valid phase """
if ModelPhase.is_train(phase) or ModelPhase.is_predict(phase) \
or ModelPhase.is_eval(phase) or ModelPhase.is_visual(phase):
return True
return False
def map_model_name(model_name):
name_dict = {
"deeplabv3": "deeplabv3.deeplabv3",
"pspnet": "pspnet.pspnet",
"glore": "glore.glore",
}
if model_name in name_dict.keys():
return name_dict[model_name]
else:
        raise Exception(
            "unknown model name, only support deeplabv3, pspnet, glore")
def get_func(func_name):
"""Helper to return a function object by name. func_name must identify a
function in this module or the path to a function relative to the base
'modeling' module.
"""
print("func_name: ", func_name)
if func_name == '':
return None
try:
parts = func_name.split('.')
# Refers to a function in this module
if len(parts) == 1:
return globals()[parts[0]]
# Otherwise, assume we're referencing a module under modeling
module_name = 'src.models.' + '.'.join(parts[:-1])
print("module_name: ", module_name)
# method 1
#from src.models.modeling import pspnet
# method 2
module = importlib.import_module(module_name)
return getattr(module, parts[-1])
    except Exception:
        print('Failed to find function: {}'.format(func_name))
        raise
def softmax(logit):
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.softmax(logit)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def sigmoid_to_softmax(logit):
"""
one channel to two channel
"""
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit_back = 1 - logit
logit = fluid.layers.concat([logit_back, logit], axis=-1)
logit = fluid.layers.transpose(logit, [0, 3, 1, 2])
return logit
def build_model(main_prog, start_prog, phase=ModelPhase.TRAIN):
if not ModelPhase.is_valid_phase(phase):
raise ValueError("ModelPhase {} is not valid!".format(phase))
if ModelPhase.is_train(phase):
width = cfg.DATAAUG.CROP_SIZE
height = cfg.DATAAUG.CROP_SIZE
else:
width = cfg.TEST.CROP_SIZE
height = cfg.TEST.CROP_SIZE
image_shape = [cfg.DATASET.DATA_DIM, height, width]
grt_shape = [1, height, width]
class_num = cfg.DATASET.NUM_CLASSES
with fluid.program_guard(main_prog, start_prog):
with fluid.unique_name.guard():
            # when exporting the model, add image-normalization preprocessing so that
            # the deployed predictor only needs to add a batch_size dim to the input
if ModelPhase.is_predict(phase):
origin_image = fluid.layers.data(name='image',
shape=[ -1, 1, 1, cfg.DATASET.DATA_DIM],
dtype='float32',
append_batch_size=False)
image = fluid.layers.transpose(origin_image, [0, 3, 1, 2])
origin_shape = fluid.layers.shape(image)[-2:]
mean = np.array(cfg.MEAN).reshape(1, len(cfg.MEAN), 1, 1)
mean = fluid.layers.assign(mean.astype('float32'))
std = np.array(cfg.STD).reshape(1, len(cfg.STD), 1, 1)
std = fluid.layers.assign(std.astype('float32'))
image = (image/255 - mean)/std
image = fluid.layers.resize_bilinear(image,
out_shape=[height, width], align_corners=False, align_mode=0)
else:
image = fluid.layers.data( name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data( name='label', shape=grt_shape, dtype='int32')
mask = fluid.layers.data( name='mask', shape=grt_shape, dtype='int32')
            # use PyReader when doing training and evaluation
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
iterable = True if ModelPhase.is_eval(phase) else False
print("iterable: ", iterable)
py_reader = fluid.io.PyReader(
feed_list=[image, label, mask],
capacity=cfg.DATALOADER.BUF_SIZE,
iterable=iterable,
use_double_buffer=True,
return_list=False)
model_name = map_model_name(cfg.MODEL.MODEL_NAME)
model_func = get_func("modeling." + model_name)
loss_type = cfg.SOLVER.LOSS
            if not isinstance(loss_type, list):
                loss_type = [loss_type]
            # dice_loss and bce_loss only apply to binary segmentation
            if class_num > 2 and (("dice_loss" in loss_type) or ("bce_loss" in loss_type)):
                raise Exception("dice loss and bce loss are only applicable to binary classification")
            # for binary segmentation, when dice_loss or bce_loss is selected,
            # the final logit output has a single channel
            if ("dice_loss" in loss_type) or ("bce_loss" in loss_type):
                class_num = 1
                if "softmax_loss" in loss_type:
                    raise Exception("softmax loss can not be combined with dice loss or bce loss")
logits = model_func(image, class_num)
            # compute the losses corresponding to the selected loss functions
if ModelPhase.is_train(phase) or ModelPhase.is_eval(phase):
loss_valid = False
avg_loss_list = []
valid_loss = []
if "softmax_loss" in loss_type:
avg_loss_list.append(multi_softmax_with_loss(logits,
label, mask,class_num))
loss_valid = True
valid_loss.append("softmax_loss")
if "dice_loss" in loss_type:
avg_loss_list.append(multi_dice_loss(logits, label, mask))
loss_valid = True
valid_loss.append("dice_loss")
if "bce_loss" in loss_type:
avg_loss_list.append(multi_bce_loss(logits, label, mask))
loss_valid = True
valid_loss.append("bce_loss")
if not loss_valid:
raise Exception("SOLVER.LOSS: {} is set wrong. it should "
"include one of (softmax_loss, bce_loss, dice_loss) at least"
" example: ['softmax_loss'], ['dice_loss'], ['bce_loss', 'dice_loss']".format(cfg.SOLVER.LOSS))
invalid_loss = [x for x in loss_type if x not in valid_loss]
if len(invalid_loss) > 0:
print("Warning: the loss {} you set is invalid. it will not be included in loss computed.".format(invalid_loss))
avg_loss = 0
for i in range(0, len(avg_loss_list)):
avg_loss += avg_loss_list[i]
#get pred result in original size
if isinstance(logits, tuple):
logit = logits[0]
else:
logit = logits
if logit.shape[2:] != label.shape[2:]:
logit = fluid.layers.resize_bilinear(logit, label.shape[2:])
# return image input and logit output for inference graph prune
if ModelPhase.is_predict(phase):
                # in binary segmentation, dice_loss/bce_loss yields a one-channel logit; convert it to two channels
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
logit = fluid.layers.resize_bilinear(logit, out_shape=origin_shape, align_corners=False, align_mode=0)
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.argmax(logit, axis=3)
return origin_image, logit
if class_num == 1:
out = sigmoid_to_softmax(logit)
out = fluid.layers.transpose(out, [0, 2, 3, 1])
else:
out = fluid.layers.transpose(logit, [0, 2, 3, 1])
pred = fluid.layers.argmax(out, axis=3)
pred = fluid.layers.unsqueeze(pred, axes=[3])
if ModelPhase.is_visual(phase):
if class_num == 1:
logit = sigmoid_to_softmax(logit)
else:
logit = softmax(logit)
return pred, logit
if ModelPhase.is_eval(phase):
out = fluid.layers.transpose(out, [0, 3, 1, 2]) #unnormalized probability
#return py_reader, avg_loss, pred, label, mask
return py_reader, avg_loss, out, label, mask
if ModelPhase.is_train(phase):
optimizer = solver.Solver(main_prog, start_prog)
decayed_lr = optimizer.optimise(avg_loss)
return py_reader, avg_loss, decayed_lr, pred, label, mask
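# A minimal usage sketch (assumed driver code; the real train/eval scripts
# also configure executors and feed the reader):
#   train_prog, startup_prog = fluid.Program(), fluid.Program()
#   py_reader, avg_loss, lr, pred, label, mask = build_model(
#       train_prog, startup_prog, phase=ModelPhase.TRAIN)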
def to_int(string, dest="I"):
return struct.unpack(dest, string)[0]
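# On-disk layout consumed by parse_shape_from_file (Paddle's saved-variable
# format, as read below): uint32 version, uint64 lod_level, lod_level x
# (uint64 size + raw bytes), uint32 version again, int32 length of the
# serialized VarType.TensorDesc, whose `dims` field carries the shape.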
def parse_shape_from_file(filename):
with open(filename, "rb") as file:
version = file.read(4)
lod_level = to_int(file.read(8), dest="Q")
for i in range(lod_level):
_size = to_int(file.read(8), dest="Q")
_ = file.read(_size)
version = file.read(4)
tensor_desc_size = to_int(file.read(4))
tensor_desc = VarType.TensorDesc()
tensor_desc.ParseFromString(file.read(tensor_desc_size))
return tuple(tensor_desc.dims)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import paddle
import paddle.fluid as fluid
from src.utils.config import cfg
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import bn, bn_relu, relu, FCNHead
from src.models.libs.model_libs import conv
from src.models.libs.model_libs import separate_conv
from src.models.backbone.mobilenet_v2 import MobileNetV2 as mobilenet_backbone
from src.models.backbone.xception import Xception as xception_backbone
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.models.backbone.hrnet import HRNet as hrnet_backbone
def ASPPHead(input, mid_channel, num_classes, output_shape):
    # Arch of the Atrous Spatial Pyramid Pooling Module:
    #
    #      |----> ImagePool + Conv_1x1 + BN + ReLU + bilinear_interp -->|————————|
    #      |                                                            |        |
    #      |----> Conv_1x1 + BN + ReLU -------------------------------->|        |
    #      |                                                            |        |
    # x -->|----> AtrousConv_3x3 + BN + ReLU -------------------------->| concat |----> Conv_1x1 + BN + ReLU --> Dropout --> Conv_1x1
    #      |                                                            |        |
    #      |----> AtrousConv_3x3 + BN + ReLU -------------------------->|        |
    #      |                                                            |        |
    #      |----> AtrousConv_3x3 + BN + ReLU -------------------------->|________|
    #
    #
if cfg.MODEL.BACKBONE_OUTPUT_STRIDE == 16:
aspp_ratios = [6, 12, 18]
elif cfg.MODEL.BACKBONE_OUTPUT_STRIDE == 8:
aspp_ratios = [12, 24, 36]
else:
raise Exception("deeplab only support stride 8 or 16")
param_attr = fluid.ParamAttr(name=name_scope + 'weights', regularizer=None,
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.06))
with scope('ASPPHead'):
with scope("image_pool"):
image_avg = fluid.layers.reduce_mean( input, [2, 3], keep_dim=True)
image_avg = bn_relu( conv( image_avg, mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
image_avg = fluid.layers.resize_bilinear(image_avg, input.shape[2:])
with scope("aspp0"):
aspp0 = bn_relu( conv( input, mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
with scope("aspp1"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp1 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[0], act=relu)
else:
aspp1 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[0],
padding=aspp_ratios[0], param_attr=param_attr))
with scope("aspp2"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp2 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[1], act=relu)
else:
aspp2 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[1],
padding=aspp_ratios[1], param_attr=param_attr))
with scope("aspp3"):
if cfg.MODEL.DEEPLAB.ASPP_WITH_SEP_CONV:
aspp3 = separate_conv( input, mid_channel, 1, 3, dilation=aspp_ratios[2], act=relu)
else:
aspp3 = bn_relu( conv( input, mid_channel, stride=1, filter_size=3, dilation=aspp_ratios[2],
padding=aspp_ratios[2], param_attr=param_attr))
with scope("concat"):
feat = fluid.layers.concat([image_avg, aspp0, aspp1, aspp2, aspp3], axis=1)
feat = bn_relu( conv( feat, 2*mid_channel, 1, 1, groups=1, padding=0, param_attr=param_attr))
feat = fluid.layers.dropout(feat, 0.1)
# Conv_1x1 + bilinear_upsample
seg_name = "logit"
with scope(seg_name):
param_attr = fluid.ParamAttr( name= seg_name+'_weights',
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
logit = conv(feat, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=seg_name+'_conv')
logit_interp = fluid.layers.resize_bilinear(logit, out_shape=output_shape, name=seg_name+'_interp')
return logit_interp
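# Note (illustrative): the atrous rates above follow the DeepLabv3 convention
# that halving the output stride doubles the rates, so (6, 12, 18) at stride
# 16 keeps the same effective receptive field as (12, 24, 36) at stride 8.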
def mobilenetv2(input):
    # Backbone: MobileNetV2 configuration
    # DEPTH_MULTIPLIER: scale of MobileNetV2, default 1.0
    # OUTPUT_STRIDE: downsampling factor
    # end_points: number of MobileNetV2 blocks
    # decode_point: block whose output is branched off as the decoder input
scale = cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER
output_stride = cfg.MODEL.DEEPLABv3.OUTPUT_STRIDE
model = mobilenet_backbone(scale=scale, output_stride=output_stride)
end_points = 18
decode_point = 4
data, decode_shortcuts = model.net(
input, end_points=end_points, decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def xception(input):
    # Backbone: Xception configuration; xception_65, xception_41 and xception_71 are available
    # decode_point: block whose output is branched off as the decoder input
    # end_points: number of Xception blocks
cfg.MODEL.DEFAULT_EPSILON = 1e-3
model = xception_backbone(cfg.MODEL.BACKBONE)
backbone = cfg.MODEL.BACKBONE
output_stride = cfg.MODEL.DEEPLABv3.OUTPUT_STRIDE
if '65' in backbone:
decode_point = 2
end_points = 21
if '41' in backbone:
decode_point = 2
end_points = 13
if '71' in backbone:
decode_point = 3
end_points = 23
data, decode_shortcuts = model.net(
input,
output_stride=output_stride,
end_points=end_points,
decode_points=decode_point)
decode_shortcut = decode_shortcuts[decode_point]
return data, decode_shortcut
def resnet(input):
# dilation_dict:
# key: stage num
# value: dilation factor
scale = cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
decode_points = [91,100 ] # [10, 22, 91, 100], for obtaining feature maps of res2,res3, res4, and res5
dilation_dict = {2:2, 3:4}
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points=decode_points)
return res5, feat_dict
def hrnet(input):
model = hrnet_backbone(stride=4, seg_flag=True)
feats = model.net(input)
return feats
def deeplabv3(input, num_classes):
"""
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation", in arXiv:1706:05587
"""
    if 'xception' in cfg.MODEL.BACKBONE:
        res5, decode_shortcut = xception(input)
    elif 'mobilenet' in cfg.MODEL.BACKBONE:
        res5, decode_shortcut = mobilenetv2(input)
    elif 'resnet' in cfg.MODEL.BACKBONE:
        res5, feat_dict = resnet(input)
        res4 = feat_dict[91]
    elif 'hrnet' in cfg.MODEL.BACKBONE:
        res5 = hrnet(input)
    else:
        raise Exception("deeplabv3 only supports xception, mobilenet, resnet, and hrnet backbones")
    logit = ASPPHead(res5, mid_channel=256, num_classes=num_classes, output_shape=input.shape[2:])
    if cfg.MODEL.DEEPLABv3.AuxHead:
        # the auxiliary FCN head taps res4, which only the resnet backbone exposes
        aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
        return logit, aux_logit
    return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, bn_zero, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
# 1x1_Conv
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1', bias_attr=True)
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2', bias_attr= False)
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
# generate B
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) # Projection Matrix: num_batch, node, L=H*W
# reduce dimension
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
#L = fluid.layers.fill_constant(shape=[1], value=H*W, dtype='float32')
#V = fluid.layers.elementwise_div(V, L)
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
B = fluid.layers.reshape(B, shape= [num_batch, num_node, H*W])
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=False, name='extend_dim'+'_conv')
#Y = bn_zero(Y)
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
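# A minimal NumPy sketch (illustrative only, not used by the model) of the
# tensor algebra in gru_module above, making the B / V / Y shapes concrete;
# the function name and default sizes are our own for the demonstration.
def _gru_module_shape_demo(num_batch=2, C=512, H=4, W=4, num_state=128, num_node=64):
    import numpy as np
    L = H * W
    B = np.random.rand(num_batch, num_node, L)            # projection matrix, B x N x L
    x_reduce = np.random.rand(num_batch, L, num_state)    # reduced features, B x L x C1
    V = np.matmul(B, x_reduce).transpose(0, 2, 1)         # vertex features, B x C1 x N
    new_V = V                                             # stand-in for gcn_module
    D = B.transpose(0, 2, 1)                              # reverse projection, B x L x N
    Y = np.matmul(D, new_V.transpose(0, 2, 1))            # B x L x C1
    Y = Y.transpose(0, 2, 1).reshape(num_batch, num_state, H, W)
    assert Y.shape == (num_batch, num_state, H, W)
    return Y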
def resnet(input):
# end_points: end_layer of resnet backbone
# dilation_dict: dilation factor for stages_key
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# 3x3 Conv. 2048 -> 512
reduce_kernel=3
if cfg.DATASET.DATASET_NAME=='cityscapes':
reduce_kernel=1
with scope('feature'):
feature = conv(res5, 512, filter_size=reduce_kernel, bias_attr=False, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, num_state= 128, num_node = 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
    # the number of classes determines the last conv output; interpolate back to the original size
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1')
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2')
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) #num_batch, node, L=H*W
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
    with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=True, name='extend_dim'+'_conv')
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
def resnet(input):
    # GloRe backbone: resnet, default resnet50
    # end_points: last resnet layer to build
    # dilation_dict: resnet stage indices and their dilation factors
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# Conv_1x1 for reduce dimension
with scope('feature'):
feature = conv(res5, 512, filter_size=1, bias_attr=True, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, 128, 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
    # the number of classes determines the last conv output; interpolate back to the original size
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, bn_zero, conv1d, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.utils.config import cfg
def get_logit_interp(input, num_classes, out_shape, name="logit"):
# 1x1_Conv
param_attr = fluid.ParamAttr(
name= name + 'weights',
regularizer= fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer= fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
with scope(name):
logit = conv(input, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=name+'_conv')
logit_interp = fluid.layers.resize_bilinear( logit, out_shape=out_shape, name=name+'_interp')
return logit_interp
def gcn_module(name_scope, x, num_node, num_state):
    '''
    input: a 3-D tensor of shape (B, C, N)
    '''
h = fluid.layers.transpose(x, perm=[0, 2, 1]) #B,C,N-->B,N,C
h = conv1d(h, num_node, name_scope+'_conv1d1')
h = fluid.layers.transpose(h, perm=[0, 2, 1]) #B,C,N
h = fluid.layers.elementwise_add(h, x, act='relu')
h = conv1d(h, num_state, name_scope+'_conv1d2')
return h
def gru_module(x, num_state, num_node):
    '''
    Global Reasoning Unit: projection --> graph reasoning --> reverse projection
    params:
        x: B x C x H x W
        num_state: the dimension of each vertex feature
        num_node: the number of vertices
    output: B x C x H x W
    feature transform:
        projection:  x (B,C,H,W) --> B (B,N,H*W) and x_reduce (B,H*W,C1)
        reasoning:   V = B * x_reduce, transposed to (B,C1,N) --> gcn_module --> new_V (B,C1,N)
        reverse:     Y = B^T * new_V^T --> (B,H*W,C1) --> reshape to (B,C1,H,W) --> 1x1 conv --> (B,C,H,W)
    '''
num_batch, C, H, W = x.shape
with scope('projection'):
B = conv(x, num_node,
filter_size=1,
bias_attr=True,
name='projection'+'_conv') #num_batch, node, H, W
B = fluid.layers.reshape(B, shape=[num_batch, num_node, H*W]) #num_batch, node, L=H*W
with scope('reduce_channel'):
x_reduce = conv(x, num_state,
filter_size=1,
bias_attr=True,
name='reduce_channel'+'_conv') #num_batch, num_state, H, W
x_reduce = fluid.layers.reshape(x_reduce, shape=[num_batch, num_state, H*W]) #num_batch, num_state, L
x_reduce = fluid.layers.transpose(x_reduce, perm=[0, 2, 1]) #num_batch, L, num_state
V = fluid.layers.transpose(fluid.layers.matmul(B, x_reduce), perm=[0,2,1]) #num_batch, num_state, num_node
L = fluid.layers.fill_constant(shape=[1], value=H*W, dtype='float32')
V = fluid.layers.elementwise_div(V, L)
new_V = gcn_module('gru'+'_gcn', V, num_node, num_state)
D = fluid.layers.transpose(B, perm=[0, 2, 1])
Y = fluid.layers.matmul(D, fluid.layers.transpose(new_V, perm=[0, 2, 1]))
Y = fluid.layers.transpose(Y, perm=[0, 2, 1])
Y = fluid.layers.reshape(Y, shape=[num_batch, num_state, H, W])
with scope('extend_dim'):
Y = conv(Y, C, filter_size=1, bias_attr=True, name='extend_dim'+'_conv')
#Y = bn_zero(Y)
Y = bn(Y)
out = fluid.layers.elementwise_add(Y, x)
return out
def resnet(input):
# end_points: end_layer of resnet backbone
# dilation_dict: dilation factor for stages_key
scale = cfg.MODEL.GLORE.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
dilation_dict = {2:2, 3:4}
decode_points= [91, 100]
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points= decode_points)
return res5, feat_dict
def glore(input, num_classes):
"""
Reference:
Chen, Yunpeng, et al. "Graph-Based Global Reasoning Networks", In CVPR 2019
"""
# Backbone: ResNet
res5, feat_dict = resnet(input)
res4= feat_dict[91]
# Conv_1x1 for reduce dimension
with scope('feature'):
feature = conv(res5, 512, filter_size=3, bias_attr=True, name='feature_conv')
feature = bn(feature, act='relu')
# GRU Module
gru_output = gru_module(feature, 128, 64)
dropout = fluid.layers.dropout(gru_output, dropout_prob=0.1, name="dropout")
logit = get_logit_interp(dropout, num_classes, input.shape[2:])
if cfg.MODEL.GLORE.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
from __future__ import division
from __future__ import print_function
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from src.models.libs.model_libs import scope, name_scope
from src.models.libs.model_libs import avg_pool, conv, bn, FCNHead
from src.models.backbone.resnet import ResNet as resnet_backbone
from src.models.backbone.hrnet import HRNet as hrnet_backbone
from src.utils.config import cfg
def PSPHead(input, out_features, num_classes, output_shape):
    # Arch of the Pyramid Scene Parsing Module:
    #
    #           |----> Pool_1x1 + Conv_1x1 + BN + ReLU + bilinear_interp -->|————————|
    #           |                                                           |        |
    #           |----> Pool_2x2 + Conv_1x1 + BN + ReLU + bilinear_interp -->|        |
    #  x ------>|                                                           | concat |----> Conv_3x3 + BN + ReLU --> Dropout --> Conv_1x1
    #  |        |----> Pool_3x3 + Conv_1x1 + BN + ReLU + bilinear_interp -->|        |
    #  |        |                                                           |        |
    #  |        |----> Pool_6x6 + Conv_1x1 + BN + ReLU + bilinear_interp -->|________|
    #  |                                                                        ^
    #  |————————————————————————————————————————————————————————————————————————|
    #
cat_layers = []
sizes = (1,2,3,6)
# 4 parallel pooling branches
for size in sizes:
psp_name = "psp" + str(size)
with scope(psp_name):
pool_feat = fluid.layers.adaptive_pool2d(input, pool_size=[size, size], pool_type='avg',
name=psp_name+'_adapool')
conv_feat = conv(pool_feat, out_features, filter_size=1, bias_attr=True,
name= psp_name + '_conv')
bn_feat = bn(conv_feat, act='relu')
interp = fluid.layers.resize_bilinear(bn_feat, out_shape=input.shape[2:], name=psp_name+'_interp')
cat_layers.append(interp)
cat_layers = [input] + cat_layers[::-1]
cat = fluid.layers.concat(cat_layers, axis=1, name='psp_cat')
# Conv_3x3 + BN + ReLU
psp_end_name = "psp_end"
with scope(psp_end_name):
data = conv(cat, out_features, filter_size=3, padding=1, bias_attr=True, name=psp_end_name)
out = bn(data, act='relu')
# Dropout
dropout_out = fluid.layers.dropout(out, dropout_prob=0.1, name="dropout")
# Conv_1x1 + bilinear_upsample
seg_name = "logit"
with scope(seg_name):
param_attr = fluid.ParamAttr( name= seg_name+'_weights',
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.0),
initializer=fluid.initializer.TruncatedNormal(loc=0.0, scale=0.01))
logit = conv(dropout_out, num_classes, filter_size=1, param_attr=param_attr, bias_attr=True, name=seg_name+'_conv')
logit_interp = fluid.layers.resize_bilinear(logit, out_shape=output_shape, name=seg_name+'_interp')
return logit_interp
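# Example (illustrative): for a 769x769 crop at output stride 8, res5 is
# roughly 97x97, so the four branches pool it to 1x1, 2x2, 3x3 and 6x6 grids,
# project each to out_features channels and upsample back before the concat
# with the input feature map itself.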
def resnet(input):
# dilation_dict:
# key: stage num
# value: dilation factor
scale = cfg.MODEL.PSPNET.DEPTH_MULTIPLIER
layers = cfg.MODEL.BACKBONE_LAYERS
end_points = layers - 1
decode_points = [91, 100] # [10, 22, 91, 100], for obtaining feature maps of res2,res3, res4, and res5
dilation_dict = {2:2, 3:4}
model = resnet_backbone(layers, scale)
res5, feat_dict = model.net(input,
end_points=end_points,
dilation_dict=dilation_dict,
decode_points=decode_points)
return res5, feat_dict
def hrnet(input):
model = hrnet_backbone(stride=4, seg_flag=True)
feats = model.net(input)
return feats
def pspnet(input, num_classes):
"""
Reference:
Zhao, Hengshuang, et al. "Pyramid scene parsing network.", In CVPR 2017
"""
if 'resnet' in cfg.MODEL.BACKBONE:
res5, feat_dict = resnet(input)
res4 = feat_dict[91]
elif 'hrnet' in cfg.MODEL.BACKBONE:
res5 = hrnet(input)
else:
raise Exception("pspnet only support resnet and hrnet backbone")
logit = PSPHead(res5, 512, num_classes, input.shape[2:])
if cfg.MODEL.PSPNET.AuxHead:
aux_logit = FCNHead(res4, 256, num_classes, input.shape[2:])
return logit, aux_logit
return logit
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""A simple attribute dictionary used for representing configuration options."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import codecs
from ast import literal_eval
import yaml
import six
class SegConfig(dict):
def __init__(self, *args, **kwargs):
super(SegConfig, self).__init__(*args, **kwargs)
self.immutable = False
def __setattr__(self, key, value, create_if_not_exist=True):
if key in ["immutable"]:
self.__dict__[key] = value
return
t = self
keylist = key.split(".")
for k in keylist[:-1]:
t = t.__getattr__(k, create_if_not_exist)
t.__getattr__(keylist[-1], create_if_not_exist)
t[keylist[-1]] = value
def __getattr__(self, key, create_if_not_exist=True):
if key in ["immutable"]:
return self.__dict__[key]
if not key in self:
if not create_if_not_exist:
raise KeyError
self[key] = SegConfig()
return self[key]
def __setitem__(self, key, value):
#
if self.immutable:
raise AttributeError(
'Attempted to set "{}" to "{}", but SegConfig is immutable'.
format(key, value))
#
if isinstance(value, six.string_types):
try:
value = literal_eval(value)
except ValueError:
pass
except SyntaxError:
pass
super(SegConfig, self).__setitem__(key, value)
def update_from_segconfig(self, other):
if isinstance(other, dict):
other = SegConfig(other)
assert isinstance(other, SegConfig)
diclist = [("", other)]
while len(diclist):
prefix, tdic = diclist[0]
diclist = diclist[1:]
for key, value in tdic.items():
key = "{}.{}".format(prefix, key) if prefix else key
if isinstance(value, dict):
diclist.append((key, value))
continue
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def check_and_infer(self):
if self.DATASET.IMAGE_TYPE in ['rgb', 'gray']:
self.DATASET.DATA_DIM = 3
elif self.DATASET.IMAGE_TYPE in ['rgba']:
self.DATASET.DATA_DIM = 4
else:
raise KeyError(
'DATASET.IMAGE_TYPE config error, only support `rgb`, `gray` and `rgba`'
)
if self.MEAN is not None:
self.DATASET.PADDING_VALUE = [x*255.0 for x in self.MEAN]
"""
if not self.TRAIN_CROP_SIZE:
raise ValueError(
'TRAIN_CROP_SIZE is empty! Please set a pair of values in format (width, height)'
)
if not self.EVAL_CROP_SIZE:
raise ValueError(
'EVAL_CROP_SIZE is empty! Please set a pair of values in format (width, height)'
)
"""
        # Ensure the file lists use UTF-8 encoding
train_sets = codecs.open(self.DATASET.TRAIN_FILE_LIST, 'r', 'utf-8').readlines()
val_sets = codecs.open(self.DATASET.VAL_FILE_LIST, 'r', 'utf-8').readlines()
test_sets = codecs.open(self.DATASET.TEST_FILE_LIST, 'r', 'utf-8').readlines()
self.DATASET.TRAIN_TOTAL_IMAGES = len(train_sets)
self.DATASET.VAL_TOTAL_IMAGES = len(val_sets)
self.DATASET.TEST_TOTAL_IMAGES = len(test_sets)
if self.MODEL.MODEL_NAME == 'icnet' and \
len(self.MODEL.MULTI_LOSS_WEIGHT) != 3:
self.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4, 0.16]
def update_from_list(self, config_list):
if len(config_list) % 2 != 0:
raise ValueError(
"Command line options config format error! Please check it: {}".
format(config_list))
for key, value in zip(config_list[0::2], config_list[1::2]):
try:
self.__setattr__(key, value, create_if_not_exist=False)
except KeyError:
raise KeyError('Non-existent config key: {}'.format(key))
def update_from_file(self, config_file):
with codecs.open(config_file, 'r', 'utf-8') as file:
dic = yaml.load(file, Loader=yaml.FullLoader)
self.update_from_segconfig(dic)
def set_immutable(self, immutable):
self.immutable = immutable
for value in self.values():
if isinstance(value, SegConfig):
value.set_immutable(immutable)
def is_immutable(self):
return self.immutable
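# A minimal sketch (illustrative only; `_segconfig_dotted_key_demo` is not
# part of the shipped code) of how SegConfig resolves dotted keys, which is
# what update_from_list relies on for command-line overrides:
def _segconfig_dotted_key_demo():
    demo_cfg = SegConfig()
    demo_cfg.__setattr__('MODEL.BACKBONE_LAYERS', 101)  # creates nested SegConfig
    assert demo_cfg.MODEL.BACKBONE_LAYERS == 101
    return demo_cfg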
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import unicode_literals
from .collect import SegConfig
import numpy as np
cfg = SegConfig()
########################## Basic configuration ###############################
# Mean subtracted from images during preprocessing
#cfg.MEAN = [0.5, 0.5, 0.5]
cfg.MEAN = [0.485, 0.456, 0.406]
# Standard deviation that images are divided by during preprocessing
cfg.STD = [0.229, 0.224, 0.225]
# Batch sizes
cfg.TRAIN_BATCH_SIZE_PER_GPU = 2
cfg.TRAIN_BATCH_SIZE = 8
cfg.EVAL_BATCH_SIZE = 8
# Total number of processes (trainers) in multi-process training
cfg.NUM_TRAINERS = 1
# Process (trainer) ID in multi-process training
cfg.TRAINER_ID = 0
########################## Data loading configuration #########################
# Number of concurrent data-loading workers; 8 is recommended
cfg.DATALOADER.NUM_WORKERS = 8
# Buffer queue size for data loading; 256 is recommended
cfg.DATALOADER.BUF_SIZE = 256
########################## Dataset configuration ##############################
cfg.DATASET.DATASET_NAME = 'cityscapes'
# Root directory of the dataset
cfg.DATASET.DATA_DIR = './data_local/cityscapes/'
# Training set file list
cfg.DATASET.TRAIN_FILE_LIST = './data_local/cityscapes/train.list'
# Number of training images
cfg.DATASET.TRAIN_TOTAL_IMAGES = 5
# Validation set file list
cfg.DATASET.VAL_FILE_LIST = './data_local/cityscapes/val.list'
# Number of validation images
cfg.DATASET.VAL_TOTAL_IMAGES = 50
# Test set file list
cfg.DATASET.TEST_FILE_LIST = './data_local/cityscapes/test.list'
# Number of test images
cfg.DATASET.TEST_TOTAL_IMAGES = 1525
# File list used for Tensorboard visualization
cfg.DATASET.VIS_FILE_LIST = None
# Number of classes (background included)
cfg.DATASET.NUM_CLASSES = 19
# Input image type: three-channel 'rgb', four-channel 'rgba', or single-channel grayscale 'gray'
cfg.DATASET.IMAGE_TYPE = 'rgb'
# Number of input image channels
cfg.DATASET.DATA_DIM = 3
# Separator used in the data file lists; space by default
cfg.DATASET.SEPARATOR = '\t'
# Pixel label value to ignore; 255 by default, rarely needs changing
cfg.DATASET.IGNORE_INDEX = 255
# Padding value applied to images during data augmentation
cfg.DATASET.PADDING_VALUE = [127.5, 127.5, 127.5]
########################### Data augmentation configuration ###################
cfg.DATAAUG.EXTRA = True
cfg.DATAAUG.BASE_SIZE = 1024
cfg.DATAAUG.CROP_SIZE = 769
cfg.DATAAUG.RAND_SCALE_MIN = 0.75
cfg.DATAAUG.RAND_SCALE_MAX = 2.0
########################### Training configuration ############################
# Directory where models are saved
cfg.TRAIN.MODEL_SAVE_DIR = ''
# Path to the pretrained model
cfg.TRAIN.PRETRAINED_MODEL_DIR = ''
# Checkpoint directory for resuming interrupted training
cfg.TRAIN.RESUME_MODEL_DIR = ''
# Whether to synchronize BatchNorm mean and variance across GPUs
cfg.TRAIN.SYNC_BATCH_NORM = True
# Epoch interval between parameter snapshots; snapshots can be used to resume interrupted training
cfg.TRAIN.SNAPSHOT_EPOCH = 10
########################### Optimization configuration ########################
# Initial learning rate
cfg.SOLVER.LR = 0.001
# Learning rate decay policy: poly, piecewise, or cosine
cfg.SOLVER.LR_POLICY = "poly"
# Optimization algorithm: sgd or adam
cfg.SOLVER.OPTIMIZER = "sgd"
# Momentum
cfg.SOLVER.MOMENTUM = 0.9
# Exponential decay rate of the second-moment estimate (Adam beta2)
cfg.SOLVER.MOMENTUM2 = 0.999
# Exponent of the poly learning rate decay
cfg.SOLVER.POWER = 0.9
# Decay factor for step (piecewise) decay
cfg.SOLVER.GAMMA = 0.1
# Epochs at which step decay is applied
cfg.SOLVER.DECAY_EPOCH = [10, 20]
# Weight decay, in [0, 1]
#cfg.SOLVER.WEIGHT_DECAY = 0.0001
cfg.SOLVER.WEIGHT_DECAY = 0.00004
# Epoch to start training from; 1 by default
cfg.SOLVER.BEGIN_EPOCH = 1
# Number of training epochs (a positive integer)
cfg.SOLVER.NUM_EPOCHS = 30
# Loss selection; softmax_loss, bce_loss and dice_loss are supported
cfg.SOLVER.LOSS = ["softmax_loss"]
# Whether to enable learning rate warmup
cfg.SOLVER.LR_WARMUP = False
# Number of warmup iterations
cfg.SOLVER.LR_WARMUP_STEPS = 2000
########################## Test configuration #################################
# Path to the model used for testing
cfg.TEST.TEST_MODEL = ''
cfg.TEST.BASE_SIZE = 2048
cfg.TEST.CROP_SIZE = 769
cfg.TEST.SLIDE_WINDOW = True
########################## Common model configuration #########################
# Model name: pspnet, deeplabv3, glore, or ginet
cfg.MODEL.MODEL_NAME = ''
# Normalization type: bn or gn (group_norm)
cfg.MODEL.DEFAULT_NORM_TYPE = 'bn'
# Weights for the multi-branch losses
cfg.MODEL.MULTI_LOSS_WEIGHT = [1.0, 0.4]
# Number of groups when DEFAULT_NORM_TYPE is gn
cfg.MODEL.DEFAULT_GROUP_NUMBER = 32
# Small epsilon guarding against division by zero; rarely needs changing
cfg.MODEL.DEFAULT_EPSILON = 1e-5
# BatchNorm momentum; rarely needs changing
cfg.MODEL.BN_MOMENTUM = 0.99
# Whether to train with FP16
cfg.MODEL.FP16 = False
# Mixed-precision training scales the loss; dynamic scaling by default,
# or a static scale such as 512.0
cfg.MODEL.SCALE_LOSS = "DYNAMIC"
# Backbone network: resnet, hrnet, xception_65, or mobilenetv2
cfg.MODEL.BACKBONE = "resnet"
# Backbone depth: 101 or 50 for resnet
cfg.MODEL.BACKBONE_LAYERS = 101
# Output stride = input size / feature map size
cfg.MODEL.BACKBONE_OUTPUT_STRIDE = 8
cfg.MODEL.BACKBONE_MULTI_GRID = False
########################## PSPNet configuration ###############################
# ResNet backbone scale setting
cfg.MODEL.PSPNET.DEPTH_MULTIPLIER = 1
# Auxiliary loss head
cfg.MODEL.PSPNET.AuxHead = True
########################## GloRe configuration ################################
# ResNet backbone scale setting
cfg.MODEL.GLORE.DEPTH_MULTIPLIER = 1
# Auxiliary loss head
cfg.MODEL.GLORE.AuxHead = True
########################## DeepLabv3 configuration ############################
# MobileNet v2 backbone scale setting
cfg.MODEL.DEEPLABv3.DEPTH_MULTIPLIER = 1.0
# Whether ASPP uses separable convolutions
cfg.MODEL.DEEPLABv3.ASPP_WITH_SEP_CONV = True
cfg.MODEL.DEEPLABv3.AuxHead = True
########################## HRNet configuration ################################
# HRNet stage 2 settings
cfg.MODEL.HRNET.STAGE2.NUM_MODULES = 1
cfg.MODEL.HRNET.STAGE2.NUM_CHANNELS = [40, 80]
# HRNet stage 3 settings
cfg.MODEL.HRNET.STAGE3.NUM_MODULES = 4
cfg.MODEL.HRNET.STAGE3.NUM_CHANNELS = [40, 80, 160]
# HRNet stage 4 settings
cfg.MODEL.HRNET.STAGE4.NUM_MODULES = 3
cfg.MODEL.HRNET.STAGE4.NUM_CHANNELS = [40, 80, 160, 320]
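# Usage sketch (illustrative, not part of the original file): overriding the
# defaults above, e.g. from command-line "KEY VALUE" pairs.
if __name__ == '__main__':
    cfg.update_from_list(['SOLVER.LR', '0.005', 'TRAIN_BATCH_SIZE', '16'])
    print(cfg.SOLVER.LR, cfg.TRAIN_BATCH_SIZE)  # 0.005 16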
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import paddle.fluid as fluid
def nccl2_prepare(args, startup_prog, main_prog):
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
envs = args.dist_env
t.transpile(
envs["trainer_id"],
trainers=','.join(envs["trainer_endpoints"]),
current_endpoint=envs["current_endpoint"],
startup_program=startup_prog,
program=main_prog)
def pserver_prepare(args, train_prog, startup_prog):
config = fluid.DistributeTranspilerConfig()
config.slice_var_up = args.split_var
t = fluid.DistributeTranspiler(config=config)
envs = args.dist_env
training_role = envs["training_role"]
t.transpile(
envs["trainer_id"],
program=train_prog,
pservers=envs["pserver_endpoints"],
trainers=envs["num_trainers"],
sync_mode=not args.async_mode,
startup_program=startup_prog)
if training_role == "PSERVER":
pserver_program = t.get_pserver_program(envs["current_endpoint"])
pserver_startup_program = t.get_startup_program(
envs["current_endpoint"],
pserver_program,
startup_program=startup_prog)
return pserver_program, pserver_startup_program
elif training_role == "TRAINER":
train_program = t.get_trainer_program()
return train_program, startup_prog
else:
raise ValueError(
'PADDLE_TRAINING_ROLE environment variable must be either TRAINER or PSERVER'
)
def nccl2_prepare_paddle(trainer_id, startup_prog, main_prog):
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(
trainer_id,
trainers=os.environ.get('PADDLE_TRAINER_ENDPOINTS'),
current_endpoint=os.environ.get('PADDLE_CURRENT_ENDPOINT'),
startup_program=startup_prog,
program=main_prog)
def prepare_for_multi_process(exe, build_strategy, train_prog):
# prepare for multi-process
trainer_id = int(os.environ.get('PADDLE_TRAINER_ID', 0))
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
if num_trainers < 2: return
build_strategy.num_trainers = num_trainers
build_strategy.trainer_id = trainer_id
# NOTE(zcd): use multi processes to train the model,
# and each process use one GPU card.
startup_prog = fluid.Program()
nccl2_prepare_paddle(trainer_id, startup_prog, train_prog)
    # the startup program may run twice here, but that is harmless
exe.run(startup_prog)
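# Usage sketch (illustrative, not part of the original file): wiring the
# helper above into an executor before compiling the train program. The
# FLAGS_selected_gpus variable is set by paddle.distributed.launch.
def _example_multi_process_setup(train_prog):
    place = fluid.CUDAPlace(int(os.environ.get('FLAGS_selected_gpus', 0)))
    exe = fluid.Executor(place)
    build_strategy = fluid.BuildStrategy()
    # no-op on a single process; transpiles the program for NCCL2 otherwise
    prepare_for_multi_process(exe, build_strategy, train_prog)
    return exe, build_strategy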
import os
from paddle import fluid
def load_fp16_vars(executor, dirname, program):
load_dirname = os.path.normpath(dirname)
def _if_exist(var):
name = var.name[:-7] if var.name.endswith('.master') else var.name
b = os.path.exists(os.path.join(load_dirname, name))
if not b and isinstance(var, fluid.framework.Parameter):
print("===== {} not found ====".format(var.name))
return b
load_prog = fluid.Program()
load_block = load_prog.global_block()
vars = list(filter(_if_exist, program.list_vars()))
for var in vars:
new_var = fluid.io._clone_var_in_block_(load_block, var)
name = var.name[:-7] if var.name.endswith('.master') else var.name
file_path = os.path.join(load_dirname, name)
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [new_var]},
attrs={
'file_path': file_path,
'load_as_fp16': var.dtype == fluid.core.VarDesc.VarType.FP16
})
executor.run(load_prog)
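# Usage sketch (illustrative, not part of the original file); the checkpoint
# directory is a placeholder path.
def _example_restore_fp16(program, checkpoint_dir='./snapshots/epoch_80'):
    exe = fluid.Executor(fluid.CUDAPlace(0))
    load_fp16_vars(exe, checkpoint_dir, program)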
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from src.utils.config import cfg
def softmax_with_loss(logit, label, ignore_mask=None, num_classes=2):
    ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
    # clamp labels into the valid range [0, num_classes - 1]
    label = fluid.layers.elementwise_min(
        label,
        fluid.layers.assign(np.array([num_classes - 1], dtype=np.int32)))
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
logit = fluid.layers.reshape(logit, [-1, num_classes])
label = fluid.layers.reshape(label, [-1, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.reshape(ignore_mask, [-1, 1])
loss, probs = fluid.layers.softmax_with_cross_entropy(
logit,
label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
return_softmax=True)
    loss = loss * ignore_mask
    # normalize by the fraction of valid pixels so ignored pixels do not
    # dilute the mean
    avg_loss = fluid.layers.mean(loss) / fluid.layers.mean(ignore_mask)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return avg_loss
# TODO: decide how the ignore index and ignore mask should be applied here
def dice_loss(logit, label, ignore_mask=None, epsilon=0.00001):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("dice loss is only applicable to one channel classfication")
ignore_mask = fluid.layers.cast(ignore_mask, 'float32')
logit = fluid.layers.transpose(logit, [0, 2, 3, 1])
label = fluid.layers.transpose(label, [0, 2, 3, 1])
label = fluid.layers.cast(label, 'int64')
ignore_mask = fluid.layers.transpose(ignore_mask, [0, 2, 3, 1])
logit = fluid.layers.sigmoid(logit)
logit = logit * ignore_mask
label = label * ignore_mask
reduce_dim = list(range(1, len(logit.shape)))
inse = fluid.layers.reduce_sum(logit * label, dim=reduce_dim)
dice_denominator = fluid.layers.reduce_sum(
logit, dim=reduce_dim) + fluid.layers.reduce_sum(
label, dim=reduce_dim)
dice_score = 1 - inse * 2 / (dice_denominator + epsilon)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return fluid.layers.reduce_mean(dice_score)
def bce_loss(logit, label, ignore_mask=None):
if logit.shape[1] != 1 or label.shape[1] != 1 or ignore_mask.shape[1] != 1:
raise Exception("bce loss is only applicable to binary classfication")
label = fluid.layers.cast(label, 'float32')
loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=logit,
label=label,
ignore_index=cfg.DATASET.IGNORE_INDEX,
        normalize=True)  # normalize by the number of valid (non-ignored) pixels
loss = fluid.layers.reduce_sum(loss)
label.stop_gradient = True
ignore_mask.stop_gradient = True
return loss
def multi_softmax_with_loss(logits, label, ignore_mask=None, num_classes=2):
    if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = softmax_with_loss(logit, logit_label, logit_mask,
num_classes)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = softmax_with_loss(logits, label, ignore_mask, num_classes)
return avg_loss
def multi_dice_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = dice_loss(logit, logit_label, logit_mask)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = dice_loss(logits, label, ignore_mask)
return avg_loss
def multi_bce_loss(logits, label, ignore_mask=None):
if isinstance(logits, tuple):
avg_loss = 0
for i, logit in enumerate(logits):
logit_label = fluid.layers.resize_nearest(label, logit.shape[2:])
logit_mask = (logit_label.astype('int32') !=
cfg.DATASET.IGNORE_INDEX).astype('int32')
loss = bce_loss(logit, logit_label, logit_mask)
avg_loss += cfg.MODEL.MULTI_LOSS_WEIGHT[i] * loss
else:
avg_loss = bce_loss(logits, label, ignore_mask)
return avg_loss
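# Usage sketch (illustrative, not part of the original file): dispatching the
# loss named in cfg.SOLVER.LOSS; logits, label and ignore_mask are assumed to
# be fluid Variables produced by the model and reader.
def _example_pick_loss(logits, label, ignore_mask, num_classes):
    name = cfg.SOLVER.LOSS[0]
    if name == 'softmax_loss':
        return multi_softmax_with_loss(logits, label, ignore_mask, num_classes)
    if name == 'dice_loss':
        return multi_dice_loss(logits, label, ignore_mask)
    if name == 'bce_loss':
        return multi_bce_loss(logits, label, ignore_mask)
    raise ValueError('unknown loss: {}'.format(name))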
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import numpy as np
from scipy.sparse import csr_matrix
class ConfusionMatrix(object):
"""
Confusion Matrix for segmentation evaluation
"""
def __init__(self, num_classes=2, streaming=False):
self.confusion_matrix = np.zeros([num_classes, num_classes],
dtype='int64')
self.num_classes = num_classes
self.streaming = streaming
def calculate(self, pred, label, ignore=None):
        # If not in streaming mode, clear the matrix on every call to `calculate`
if not self.streaming:
self.zero_matrix()
label = np.transpose(label, (0, 2, 3, 1))
ignore = np.transpose(ignore, (0, 2, 3, 1))
mask = np.array(ignore) == 1
label = np.asarray(label)[mask]
pred = np.asarray(pred)[mask]
one = np.ones_like(pred)
        # Accumulate ([row=label, col=pred], 1) into a sparse matrix
spm = csr_matrix((one, (label, pred)),
shape=(self.num_classes, self.num_classes))
spm = spm.todense()
self.confusion_matrix += spm
def zero_matrix(self):
""" Clear confusion matrix """
self.confusion_matrix = np.zeros([self.num_classes, self.num_classes],
dtype='int64')
def mean_iou(self):
iou_list = []
avg_iou = 0
        # TODO: use numpy's sum-over-axis API to simplify this
vji = np.zeros(self.num_classes, dtype=int)
vij = np.zeros(self.num_classes, dtype=int)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
for c in range(self.num_classes):
total = vji[c] + vij[c] - self.confusion_matrix[c][c]
if total == 0:
iou = 0
else:
iou = float(self.confusion_matrix[c][c]) / total
avg_iou += iou
iou_list.append(iou)
avg_iou = float(avg_iou) / float(self.num_classes)
return np.array(iou_list), avg_iou
def accuracy(self):
total = self.confusion_matrix.sum()
total_right = 0
for c in range(self.num_classes):
total_right += self.confusion_matrix[c][c]
if total == 0:
avg_acc = 0
else:
avg_acc = float(total_right) / total
vij = np.zeros(self.num_classes, dtype=int)
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
acc_list = []
for c in range(self.num_classes):
if vij[c] == 0:
acc = 0
else:
acc = self.confusion_matrix[c][c] / float(vij[c])
acc_list.append(acc)
return np.array(acc_list), avg_acc
def kappa(self):
vji = np.zeros(self.num_classes)
vij = np.zeros(self.num_classes)
for j in range(self.num_classes):
v_j = 0
for i in range(self.num_classes):
v_j += self.confusion_matrix[j][i]
vji[j] = v_j
for i in range(self.num_classes):
v_i = 0
for j in range(self.num_classes):
v_i += self.confusion_matrix[j][i]
vij[i] = v_i
total = self.confusion_matrix.sum()
# avoid spillovers
# TODO: is it reasonable to hard code 10000.0?
total = float(total) / 10000.0
vji = vji / 10000.0
vij = vij / 10000.0
tp = 0
tc = 0
for c in range(self.num_classes):
tp += vji[c] * vij[c]
tc += self.confusion_matrix[c][c]
tc = tc / 10000.0
pe = tp / (total * total)
po = tc / total
kappa = (po - pe) / (1 - pe)
return kappa
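# Usage sketch (illustrative, not part of the original file): streaming
# evaluation over small batches, with `ignore` marking valid pixels as 1.
def _example_confusion_matrix():
    cm = ConfusionMatrix(num_classes=3, streaming=True)
    pred = np.array([[[[0, 1], [2, 1]]]])    # predicted class ids, NCHW with C=1
    label = np.array([[[[0, 1], [2, 2]]]])   # ground truth, NCHW
    valid = np.ones_like(label)              # 1 marks pixels to evaluate
    # note: calculate() transposes label/ignore itself but expects pred in NHWC
    cm.calculate(np.transpose(pred, (0, 2, 3, 1)), label, valid)
    iou_list, mean_iou = cm.mean_iou()
    print(iou_list, mean_iou)  # [1.0, 0.5, 0.5], ~0.667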
def get_cityscapes_palette(num_cls=19):
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
palette = [0] * (num_cls * 3)
    palette[0:3] = (128, 64, 128)    # 0: road
    palette[3:6] = (244, 35, 232)    # 1: sidewalk
    palette[6:9] = (70, 70, 70)      # 2: building
    palette[9:12] = (102, 102, 156)  # 3: wall
    palette[12:15] = (190, 153, 153) # 4: fence
    palette[15:18] = (153, 153, 153) # 5: pole
    palette[18:21] = (250, 170, 30)  # 6: traffic light
    palette[21:24] = (220, 220, 0)   # 7: traffic sign
    palette[24:27] = (107, 142, 35)  # 8: vegetation
    palette[27:30] = (152, 251, 152) # 9: terrain
    palette[30:33] = (70, 130, 180)  # 10: sky
    palette[33:36] = (220, 20, 60)   # 11: person
    palette[36:39] = (255, 0, 0)     # 12: rider
    palette[39:42] = (0, 0, 142)     # 13: car
    palette[42:45] = (0, 0, 70)      # 14: truck
    palette[45:48] = (0, 60, 100)    # 15: bus
    palette[48:51] = (0, 80, 100)    # 16: train
    palette[51:54] = (0, 0, 230)     # 17: motorcycle
    palette[54:57] = (119, 11, 32)   # 18: bicycle
    palette[57:60] = (105, 105, 105) # extra color appended beyond num_cls
return palette
def get_gene_palette(num_cls=182): #Ref: CCNet
""" Returns the color map for visualizing the segmentation mask.
Args:
num_cls: Number of classes
Returns:
The color map
"""
n = num_cls
palette = [0] * (n * 3)
for j in range(0, n):
lab = j
palette[j * 3 + 0] = 0
palette[j * 3 + 1] = 0
palette[j * 3 + 2] = 0
i = 0
while lab:
palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
i += 1
lab >>= 3
return palette
def get_palette(dataset):
if dataset == 'cityscapes':
palette = get_cityscapes_palette(19)
elif dataset == 'pascalContext':
palette = get_gene_palette(num_cls=59)
else:
        raise RuntimeError("unknown dataset: {}".format(dataset))
return palette
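# Usage sketch (illustrative, not part of the original file): colorizing a
# 2-D label map of class ids with the palette; Pillow is an extra dependency
# here, not required by the original module.
def _example_colorize(label_map, dataset='cityscapes'):
    from PIL import Image
    img = Image.fromarray(label_map.astype('uint8'), mode='P')
    img.putpalette(get_palette(dataset))
    return img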
# coding: utf8
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import paddle.fluid as fluid
import numpy as np
import importlib
from src.utils.config import cfg
from paddle.fluid.contrib.mixed_precision.decorator import decorate, AutoMixedPrecisionLists
class Solver(object):
def __init__(self, main_prog, start_prog):
total_images = cfg.DATASET.TRAIN_TOTAL_IMAGES
self.weight_decay = cfg.SOLVER.WEIGHT_DECAY
self.momentum = cfg.SOLVER.MOMENTUM
self.momentum2 = cfg.SOLVER.MOMENTUM2
self.step_per_epoch = total_images // cfg.TRAIN_BATCH_SIZE
if total_images % cfg.TRAIN_BATCH_SIZE != 0:
self.step_per_epoch += 1
self.total_step = cfg.SOLVER.NUM_EPOCHS * self.step_per_epoch
self.main_prog = main_prog
self.start_prog = start_prog
self.warmup_step = cfg.SOLVER.LR_WARMUP_STEPS if cfg.SOLVER.LR_WARMUP else -1
self.decay_step = self.total_step - self.warmup_step
self.decay_epochs = cfg.SOLVER.NUM_EPOCHS - self.warmup_step / self.step_per_epoch
def lr_warmup(self, learning_rate, start_lr, end_lr):
linear_step = end_lr - start_lr
lr = fluid.layers.tensor.create_global_var(
shape=[1],
value=0.0,
dtype='float32',
persistable=True,
name="learning_rate_warmup")
global_step = fluid.layers.learning_rate_scheduler._decay_step_counter()
warmup_counter = fluid.layers.autoincreased_step_counter(
counter_name='@LR_DECAY_COUNTER_WARMUP_IN_SEG@', begin=1, step=1)
global_counter = fluid.default_main_program().global_block(
).vars['@LR_DECAY_COUNTER@']
warmup_counter = fluid.layers.cast(warmup_counter, 'float32')
with fluid.layers.control_flow.Switch() as switch:
with switch.case(warmup_counter <= self.warmup_step):
decayed_lr = start_lr + linear_step * (
warmup_counter / self.warmup_step)
fluid.layers.tensor.assign(decayed_lr, lr)
# hold the global_step to 0 during the warm-up phase
fluid.layers.increment(global_counter, value=-1)
with switch.default():
fluid.layers.tensor.assign(learning_rate, lr)
return lr
def piecewise_decay(self):
gamma = cfg.SOLVER.GAMMA
bd = [self.step_per_epoch * e for e in cfg.SOLVER.DECAY_EPOCH]
lr = [cfg.SOLVER.LR * (gamma**i) for i in range(len(bd) + 1)]
decayed_lr = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
return decayed_lr
def poly_decay(self):
power = cfg.SOLVER.POWER
decayed_lr = fluid.layers.polynomial_decay(
cfg.SOLVER.LR, self.decay_step, end_learning_rate=0, power=power)
return decayed_lr
def cosine_decay(self):
decayed_lr = fluid.layers.cosine_decay(
cfg.SOLVER.LR, self.step_per_epoch, self.decay_epochs)
return decayed_lr
def get_lr(self, lr_policy):
if lr_policy.lower() == 'poly':
decayed_lr = self.poly_decay()
elif lr_policy.lower() == 'piecewise':
decayed_lr = self.piecewise_decay()
elif lr_policy.lower() == 'cosine':
decayed_lr = self.cosine_decay()
        else:
            raise Exception(
                "Unsupported learning rate decay policy! Only poly, piecewise and cosine are supported."
            )
decayed_lr = self.lr_warmup(decayed_lr, 0, cfg.SOLVER.LR)
return decayed_lr
def sgd_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Momentum(
learning_rate=decayed_lr,
momentum=self.momentum,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
if cfg.MODEL.FP16:
if cfg.MODEL.MODEL_NAME in ["pspnet"]:
custom_black_list = {"pool2d"}
            else:
                custom_black_list = set()
amp_lists = AutoMixedPrecisionLists(
custom_black_list=custom_black_list)
assert isinstance(cfg.MODEL.SCALE_LOSS, float) or isinstance(cfg.MODEL.SCALE_LOSS, str), \
"data type of MODEL.SCALE_LOSS must be float or str"
if isinstance(cfg.MODEL.SCALE_LOSS, float):
optimizer = decorate(
optimizer,
amp_lists=amp_lists,
init_loss_scaling=cfg.MODEL.SCALE_LOSS,
use_dynamic_loss_scaling=False)
else:
assert cfg.MODEL.SCALE_LOSS.lower() in [
'dynamic'
], "if MODEL.SCALE_LOSS is a string,\
must be set as 'DYNAMIC'!"
optimizer = decorate(
optimizer,
amp_lists=amp_lists,
use_dynamic_loss_scaling=True)
optimizer.minimize(loss)
return decayed_lr
def adam_optimizer(self, lr_policy, loss):
decayed_lr = self.get_lr(lr_policy)
optimizer = fluid.optimizer.Adam(
learning_rate=decayed_lr,
beta1=self.momentum,
beta2=self.momentum2,
regularization=fluid.regularizer.L2Decay(
regularization_coeff=self.weight_decay),
)
optimizer.minimize(loss)
return decayed_lr
def optimise(self, loss):
lr_policy = cfg.SOLVER.LR_POLICY
opt = cfg.SOLVER.OPTIMIZER
if opt.lower() == 'adam':
return self.adam_optimizer(lr_policy, loss)
elif opt.lower() == 'sgd':
return self.sgd_optimizer(lr_policy, loss)
        else:
            raise Exception(
                "Unsupported optimizer! Only adam and sgd are supported.")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
def calculate_eta(remaining_step, speed):
if remaining_step < 0:
remaining_step = 0
remaining_time = int(remaining_step / speed)
result = "{:0>2}:{:0>2}:{:0>2}"
arr = []
for i in range(2, -1, -1):
arr.append(int(remaining_time / 60**i))
remaining_time %= 60**i
return result.format(*arr)
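# Example: 3600 remaining steps at 2.0 steps/s -> 1800 s, i.e.
# calculate_eta(3600, speed=2.0) == "00:30:00".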
class Timer(object):
""" Simple timer class for measuring time consuming """
def __init__(self):
self._start_time = 0.0
self._end_time = 0.0
self._elapsed_time = 0.0
self._is_running = False
def start(self):
self._is_running = True
self._start_time = time.time()
def restart(self):
self.start()
def stop(self):
self._is_running = False
self._end_time = time.time()
    def elapsed_time(self):
        self._end_time = time.time()
        self._elapsed_time = self._end_time - self._start_time
        if not self.is_running:
            # by convention, a stopped timer reports 0.0
            return 0.0
        return self._elapsed_time
@property
def is_running(self):
return self._is_running
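# Usage sketch (illustrative, not part of the original file).
if __name__ == '__main__':
    timer = Timer()
    timer.start()
    time.sleep(0.1)
    print(timer.elapsed_time())  # ~0.1 while running
    timer.stop()
    print(timer.elapsed_time())  # 0.0 once stopped, by the convention above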