Commit 19d9ceb4 authored by: W wuzewu

Rename hub_module

Parent 898cc7dc

# DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks
## Introduction
This page implements the [DELTA](https://arxiv.org/abs/1901.09229) algorithm in [PaddlePaddle](https://www.paddlepaddle.org.cn).
> Li, Xingjian, et al. "DELTA: Deep learning transfer using feature map with attention for convolutional networks." ICLR 2019.
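In brief, DELTA regularizes fine-tuning by penalizing the discrepancy between the feature maps of the network being fine-tuned and those of the frozen pre-trained (teacher) network. In the unweighted form implemented here (see `delta_loss` in `main.py`), the training objective is roughly

$$\mathcal{L}(w) = \mathcal{L}_{\mathrm{CE}}(f(x; w), y) + \lambda \,\big\lVert \mathrm{FM}(x; w) - \mathrm{FM}(x; w^{*}) \big\rVert_2^2$$

where $w^{*}$ are the pre-trained weights, $\mathrm{FM}$ is the feature map just before global pooling, and $\lambda$ is `--delta_reg`; the paper additionally weights feature-map channels with an attention mechanism.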
## Preparation of Data and Pre-trained Model
- Download transfer learning target datasets, like [Caltech-256](https://www.kaggle.com/jessicali9530/caltech256), [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) or others. Arrange the dataset in this way:
```
root/train/dog/xxy.jpg
root/train/dog/xxz.jpg
...
root/train/cat/nsdf3.jpg
root/train/cat/asd932_.jpg
...
root/test/dog/xxx.jpg
...
root/test/cat/123.jpg
...
```
- Download [the pretrained models](https://github.com/PaddlePaddle/models/tree/release/1.7/PaddleCV/image_classification#resnet-series). We report ResNet-101 results below.
## Running Scripts
Modify `global_data_path` in `datasets/data_path.py` to the root directory that contains the datasets, then run (set `--use_cuda` to a GPU device id, or `-1` to run on CPU):
```bash
python -u main.py --dataset Caltech30 --delta_reg 0.1 --wd_rate 1e-4 --batch_size 64 --outdir outdir --num_epoch 100 --use_cuda 0
python -u main.py --dataset CUB_200_2011 --delta_reg 0.1 --wd_rate 1e-4 --batch_size 64 --outdir outdir --num_epoch 100 --use_cuda 0
```
These scripts give the results below:
Dataset | l2 | delta
---|---|---
Caltech-256 | 79.86 | 84.71
CUB_200 | 77.41 | 80.05
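The regularizer itself is small. Here is a minimal NumPy sketch of the unweighted feature-map penalty that `main.py` adds to the task loss (shapes and names are illustrative only, not from this repo):

```python
import numpy as np

def delta_regularizer(student_fmap, teacher_fmap):
    """Mean squared difference between student and (frozen) teacher feature maps."""
    return np.mean((student_fmap - teacher_fmap) ** 2)

# Toy feature maps shaped [batch, channels, h, w], as produced just before global pooling.
rng = np.random.default_rng(0)
f_student = rng.standard_normal((2, 2048, 7, 7))
f_teacher = rng.standard_normal((2, 2048, 7, 7))
penalty = delta_regularizer(f_student, f_teacher)
print(penalty)  # this scalar is scaled by --delta_reg during training
```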
import argparse
parser = argparse.ArgumentParser()
parser.add_argument(
'--prefix', default=None, type=str, help='prefix for model id')
parser.add_argument('--dataset', default='PetImages', type=str, help='dataset')
parser.add_argument(
'--seed',
default=None,
type=int,
help='random seed (default: None, i.e., not fix the randomness).')
parser.add_argument('--batch_size', default=20, type=int, help='batch_size.')
parser.add_argument('--delta_reg', default=0.1, type=float, help='delta_reg.')
parser.add_argument('--wd_rate', default=1e-4, type=float, help='wd_rate.')
parser.add_argument(
'--use_cuda', default=0, type=int, help='CUDA device id; -1 means CPU.')
parser.add_argument('--num_epoch', default=100, type=int, help='num_epoch.')
parser.add_argument('--outdir', default='outdir', type=str, help='outdir')
parser.add_argument(
'--pretrained_model',
default='./pretrained_models/ResNet101_pretrained',
type=str,
help='pretrained model pathname')
args = parser.parse_args()
global_data_path = '[root_path]/datasets'
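# Replace [root_path] with the absolute directory that contains the dataset folders (e.g. Caltech30, CUB_200_2011).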
import cv2
import numpy as np
import six
import os
import glob
def resize_short(img, target_size, interpolation=None):
"""resize image
Args:
img: image data
target_size: resize short target size
interpolation: interpolation mode
Returns:
resized image data
"""
percent = float(target_size) / min(img.shape[0], img.shape[1])
resized_width = int(round(img.shape[1] * percent))
resized_height = int(round(img.shape[0] * percent))
if interpolation:
resized = cv2.resize(
img, (resized_width, resized_height), interpolation=interpolation)
else:
resized = cv2.resize(img, (resized_width, resized_height))
return resized
def crop_image(img, target_size, center):
"""crop image
Args:
img: images data
target_size: crop target size
center: crop mode
Returns:
img: cropped image data
"""
height, width = img.shape[:2]
size = target_size
if center:
w_start = (width - size) // 2
h_start = (height - size) // 2
else:
w_start = np.random.randint(0, width - size + 1)
h_start = np.random.randint(0, height - size + 1)
w_end = w_start + size
h_end = h_start + size
img = img[h_start:h_end, w_start:w_end, :]
return img
def preprocess_image(img, random_mirror=True):
"""
centered, scaled by 1/255.
:param img: np.array: shape: [ns, h, w, 3], color order: rgb.
:return: np.array: shape: [ns, 3, h, w]
"""
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# transpose to [ns, 3, h, w]
img = img.astype('float32').transpose((0, 3, 1, 2)) / 255
img_mean = np.array(mean).reshape((3, 1, 1))
img_std = np.array(std).reshape((3, 1, 1))
img -= img_mean
img /= img_std
if random_mirror:
mirror = int(np.random.uniform(0, 2))
if mirror == 1:
# horizontal mirror: flip the width axis of the [ns, 3, h, w] layout
img = img[:, :, :, ::-1]
return img
def _find_classes(dir):
# os.scandir is faster than os.listdir and is available in Python 3.5+.
classes = [d.name for d in os.scandir(dir) if d.is_dir()]
classes.sort()
class_to_idx = {classes[i]: i for i in range(len(classes))}
return classes, class_to_idx
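# For example, a directory with subfolders cat/ and dog/ yields
# (['cat', 'dog'], {'cat': 0, 'dog': 1}).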
class ReaderConfig():
"""
A generic data loader where the images are arranged in this way:
root/train/dog/xxy.jpg
root/train/dog/xxz.jpg
...
root/train/cat/nsdf3.jpg
root/train/cat/asd932_.jpg
...
root/test/dog/xxx.jpg
...
root/test/cat/123.jpg
...
"""
def __init__(self, dataset_dir, is_test):
image_paths, labels, self.num_classes = self.reader_creator(
dataset_dir, is_test)
random_per = np.random.permutation(range(len(image_paths)))
self.image_paths = image_paths[random_per]
self.labels = labels[random_per]
self.is_test = is_test
def get_reader(self):
def reader():
IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
'.tif', '.tiff', '.webp')
target_size = 256
crop_size = 224
for i, img_path in enumerate(self.image_paths):
if not img_path.lower().endswith(IMG_EXTENSIONS):
continue
img = cv2.imread(img_path)
if img is None:
print('cannot read image:', img_path)
continue
img = resize_short(img, target_size, interpolation=None)
img = crop_image(img, crop_size, center=self.is_test)
img = img[:, :, ::-1]
img = np.expand_dims(img, axis=0)
img = preprocess_image(img, not self.is_test)
yield img, self.labels[i]
return reader
def reader_creator(self, dataset_dir, is_test=False):
IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm',
'.tif', '.tiff', '.webp')
# read
if is_test:
datasubset_dir = os.path.join(dataset_dir, 'test')
else:
datasubset_dir = os.path.join(dataset_dir, 'train')
class_names, class_to_idx = _find_classes(datasubset_dir)
image_paths = []
labels = []
for class_name in class_names:
classes_dir = os.path.join(datasubset_dir, class_name)
for img_path in glob.glob(os.path.join(classes_dir, '*')):
if not img_path.lower().endswith(IMG_EXTENSIONS):
continue
image_paths.append(img_path)
labels.append(class_to_idx[class_name])
image_paths = np.array(image_paths)
labels = np.array(labels)
return image_paths, labels, len(class_names)
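# Usage sketch (hypothetical dataset path):
#   cfg = ReaderConfig('/path/to/datasets/Caltech30', is_test=False)
#   for img, label in cfg.get_reader()():
#       print(img.shape, label)  # (1, 3, 224, 224) and an integer label
#       break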
import os
import time
import sys
import math
import numpy as np
import functools
import re
import logging
import glob
import paddle
import paddle.fluid as fluid
from models.resnet import ResNet101
from datasets.readers import ReaderConfig
from args import args
from datasets.data_path import global_data_path
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
if args.seed is not None:
np.random.seed(args.seed)
print(os.environ.get('LD_LIBRARY_PATH', None))
print(os.environ.get('PATH', None))
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self):
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def load_vars_by_dict(executor, name_var_dict, main_program=None):
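"""Load each variable in name_var_dict from the file whose path is given by its dict key."""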
from paddle.fluid.framework import Program, Variable
from paddle.fluid import core
load_prog = Program()
load_block = load_prog.global_block()
if main_program is None:
main_program = fluid.default_main_program()
if not isinstance(main_program, Program):
raise TypeError("program should be as Program type or None")
for each_var_name in name_var_dict.keys():
assert isinstance(name_var_dict[each_var_name], Variable)
if name_var_dict[each_var_name].type == core.VarDesc.VarType.RAW:
continue
load_block.append_op(
type='load',
inputs={},
outputs={'Out': [name_var_dict[each_var_name]]},
attrs={'file_path': each_var_name})
executor.run(load_prog)
def get_model_id():
prefix = ''
if args.prefix is not None:
prefix = args.prefix + '-' # for some notes.
model_id = prefix + args.dataset + \
'-epo_' + str(args.num_epoch) + \
'-b_' + str(args.batch_size) + \
'-reg_' + str(args.delta_reg) + \
'-wd_' + str(args.wd_rate)
return model_id
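# Example id produced by the README script: 'Caltech30-epo_100-b_64-reg_0.1-wd_0.0001'.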
def train():
dataset = args.dataset
image_shape = [3, 224, 224]
pretrained_model = args.pretrained_model
class_map_path = f'{global_data_path}/{dataset}/readable_label.txt'
if os.path.exists(class_map_path):
logger.info(
"The map of readable label and numerical label has been found!")
with open(class_map_path) as f:
label_dict = {}
strinfo = re.compile(r"\d+ ")
for item in f.readlines():
key = int(item.split(" ")[0])
value = [
strinfo.sub("", l).replace("\n", "")
for l in item.split(", ")
]
label_dict[key] = value[0]
assert os.path.isdir(
pretrained_model), "please provide a valid pretrained model directory"
# data reader
batch_size = args.batch_size
reader_config = ReaderConfig(f'{global_data_path}/{dataset}', is_test=False)
reader = reader_config.get_reader()
train_reader = paddle.batch(
paddle.reader.shuffle(reader, buf_size=batch_size),
batch_size,
drop_last=True)
# model ops
image = fluid.data(
name='image', shape=[None] + image_shape, dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
model = ResNet101(is_test=False)
features, logits = model.net(
input=image, class_dim=reader_config.num_classes)
out = fluid.layers.softmax(logits)
# loss, metric
cost = fluid.layers.mean(fluid.layers.cross_entropy(out, label))
accuracy = fluid.layers.accuracy(input=out, label=label)
# delta regularization
# teacher model pre-trained on Imagenet, 1000 classes.
global_name = 't_'
t_model = ResNet101(is_test=True, global_name=global_name)
t_features, _ = t_model.net(input=image, class_dim=1000)
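# Freeze the teacher's features: they serve only as fixed regression targets.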
for f in t_features.keys():
t_features[f].stop_gradient = True
# delta loss: the layer name is hard-coded; it is the output just before global pooling.
delta_loss = fluid.layers.square(t_features['t_res5c.add.output.5.tmp_0'] -
features['res5c.add.output.5.tmp_0'])
delta_loss = fluid.layers.reduce_mean(delta_loss)
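# Feature dict keys are the elementwise_add output names; the teacher's carry the 't_' prefix.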
params = fluid.default_main_program().global_block().all_parameters()
parameters = []
for param in params:
if param.trainable:
if global_name in param.name:
print('\tfixing', param.name)
else:
print('\ttraining', param.name)
parameters.append(param.name)
# optimizer, with piecewise_decay learning rate.
total_steps = len(reader_config.image_paths) * args.num_epoch // batch_size
boundaries = [int(total_steps * 2 / 3)]
print('\ttotal learning steps:', total_steps)
print('\tlr decays at:', boundaries)
values = [0.01, 0.001]
optimizer = fluid.optimizer.Momentum(
learning_rate=fluid.layers.piecewise_decay(
boundaries=boundaries, values=values),
momentum=0.9,
parameter_list=parameters,
regularization=fluid.regularizer.L2Decay(args.wd_rate))
cur_lr = optimizer._global_learning_rate()
optimizer.minimize(
cost + args.delta_reg * delta_loss, parameter_list=parameters)
# data reader
feed_order = ['image', 'label']
# executor (session)
place = fluid.CUDAPlace(
args.use_cuda) if args.use_cuda >= 0 else fluid.CPUPlace()
exe = fluid.Executor(place)
# running
main_program = fluid.default_main_program()
start_program = fluid.default_startup_program()
feed_var_list_loop = [
main_program.global_block().var(var_name) for var_name in feed_order
]
feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
exe.run(start_program)
loading_parameters = {}
t_loading_parameters = {}
for p in main_program.all_parameters():
if 'fc' not in p.name:
if global_name in p.name:
new_name = os.path.join(pretrained_model,
p.name.split(global_name)[-1])
t_loading_parameters[new_name] = p
print(new_name, p.name)
else:
name = os.path.join(pretrained_model, p.name)
loading_parameters[name] = p
print(name, p.name)
else:
print(f'not loading {p.name}')
load_vars_by_dict(exe, loading_parameters, main_program=main_program)
load_vars_by_dict(exe, t_loading_parameters, main_program=main_program)
step = 0
for e_id in range(args.num_epoch):
avg_delta_loss = AverageMeter()
avg_loss = AverageMeter()
avg_accuracy = AverageMeter()
batch_time = AverageMeter()
end = time.time()
for step_id, data_train in enumerate(train_reader()):
wrapped_results = exe.run(
main_program,
feed=feeder.feed(data_train),
fetch_list=[cost, accuracy, delta_loss, cur_lr])
batch_time.update(time.time() - end)
end = time.time()
avg_loss.update(wrapped_results[0][0], len(data_train))
avg_accuracy.update(wrapped_results[1][0], len(data_train))
avg_delta_loss.update(wrapped_results[2][0], len(data_train))
if step % 100 == 0:
print(
f"\tEpoch {e_id}, Global_Step {step}, Batch_Time {batch_time.avg: .2f},"
f" LR {wrapped_results[3][0]}, "
f"Loss {avg_loss.avg: .4f}, Acc {avg_accuracy.avg: .4f}, Delta_Loss {avg_delta_loss.avg: .4f}"
)
step += 1
if args.outdir is not None:
try:
os.makedirs(args.outdir, exist_ok=True)
fluid.io.save_params(
executor=exe, dirname=args.outdir + '/' + get_model_id())
except Exception:
print('\t Not saving trained parameters.')
if e_id == args.num_epoch - 1:
print("kpis\ttrain_cost\t%f" % avg_loss.avg)
print("kpis\ttrain_acc\t%f" % avg_accuracy.avg)
def test():
image_shape = [3, 224, 224]
pretrained_model = args.outdir + '/' + get_model_id()
# data reader
batch_size = args.batch_size
reader_config = ReaderConfig(
f'{global_data_path}/{args.dataset}', is_test=True)
reader = reader_config.get_reader()
test_reader = paddle.batch(reader, batch_size)
# model ops
image = fluid.data(
name='image', shape=[None] + image_shape, dtype='float32')
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
model = ResNet101(is_test=True)
_, logits = model.net(input=image, class_dim=reader_config.num_classes)
out = fluid.layers.softmax(logits)
# loss, metric
cost = fluid.layers.mean(fluid.layers.cross_entropy(out, label))
accuracy = fluid.layers.accuracy(input=out, label=label)
# data reader
feed_order = ['image', 'label']
# executor (session)
place = fluid.CUDAPlace(
args.use_cuda) if args.use_cuda >= 0 else fluid.CPUPlace()
exe = fluid.Executor(place)
# running
main_program = fluid.default_main_program()
start_program = fluid.default_startup_program()
feed_var_list_loop = [
main_program.global_block().var(var_name) for var_name in feed_order
]
feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
exe.run(start_program)
fluid.io.load_params(exe, pretrained_model)
step = 0
avg_loss = AverageMeter()
avg_accuracy = AverageMeter()
for step_id, data_train in enumerate(test_reader()):
avg_loss_value = exe.run(
main_program,
feed=feeder.feed(data_train),
fetch_list=[cost, accuracy])
avg_loss.update(avg_loss_value[0], len(data_train))
avg_accuracy.update(avg_loss_value[1], len(data_train))
if step_id % 10 == 0:
print("\nBatch %d, Loss %f, Acc %f" % (step_id, avg_loss.avg,
avg_accuracy.avg))
step += 1
print("test counts:", avg_loss.count)
print("test_cost\t%f" % avg_loss.avg)
print("test_acc\t%f" % avg_accuracy.avg)
if __name__ == '__main__':
print(args)
train()
test()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# from https://github.com/PaddlePaddle/models/blob/release/1.7/PaddleCV/image_classification/models/resnet.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = [
"ResNet", "ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"
]
class ResNet():
def __init__(self, layers=50, is_test=True, global_name=''):
self.layers = layers
self.is_test = is_test
self.features = {}
self.global_name = global_name
def net(self, input, class_dim=1000, data_format="NCHW"):
layers = self.layers
supported_layers = [18, 34, 50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 18:
depth = [2, 2, 2, 2]
elif layers == 34 or layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name="conv1",
data_format=data_format)
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max',
name=self.global_name + 'pool1',
data_format=data_format)
self.features[conv.name] = conv
if layers >= 50:
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name,
data_format=data_format)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling',
data_format=data_format)
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
else:
for block in range(len(depth)):
for i in range(depth[block]):
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.basic_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
is_first=block == i == 0,
name=conv_name,
data_format=data_format)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling',
data_format=data_format)
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return self.features, out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None,
data_format='NCHW'):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=self.global_name + name + "_weights"),
bias_attr=False,
name=self.global_name + name + '.conv2d.output.1',
data_format=data_format)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
name=self.global_name + bn_name + '.output.1',
param_attr=ParamAttr(self.global_name + bn_name + '_scale'),
bias_attr=ParamAttr(self.global_name + bn_name + '_offset'),
moving_mean_name=self.global_name + bn_name + '_mean',
moving_variance_name=self.global_name + bn_name + '_variance',
data_layout=data_format,
use_global_stats=self.is_test)
def shortcut(self, input, ch_out, stride, is_first, name, data_format):
if data_format == 'NCHW':
ch_in = input.shape[1]
else:
ch_in = input.shape[-1]
if ch_in != ch_out or stride != 1 or is_first:
return self.conv_bn_layer(
input, ch_out, 1, stride, name=name, data_format=data_format)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, data_format):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a",
data_format=data_format)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b",
data_format=data_format)
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c",
data_format=data_format)
short = self.shortcut(
input,
num_filters * 4,
stride,
is_first=False,
name=name + "_branch1",
data_format=data_format)
return fluid.layers.elementwise_add(
x=short,
y=conv2,
act='relu',
name=self.global_name + name + ".add.output.5")
def basic_block(self, input, num_filters, stride, is_first, name,
data_format):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a",
data_format=data_format)
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b",
data_format=data_format)
short = self.shortcut(
input,
num_filters,
stride,
is_first,
name=name + "_branch1",
data_format=data_format)
return fluid.layers.elementwise_add(
x=short,
y=conv1,
act='relu',
name=self.global_name + name + ".add.output.5")
def ResNet18(is_test=True, global_name=''):
model = ResNet(layers=18, is_test=is_test, global_name=global_name)
return model
def ResNet34(is_test=True, global_name=''):
model = ResNet(layers=34, is_test=is_test, global_name=global_name)
return model
def ResNet50(is_test=True, global_name=''):
model = ResNet(layers=50, is_test=is_test, global_name=global_name)
return model
def ResNet101(is_test=True, global_name=''):
model = ResNet(layers=101, is_test=is_test, global_name=global_name)
return model
def ResNet152(is_test=True, global_name=''):
model = ResNet(layers=152, is_test=is_test, global_name=global_name)
return model
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# from https://github.com/PaddlePaddle/models/blob/release/1.7/PaddleCV/image_classification/models/resnet_vc.py.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
__all__ = ["ResNet", "ResNet50_vc", "ResNet101_vc", "ResNet152_vc"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50, is_test=False, global_name=''):
self.params = train_parameters
self.layers = layers
self.is_test = is_test
self.features = {}
self.global_name = global_name
def net(self, input, class_dim=1000):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input,
num_filters=32,
filter_size=3,
stride=2,
act='relu',
name='conv1_1')
conv = self.conv_bn_layer(
input=conv,
num_filters=32,
filter_size=3,
stride=1,
act='relu',
name='conv1_2')
conv = self.conv_bn_layer(
input=conv,
num_filters=64,
filter_size=3,
stride=1,
act='relu',
name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max',
name=self.global_name + 'pool1')
self.features[conv.name] = conv
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
name=conv_name)
self.features[conv.name] = conv
pool = fluid.layers.pool2d(
input=conv,
pool_type='avg',
global_pooling=True,
name=self.global_name + 'global_pooling')
self.features[pool.name] = pool
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=class_dim,
bias_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.b_0'),
param_attr=fluid.param_attr.ParamAttr(
name=self.global_name + 'fc_0.w_0',
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return self.features, out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=self.global_name + name + "_weights"),
bias_attr=False,
name=self.global_name + name + '.conv2d.output.1')
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(
input=conv,
act=act,
name=self.global_name + bn_name + '.output.1',
param_attr=ParamAttr(self.global_name + bn_name + '_scale'),
bias_attr=ParamAttr(self.global_name + bn_name + '_offset'),
moving_mean_name=self.global_name + bn_name + '_mean',
moving_variance_name=self.global_name + bn_name + '_variance',
use_global_stats=self.is_test)
def shortcut(self, input, ch_out, stride, name):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name):
conv0 = self.conv_bn_layer(
input=input,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(
input=conv1,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c")
short = self.shortcut(
input, num_filters * 4, stride, name=name + "_branch1")
return fluid.layers.elementwise_add(
x=short,
y=conv2,
act='relu',
name=self.global_name + name + ".add.output.5")
def ResNet50_vc(is_test=True, global_name=''):
model = ResNet(layers=50, is_test=is_test, global_name=global_name)
return model
def ResNet101_vc(is_test=True, global_name=''):
model = ResNet(layers=101, is_test=is_test, global_name=global_name)
return model
def ResNet152_vc(is_test=True, global_name=''):
model = ResNet(layers=152, is_test=is_test, global_name=global_name)
return model
# coding:utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import ast
import importlib.util
import nltk
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.dygraph as dg
import paddlehub as hub
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, serving
from paddlehub.common.dir import THIRD_PARTY_HOME
from paddlehub.common.utils import mkdir
from paddlehub.common.downloader import default_downloader
from paddlehub.module.module import runnable
from paddlehub.module.nlp_module import DataFormatError
lack_dependency = []
for dependency in ["ruamel", "parakeet", "soundfile", "librosa"]:
if not importlib.util.find_spec(dependency):
lack_dependency.append(dependency)
# Download the NLTK data through PaddleHub's mirror to speed it up; 'import parakeet' needs these packages.
_PUNKT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/punkt.tar.gz"
_CMUDICT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/cmudict.tar.gz"
nltk_path = os.path.join(THIRD_PARTY_HOME, "nltk_data")
tokenizers_path = os.path.join(nltk_path, "tokenizers")
corpora_path = os.path.join(nltk_path, "corpora")
punkt_path = os.path.join(tokenizers_path, "punkt")
cmudict_path = os.path.join(corpora_path, "cmudict")
if not os.path.exists(punkt_path):
default_downloader.download_file_and_uncompress(
url=_PUNKT_URL, save_path=tokenizers_path, print_progress=True)
if not os.path.exists(cmudict_path):
default_downloader.download_file_and_uncompress(
url=_CMUDICT_URL, save_path=corpora_path, print_progress=True)
nltk.data.path.append(nltk_path)
if not lack_dependency:
import soundfile as sf
import librosa
import ruamel.yaml
from parakeet.utils import io
from parakeet.g2p import en
from parakeet.models.deepvoice3 import Encoder, Decoder, PostNet, SpectraNet
from parakeet.models.waveflow import WaveFlowModule
from parakeet.models.deepvoice3.weight_norm_hook import remove_weight_norm
else:
raise ImportError(
"The module requires additional dependencies: %s. You can install parakeet via 'git clone https://github.com/PaddlePaddle/Parakeet && cd Parakeet && pip install -e .' and others via pip install"
% ", ".join(lack_dependency))
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
class WaveflowVocoder(object):
def __init__(self, config_path, checkpoint_path):
with open(config_path, 'rt') as f:
config = ruamel.yaml.safe_load(f)
ns = argparse.Namespace()
for k, v in config.items():
setattr(ns, k, v)
ns.use_fp16 = False
self.model = WaveFlowModule(ns)
io.load_parameters(self.model, checkpoint_path=checkpoint_path)
def __call__(self, mel):
with dg.no_grad():
self.model.eval()
audio = self.model.synthesize(mel)
self.model.train()
return audio
class GriffinLimVocoder(object):
def __init__(self,
sharpening_factor=1.4,
sample_rate=22050,
n_fft=1024,
win_length=1024,
hop_length=256):
self.sample_rate = sample_rate
self.n_fft = n_fft
self.sharpening_factor = sharpening_factor
self.win_length = win_length
self.hop_length = hop_length
def __call__(self, mel):
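# mel is a log-magnitude mel spectrogram; exp() recovers magnitudes before inverting the mel filterbank and running Griffin-Lim.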
spec = librosa.feature.inverse.mel_to_stft(
np.exp(mel),
sr=self.sample_rate,
n_fft=self.n_fft,
fmin=0,
fmax=8000.0,
power=1.0)
audio = librosa.core.griffinlim(
spec**self.sharpening_factor,
win_length=self.win_length,
hop_length=self.hop_length)
return audio
@moduleinfo(
name="deepvoice3_ljspeech",
version="1.0.0",
summary=
"Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.",
author="paddlepaddle",
author_email="",
type="nlp/tts",
)
class DeepVoice3(hub.NLPPredictionModule):
def _initialize(self):
"""
initialize with the necessary elements
"""
self.tts_checkpoint_path = os.path.join(self.directory, "assets", "tts",
"step-1780000")
self.waveflow_checkpoint_path = os.path.join(self.directory, "assets",
"vocoder", "step-2000000")
self.waveflow_config_path = os.path.join(
self.directory, "assets", "vocoder", "waveflow_ljspeech.yaml")
tts_config_path = os.path.join(self.directory, "assets", "tts",
"ljspeech.yaml")
with open(tts_config_path) as f:
self.tts_config = ruamel.yaml.safe_load(f)
with fluid.dygraph.guard(fluid.CPUPlace()):
char_embedding = dg.Embedding((en.n_vocab,
self.tts_config["char_dim"]))
multi_speaker = self.tts_config["n_speakers"] > 1
speaker_embedding = dg.Embedding((self.tts_config["n_speakers"], self.tts_config["speaker_dim"])) \
if multi_speaker else None
encoder = Encoder(
self.tts_config["encoder_layers"],
self.tts_config["char_dim"],
self.tts_config["encoder_dim"],
self.tts_config["kernel_size"],
has_bias=multi_speaker,
bias_dim=self.tts_config["speaker_dim"],
keep_prob=1.0 - self.tts_config["dropout"])
decoder = Decoder(
self.tts_config["n_mels"],
self.tts_config["reduction_factor"],
list(self.tts_config["prenet_sizes"]) +
[self.tts_config["char_dim"]],
self.tts_config["decoder_layers"],
self.tts_config["kernel_size"],
self.tts_config["attention_dim"],
position_encoding_weight=self.tts_config["position_weight"],
omega=self.tts_config["position_rate"],
has_bias=multi_speaker,
bias_dim=self.tts_config["speaker_dim"],
keep_prob=1.0 - self.tts_config["dropout"])
postnet = PostNet(
self.tts_config["postnet_layers"],
self.tts_config["char_dim"],
self.tts_config["postnet_dim"],
self.tts_config["kernel_size"],
self.tts_config["n_mels"],
self.tts_config["reduction_factor"],
has_bias=multi_speaker,
bias_dim=self.tts_config["speaker_dim"],
keep_prob=1.0 - self.tts_config["dropout"])
self.tts_model = SpectraNet(char_embedding, speaker_embedding,
encoder, decoder, postnet)
io.load_parameters(
model=self.tts_model, checkpoint_path=self.tts_checkpoint_path)
for name, layer in self.tts_model.named_sublayers():
try:
remove_weight_norm(layer)
except ValueError:
# this layer has no weight norm hook
pass
self.waveflow = WaveflowVocoder(
config_path=self.waveflow_config_path,
checkpoint_path=self.waveflow_checkpoint_path)
self.griffin = GriffinLimVocoder(
sharpening_factor=self.tts_config["sharpening_factor"],
sample_rate=self.tts_config["sample_rate"],
n_fft=self.tts_config["n_fft"],
win_length=self.tts_config["win_length"],
hop_length=self.tts_config["hop_length"])
def synthesize(self, texts, use_gpu=False, vocoder="griffin-lim"):
"""
Get the synthetic wavs from the texts.
Args:
texts(list): the input texts to be predicted.
use_gpu(bool): whether use gpu to predict or not
vocoder(str): the vocoder name, "griffin-lim" or "waveflow"
Returns:
wavs(list): the synthesized audio waveforms. You can use soundfile.write to save them.
sample_rate(int): the audio sample rate.
"""
if use_gpu and "CUDA_VISIBLE_DEVICES" not in os.environ:
use_gpu = False
logger.warning(
"use_gpu has been set False as you didn't set the environment variable CUDA_VISIBLE_DEVICES while using use_gpu=True"
)
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
if texts and isinstance(texts, list):
predicted_data = texts
else:
raise ValueError(
"The input data is inconsistent with expectations.")
wavs = []
with fluid.dygraph.guard(place):
self.tts_model.eval()
self.waveflow.model.eval()
monotonic_layers = [4]
for text in predicted_data:
# init input
logger.info("Processing sentence: %s" % text)
text = en.text_to_sequence(text, p=1.0)
text = np.expand_dims(np.array(text, dtype="int64"), 0)
lengths = np.array([text.size], dtype=np.int64)
text_seqs = dg.to_variable(text)
text_lengths = dg.to_variable(lengths)
decoder_layers = self.tts_config["decoder_layers"]
force_monotonic_attention = [False] * decoder_layers
for i in monotonic_layers:
force_monotonic_attention[i] = True
outputs = self.tts_model(
text_seqs,
text_lengths,
speakers=None,
force_monotonic_attention=force_monotonic_attention,
window=(self.tts_config["backward_step"],
self.tts_config["forward_step"]))
decoded, refined, attentions = outputs
if vocoder == 'griffin-lim':
# synthesis use griffin-lim
wav = self.griffin(refined.numpy()[0].T)
elif vocoder == 'waveflow':
# synthesis use waveflow
wav = self.waveflow(
fluid.layers.transpose(refined, [0, 2, 1])).numpy()[0]
else:
raise ValueError(
'vocoder error: only griffin-lim and waveflow are supported, but received %s.'
% vocoder)
wavs.append(wav)
return wavs, self.tts_config["sample_rate"]
@serving
def serving_method(self, texts, use_gpu=False, vocoder="griffin-lim"):
"""
Run as a service.
"""
wavs, sample_rate = self.synthesize(texts, use_gpu, vocoder)
wavs = [wav.tolist() for wav in wavs]
result = {"wavs": wavs, "sample_rate": sample_rate}
return result
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU for prediction")
self.arg_config_group.add_argument(
'--vocoder',
type=str,
default="griffin-lim",
choices=['griffin-lim', 'waveflow'],
help="the vocoder name")
def add_module_output_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--output_path',
type=str,
default=os.path.abspath(
os.path.join(os.path.curdir, f"{self.name}_prediction")),
help="path to save experiment results")
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description='Run the %s module.' % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_output_group = self.parser.add_argument_group(
title="Output options", description="Output path. Optional.")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, optional.")
self.add_module_config_arg()
self.add_module_input_arg()
self.add_module_output_arg()
args = self.parser.parse_args(argvs)
try:
input_data = self.check_input_data(args)
except (DataFormatError, RuntimeError):
self.parser.print_help()
return None
mkdir(args.output_path)
wavs, sample_rate = self.synthesize(
texts=input_data, use_gpu=args.use_gpu, vocoder=args.vocoder)
for index, wav in enumerate(wavs):
sf.write(
os.path.join(args.output_path, f"{index}.wav"), wav,
sample_rate)
ret = f"The synthesized wav files have been saved in {args.output_path}"
return ret
if __name__ == "__main__":
module = DeepVoice3()
test_text = [
"Simple as this proposition is, it is necessary to be stated",
"Parakeet stands for Paddle PARAllel text-to-speech toolkit.",
]
wavs, sample_rate = module.synthesize(texts=test_text, vocoder="waveflow")
for index, wav in enumerate(wavs):
sf.write(f"{index}.wav", wav, sample_rate)
# coding:utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import ast
import argparse
import importlib.util
import nltk
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.dygraph as dg
import paddlehub as hub
from paddlehub.module.module import runnable
from paddlehub.common.utils import mkdir
from paddlehub.module.nlp_module import DataFormatError
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, serving
from paddlehub.common.dir import THIRD_PARTY_HOME
from paddlehub.common.downloader import default_downloader
lack_dependency = []
for dependency in ["ruamel", "parakeet", "soundfile", "librosa"]:
if not importlib.util.find_spec(dependency):
lack_dependency.append(dependency)
# Download the NLTK data through PaddleHub's mirror to speed it up; 'import parakeet' needs these packages.
_PUNKT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/punkt.tar.gz"
_CMUDICT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/cmudict.tar.gz"
nltk_path = os.path.join(THIRD_PARTY_HOME, "nltk_data")
tokenizers_path = os.path.join(nltk_path, "tokenizers")
corpora_path = os.path.join(nltk_path, "corpora")
punkt_path = os.path.join(tokenizers_path, "punkt")
cmudict_path = os.path.join(corpora_path, "cmudict")
if not os.path.exists(punkt_path):
default_downloader.download_file_and_uncompress(
url=_PUNKT_URL, save_path=tokenizers_path, print_progress=True)
if not os.path.exists(cmudict_path):
default_downloader.download_file_and_uncompress(
url=_CMUDICT_URL, save_path=corpora_path, print_progress=True)
nltk.data.path.append(nltk_path)
if not lack_dependency:
import soundfile as sf
import librosa
from ruamel import yaml
from parakeet.models.fastspeech.fastspeech import FastSpeech as FastSpeechModel
from parakeet.g2p.en import text_to_sequence
from parakeet.models.transformer_tts.utils import *
from parakeet.utils import io
from parakeet.modules.weight_norm import WeightNormWrapper
from parakeet.models.waveflow import WaveFlowModule
else:
raise ImportError(
"The module requires additional dependencies: %s. You can install parakeet via 'git clone https://github.com/PaddlePaddle/Parakeet && cd Parakeet && pip install -e .' and others via pip install"
% ", ".join(lack_dependency))
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
@moduleinfo(
name="fastspeech_ljspeech",
version="1.0.0",
summary=
"FastSpeech proposes a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS. See https://arxiv.org/abs/1905.09263 for details.",
author="baidu-nlp",
author_email="",
type="nlp/tts",
)
class FastSpeech(hub.NLPPredictionModule):
def _initialize(self):
"""
initialize with the necessary elements
"""
self.tts_checkpoint_path = os.path.join(self.directory, "assets", "tts",
"step-162000")
self.waveflow_checkpoint_path = os.path.join(self.directory, "assets",
"vocoder", "step-2000000")
self.waveflow_config_path = os.path.join(
self.directory, "assets", "vocoder", "waveflow_ljspeech.yaml")
tts_config_path = os.path.join(self.directory, "assets", "tts",
"ljspeech.yaml")
with open(tts_config_path) as f:
self.tts_config = yaml.load(f, Loader=yaml.Loader)
with fluid.dygraph.guard(fluid.CPUPlace()):
self.tts_model = FastSpeechModel(
self.tts_config['network'],
num_mels=self.tts_config['audio']['num_mels'])
io.load_parameters(
model=self.tts_model, checkpoint_path=self.tts_checkpoint_path)
# Build vocoder.
args = AttrDict()
args.config = self.waveflow_config_path
args.use_fp16 = False
self.waveflow_config = io.add_yaml_config_to_args(args)
self.waveflow = WaveFlowModule(self.waveflow_config)
io.load_parameters(
model=self.waveflow,
checkpoint_path=self.waveflow_checkpoint_path)
def synthesize(self, texts, use_gpu=False, speed=1.0,
vocoder="griffin-lim"):
"""
Get the synthetic wavs from the texts.
Args:
texts(list): the input texts to be predicted.
use_gpu(bool): whether use gpu to predict or not. Default False.
speed(float): Controlling the voice speed. Default 1.0.
vocoder(str): the vocoder name, "griffin-lim" or "waveflow".
Returns:
wavs(list): the synthesized audio waveforms. You can use soundfile.write to save them.
sample_rate(int): the audio sample rate.
"""
if use_gpu and "CUDA_VISIBLE_DEVICES" not in os.environ:
use_gpu = False
logger.warning(
"use_gpu has been set False as you didn't set the environment variable CUDA_VISIBLE_DEVICES while using use_gpu=True"
)
if use_gpu:
place = fluid.CUDAPlace(0)
else:
place = fluid.CPUPlace()
if texts and isinstance(texts, list):
predicted_data = texts
else:
raise ValueError(
"The input data is inconsistent with expectations.")
wavs = []
with fluid.dygraph.guard(place):
self.tts_model.eval()
self.waveflow.eval()
for text in predicted_data:
# init input
logger.info("Processing sentence: %s" % text)
text = np.asarray(text_to_sequence(text))
text = np.expand_dims(text, axis=0)
pos_text = np.arange(1, text.shape[1] + 1)
pos_text = np.expand_dims(pos_text, axis=0)
text = dg.to_variable(text).astype(np.int64)
pos_text = dg.to_variable(pos_text).astype(np.int64)
_, mel_output_postnet = self.tts_model(
text, pos_text, alpha=1 / speed)
if vocoder == 'griffin-lim':
# synthesis use griffin-lim
wav = self.synthesis_with_griffinlim(
mel_output_postnet, self.tts_config['audio'])
elif vocoder == 'waveflow':
wav = self.synthesis_with_waveflow(
mel_output_postnet, self.waveflow_config.sigma)
else:
raise ValueError(
'vocoder error: only griffin-lim and waveflow are supported, but received %s.'
% vocoder)
wavs.append(wav)
return wavs, self.tts_config['audio']['sr']
def synthesis_with_griffinlim(self, mel_output, cfg):
# synthesis with griffin-lim
mel_output = fluid.layers.transpose(
fluid.layers.squeeze(mel_output, [0]), [1, 0])
mel_output = np.exp(mel_output.numpy())
basis = librosa.filters.mel(
cfg['sr'],
cfg['n_fft'],
cfg['num_mels'],
fmin=cfg['fmin'],
fmax=cfg['fmax'])
inv_basis = np.linalg.pinv(basis)
spec = np.maximum(1e-10, np.dot(inv_basis, mel_output))
wav = librosa.core.griffinlim(
spec**cfg['power'],
hop_length=cfg['hop_length'],
win_length=cfg['win_length'])
return wav
def synthesis_with_waveflow(self, mel_output, sigma):
mel_spectrogram = fluid.layers.transpose(
fluid.layers.squeeze(mel_output, [0]), [1, 0])
mel_spectrogram = fluid.layers.unsqueeze(mel_spectrogram, [0])
for layer in self.waveflow.sublayers():
if isinstance(layer, WeightNormWrapper):
layer.remove_weight_norm()
# Run model inference.
wav = self.waveflow.synthesize(mel_spectrogram, sigma=sigma)
return wav.numpy()[0]
@serving
def serving_method(self,
texts,
use_gpu=False,
speed=1.0,
vocoder="griffin-lim"):
"""
Run as a service.
"""
wavs, sample_rate = self.synthesize(texts, use_gpu, speed, vocoder)
wavs = [wav.tolist() for wav in wavs]
result = {"wavs": wavs, "sample_rate": sample_rate}
return result
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU for prediction")
self.arg_config_group.add_argument(
'--vocoder',
type=str,
default="griffin-lim",
choices=['griffin-lim', 'waveflow'],
help="the vocoder name")
def add_module_output_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--output_path',
type=str,
default=os.path.abspath(
os.path.join(os.path.curdir, f"{self.name}_prediction")),
help="path to save experiment results")
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description='Run the %s module.' % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_output_group = self.parser.add_argument_group(
title="Output options", description="Output path. Optional.")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, optional.")
self.add_module_config_arg()
self.add_module_input_arg()
self.add_module_output_arg()
args = self.parser.parse_args(argvs)
try:
input_data = self.check_input_data(args)
except (DataFormatError, RuntimeError):
self.parser.print_help()
return None
mkdir(args.output_path)
wavs, sample_rate = self.synthesize(
texts=input_data, use_gpu=args.use_gpu, vocoder=args.vocoder)
for index, wav in enumerate(wavs):
sf.write(
os.path.join(args.output_path, f"{index}.wav"), wav,
sample_rate)
ret = f"The synthesized wav files have been saved in {args.output_path}"
return ret
if __name__ == "__main__":
module = FastSpeech()
test_text = [
"Simple as this proposition is, it is necessary to be stated",
]
wavs, sample_rate = module.synthesize(
texts=test_text, speed=1, vocoder="waveflow")
for index, wav in enumerate(wavs):
sf.write(f"{index}.wav", wav, sample_rate)
# coding:utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import ast
import argparse
import importlib.util
import nltk
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.dygraph as dg
import paddlehub as hub
from paddlehub.module.module import runnable
from paddlehub.common.utils import mkdir
from paddlehub.module.nlp_module import DataFormatError
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, serving
from paddlehub.common.dir import THIRD_PARTY_HOME
from paddlehub.common.downloader import default_downloader
lack_dependency = []
for dependency in ["ruamel", "parakeet", "scipy", "soundfile", "librosa"]:
if not importlib.util.find_spec(dependency):
lack_dependency.append(dependency)
# Download the NLTK data through PaddleHub's mirror to speed it up; 'import parakeet' needs these packages.
_PUNKT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/punkt.tar.gz"
_CMUDICT_URL = "https://paddlehub.bj.bcebos.com/paddlehub-thirdparty/cmudict.tar.gz"
nltk_path = os.path.join(THIRD_PARTY_HOME, "nltk_data")
tokenizers_path = os.path.join(nltk_path, "tokenizers")
corpora_path = os.path.join(nltk_path, "corpora")
punkt_path = os.path.join(tokenizers_path, "punkt")
cmudict_path = os.path.join(corpora_path, "cmudict")
if not os.path.exists(punkt_path):
default_downloader.download_file_and_uncompress(
url=_PUNKT_URL, save_path=tokenizers_path, print_progress=True)
if not os.path.exists(cmudict_path):
default_downloader.download_file_and_uncompress(
url=_CMUDICT_URL, save_path=corpora_path, print_progress=True)
nltk.data.path.append(nltk_path)
if not lack_dependency:
import soundfile as sf
import librosa
from ruamel import yaml
from scipy.io.wavfile import write
from parakeet.g2p.en import text_to_sequence
from parakeet.models.transformer_tts.utils import *
from parakeet.models.transformer_tts import TransformerTTS as TransformerTTSModel
from parakeet.models.waveflow import WaveFlowModule
from parakeet.utils import io
from parakeet.modules.weight_norm import WeightNormWrapper
else:
raise ImportError(
"The module requires additional dependencies: %s. You can install parakeet via 'git clone https://github.com/PaddlePaddle/Parakeet && cd Parakeet && pip install -e .' and others via pip install"
% ", ".join(lack_dependency))
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
@moduleinfo(
name="transformer_tts_ljspeech",
version="1.0.0",
summary=
"Transformer TTS introduces and adapts the multi-head attention mechanism to replace the RNN structures and also the original attention mechanism in Tacotron2. See https://arxiv.org/abs/1809.08895 for details",
author="baidu-nlp",
author_email="",
type="nlp/tts",
)
class TransformerTTS(hub.NLPPredictionModule):
def _initialize(self):
"""
initialize with the necessary elements
"""
self.tts_checkpoint_path = os.path.join(self.directory, "assets", "tts",
"step-120000")
self.waveflow_checkpoint_path = os.path.join(self.directory, "assets",
"vocoder", "step-2000000")
self.waveflow_config_path = os.path.join(
self.directory, "assets", "vocoder", "waveflow_ljspeech.yaml")
tts_config_path = os.path.join(self.directory, "assets", "tts",
"ljspeech.yaml")
with open(tts_config_path) as f:
self.tts_config = yaml.load(f, Loader=yaml.Loader)
# The max length of the generated spectrogram when synthesizing.
self.max_len = 1000
# The stop-token threshold that decides whether generation should stop at a time step.
self.stop_threshold = 0.5
with fluid.dygraph.guard(fluid.CPUPlace()):
# Build TTS.
with fluid.unique_name.guard():
network_cfg = self.tts_config['network']
self.tts_model = TransformerTTSModel(
network_cfg['embedding_size'], network_cfg['hidden_size'],
network_cfg['encoder_num_head'],
network_cfg['encoder_n_layers'],
self.tts_config['audio']['num_mels'],
network_cfg['outputs_per_step'],
network_cfg['decoder_num_head'],
network_cfg['decoder_n_layers'])
io.load_parameters(
model=self.tts_model,
checkpoint_path=self.tts_checkpoint_path)
# Build vocoder.
args = AttrDict()
args.config = self.waveflow_config_path
args.use_fp16 = False
self.waveflow_config = io.add_yaml_config_to_args(args)
self.waveflow = WaveFlowModule(self.waveflow_config)
io.load_parameters(
model=self.waveflow,
checkpoint_path=self.waveflow_checkpoint_path)
def synthesize(self, texts, use_gpu=False, vocoder="griffin-lim"):
"""
Get the synthetic wavs from the texts.
Args:
texts(list): the input texts to be predicted.
use_gpu(bool): whether use gpu to predict or not
vocoder(str): the vocoder name, "griffin-lim" or "waveflow"
Returns:
wavs(list): the synthesized audio waveforms. You can use soundfile.write to save them.
sample_rate(int): the audio sample rate.
"""
if use_gpu and "CUDA_VISIBLE_DEVICES" not in os.environ:
use_gpu = False
logger.warning(
"use_gpu has been set False as you didn't set the environment variable CUDA_VISIBLE_DEVICES while using use_gpu=True"
)
if use_gpu:
place = fluid.CUDAPlace(0)
else:
place = fluid.CPUPlace()
if texts and isinstance(texts, list):
predicted_data = texts
else:
raise ValueError(
"The input data is inconsistent with expectations.")
wavs = []
with fluid.dygraph.guard(place):
self.tts_model.eval()
self.waveflow.eval()
for text in predicted_data:
# init input
logger.info("Processing sentence: %s" % text)
text = np.asarray(text_to_sequence(text))
text = fluid.layers.unsqueeze(
dg.to_variable(text).astype(np.int64), [0])
mel_input = dg.to_variable(np.zeros([1, 1,
80])).astype(np.float32)
pos_text = np.arange(1, text.shape[1] + 1)
pos_text = fluid.layers.unsqueeze(
dg.to_variable(pos_text).astype(np.int64), [0])
for i in range(self.max_len):
pos_mel = np.arange(1, mel_input.shape[1] + 1)
pos_mel = fluid.layers.unsqueeze(
dg.to_variable(pos_mel).astype(np.int64), [0])
mel_pred, postnet_pred, attn_probs, stop_preds, attn_enc, attn_dec = self.tts_model(
text, mel_input, pos_text, pos_mel)
if stop_preds.numpy()[0, -1] > self.stop_threshold:
break
mel_input = fluid.layers.concat(
[mel_input, postnet_pred[:, -1:, :]], axis=1)
if vocoder == 'griffin-lim':
# synthesis use griffin-lim
wav = self.synthesis_with_griffinlim(
postnet_pred, self.tts_config['audio'])
elif vocoder == 'waveflow':
# synthesis use waveflow
wav = self.synthesis_with_waveflow(
postnet_pred, self.waveflow_config.sigma)
else:
raise ValueError(
'vocoder error: only griffin-lim and waveflow are supported, but received %s.'
% vocoder)
wavs.append(wav)
return wavs, self.tts_config['audio']['sr']
def synthesis_with_griffinlim(self, mel_output, cfg):
# synthesis with griffin-lim
mel_output = fluid.layers.transpose(
fluid.layers.squeeze(mel_output, [0]), [1, 0])
mel_output = np.exp(mel_output.numpy())
basis = librosa.filters.mel(
cfg['sr'],
cfg['n_fft'],
cfg['num_mels'],
fmin=cfg['fmin'],
fmax=cfg['fmax'])
inv_basis = np.linalg.pinv(basis)
spec = np.maximum(1e-10, np.dot(inv_basis, mel_output))
wav = librosa.core.griffinlim(
spec**cfg['power'],
hop_length=cfg['hop_length'],
win_length=cfg['win_length'])
return wav
def synthesis_with_waveflow(self, mel_output, sigma):
mel_spectrogram = fluid.layers.transpose(
fluid.layers.squeeze(mel_output, [0]), [1, 0])
mel_spectrogram = fluid.layers.unsqueeze(mel_spectrogram, [0])
for layer in self.waveflow.sublayers():
if isinstance(layer, WeightNormWrapper):
layer.remove_weight_norm()
# Run model inference.
wav = self.waveflow.synthesize(mel_spectrogram, sigma=sigma)
return wav.numpy()[0]
@serving
def serving_method(self, texts, use_gpu=False, vocoder="griffin-lim"):
"""
Run as a service.
"""
wavs, sample_rate = self.synthesize(texts, use_gpu, vocoder)
wavs = [wav.tolist() for wav in wavs]
result = {"wavs": wavs, "sample_rate": sample_rate}
return result
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU for prediction")
self.arg_config_group.add_argument(
'--vocoder',
type=str,
default="griffin-lim",
choices=['griffin-lim', 'waveflow'],
help="the vocoder name")
def add_module_output_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--output_path',
type=str,
default=os.path.abspath(
os.path.join(os.path.curdir, f"{self.name}_prediction")),
help="path to save experiment results")
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description='Run the %s module.' % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
        self.arg_output_group = self.parser.add_argument_group(
            title="Output options", description="Output path. Optional.")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, optional.")
self.add_module_config_arg()
self.add_module_input_arg()
self.add_module_output_arg()
args = self.parser.parse_args(argvs)
try:
input_data = self.check_input_data(args)
        except (DataFormatError, RuntimeError):
self.parser.print_help()
return None
mkdir(args.output_path)
wavs, sample_rate = self.synthesize(
texts=input_data, use_gpu=args.use_gpu, vocoder=args.vocoder)
for index, wav in enumerate(wavs):
sf.write(
os.path.join(args.output_path, f"{index}.wav"), wav,
sample_rate)
ret = f"The synthesized wav files have been saved in {args.output_path}"
return ret
if __name__ == "__main__":
module = TransformerTTS()
test_text = [
"Life was like a box of chocolates, you never know what you're gonna get.",
]
wavs, sample_rate = module.synthesize(texts=test_text, vocoder="waveflow")
for index, wav in enumerate(wavs):
sf.write(f"{index}.wav", wav, sample_rate)
import argparse
import os
import paddlehub as hub
from paddlehub.module.module import runnable, moduleinfo
from senta_test.processor import load_vocab
@moduleinfo(
name="senta_test",
version="1.0.0",
summary="This is a PaddleHub Module. Just for test.",
author="anonymous",
author_email="",
type="nlp/sentiment_analysis",
)
class SentaTest(hub.Module):
def _initialize(self):
# add arg parser
self.parser = argparse.ArgumentParser(
description="Run the senta_test module.",
prog='hub run senta_test',
usage='%(prog)s',
add_help=True)
self.parser.add_argument(
'--input_text', type=str, default=None, help="text to predict")
# load word dict
vocab_path = os.path.join(self.directory, "vocab.list")
self.vocab = load_vocab(vocab_path)
def sentiment_classify(self, texts):
results = []
for text in texts:
sentiment = "positive"
for word in self.vocab:
if word in text:
sentiment = "negative"
break
results.append({"text": text, "sentiment": sentiment})
return results
@runnable
def run_cmd(self, argvs):
args = self.parser.parse_args(argvs)
texts = [args.input_text]
return self.sentiment_classify(texts)
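# A minimal self-test sketch, mirroring the other modules' __main__ blocks;
# the sample texts are illustrative.
if __name__ == "__main__":
    senta_test = SentaTest()
    print(senta_test.sentiment_classify(
        texts=["This restaurant is great.", "The service was terrible."]))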
# coding=utf-8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import math
from paddle import fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
__all__ = ['DarkNet']
class DarkNet(object):
"""DarkNet, see https://pjreddie.com/darknet/yolo/
Args:
        depth (int): network depth, currently only DarkNet-53 is supported
        norm_type (str): normalization type, 'bn' and 'sync_bn' are supported
        norm_decay (float): weight decay for normalization layer weights
        get_prediction (bool): whether to return the classification prediction
        class_dim (int): number of classes for classification
"""
def __init__(self,
depth=53,
norm_type='sync_bn',
norm_decay=0.,
weight_prefix_name='',
get_prediction=False,
class_dim=1000):
assert depth in [53], "unsupported depth value"
self.depth = depth
self.norm_type = norm_type
self.norm_decay = norm_decay
self.depth_cfg = {53: ([1, 2, 8, 8, 4], self.basicblock)}
self.prefix_name = weight_prefix_name
self.class_dim = class_dim
self.get_prediction = get_prediction
def _conv_norm(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
bn_name = name + ".bn"
bn_param_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.offset')
out = fluid.layers.batch_norm(
input=conv,
act=None,
param_attr=bn_param_attr,
bias_attr=bn_bias_attr,
moving_mean_name=bn_name + '.mean',
moving_variance_name=bn_name + '.var')
        # The leaky ReLU here uses `alpha` of 0.1, which cannot be set via the
        # `act` param of fluid.layers.batch_norm above.
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
return out
def _downsample(self,
input,
ch_out,
filter_size=3,
stride=2,
padding=1,
name=None):
return self._conv_norm(
input,
ch_out=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
name=name)
def basicblock(self, input, ch_out, name=None):
conv1 = self._conv_norm(
input,
ch_out=ch_out,
filter_size=1,
stride=1,
padding=0,
name=name + ".0")
conv2 = self._conv_norm(
conv1,
ch_out=ch_out * 2,
filter_size=3,
stride=1,
padding=1,
name=name + ".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out
def layer_warp(self, block_func, input, ch_out, count, name=None):
out = block_func(input, ch_out=ch_out, name='{}.0'.format(name))
for j in six.moves.xrange(1, count):
out = block_func(out, ch_out=ch_out, name='{}.{}'.format(name, j))
return out
def __call__(self, input):
"""Get the backbone of DarkNet, that is output for the 5 stages.
:param input: Variable of input image
:type input: Variable
:Returns: The last variables of each stage.
"""
stages, block_func = self.depth_cfg[self.depth]
stages = stages[0:5]
conv = self._conv_norm(
input=input,
ch_out=32,
filter_size=3,
stride=1,
padding=1,
name=self.prefix_name + "yolo_input")
downsample_ = self._downsample(
input=conv,
ch_out=conv.shape[1] * 2,
name=self.prefix_name + "yolo_input.downsample")
blocks = []
for i, stage in enumerate(stages):
block = self.layer_warp(
block_func=block_func,
input=downsample_,
ch_out=32 * 2**i,
count=stage,
name=self.prefix_name + "stage.{}".format(i))
blocks.append(block)
            if i < len(stages) - 1:  # do not downsample in the last stage
downsample_ = self._downsample(
input=block,
ch_out=block.shape[1] * 2,
name=self.prefix_name + "stage.{}.downsample".format(i))
if self.get_prediction:
pool = fluid.layers.pool2d(
input=block, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(
input=pool,
size=self.class_dim,
param_attr=ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name='fc_weights'),
bias_attr=ParamAttr(name='fc_offset'))
out = fluid.layers.softmax(out)
return out
else:
return blocks
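# A minimal sketch of building the backbone graph; the program setup and the
# input shape here are illustrative, not part of the original file.
if __name__ == "__main__":
    main_prog, startup_prog = fluid.Program(), fluid.Program()
    with fluid.program_guard(main_prog, startup_prog):
        image = fluid.data(
            name='image', shape=[-1, 3, 224, 224], dtype='float32')
        blocks = DarkNet(get_prediction=False)(image)
    for i, block in enumerate(blocks):
        print("stage {}: shape {}".format(i, block.shape))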
# coding=utf-8
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import os
from collections import OrderedDict
import cv2
import numpy as np
from PIL import Image, ImageEnhance
from paddle import fluid
DATA_DIM = 224
img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
def resize_short(img, target_size):
percent = float(target_size) / min(img.size[0], img.size[1])
resized_width = int(round(img.size[0] * percent))
resized_height = int(round(img.size[1] * percent))
img = img.resize((resized_width, resized_height), Image.LANCZOS)
return img
def crop_image(img, target_size, center):
width, height = img.size
size = target_size
    if center:
        w_start = (width - size) // 2
        h_start = (height - size) // 2
else:
w_start = np.random.randint(0, width - size + 1)
h_start = np.random.randint(0, height - size + 1)
w_end = w_start + size
h_end = h_start + size
img = img.crop((w_start, h_start, w_end, h_end))
return img
def process_image(img):
img = resize_short(img, target_size=256)
img = crop_image(img, target_size=DATA_DIM, center=True)
if img.mode != 'RGB':
img = img.convert('RGB')
img = np.array(img).astype('float32').transpose((2, 0, 1)) / 255
img -= img_mean
img /= img_std
return img
def test_reader(paths=None, images=None):
"""data generator
:param paths: path to images.
:type paths: list, each element is a str
:param images: data of images, [N, H, W, C]
:type images: numpy.ndarray
"""
img_list = []
if paths:
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file path.".format(img_path)
img = Image.open(img_path)
img_list.append(img)
if images is not None:
for img in images:
img_list.append(Image.fromarray(np.uint8(img)))
for im in img_list:
im = process_image(im)
yield im
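# A minimal usage sketch of the generator above; the image path is illustrative.
if __name__ == "__main__":
    for im in test_reader(paths=["./test.jpg"]):
        print(im.shape, im.dtype)  # expect (3, 224, 224) float32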
import os
import ast
import argparse
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from paddlehub.module.module import moduleinfo, runnable
from paddle.fluid.core import PaddleTensor, AnalysisConfig, create_paddle_predictor
from paddlehub.common.paddle_helper import add_vars_prefix
from paddlehub.io.parser import txt_parser
from darknet53_imagenet.darknet import DarkNet
from darknet53_imagenet.processor import load_label_info
from darknet53_imagenet.data_feed import test_reader
@moduleinfo(
name="darknet53_imagenet",
version="1.1.0",
type="cv/classification",
    summary=
    "DarkNet53 is an image classification model trained on the ImageNet-2012 dataset.",
author="paddlepaddle",
author_email="paddle-dev@baidu.com")
class DarkNet53(hub.Module):
def _initialize(self):
self.default_pretrained_model_path = os.path.join(
self.directory, "darknet53_model")
self.label_names = load_label_info(
os.path.join(self.directory, "label_file.txt"))
self.infer_prog = None
self.pred_out = None
self._set_config()
def get_expected_image_width(self):
return 224
def get_expected_image_height(self):
return 224
def get_pretrained_images_mean(self):
im_mean = np.array([0.485, 0.456, 0.406]).reshape(1, 3)
return im_mean
def get_pretrained_images_std(self):
im_std = np.array([0.229, 0.224, 0.225]).reshape(1, 3)
return im_std
def _set_config(self):
"""
predictor config setting
"""
cpu_config = AnalysisConfig(self.default_pretrained_model_path)
cpu_config.disable_glog_info()
cpu_config.disable_gpu()
self.cpu_predictor = create_paddle_predictor(cpu_config)
        # Use the GPU predictor only when CUDA_VISIBLE_DEVICES names a valid
        # device id.
        try:
            _places = os.environ["CUDA_VISIBLE_DEVICES"]
            int(_places[0])
            use_gpu = True
        except (KeyError, ValueError):
            use_gpu = False
if use_gpu:
gpu_config = AnalysisConfig(self.default_pretrained_model_path)
gpu_config.disable_glog_info()
gpu_config.enable_use_gpu(memory_pool_init_size_mb=500, device_id=0)
self.gpu_predictor = create_paddle_predictor(gpu_config)
def context(self,
input_image=None,
trainable=True,
pretrained=True,
param_prefix='',
get_prediction=False):
"""Distill the Head Features, so as to perform transfer learning.
:param input_image: image tensor.
:type input_image: <class 'paddle.fluid.framework.Variable'>
:param trainable: whether to set parameters trainable.
:type trainable: bool
:param pretrained: whether to load default pretrained model.
:type pretrained: bool
:param param_prefix: the prefix of parameters in yolo_head and backbone
:type param_prefix: str
:param get_prediction: whether to get prediction,
if True, outputs is {'bbox_out': bbox_out},
if False, outputs is {'head_features': head_features}.
:type get_prediction: bool
"""
context_prog = input_image.block.program if input_image else fluid.Program(
)
startup_program = fluid.Program()
with fluid.program_guard(context_prog, startup_program):
image = input_image if input_image else fluid.data(
name='image',
shape=[-1, 3, 224, 224],
dtype='float32',
lod_level=0)
backbone = DarkNet(get_prediction=get_prediction)
out = backbone(image)
inputs = {'image': image}
if get_prediction:
outputs = {'pred_out': out}
else:
outputs = {'body_feats': out}
place = fluid.CPUPlace()
exe = fluid.Executor(place)
if pretrained:
def _if_exist(var):
return os.path.exists(
os.path.join(self.default_pretrained_model_path,
var.name))
if not param_prefix:
fluid.io.load_vars(
exe,
self.default_pretrained_model_path,
main_program=context_prog,
predicate=_if_exist)
else:
exe.run(startup_program)
return inputs, outputs, context_prog
def classification(self,
paths=None,
images=None,
use_gpu=False,
batch_size=1,
top_k=2):
"""API of Classification.
:param paths: the path of images.
        :type paths: list, each element is the path of an image.
:param images: data of images, [N, H, W, C]
:type images: numpy.ndarray
:param use_gpu: whether to use gpu or not.
:type use_gpu: bool
:param batch_size: batch size.
:type batch_size: int
        :param top_k: number of top results to return, clipped to [1, 1000].
        :type top_k: int
"""
if self.infer_prog is None:
inputs, outputs, self.infer_prog = self.context(
trainable=False, pretrained=True, get_prediction=True)
self.infer_prog = self.infer_prog.clone(for_test=True)
self.pred_out = outputs['pred_out']
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
all_images = []
paths = paths if paths else []
for yield_data in test_reader(paths, images):
all_images.append(yield_data)
images_num = len(all_images)
loop_num = int(np.ceil(images_num / batch_size))
res_list = []
top_k = max(min(top_k, 1000), 1)
for iter_id in range(loop_num):
batch_data = []
handle_id = iter_id * batch_size
            for image_id in range(batch_size):
                try:
                    batch_data.append(all_images[handle_id + image_id])
                except IndexError:
                    # the last batch may contain fewer than batch_size images
                    pass
batch_data = np.array(batch_data).astype('float32')
data_tensor = PaddleTensor(batch_data.copy())
if use_gpu:
result = self.gpu_predictor.run([data_tensor])
else:
result = self.cpu_predictor.run([data_tensor])
for i, res in enumerate(result[0].as_ndarray()):
res_dict = {}
pred_label = np.argsort(res)[::-1][:top_k]
for k in pred_label:
class_name = self.label_names[int(k)].split(',')[0]
max_prob = res[k]
res_dict[class_name] = max_prob
res_list.append(res_dict)
return res_list
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--batch_size',
type=int,
default=1,
help="batch size for prediction")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
'--input_path', type=str, default=None, help="input data")
self.arg_input_group.add_argument(
'--input_file',
type=str,
default=None,
help="file contain input data")
def check_input_data(self, args):
input_data = []
if args.input_path:
input_data = [args.input_path]
elif args.input_file:
if not os.path.exists(args.input_file):
raise RuntimeError("File %s is not exist." % args.input_file)
else:
input_data = txt_parser.parse(args.input_file, use_strip=True)
return input_data
@runnable
def run_cmd(self, argvs):
self.parser = argparse.ArgumentParser(
description="Run the {}".format(self.name),
prog="hub run {}".format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
input_data = self.check_input_data(args)
if len(input_data) == 0:
self.parser.print_help()
exit(1)
else:
for image_path in input_data:
if not os.path.exists(image_path):
                    raise RuntimeError(
                        "File %s does not exist." % image_path)
return self.classification(
paths=input_data, use_gpu=args.use_gpu, batch_size=args.batch_size)
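# A minimal self-test sketch, mirroring the other modules' __main__ blocks;
# the image path is illustrative and assumes the pretrained weights are in
# the module directory.
if __name__ == "__main__":
    classifier = DarkNet53()
    print(classifier.classification(paths=["./test.jpg"], top_k=5))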
# coding=utf-8
import os
import time
from collections import OrderedDict
import cv2
import numpy as np
from PIL import Image
__all__ = ['reader']
DATA_DIM = 224
img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
def resize_short(img, target_size):
percent = float(target_size) / min(img.size[0], img.size[1])
resized_width = int(round(img.size[0] * percent))
resized_height = int(round(img.size[1] * percent))
img = img.resize((resized_width, resized_height), Image.LANCZOS)
return img
def crop_image(img, target_size, center):
width, height = img.size
size = target_size
    if center:
        w_start = (width - size) // 2
        h_start = (height - size) // 2
else:
w_start = np.random.randint(0, width - size + 1)
h_start = np.random.randint(0, height - size + 1)
w_end = w_start + size
h_end = h_start + size
img = img.crop((w_start, h_start, w_end, h_end))
return img
def process_image(img):
img = resize_short(img, target_size=256)
img = crop_image(img, target_size=DATA_DIM, center=True)
if img.mode != 'RGB':
img = img.convert('RGB')
img = np.array(img).astype('float32').transpose((2, 0, 1)) / 255
img -= img_mean
img /= img_std
return img
def reader(images=None, paths=None):
"""
Preprocess to yield image.
Args:
images (list[numpy.ndarray]): images data, shape of each is [H, W, C].
paths (list[str]): paths to images.
Yield:
each (collections.OrderedDict): info of original image, preprocessed image.
"""
component = list()
if paths:
for im_path in paths:
each = OrderedDict()
assert os.path.isfile(
im_path), "The {} isn't a valid file path.".format(im_path)
each['org_im_path'] = im_path
each['org_im'] = Image.open(im_path)
each['org_im_width'], each['org_im_height'] = each['org_im'].size
component.append(each)
if images is not None:
        assert isinstance(images, list), "images should be a list of numpy.ndarray."
for im in images:
each = OrderedDict()
each['org_im'] = Image.fromarray(im[:, :, ::-1])
each['org_im_path'] = 'ndarray_time={}'.format(
round(time.time(), 6) * 1e6)
each['org_im_width'], each['org_im_height'] = each['org_im'].size
component.append(each)
for element in component:
element['image'] = process_image(element['org_im'])
yield element
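# A minimal usage sketch with an in-memory BGR image such as cv2.imread
# returns; the random array stands in for a real photo.
if __name__ == "__main__":
    fake_bgr = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
    for each in reader(images=[fake_bgr]):
        print(each['org_im_path'], each['image'].shape)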