Commit 759b68ea authored by MRXLT

Merge remote-tracking branch 'upstream/master'

......@@ -23,7 +23,7 @@ PLSC has the following features:
### Basic features
* [API introduction](docs/api_intro.md)
* [Custom models](docs/custom_modes.md)
* [Custom models](docs/custom_models.md)
* [Custom Reader interface]
### Inference and deployment
......@@ -33,6 +33,5 @@ PLSC has the following features:
### Advanced features
* [Mixed-precision training]
* [Distributed parameter conversion]
* [Base64 image preprocessing]
* [Distributed parameter conversion](docs/distributed_params.md)
* [Base64 image preprocessing](docs/base64_preprocessor.md)
# Base64 image preprocessing
## Introduction
In real-world applications, training images are commonly stored base64-encoded. Each line of a training
data file holds the base64 data of one image and its label, usually separated by a tab character ('\t').
The list of all training data files is typically recorded in a separate file, and the directory structure of the whole training dataset looks like this:
```shell
dataset
|-- file_list.txt
|-- dataset.part1
|-- dataset.part2
... ....
`-- dataset.part10
```
Here, file_list.txt records the list of training data files, one file per line. For the example above,
the content of file_list.txt is:
```shell
dataset.part1
dataset.part2
...
dataset.part10
```
Each line of a data file contains the base64 representation of one image and, separated by a tab, the image label.
For distributed training, every GPU card needs to process the same number of images, and a global
shuffle of the training data is usually performed before training.
This document describes the Base64 image preprocessing tool, which globally shuffles the training data and splits it evenly into multiple data files,
one per GPU card used in training. When the total number of training samples is not divisible by the number of GPU cards, some images
(randomly chosen from the training set) are padded so that the total number of training images is an integer multiple of the number of GPU cards.
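As a rough illustration of what this preprocessing does, the sketch below globally shuffles the lines of all data files and splits them evenly across `nranks` output files, padding with randomly chosen samples when the counts do not divide evenly. It is a simplified stand-in for tools/process_base64_files.py, not the tool itself; the output file names and the padding policy shown here are assumptions.
```python
# Simplified sketch of global shuffle + even split across nranks files.
# This is NOT tools/process_base64_files.py; it only illustrates the idea.
import os
import random

def shuffle_and_split(data_dir, file_list, nranks, seed=0):
    with open(os.path.join(data_dir, file_list)) as f:
        parts = [line.strip() for line in f if line.strip()]
    samples = []
    for part in parts:
        with open(os.path.join(data_dir, part)) as f:
            # Each line: "<base64-encoded image>\t<label>"
            samples.extend(line.rstrip("\n") for line in f)
    random.seed(seed)
    random.shuffle(samples)
    # Pad with randomly chosen samples so len(samples) % nranks == 0.
    pad = (-len(samples)) % nranks
    samples.extend(random.choice(samples) for _ in range(pad))
    per_rank = len(samples) // nranks
    for rank in range(nranks):
        out_name = os.path.join(data_dir, "dataset.part%d" % (rank + 1))
        with open(out_name, "w") as f:
            f.write("\n".join(samples[rank * per_rank:(rank + 1) * per_rank]) + "\n")

shuffle_and_split("./dataset", "file_list.txt", nranks=8)
```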
## Usage
The tool is located in the tools directory.
Its help message can be printed with the following command:
```shell
python tools/process_base64_files.py --help
```
The tool supports the following command-line options:
* data_dir: the root directory of the training data
* file_list: the file that lists the training data files, e.g. file_list.txt
* nranks: the number of GPU cards used for training
The tool can be run with a command like the following:
```shell
python tools/process_base64_files.py --data_dir=./dataset --file_list=file_list.txt --nranks=8
```
This generates 8 data files, each containing the same number of training samples.
The final directory layout is as follows:
```shell
dataset
|-- file_list.txt
|-- dataset.part1
|-- dataset.part2
... ....
`-- dataset.part8
```
......@@ -3,7 +3,6 @@
By default, the PaddlePaddle large-scale classification library builds its training model on ResNet50.
PLSC provides the model base class plsc.models.base_model.BaseModel, from which users can derive their own network models. A user-defined model class must inherit from this base class and implement the build_network method, which builds the user-defined network.
When using the model, call its get_output method, which automatically appends the distributed FC layer to the end of the user-defined model.
The following example shows how to define your own network model with the BaseModel base class, and how to use it.
```python
......
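# The original example is collapsed ("......") in this diff view. The sketch
# below is a hypothetical illustration of the pattern described above, not
# code from the PLSC repository; the layer choices and the exact
# build_network signature are assumptions.
import paddle.fluid as fluid
import plsc.entry as entry
from plsc.models.base_model import BaseModel


class SimpleModel(BaseModel):
    def __init__(self, emb_dim=512):
        super(SimpleModel, self).__init__()
        self.emb_dim = emb_dim

    def build_network(self, input, label, is_train=True):
        # Build the backbone only; get_output() appends the distributed
        # FC layer to the end of this network automatically.
        conv = fluid.layers.conv2d(input=input, num_filters=64,
                                   filter_size=3, act='relu')
        pool = fluid.layers.pool2d(input=conv, pool_type='avg',
                                   global_pooling=True)
        emb = fluid.layers.fc(input=pool, size=self.emb_dim)
        return emb


if __name__ == "__main__":
    ins = entry.Entry()
    ins.set_model(SimpleModel())  # set_model expects a BaseModel instance
    ins.train()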
# Distributed parameter conversion
## Introduction
The parameters of the last fully-connected layer (W and b; if the bias b does not exist, the parameters consist of W only) are usually sharded across all training GPU cards. For example,
if N GPU cards are used during training, then
$$W = [W_{1}, W_{2}, \ldots, W_{N}]$$
$$b = [b_{1}, b_{2}, \ldots, b_{N}]$$
and the parameters $W_{i}$ and $b_{i}$ are stored on the i-th GPU.
When the model is saved, the distributed parameters on every GPU card are saved.
For warm-starting or fine-tuning, if the number of training GPU cards differs from the number used before the warm start or during pre-training,
the distributed parameters must be converted so that their number matches the number of GPU cards used for training.
By default, this conversion is performed automatically when the plsc.entry.Entry.train() method is used.
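To make the sharding and conversion concrete, the NumPy sketch below re-shards a column-split FC weight from the number of pre-training GPU cards to a new number of cards. It only illustrates the idea; the real tool (plsc.utils.process_distfc_parameter, described below) additionally handles parameter naming, the bias, and the meta.pickle file.
```python
# Hedged sketch: re-shard a column-split FC weight W = [W_1, ..., W_N].
# This is an illustration only, not the PLSC conversion tool.
import numpy as np

def reshard(shards, nranks):
    """Concatenate the per-GPU shards of W and split them again for nranks GPUs."""
    full = np.concatenate(shards, axis=1)
    return np.array_split(full, nranks, axis=1)

# Example: W of shape (emb_dim=4, num_classes=8), pre-trained on 2 GPUs,
# converted for training on 4 GPUs.
w = np.arange(32, dtype=np.float32).reshape(4, 8)
pretrain_shards = np.array_split(w, 2, axis=1)   # W_1, W_2
new_shards = reshard(pretrain_shards, 4)         # W_1, ..., W_4
assert np.allclose(np.concatenate(new_shards, axis=1), w)
```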
## Usage
The distributed parameter conversion tool can also be used on its own. Its usage can be viewed with the following command:
```shell
python -m plsc.utils.process_distfc_parameter --help
```
The tool supports the following command-line options:
| Option | Description |
| :---------------------- | :------------------- |
| name_feature | The name feature used to identify distributed parameters. By default, distributed parameter names are prefixed with dist@arcface@rank@rankid or dist@softmax@rank@rankid, where rankid is the id of the GPU card. The default value of name_feature is @rank@; users normally do not need to change it. |
| pretrain_nranks | The number of GPU cards used during pre-training |
| nranks | The number of GPU cards to be used for this training run |
| num_classes | The number of classes |
| emb_dim | The output dimension of the second-to-last fully-connected layer, excluding the batch size |
| pretrained_model_dir | The directory where the pre-trained model is saved |
| output_dir | The directory to save the converted distributed parameters |
Usually, the pre-trained model contains a meta.pickle file that records the number of GPU cards used during pre-training, the number of classes, and the output dimension of the second-to-last fully-connected layer, so the pretrain_nranks, num_classes, and emb_dim options normally do not need to be specified.
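If you want to inspect those values yourself, meta.pickle can be read with a few lines of Python. The sketch below assumes it is an ordinary pickled Python object; the path is only an example.
```python
# Hedged sketch: inspect the meta.pickle shipped with a pre-trained model.
import pickle

with open("./output/meta.pickle", "rb") as f:
    meta = pickle.load(f)
print(meta)  # expected to include pretrain_nranks, num_classes and emb_dim
```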
The distributed parameters can be converted with a command like the following:
```shell
python -m plsc.utils.process_distfc_parameter --nranks=4 --pretrained_model_dir=./output --output_dir=./output_post
```
Note that the output directory contains only the converted distributed parameters and no other model parameters. You therefore usually need to replace
the distributed parameters in the pre-trained model with the converted ones.
......@@ -2,6 +2,7 @@
Normally, the models saved by the PaddlePaddle large-scale classification library during training contain only the model parameters
and not the inference model structure. To deploy the PLSC inference library, the pre-trained model has to be exported as an inference model.
An inference model contains both the parameters and the model structure needed for inference, and is used for subsequent inference tasks (see [C++ inference library usage]).
The pre-trained model can be exported as an inference model with the following code:
......
# Mixed-precision training
PLSC supports mixed-precision training, which speeds up training while reducing memory usage.
Mixed-precision training can be enabled with the following code:
```python
from __future__ import print_function
import plsc.entry as entry

def main():
    ins = entry.Entry()
    ins.set_mixed_precision(True, 1.0)
    ins.train()

if __name__ == "__main__":
    main()
```
The `set_mixed_precision` function is described below:
| API | Description | Arguments |
| :------------------- | :--------------------| :---------------------- |
| set_mixed_precision(use_fp16, loss_scaling) | Configure mixed-precision training | `use_fp16`: whether to enable mixed-precision training, default False; `loss_scaling`: the initial loss scaling value, default 1.0 |
- `use_fp16`: bool. Set it to True to enable mixed-precision training.
- `loss_scaling`: float. The initial loss scaling value. It may affect the accuracy of mixed-precision training; the default value of 1.0 is recommended.
To improve the stability and accuracy of mixed-precision training, dynamic loss scaling is enabled by default. For more about mixed-precision training, see [Mixed Precision Training](https://arxiv.org/abs/1710.03740).
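As a rough illustration of what dynamic loss scaling does, the sketch below shows the usual update rule; the constants mirror the defaults of the DistributedClassificationOptimizer that appears later in this commit (incr_every_n_steps=1000, decr_every_n_nan_or_inf=2, incr_ratio=2.0, decr_ratio=0.5). It is a plain-Python illustration of the rule, not the PLSC implementation.
```python
# Hedged sketch of the dynamic loss scaling update rule.
def update_loss_scaling(state, grads_are_finite,
                        incr_every_n_steps=1000, decr_every_n_nan_or_inf=2,
                        incr_ratio=2.0, decr_ratio=0.5):
    """state is a dict with keys: scaling, good_steps, bad_steps."""
    if grads_are_finite:
        state["good_steps"] += 1
        state["bad_steps"] = 0
        if state["good_steps"] >= incr_every_n_steps:
            # Gradients stayed finite long enough: try a larger scale.
            state["scaling"] *= incr_ratio
            state["good_steps"] = 0
    else:
        state["bad_steps"] += 1
        state["good_steps"] = 0
        if state["bad_steps"] >= decr_every_n_nan_or_inf:
            # Repeated overflow: shrink the scale (the update is also skipped).
            state["scaling"] *= decr_ratio
            state["bad_steps"] = 0
    return state
```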
......@@ -12,3 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .entry import Entry
__all__ = ['Entry']
import os
import sys
import time
import argparse
import functools
import numpy as np
import paddle
import paddle.fluid as fluid
import resnet
import sklearn
import reader
from verification import evaluate
from utility import add_arguments, print_arguments
from paddle.fluid.incubate.fleet.collective import fleet, DistributedStrategy
from paddle.fluid.incubate.fleet.collective import DistFCConfig
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from paddle.fluid.transpiler.details.program_utils import program_to_code
from paddle.fluid.optimizer import Optimizer
import paddle.fluid.profiler as profiler
from fp16_utils import rewrite_program, update_role_var_grad, update_loss_scaling, move_optimize_ops_back
from fp16_lists import AutoMixedPrecisionLists
from paddle.fluid.transpiler.details import program_to_code
import paddle.fluid.layers as layers
import paddle.fluid.unique_name as unique_name
parser = argparse.ArgumentParser(description="Train parallel face network.")
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('train_batch_size', int, 128, "Minibatch size for training.")
add_arg('test_batch_size', int, 120, "Minibatch size for test.")
add_arg('num_epochs', int, 120, "Number of epochs to run.")
add_arg('image_shape', str, "3,112,112", "Image size in the format of CHW.")
add_arg('emb_dim', int, 512, "Embedding dim size.")
add_arg('class_dim', int, 85742, "Number of classes.")
add_arg('model_save_dir', str, None, "Directory to save model.")
add_arg('pretrained_model', str, None, "Directory for pretrained model.")
add_arg('lr', float, 0.1, "Initial learning rate.")
add_arg('model', str, "ResNet_ARCFACE50", "The network to use.")
add_arg('loss_type', str, "softmax", "Type of network loss to use.")
add_arg('margin', float, 0.5, "Parameter of margin for arcface or dist_arcface.")
add_arg('scale', float, 64.0, "Parameter of scale for arcface or dist_arcface.")
add_arg('with_test', bool, False, "Whether to do test during training.")
add_arg('fp16', bool, True, "Whether to use mixed precision (fp16) training.")
add_arg('profile', bool, False, "Enable profiler or not." )
# yapf: enable
args = parser.parse_args()
model_list = [m for m in dir(resnet) if "__" not in m]
def optimizer_setting(params, args):
ls = params["learning_strategy"]
step = 1
bd = [step * e for e in ls["epochs"]]
base_lr = params["lr"]
lr = [base_lr * (0.1 ** i) for i in range(len(bd) + 1)]
print("bd: {}".format(bd))
print("lr_step: {}".format(lr))
step_lr = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
optimizer = fluid.optimizer.Momentum(
learning_rate=step_lr,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(5e-4))
num_trainers = int(os.getenv("PADDLE_TRAINERS_NUM", 1))
if args.loss_type in ["dist_softmax", "dist_arcface"]:
if args.fp16:
wrapper = DistributedClassificationOptimizer(
optimizer, args.train_batch_size * num_trainers, step_lr,
loss_type=args.loss_type, init_loss_scaling=1.0)
else:
wrapper = DistributedClassificationOptimizer(optimizer, args.train_batch_size * num_trainers, step_lr)
elif args.loss_type in ["softmax", "arcface"]:
wrapper = optimizer
return wrapper
def build_program(args,
main_program,
startup_program,
is_train=True,
use_parallel_test=False,
fleet=None,
strategy=None):
model_name = args.model
assert model_name in model_list, \
"{} is not in supported lists: {}".format(args.model, model_list)
assert not (is_train and use_parallel_test), \
"is_train and use_parallel_test cannot be set simultaneously"
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", 1))
image_shape = [int(m) for m in args.image_shape.split(",")]
# model definition
model = resnet.__dict__[model_name]()
with fluid.program_guard(main_program, startup_program):
with fluid.unique_name.guard():
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
emb, loss = model.net(input=image,
label=label,
is_train=is_train,
emb_dim=args.emb_dim,
class_dim=args.class_dim,
loss_type=args.loss_type,
margin=args.margin,
scale=args.scale)
if args.loss_type in ["dist_softmax", "dist_arcface"]:
shard_prob = loss._get_info("shard_prob")
prob_all = fluid.layers.collective._c_allgather(shard_prob,
nranks=worker_num, use_calc_stream=True)
prob_list = fluid.layers.split(prob_all, dim=0,
num_or_sections=worker_num)
prob = fluid.layers.concat(prob_list, axis=1)
label_all = fluid.layers.collective._c_allgather(label,
nranks=worker_num, use_calc_stream=True)
acc1 = fluid.layers.accuracy(input=prob, label=label_all, k=1)
acc5 = fluid.layers.accuracy(input=prob, label=label_all, k=5)
elif args.loss_type in ["softmax", "arcface"]:
prob = loss[1]
loss = loss[0]
acc1 = fluid.layers.accuracy(input=prob, label=label, k=1)
acc5 = fluid.layers.accuracy(input=prob, label=label, k=5)
optimizer = None
if is_train:
# parameters from model and arguments
params = model.params
params["lr"] = args.lr
params["num_epochs"] = args.num_epochs
params["learning_strategy"]["batch_size"] = args.train_batch_size
# initialize optimizer
optimizer = optimizer_setting(params, args)
dist_optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
dist_optimizer.minimize(loss)
elif use_parallel_test:
emb = fluid.layers.collective._c_allgather(emb,
nranks=worker_num, use_calc_stream=True)
return emb, loss, acc1, acc5, optimizer
def train(args):
pretrained_model = args.pretrained_model
model_save_dir = args.model_save_dir
model_name = args.model
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
worker_num = int(os.getenv("PADDLE_TRAINERS_NUM", 1))
role = role_maker.PaddleCloudRoleMaker(is_collective=True)
fleet.init(role)
strategy = DistributedStrategy()
strategy.mode = "collective"
strategy.collective_mode = "grad_allreduce"
startup_prog = fluid.Program()
train_prog = fluid.Program()
test_program = fluid.Program()
train_emb, train_loss, train_acc1, train_acc5, optimizer = \
build_program(args, train_prog, startup_prog, True, False,
fleet, strategy)
test_emb, test_loss, test_acc1, test_acc5, _ = \
build_program(args, test_program, startup_prog, False, True)
if args.loss_type in ["dist_softmax", "dist_arcface"]:
if not args.fp16:
global_lr = optimizer._optimizer._global_learning_rate(
program=train_prog)
else:
global_lr = optimizer._optimizer._global_learning_rate(
program=train_prog)
elif args.loss_type in ["softmax", "arcface"]:
global_lr = optimizer._global_learning_rate(program=train_prog)
origin_prog = fleet._origin_program
train_prog = fleet.main_program
if trainer_id == 0:
with open('start.program', 'w') as fout:
program_to_code(startup_prog, fout, True)
with open('main.program', 'w') as fout:
program_to_code(train_prog, fout, True)
with open('origin.program', 'w') as fout:
program_to_code(origin_prog, fout, True)
gpu_id = int(os.getenv("FLAGS_selected_gpus", 0))
place = fluid.CUDAPlace(gpu_id)
exe = fluid.Executor(place)
exe.run(startup_prog)
if pretrained_model:
pretrained_model = os.path.join(pretrained_model, str(trainer_id))
def if_exist(var):
has_var = os.path.exists(os.path.join(pretrained_model, var.name))
if has_var:
print('var: %s found' % (var.name))
return has_var
fluid.io.load_vars(exe, pretrained_model, predicate=if_exist,
main_program=train_prog)
train_reader = paddle.batch(reader.arc_train(args.class_dim),
batch_size=args.train_batch_size)
if args.with_test:
test_list, test_name_list = reader.test()
test_feeder = fluid.DataFeeder(place=place, feed_list=['image', 'label'], program=test_program)
fetch_list_test = [test_emb.name, test_acc1.name, test_acc5.name]
feeder = fluid.DataFeeder(place=place, feed_list=['image', 'label'], program=train_prog)
fetch_list_train = [train_loss.name, global_lr.name, train_acc1.name, train_acc5.name,train_emb.name,"loss_scaling_0"]
# test_program = test_program._prune(targets=loss)
num_trainers = int(os.getenv("PADDLE_TRAINERS_NUM", 1))
real_batch_size = args.train_batch_size * num_trainers
real_test_batch_size = args.test_batch_size * num_trainers
local_time = 0.0
nsamples = 0
inspect_steps = 100
step_cnt = 0
for pass_id in range(args.num_epochs):
train_info = [[], [], [], []]
local_train_info = [[], [], [], []]
for batch_id, data in enumerate(train_reader()):
nsamples += real_batch_size
t1 = time.time()
loss, lr, acc1, acc5, train_embedding, loss_scaling = exe.run(train_prog, feed=feeder.feed(data),
fetch_list=fetch_list_train, use_program_cache=True)
t2 = time.time()
if args.profile and step_cnt == 50:
print("begin profiler")
if trainer_id == 0:
profiler.start_profiler("All")
elif args.profile and batch_id == 55:
print("begin to end profiler")
if trainer_id == 0:
profiler.stop_profiler("total", "./profile_%d" % (trainer_id))
print("end profiler break!")
args.profile=False
period = t2 - t1
local_time += period
train_info[0].append(np.array(loss)[0])
train_info[1].append(np.array(lr)[0])
local_train_info[0].append(np.array(loss)[0])
local_train_info[1].append(np.array(lr)[0])
if batch_id % inspect_steps == 0:
avg_loss = np.mean(local_train_info[0])
avg_lr = np.mean(local_train_info[1])
print("Pass:%d batch:%d lr:%f loss:%f qps:%.2f acc1:%.4f acc5:%.4f" % (
pass_id, batch_id, avg_lr, avg_loss, nsamples / local_time,
acc1, acc5))
#print("train_embedding:,",np.array(train_embedding)[0])
print("train_embedding is nan:",np.isnan(np.array(train_embedding)[0]).sum())
print("loss_scaling",loss_scaling)
local_time = 0
nsamples = 0
local_train_info = [[], [], [], []]
step_cnt += 1
if args.with_test and step_cnt % inspect_steps == 0:
test_start = time.time()
for i in xrange(len(test_list)):
data_list, issame_list = test_list[i]
embeddings_list = []
for j in xrange(len(data_list)):
data = data_list[j]
embeddings = None
parallel_test_steps = data.shape[0] // real_test_batch_size
beg = 0
end = 0
for idx in range(parallel_test_steps):
start = idx * real_test_batch_size
offset = trainer_id * args.test_batch_size
begin = start + offset
end = begin + args.test_batch_size
_data = []
for k in xrange(begin, end):
_data.append((data[k], 0))
assert len(_data) == args.test_batch_size
[_embeddings, acc1, acc5] = exe.run(test_program,
fetch_list = fetch_list_test, feed=test_feeder.feed(_data),
use_program_cache=True)
if embeddings is None:
embeddings = np.zeros((data.shape[0], _embeddings.shape[1]))
embeddings[start:start+real_test_batch_size, :] = _embeddings[:, :]
beg = parallel_test_steps * real_test_batch_size
while beg < data.shape[0]:
end = min(beg + args.test_batch_size, data.shape[0])
count = end - beg
_data = []
for k in xrange(end - args.test_batch_size, end):
_data.append((data[k], 0))
[_embeddings, acc1, acc5] = exe.run(test_program,
fetch_list = fetch_list_test, feed=test_feeder.feed(_data),
use_program_cache=True)
_embeddings = _embeddings[0:args.test_batch_size,:]
embeddings[beg:end, :] = _embeddings[(args.test_batch_size-count):, :]
beg = end
embeddings_list.append(embeddings)
xnorm = 0.0
xnorm_cnt = 0
for embed in embeddings_list:
xnorm += np.sqrt((embed * embed).sum(axis=1)).sum(axis=0)
xnorm_cnt += embed.shape[0]
xnorm /= xnorm_cnt
embeddings = embeddings_list[0] + embeddings_list[1]
if np.isnan(embeddings).sum() > 1:
print("======test np.isnan(embeddings).sum()",np.isnan(embeddings).sum())
continue
embeddings = sklearn.preprocessing.normalize(embeddings)
_, _, accuracy, val, val_std, far = evaluate(embeddings, issame_list, nrof_folds=10)
acc, std = np.mean(accuracy), np.std(accuracy)
print('[%s][%d]XNorm: %f' % (test_name_list[i], step_cnt, xnorm))
print('[%s][%d]Accuracy-Flip: %1.5f+-%1.5f' % (test_name_list[i], step_cnt, acc, std))
sys.stdout.flush()
test_end = time.time()
print("test time: {}".format(test_end - test_start))
train_loss = np.array(train_info[0]).mean()
print("End pass {0}, train_loss {1}".format(pass_id, train_loss))
sys.stdout.flush()
#save model
#if trainer_id == 0:
if model_save_dir:
model_path = os.path.join(model_save_dir + '/' + model_name,
str(pass_id), str(trainer_id))
if not os.path.isdir(model_path):
os.makedirs(model_path)
fluid.io.save_persistables(exe, model_path)
class DistributedClassificationOptimizer(Optimizer):
'''
A optimizer wrapper to generate backward network for distributed
classification training of model parallelism.
'''
def __init__(self,optimizer, batch_size, lr,
loss_type='dist_arcface',
amp_lists=None,
init_loss_scaling=1.0,
incr_every_n_steps=1000,
decr_every_n_nan_or_inf=2,
incr_ratio=2.0,
decr_ratio=0.5,
use_dynamic_loss_scaling=True):
super(DistributedClassificationOptimizer, self).__init__(
learning_rate=lr)
self._optimizer = optimizer
self._batch_size = batch_size
self._amp_lists = amp_lists
if amp_lists is None:
self._amp_lists = AutoMixedPrecisionLists()
self._param_grads = None
self._scaled_loss = None
self._loss_type = loss_type
self._init_loss_scaling = init_loss_scaling
self._loss_scaling = layers.create_global_var(
name=unique_name.generate("loss_scaling"),
shape=[1],
value=init_loss_scaling,
dtype='float32',
persistable=True)
self._use_dynamic_loss_scaling = use_dynamic_loss_scaling
if self._use_dynamic_loss_scaling:
self._incr_every_n_steps = layers.fill_constant(
shape=[1], dtype='int32', value=incr_every_n_steps)
self._decr_every_n_nan_or_inf = layers.fill_constant(
shape=[1], dtype='int32', value=decr_every_n_nan_or_inf)
self._incr_ratio = incr_ratio
self._decr_ratio = decr_ratio
self._num_good_steps = layers.create_global_var(
name=unique_name.generate("num_good_steps"),
shape=[1],
value=0,
dtype='int32',
persistable=True)
self._num_bad_steps = layers.create_global_var(
name=unique_name.generate("num_bad_steps"),
shape=[1],
value=0,
dtype='int32',
persistable=True)
# Ensure the data type of learning rate vars is float32 (same as the
# master parameter dtype)
if isinstance(optimizer._learning_rate, float):
optimizer._learning_rate_map[fluid.default_main_program()] = \
layers.create_global_var(
name=unique_name.generate("learning_rate"),
shape=[1],
value=float(optimizer._learning_rate),
dtype='float32',
persistable=True)
def minimize(self,
loss,
startup_program=None,
parameter_list=None,
no_grad_set=None,
callbacks=None):
assert loss._get_info('shard_logit')
shard_logit = loss._get_info('shard_logit')
shard_prob = loss._get_info('shard_prob')
shard_label = loss._get_info('shard_label')
shard_dim = loss._get_info('shard_dim')
op_maker = fluid.core.op_proto_and_checker_maker
op_role_key = op_maker.kOpRoleAttrName()
op_role_var_key = op_maker.kOpRoleVarAttrName()
backward_role = int(op_maker.OpRole.Backward)
loss_backward_role = int(op_maker.OpRole.Loss) | int(
op_maker.OpRole.Backward)
# minimize a scalar of reduce_sum to generate the backward network
scalar = fluid.layers.reduce_sum(shard_logit)
if not args.fp16:
ret = self._optimizer.minimize(scalar)
block = loss.block
with open("fp32_before.program", "w") as f:
program_to_code(block.program, fout=f, skip_op_callstack=False)
# remove the unnecessary ops
index = 0
for i, op in enumerate(block.ops):
if op.all_attrs()[op_role_key] == loss_backward_role:
index = i
break
print("op_role_key: ",op_role_key)
print("loss_backward_role:",loss_backward_role)
# print("\nblock.ops: ",block.ops)
print("block.ops[index - 1].type: ", block.ops[index - 1].type)
print("block.ops[index].type: ", block.ops[index].type)
print("block.ops[index + 1].type: ", block.ops[index + 1].type)
assert block.ops[index - 1].type == 'reduce_sum'
assert block.ops[index].type == 'fill_constant'
assert block.ops[index + 1].type == 'reduce_sum_grad'
block._remove_op(index + 1)
block._remove_op(index)
block._remove_op(index - 1)
# insert the calculated gradient
dtype = shard_logit.dtype
shard_one_hot = fluid.layers.create_tensor(dtype, name='shard_one_hot')
block._insert_op(
index - 1,
type='one_hot',
inputs={'X': shard_label},
outputs={'Out': shard_one_hot},
attrs={
'depth': shard_dim,
'allow_out_of_range': True,
op_role_key: backward_role
})
shard_logit_grad = fluid.layers.create_tensor(
dtype, name=fluid.backward._append_grad_suffix_(shard_logit.name))
block._insert_op(
index,
type='elementwise_sub',
inputs={'X': shard_prob,
'Y': shard_one_hot},
outputs={'Out': shard_logit_grad},
attrs={op_role_key: backward_role})
block._insert_op(
index + 1,
type='scale',
inputs={'X': shard_logit_grad},
outputs={'Out': shard_logit_grad},
attrs={
'scale': 1.0 / self._batch_size,
op_role_key: loss_backward_role
})
with open("fp32_after.program", "w") as f:
program_to_code(block.program,fout=f, skip_op_callstack=False)
# use mixed_precision for training
else:
block = loss.block
rewrite_program(block.program, self._amp_lists)
self._params_grads = self._optimizer.backward(
scalar, startup_program, parameter_list, no_grad_set,
callbacks)
update_role_var_grad(block.program, self._params_grads)
move_optimize_ops_back(block.program.global_block())
scaled_params_grads = []
for p, g in self._params_grads:
with fluid.default_main_program()._optimized_guard([p, g]):
scaled_g = g / self._loss_scaling
scaled_params_grads.append([p, scaled_g])
index = 0
for i, op in enumerate(block.ops):
if op.all_attrs()[op_role_key] == loss_backward_role:
index = i
break
fp32 = fluid.core.VarDesc.VarType.FP32
dtype = shard_logit.dtype
if self._loss_type == 'dist_arcface':
assert block.ops[index - 2].type == 'fill_constant'
assert block.ops[index - 1].type == 'reduce_sum'
assert block.ops[index].type == 'fill_constant'
assert block.ops[index + 1].type == 'reduce_sum_grad'
assert block.ops[index + 2].type == 'scale'
assert block.ops[index + 3].type == 'elementwise_add_grad'
block._remove_op(index + 2)
block._remove_op(index + 1)
block._remove_op(index)
block._remove_op(index - 1)
# insert the calculated gradient
shard_one_hot = fluid.layers.create_tensor(dtype, name='shard_one_hot')
block._insert_op(
index - 1,
type='one_hot',
inputs={'X': shard_label},
outputs={'Out': shard_one_hot},
attrs={
'depth': shard_dim,
'allow_out_of_range': True,
op_role_key: backward_role
})
shard_one_hot_fp32 = fluid.layers.create_tensor(fp32, name=(shard_one_hot.name+".cast_fp32"))
block._insert_op(
index,
type="cast",
inputs={"X": shard_one_hot},
outputs={"Out": shard_one_hot_fp32},
attrs={
"in_dtype": fluid.core.VarDesc.VarType.FP16,
"out_dtype": fluid.core.VarDesc.VarType.FP32,
op_role_key: backward_role
})
name = 'tmp_3@GRAD'
shard_logit_grad_fp32 = block.var(name)
block._insert_op(
index+1,
type='elementwise_sub',
inputs={'X': shard_prob,
'Y': shard_one_hot_fp32},
outputs={'Out': shard_logit_grad_fp32},
attrs={op_role_key: backward_role})
block._insert_op(
index+2,
type='elementwise_mul',
inputs={'X': shard_logit_grad_fp32,
'Y': self._loss_scaling},
outputs={'Out': shard_logit_grad_fp32},
attrs={op_role_key: backward_role})
block._insert_op(
index+3,
type='scale',
inputs={'X': shard_logit_grad_fp32},
outputs={'Out': shard_logit_grad_fp32},
attrs={
'scale': 1.0 / self._batch_size,
op_role_key: loss_backward_role
})
elif self._loss_type == 'dist_softmax':
print("block.ops[index - 3].type: ", block.ops[index - 3].type)
print("block.ops[index - 2].type: ", block.ops[index - 2].type)
print("block.ops[index-1].type: ", block.ops[index - 1].type)
print("block.ops[index].type: ", block.ops[index].type)
print("block.ops[index + 1].type: ", block.ops[index +1].type)
print("block.ops[index + 2].type: ", block.ops[index +2].type)
print("block.ops[index + 3].type: ", block.ops[index +3].type)
with open("fp16_softmax_before.program", "w") as f:
program_to_code(block.program,fout=f, skip_op_callstack=False)
assert block.ops[index - 1].type == 'reduce_sum'
assert block.ops[index].type == 'fill_constant'
assert block.ops[index + 1].type == 'reduce_sum_grad'
assert block.ops[index + 2].type == 'cast'
assert block.ops[index + 3].type == 'elementwise_add_grad'
block._remove_op(index + 1)
block._remove_op(index)
block._remove_op(index - 1)
# insert the calculated gradient
shard_one_hot = fluid.layers.create_tensor(fp32, name='shard_one_hot')
shard_one_hot_fp32 = fluid.layers.create_tensor(fp32,
name=(shard_one_hot.name+".cast_fp32"))
shard_logit_grad_fp32 = block.var(shard_logit.name+".cast_fp32@GRAD")
block._insert_op(
index - 1,
type='one_hot',
inputs={'X': shard_label},
outputs={'Out': shard_one_hot_fp32},
attrs={
'depth': shard_dim,
'allow_out_of_range': True,
op_role_key: backward_role
})
block._insert_op(
index,
type='elementwise_sub',
inputs={'X': shard_prob,
'Y': shard_one_hot_fp32},
outputs={'Out': shard_logit_grad_fp32},
attrs={op_role_key: backward_role})
block._insert_op(
index + 1,
type='elementwise_mul',
inputs={'X': shard_logit_grad_fp32,
'Y': self._loss_scaling},
outputs={'Out': shard_logit_grad_fp32},
attrs={op_role_key: backward_role})
block._insert_op(
index + 2,
type='scale',
inputs={'X': shard_logit_grad_fp32},
outputs={'Out': shard_logit_grad_fp32},
attrs={
'scale': 1.0 / self._batch_size,
op_role_key: loss_backward_role
})
if self._use_dynamic_loss_scaling:
grads = [layers.reduce_sum(g) for [_, g] in scaled_params_grads]
all_grads = layers.concat(grads)
all_grads_sum = layers.reduce_sum(all_grads)
is_overall_finite = layers.isfinite(all_grads_sum)
update_loss_scaling(is_overall_finite, self._loss_scaling,
self._num_good_steps, self._num_bad_steps,
self._incr_every_n_steps,
self._decr_every_n_nan_or_inf, self._incr_ratio,
self._decr_ratio)
with layers.Switch() as switch:
with switch.case(is_overall_finite):
pass
with switch.default():
for _, g in scaled_params_grads:
layers.assign(layers.zeros_like(g), g)
optimize_ops = self._optimizer.apply_gradients(scaled_params_grads)
ret = optimize_ops, scaled_params_grads
with open("fp16_softmax.program", "w") as f:
program_to_code(block.program,fout=f, skip_op_callstack=False)
return ret
def main():
global args
all_loss_types = ["softmax", "arcface", "dist_softmax", "dist_arcface"]
assert args.loss_type in all_loss_types, \
"All supported loss types [{}], but give {}.".format(
all_loss_types, args.loss_type)
print_arguments(args)
train(args)
if __name__ == '__main__':
main()
......@@ -24,6 +24,7 @@ import pickle
import subprocess
import shutil
import logging
import tempfile
import paddle
import paddle.fluid as fluid
......@@ -43,7 +44,8 @@ from paddle.fluid.optimizer import Optimizer
logging.basicConfig(
format='[%(asctime)s %(levelname)s line:%(lineno)d] %(message)s',
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%d %b %Y %H:%M:%S')
logger = logging.getLogger(__name__)
......@@ -57,6 +59,9 @@ class Entry(object):
"""
Check the validation of parameters.
"""
assert os.getenv("PADDLE_TRAINERS_NUM") is not None, \
"Please start script using paddle.distributed.launch module."
supported_types = ["softmax", "arcface",
"dist_softmax", "dist_arcface"]
assert self.loss_type in supported_types, \
......@@ -70,10 +75,8 @@ class Entry(object):
def __init__(self):
self.config = config.config
super(Entry, self).__init__()
assert os.getenv("PADDLE_TRAINERS_NUM") is not None, \
"Please start script using paddle.distributed.launch module."
num_trainers = int(os.getenv("PADDLE_TRAINERS_NUM"))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID"))
num_trainers = int(os.getenv("PADDLE_TRAINERS_NUM", 1))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", 0))
self.trainer_id = trainer_id
self.num_trainers = num_trainers
......@@ -114,8 +117,15 @@ class Entry(object):
self.model_save_dir = self.config.model_save_dir
self.warmup_epochs = self.config.warmup_epochs
if self.checkpoint_dir:
self.checkpoint_dir = os.path.abspath(self.checkpoint_dir)
if self.model_save_dir:
self.model_save_dir = os.path.abspath(self.model_save_dir)
if self.dataset_dir:
self.dataset_dir = os.path.abspath(self.dataset_dir)
logger.info('=' * 30)
logger.info("Default configuration: ")
logger.info("Default configuration:")
for key in self.config:
logger.info('\t' + str(key) + ": " + str(self.config[key]))
logger.info('trainer_id: {}, num_trainers: {}'.format(
......@@ -123,18 +133,21 @@ class Entry(object):
logger.info('=' * 30)
def set_val_targets(self, targets):
"""
Set the names of validation datasets, separated by comma.
"""
self.val_targets = targets
logger.info("Set val_targets to {} by user.".format(targets))
logger.info("Set val_targets to {}.".format(targets))
def set_train_batch_size(self, batch_size):
self.train_batch_size = batch_size
self.global_train_batch_size = batch_size * self.num_trainers
logger.info("Set train batch size to {} by user.".format(batch_size))
logger.info("Set train batch size to {}.".format(batch_size))
def set_test_batch_size(self, batch_size):
self.test_batch_size = batch_size
self.global_test_batch_size = batch_size * self.num_trainers
logger.info("Set test batch size to {} by user.".format(batch_size))
logger.info("Set test batch size to {}.".format(batch_size))
def set_hdfs_info(self, fs_name, fs_ugi, directory):
"""
......@@ -153,38 +166,42 @@ class Entry(object):
def set_model_save_dir(self, directory):
"""
Set the directory to save model.
Set the directory to save models.
"""
if directory:
directory = os.path.abspath(directory)
self.model_save_dir = directory
logger.info("Set model_save_dir to {} by user.".format(directory))
logger.info("Set model_save_dir to {}.".format(directory))
def set_dataset_dir(self, directory):
"""
Set the root directory for datasets.
"""
if directory:
directory = os.path.abspath(directory)
self.dataset_dir = directory
logger.info("Set dataset_dir to {} by user.".format(directory))
logger.info("Set dataset_dir to {}.".format(directory))
def set_train_image_num(self, num):
"""
Set the total number of images for train.
"""
self.train_image_num = num
logger.info("Set train_image_num to {} by user.".format(num))
logger.info("Set train_image_num to {}.".format(num))
def set_class_num(self, num):
"""
Set the number of classes.
"""
self.num_classes = num
logger.info("Set num_classes to {} by user.".format(num))
logger.info("Set num_classes to {}.".format(num))
def set_emb_size(self, size):
"""
Set the size of the last hidding layer before the distributed fc-layer.
"""
self.emb_size = size
logger.info("Set emb_size to {} by user.".format(size))
logger.info("Set emb_size to {}.".format(size))
def set_model(self, model):
"""
......@@ -194,25 +211,27 @@ class Entry(object):
if not isinstance(model, base_model.BaseModel):
raise ValueError("The parameter for set_model must be an "
"instance of BaseModel.")
logger.info("Set model to {} by user.".format(model))
logger.info("Set model to {}.".format(model))
def set_train_epochs(self, num):
"""
Set the number of epochs to train.
"""
self.train_epochs = num
logger.info("Set train_epochs to {} by user.".format(num))
logger.info("Set train_epochs to {}.".format(num))
def set_checkpoint_dir(self, directory):
"""
Set the directory for checkpoint loaded before training/testing.
"""
if directory:
directory = os.path.abspath(directory)
self.checkpoint_dir = directory
logger.info("Set checkpoint_dir to {} by user.".format(directory))
logger.info("Set checkpoint_dir to {}.".format(directory))
def set_warmup_epochs(self, num):
self.warmup_epochs = num
logger.info("Set warmup_epochs to {} by user.".format(num))
logger.info("Set warmup_epochs to {}.".format(num))
def set_loss_type(self, type):
supported_types = ["dist_softmax", "dist_arcface", "softmax", "arcface"]
......@@ -220,24 +239,22 @@ class Entry(object):
raise ValueError("All supported loss types: {}".format(
supported_types))
self.loss_type = type
logger.info("Set loss_type to {} by user.".format(type))
logger.info("Set loss_type to {}.".format(type))
def set_image_shape(self, shape):
if not isinstance(shape, (list, tuple)):
raise ValueError("shape must be of type list or tuple")
raise ValueError("Shape must be of type list or tuple")
self.image_shape = shape
logger.info("Set image_shape to {} by user.".format(shape))
logger.info("Set image_shape to {}.".format(shape))
def set_optimizer(self, optimizer):
if not isinstance(optimizer, Optimizer):
raise ValueError("optimizer must be as type of Optimizer")
raise ValueError("Optimizer must be type of Optimizer")
self.optimizer = optimizer
logger.info("User manually set optimizer")
def get_optimizer(self):
if self.optimizer:
return self.optimizer
def _get_optimizer(self):
if not self.optimizer:
bd = [step for step in self.lr_steps]
start_lr = self.lr
......@@ -247,12 +264,12 @@ class Entry(object):
train_image_num * 1.0 / self.num_trainers))
steps_per_pass = int(math.ceil(
images_per_trainer * 1.0 / self.train_batch_size))
logger.info("steps per epoch: %d" % steps_per_pass)
logger.info("Steps per epoch: %d" % steps_per_pass)
warmup_steps = steps_per_pass * self.warmup_epochs
batch_denom = 1024
base_lr = start_lr * global_batch_size / batch_denom
lr = [base_lr * (0.1 ** i) for i in range(len(bd) + 1)]
logger.info("lr boundaries: {}".format(bd))
logger.info("LR boundaries: {}".format(bd))
logger.info("lr_step: {}".format(lr))
if self.warmup_epochs:
lr_val = lr_warmup(fluid.layers.piecewise_decay(boundaries=bd,
......@@ -268,6 +285,7 @@ class Entry(object):
if self.loss_type in ["dist_softmax", "dist_arcface"]:
self.optimizer = DistributedClassificationOptimizer(
self.optimizer, global_batch_size)
return self.optimizer
def build_program(self,
......@@ -302,6 +320,7 @@ class Entry(object):
loss_type=self.loss_type,
margin=self.margin,
scale=self.scale)
if self.loss_type in ["dist_softmax", "dist_arcface"]:
shard_prob = loss._get_info("shard_prob")
......@@ -320,10 +339,12 @@ class Entry(object):
optimizer = None
if is_train:
# initialize optimizer
optimizer = self.get_optimizer()
optimizer = self._get_optimizer()
dist_optimizer = self.fleet.distributed_optimizer(
optimizer, strategy=self.strategy)
dist_optimizer.minimize(loss)
if "dist" in self.loss_type:
optimizer = optimizer._optimizer
elif use_parallel_test:
emb = fluid.layers.collective._c_allgather(emb,
nranks=num_trainers, use_calc_stream=True)
......@@ -361,11 +382,7 @@ class Entry(object):
def preprocess_distributed_params(self,
local_dir):
local_dir = os.path.abspath(local_dir)
output_dir = local_dir + "_@tmp"
assert not os.path.exists(output_dir), \
"The temp directory {} for distributed params exists.".format(
output_dir)
os.makedirs(output_dir)
output_dir = tempfile.mkdtemp()
cmd = sys.executable + ' -m plsc.utils.process_distfc_parameter '
cmd += "--nranks {} ".format(self.num_trainers)
cmd += "--num_classes {} ".format(self.num_classes)
......@@ -388,13 +405,11 @@ class Entry(object):
file = os.path.join(output_dir, file)
shutil.move(file, local_dir)
shutil.rmtree(output_dir)
file_name = os.path.join(local_dir, '.lock')
with open(file_name, 'w') as f:
pass
def append_broadcast_ops(self, program):
def _append_broadcast_ops(self, program):
"""
Before test, we broadcast bn-related parameters to all other trainers.
Before test, we broadcast bathnorm-related parameters to all
other trainers from trainer-0.
"""
bn_vars = [var for var in program.list_vars()
if 'batch_norm' in var.name and var.persistable]
......@@ -420,24 +435,26 @@ class Entry(object):
checkpoint_dir = self.checkpoint_dir
if self.fs_name is not None:
ans = 'y'
if os.path.exists(checkpoint_dir):
ans = input("Downloading pretrained model, but the local "
ans = input("Downloading pretrained models, but the local "
"checkpoint directory ({}) exists, overwrite it "
"or not? [Y/N]".format(checkpoint_dir))
if ans.lower() == 'n':
logger.info("Using the local checkpoint directory, instead"
" of the remote one.")
else:
logger.info("Overwriting the local checkpoint directory.")
if ans.lower() == 'y':
if os.path.exists(checkpoint_dir):
logger.info("Using the local checkpoint directory.")
shutil.rmtree(checkpoint_dir)
os.makedirs(checkpoint_dir)
# sync all trainers to avoid loading checkpoints before
# parameters are downloaded
file_name = os.path.join(checkpoint_dir, '.lock')
if self.trainer_id == 0:
self.get_files_from_hdfs(checkpoint_dir)
with open(file_name, 'w') as f:
pass
time.sleep(5)
time.sleep(10)
os.remove(file_name)
else:
while True:
......@@ -445,15 +462,15 @@ class Entry(object):
time.sleep(1)
else:
break
else:
self.get_files_from_hdfs(checkpoint_dir)
# Preporcess distributed parameters.
file_name = os.path.join(checkpoint_dir, '.lock')
distributed = self.loss_type in ["dist_softmax", "dist_arcface"]
if load_for_train and self.trainer_id == 0 and distributed:
self.preprocess_distributed_params(checkpoint_dir)
time.sleep(5)
with open(file_name, 'w') as f:
pass
time.sleep(10)
os.remove(file_name)
elif load_for_train and distributed:
# wait trainer_id (0) to complete
......@@ -503,11 +520,11 @@ class Entry(object):
load_for_train=False)
assert self.model_save_dir, \
"Does not set model_save_dir for inference."
"Does not set model_save_dir for inference model converting."
if os.path.exists(self.model_save_dir):
ans = input("model_save_dir for inference model ({}) exists, "
"overwrite it or not? [Y/N]".format(model_save_dir))
if ans.lower() == n:
if ans.lower() == 'n':
logger.error("model_save_dir for inference model exists, "
"and cannot overwrite it.")
exit()
......@@ -551,17 +568,17 @@ class Entry(object):
load_for_train=False)
if self.train_reader is None:
train_reader = paddle.batch(reader.arc_train(
predict_reader = paddle.batch(reader.arc_train(
self.dataset_dir, self.num_classes),
batch_size=self.train_batch_size)
else:
train_reader = self.train_reader
predict_reader = self.train_reader
feeder = fluid.DataFeeder(place=place,
feed_list=['image', 'label'], program=main_program)
fetch_list = [emb.name]
for data in train_reader():
for data in predict_reader():
emb = exe.run(main_program, feed=feeder.feed(data),
fetch_list=fetch_list, use_program_cache=True)
print("emb: ", emb)
......@@ -684,16 +701,12 @@ class Entry(object):
self.build_program(True, False)
if self.with_test:
test_emb, test_loss, test_acc1, test_acc5, _ = \
self.build_program(False, True)
self.build_program(False, self.num_trainers > 1)
test_list, test_name_list = reader.test(
self.dataset_dir, self.val_targets)
test_program = self.test_program
self.append_broadcast_ops(test_program)
self._append_broadcast_ops(test_program)
if self.loss_type in ["dist_softmax", "dist_arcface"]:
global_lr = optimizer._optimizer._global_learning_rate(
program=self.train_program)
else:
global_lr = optimizer._global_learning_rate(
program=self.train_program)
......@@ -720,10 +733,10 @@ class Entry(object):
fetch_list_test = [test_emb.name, test_acc1.name, test_acc5.name]
real_test_batch_size = self.global_test_batch_size
if self.checkpoint_dir == "":
load_checkpoint = False
else:
if self.checkpoint_dir:
load_checkpoint = True
else:
load_checkpoint = False
if load_checkpoint:
self.load_checkpoint(executor=exe, main_program=origin_prog)
......@@ -839,7 +852,14 @@ class Entry(object):
model_save_dir = os.path.join(
self.model_save_dir, str(pass_id))
if not os.path.exists(model_save_dir):
# may be more than one processes trying
# to create the directory
try:
os.makedirs(model_save_dir)
except OSError as exc:
if exc.errno != errno.EEXIST:
raise
pass
if trainer_id == 0:
fluid.io.save_persistables(exe,
model_save_dir,
......
#!/usr/bin/env bash
export FLAGS_cudnn_exhaustive_search=true
export FLAGS_fraction_of_gpu_memory_to_use=0.96
export FLAGS_eager_delete_tensor_gb=0.0
selected_gpus="0,1,2,3,4,5,6,7"
#selected_gpus="4,5,6"
python -m paddle.distributed.launch \
--selected_gpus $selected_gpus \
--log_dir mylog \
do_train.py \
--model=ResNet_ARCFACE50 \
--loss_type=dist_softmax \
--model_save_dir=output \
--margin=0.5 \
--train_batch_size 32 \
--class_dim 85742 \
--with_test=True
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" PLSC version string """
plsc_version = "0.1.0"
numpy>=1.12, <=1.16.4 ; python_version<"3.5"
numpy>=1.12 ; python_version>="3.5"
scipy>=0.19.0, <=1.2.1 ; python_version<"3.5"
paddlepaddle>=1.6.2
scipy ; python_version>="3.5"
Pillow
sklearn
easydict
......@@ -11,17 +11,53 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from setuptools import setup, find_packages
"""Setup for pip package."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
setup(name="plsc",
version="0.1.0",
description="Large Scale Classfication via distributed fc.",
author='lilong',
author_email="lilong.albert@gmail.com",
url="http",
license="Apache",
#packages=['paddleXML'],
from setuptools import find_packages
from setuptools import setup
from plsc.version import plsc_version
REQUIRED_PACKAGES = [
'sklearn', 'easydict', 'paddlepaddle>=1.6.2', 'Pillow',
'numpy', 'scipy'
]
setup(
name="plsc",
version=plsc_version,
description=
("PaddlePaddle Large Scale Classfication Package."),
long_description='',
url='https://github.com/PaddlePaddle/PLSC',
author='PaddlePaddle Authors',
author_email='paddle-dev@baidu.com',
install_requires=REQUIRED_PACKAGES,
packages=find_packages(),
#install_requires=['paddlepaddle>=1.6.1'],
python_requires='>=2'
)
# PyPI package information.
classifiers=[
'Development Status :: 4 - Beta',
'Intended Audience :: Developers',
'Intended Audience :: Education',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: Apache Software License',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence',
'Topic :: Software Development',
'Topic :: Software Development :: Libraries',
'Topic :: Software Development :: Libraries :: Python Modules',
],
license="Apache 2.0",
keywords=
('plsc paddlepaddle large-scale classification model-parallelism distributed-training'))