Help: how to use multiple GPUs on a single machine
Created by: DianaZhang
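I am using the handwritten digit recognition example. After creating `exe`, I switched to the multi-GPU `compiled_prog`, but running it fails with an error.
Environment: Ubuntu 18.04, CUDA 9.0, cuDNN 7.1, paddlepaddle 1.4.1.post85
I have already modified the LD_LIBRARY_PATH in ~/.bashrc. After all of these changes, the error below is still reported.
The error: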
W0614 10:27:29.849985 9177 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the OriginProgram() method!
W0614 10:27:31.884308 9177 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 8.0
W0614 10:27:31.884506 9177 dynamic_loader.cc:107] Can not find library: libcudnn.so. Please try to add the lib path to LD_LIBRARY_PATH.
W0614 10:27:31.884533 9177 dynamic_loader.cc:165] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory )
Please specify its path correctly using following ways:
Method. set environment variable LD_LIBRARY_PATH on Linux or DYLD_LIBRARY_PATH on Mac OS.
For instance, issue command: export LD_LIBRARY_PATH=...
Note: After Mac OS 10.11, using the DYLD_LIBRARY_PATH is impossible unless System Integrity Protection (SIP) is disabled.
Traceback (most recent call last):
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 263, in <module>
main(use_cuda=use_cuda, nn_type=predict)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 243, in main
params_filename=params_filename)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 149, in train
exe.run(startup_program)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 565, in run
use_program_cache=use_program_cache)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 642, in run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator fill_constant error.
Python Callstacks:
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1725, in prepend_op
attrs=kwargs.get("attrs", None))
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/initializer.py", line 167, in __call__
stop_gradient=True)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1517, in create_var
kwargs['initializer'](var, self)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/layer_helper_base.py", line 382, in set_variable_initializer
initializer=initializer)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/layers/tensor.py", line 152, in create_global_var
value=float(value), force_cpu=force_cpu))
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 136, in create_global_learning_rate
persistable=True)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 275, in create_optimization_pass
self.create_global_learning_rate()
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 441, in apply_gradients
optimize_ops = self.create_optimization_pass(params_grads)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 469, in apply_optimize
optimize_ops = self.apply_gradients(params_grads)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 500, in minimize
loss, startup_program=startup_program, params_grads=params_grads)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 119, in train
optimizer.minimize(avg_loss)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 243, in main
params_filename=params_filename)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 263, in <module>
main(use_cuda=use_cuda, nn_type=predict)
C++ Callstacks:
Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion at [/paddle/paddle/fluid/platform/dynload/cudnn.cc:59]
PaddlePaddle Call Stacks:
0 0x7f338b3a1eb0p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7f338b3a2229p paddle::platform::EnforceNotMet::EnforceNotMet(std::exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f338d1fcd48p paddle::platform::dynload::EnforceCUDNNLoaded(char const*) + 200
3 0x7f338d1d9515p paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace) + 741
4 0x7f338d1de1e8p std::Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > (), std::reference_wrapper<std::Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > >, std::less<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> >, std::allocator<std::pair<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, 
boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > > > > >, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>)::{lambda()#1} ()> > >::M_invoke(std::Any_data const&) + 104
5 0x7f338d1dc7cap std::Function_handler<std::unique_ptr<std::future_base::Result_base, std::future_base::Result_base::Deleter> (), std::future_base::Task_setter<std::unique_ptr<std::future_base::Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > >, std::future_base::Result_base::Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > > >::M_invoke(std::Any_data const&) + 42
6 0x7f338b46e747p std::future_base::State_base::M_do_set(std::function<std::unique_ptr<std::future_base::Result_base, std::future_base::Result_base::Deleter> ()>&, bool&) + 39
7 0x7f33e9761827p
8 0x7f338d1dfabcp std::future_base::Deferred_state<std::Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void>, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > >, std::less<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void> >, std::allocator<std::pair<boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, 
boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > > > > >, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_deletepaddle::platform::DeviceContext > >::M_run_deferred() + 220
9 0x7f338d1d9fe9p paddle::platform::DeviceContextPool::Get(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 137
10 0x7f338d12c6d8p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 72
11 0x7f338d12d094p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 292
12 0x7f338d12a9bcp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
13 0x7f338b5145dep paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 382
14 0x7f338b51541fp paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocatorstd::string > const&, bool) + 143
15 0x7f338b391abep
16 0x7f338b3d497ep
17 0x565d5cp _PyCFunction_FastCallDict + 860
18 0x503073p
19 0x506859p _PyEval_EvalFrameDefault + 1097
20 0x504c28p
21 0x502540p
22 0x502f3dp
23 0x507641p _PyEval_EvalFrameDefault + 4657
24 0x504c28p
25 0x502540p
26 0x502f3dp
27 0x506859p _PyEval_EvalFrameDefault + 1097
28 0x504c28p
29 0x502540p
30 0x502f3dp
31 0x507641p _PyEval_EvalFrameDefault + 4657
32 0x504c28p
33 0x502540p
34 0x502f3dp
35 0x507641p _PyEval_EvalFrameDefault + 4657
36 0x504c28p
37 0x506393p PyEval_EvalCode + 35
38 0x634d52p
39 0x634e0ap PyRun_FileExFlags + 154
40 0x6385c8p PyRun_SimpleFileExFlags + 392
41 0x63915ap Py_Main + 1402
42 0x4a6f10p main + 224
43 0x7f33e9992b97p __libc_start_main + 231
44 0x5afa0ap _start + 42
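The warnings above say that `libcudnn.so` could not be found even though `LD_LIBRARY_PATH` was edited in `~/.bashrc`. One way to narrow this down is to check, from the same Python environment that runs `train.py`, whether the library is visible at all. The following is a minimal sketch; the helper names `find_in_ld_library_path` and `can_load` are hypothetical and not part of Paddle, and the check only approximates what a dynamic loader does, so it may not match Paddle's own search logic exactly.

```python
import ctypes
import os


def find_in_ld_library_path(lib_name="libcudnn.so"):
    """Return the LD_LIBRARY_PATH directories that contain lib_name."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in raw.split(os.pathsep) if d]
    return [d for d in dirs if os.path.exists(os.path.join(d, lib_name))]


def can_load(lib_name="libcudnn.so"):
    """Try to open the shared library by name; return True on success."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    print("directories containing libcudnn.so:", find_in_ld_library_path())
    print("libcudnn.so loadable:", can_load())
```

If the directory list is empty and the load fails, the process that launches `train.py` probably does not see the updated variable (for example, `~/.bashrc` was not re-sourced, or the variable was not exported), or the directory holds only a versioned file such as `libcudnn.so.7` without the unversioned name. Note also that the log reports Runtime API Version 8.0 while the environment lists CUDA 9.0, which may mean the installed package was built against a different CUDA/cuDNN combination than the one on the machine.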
Finally, here is the complete code:

```python
from __future__ import print_function

import os
import argparse
from PIL import Image
import numpy
import paddle
import paddle.fluid as fluid
import time


def parse_args():
    parser = argparse.ArgumentParser("mnist")
    parser.add_argument(
        '--enable_ce',
        action='store_true',
        help="If set, run the task with continuous evaluation logs.")
    parser.add_argument(
        '--use_gpu',
        type=bool,
        default=False,
        help="Whether to use GPU or not.")
    parser.add_argument(
        '--num_epochs', type=int, default=5, help="number of epochs.")
    args = parser.parse_args()
    return args


def loss_net(hidden, label):
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_loss = fluid.layers.mean(loss)
    acc = fluid.layers.accuracy(input=prediction, label=label)
    return prediction, avg_loss, acc


def multilayer_perceptron(img, label):
    img = fluid.layers.fc(input=img, size=200, act='tanh')
    hidden = fluid.layers.fc(input=img, size=200, act='tanh')
    return loss_net(hidden, label)


def softmax_regression(img, label):
    return loss_net(img, label)


def convolutional_neural_network(img, label):
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    return loss_net(conv_pool_2, label)


def train(nn_type,
          use_cuda,
          save_dirname=None,
          model_filename=None,
          params_filename=None):
    if use_cuda and not fluid.core.is_compiled_with_cuda():
        return

    startup_program = fluid.default_startup_program()
    main_program = fluid.default_main_program()

    if args.enable_ce:
        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=BATCH_SIZE)
        test_reader = paddle.batch(
            paddle.dataset.mnist.test(), batch_size=BATCH_SIZE)
        startup_program.random_seed = 90
        main_program.random_seed = 90
    else:
        train_reader = paddle.batch(
            paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=500),
            batch_size=BATCH_SIZE)
        test_reader = paddle.batch(
            paddle.dataset.mnist.test(), batch_size=BATCH_SIZE)

    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    if nn_type == 'softmax_regression':
        net_conf = softmax_regression
    elif nn_type == 'multilayer_perceptron':
        net_conf = multilayer_perceptron
    else:
        net_conf = convolutional_neural_network

    prediction, avg_loss, acc = net_conf(img, label)

    test_program = main_program.clone(for_test=True)
    optimizer = fluid.optimizer.Adam(learning_rate=0.001)
    optimizer.minimize(avg_loss)

    def train_test(train_test_program, train_test_feed, train_test_reader):
        acc_set = []
        avg_loss_set = []
        for test_data in train_test_reader():
            acc_np, avg_loss_np = exe.run(
                program=train_test_program,
                feed=train_test_feed.feed(test_data),
                fetch_list=[acc, avg_loss])
            acc_set.append(float(acc_np))
            avg_loss_set.append(float(avg_loss_np))
        # get test acc and loss
        acc_val_mean = numpy.array(acc_set).mean()
        avg_loss_val_mean = numpy.array(avg_loss_set).mean()
        return avg_loss_val_mean, acc_val_mean

    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    exe = fluid.Executor(place)
    compiled_prog = fluid.compiler.CompiledProgram(
        fluid.default_main_program()).with_data_parallel(
            loss_name=[avg_loss, acc])

    feeder = fluid.DataFeeder(feed_list=[img, label], place=place)
    exe.run(startup_program)
    epochs = [epoch_id for epoch_id in range(PASS_NUM)]

    lists = []
    step = 0
    for epoch_id in epochs:
        for step_id, data in enumerate(train_reader()):
            metrics = exe.run(
                compiled_prog,
                feed=feeder.feed(data),
                fetch_list=[avg_loss, acc])
            if step % 100 == 0:
                print("Pass %d, Batch %d, Cost %f" % (step, epoch_id,
                                                      metrics[0]))
            step += 1
        # test for epoch
        avg_loss_val, acc_val = train_test(
            train_test_program=test_program,
            train_test_reader=test_reader,
            train_test_feed=feeder)

        print("Test with Epoch %d, avg_cost: %s, acc: %s" %
              (epoch_id, avg_loss_val, acc_val))
        lists.append((epoch_id, avg_loss_val, acc_val))
        if save_dirname is not None:
            fluid.io.save_inference_model(
                save_dirname, ["img"], [prediction],
                exe,
                model_filename=model_filename,
                params_filename=params_filename)

    if args.enable_ce:
        print("kpis\ttrain_cost\t%f" % metrics[0])
        print("kpis\ttest_cost\t%s" % avg_loss_val)
        print("kpis\ttest_acc\t%s" % acc_val)

    # find the best pass
    best = sorted(lists, key=lambda list: float(list[1]))[0]
    print('Best pass is %s, testing Avgcost is %s' % (best[0], best[1]))
    print('The classification accuracy is %.2f%%' % (float(best[2]) * 100))


def infer(use_cuda,
          save_dirname=None,
          model_filename=None,
          params_filename=None):
    if save_dirname is None:
        return

    place = fluid.CUDAPlace(3) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)
    t1 = time.time()

    def load_image(file):
        im = Image.open(file).convert('L')
        im = im.resize((28, 28), Image.ANTIALIAS)
        im = numpy.array(im).reshape(1, 1, 28, 28).astype(numpy.float32)
        im = im / 255.0 * 2.0 - 1.0
        return im

    cur_dir = os.path.dirname(os.path.realpath(__file__))
    tensor_img = load_image(cur_dir + '/image/infer_3.png')

    inference_scope = fluid.core.Scope()
    with fluid.scope_guard(inference_scope):
        # Use fluid.io.load_inference_model to obtain the inference program desc,
        # the feed_target_names (the names of variables that will be feeded
        # data using feed operators), and the fetch_targets (variables that
        # we want to obtain data from using fetch operators).
        [inference_program, feed_target_names,
         fetch_targets] = fluid.io.load_inference_model(
             save_dirname, exe, model_filename, params_filename)

        # Construct feed as a dictionary of {feed_target_name: feed_target_data}
        # and results will contain a list of data corresponding to fetch_targets.
        results = exe.run(
            inference_program,
            feed={feed_target_names[0]: tensor_img},
            fetch_list=fetch_targets)
        lab = numpy.argsort(results)
        print("Inference result of image/infer_3.png is: %d" % lab[0][0][-1])
    t2 = time.time()
    print(t2 - t1)


def main(use_cuda, nn_type):
    model_filename = None
    params_filename = None
    save_dirname = "recognize_digits_" + nn_type + ".inference.model"
    t1 = time.time()
    # call train() with is_local argument to run distributed train
    train(
        nn_type=nn_type,
        use_cuda=use_cuda,
        save_dirname=save_dirname,
        model_filename=model_filename,
        params_filename=params_filename)
    t3 = time.time()
    infer(
        use_cuda=use_cuda,
        save_dirname=save_dirname,
        model_filename=model_filename,
        params_filename=params_filename)
    t2 = time.time()
    print(t3 - t1)
    print(t2 - t3)


if __name__ == '__main__':
    args = parse_args()
    BATCH_SIZE = 64
    PASS_NUM = args.num_epochs
    use_cuda = args.use_gpu
    predict = 'softmax_regression'  # uncomment for Softmax
    # predict = 'multilayer_perceptron' # uncomment for MLP
    # predict = 'convolutional_neural_network' # uncomment for LeNet5
    main(use_cuda=use_cuda, nn_type=predict)
```
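Separately from the cuDNN loading error, the `with_data_parallel` call in the code above passes a list of variables as `loss_name`. As far as the documented signature goes, `loss_name` is a single string: the name of the loss variable. Below is a minimal sketch of that, assuming Paddle Fluid 1.4. The names `as_loss_name` and `build_data_parallel_program` are hypothetical helpers introduced here for illustration and are not part of Paddle.

```python
def as_loss_name(loss):
    """Return the single string name to pass as loss_name.

    Accepts either a string or an object with a .name attribute
    (fluid Variables expose their name this way). Lists are rejected
    because only one loss variable can be named.
    """
    if isinstance(loss, str):
        return loss
    if isinstance(loss, (list, tuple)):
        raise TypeError("loss_name takes one variable name, not a list")
    return loss.name


def build_data_parallel_program(main_program, loss):
    """Wrap main_program for data-parallel execution (assumes Paddle Fluid 1.4)."""
    # Imported lazily so as_loss_name can be used without Paddle installed.
    import paddle.fluid as fluid

    return fluid.compiler.CompiledProgram(main_program).with_data_parallel(
        loss_name=as_loss_name(loss))
```

In `train()` this would correspond to `compiled_prog = build_data_parallel_program(main_program, avg_loss)`, with `acc` still requested through `fetch_list` in `exe.run`.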