On Windows 10, GPU memory usage is 0%, yet an out-of-memory error is reported
Created by: a2824256
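- Title: On Windows 10, GPU memory usage is 0%, yet an out-of-memory error is reported
- Version and environment info: 1) PaddlePaddle version: 1.8 2) CPU: Intel i7-8700K 3) GPU: NVIDIA TITAN Xp 4) System environment: Windows 10, conda, Python 3.7.7
- Training info: 1) single machine, single GPU 2) GPU memory info 3) Operator info
- Reproduction info: code I wrote myself, used for signal prediction
- Problem description: the error output is: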
(pdpd) E:\integral>python train.py
data_pretreatmenting
data_pretreatment_finished
network start
network end
W0819 11:57:30.471578 5956 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0819 11:57:30.479526 5956 device_context.cc:260] device: 0, cuDNN Version: 7.6.
开始训练
W0819 11:58:48.131574 5956 operator.cc:187] elementwise_add raises an exception struct paddle::memory::allocation::BadAlloc,
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.
----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 2.456825GB memory on GPU 0, available memory is only 5.439270GB.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
at (D:\1.8.4\paddle\paddle\fluid\memory\allocation\cuda_allocator.cc:69)
C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "train.py", line 136, in <module>
avg_loss_value, = exe.run(main_program, feed=feeder.feed(data_train), fetch_list=[loss])
File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1071, in run
six.reraise(*sys.exc_info())
File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\six.py", line 703, in reraise
raise value
File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1066, in run
return_merged=return_merged)
File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1154, in _run_impl
use_program_cache=use_program_cache)
File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1229, in _run_program
fetch_var_name)
RuntimeError:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.
----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 2.456825GB memory on GPU 0, available memory is only 5.439270GB.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
at (D:\1.8.4\paddle\paddle\fluid\memory\allocation\cuda_allocator.cc:69)
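A rough sanity check on the numbers in this error message can help narrow things down. The sketch below assumes, without confirmation from the log, that the failing `elementwise_add` is the bias add of `fc1` (20 units, float32) in train.py and that "GB" here means 2**30 bytes. Under those assumptions, it estimates how many rows were in the batch that triggered the allocation and compares that with the size of the training CSV mentioned at the end of this post.

```python
# Back-of-the-envelope estimate; every constant is taken from this post.
# Assumptions (NOT confirmed by the log): the failed allocation is the
# float32 output of fc1, and "GB" means 2**30 bytes.
FAILED_ALLOC_GB = 2.456825   # "Cannot allocate 2.456825GB" in the error
FC1_UNITS = 20               # size of fc1 in train.py
BYTES_PER_FLOAT32 = 4
CSV_SIZE_MB = 957            # training file size reported in the post


def implied_rows(alloc_gb, units, bytes_per_value=BYTES_PER_FLOAT32):
    """Rows in one batch if a [rows, units] tensor needs alloc_gb."""
    return alloc_gb * 2**30 / (units * bytes_per_value)


rows = implied_rows(FAILED_ALLOC_GB, FC1_UNITS)
bytes_per_csv_row = CSV_SIZE_MB * 1e6 / rows

print("rows in one batch: about %.1f million" % (rows / 1e6))
print("CSV bytes per row: about %.0f" % bytes_per_csv_row)
```

If the estimate holds, a single batch would contain tens of millions of rows, which is roughly what a 957 MB CSV with six short numeric columns per row could hold. That would suggest the whole training file is being fed in one step, rather than the GPU not being used.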
Code of train.py:
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import numpy as np
import math
import csv
from paddle.utils.plot import Ploter
import sys
ITERABLE = True
TRAIN_DATA = []
TEST_DATA = []
TRAIN_RES = []
TEST_RES = []
# TRAIN_FILES = ["./train_set.csv", "./train_set2.csv", "./train_set3.csv"]
TRAIN_FILES = "./train_set.csv"
TEST_FILE = "./test_set4.csv"
def data_pretreatment():
    # train
    with open(TRAIN_FILES) as f:
        render = csv.reader(f)
        for row in render:
            if len(row) != 0:
                TRAIN_DATA.append([row[0], row[1], row[2], row[3]])
                TRAIN_RES.append([row[4], row[5]])
    # test
    with open(TEST_FILE) as f:
        render = csv.reader(f)
        for row in render:
            if len(row) != 0:
                TEST_DATA.append([row[0], row[1], row[2], row[3]])
                TEST_RES.append([row[4], row[5]])
def train_sample_reader():
    for i in range(1):
        input = np.array(TRAIN_DATA).astype('float32')
        label = np.array(TRAIN_RES).astype('float32')
        yield input, label
def test_sample_reader():
    for i in range(1):
        input = np.array(TEST_DATA).astype('float32')
        label = np.array(TEST_RES).astype('float32')
        yield input, label
def train(executor, program, reader, feeder, fetch_list):
    accumulated = 1 * [0]
    count = 0
    for data_test in reader():
        outs = executor.run(program=program,
                            feed=feeder.feed(data_test),
                            fetch_list=fetch_list)
        accumulated = [x_c[0] + x_c[1][0] for x_c in zip(accumulated, outs)]  # accumulate the loss values during testing
        count += 1  # accumulate the number of samples in the test set
    return [x_d / count for x_d in accumulated]
# Training part
# data pretreatment
print("data_pretreatmenting")
data_pretreatment()
print("data_pretreatment_finished")
train_reader = fluid.io.batch(train_sample_reader, batch_size=1)
test_reader = fluid.io.batch(test_sample_reader, batch_size=1)
print("network start")
# network
INPUT = fluid.data(name='input', shape=[None, 4], dtype='float32')
LABEL = fluid.data(name='label', shape=[None, 2], dtype='float32')
hidden = fluid.layers.fc(name='fc1', input=INPUT, size=20, act='relu')
param_attr_1 = fluid.ParamAttr(name='batch_norm_w_1', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_1 = fluid.ParamAttr(name='batch_norm_b_1', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_1, bias_attr=bias_attr_1)
hidden = fluid.layers.fc(name='fc2', input=hidden, size=40, act='relu')
param_attr_2 = fluid.ParamAttr(name='batch_norm_w_2', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_2 = fluid.ParamAttr(name='batch_norm_b_2', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_2, bias_attr=bias_attr_2)
hidden = fluid.layers.fc(name='fc3', input=hidden, size=20, act='relu')
param_attr_3 = fluid.ParamAttr(name='batch_norm_w_3', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_3 = fluid.ParamAttr(name='batch_norm_b_3', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_3, bias_attr=bias_attr_3)
hidden = fluid.layers.fc(name='fc4', input=hidden, size=10, act='relu')
param_attr_4 = fluid.ParamAttr(name='batch_norm_w_4', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_4 = fluid.ParamAttr(name='batch_norm_b_4', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_4, bias_attr=bias_attr_4)
prediction = fluid.layers.fc(name='res', input=hidden, size=2, act=None)
print("network end")
# main program
main_program = fluid.default_main_program()  # get the default/global main program
startup_program = fluid.default_startup_program()
# loss
loss = fluid.layers.mean(fluid.layers.mse_loss(input=prediction, label=LABEL))
# loss = fluid.layers.mse_loss(input=prediction, label=LABEL)
# acc = fluid.layers.accuracy(input=prediction, label=LABEL)
# test program
test_program = main_program.clone(for_test=True)
adam = fluid.optimizer.Adam(learning_rate=0.01)
adam.minimize(loss)
use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
num_epochs = 100
params_dirname = "./my_paddle_model"
feeder = fluid.DataFeeder(place=place, feed_list=[INPUT, LABEL])
exe.run(startup_program)
train_prompt = "train cost"
test_prompt = "test cost"
step = 0
exe_test = fluid.Executor(place)
print("开始训练")
for pass_id in range(num_epochs):
    for data_train in train_reader():
        step = step + 1
        avg_loss_value, = exe.run(main_program, feed=feeder.feed(data_train), fetch_list=[loss])
        if step % 10 == 0:
            # plot_prompt.append(train_prompt, step, avg_loss_value[0])
            # plot_prompt.plot()
            print("%s, Epoch %d, Step %d, Cost %f" % (train_prompt, pass_id, step, avg_loss_value[0]))
        if step % 10 == 0:
            test_metics = train(executor=exe_test, program=test_program, reader=test_reader, fetch_list=[loss.name], feeder=feeder)
            # plot_prompt.append(test_prompt, step, test_metics[0])
            # plot_prompt.plot()
            print("%s, Epoch %d, Step %d, Cost %f" % (test_prompt, pass_id, step, test_metics[0]))
            if test_metics[0] < 10.0:
                break
        if math.isnan(float(avg_loss_value[0])):
            sys.exit("got NaN loss, training failed.")
        # Save the trained parameters to the path given earlier
        if params_dirname is not None:
            fluid.io.save_inference_model(params_dirname, ['input'], [prediction], exe)
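One thing worth checking in the code above: `train_sample_reader` yields the entire dataset as a single sample, so `batch_size=1` still sends every row to the GPU in one step. A minimal sketch of a per-row reader follows, so that the batch size actually bounds how much data goes into each step. The names `make_row_reader` and `batched` are hypothetical. `batched` is only a stand-in for `fluid.io.batch` so the snippet runs without Paddle, and it has not been tested against Paddle itself.

```python
import csv
import io

import numpy as np


def make_row_reader(open_file):
    """Return a reader yielding one (features, label) pair per CSV row.

    open_file is a zero-argument callable returning a fresh file object,
    e.g. lambda: open("./train_set.csv"), so every epoch re-reads the file.
    """
    def reader():
        with open_file() as f:
            for row in csv.reader(f):
                if len(row) != 0:  # skip blank lines, as in train.py
                    values = np.array([float(v) for v in row[:6]],
                                      dtype='float32')
                    yield values[:4], values[4:6]
    return reader


def batched(reader, batch_size):
    """Hypothetical stand-in for fluid.io.batch: group samples into lists."""
    def batch_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # emit the final, possibly smaller, batch
            yield batch
    return batch_reader


# Tiny in-memory CSV standing in for train_set.csv (made-up numbers).
DEMO_TEXT = "1,2,3,4,5,6\n7,8,9,10,11,12\n\n13,14,15,16,17,18\n"
demo_reader = make_row_reader(lambda: io.StringIO(DEMO_TEXT))
batches = list(batched(demo_reader, batch_size=2)())
for batch in batches:
    print(len(batch), batch[0][0].shape, batch[0][1].shape)
```

In train.py, the equivalent change would presumably be to pass such a per-row reader to `fluid.io.batch` with a moderate `batch_size`, instead of building one array from all of `TRAIN_DATA`. Reading row by row also avoids keeping the full CSV in host memory.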
conda package versions
(pdpd) E:\integral>conda list
# packages in environment at C:\ProgramData\Miniconda3\envs\pdpd:
#
# Name Version Build Channel
appdirs 1.4.4 pypi_0 pypi
astor 0.8.1 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
attrs 19.3.0 py_0 defaults
babel 2.8.0 pypi_0 pypi
backcall 0.2.0 py_0 defaults
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bleach 3.1.5 py_0 defaults
brotlipy 0.7.0 py37he774522_1000 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2020.6.24 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi 2020.6.20 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cffi 1.14.0 py37h7a1dbc1_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cfgv 3.2.0 pypi_0 pypi
chardet 3.0.4 py37_1003 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
click 7.1.2 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
colorama 0.4.3 py_0 defaults
cryptography 2.9.2 py37h7a1dbc1_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudatoolkit 10.0.130 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn 7.6.5 cuda10.0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cycler 0.10.0 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cython 0.29.21 pypi_0 pypi
decorator 4.4.2 py_0 defaults
defusedxml 0.6.0 py_0 defaults
distlib 0.3.1 pypi_0 pypi
docstring-parser 0.3 pypi_0 pypi
entrypoints 0.3 py37_0 defaults
filelock 3.0.12 pypi_0 pypi
flake8 3.8.3 pypi_0 pypi
flask 1.1.2 pypi_0 pypi
flask-babel 1.0.0 pypi_0 pypi
freetype 2.10.2 hd328e21_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
funcsigs 1.0.2 pypi_0 pypi
gast 0.4.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
graphviz 2.38 hfd603c8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
hdf5 1.8.20 hac2f561_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
icc_rt 2019.0.0 h0cc432a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
icu 58.2 ha925a31_3 defaults
identify 1.4.25 pypi_0 pypi
idna 2.10 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
importlib-metadata 1.7.0 py37_0 defaults
importlib_metadata 1.7.0 0 defaults
intel-openmp 2020.1 216 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipykernel 5.3.4 py37h5ca1d4c_0 defaults
ipython 7.16.1 py37h5ca1d4c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipython_genutils 0.2.0 py37_0 defaults
ipywidgets 7.5.1 py_0 defaults
itsdangerous 1.1.0 pypi_0 pypi
jedi 0.17.2 py37_0 defaults
jinja2 2.11.2 py_0 defaults
joblib 0.16.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jpeg 9b hb83a4c4_2 defaults
jsonschema 3.2.0 py37_1 defaults
jupyter 1.0.0 py37_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jupyter_client 6.1.6 py_0 defaults
jupyter_console 6.1.0 py_0 defaults
jupyter_core 4.6.3 py37_0 defaults
kiwisolver 1.2.0 py37h74a9793_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libopencv 3.4.2 h20b85fd_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libpng 1.6.37 h2a8f88b_0 defaults
libprotobuf 3.12.4 h200bbdf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libsodium 1.0.18 h62dcd97_0 defaults
libtiff 4.1.0 h56a325e_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llvmlite 0.33.0 pypi_0 pypi
lz4-c 1.9.2 h62dcd97_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
m2w64-gcc-libgfortran 5.3.0 6 defaults
m2w64-gcc-libs 5.3.0 7 defaults
m2w64-gcc-libs-core 5.3.0 7 defaults
m2w64-gmp 6.1.0 2 defaults
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 defaults
markupsafe 1.1.1 py37hfa6e2cd_1 defaults
matplotlib 3.2.2 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
matplotlib-base 3.2.2 py37h64f37c6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mccabe 0.6.1 pypi_0 pypi
mistune 0.8.4 py37hfa6e2cd_1001 defaults
mkl 2020.1 216 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service 2.3.0 py37hb782905_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft 1.1.0 py37h45dec08_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random 1.1.1 py37h47e9c7a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mpmath 1.1.0 pypi_0 pypi
msys2-conda-epoch 20160418 1 defaults
nbconvert 5.6.1 py37_1 defaults
nbformat 5.0.7 py_0 defaults
nltk 3.5 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nodeenv 1.4.0 pypi_0 pypi
notebook 6.0.3 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numba 0.50.1 pypi_0 pypi
numpy 1.19.1 py37h5510c5b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base 1.19.1 py37ha3acd2a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
objgraph 3.4.1 pypi_0 pypi
olefile 0.46 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv 3.4.2 py37h40b0b35_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv-python 4.1.1.26 pypi_0 pypi
openssl 1.1.1g he774522_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging 20.4 py_0 defaults
paddlepaddle-gpu 1.8.4.post107 pypi_0 pypi
pandoc 2.10.1 0 defaults
pandocfilters 1.4.2 py37_1 defaults
parso 0.7.0 py_0 defaults
pathlib 1.0.1 py37_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pickleshare 0.7.5 py37_1001 defaults
pillow 7.2.0 py37hcc1f983_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip 20.2.2 py37_0 defaults
pre-commit 2.6.0 pypi_0 pypi
prettytable 0.7 pypi_0 pypi
prometheus_client 0.8.0 py_0 defaults
prompt-toolkit 3.0.5 py_0 defaults
prompt_toolkit 3.0.5 0 defaults
protobuf 3.12.4 py37ha925a31_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
py-cpuinfo 5.0.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
py-opencv 3.4.2 py37hc319ecb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pycocotools 2.0 pypi_0 pypi
pycodestyle 2.6.0 pypi_0 pypi
pycparser 2.20 py_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyflakes 2.2.0 pypi_0 pypi
pygments 2.6.1 py_0 defaults
pyopenssl 19.1.0 py_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyparsing 2.4.7 py_0 defaults
pyqt 5.9.2 py37h6538335_2 defaults
pyrsistent 0.16.0 py37he774522_0 defaults
pysocks 1.7.1 py37_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python 3.7.7 h81c818b_4 defaults
python-dateutil 2.8.1 py_0 defaults
pytz 2020.1 pypi_0 pypi
pywin32 227 py37he774522_1 defaults
pywinpty 0.5.7 py37_0 defaults
pyyaml 5.3.1 py37he774522_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyzmq 19.0.1 py37ha925a31_1 defaults
qt 5.9.7 vc14h73c81de_0 defaults
qtconsole 4.7.5 py_0 defaults
qtpy 1.9.0 py_0 defaults
recordio 0.1 pypi_0 pypi
regex 2020.7.14 py37he774522_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests 2.24.0 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
scipy 1.5.0 py37h9439919_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
send2trash 1.5.0 py37_0 defaults
setuptools 49.6.0 py37_0 defaults
shapely 1.7.0 pypi_0 pypi
sip 4.19.8 py37h6538335_0 defaults
six 1.15.0 py_0 defaults
sqlite 3.32.3 h2a8f88b_0 defaults
sympy 1.6.2 pypi_0 pypi
terminado 0.8.3 py37_0 defaults
testpath 0.4.4 py_0 defaults
tk 8.6.10 he774522_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
toml 0.10.1 pypi_0 pypi
tornado 6.0.4 py37he774522_1 defaults
tqdm 4.48.2 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
traitlets 4.3.3 py37_0 defaults
typeguard 2.9.1 pypi_0 pypi
urllib3 1.25.10 py_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vc 14.1 h0510ff6_4 defaults
virtualenv 20.0.29 pypi_0 pypi
visualdl 2.0.0b8 pypi_0 pypi
vs2015_runtime 14.16.27012 hf0eaf9b_3 defaults
wcwidth 0.2.5 py_0 defaults
webencodings 0.5.1 py37_1 defaults
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py37_0 defaults
widgetsnbextension 3.5.1 py37_0 defaults
win_inet_pton 1.1.0 py37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wincertstore 0.2 py37_0 defaults
winpty 0.4.3 4 defaults
xz 5.2.5 h62dcd97_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yaml 0.2.5 he774522_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zeromq 4.3.2 ha925a31_2 defaults
zipp 3.1.0 py_0 defaults
zlib 1.2.11 h62dcd97_4 defaults
zstd 1.4.5 h04227a9_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
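There is a single training file of 957 MB. When I run another program written with TensorFlow, it detects 12 GB of free GPU memory. Could it be that Paddle is not actually able to use the GPU?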