GPU memory usage is 0% on Windows 10, yet an out-of-memory error is reported (#26430) · Issue · PaddlePaddle / Paddle

Created by: a2824256

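Title: GPU memory usage is 0% on Windows 10, yet an out-of-memory error is reported
Version and environment: 1) PaddlePaddle version: 1.8 2) CPU: Intel i7-8700K 3) GPU: NVIDIA TITAN Xp 4) System environment: Windows 10, conda, Python 3.7.7
Training information: 1) single machine, single GPU 2) GPU memory information 3) Operator information
Reproduction: a script I wrote myself, used for signal prediction
Problem description: the error output is: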

(pdpd) E:\integral>python train.py
data_pretreatmenting
data_pretreatment_finished
network start
network end
W0819 11:57:30.471578  5956 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0819 11:57:30.479526  5956 device_context.cc:260] device: 0, cuDNN Version: 7.6.
开始训练
W0819 11:58:48.131574  5956 operator.cc:187] elementwise_add raises an exception struct paddle::memory::allocation::BadAlloc,

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 2.456825GB memory on GPU 0, available memory is only 5.439270GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.

 at (D:\1.8.4\paddle\paddle\fluid\memory\allocation\cuda_allocator.cc:69)
C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 136, in <module>
    avg_loss_value, = exe.run(main_program, feed=feeder.feed(data_train), fetch_list=[loss])
  File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\six.py", line 703, in reraise
    raise value
  File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1066, in run
    return_merged=return_merged)
  File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1154, in _run_impl
    use_program_cache=use_program_cache)
  File "C:\ProgramData\Miniconda3\envs\pdpd\lib\site-packages\paddle\fluid\executor.py", line 1229, in _run_program
    fetch_var_name)
RuntimeError:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 2.456825GB memory on GPU 0, available memory is only 5.439270GB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.

 at (D:\1.8.4\paddle\paddle\fluid\memory\allocation\cuda_allocator.cc:69)
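
For scale, the 2.456825GB request can be related to a tensor shape. The sketch below is only a back-of-the-envelope estimate under stated assumptions: that the size is reported in binary gigabytes (1024^3 bytes), that the failing `elementwise_add` output is a float32 tensor of shape [N, width], and that width is one of the fc layer sizes used in train.py (20 or 40). The helper name `rows_for_allocation` is hypothetical and not part of PaddlePaddle.

```python
def rows_for_allocation(alloc_gib, width, bytes_per_elem=4):
    """Return N such that an [N, width] tensor fills about alloc_gib GiB.

    Assumes binary gigabytes and a dense tensor; float32 is 4 bytes per element.
    """
    total_bytes = alloc_gib * 1024 ** 3
    return int(total_bytes / (width * bytes_per_elem))


if __name__ == "__main__":
    # Widths 20 and 40 are the fc layer sizes in train.py (assumed candidates).
    for width in (20, 40):
        print("width", width, "-> rows", rows_for_allocation(2.456825, width))
```

Under these assumptions the request corresponds to roughly 33 million rows at width 20 or roughly 16 million rows at width 40, which would mean a very large number of rows reached the network in a single step.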

Code of train.py:

from __future__ import print_function
import paddle
import paddle.fluid as fluid

import numpy as np
import math
import csv
from paddle.utils.plot import Ploter
import sys

ITERABLE = True
TRAIN_DATA = []
TEST_DATA = []
TRAIN_RES = []
TEST_RES = []
# TRAIN_FILES = ["./train_set.csv", "./train_set2.csv", "./train_set3.csv"]
TRAIN_FILES = "./train_set.csv"
TEST_FILE = "./test_set4.csv"


def data_pretreatment():
    # train
    with open(TRAIN_FILES) as f:
        render = csv.reader(f)
        for row in render:
            if len(row) != 0:
                TRAIN_DATA.append([row[0], row[1], row[2], row[3]])
                TRAIN_RES.append([row[4], row[5]])

    # test
    with open(TEST_FILE) as f:
        render = csv.reader(f)
        for row in render:
            if len(row) != 0:
                TEST_DATA.append([row[0], row[1], row[2], row[3]])
                TEST_RES.append([row[4], row[5]])


def train_sample_reader():
    for i in range(1):
        input = np.array(TRAIN_DATA).astype('float32')
        label = np.array(TRAIN_RES).astype('float32')
        yield input, label

def test_sample_reader():
    for i in range(1):
        input = np.array(TEST_DATA).astype('float32')
        label = np.array(TEST_RES).astype('float32')
        yield input, label


def train(executor, program, reader, feeder, fetch_list):
    accumulated = 1 * [0]
    count = 0
    for data_test in reader():
        outs = executor.run(program=program,
                            feed=feeder.feed(data_test),
                            fetch_list=fetch_list)
        accumulated = [x_c[0] + x_c[1][0] for x_c in zip(accumulated, outs)]  # accumulate the loss over the test run
        count += 1  # count the number of test batches
    return [x_d / count for x_d in accumulated]

# Training section

# data pretreatment
print("data_pretreatmenting")
data_pretreatment()
print("data_pretreatment_finished")
train_reader = fluid.io.batch(train_sample_reader, batch_size=1)
test_reader = fluid.io.batch(test_sample_reader, batch_size=1)
print("network start")
# network

INPUT = fluid.data(name='input', shape=[None, 4], dtype='float32')
LABEL = fluid.data(name='label', shape=[None, 2], dtype='float32')
hidden = fluid.layers.fc(name='fc1', input=INPUT, size=20, act='relu')
param_attr_1 = fluid.ParamAttr(name='batch_norm_w_1', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_1 = fluid.ParamAttr(name='batch_norm_b_1', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_1, bias_attr=bias_attr_1)
hidden = fluid.layers.fc(name='fc2', input=hidden, size=40, act='relu')
param_attr_2 = fluid.ParamAttr(name='batch_norm_w_2', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_2 = fluid.ParamAttr(name='batch_norm_b_2', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_2, bias_attr=bias_attr_2)
hidden = fluid.layers.fc(name='fc3', input=hidden, size=20, act='relu')
param_attr_3 = fluid.ParamAttr(name='batch_norm_w_3', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_3 = fluid.ParamAttr(name='batch_norm_b_3', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_3, bias_attr=bias_attr_3)
hidden = fluid.layers.fc(name='fc4', input=hidden, size=10, act='relu')
param_attr_4 = fluid.ParamAttr(name='batch_norm_w_4', initializer=fluid.initializer.Constant(value=1.0))
bias_attr_4 = fluid.ParamAttr(name='batch_norm_b_4', initializer=fluid.initializer.Constant(value=0.0))
hidden = fluid.layers.batch_norm(input=hidden, param_attr=param_attr_4, bias_attr=bias_attr_4)
prediction = fluid.layers.fc(name='res', input=hidden, size=2, act=None)
print("network end")
# main program
main_program = fluid.default_main_program()  # get the default (global) main program
startup_program = fluid.default_startup_program()

# loss
loss = fluid.layers.mean(fluid.layers.mse_loss(input=prediction, label=LABEL))
# loss = fluid.layers.mse_loss(input=prediction, label=LABEL)
# acc = fluid.layers.accuracy(input=prediction, label=LABEL)

# test program
test_program = main_program.clone(for_test=True)
adam = fluid.optimizer.Adam(learning_rate=0.01)
adam.minimize(loss)

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)

num_epochs = 100


params_dirname = "./my_paddle_model"
feeder = fluid.DataFeeder(place=place, feed_list=[INPUT, LABEL])
exe.run(startup_program)
train_prompt = "train cost"
test_prompt = "test cost"

step = 0

exe_test = fluid.Executor(place)
print("开始训练")
for pass_id in range(num_epochs):
    for data_train in train_reader():
        step = step + 1
        avg_loss_value, = exe.run(main_program, feed=feeder.feed(data_train), fetch_list=[loss])
        if step % 10 == 0: 
            # plot_prompt.append(train_prompt, step, avg_loss_value[0])
            # plot_prompt.plot()
            print("%s, Epoch %d, Step %d, Cost %f" % (train_prompt, pass_id, step, avg_loss_value[0]))
        if step % 10 == 0:  
            test_metics = train(executor=exe_test, program=test_program, reader=test_reader, fetch_list=[loss.name], feeder=feeder)
            # plot_prompt.append(test_prompt, step, test_metics[0])
            # plot_prompt.plot()
            print("%s, Epoch %d, Step %d, Cost %f" % (test_prompt, pass_id, step, test_metics[0]))
            if test_metics[0] < 10.0: 
                break

        if math.isnan(float(avg_loss_value[0])):
            sys.exit("got NaN loss, training failed.")

        # save the trained parameters to the path given earlier
        if params_dirname is not None:
            fluid.io.save_inference_model(params_dirname, ['input'], [prediction], exe)
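
One thing that may be worth checking in the script above: `train_sample_reader` yields the entire dataset as a single sample, so `fluid.io.batch(..., batch_size=1)` still feeds every CSV row in one step. Whether this is the cause of the error is an assumption, not a confirmed diagnosis. A minimal sketch of a per-row reader, written with the standard library only so it runs without PaddlePaddle (the names `row_reader` and `batched` are hypothetical, and the 4-feature / 2-label column layout is taken from train.py):

```python
import csv
import io


def row_reader(csv_file, n_features=4, n_labels=2):
    """Yield one (features, labels) pair per non-empty CSV row."""
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines, as the original script does
            continue
        values = [float(v) for v in row[:n_features + n_labels]]
        yield values[:n_features], values[n_features:]


def batched(samples, batch_size):
    """Group samples into lists of at most batch_size items."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Tiny in-memory CSV standing in for train_set.csv (illustrative data only).
demo = io.StringIO("1,2,3,4,5,6\n\n7,8,9,10,11,12\n13,14,15,16,17,18\n")
batches = list(batched(row_reader(demo), batch_size=2))
```

With a reader shaped like this, each step would see only `batch_size` rows, so the per-step memory would scale with the batch size rather than with the size of the file.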

conda package versions

(pdpd) E:\integral>conda list
# packages in environment at C:\ProgramData\Miniconda3\envs\pdpd:
#
# Name                    Version                   Build  Channel
appdirs                   1.4.4                    pypi_0    pypi
astor                     0.8.1                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
attrs                     19.3.0                     py_0    defaults
babel                     2.8.0                    pypi_0    pypi
backcall                  0.2.0                      py_0    defaults
blas                      1.0                         mkl    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bleach                    3.1.5                      py_0    defaults
brotlipy                  0.7.0           py37he774522_1000    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates           2020.6.24                     0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
certifi                   2020.6.20                py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cffi                      1.14.0           py37h7a1dbc1_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cfgv                      3.2.0                    pypi_0    pypi
chardet                   3.0.4                 py37_1003    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
click                     7.1.2                      py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
colorama                  0.4.3                      py_0    defaults
cryptography              2.9.2            py37h7a1dbc1_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudatoolkit               10.0.130                      0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cudnn                     7.6.5                cuda10.0_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cycler                    0.10.0                   py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cython                    0.29.21                  pypi_0    pypi
decorator                 4.4.2                      py_0    defaults
defusedxml                0.6.0                      py_0    defaults
distlib                   0.3.1                    pypi_0    pypi
docstring-parser          0.3                      pypi_0    pypi
entrypoints               0.3                      py37_0    defaults
filelock                  3.0.12                   pypi_0    pypi
flake8                    3.8.3                    pypi_0    pypi
flask                     1.1.2                    pypi_0    pypi
flask-babel               1.0.0                    pypi_0    pypi
freetype                  2.10.2               hd328e21_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
funcsigs                  1.0.2                    pypi_0    pypi
gast                      0.4.0                      py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
graphviz                  2.38                 hfd603c8_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
hdf5                      1.8.20               hac2f561_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
icc_rt                    2019.0.0             h0cc432a_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
icu                       58.2                 ha925a31_3    defaults
identify                  1.4.25                   pypi_0    pypi
idna                      2.10                       py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
importlib-metadata        1.7.0                    py37_0    defaults
importlib_metadata        1.7.0                         0    defaults
intel-openmp              2020.1                      216    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipykernel                 5.3.4            py37h5ca1d4c_0    defaults
ipython                   7.16.1           py37h5ca1d4c_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ipython_genutils          0.2.0                    py37_0    defaults
ipywidgets                7.5.1                      py_0    defaults
itsdangerous              1.1.0                    pypi_0    pypi
jedi                      0.17.2                   py37_0    defaults
jinja2                    2.11.2                     py_0    defaults
joblib                    0.16.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jpeg                      9b                   hb83a4c4_2    defaults
jsonschema                3.2.0                    py37_1    defaults
jupyter                   1.0.0                    py37_7    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jupyter_client            6.1.6                      py_0    defaults
jupyter_console           6.1.0                      py_0    defaults
jupyter_core              4.6.3                    py37_0    defaults
kiwisolver                1.2.0            py37h74a9793_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libopencv                 3.4.2                h20b85fd_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libpng                    1.6.37               h2a8f88b_0    defaults
libprotobuf               3.12.4               h200bbdf_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libsodium                 1.0.18               h62dcd97_0    defaults
libtiff                   4.1.0                h56a325e_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
llvmlite                  0.33.0                   pypi_0    pypi
lz4-c                     1.9.2                h62dcd97_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
m2w64-gcc-libgfortran     5.3.0                         6    defaults
m2w64-gcc-libs            5.3.0                         7    defaults
m2w64-gcc-libs-core       5.3.0                         7    defaults
m2w64-gmp                 6.1.0                         2    defaults
m2w64-libwinpthread-git   5.0.0.4634.697f757               2    defaults
markupsafe                1.1.1            py37hfa6e2cd_1    defaults
matplotlib                3.2.2                         0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
matplotlib-base           3.2.2            py37h64f37c6_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mccabe                    0.6.1                    pypi_0    pypi
mistune                   0.8.4           py37hfa6e2cd_1001    defaults
mkl                       2020.1                      216    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service               2.3.0            py37hb782905_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft                   1.1.0            py37h45dec08_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random                1.1.1            py37h47e9c7a_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mpmath                    1.1.0                    pypi_0    pypi
msys2-conda-epoch         20160418                      1    defaults
nbconvert                 5.6.1                    py37_1    defaults
nbformat                  5.0.7                      py_0    defaults
nltk                      3.5                        py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nodeenv                   1.4.0                    pypi_0    pypi
notebook                  6.0.3                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numba                     0.50.1                   pypi_0    pypi
numpy                     1.19.1           py37h5510c5b_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base                1.19.1           py37ha3acd2a_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
objgraph                  3.4.1                    pypi_0    pypi
olefile                   0.46                     py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv                    3.4.2            py37h40b0b35_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv-python             4.1.1.26                 pypi_0    pypi
openssl                   1.1.1g               he774522_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging                 20.4                       py_0    defaults
paddlepaddle-gpu          1.8.4.post107            pypi_0    pypi
pandoc                    2.10.1                        0    defaults
pandocfilters             1.4.2                    py37_1    defaults
parso                     0.7.0                      py_0    defaults
pathlib                   1.0.1                    py37_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pickleshare               0.7.5                 py37_1001    defaults
pillow                    7.2.0            py37hcc1f983_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip                       20.2.2                   py37_0    defaults
pre-commit                2.6.0                    pypi_0    pypi
prettytable               0.7                      pypi_0    pypi
prometheus_client         0.8.0                      py_0    defaults
prompt-toolkit            3.0.5                      py_0    defaults
prompt_toolkit            3.0.5                         0    defaults
protobuf                  3.12.4           py37ha925a31_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
py-cpuinfo                5.0.0                      py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
py-opencv                 3.4.2            py37hc319ecb_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pycocotools               2.0                      pypi_0    pypi
pycodestyle               2.6.0                    pypi_0    pypi
pycparser                 2.20                       py_2    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyflakes                  2.2.0                    pypi_0    pypi
pygments                  2.6.1                      py_0    defaults
pyopenssl                 19.1.0                     py_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyparsing                 2.4.7                      py_0    defaults
pyqt                      5.9.2            py37h6538335_2    defaults
pyrsistent                0.16.0           py37he774522_0    defaults
pysocks                   1.7.1                    py37_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python                    3.7.7                h81c818b_4    defaults
python-dateutil           2.8.1                      py_0    defaults
pytz                      2020.1                   pypi_0    pypi
pywin32                   227              py37he774522_1    defaults
pywinpty                  0.5.7                    py37_0    defaults
pyyaml                    5.3.1            py37he774522_1    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyzmq                     19.0.1           py37ha925a31_1    defaults
qt                        5.9.7            vc14h73c81de_0    defaults
qtconsole                 4.7.5                      py_0    defaults
qtpy                      1.9.0                      py_0    defaults
recordio                  0.1                      pypi_0    pypi
regex                     2020.7.14        py37he774522_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests                  2.24.0                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
scipy                     1.5.0            py37h9439919_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
send2trash                1.5.0                    py37_0    defaults
setuptools                49.6.0                   py37_0    defaults
shapely                   1.7.0                    pypi_0    pypi
sip                       4.19.8           py37h6538335_0    defaults
six                       1.15.0                     py_0    defaults
sqlite                    3.32.3               h2a8f88b_0    defaults
sympy                     1.6.2                    pypi_0    pypi
terminado                 0.8.3                    py37_0    defaults
testpath                  0.4.4                      py_0    defaults
tk                        8.6.10               he774522_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
toml                      0.10.1                   pypi_0    pypi
tornado                   6.0.4            py37he774522_1    defaults
tqdm                      4.48.2                     py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
traitlets                 4.3.3                    py37_0    defaults
typeguard                 2.9.1                    pypi_0    pypi
urllib3                   1.25.10                    py_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vc                        14.1                 h0510ff6_4    defaults
virtualenv                20.0.29                  pypi_0    pypi
visualdl                  2.0.0b8                  pypi_0    pypi
vs2015_runtime            14.16.27012          hf0eaf9b_3    defaults
wcwidth                   0.2.5                      py_0    defaults
webencodings              0.5.1                    py37_1    defaults
werkzeug                  1.0.1                    pypi_0    pypi
wheel                     0.34.2                   py37_0    defaults
widgetsnbextension        3.5.1                    py37_0    defaults
win_inet_pton             1.1.0                    py37_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wincertstore              0.2                      py37_0    defaults
winpty                    0.4.3                         4    defaults
xz                        5.2.5                h62dcd97_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yaml                      0.2.5                he774522_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zeromq                    4.3.2                ha925a31_2    defaults
zipp                      3.1.0                      py_0    defaults
zlib                      1.2.11               h62dcd97_4    defaults
zstd                      1.4.5                h04227a9_0    https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

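The training data is a single 957 MB file. When I run another program of mine written with TensorFlow, it detects 12 GB of free memory, so I wonder whether PaddlePaddle is failing to use the GPU at all?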
