ResourceExhaustedError: Out of memory error on GPU 0.
Created by: Firework471
Error message: RuntimeError:
C++ Call Stacks (More useful to developers):
Windows not support stack backtrace yet.
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 25.000244MB memory on GPU 0, available memory is only 2.349999MB.
Please check whether there is any other process using GPU 0.
- If yes, please stop them, or start PaddlePaddle on another GPU.
- If no, please try one of the following suggestions:
- Decrease the batch size of your model.
- FLAGS_fraction_of_gpu_memory_to_use is 0.50 now, please set it to a higher value but less than 1.0.
The command is export FLAGS_fraction_of_gpu_memory_to_use=xxx.
at (D:\1.7.1\paddle\paddle\fluid\memory\detail\system_allocator.cc:150)
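The message first asks whether another process is using GPU 0, and with only 2.349999MB reported as available that check seems worth doing before changing any flags. Below is a minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is installed and on PATH. The helper names parse_gpu_memory and query_gpu_memory are made up for illustration and are not part of PaddlePaddle.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            index, used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            # Skip lines that are not three integers (for example "[N/A]" fields).
            continue
        rows.append(
            {"gpu": index, "used_mib": used, "total_mib": total, "free_mib": total - used}
        )
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for row in query_gpu_memory():
        print("GPU %(gpu)d: %(used_mib)d / %(total_mib)d MiB used, %(free_mib)d MiB free" % row)
```

If GPU 0 already shows almost all of its memory as used before the training script starts, the memory is being held by something else (another Python session, a notebook kernel, a previous crashed run), and lowering the batch size will not help.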
Part of the code (written mostly by following the tutorial):

    for j in range(len(train_data) // BATCH):
        feature, label = get_batch_data(train_data, j)
        loss, _ = ernie(feature, labels=label)  # the ernie model returns (loss, logits); logits is not needed here
        loss.backward()
        optimizer.minimize(loss)
        ernie.clear_gradients()
        if j % 10 == 0:
            print('train %d: loss %.5f' % (j, loss.numpy()))
        # evaluate
        if j % 100 == 0:
            all_pred, all_label = [], []
            with D.base._switch_tracer_mode_guard_(is_train=False):  # no gradients are computed for ernie inside this with block
                ernie.eval()  # switch the model to eval mode, which disables all dropout
                for j in range(len(test_data) // BATCH):
                    feature, label = get_batch_data(test_data, j)
                    loss, logits = ernie(feature, labels=label)
                    all_pred.extend(L.argmax(logits, -1).numpy())
                    all_label.extend(label.numpy())
                ernie.train()
            f1 = f1_score(all_label, all_pred, average='macro')
            acc = (np.array(all_label) == np.array(all_pred)).astype(np.float32).mean()
            print('acc %.5f' % acc)

While debugging, the error is raised on this line:

    loss, _ = ernie(feature, labels=label)

I added the following at the top of the code:

    os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = "0.9"
    os.environ["FLAGS_eager_delete_tensor_gb"] = "0"

and reduced the batch size to 1, but the same error still occurs. Any advice would be greatly appreciated. Thank you!
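One thing that may be worth ruling out: the error message still reports FLAGS_fraction_of_gpu_memory_to_use as 0.50, which suggests the os.environ setting did not take effect. The sketch below rests on the assumption that PaddlePaddle reads FLAGS_* environment variables when the package is imported, so the assignments would have to run before any paddle import. The helper name set_paddle_memory_flags is hypothetical and used only for illustration. Since the stack trace comes from Windows, where export is not available, setting the variables from Python is one way to apply the suggestion in the error message.

```python
import os


def set_paddle_memory_flags(fraction=0.9, env=os.environ):
    """Hypothetical helper: write the two flags from the post into `env`.

    The error message asks for a value below 1.0, so other values are rejected.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive), got %r" % fraction)
    env["FLAGS_fraction_of_gpu_memory_to_use"] = str(fraction)
    env["FLAGS_eager_delete_tensor_gb"] = "0"
    return env


# Call this at the very top of the script, before paddle is imported anywhere
# (assumption: the flags are read once, at import time).
set_paddle_memory_flags(0.9)

# Only now import paddle, for example:
# import paddle.fluid as fluid
# import paddle.fluid.dygraph as D
```

If the message still shows 0.50 after this change, the flags are being read from somewhere else. If it shows 0.90 and the allocation still fails, the GPU memory is most likely held by another process, or the model is too large for the card even at batch size 1.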