In GPU mode with Flask threaded=True, inference infer fails even when only a single request is sent
Created by: RacingDawn
Error details:
F0905 03:29:56.093694 4176 hl_cuda_cudnn.cc:251] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 3) Cudnn Error: CUDNN_STATUS_BAD_PARAM
*** Check failure stack trace: ***
@ 0x7f2ac561fcfd google::LogMessage::Fail()
@ 0x7f2ac56237ac google::LogMessage::SendToLog()
@ 0x7f2ac561f823 google::LogMessage::Flush()
@ 0x7f2ac5624cbe google::LogMessageFatal::~LogMessageFatal()
@ 0x7f2ac55d2b35 hl_conv_workspace()
@ 0x7f2ac523c4d2 paddle::ConvBaseProjection::reshape()
@ 0x7f2ac52f59bb paddle::ConvProjection::forward()
@ 0x7f2ac523ecaf paddle::CudnnConvBaseLayer::forward()
@ 0x7f2ac51f2f6d paddle::NeuralNetwork::forward()
@ 0x7f2ac519157d _wrap_GradientMachine_forward
@ 0x4c30ce PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4c1e6f PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4c1e6f PyEval_EvalFrameEx
@ 0x4d4c9d (unknown)
@ 0x4bc9b6 PyEval_EvalFrameEx
@ 0x4d4c9d (unknown)
@ 0x4bc9b6 PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4c16e7 PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4d55f3 (unknown)
@ 0x4a577e PyObject_Call
@ 0x4bed3d PyEval_EvalFrameEx
@ 0x4c136f PyEval_EvalFrameEx
@ 0x4c136f PyEval_EvalFrameEx
@ 0x4c136f PyEval_EvalFrameEx
@ 0x4b9ab6 PyEval_EvalCodeEx
@ 0x4d54b9 (unknown)
@ 0x4eebee (unknown)
@ 0x4a577e PyObject_Call

Serving code:
@app.route('/', methods=['POST'])
def infer():
    fields = filter(lambda x: len(x) != 0, outputField.split(","))
    with open(paramFile) as param_f, open(topologyFile) as topo_f:
        params = paddle.parameters.Parameters.from_tar(param_f)
        inferer = paddle.inference.Inference(parameters=params, fileobj=topo_f)
    try:
        feeding = {}
        d = []
        for i, key in enumerate(request.json):
            d.append(request.json[key])
            feeding[key] = i
        r = inferer.infer([d,d], feeding=feeding, field=fields)
    except:
        trace = traceback.format_exc()
        errorResp(trace)
    if isinstance(r, list):
        return successResp([elem.tolist() for elem in r])
    else:
        return successResp(r.tolist())

if __name__ == '__main__':
    args = parser.parse_args()
    paddle.init(use_gpu=args.gpu)
    paramFile = args.paramFile
    topologyFile = args.topologyFile
    outputField = args.outputField
    print 'serving on port', args.port
    app.run(host='0.0.0.0', port=args.port, threaded=True)
When I start the server with python start_paddleServ.py --topologyFile /data/objectDetection/inference_topology.pkl --paramFile /data/objectDetection/param.tar --gpu
and then access it from just a single client, the error above is still raised.
Setting threaded to False makes the error go away; in non-GPU mode, starting with threaded=True does not produce the error either.
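One detail that may matter here: with threaded=True, the Werkzeug development server behind app.run hands every request to a separate thread, so even a single request does not run on the thread that called paddle.init. The sketch below illustrates that behaviour with Python 3's standard-library ThreadingHTTPServer, used as a stand-in for Flask (both build on socketserver.ThreadingMixIn). The function name handler_thread_for_single_request is made up for this illustration, and whether this thread difference is what actually triggers CUDNN_STATUS_BAD_PARAM is an assumption, not something confirmed in this report.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def handler_thread_for_single_request():
    """Serve exactly one request; return (caller_thread_id, handler_thread_id)."""
    seen = []  # thread ids observed inside the request handler

    class RecordingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record which thread is really executing the handler.
            seen.append(threading.get_ident())
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    # Port 0 lets the OS pick a free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), RecordingHandler)
    port = server.server_address[1]

    def client():
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()

    client_thread = threading.Thread(target=client)
    client_thread.start()
    server.handle_request()  # accepts one connection and passes it to a new thread
    client_thread.join()     # the response arrived, so the handler has run
    server.server_close()
    return threading.get_ident(), seen[0]


if __name__ == "__main__":
    caller_id, handler_id = handler_thread_for_single_request()
    print("handler ran on a different thread:", caller_id != handler_id)
```

If the same holds for the Flask server above, paddle.init(use_gpu=...) runs on the main thread while inferer.infer runs on a per-request thread, even with only one client.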
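I looked at another issue for reference: https://github.com/PaddlePaddle/DeepSpeech/issues/254 If calling cuDNN from multiple threads is unsafe in GPU mode, should the problem also appear when there is only a single request?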
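If the cause does turn out to be that GPU state is tied to the initializing thread, one possible way to keep threaded=True would be to confine initialization and every inference call to one dedicated worker thread. The following is only a sketch of that pattern in Python 3: run_on_gpu_thread, fake_init and fake_infer are hypothetical names, the two fake functions are placeholders for paddle.init plus model loading and for inferer.infer, and nothing here has been verified against Paddle itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every submitted call runs on the same thread.
_gpu_worker = ThreadPoolExecutor(max_workers=1)


def run_on_gpu_thread(fn, *args, **kwargs):
    """Run fn on the dedicated worker thread and block until it returns."""
    return _gpu_worker.submit(fn, *args, **kwargs).result()


def fake_init():
    # Placeholder for paddle.init(...) and model loading (hypothetical).
    return threading.get_ident()


def fake_infer(x):
    # Placeholder for inferer.infer(...); reports the thread it ran on.
    return threading.get_ident(), x * 2


if __name__ == "__main__":
    init_thread = run_on_gpu_thread(fake_init)
    infer_thread, value = run_on_gpu_thread(fake_infer, 21)
    print("same thread for init and infer:", init_thread == infer_thread)
```

In a request handler, the call would look like run_on_gpu_thread(fake_infer, payload). Requests are then served concurrently by Flask but inference itself is serialized, which trades throughput for never touching the GPU from more than one thread.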