BERT model prediction fails
Created by: ClassmateXiaoyu
Hello, today I tried to launch BERT prediction in two ways, HTTP and RPC, and ran into a problem with each. Before starting, I killed any leftover serving processes with `ps -ef | grep "serving" | grep -v grep | awk '{print $2}' | xargs kill`. The HTTP startup command, the error, and the log are as follows:
[root@768de910d24c /]# python -m paddle_serving_server.serve --model bert_seq20_model --port 9292 --thread 4 --name bert &>bert_log.txt &
[1] 1060
[root@768de910d24c /]# curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://0.0.0.0:9292/bert/prediction
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
[root@768de910d24c /]# cat bert_log.txt
web service address:
http://172.17.0.2:9292/bert/prediction
mkdir: cannot create directory 'workdir': File exists
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0509 06:29:50.606771 1066 config_manager.cpp:217] Not found key in configue: cluster
E0509 06:29:50.606863 1066 config_manager.cpp:234] Not found key in configue: split_tag_name
E0509 06:29:50.606878 1066 config_manager.cpp:235] Not found key in configue: tag_candidates
E0509 06:29:50.606911 1066 config_manager.cpp:202] Not found key in configue: connect_timeout_ms
E0509 06:29:50.606927 1066 config_manager.cpp:203] Not found key in configue: rpc_timeout_ms
E0509 06:29:50.606941 1066 config_manager.cpp:205] Not found key in configue: hedge_request_timeout_ms
E0509 06:29:50.606956 1066 config_manager.cpp:207] Not found key in configue: connect_retry_count
E0509 06:29:50.606971 1066 config_manager.cpp:209] Not found key in configue: hedge_fetch_retry_count
E0509 06:29:50.606986 1066 config_manager.cpp:211] Not found key in configue: max_connection_per_host
E0509 06:29:50.606999 1066 config_manager.cpp:212] Not found key in configue: connection_type
E0509 06:29:50.607236 1066 config_manager.cpp:219] Not found key in configue: load_balance_strategy
E0509 06:29:50.607264 1066 config_manager.cpp:221] Not found key in configue: cluster_filter_strategy
E0509 06:29:50.607308 1066 config_manager.cpp:226] Not found key in configue: protocol
E0509 06:29:50.607322 1066 config_manager.cpp:227] Not found key in configue: compress_type
E0509 06:29:50.607331 1066 config_manager.cpp:228] Not found key in configue: package_size
E0509 06:29:50.607340 1066 config_manager.cpp:230] Not found key in configue: max_channel_per_request
E0509 06:29:50.607349 1066 config_manager.cpp:234] Not found key in configue: split_tag_name
E0509 06:29:50.607358 1066 config_manager.cpp:235] Not found key in configue: tag_candidates
I0509 06:29:50.613191 1066 naming_service_thread.cpp:209] brpc::policy::ListNamingService("0.0.0.0:9293"): added 1
default_variant_conf {
  tag: "default"
  connection_conf {
    connect_timeout_ms: 2000
    rpc_timeout_ms: 20000
    connect_retry_count: 2
    max_connection_per_host: 100
    hedge_request_timeout_ms: -1
    hedge_fetch_retry_count: 2
    connection_type: "pooled"
  }
  naming_conf {
    cluster_filter_strategy: "Default"
    load_balance_strategy: "la"
  }
  rpc_parameter {
    compress_type: 0
    package_size: 20
    protocol: "baidu_std"
    max_channel_per_request: 3
  }
}
predictors {
  name: "general_model"
  service_name: "baidu.paddle_serving.predictor.general_model.GeneralModelService"
  endpoint_router: "WeightedRandomRender"
  weighted_random_render_conf {
    variant_weight_list: "100"
  }
  variants {
    tag: "var1"
    naming_conf {
      cluster: "list://0.0.0.0:9293"
    }
  }
}
* Serving Flask app "paddle_serving_server.web_service" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:9292/ (Press CTRL+C to quit)
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralDistKVInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralTextReaderOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralCopyOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralDistKVQuantInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralReaderOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralTextResponseOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralResponseOp
I0100 00:00:00.000000 1074 service_manager.h:61] RAW: Service[LoadGeneralModelService] insert successfully!
I0100 00:00:00.000000 1074 load_general_model_service.pb.h:299] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE]
I0100 00:00:00.000000 1074 service_manager.h:61] RAW: Service[GeneralModelService] insert successfully!
I0100 00:00:00.000000 1074 general_model_service.pb.h:1220] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE]
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:25] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuAnalysisCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:31] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:37] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:42] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuNativeCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:47] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuNativeDirCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:53] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuNativeDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR_SIGMOID in macro!
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [fc_lstm_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
W0509 06:29:57.814424 1066 predictor.hpp:129] inference call failed, message: [E-5100]1/1 channels failed, fail_limit=1 [C0][E-5100][172.17.0.2:9293][E-5100]InferService inference failed!
E0509 06:29:57.814563 1066 general_model.cpp:369] failed call predictor with req: insts { } fetch_var_names: "pooled_output"
[2020-05-09 06:29:57,815] ERROR in app: Exception on /bert/prediction [POST]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1953, in full_dispatch_request
    return self.finalize_request(rv)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1968, in finalize_request
    response = self.make_response(rv)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2098, in make_response
    "The view function did not return a valid response. The"
TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.
127.0.0.1 - - [09/May/2020 06:29:57] "POST /bert/prediction HTTP/1.1" 500 -
Going to Run Command
/usr/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-noavx-openblas-0.2.0/serving -enable_model_toolkit -inferservice_path workdir -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 16 -port 8080 -reload_interval_s 10 -resource_path workdir -resource_file resource.prototxt -workflow_path workdir -workflow_file workflow.prototxt -bthread_concurrency 16
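For context, the RPC-side client I am using is modeled on the official bert demo. A minimal sketch of it (assuming the `bert_seq20_client` config directory that is exported alongside `bert_seq20_model`, and `ChineseBertReader` from paddle_serving_app, whose import path may differ between versions):

```python
import sys
from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader

# Convert raw text into the feed fields the BERT model actually expects
# (input ids, position ids, segment ids, input mask); a bare string like
# {"words": "hello"} carries none of these tensors by itself.
reader = ChineseBertReader({"max_seq_len": 20})
fetch = ["pooled_output"]

client = Client()
client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

for line in sys.stdin:
    feed_dict = reader.process(line)
    result = client.predict(feed=feed_dict, fetch=fetch)
    print(result)
```

I include this mainly to show the tokenization step; the raw curl request above goes through no such preprocessing, which may be related to the E-5100 inference failure in the log.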