BERT model prediction fails
Created by: ClassmateXiaoyu
Hello, today I tried to launch BERT prediction in two ways, HTTP and RPC, and ran into a problem with each. Before starting, I killed any leftover serving processes with `ps -ef | grep "serving" | grep -v grep | awk '{print $2}' | xargs kill`. The HTTP startup command, the error, and the log are as follows:
[root@768de910d24c /]# python -m paddle_serving_server.serve --model bert_seq20_model --port 9292 --thread 4 --name bert &>bert_log.txt &
[1] 1060
[root@768de910d24c /]# curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://0.0.0.0:9292/bert/prediction
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
[root@768de910d24c /]# cat bert_log.txt
web service address:
http://172.17.0.2:9292/bert/prediction
mkdir: cannot create directory 'workdir': File exists
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0509 06:29:50.606771 1066 config_manager.cpp:217] Not found key in configue: cluster
E0509 06:29:50.606863 1066 config_manager.cpp:234] Not found key in configue: split_tag_name
E0509 06:29:50.606878 1066 config_manager.cpp:235] Not found key in configue: tag_candidates
E0509 06:29:50.606911 1066 config_manager.cpp:202] Not found key in configue: connect_timeout_ms
E0509 06:29:50.606927 1066 config_manager.cpp:203] Not found key in configue: rpc_timeout_ms
E0509 06:29:50.606941 1066 config_manager.cpp:205] Not found key in configue: hedge_request_timeout_ms
E0509 06:29:50.606956 1066 config_manager.cpp:207] Not found key in configue: connect_retry_count
E0509 06:29:50.606971 1066 config_manager.cpp:209] Not found key in configue: hedge_fetch_retry_count
E0509 06:29:50.606986 1066 config_manager.cpp:211] Not found key in configue: max_connection_per_host
E0509 06:29:50.606999 1066 config_manager.cpp:212] Not found key in configue: connection_type
E0509 06:29:50.607236 1066 config_manager.cpp:219] Not found key in configue: load_balance_strategy
E0509 06:29:50.607264 1066 config_manager.cpp:221] Not found key in configue: cluster_filter_strategy
E0509 06:29:50.607308 1066 config_manager.cpp:226] Not found key in configue: protocol
E0509 06:29:50.607322 1066 config_manager.cpp:227] Not found key in configue: compress_type
E0509 06:29:50.607331 1066 config_manager.cpp:228] Not found key in configue: package_size
E0509 06:29:50.607340 1066 config_manager.cpp:230] Not found key in configue: max_channel_per_request
E0509 06:29:50.607349 1066 config_manager.cpp:234] Not found key in configue: split_tag_name
E0509 06:29:50.607358 1066 config_manager.cpp:235] Not found key in configue: tag_candidates
I0509 06:29:50.613191 1066 naming_service_thread.cpp:209] brpc::policy::ListNamingService("0.0.0.0:9293"): added 1
default_variant_conf {
  tag: "default"
  connection_conf {
    connect_timeout_ms: 2000
    rpc_timeout_ms: 20000
    connect_retry_count: 2
    max_connection_per_host: 100
    hedge_request_timeout_ms: -1
    hedge_fetch_retry_count: 2
    connection_type: "pooled"
  }
  naming_conf {
    cluster_filter_strategy: "Default"
    load_balance_strategy: "la"
  }
  rpc_parameter {
    compress_type: 0
    package_size: 20
    protocol: "baidu_std"
    max_channel_per_request: 3
  }
}
predictors {
  name: "general_model"
  service_name: "baidu.paddle_serving.predictor.general_model.GeneralModelService"
  endpoint_router: "WeightedRandomRender"
  weighted_random_render_conf {
    variant_weight_list: "100"
  }
  variants {
    tag: "var1"
    naming_conf {
      cluster: "list://0.0.0.0:9293"
    }
  }
}
* Serving Flask app "paddle_serving_server.web_service" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:9292/ (Press CTRL+C to quit)
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralDistKVInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralTextReaderOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralCopyOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralDistKVQuantInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralReaderOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralInferOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralTextResponseOp
I0100 00:00:00.000000 1074 op_repository.h:65] RAW: Succ regist op: GeneralResponseOp
I0100 00:00:00.000000 1074 service_manager.h:61] RAW: Service[LoadGeneralModelService] insert successfully!
I0100 00:00:00.000000 1074 load_general_model_service.pb.h:299] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE]
I0100 00:00:00.000000 1074 service_manager.h:61] RAW: Service[GeneralModelService] insert successfully!
I0100 00:00:00.000000 1074 general_model_service.pb.h:1220] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE]
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:25] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuAnalysisCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:31] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:37] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:42] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuNativeCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:47] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<FluidCpuNativeDirCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR in macro!
I0100 00:00:00.000000 1074 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 1074 fluid_cpu_engine.cpp:53] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuNativeDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR_SIGMOID in macro!
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [fc_lstm_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
W0509 06:29:57.814424 1066 predictor.hpp:129] inference call failed, message: [E-5100]1/1 channels failed, fail_limit=1 [C0][E-5100][172.17.0.2:9293][E-5100]InferService inference failed!
E0509 06:29:57.814563 1066 general_model.cpp:369] failed call predictor with req: insts { } fetch_var_names: "pooled_output"
[2020-05-09 06:29:57,815] ERROR in app: Exception on /bert/prediction [POST]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1953, in full_dispatch_request
    return self.finalize_request(rv)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 1968, in finalize_request
    response = self.make_response(rv)
  File "/usr/lib/python2.7/site-packages/flask/app.py", line 2098, in make_response
    "The view function did not return a valid response. The"
TypeError: The view function did not return a valid response. The function either returned None or ended without a return statement.
127.0.0.1 - - [09/May/2020 06:29:57] "POST /bert/prediction HTTP/1.1" 500 -
Going to Run Command
/usr/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-noavx-openblas-0.2.0/serving -enable_model_toolkit -inferservice_path workdir -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 16 -port 8080 -reload_interval_s 10 -resource_path workdir -resource_file resource.prototxt -workflow_path workdir -workflow_file workflow.prototxt -bthread_concurrency 16
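For context, the RPC-side client I am using is modeled on the official bert demo. A minimal sketch of it (assuming the `bert_seq20_client` config directory that is exported alongside `bert_seq20_model`, and `ChineseBertReader` from paddle_serving_app, whose import path may differ between versions):

```python
import sys
from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader

# Convert raw text into the feed fields the BERT model actually expects
# (input ids, position ids, segment ids, input mask); a bare string like
# {"words": "hello"} carries none of these tensors by itself.
reader = ChineseBertReader({"max_seq_len": 20})
fetch = ["pooled_output"]

client = Client()
client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

for line in sys.stdin:
    feed_dict = reader.process(line)
    result = client.predict(feed=feed_dict, fetch=fetch)
    print(result)
```

I include this mainly to show the tokenization step; the raw curl request above goes through no such preprocessing, which may be related to the E-5100 inference failure in the log.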