diff --git a/doc/CLIENT_CONFIGURE.md b/doc/CLIENT_CONFIGURE.md
index dcc49e749017f706ca428ead26c92c66d75dc6a1..460adcbe194a9f60661d71eff29fa1ae4f87e1ce 100644
--- a/doc/CLIENT_CONFIGURE.md
+++ b/doc/CLIENT_CONFIGURE.md
@@ -1,6 +1,8 @@
 # Client side configuration
 
-The main configuration file of the Paddle Serving C++ client SDK is conf/predictors.prototxt. An example is shown below:
+The configuration file format of the Paddle Serving C++ client SDK is defined with protobuf, entirely in configure/proto/sdk_configure.proto. To add a configuration field, the corresponding field must first be added to that protobuf file before the Serving SDK can read and parse it.
+
+The main Paddle Serving configuration file is conf/predictors.prototxt. An example is shown below:
 
 ## 1. Sample conf
 
diff --git a/doc/CREATING.md b/doc/CREATING.md
index a70803fb8e0e2cb416ae3fac9cbceba0bdf0290e..d057af4c38ef97c14b532cc563157a514745acec 100644
--- a/doc/CREATING.md
+++ b/doc/CREATING.md
@@ -75,6 +75,8 @@ service ImageClassifyService {
 
 #### 2.2.2 Sample configuration
 
+For details on server-side configuration, see [Server Side Configuration](SERVING_CONFIGURE.md).
+
 The following configuration file chains ReaderOP, ClassifyOP and WriteJsonOP into a single workflow (for concepts such as OP and workflow, see the [design document](DESIGN.md)):
 
 - Sample configuration file:
@@ -209,6 +211,8 @@ target_link_libraries(serving opencv_imgcodecs
 |enable_model_toolkit|true|model management|
 |enable_protocol_list|baidu_std|list of brpc communication protocols|
 |log_dir|./log|log dir|
+|num_threads|number of CPU cores|number of system threads used by the brpc server|
+|max_concurrency|0 (no limit)|maximum number of requests the brpc server handles concurrently; a value <= 0 means no limit|
 
 The default values can be overridden in serving/conf/gflags.conf, for example
 ```
diff --git a/doc/INDEX.md b/doc/INDEX.md
index 2632ddc485ef1335549b1e8c6ef59d43b3f5995f..2a04b4db58968ca85ab32d7257f5084eb76750ce 100644
--- a/doc/INDEX.md
+++ b/doc/INDEX.md
@@ -10,3 +10,5 @@
 [Getting Started](GETTING_STARTED.md)
 
 [Installation](INSTALL.md)
+
+[Server Side Configuration](SERVING_CONFIGURE.md)
diff --git a/doc/SERVING_CONFIGURE.md b/doc/SERVING_CONFIGURE.md
new file mode 100644
index 0000000000000000000000000000000000000000..4f6264d0e66d9758c93be3b9a37112734fe03a15
--- /dev/null
+++ b/doc/SERVING_CONFIGURE.md
@@ -0,0 +1,192 @@
+# Serving Side Configuration
+
+Paddle Serving configuration files are protobuf files in plain-text format. Every field used in a configuration file must first be defined in the corresponding .proto file under configure/proto/ before protobuf can read and parse it.
+
+All server-side configuration messages are defined in configure/proto/server_configure.proto.
+
+## 1. service.prototxt
+
+The entry point of the server-side service configuration is service.prototxt, which lists the services mounted on a Paddle Serving instance. Its protobuf schema is the `InferServiceConf` message in `configure/proto/server_configure.proto`. (The file's location on disk can be changed with the --inferservice_path and --inferservice_file command-line options.) A sample:
+
+```
+port: 8010
+services {
+  name: "ImageClassifyService"
+  workflows: "workflow1"
+}
+```
+
+Where
+
+port: the listening port of the serving instance on this machine. The default is 8010; it can also be set with the --port=8010 command-line option.
+
+services: multiple services can be configured. Paddle Serving is designed so that a single serving instance can host several prediction services at the same time, distinguished by service name. For example, the following configures two prediction services:
+
+```
+port: 8010
+services {
+  name: "ImageClassifyService"
+  workflows: "workflow1"
+}
+services {
+  name: "BuiltinEchoService"
+  workflows: "workflow2"
+}
+```
+
+service.name: use the service name declared in the corresponding serving/proto/xx.proto file. For example, serving/proto/image_class.proto declares its service as:
+
+```protobuf
+service ImageClassifyService {
+  rpc inference(Request) returns (Response);
+  rpc debug(Request) returns (Response);
+  option (pds.options).generate_impl = true;
+};
+```
+
+so the service name is `ImageClassifyService`.
+
+service.workflows: the list of workflows attached to this service; multiple workflows can be configured. In this example `ImageClassifyService` is given a single workflow, `workflow1`, whose definition lives in workflow.prototxt.
+
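+The requirement stated at the top of this document, that every field must be declared in server_configure.proto, follows from how these files are loaded: they are parsed with protobuf's text format. The following minimal sketch (not code from this repository) illustrates the idea for service.prototxt; the generated header name and the `baidu::paddle_serving::configure` namespace are assumptions to be checked against the package declared in configure/proto/server_configure.proto.
+
+```cpp
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <string>
+
+#include <google/protobuf/text_format.h>
+
+#include "server_configure.pb.h"  // assumed generated header for server_configure.proto
+
+int main() {
+  // Read the plain-text configuration from disk.
+  std::ifstream fin("./conf/service.prototxt");
+  std::stringstream buffer;
+  buffer << fin.rdbuf();
+
+  // Parse it into the InferServiceConf message. Any field that is not
+  // declared in server_configure.proto will make this parse fail.
+  baidu::paddle_serving::configure::InferServiceConf conf;
+  if (!google::protobuf::TextFormat::ParseFromString(buffer.str(), &conf)) {
+    std::cerr << "Failed to parse service.prototxt" << std::endl;
+    return -1;
+  }
+
+  std::cout << "port: " << conf.port()
+            << ", services: " << conf.services_size() << std::endl;
+  return 0;
+}
+```
+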
+## 2. workflow.prototxt
+
+workflow.prototxt describes each concrete workflow. Its protobuf schema is the `Workflow` message in `configure/proto/server_configure.proto`. The file's location on disk can be changed with the --workflow_path and --workflow_file options. An example:
+
+```
+workflows {
+  name: "workflow1"
+  workflow_type: "Sequence"
+  nodes {
+    name: "image_reader_op"
+    type: "ReaderOp"
+  }
+  nodes {
+    name: "image_classify_op"
+    type: "ClassifyOp"
+    dependencies {
+      name: "image_reader_op"
+      mode: "RO"
+    }
+  }
+  nodes {
+    name: "write_json_op"
+    type: "WriteJsonOp"
+    dependencies {
+      name: "image_classify_op"
+      mode: "RO"
+    }
+  }
+}
+
+workflows {
+  name: "workflow2"
+  workflow_type: "Sequence"
+  nodes {
+    name: "echo_op"
+    type: "CommonEchoOp"
+  }
+}
+```
+
+The sample above configures two workflows, `workflow1` and `workflow2`. Taking `workflow1` as an example:
+
+name: the workflow name, used by service.prototxt to look up the workflow
+
+workflow_type: either "Sequence" or "Parallel", indicating whether the OPs represented by the nodes of this workflow may run in parallel. **Currently only the Sequence type is supported; a workflow configured as Parallel will not be executed.**
+
+nodes: all nodes chained into this workflow; multiple nodes can be configured, and they are linked to each other through their dependencies
+
+node.name: arbitrary; a name that reflects the OP class executed by the node is recommended
+
+node.type: the class name of the OP executed by this node, matching one of the concrete OP classes under serving/op/
+
+node.dependencies: the list of upstream nodes this node depends on
+
+node.dependencies.name: must match the name of a node in the same workflow
+
+node.dependencies.mode: RO - Read Only, RW - Read Write
+
+## 3. resource.prototxt
+
+The entry point of the server-side resource configuration is resource.prototxt, which configures model information. Its protobuf schema is the ResourceConf message in `configure/proto/server_configure.proto`. The file's location on disk can be changed with the --resource_path and --resource_file options. A sample:
+
+```
+model_manager_path: ./conf
+model_manager_file: model_toolkit.prototxt
+```
+
+It is mainly used to specify the path of model_toolkit.prototxt.
+
+## 4. model_toolkit.prototxt
+
+Configures the models and the inference engines used to serve them. Its protobuf schema is the ModelToolkitConf message in `configure/proto/server_configure.proto`. The on-disk path of model_toolkit.prototxt cannot be overridden by command-line options. A sample:
+
+```
+engines {
+  name: "image_classification_resnet"
+  type: "FLUID_CPU_NATIVE_DIR"
+  reloadable_meta: "./data/model/paddle/fluid_time_file"
+  reloadable_type: "timestamp_ne"
+  model_data_path: "./data/model/paddle/fluid/SE_ResNeXt50_32x4d"
+  runtime_thread_num: 0
+  batch_infer_size: 0
+  enable_batch_align: 0
+}
+```
+
+Where
+
+name: the model name. InferManager uses this name to find the model and inference engine to use; see the arguments of InferManager::instance().infer() in serving/op/classify_op.h and serving/op/classify_op.cpp.
+
+type: the type of inference engine. The list of currently registered engines can be found in inferencer-fluid-cpu/src/fluid_cpu_engine.cpp.
+
+|Inference engine|Meaning|
+|--------|----|
+|FLUID_CPU_ANALYSIS|uses the fluid Analysis API; all model parameters are stored in a single file|
+|FLUID_CPU_ANALYSIS_DIR|uses the fluid Analysis API; each model parameter is stored in its own file, and the whole model is placed in one directory|
+|FLUID_CPU_NATIVE|uses the fluid Native API; all model parameters are stored in a single file|
+|FLUID_CPU_NATIVE_DIR|uses the fluid Native API; each model parameter is stored in its own file, and the whole model is placed in one directory|
+
+**Difference between the fluid Analysis API and the fluid Native API**
+
+While loading a model, the Analysis API applies a number of optimizations to the computation graph, including but not limited to zero-copy tensors and fusing adjacent OPs.
+
+reloadable_meta: the content of this file is currently meaningless; only its mtime is used to decide whether the reload threshold has been reached
+
+reloadable_type: the reload condition to check: timestamp_ne/timestamp_gt/md5sum/revision/none
+
+|reloadable_type|Meaning|
+|---------------|----|
+|timestamp_ne|the mtime of the file specified by reloadable_meta has changed|
+|timestamp_gt|the mtime of the file specified by reloadable_meta is greater than or equal to the mtime recorded at the last check|
+|md5sum|currently unused; if configured, the model is never reloaded|
+|revision|currently unused; if configured, the model is never reloaded|
+
+model_data_path: path of the model files
+
+runtime_thread_num: if greater than 0, the bsf multi-threaded scheduling framework is enabled and multi-threaded prediction is started inside each prediction bthread worker
+
+batch_infer_size: batch size of each prediction thread when bsf multi-threaded prediction is enabled
+
+enable_batch_align:
+
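+The reload-related fields above boil down to a simple file-stat check. The sketch below is only an illustration of the `timestamp_ne` condition (it is not the framework's actual implementation): remember the last mtime of the reloadable_meta file and report a reload whenever it changes.
+
+```cpp
+#include <sys/stat.h>
+
+#include <ctime>
+#include <iostream>
+#include <string>
+
+// timestamp_ne: reload whenever the mtime of the reloadable_meta file changes.
+bool need_reload_timestamp_ne(const std::string& reloadable_meta,
+                              time_t* last_mtime) {
+  struct stat st;
+  if (stat(reloadable_meta.c_str(), &st) != 0) {
+    return false;  // cannot stat the meta file; skip reloading
+  }
+  if (st.st_mtime != *last_mtime) {
+    *last_mtime = st.st_mtime;  // remember the new timestamp
+    return true;                // mtime changed -> reload the model
+  }
+  return false;
+}
+
+int main() {
+  time_t last_mtime = 0;
+  bool reload =
+      need_reload_timestamp_ne("./data/model/paddle/fluid_time_file", &last_mtime);
+  std::cout << (reload ? "reload" : "skip") << std::endl;
+  return 0;
+}
+```
+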
+## 5. Command-line configuration options
+
+The gflags options supported by the serving side are listed below, together with their default values.
+
+| name | Default | Meaning |
+|------|--------|------|
+|workflow_path|./conf|directory of the workflow configuration|
+|workflow_file|workflow.prototxt|workflow configuration file name|
+|inferservice_path|./conf|directory of the service configuration|
+|inferservice_file|service.prototxt|service configuration file name|
+|resource_path|./conf|directory of the resource manager configuration|
+|resource_file|resource.prototxt|resource manager configuration file name|
+|reload_interval_s|10|interval of the reload thread, in seconds|
+|enable_model_toolkit|true|model management|
+|enable_protocol_list|baidu_std|list of brpc communication protocols|
+|log_dir|./log|log dir|
+|num_threads|number of CPU cores|number of system threads used by the brpc server|
+|max_concurrency|0 (no limit)|maximum number of requests the brpc server handles concurrently; a value <= 0 means no limit|
+
+The default values can be overridden in serving/conf/gflags.conf, for example
+
+```
+--log_dir=./serving_log/
+```
+
+redirects the log directory to ./serving_log.
+
+### 5.1 gflags.conf
+
+Command-line options can be written into a configuration file, whose default path is `conf/gflags.conf`. If `conf/gflags.conf` exists, the serving side tries to parse the gflags options in it. For example
+
+```shell
+--enable_model_toolkit
+--port=8011
+```
+
+A different flag file can be specified with the following command:
+
+```shell
+bin/serving --g=true --flagfile=conf/gflags.conf.new
+```
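+
+Flag files are a standard gflags feature, which is why the same options can be given either on the command line or in conf/gflags.conf. The standalone sketch below (not the serving binary itself; the `port` flag is redefined here purely for the example) shows that everything in a file passed via --flagfile is parsed exactly as if it had been typed on the command line.
+
+```cpp
+#include <iostream>
+
+#include <gflags/gflags.h>
+
+// Example flag only; the real serving binary defines its own flags
+// (port, log_dir, num_threads, ...).
+DEFINE_int32(port, 8010, "listening port of the serving instance");
+
+int main(int argc, char** argv) {
+  // --flagfile=conf/gflags.conf is expanded by gflags itself, so each line of
+  // the flag file is handled like a normal command-line option.
+  gflags::ParseCommandLineFlags(&argc, &argv, true);
+  std::cout << "port: " << FLAGS_port << std::endl;
+  return 0;
+}
+```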
diff --git a/predictor/framework/infer.h b/predictor/framework/infer.h
index 67c4ab55eaec1b9cc8522ffb32f3c8484de3e16e..3754a7ea828b5fcbecf7aa3c9436488174fef3da 100644
--- a/predictor/framework/infer.h
+++ b/predictor/framework/infer.h
@@ -486,154 +486,6 @@ class FluidInferEngine : public DBReloadableInferEngine<FluidFamilyCore> {
   }
 };
 
-template <typename TensorrtFamilyCore>
-class TensorrtInferEngine : public DBReloadableInferEngine<TensorrtFamilyCore> {
- public:
-  TensorrtInferEngine() {}
-  ~TensorrtInferEngine() {}
-
-  int infer_impl1(const void* in, void* out, uint32_t batch_size) {
-    TensorrtFamilyCore* core =
-        DBReloadableInferEngine<TensorrtFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get fluid core in infer_impl()";
-      return -1;
-    }
-
-    if (!core->Run(in, out, batch_size)) {
-      LOG(ERROR) << "Failed run fluid family core";
-      return -1;
-    }
-    return 0;
-  }
-
-  int infer_impl2(const BatchTensor& in, BatchTensor& out) {  // NOLINT
-    LOG(ERROR) << "Tensortrt donot supports infer_impl2 yet!";
-    return -1;
-  }
-};
-
-template <typename AbacusFamilyCore>
-class AbacusInferEngine
-    : public CloneDBReloadableInferEngine<AbacusFamilyCore> {
- public:
-  AbacusInferEngine() {}
-  ~AbacusInferEngine() {}
-
-  int infer_impl1(const void* in, void* out, uint32_t batch_size = -1) {
-    LOG(ERROR) << "Abacus dnn engine must use predict interface";
-    return -1;
-  }
-
-  int infer_impl2(const BatchTensor& in, BatchTensor& out) {  // NOLINT
-    LOG(ERROR) << "Abacus dnn engine must use predict interface";
-    return -1;
-  }
-
-  // Abacus special interface
-  int predict(uint32_t ins_num) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in predict()";
-      return -1;
-    }
-
-    return core->predict(ins_num);
-  }
-  int set_use_fpga(bool use_fpga) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in predict()";
-      return -1;
-    }
-
-    return core->set_use_fpga(use_fpga);
-  }
-  int debug() {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in debug()";
-      return -1;
-    }
-    return core->debug();
-  }
-
-  int set_search_id(uint64_t sid) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in set_serach_id()";
-      return -1;
-    }
-    return core->set_search_id(sid);
-  }
-
-  int set_hidden_layer_dim(uint32_t dim) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in set_layer_dim()";
-      return -1;
-    }
-    return core->set_hidden_layer_dim(dim);
-  }
-
-  int get_input(uint32_t ins_idx, uint32_t* fea_num, void* in) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in get_input()";
-      return -1;
-    }
-    return core->get_input(ins_idx, fea_num, in);
-  }
-
-  int get_layer_value(const std::string& name,
-                      uint32_t ins_num,
-                      uint32_t fea_dim,
-                      void* out) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in get_layer_value()";
-      return -1;
-    }
-    return core->get_layer_value(name, ins_num, fea_dim, out);
-  }
-
-  void set_position_idx(void* input, uint64_t fea, uint32_t ins_idx) {
-    AbacusFamilyCore* core =
-        CloneDBReloadableInferEngine<AbacusFamilyCore>::get_core();
-    if (!core || !core->get()) {
-      LOG(ERROR) << "Failed get abacus core in set_position_idx()";
-      return;
-    }
-    core->set_position_idx(input, fea, ins_idx);
-    return;
-  }
-};
-
-template <typename PaddleV2FamilyCore>
-class PaddleV2InferEngine
-    : public CloneDBReloadableInferEngine<PaddleV2FamilyCore> {
- public:
-  PaddleV2InferEngine() {}
-  ~PaddleV2InferEngine() {}
-
-  int infer_impl1(const void* in, void* out, uint32_t batch_size = -1) {
-    LOG(ERROR) << "Paddle V2 engine must use predict interface";
-    return -1;
-  }
-
-  int infer_impl2(const BatchTensor& in, BatchTensor& out) {  // NOLINT
-    LOG(ERROR) << "Paddle V2 engine must use predict interface";
-    return -1;
-  }
-};
-
 typedef FactoryPool StaticInferFactory;
 
 class VersionedInferEngine : public InferEngine {
diff --git a/sdk-cpp/src/config_manager.cpp b/sdk-cpp/src/config_manager.cpp
index 24d25f33e9cf8ee63ea15ba22a5e4cb59920e9d4..d3aec8c886948e6aad99260cb460dfdf79b2100f 100644
--- a/sdk-cpp/src/config_manager.cpp
+++ b/sdk-cpp/src/config_manager.cpp
@@ -53,9 +53,6 @@ int EndpointConfigManager::load() {
   }
 
   uint32_t ep_size = sdk_conf.predictors_size();
-#if 1
-  LOG(INFO) << "ep_size: " << ep_size;
-#endif
   for (uint32_t ei = 0; ei < ep_size; ++ei) {
     EndpointInfo ep;
     if (init_one_endpoint(sdk_conf.predictors(ei), ep, default_var) != 0) {
@@ -88,9 +85,6 @@ int EndpointConfigManager::init_one_endpoint(const configure::Predictor& conf,
                                              EndpointInfo& ep,
                                              const VariantInfo& dft_var) {
-#if 1
-  LOG(INFO) << "init_one_endpoint " << conf.name().c_str();
-#endif
   try {
     // name
     ep.endpoint_name = conf.name();
@@ -120,9 +114,6 @@ int EndpointConfigManager::init_one_endpoint(const configure::Predictor& conf,
     // varlist
     uint32_t var_size = conf.variants_size();
-#if 1
-    LOG(INFO) << "Variant size: " << var_size;
-#endif
     for (uint32_t vi = 0; vi < var_size; ++vi) {
       VariantInfo var;
       if (merge_variant(dft_var, conf.variants(vi), var) != 0) {
@@ -180,9 +171,6 @@ int EndpointConfigManager::init_one_variant(const configure::VariantConf& conf,
   const configure::RpcParameter& params = conf.rpc_parameter();
 
   PARSE_CONF_ITEM(params, var.parameters.protocol, protocol, -1);
-#if 1
-  LOG(WARNING) << var.parameters.protocol.value.c_str();
-#endif
   PARSE_CONF_ITEM(params, var.parameters.compress_type, compress_type, -1);
   PARSE_CONF_ITEM(params, var.parameters.package_size, package_size, -1);
   PARSE_CONF_ITEM(
@@ -213,9 +201,6 @@ int EndpointConfigManager::merge_variant(const VariantInfo& default_var,
                                          VariantInfo& merged_var) {
   merged_var = default_var;
 
-#if 1
-  LOG(INFO) << "merge_variant " << conf.tag().c_str();
-#endif
 
   return init_one_variant(conf, merged_var);
 }