Unverified commit 47c5a916, authored by mcl-stone, committed by GitHub

Merge branch 'develop' into Serving-mcl

@@ -54,6 +54,7 @@ option(SERVER "Compile Paddle Serving Server" OFF)
 option(APP "Compile Paddle Serving App package" OFF)
 option(WITH_ELASTIC_CTR "Compile ELASITC-CTR solution" OFF)
 option(PACK "Compile for whl" OFF)
+option(WITH_TRT "Compile Paddle Serving with TRT" OFF)
 set(WITH_MKLML ${WITH_MKL})
 if (NOT DEFINED WITH_MKLDNN)
......
@@ -7,6 +7,7 @@
 <p>
 <p align="center">
 <br>
 <a href="https://travis-ci.com/PaddlePaddle/Serving">
@@ -29,7 +30,7 @@ We consider deploying deep learning inference service online to be a user-facing
 <h2 align="center">Installation</h2>
-We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
+We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
 ```
 # Run CPU Docker
 docker pull hub.baidubce.com/paddlepaddle/serving:latest
@@ -38,26 +39,39 @@ docker exec -it test bash
 ```
 ```
 # Run GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker exec -it test bash
 ```
 ```shell
-pip install paddle-serving-client
-pip install paddle-serving-server # CPU
-pip install paddle-serving-server-gpu # GPU
+pip install paddle-serving-client==0.3.2
+pip install paddle-serving-server==0.3.2 # CPU
+pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
 ```
 You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
 If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
-Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use HTTP service without install client.
+Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7 and Ubuntu 16/18.
+Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.6/3.7.
+Recommended to install paddle >= 1.8.2.
 <h2 align="center"> Pre-built services with Paddle Serving</h2>
+<h3 align="center">Latest release</h4>
+<p align="center">
+<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr">Optical Character Recognition</a>
+<br>
+<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/faster_rcnn_model">Object Detection</a>
+<br>
+<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/deeplabv3">Image Segmentation</a>
+<p>
 <h3 align="center">Chinese Word Segmentation</h4>
 ``` shell
@@ -111,9 +125,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `port` | int | `9292` | Exposed port of current service to users|
 | `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
-| `mem_optim` | - | - | Enable memory / graphic memory optimization |
+| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
+| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
 Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/).
 </center>
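A minimal sketch of the HTTP route generated by the `--name` flag documented above, using the [requests](https://requests.readthedocs.io/en/master/) library the README mentions; the service name `uci`, port `9292`, and the `x`/`price` variable names are assumptions taken from the uci_housing example, not guaranteed by this diff:

```python
# Hypothetical values: assumes a service started with
#   python -m paddle_serving_server.serve --model uci_housing_model \
#       --port 9292 --name uci
import requests

payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501,
                    -0.2015, -0.0832, 0.0320, 0.0957, 0.1002, -0.0417,
                    0.0062]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
print(resp.json())
```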
@@ -159,6 +174,11 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
 - [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md)
 - [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
+### Tutorial at AIStudio
+- [Introduction to PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/605819)
+- [Image Segmentation on Paddle Serving](https://aistudio.baidu.com/aistudio/projectdetail/457715)
+- [Sentimental Analysis](https://aistudio.baidu.com/aistudio/projectdetail/509014)
 ### Developers
 - [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
 - [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
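The hunk header above quotes the `client.predict(feed=..., fetch=...)` call. A minimal client-side sketch of that RPC path, assuming the uci_housing example client config and a server on `127.0.0.1:9292` (names from the Paddle Serving examples, not from this diff):

```python
# Hypothetical sketch: `feed` maps feed variable names to data,
# `fetch` lists the output variables to return.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

fetch_map = client.predict(feed={"x": [0.0137, -0.1136, 0.2553, -0.0692,
                                       0.0582, -0.0501, -0.2015, -0.0832,
                                       0.0320, 0.0957, 0.1002, -0.0417,
                                       0.0062]},
                           fetch=["price"])
print(fetch_map)
```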
@@ -184,6 +204,7 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
 <h2 align="center">Community</h2>
 ### Slack
 To connect with other users and contributors, welcome to join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
......
@@ -7,6 +7,7 @@
 <p>
 <p align="center">
 <br>
 <a href="https://travis-ci.com/PaddlePaddle/Serving">
@@ -31,7 +32,7 @@ Paddle Serving aims to help deep learning developers easily deploy online prediction services
 <h2 align="center">Installation</h2>
-We **strongly recommend** that you **build Paddle Serving inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md)
+We **strongly recommend** that you **build Paddle Serving inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md). For more images, see the [Docker image list](doc/DOCKER_IMAGES_CN.md)
 ```
 # Start CPU Docker
@@ -41,21 +42,26 @@ docker exec -it test bash
 ```
 ```
 # Start GPU Docker
-nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
-nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
 nvidia-docker exec -it test bash
 ```
 ```shell
-pip install paddle-serving-client
-pip install paddle-serving-server # CPU
-pip install paddle-serving-server-gpu # GPU
+pip install paddle-serving-client==0.3.2
+pip install paddle-serving-server==0.3.2 # CPU
+pip install paddle-serving-server-gpu==0.3.2.post9 # GPU with CUDA9.0
+pip install paddle-serving-server-gpu==0.3.2.post10 # GPU with CUDA10.0
 ```
 You may need a domestic mirror source (e.g. the Tsinghua source; add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
 If you need packages built from the develop branch, get the download links from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with `pip install`.
-Paddle Serving packages support Centos 6/7 and Ubuntu 16/18; alternatively, you can use the HTTP service without installing the client.
+The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7 and Ubuntu 16/18.
+The paddle-serving-client and paddle-serving-app packages support Linux and Windows; paddle-serving-client only supports python2.7/3.5/3.6.
+Installing paddle 1.8.2 or later is recommended.
 <h2 align="center"> Services pre-built on Paddle Serving </h2>
@@ -115,9 +121,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 | `port` | int | `9292` | Exposed port of current service to users|
 | `name` | str | `""` | Service name, can be used to generate HTTP request url |
 | `model` | str | `""` | Path of paddle model directory to be served |
-| `mem_optim` | - | - | Enable memory optimization |
+| `mem_optim_off` | - | - | Disable memory optimization |
 | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
 | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
+| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
 We use `curl` to send an HTTP POST request to the service we just started. Users can also send HTTP POST requests from a python library; see the English documentation of [requests](https://requests.readthedocs.io/en/master/).
 </center>
@@ -164,6 +171,11 @@ print(fetch_map)
 - [An end-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE_CN.md)
 - [Build Bert-As-Service in 10 minutes](doc/BERT_10_MINS_CN.md)
+### AIStudio tutorials
+- [PaddleServing assignment](https://aistudio.baidu.com/aistudio/projectdetail/605819)
+- [Image segmentation on PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/457715)
+- [Sentiment analysis on PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/509014)
 ### Developer tutorials
 - [How to configure the computation graph on the Server side?](doc/SERVER_DAG_CN.md)
 - [How to develop a new General Op?](doc/NEW_OPERATOR_CN.md)
......
@@ -40,8 +40,8 @@ ExternalProject_Add(
 extern_brpc
 ${EXTERNAL_PROJECT_LOG_ARGS}
 # TODO(gongwb): change to de newst repo when they changed.
-GIT_REPOSITORY "https://github.com/gongweibao/brpc"
-GIT_TAG "e9b67ec1b7458f2af5fae76451afe1e27e01b4b4"
+GIT_REPOSITORY "https://github.com/wangjiawei04/brpc"
+GIT_TAG "6d79e0b17f25107c35b705ea58d888083f59ff47"
 PREFIX ${BRPC_SOURCES_DIR}
 UPDATE_COMMAND ""
 CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
......
@@ -143,7 +143,6 @@ function(grpc_protobuf_generate_python SRCS)
 set(${SRCS} ${${SRCS}} PARENT_SCOPE)
 endfunction()
 # Print and set the protobuf library information,
 # finish this cmake process and exit from this file.
 macro(PROMPT_PROTOBUF_LIB)
......
@@ -31,10 +31,14 @@ message( "WITH_GPU = ${WITH_GPU}")
 # Paddle Version should be one of:
 # latest: latest develop build
 # version number like 1.5.2
-SET(PADDLE_VERSION "1.7.2")
+SET(PADDLE_VERSION "1.8.4")
 if (WITH_GPU)
-  SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda${CUDA_VERSION_MAJOR}-cudnn7-avx-mkl")
+  if (WITH_TRT)
+    SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6")
+  else()
+    SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda10-cudnn7-avx-mkl")
+  endif()
 else()
 if (WITH_AVX)
 if (WITH_MKLML)
@@ -50,7 +54,23 @@ endif()
 SET(PADDLE_LIB_PATH "http://paddle-inference-lib.bj.bcebos.com/${PADDLE_LIB_VERSION}/fluid_inference.tgz")
 MESSAGE(STATUS "PADDLE_LIB_PATH=${PADDLE_LIB_PATH}")
 if (WITH_GPU OR WITH_MKLML)
-ExternalProject_Add(
+if (WITH_TRT)
+ExternalProject_Add(
+  "extern_paddle"
+  ${EXTERNAL_PROJECT_LOG_ARGS}
+  URL "${PADDLE_LIB_PATH}"
+  PREFIX "${PADDLE_SOURCES_DIR}"
+  DOWNLOAD_DIR "${PADDLE_DOWNLOAD_DIR}"
+  CONFIGURE_COMMAND ""
+  BUILD_COMMAND ""
+  UPDATE_COMMAND ""
+  INSTALL_COMMAND
+    ${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/paddle/include ${PADDLE_INSTALL_DIR}/include &&
+    ${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/paddle/lib ${PADDLE_INSTALL_DIR}/lib &&
+    ${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/third_party ${PADDLE_INSTALL_DIR}/third_party
+)
+else()
+ExternalProject_Add(
 "extern_paddle"
 ${EXTERNAL_PROJECT_LOG_ARGS}
 URL "${PADDLE_LIB_PATH}"
@@ -64,7 +84,8 @@ ExternalProject_Add(
 ${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/paddle/lib ${PADDLE_INSTALL_DIR}/lib &&
 ${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/third_party ${PADDLE_INSTALL_DIR}/third_party &&
 ${CMAKE_COMMAND} -E copy ${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib/libmkldnn.so.0 ${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib/libmkldnn.so
 )
+endif()
 else()
 ExternalProject_Add(
 "extern_paddle"
@@ -92,8 +113,16 @@ LINK_DIRECTORIES(${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib)
 ADD_LIBRARY(openblas STATIC IMPORTED GLOBAL)
 SET_PROPERTY(TARGET openblas PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/openblas/lib/libopenblas.a)
-ADD_LIBRARY(paddle_fluid STATIC IMPORTED GLOBAL)
-SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.a)
+ADD_LIBRARY(paddle_fluid SHARED IMPORTED GLOBAL)
+SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.so)
+if (WITH_TRT)
+  ADD_LIBRARY(nvinfer SHARED IMPORTED GLOBAL)
+  SET_PROPERTY(TARGET nvinfer PROPERTY IMPORTED_LOCATION ${TENSORRT_ROOT}/lib/libnvinfer.so)
+  ADD_LIBRARY(nvinfer_plugin SHARED IMPORTED GLOBAL)
+  SET_PROPERTY(TARGET nvinfer_plugin PROPERTY IMPORTED_LOCATION ${TENSORRT_ROOT}/lib/libnvinfer_plugin.so)
+endif()
 ADD_LIBRARY(xxhash STATIC IMPORTED GLOBAL)
 SET_PROPERTY(TARGET xxhash PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/xxhash/lib/libxxhash.a)
@@ -102,3 +131,8 @@ LIST(APPEND external_project_dependencies paddle)
 LIST(APPEND paddle_depend_libs
 xxhash)
+if(WITH_TRT)
+LIST(APPEND paddle_depend_libs
+  nvinfer nvinfer_plugin)
+endif()
@@ -86,6 +86,7 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
 COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
 COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server/proto."
 WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
 add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
 COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
 COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
......
@@ -14,6 +14,12 @@
 syntax = "proto2";
+package baidu.paddle_serving.multi_lang;
+option java_multiple_files = true;
+option java_package = "io.paddle.serving.grpc";
+option java_outer_classname = "ServingProto";
 message Tensor {
 optional bytes data = 1;
 repeated int32 int_data = 2;
@@ -28,16 +34,18 @@ message FeedInst { repeated Tensor tensor_array = 1; };
 message FetchInst { repeated Tensor tensor_array = 1; };
-message Request {
+message InferenceRequest {
 repeated FeedInst insts = 1;
 repeated string feed_var_names = 2;
 repeated string fetch_var_names = 3;
 required bool is_python = 4 [ default = false ];
+required uint64 log_id = 5 [ default = 0 ];
 };
-message Response {
+message InferenceResponse {
 repeated ModelOutput outputs = 1;
 optional string tag = 2;
+required int32 err_code = 3;
 };
 message ModelOutput {
@@ -45,6 +53,17 @@ message ModelOutput {
 optional string engine_name = 2;
 }
+message SetTimeoutRequest { required int32 timeout_ms = 1; }
+message SimpleResponse { required int32 err_code = 1; }
+message GetClientConfigRequest {}
+message GetClientConfigResponse { required string client_config_str = 1; }
 service MultiLangGeneralModelService {
-rpc inference(Request) returns (Response) {}
+rpc Inference(InferenceRequest) returns (InferenceResponse) {}
+rpc SetTimeout(SetTimeoutRequest) returns (SimpleResponse) {}
+rpc GetClientConfig(GetClientConfigRequest)
+    returns (GetClientConfigResponse) {}
 };
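A minimal sketch of calling the renamed RPC surface from Python. Only the message, field, and service names come from the proto above; the generated stub module names and the server address are assumptions for illustration:

```python
# Hypothetical sketch: stub module names and the address are assumptions.
import grpc
import multi_lang_general_model_service_pb2 as pb2
import multi_lang_general_model_service_pb2_grpc as pb2_grpc

channel = grpc.insecure_channel("127.0.0.1:9393")
stub = pb2_grpc.MultiLangGeneralModelServiceStub(channel)

# log_id and err_code are the fields added by this commit; the rpc itself
# was renamed from inference(Request) to Inference(InferenceRequest).
req = pb2.InferenceRequest(feed_var_names=["x"], fetch_var_names=["price"],
                           is_python=True, log_id=0)
resp = stub.Inference(req)
print(resp.err_code, resp.tag)
```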
@@ -44,6 +44,7 @@ message EngineDesc {
 optional bool static_optimization = 14;
 optional bool force_update_static_cache = 15;
 optional bool enable_ir_optimization = 16;
+optional bool use_trt = 17;
 };
 // model_toolkit conf
@@ -58,6 +59,8 @@ message ResourceConf {
 optional string cube_config_path = 5;
 optional string cube_config_file = 6;
 optional int32 cube_quant_bits = 7; // set 0 if no quant.
+optional string auth_product_name = 8;
+optional string auth_container_id = 9;
 };
 // DAG node depency info
......
@@ -12,8 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License
+#execute_process(COMMAND go env -w GO111MODULE=off)
 add_subdirectory(cube-server)
 add_subdirectory(cube-api)
 add_subdirectory(cube-builder)
-add_subdirectory(cube-transfer)
-add_subdirectory(cube-agent)
+#add_subdirectory(cube-transfer)
+#add_subdirectory(cube-agent)
@@ -22,7 +22,8 @@
 #ifdef BCLOUD
 #include "baidu/rpc/channel.h"
 #include "baidu/rpc/parallel_channel.h"
-#include "rapidjson/document.h"
+#include "rapidjson_1.0/document.h"
+#include "rapidjson_1.0/rapidjson.h"
 #else
 #include "brpc/channel.h"
 #include "brpc/parallel_channel.h"
......
@@ -13,6 +13,7 @@
 // limitations under the License.
 #include <gflags/gflags.h>
+#include <algorithm>
 #include <atomic>
 #include <fstream>
 #include <thread>  //NOLINT
@@ -31,8 +32,9 @@ DEFINE_bool(print_output, false, "print output flag");
 DEFINE_int32(thread_num, 1, "thread num");
 std::atomic<int> g_concurrency(0);
-std::vector<uint64_t> time_list;
+std::vector<std::vector<uint64_t>> time_list;
 std::vector<uint64_t> request_list;
+int turns = 1000;
 namespace {
 inline uint64_t time_diff(const struct timeval& start_time,
@@ -93,14 +95,15 @@ int run(int argc, char** argv, int thread_id) {
   uint64_t file_size = key_list.size();
   uint64_t index = 0;
   uint64_t request = 0;
   while (g_concurrency.load() >= FLAGS_thread_num) {
   }
   g_concurrency++;
-  while (index < file_size) {
+  time_list[thread_id].resize(turns);
+  while (request < turns) {
     // uint64_t key = strtoul(buffer, NULL, 10);
+    if (index >= file_size) {
+      index = 0;
+    }
     keys.push_back(key_list[index]);
     index += 1;
     int ret = 0;
@@ -121,47 +124,12 @@ }
       }
       ++seek_counter;
       uint64_t seek_cost = time_diff(seek_start, seek_end);
-      seek_cost_total += seek_cost;
-      if (seek_cost > seek_cost_max) {
-        seek_cost_max = seek_cost;
-      }
-      if (seek_cost < seek_cost_min) {
-        seek_cost_min = seek_cost;
-      }
+      time_list[thread_id][request - 1] = seek_cost;
       keys.clear();
       values.clear();
     }
   }
-  /*
-  if (keys.size() > 0) {
-    int ret = 0;
-    values.resize(keys.size());
-    TIME_FLAG(seek_start);
-    ret = cube->seek(FLAGS_dict, keys, &values);
-    TIME_FLAG(seek_end);
-    if (ret != 0) {
-      LOG(WARNING) << "cube seek failed";
-    } else if (FLAGS_print_output) {
-      for (size_t i = 0; i < keys.size(); ++i) {
-        fprintf(stdout,
-                "key:%lu value:%s\n",
-                keys[i],
-                string_to_hex(values[i].buff).c_str());
-      }
-    }
-    ++seek_counter;
-    uint64_t seek_cost = time_diff(seek_start, seek_end);
-    seek_cost_total += seek_cost;
-    if (seek_cost > seek_cost_max) {
-      seek_cost_max = seek_cost;
-    }
-    if (seek_cost < seek_cost_min) {
-      seek_cost_min = seek_cost;
-    }
-  }
-  */
   g_concurrency--;
   // fclose(key_file);
@@ -171,12 +139,6 @@ int run(int argc, char** argv, int thread_id) {
     LOG(WARNING) << "destroy cube api failed err=" << ret;
   }
-  uint64_t seek_cost_avg = seek_cost_total / seek_counter;
-  LOG(INFO) << "seek cost avg = " << seek_cost_avg;
-  LOG(INFO) << "seek cost max = " << seek_cost_max;
-  LOG(INFO) << "seek cost min = " << seek_cost_min;
-  time_list[thread_id] = seek_cost_avg;
   request_list[thread_id] = request;
   return 0;
@@ -188,6 +150,7 @@ int run_m(int argc, char** argv) {
   request_list.resize(thread_num);
   time_list.resize(thread_num);
   std::vector<std::thread*> thread_pool;
+  TIME_FLAG(main_start);
   for (int i = 0; i < thread_num; i++) {
     thread_pool.push_back(new std::thread(run, argc, argv, i));
   }
@@ -195,27 +158,42 @@ int run_m(int argc, char** argv) {
     thread_pool[i]->join();
     delete thread_pool[i];
   }
+  TIME_FLAG(main_end);
   uint64_t sum_time = 0;
   uint64_t max_time = 0;
   uint64_t min_time = 1000000;
-  uint64_t request_num = 0;
+  std::vector<uint64_t> all_time_list;
   for (int i = 0; i < thread_num; i++) {
-    sum_time += time_list[i];
-    if (time_list[i] > max_time) {
-      max_time = time_list[i];
-    }
-    if (time_list[i] < min_time) {
-      min_time = time_list[i];
-    }
-    request_num += request_list[i];
-  }
-  uint64_t mean_time = sum_time / thread_num;
-  LOG(INFO) << thread_num << " thread seek cost"
-            << " avg = " << std::to_string(mean_time)
-            << " max = " << std::to_string(max_time)
-            << " min = " << std::to_string(min_time);
-  LOG(INFO) << " total_request = " << std::to_string(request_num) << " speed = "
-            << std::to_string(1000000 * thread_num / mean_time)  // mean_time us
+    for (int j = 0; j < request_list[i]; j++) {
+      sum_time += time_list[i][j];
+      if (time_list[i][j] > max_time) {
+        max_time = time_list[i][j];
+      }
+      if (time_list[i][j] < min_time) {
+        min_time = time_list[i][j];
+      }
+      all_time_list.push_back(time_list[i][j]);
+    }
+  }
+  std::sort(all_time_list.begin(), all_time_list.end());
+  uint64_t mean_time = sum_time / (thread_num * turns);
+  uint64_t main_time = time_diff(main_start, main_end);
+  uint64_t request_num = turns * thread_num;
+  LOG(INFO)
+      << "\n"
+      << thread_num << " thread seek cost"
+      << "\navg: " << std::to_string(mean_time) << "\n50 percent: "
+      << std::to_string(all_time_list[static_cast<int>(0.5 * request_num)])
+      << "\n80 percent: "
+      << std::to_string(all_time_list[static_cast<int>(0.8 * request_num)])
+      << "\n90 percent: "
+      << std::to_string(all_time_list[static_cast<int>(0.9 * request_num)])
+      << "\n99 percent: "
+      << std::to_string(all_time_list[static_cast<int>(0.99 * request_num)])
+      << "\n99.9 percent: "
+      << std::to_string(all_time_list[static_cast<int>(0.999 * request_num)])
+      << "\ntotal_request: " << std::to_string(request_num) << "\nspeed: "
+      << std::to_string(turns * 1000000 / main_time)  // mean_time us
       << " query per second";
   return 0;
 }
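The rewritten reporter pools every per-request latency across threads, sorts the pool once, and reads percentiles by index. A compact sketch of the same computation for reference (latencies assumed to be in microseconds, as in the benchmark above):

```python
# Sketch of the percentile report produced above: pool all per-request
# latencies, sort once, then index at fixed fractions.
def report(time_list, thread_num, turns, main_time_us):
    lat = sorted(t for per_thread in time_list for t in per_thread)
    n = thread_num * turns                # request_num in the C++ code
    pick = lambda p: lat[int(p * n)]      # same indexing as the C++ code
    return {
        "avg_us": sum(lat) // n,
        "p50_us": pick(0.5), "p80_us": pick(0.8), "p90_us": pick(0.9),
        "p99_us": pick(0.99), "p99.9_us": pick(0.999),
        "total_request": n,
        "qps": turns * 1_000_000 // main_time_us,  # mirrors the C++ speed line
    }
```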
......
@@ -49,6 +49,8 @@ class ModelRes {
                            res._int64_value_map.end());
     _float_value_map.insert(res._float_value_map.begin(),
                             res._float_value_map.end());
+    _int32_value_map.insert(res._int32_value_map.begin(),
+                            res._int32_value_map.end());
     _shape_map.insert(res._shape_map.begin(), res._shape_map.end());
     _lod_map.insert(res._lod_map.begin(), res._lod_map.end());
   }
@@ -60,6 +62,9 @@ class ModelRes {
     _float_value_map.insert(
         std::make_move_iterator(std::begin(res._float_value_map)),
         std::make_move_iterator(std::end(res._float_value_map)));
+    _int32_value_map.insert(
+        std::make_move_iterator(std::begin(res._int32_value_map)),
+        std::make_move_iterator(std::end(res._int32_value_map)));
     _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
                       std::make_move_iterator(std::end(res._shape_map)));
     _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
@@ -78,6 +83,12 @@ class ModelRes {
   std::vector<float>&& get_float_by_name_with_rv(const std::string& name) {
     return std::move(_float_value_map[name]);
   }
+  const std::vector<int32_t>& get_int32_by_name(const std::string& name) {
+    return _int32_value_map[name];
+  }
+  std::vector<int32_t>&& get_int32_by_name_with_rv(const std::string& name) {
+    return std::move(_int32_value_map[name]);
+  }
   const std::vector<int>& get_shape_by_name(const std::string& name) {
     return _shape_map[name];
   }
@@ -103,6 +114,9 @@ class ModelRes {
     _float_value_map.insert(
         std::make_move_iterator(std::begin(res._float_value_map)),
         std::make_move_iterator(std::end(res._float_value_map)));
+    _int32_value_map.insert(
+        std::make_move_iterator(std::begin(res._int32_value_map)),
+        std::make_move_iterator(std::end(res._int32_value_map)));
     _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
                       std::make_move_iterator(std::end(res._shape_map)));
     _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
@@ -115,6 +129,7 @@ class ModelRes {
   std::string _engine_name;
   std::map<std::string, std::vector<int64_t>> _int64_value_map;
   std::map<std::string, std::vector<float>> _float_value_map;
+  std::map<std::string, std::vector<int32_t>> _int32_value_map;
   std::map<std::string, std::vector<int>> _shape_map;
   std::map<std::string, std::vector<int>> _lod_map;
 };
@@ -145,6 +160,14 @@ class PredictorRes {
                                               const std::string& name) {
     return std::move(_models[model_idx].get_float_by_name_with_rv(name));
   }
+  const std::vector<int32_t>& get_int32_by_name(const int model_idx,
+                                                const std::string& name) {
+    return _models[model_idx].get_int32_by_name(name);
+  }
+  std::vector<int32_t>&& get_int32_by_name_with_rv(const int model_idx,
+                                                   const std::string& name) {
+    return std::move(_models[model_idx].get_int32_by_name_with_rv(name));
+  }
   const std::vector<int>& get_shape_by_name(const int model_idx,
                                             const std::string& name) {
     return _models[model_idx].get_shape_by_name(name);
@@ -195,27 +218,19 @@ class PredictorClient {
   int destroy_predictor();
-  int batch_predict(
-      const std::vector<std::vector<std::vector<float>>>& float_feed_batch,
-      const std::vector<std::string>& float_feed_name,
-      const std::vector<std::vector<int>>& float_shape,
-      const std::vector<std::vector<std::vector<int64_t>>>& int_feed_batch,
-      const std::vector<std::string>& int_feed_name,
-      const std::vector<std::vector<int>>& int_shape,
-      const std::vector<std::string>& fetch_name,
-      PredictorRes& predict_res_batch,  // NOLINT
-      const int& pid);
   int numpy_predict(
       const std::vector<std::vector<py::array_t<float>>>& float_feed_batch,
      const std::vector<std::string>& float_feed_name,
      const std::vector<std::vector<int>>& float_shape,
+      const std::vector<std::vector<int>>& float_lod_slot_batch,
      const std::vector<std::vector<py::array_t<int64_t>>>& int_feed_batch,
      const std::vector<std::string>& int_feed_name,
      const std::vector<std::vector<int>>& int_shape,
+      const std::vector<std::vector<int>>& int_lod_slot_batch,
      const std::vector<std::string>& fetch_name,
      PredictorRes& predict_res_batch,  // NOLINT
-      const int& pid);
+      const int& pid,
+      const uint64_t log_id);
  private:
   PredictorApi _api;
......
@@ -39,7 +39,9 @@ using configure::GeneralModelConfig;
 void PredictorClient::init_gflags(std::vector<std::string> argv) {
   std::call_once(gflags_init_flag, [&]() {
+#ifndef BCLOUD
     FLAGS_logtostderr = true;
+#endif
     argv.insert(argv.begin(), "dummy");
     int argc = argv.size();
     char **arr = new char *[argv.size()];
@@ -135,216 +137,19 @@ int PredictorClient::create_predictor() {
   return 0;
 }
-int PredictorClient::batch_predict(
-    const std::vector<std::vector<std::vector<float>>> &float_feed_batch,
-    const std::vector<std::string> &float_feed_name,
-    const std::vector<std::vector<int>> &float_shape,
-    const std::vector<std::vector<std::vector<int64_t>>> &int_feed_batch,
-    const std::vector<std::string> &int_feed_name,
-    const std::vector<std::vector<int>> &int_shape,
-    const std::vector<std::string> &fetch_name,
-    PredictorRes &predict_res_batch,
-    const int &pid) {
-  int batch_size = std::max(float_feed_batch.size(), int_feed_batch.size());
-  predict_res_batch.clear();
-  Timer timeline;
-  int64_t preprocess_start = timeline.TimeStampUS();
-  int fetch_name_num = fetch_name.size();
-  _api.thrd_initialize();
-  std::string variant_tag;
-  _predictor = _api.fetch_predictor("general_model", &variant_tag);
-  predict_res_batch.set_variant_tag(variant_tag);
-  VLOG(2) << "fetch general model predictor done.";
-  VLOG(2) << "float feed name size: " << float_feed_name.size();
-  VLOG(2) << "int feed name size: " << int_feed_name.size();
-  VLOG(2) << "max body size : " << brpc::fLU64::FLAGS_max_body_size;
-  Request req;
-  for (auto &name : fetch_name) {
-    req.add_fetch_var_names(name);
-  }
-  for (int bi = 0; bi < batch_size; bi++) {
-    VLOG(2) << "prepare batch " << bi;
-    std::vector<Tensor *> tensor_vec;
-    FeedInst *inst = req.add_insts();
-    std::vector<std::vector<float>> float_feed = float_feed_batch[bi];
-    std::vector<std::vector<int64_t>> int_feed = int_feed_batch[bi];
-    for (auto &name : float_feed_name) {
-      tensor_vec.push_back(inst->add_tensor_array());
-    }
-    for (auto &name : int_feed_name) {
-      tensor_vec.push_back(inst->add_tensor_array());
-    }
-    VLOG(2) << "batch [" << bi << "] int_feed_name and float_feed_name "
-            << "prepared";
-    int vec_idx = 0;
-    VLOG(2) << "tensor_vec size " << tensor_vec.size() << " float shape "
-            << float_shape.size();
-    for (auto &name : float_feed_name) {
-      int idx = _feed_name_to_idx[name];
-      Tensor *tensor = tensor_vec[idx];
-      VLOG(2) << "prepare float feed " << name << " shape size "
-              << float_shape[vec_idx].size();
-      for (uint32_t j = 0; j < float_shape[vec_idx].size(); ++j) {
-        tensor->add_shape(float_shape[vec_idx][j]);
-      }
-      tensor->set_elem_type(1);
-      for (uint32_t j = 0; j < float_feed[vec_idx].size(); ++j) {
-        tensor->add_float_data(float_feed[vec_idx][j]);
-      }
-      vec_idx++;
-    }
-    VLOG(2) << "batch [" << bi << "] "
-            << "float feed value prepared";
-    vec_idx = 0;
-    for (auto &name : int_feed_name) {
-      int idx = _feed_name_to_idx[name];
-      Tensor *tensor = tensor_vec[idx];
-      VLOG(2) << "prepare int feed " << name << " shape size "
-              << int_shape[vec_idx].size();
-      for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) {
-        tensor->add_shape(int_shape[vec_idx][j]);
-      }
-      tensor->set_elem_type(0);
-      VLOG(3) << "feed var name " << name << " index " << vec_idx
-              << "first data " << int_feed[vec_idx][0];
-      for (uint32_t j = 0; j < int_feed[vec_idx].size(); ++j) {
-        tensor->add_int64_data(int_feed[vec_idx][j]);
-      }
-      vec_idx++;
-    }
-    VLOG(2) << "batch [" << bi << "] "
-            << "int feed value prepared";
-  }
-  int64_t preprocess_end = timeline.TimeStampUS();
-  int64_t client_infer_start = timeline.TimeStampUS();
-  Response res;
-  int64_t client_infer_end = 0;
-  int64_t postprocess_start = 0;
-  int64_t postprocess_end = 0;
-  if (FLAGS_profile_client) {
-    if (FLAGS_profile_server) {
-      req.set_profile_server(true);
-    }
-  }
-  res.Clear();
-  if (_predictor->inference(&req, &res) != 0) {
-    LOG(ERROR) << "failed call predictor with req: " << req.ShortDebugString();
-    _api.thrd_clear();
-    return -1;
-  } else {
-    client_infer_end = timeline.TimeStampUS();
-    postprocess_start = client_infer_end;
-    VLOG(2) << "get model output num";
-    uint32_t model_num = res.outputs_size();
-    VLOG(2) << "model num: " << model_num;
-    for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
-      VLOG(2) << "process model output index: " << m_idx;
-      auto output = res.outputs(m_idx);
-      ModelRes model;
-      model.set_engine_name(output.engine_name());
-      int idx = 0;
-      for (auto &name : fetch_name) {
-        // int idx = _fetch_name_to_idx[name];
-        int shape_size = output.insts(0).tensor_array(idx).shape_size();
-        VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
-                << shape_size;
-        model._shape_map[name].resize(shape_size);
-        for (int i = 0; i < shape_size; ++i) {
-          model._shape_map[name][i] =
-              output.insts(0).tensor_array(idx).shape(i);
-        }
-        int lod_size = output.insts(0).tensor_array(idx).lod_size();
-        if (lod_size > 0) {
-          model._lod_map[name].resize(lod_size);
-          for (int i = 0; i < lod_size; ++i) {
-            model._lod_map[name][i] = output.insts(0).tensor_array(idx).lod(i);
-          }
-        }
-        idx += 1;
-      }
-      idx = 0;
-      for (auto &name : fetch_name) {
-        // int idx = _fetch_name_to_idx[name];
-        if (_fetch_name_to_type[name] == 0) {
-          VLOG(2) << "ferch var " << name << "type int";
-          model._int64_value_map[name].resize(
-              output.insts(0).tensor_array(idx).int64_data_size());
-          int size = output.insts(0).tensor_array(idx).int64_data_size();
-          for (int i = 0; i < size; ++i) {
-            model._int64_value_map[name][i] =
-                output.insts(0).tensor_array(idx).int64_data(i);
-          }
-        } else {
-          VLOG(2) << "fetch var " << name << "type float";
-          model._float_value_map[name].resize(
-              output.insts(0).tensor_array(idx).float_data_size());
-          int size = output.insts(0).tensor_array(idx).float_data_size();
-          for (int i = 0; i < size; ++i) {
-            model._float_value_map[name][i] =
-                output.insts(0).tensor_array(idx).float_data(i);
-          }
-        }
-        idx += 1;
-      }
-      predict_res_batch.add_model_res(std::move(model));
-    }
-    postprocess_end = timeline.TimeStampUS();
-  }
-  if (FLAGS_profile_client) {
-    std::ostringstream oss;
-    oss << "PROFILE\t"
-        << "pid:" << pid << "\t"
-        << "prepro_0:" << preprocess_start << " "
-        << "prepro_1:" << preprocess_end << " "
-        << "client_infer_0:" << client_infer_start << " "
-        << "client_infer_1:" << client_infer_end << " ";
-    if (FLAGS_profile_server) {
-      int op_num = res.profile_time_size() / 2;
-      for (int i = 0; i < op_num; ++i) {
-        oss << "op" << i << "_0:" << res.profile_time(i * 2) << " ";
-        oss << "op" << i << "_1:" << res.profile_time(i * 2 + 1) << " ";
-      }
-    }
-    oss << "postpro_0:" << postprocess_start << " ";
-    oss << "postpro_1:" << postprocess_end;
-    fprintf(stderr, "%s\n", oss.str().c_str());
-  }
-  _api.thrd_clear();
-  return 0;
-}
 int PredictorClient::numpy_predict(
     const std::vector<std::vector<py::array_t<float>>> &float_feed_batch,
     const std::vector<std::string> &float_feed_name,
     const std::vector<std::vector<int>> &float_shape,
+    const std::vector<std::vector<int>> &float_lod_slot_batch,
     const std::vector<std::vector<py::array_t<int64_t>>> &int_feed_batch,
     const std::vector<std::string> &int_feed_name,
     const std::vector<std::vector<int>> &int_shape,
+    const std::vector<std::vector<int>> &int_lod_slot_batch,
     const std::vector<std::string> &fetch_name,
     PredictorRes &predict_res_batch,
-    const int &pid) {
+    const int &pid,
+    const uint64_t log_id) {
   int batch_size = std::max(float_feed_batch.size(), int_feed_batch.size());
   VLOG(2) << "batch size: " << batch_size;
   predict_res_batch.clear();
@@ -362,6 +167,7 @@ int PredictorClient::numpy_predict(
   VLOG(2) << "int feed name size: " << int_feed_name.size();
   VLOG(2) << "max body size : " << brpc::fLU64::FLAGS_max_body_size;
   Request req;
+  req.set_log_id(log_id);
   for (auto &name : fetch_name) {
     req.add_fetch_var_names(name);
   }
@@ -394,6 +200,9 @@ int PredictorClient::numpy_predict(
       for (uint32_t j = 0; j < float_shape[vec_idx].size(); ++j) {
         tensor->add_shape(float_shape[vec_idx][j]);
       }
+      for (uint32_t j = 0; j < float_lod_slot_batch[vec_idx].size(); ++j) {
+        tensor->add_lod(float_lod_slot_batch[vec_idx][j]);
+      }
       tensor->set_elem_type(1);
       const int float_shape_size = float_shape[vec_idx].size();
       switch (float_shape_size) {
@@ -448,12 +257,22 @@ int PredictorClient::numpy_predict(
     for (auto &name : int_feed_name) {
       int idx = _feed_name_to_idx[name];
       Tensor *tensor = tensor_vec[idx];
-      VLOG(2) << "prepare int feed " << name << " shape size "
-              << int_shape[vec_idx].size();
       for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) {
         tensor->add_shape(int_shape[vec_idx][j]);
       }
-      tensor->set_elem_type(0);
+      for (uint32_t j = 0; j < int_lod_slot_batch[vec_idx].size(); ++j) {
+        tensor->add_lod(int_lod_slot_batch[vec_idx][j]);
+      }
+      tensor->set_elem_type(_type[idx]);
+      if (_type[idx] == 0) {
+        VLOG(2) << "prepare int feed " << name << " shape size "
+                << int_shape[vec_idx].size();
+      } else {
+        VLOG(2) << "prepare int32 feed " << name << " shape size "
+                << int_shape[vec_idx].size();
+      }
      const int int_shape_size = int_shape[vec_idx].size();
      switch (int_shape_size) {
@@ -463,7 +282,11 @@ int PredictorClient::numpy_predict(
         for (ssize_t j = 0; j < int_array.shape(1); j++) {
           for (ssize_t k = 0; k < int_array.shape(2); k++) {
             for (ssize_t l = 0; k < int_array.shape(3); l++) {
+              if (_type[idx] == 0) {
                 tensor->add_int64_data(int_array(i, j, k, l));
+              } else {
+                tensor->add_int_data(int_array(i, j, k, l));
+              }
             }
           }
         }
@@ -475,7 +298,11 @@ int PredictorClient::numpy_predict(
       for (ssize_t i = 0; i < int_array.shape(0); i++) {
         for (ssize_t j = 0; j < int_array.shape(1); j++) {
           for (ssize_t k = 0; k < int_array.shape(2); k++) {
+            if (_type[idx] == 0) {
               tensor->add_int64_data(int_array(i, j, k));
+            } else {
+              tensor->add_int_data(int_array(i, j, k));
+            }
           }
         }
       }
@@ -485,7 +312,11 @@ int PredictorClient::numpy_predict(
       auto int_array = int_feed[vec_idx].unchecked<2>();
       for (ssize_t i = 0; i < int_array.shape(0); i++) {
         for (ssize_t j = 0; j < int_array.shape(1); j++) {
+          if (_type[idx] == 0) {
             tensor->add_int64_data(int_array(i, j));
+          } else {
+            tensor->add_int_data(int_array(i, j));
+          }
         }
       }
       break;
@@ -493,7 +324,11 @@ int PredictorClient::numpy_predict(
     case 1: {
       auto int_array = int_feed[vec_idx].unchecked<1>();
       for (ssize_t i = 0; i < int_array.shape(0); i++) {
+        if (_type[idx] == 0) {
           tensor->add_int64_data(int_array(i));
+        } else {
+          tensor->add_int_data(int_array(i));
+        }
       }
       break;
     }
@@ -563,23 +398,23 @@ int PredictorClient::numpy_predict(
       for (auto &name : fetch_name) {
        // int idx = _fetch_name_to_idx[name];
        if (_fetch_name_to_type[name] == 0) {
-          VLOG(2) << "ferch var " << name << "type int";
+          VLOG(2) << "ferch var " << name << "type int64";
-          model._int64_value_map[name].resize(
-              output.insts(0).tensor_array(idx).int64_data_size());
          int size = output.insts(0).tensor_array(idx).int64_data_size();
-          for (int i = 0; i < size; ++i) {
-            model._int64_value_map[name][i] =
-                output.insts(0).tensor_array(idx).int64_data(i);
-          }
-        } else {
+          model._int64_value_map[name] = std::vector<int64_t>(
+              output.insts(0).tensor_array(idx).int64_data().begin(),
+              output.insts(0).tensor_array(idx).int64_data().begin() + size);
+        } else if (_fetch_name_to_type[name] == 1) {
          VLOG(2) << "fetch var " << name << "type float";
-          model._float_value_map[name].resize(
-              output.insts(0).tensor_array(idx).float_data_size());
          int size = output.insts(0).tensor_array(idx).float_data_size();
-          for (int i = 0; i < size; ++i) {
-            model._float_value_map[name][i] =
-                output.insts(0).tensor_array(idx).float_data(i);
-          }
+          model._float_value_map[name] = std::vector<float>(
+              output.insts(0).tensor_array(idx).float_data().begin(),
+              output.insts(0).tensor_array(idx).float_data().begin() + size);
+        } else if (_fetch_name_to_type[name] == 2) {
+          VLOG(2) << "fetch var " << name << "type int32";
+          int size = output.insts(0).tensor_array(idx).int_data_size();
+          model._int32_value_map[name] = std::vector<int32_t>(
+              output.insts(0).tensor_array(idx).int_data().begin(),
+              output.insts(0).tensor_array(idx).int_data().begin() + size);
        }
        idx += 1;
      }
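The three branches above imply a fixed mapping from fetch-type codes to tensor fields; summarized as a small reference sketch (the mapping is inferred from the branches, not stated elsewhere in the diff):

```python
# Fetch-type codes handled above, inferred from the branches:
#   0 -> int64  values arrive in tensor.int64_data
#   1 -> float  values arrive in tensor.float_data
#   2 -> int32  values arrive in tensor.int_data (newly handled by this commit)
FETCH_TYPE_TO_FIELD = {0: "int64_data", 1: "float_data", 2: "int_data"}
```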
@@ -613,7 +448,6 @@ int PredictorClient::numpy_predict(
   _api.thrd_clear();
   return 0;
 }
 }  // namespace general_model
 }  // namespace paddle_serving
 }  // namespace baidu
@@ -95,52 +95,34 @@ PYBIND11_MODULE(serving_client, m) {
            [](PredictorClient &self) { self.create_predictor(); })
       .def("destroy_predictor",
            [](PredictorClient &self) { self.destroy_predictor(); })
-      .def("batch_predict",
-           [](PredictorClient &self,
-              const std::vector<std::vector<std::vector<float>>>
-                  &float_feed_batch,
-              const std::vector<std::string> &float_feed_name,
-              const std::vector<std::vector<int>> &float_shape,
-              const std::vector<std::vector<std::vector<int64_t>>>
-                  &int_feed_batch,
-              const std::vector<std::string> &int_feed_name,
-              const std::vector<std::vector<int>> &int_shape,
-              const std::vector<std::string> &fetch_name,
-              PredictorRes &predict_res_batch,
-              const int &pid) {
-             return self.batch_predict(float_feed_batch,
-                                       float_feed_name,
-                                       float_shape,
-                                       int_feed_batch,
-                                       int_feed_name,
-                                       int_shape,
-                                       fetch_name,
-                                       predict_res_batch,
-                                       pid);
-           },
-           py::call_guard<py::gil_scoped_release>())
       .def("numpy_predict",
           [](PredictorClient &self,
              const std::vector<std::vector<py::array_t<float>>>
                  &float_feed_batch,
             const std::vector<std::string> &float_feed_name,
             const std::vector<std::vector<int>> &float_shape,
+             const std::vector<std::vector<int>> &float_lod_slot_batch,
             const std::vector<std::vector<py::array_t<int64_t>>>
                 &int_feed_batch,
             const std::vector<std::string> &int_feed_name,
             const std::vector<std::vector<int>> &int_shape,
+             const std::vector<std::vector<int>> &int_lod_slot_batch,
             const std::vector<std::string> &fetch_name,
             PredictorRes &predict_res_batch,
-             const int &pid) {
+             const int &pid,
+             const uint64_t log_id) {
            return self.numpy_predict(float_feed_batch,
                                      float_feed_name,
                                      float_shape,
+                                      float_lod_slot_batch,
                                      int_feed_batch,
                                      int_feed_name,
                                      int_shape,
+                                      int_lod_slot_batch,
                                      fetch_name,
                                      predict_res_batch,
-                                      pid);
+                                      pid,
+                                      log_id);
           },
           py::call_guard<py::gil_scoped_release>());
 }
......
@@ -9,7 +9,7 @@ endif()
 target_include_directories(serving PUBLIC
 ${CMAKE_CURRENT_BINARY_DIR}/../../core/predictor
 )
+include_directories(${CUDNN_ROOT}/include/)
 if(WITH_GPU)
 target_link_libraries(serving -Wl,--whole-archive fluid_gpu_engine
 -Wl,--no-whole-archive)
@@ -29,7 +29,11 @@
 endif()
 if(WITH_MKL OR WITH_GPU)
+if (WITH_TRT)
+target_link_libraries(serving -liomp5 -lmklml_intel -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
+else()
 target_link_libraries(serving -liomp5 -lmklml_intel -lmkldnn -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
+endif()
 else()
 target_link_libraries(serving openblas -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
 endif()
......
@@ -45,36 +45,41 @@ int GeneralCopyOp::inference() {
   const std::string pre_name = pre_node_names[0];
   const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
-  VLOG(2) << "precedent name: " << pre_name;
+  uint64_t log_id = input_blob->GetLogId();
+  VLOG(2) << "(logid=" << log_id << ") precedent name: " << pre_name;
   const TensorVector *in = &input_blob->tensor_vector;
-  VLOG(2) << "input size: " << in->size();
+  VLOG(2) << "(logid=" << log_id << ") input size: " << in->size();
   int batch_size = input_blob->GetBatchSize();
   int input_var_num = 0;
   GeneralBlob *res = mutable_data<GeneralBlob>();
+  res->SetLogId(log_id);
   TensorVector *out = &res->tensor_vector;
-  VLOG(2) << "input batch size: " << batch_size;
+  VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
   res->SetBatchSize(batch_size);
   if (!res) {
-    LOG(ERROR) << "Failed get op tls reader object output";
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed get op tls reader object output";
   }
   Timer timeline;
   int64_t start = timeline.TimeStampUS();
-  VLOG(2) << "Going to init lod tensor";
+  VLOG(2) << "(logid=" << log_id << ") Going to init lod tensor";
   for (int i = 0; i < in->size(); ++i) {
     paddle::PaddleTensor lod_tensor;
     CopyLod(&in->at(i), &lod_tensor);
     lod_tensor.dtype = in->at(i).dtype;
     lod_tensor.name = in->at(i).name;
-    VLOG(2) << "lod tensor [" << i << "].name = " << lod_tensor.name;
+    VLOG(2) << "(logid=" << log_id << ") lod tensor [" << i
+            << "].name = " << lod_tensor.name;
     out->push_back(lod_tensor);
   }
-  VLOG(2) << "pack done.";
+  VLOG(2) << "(logid=" << log_id << ") pack done.";
   for (int i = 0; i < out->size(); ++i) {
     int64_t *src_ptr = static_cast<int64_t *>(in->at(i).data.data());
@@ -86,7 +91,7 @@ int GeneralCopyOp::inference() {
     }
   }
-  VLOG(2) << "output done.";
+  VLOG(2) << "(logid=" << log_id << ") output done.";
   timeline.Pause();
   int64_t end = timeline.TimeStampUS();
@@ -94,7 +99,7 @@ int GeneralCopyOp::inference() {
   AddBlobInfo(res, start);
   AddBlobInfo(res, end);
-  VLOG(2) << "read data from client success";
+  VLOG(2) << "(logid=" << log_id << ") read data from client success";
   return 0;
 }
......
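`GeneralCopyOp` above shows the log-id plumbing pattern this commit repeats in every op: read the id from the input blob, stamp it onto the output blob before doing any work, and prefix each log line with `(logid=...)` so one request can be traced across the whole op graph. A self-contained sketch of just that pattern, with a simplified `Blob` standing in for the real `GeneralBlob` and op classes:

```cpp
#include <cstdint>
#include <iostream>

struct Blob {  // stand-in for GeneralBlob
  uint64_t log_id = 0;
  uint64_t GetLogId() const { return log_id; }
  void SetLogId(uint64_t id) { log_id = id; }
};

int RunOp(const Blob& input, Blob* output) {
  uint64_t log_id = input.GetLogId();
  output->SetLogId(log_id);  // propagate before any early return
  std::cout << "(logid=" << log_id << ") op start\n";
  // ... the op's actual work would run here ...
  std::cout << "(logid=" << log_id << ") op done\n";
  return 0;
}

int main() {
  Blob in, out;
  in.SetLogId(42);           // normally set from the request's log_id
  return RunOp(in, &out);
}
```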
@@ -13,20 +13,12 @@
 // limitations under the License.
 #pragma once
-#include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include <string>
+#include <vector>
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
 #include "core/predictor/framework/resource.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -50,18 +50,20 @@ int GeneralDistKVInferOp::inference() {
   const std::string pre_name = pre_node_names[0];
   const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
-  VLOG(2) << "Get precedent op name: " << pre_name;
+  uint64_t log_id = input_blob->GetLogId();
+  VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name;
   GeneralBlob *output_blob = mutable_data<GeneralBlob>();
   if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name;
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed mutable depended argument, op:" << pre_name;
     return -1;
   }
   const TensorVector *in = &input_blob->tensor_vector;
   TensorVector *out = &output_blob->tensor_vector;
   int batch_size = input_blob->GetBatchSize();
-  VLOG(2) << "input batch size: " << batch_size;
+  VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
   std::vector<uint64_t> keys;
   std::vector<rec::mcube::CubeValue> values;
   int sparse_count = 0;
@@ -90,16 +92,20 @@ int GeneralDistKVInferOp::inference() {
                 keys.begin() + key_idx);
     key_idx += dataptr_size_pairs[i].second;
   }
+  Timer timeline;
+  int64_t cube_start = timeline.TimeStampUS();
+  timeline.Start();
   rec::mcube::CubeAPI *cube = rec::mcube::CubeAPI::instance();
   std::vector<std::string> table_names = cube->get_table_names();
   if (table_names.size() == 0) {
-    LOG(ERROR) << "cube init error or cube config not given.";
+    LOG(ERROR) << "(logid=" << log_id
+               << ") cube init error or cube config not given.";
     return -1;
   }
   int ret = cube->seek(table_names[0], keys, &values);
+  int64_t cube_end = timeline.TimeStampUS();
   if (values.size() != keys.size() || values[0].buff.size() == 0) {
-    LOG(ERROR) << "cube value return null";
+    LOG(ERROR) << "(logid=" << log_id << ") cube value return null";
   }
   size_t EMBEDDING_SIZE = values[0].buff.size() / sizeof(float);
   TensorVector sparse_out;
@@ -150,21 +156,23 @@ int GeneralDistKVInferOp::inference() {
   infer_in.insert(infer_in.end(), sparse_out.begin(), sparse_out.end());
   output_blob->SetBatchSize(batch_size);
+  output_blob->SetLogId(log_id);
-  VLOG(2) << "infer batch size: " << batch_size;
+  VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
-  Timer timeline;
   int64_t start = timeline.TimeStampUS();
-  timeline.Start();
   if (InferManager::instance().infer(
           engine_name().c_str(), &infer_in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name();
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name();
     return -1;
   }
   int64_t end = timeline.TimeStampUS();
   CopyBlobInfo(input_blob, output_blob);
+  AddBlobInfo(output_blob, cube_start);
+  AddBlobInfo(output_blob, cube_end);
   AddBlobInfo(output_blob, start);
   AddBlobInfo(output_blob, end);
   return 0;
......
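The dist-KV op above now brackets the cube (distributed KV) lookup with its own timestamps and records them via `AddBlobInfo` ahead of the inference timestamps, so the profile separates embedding-fetch time from model time. A standalone sketch of that measurement layout, using `std::chrono` as a stand-in for the project's `Timer`:

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Stand-in for Timer::TimeStampUS(): microseconds on a monotonic clock.
static int64_t TimeStampUS() {
  using namespace std::chrono;
  return duration_cast<microseconds>(
             steady_clock::now().time_since_epoch()).count();
}

int main() {
  std::vector<int64_t> profile;  // stands in for the blob's AddBlobInfo records
  int64_t cube_start = TimeStampUS();
  // ... cube->seek(table, keys, &values) would run here ...
  int64_t cube_end = TimeStampUS();
  int64_t infer_start = TimeStampUS();
  // ... InferManager::instance().infer(...) would run here ...
  int64_t infer_end = TimeStampUS();
  // Cube timestamps are appended before the infer timestamps, as above.
  profile.insert(profile.end(),
                 {cube_start, cube_end, infer_start, infer_end});
  return profile.size() == 4 ? 0 : 1;
}
```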
@@ -15,17 +15,9 @@
 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -59,10 +59,13 @@ int GeneralDistKVQuantInferOp::inference() {
     return -1;
   }
+  uint64_t log_id = input_blob->GetLogId();
+  output_blob->SetLogId(log_id);
   const TensorVector *in = &input_blob->tensor_vector;
   TensorVector *out = &output_blob->tensor_vector;
   int batch_size = input_blob->GetBatchSize();
-  VLOG(2) << "input batch size: " << batch_size;
+  VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
   std::vector<uint64_t> keys;
   std::vector<rec::mcube::CubeValue> values;
   int sparse_count = 0;
@@ -94,13 +97,14 @@ int GeneralDistKVQuantInferOp::inference() {
   rec::mcube::CubeAPI *cube = rec::mcube::CubeAPI::instance();
   std::vector<std::string> table_names = cube->get_table_names();
   if (table_names.size() == 0) {
-    LOG(ERROR) << "cube init error or cube config not given.";
+    LOG(ERROR) << "(logid=" << log_id
+               << ") cube init error or cube config not given.";
     return -1;
   }
   int ret = cube->seek(table_names[0], keys, &values);
   if (values.size() != keys.size() || values[0].buff.size() == 0) {
-    LOG(ERROR) << "cube value return null";
+    LOG(ERROR) << "(logid=" << log_id << ") cube value return null";
   }
   TensorVector sparse_out;
@@ -182,7 +186,7 @@ int GeneralDistKVQuantInferOp::inference() {
   output_blob->SetBatchSize(batch_size);
-  VLOG(2) << "infer batch size: " << batch_size;
+  VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
   Timer timeline;
   int64_t start = timeline.TimeStampUS();
@@ -190,7 +194,8 @@ int GeneralDistKVQuantInferOp::inference() {
   if (InferManager::instance().infer(
           engine_name().c_str(), &infer_in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name();
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name();
     return -1;
   }
......
@@ -15,17 +15,9 @@
 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -15,17 +15,9 @@
 #pragma once
 #include <string.h>
-#include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
 #include "paddle_inference_api.h"  // NOLINT
-#endif
+#include <string>
 namespace baidu {
 namespace paddle_serving {
@@ -35,6 +27,7 @@ struct GeneralBlob {
   std::vector<paddle::PaddleTensor> tensor_vector;
   int64_t time_stamp[20];
   int p_size = 0;
+  uint64_t _log_id = -1;  // for logging
   int _batch_size;
@@ -46,9 +39,11 @@ struct GeneralBlob {
     tensor_vector.clear();
   }
-  int SetBatchSize(int batch_size) { _batch_size = batch_size; }
+  void SetBatchSize(int batch_size) { _batch_size = batch_size; }
+  void SetLogId(uint64_t log_id) { _log_id = log_id; }
   int GetBatchSize() const { return _batch_size; }
+  uint64_t GetLogId() const { return _log_id; }
   std::string ShortDebugString() const { return "Not implemented!"; }
 };
......
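Two details of the `GeneralBlob` change above are easy to miss: the old `SetBatchSize` was declared `int` but returned nothing, which is undefined behavior in C++, and the new `_log_id` is unsigned, so its `-1` default is really `UINT64_MAX`, acting as an "unset" sentinel. A compilable illustration with a simplified stand-in struct:

```cpp
#include <cstdint>

struct BlobSketch {  // stand-in for GeneralBlob
  int _batch_size = 0;
  uint64_t _log_id = -1;  // wraps to UINT64_MAX: "no id assigned" sentinel

  // Before: int SetBatchSize(int b) { _batch_size = b; }  // UB: no return
  void SetBatchSize(int b) { _batch_size = b; }  // after: well-defined
  void SetLogId(uint64_t id) { _log_id = id; }
  int GetBatchSize() const { return _batch_size; }
  uint64_t GetLogId() const { return _log_id; }
};

int main() {
  BlobSketch b;
  b.SetBatchSize(8);
  b.SetLogId(1001);
  return (b.GetBatchSize() == 8 && b.GetLogId() == 1001) ? 0 : 1;
}
```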
@@ -47,22 +47,26 @@ int GeneralInferOp::inference() {
   const std::string pre_name = pre_node_names[0];
   const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
-  VLOG(2) << "Get precedent op name: " << pre_name;
+  uint64_t log_id = input_blob->GetLogId();
+  VLOG(2) << "(logid=" << log_id << ") Get precedent op name: " << pre_name;
   GeneralBlob *output_blob = mutable_data<GeneralBlob>();
+  output_blob->SetLogId(log_id);
   if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name;
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed mutable depended argument, op:" << pre_name;
     return -1;
   }
   const TensorVector *in = &input_blob->tensor_vector;
   TensorVector *out = &output_blob->tensor_vector;
-  int batch_size = input_blob->GetBatchSize();
-  VLOG(2) << "input batch size: " << batch_size;
-  output_blob->SetBatchSize(batch_size);
-  VLOG(2) << "infer batch size: " << batch_size;
+  int batch_size = input_blob->_batch_size;
+  VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
+  output_blob->_batch_size = batch_size;
+  VLOG(2) << "(logid=" << log_id << ") infer batch size: " << batch_size;
   Timer timeline;
   int64_t start = timeline.TimeStampUS();
@@ -70,7 +74,8 @@ int GeneralInferOp::inference() {
   if (InferManager::instance().infer(
           engine_name().c_str(), in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name().c_str();
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed do infer in fluid model: " << engine_name().c_str();
     return -1;
   }
......
@@ -15,17 +15,9 @@
 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -37,9 +37,9 @@ int conf_check(const Request *req,
               const std::shared_ptr<PaddleGeneralModelConfig> &model_config) {
   int var_num = req->insts(0).tensor_array_size();
   if (var_num != model_config->_feed_type.size()) {
-    VLOG(2) << "var num: " << var_num;
-    VLOG(2) << "model config var num: " << model_config->_feed_type.size();
-    LOG(ERROR) << "feed var number not match.";
+    LOG(ERROR) << "feed var number not match: model config["
+               << model_config->_feed_type.size() << "] vs. actual[" << var_num
+               << "]";
     return -1;
   }
@@ -72,8 +72,7 @@
 int GeneralReaderOp::inference() {
   // reade request from client
   const Request *req = dynamic_cast<const Request *>(get_request_message());
+  uint64_t log_id = req->log_id();
-  int batch_size = req->insts_size();
   int input_var_num = 0;
   std::vector<int64_t> elem_type;
   std::vector<int64_t> elem_size;
@@ -82,26 +81,29 @@ int GeneralReaderOp::inference() {
   GeneralBlob *res = mutable_data<GeneralBlob>();
   TensorVector *out = &res->tensor_vector;
-  res->SetBatchSize(batch_size);
+  res->SetLogId(log_id);
   if (!res) {
-    LOG(ERROR) << "Failed get op tls reader object output";
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed get op tls reader object output";
   }
   Timer timeline;
   int64_t start = timeline.TimeStampUS();
   int var_num = req->insts(0).tensor_array_size();
-  VLOG(2) << "var num: " << var_num;
+  VLOG(2) << "(logid=" << log_id << ") var num: " << var_num;
-  VLOG(2) << "start to call load general model_conf op";
+  VLOG(2) << "(logid=" << log_id
+          << ") start to call load general model_conf op";
   baidu::paddle_serving::predictor::Resource &resource =
       baidu::paddle_serving::predictor::Resource::instance();
-  VLOG(2) << "get resource pointer done.";
+  VLOG(2) << "(logid=" << log_id << ") get resource pointer done.";
   std::shared_ptr<PaddleGeneralModelConfig> model_config =
       resource.get_general_model_config();
-  VLOG(2) << "print general model config done.";
+  VLOG(2) << "(logid=" << log_id << ") print general model config done.";
   // TODO(guru4elephant): how to do conditional check?
   /*
@@ -117,7 +119,6 @@ int GeneralReaderOp::inference() {
   elem_type.resize(var_num);
   elem_size.resize(var_num);
   capacity.resize(var_num);
   // prepare basic information for input
   for (int i = 0; i < var_num; ++i) {
     paddle::PaddleTensor lod_tensor;
@@ -126,47 +127,64 @@ int GeneralReaderOp::inference() {
     if (elem_type[i] == 0) {  // int64
       elem_size[i] = sizeof(int64_t);
       lod_tensor.dtype = paddle::PaddleDType::INT64;
-    } else {
+    } else if (elem_type[i] == 1) {
       elem_size[i] = sizeof(float);
       lod_tensor.dtype = paddle::PaddleDType::FLOAT32;
+    } else if (elem_type[i] == 2) {
+      elem_size[i] = sizeof(int32_t);
+      lod_tensor.dtype = paddle::PaddleDType::INT32;
     }
+    // implement lod tensor here
-    if (model_config->_is_lod_feed[i]) {
+    if (req->insts(0).tensor_array(i).lod_size() > 0) {
+      VLOG(2) << "(logid=" << log_id << ") var[" << i << "] is lod_tensor";
       lod_tensor.lod.resize(1);
-      lod_tensor.lod[0].push_back(0);
-      VLOG(2) << "var[" << i << "] is lod_tensor";
+      for (int k = 0; k < req->insts(0).tensor_array(i).lod_size(); ++k) {
+        lod_tensor.lod[0].push_back(req->insts(0).tensor_array(i).lod(k));
+      }
+      capacity[i] = 1;
+      for (int k = 0; k < req->insts(0).tensor_array(i).shape_size(); ++k) {
+        int dim = req->insts(0).tensor_array(i).shape(k);
+        VLOG(2) << "(logid=" << log_id << ") shape for var[" << i
+                << "]: " << dim;
+        capacity[i] *= dim;
+        lod_tensor.shape.push_back(dim);
+      }
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
+              << "] is tensor, capacity: " << capacity[i];
     } else {
-      lod_tensor.shape.push_back(batch_size);
       capacity[i] = 1;
       for (int k = 0; k < req->insts(0).tensor_array(i).shape_size(); ++k) {
         int dim = req->insts(0).tensor_array(i).shape(k);
-        VLOG(2) << "shape for var[" << i << "]: " << dim;
+        VLOG(2) << "(logid=" << log_id << ") shape for var[" << i
+                << "]: " << dim;
        capacity[i] *= dim;
        lod_tensor.shape.push_back(dim);
      }
-      VLOG(2) << "var[" << i << "] is tensor, capacity: " << capacity[i];
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
+              << "] is tensor, capacity: " << capacity[i];
    }
    lod_tensor.name = model_config->_feed_name[i];
    out->push_back(lod_tensor);
  }
  // specify the memory needed for output tensor_vector
  for (int i = 0; i < var_num; ++i) {
    if (out->at(i).lod.size() == 1) {
      int tensor_size = 0;
-      for (int j = 0; j < batch_size; ++j) {
-        const Tensor &tensor = req->insts(j).tensor_array(i);
+      const Tensor &tensor = req->insts(0).tensor_array(i);
      int data_len = 0;
      if (tensor.int64_data_size() > 0) {
        data_len = tensor.int64_data_size();
-      } else {
+      } else if (tensor.float_data_size() > 0) {
        data_len = tensor.float_data_size();
+      } else if (tensor.int_data_size() > 0) {
+        data_len = tensor.int_data_size();
      }
-      VLOG(2) << "tensor size for var[" << i << "]: " << data_len;
+      VLOG(2) << "(logid=" << log_id << ") tensor size for var[" << i
+              << "]: " << data_len;
      tensor_size += data_len;
      int cur_len = out->at(i).lod[0].back();
-      VLOG(2) << "current len: " << cur_len;
+      VLOG(2) << "(logid=" << log_id << ") current len: " << cur_len;
      int sample_len = 0;
      if (tensor.shape_size() == 1) {
@@ -174,23 +192,14 @@ int GeneralReaderOp::inference() {
      } else {
        sample_len = tensor.shape(0);
      }
-      out->at(i).lod[0].push_back(cur_len + sample_len);
-      VLOG(2) << "new len: " << cur_len + sample_len;
-      }
+      VLOG(2) << "(logid=" << log_id << ") new len: " << cur_len + sample_len;
      out->at(i).data.Resize(tensor_size * elem_size[i]);
-      out->at(i).shape = {out->at(i).lod[0].back()};
-      for (int j = 1; j < req->insts(0).tensor_array(i).shape_size(); ++j) {
-        out->at(i).shape.push_back(req->insts(0).tensor_array(i).shape(j));
-      }
-      if (out->at(i).shape.size() == 1) {
-        out->at(i).shape.push_back(1);
-      }
-      VLOG(2) << "var[" << i
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
              << "] is lod_tensor and len=" << out->at(i).lod[0].back();
    } else {
-      out->at(i).data.Resize(batch_size * capacity[i] * elem_size[i]);
-      VLOG(2) << "var[" << i
-             << "] is tensor and capacity=" << batch_size * capacity[i];
+      out->at(i).data.Resize(capacity[i] * elem_size[i]);
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
+             << "] is tensor and capacity=" << capacity[i];
    }
  }
@@ -198,44 +207,43 @@ int GeneralReaderOp::inference() {
  for (int i = 0; i < var_num; ++i) {
    if (elem_type[i] == 0) {
      int64_t *dst_ptr = static_cast<int64_t *>(out->at(i).data.data());
+      VLOG(2) << "(logid=" << log_id << ") first element data in var[" << i
+              << "] is " << req->insts(0).tensor_array(i).int64_data(0);
      int offset = 0;
-      for (int j = 0; j < batch_size; ++j) {
-        int elem_num = req->insts(j).tensor_array(i).int64_data_size();
+      int elem_num = req->insts(0).tensor_array(i).int64_data_size();
      for (int k = 0; k < elem_num; ++k) {
-        dst_ptr[offset + k] = req->insts(j).tensor_array(i).int64_data(k);
-      }
-        if (out->at(i).lod.size() == 1) {
-          offset = out->at(i).lod[0][j + 1];
-        } else {
-          offset += capacity[i];
-        }
+        dst_ptr[offset + k] = req->insts(0).tensor_array(i).int64_data(k);
      }
-    } else {
+    } else if (elem_type[i] == 1) {
      float *dst_ptr = static_cast<float *>(out->at(i).data.data());
+      VLOG(2) << "(logid=" << log_id << ") first element data in var[" << i
+              << "] is " << req->insts(0).tensor_array(i).float_data(0);
      int offset = 0;
-      for (int j = 0; j < batch_size; ++j) {
-        int elem_num = req->insts(j).tensor_array(i).float_data_size();
+      int elem_num = req->insts(0).tensor_array(i).float_data_size();
      for (int k = 0; k < elem_num; ++k) {
-        dst_ptr[offset + k] = req->insts(j).tensor_array(i).float_data(k);
-      }
-        if (out->at(i).lod.size() == 1) {
-          offset = out->at(i).lod[0][j + 1];
-        } else {
-          offset += capacity[i];
-        }
+        dst_ptr[offset + k] = req->insts(0).tensor_array(i).float_data(k);
      }
+    } else if (elem_type[i] == 2) {
+      int32_t *dst_ptr = static_cast<int32_t *>(out->at(i).data.data());
+      VLOG(2) << "(logid=" << log_id << ") first element data in var[" << i
+              << "] is " << req->insts(0).tensor_array(i).int_data(0);
+      int offset = 0;
+      int elem_num = req->insts(0).tensor_array(i).int_data_size();
+      for (int k = 0; k < elem_num; ++k) {
+        dst_ptr[offset + k] = req->insts(0).tensor_array(i).int_data(k);
+      }
    }
  }
-  VLOG(2) << "output size: " << out->size();
+  VLOG(2) << "(logid=" << log_id << ") output size: " << out->size();
  timeline.Pause();
  int64_t end = timeline.TimeStampUS();
  res->p_size = 0;
+  res->_batch_size = 1;
  AddBlobInfo(res, start);
  AddBlobInfo(res, end);
-  VLOG(2) << "read data from client success";
+  VLOG(2) << "(logid=" << log_id << ") read data from client success";
  return 0;
 }
 DEFINE_OP(GeneralReaderOp);
......
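The reader above replaces the old two-way int64/float branch with a three-way `elem_type` dispatch (0 = int64, 1 = float32, and the newly supported 2 = int32) and sizes the output buffer from the matching element width. A simplified, self-contained sketch of that dispatch, with plain vectors standing in for the protobuf tensor and the `PaddleTensor` buffer:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the elem_type codes used by the reader op above.
enum ElemType { kInt64 = 0, kFloat32 = 1, kInt32 = 2 };

size_t ElemSize(int elem_type) {
  switch (elem_type) {
    case kInt64:   return sizeof(int64_t);
    case kFloat32: return sizeof(float);
    case kInt32:   return sizeof(int32_t);
    default:       return 0;  // unknown type: caller should reject
  }
}

int main() {
  // Stands in for req->insts(0).tensor_array(i).int_data()
  std::vector<int32_t> src = {1, 2, 3};
  // Stands in for out->at(i).data.Resize(capacity * elem_size)
  std::vector<char> dst(src.size() * ElemSize(kInt32));
  std::memcpy(dst.data(), src.data(), dst.size());
  return ElemSize(kInt32) == 4 ? 0 : 1;
}
```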
@@ -13,21 +13,13 @@
 // limitations under the License.
 #pragma once
-#include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include <string>
+#include <vector>
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/load_general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
 #include "core/predictor/framework/resource.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -42,6 +42,9 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
 int GeneralResponseOp::inference() {
   const std::vector<std::string> pre_node_names = pre_names();
   VLOG(2) << "pre node names size: " << pre_node_names.size();
+  const GeneralBlob *input_blob;
+  uint64_t log_id =
+      get_depend_argument<GeneralBlob>(pre_node_names[0])->GetLogId();
   const Request *req = dynamic_cast<const Request *>(get_request_message());
   // response inst with only fetch_var_names
@@ -52,15 +55,17 @@ int GeneralResponseOp::inference() {
   // timeline.Start();
   int64_t start = timeline.TimeStampUS();
-  VLOG(2) << "start to call load general model_conf op";
+  VLOG(2) << "(logid=" << log_id
+          << ") start to call load general model_conf op";
   baidu::paddle_serving::predictor::Resource &resource =
       baidu::paddle_serving::predictor::Resource::instance();
-  VLOG(2) << "get resource pointer done.";
+  VLOG(2) << "(logid=" << log_id << ") get resource pointer done.";
   std::shared_ptr<PaddleGeneralModelConfig> model_config =
       resource.get_general_model_config();
-  VLOG(2) << "max body size : " << brpc::fLU64::FLAGS_max_body_size;
+  VLOG(2) << "(logid=" << log_id
+          << ") max body size : " << brpc::fLU64::FLAGS_max_body_size;
   std::vector<int> fetch_index;
   fetch_index.resize(req->fetch_var_names_size());
@@ -69,16 +74,16 @@ int GeneralResponseOp::inference() {
         model_config->_fetch_alias_name_to_index[req->fetch_var_names(i)];
   }
-  const GeneralBlob *input_blob;
   for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
     const std::string &pre_name = pre_node_names[pi];
-    VLOG(2) << "pre names[" << pi << "]: " << pre_name << " ("
-            << pre_node_names.size() << ")";
+    VLOG(2) << "(logid=" << log_id << ") pre names[" << pi << "]: " << pre_name
+            << " (" << pre_node_names.size() << ")";
     input_blob = get_depend_argument<GeneralBlob>(pre_name);
     // fprintf(stderr, "input(%s) blob address %x\n", pre_names.c_str(),
     // input_blob);
     if (!input_blob) {
-      LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name;
+      LOG(ERROR) << "(logid=" << log_id
+                 << ") Failed mutable depended argument, op: " << pre_name;
       return -1;
     }
@@ -91,19 +96,20 @@ int GeneralResponseOp::inference() {
     for (auto &idx : fetch_index) {
       Tensor *tensor = fetch_inst->add_tensor_array();
-      tensor->set_elem_type(1);
       if (model_config->_is_lod_fetch[idx]) {
-        VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
-                << " is lod_tensor";
+        VLOG(2) << "(logid=" << log_id << ") out[" << idx << "] "
+                << model_config->_fetch_name[idx] << " is lod_tensor";
         for (int k = 0; k < in->at(idx).shape.size(); ++k) {
-          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
+          VLOG(2) << "(logid=" << log_id << ") shape[" << k
+                  << "]: " << in->at(idx).shape[k];
          tensor->add_shape(in->at(idx).shape[k]);
        }
      } else {
-        VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
-                << " is tensor";
+        VLOG(2) << "(logid=" << log_id << ") out[" << idx << "] "
+                << model_config->_fetch_name[idx] << " is tensor";
        for (int k = 0; k < in->at(idx).shape.size(); ++k) {
-          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
+          VLOG(2) << "(logid=" << log_id << ") shape[" << k
+                  << "]: " << in->at(idx).shape[k];
          tensor->add_shape(in->at(idx).shape[k]);
        }
      }
@@ -115,51 +121,53 @@ int GeneralResponseOp::inference() {
      for (int j = 0; j < in->at(idx).shape.size(); ++j) {
        cap *= in->at(idx).shape[j];
      }
-      if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
-        VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
-                << "].";
-        int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
-        if (model_config->_is_lod_fetch[idx]) {
-          FetchInst *fetch_p = output->mutable_insts(0);
-          for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_lod(
-                in->at(idx).lod[0][j]);
-          }
-          for (int j = 0; j < cap; ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
-          }
-        } else {
-          FetchInst *fetch_p = output->mutable_insts(0);
-          for (int j = 0; j < cap; ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
-          }
-        }
-        VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
-        var_idx++;
-      } else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
-        VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
-                << "].";
-        float *data_ptr = static_cast<float *>(in->at(idx).data.data());
-        if (model_config->_is_lod_fetch[idx]) {
-          FetchInst *fetch_p = output->mutable_insts(0);
-          for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_lod(
-                in->at(idx).lod[0][j]);
-          }
-          for (int j = 0; j < cap; ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
-          }
-        } else {
-          FetchInst *fetch_p = output->mutable_insts(0);
-          for (int j = 0; j < cap; ++j) {
-            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
-          }
-        }
-        VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
+      FetchInst *fetch_p = output->mutable_insts(0);
+      auto dtype = in->at(idx).dtype;
+      if (dtype == paddle::PaddleDType::INT64) {
+        VLOG(2) << "(logid=" << log_id << ") Prepare int64 var ["
+                << model_config->_fetch_name[idx] << "].";
+        int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
+        // from
+        // https://stackoverflow.com/questions/15499641/copy-a-stdvector-to-a-repeated-field-from-protobuf-with-memcpy
+        // `Swap` method is faster than `{}` method.
+        google::protobuf::RepeatedField<int64_t> tmp_data(data_ptr,
+                                                          data_ptr + cap);
+        fetch_p->mutable_tensor_array(var_idx)->mutable_int64_data()->Swap(
+            &tmp_data);
+      } else if (dtype == paddle::PaddleDType::FLOAT32) {
+        VLOG(2) << "(logid=" << log_id << ") Prepare float var ["
+                << model_config->_fetch_name[idx] << "].";
+        float *data_ptr = static_cast<float *>(in->at(idx).data.data());
+        google::protobuf::RepeatedField<float> tmp_data(data_ptr,
+                                                        data_ptr + cap);
+        fetch_p->mutable_tensor_array(var_idx)->mutable_float_data()->Swap(
+            &tmp_data);
+      } else if (dtype == paddle::PaddleDType::INT32) {
+        VLOG(2) << "(logid=" << log_id << ") Prepare int32 var ["
+                << model_config->_fetch_name[idx] << "].";
+        int32_t *data_ptr = static_cast<int32_t *>(in->at(idx).data.data());
+        google::protobuf::RepeatedField<int32_t> tmp_data(data_ptr,
+                                                          data_ptr + cap);
+        fetch_p->mutable_tensor_array(var_idx)->mutable_int_data()->Swap(
+            &tmp_data);
+      }
+      if (model_config->_is_lod_fetch[idx]) {
+        if (in->at(idx).lod.size() > 0) {
+          for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_lod(
+                in->at(idx).lod[0][j]);
+          }
+        }
+      }
+      VLOG(2) << "(logid=" << log_id << ") fetch var ["
+              << model_config->_fetch_name[idx] << "] ready";
      var_idx++;
    }
  }
+  }
  if (req->profile_server()) {
    int64_t end = timeline.TimeStampUS();
@@ -169,7 +177,8 @@ int GeneralResponseOp::inference() {
  // a more elegant way.
  for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
    input_blob = get_depend_argument<GeneralBlob>(pre_node_names[pi]);
-    VLOG(2) << "p size for input blob: " << input_blob->p_size;
+    VLOG(2) << "(logid=" << log_id
+            << ") p size for input blob: " << input_blob->p_size;
    int profile_time_idx = -1;
    if (pi == 0) {
      profile_time_idx = 0;
......
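The response op above stops appending fetch results element by element and instead builds a `google::protobuf::RepeatedField` straight from the raw output buffer and `Swap()`s it into the message; the in-diff comment cites the Stack Overflow discussion for why `Swap` beats per-element adds. A standalone demonstration of that move (requires libprotobuf):

```cpp
#include <google/protobuf/repeated_field.h>
#include <vector>

int main() {
  std::vector<float> buf = {0.5f, 1.5f, 2.5f};  // inference output buffer
  float* data_ptr = buf.data();

  // Bulk path: one range-construction, then an O(1) pointer swap, instead
  // of cap separate add_float_data() calls that may reallocate repeatedly.
  google::protobuf::RepeatedField<float> tmp_data(data_ptr,
                                                  data_ptr + buf.size());
  google::protobuf::RepeatedField<float> field;  // e.g. tensor's float_data
  field.Swap(&tmp_data);

  return (field.size() == 3 && field.Get(2) == 2.5f) ? 0 : 1;
}
```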
@@ -15,16 +15,8 @@
 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -35,6 +35,7 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
 int GeneralTextReaderOp::inference() {
   // reade request from client
   const Request *req = dynamic_cast<const Request *>(get_request_message());
+  uint64_t log_id = req->log_id();
   int batch_size = req->insts_size();
   int input_var_num = 0;
@@ -44,16 +45,18 @@ int GeneralTextReaderOp::inference() {
   std::vector<int64_t> capacity;
   GeneralBlob *res = mutable_data<GeneralBlob>();
-  TensorVector *out = &res->tensor_vector;
-  res->SetBatchSize(batch_size);
   if (!res) {
-    LOG(ERROR) << "Failed get op tls reader object output";
+    LOG(ERROR) << "(logid=" << log_id
+               << ") Failed get op tls reader object output";
   }
+  TensorVector *out = &res->tensor_vector;
+  res->SetBatchSize(batch_size);
+  res->SetLogId(log_id);
   if (batch_size <= 0) {
-    LOG(ERROR) << "Batch size < 0";
+    LOG(ERROR) << "(logid=" << log_id << ") Batch size < 0";
     return -1;
   }
@@ -61,17 +64,18 @@ int GeneralTextReaderOp::inference() {
   int64_t start = timeline.TimeStampUS();
   int var_num = req->insts(0).tensor_array_size();
-  VLOG(2) << "var num: " << var_num;
+  VLOG(2) << "(logid=" << log_id << ") var num: " << var_num;
-  VLOG(2) << "start to call load general model_conf op";
+  VLOG(2) << "(logid=" << log_id
+          << ") start to call load general model_conf op";
   baidu::paddle_serving::predictor::Resource &resource =
       baidu::paddle_serving::predictor::Resource::instance();
-  VLOG(2) << "get resource pointer done.";
+  VLOG(2) << "(logid=" << log_id << ") get resource pointer done.";
   std::shared_ptr<PaddleGeneralModelConfig> model_config =
       resource.get_general_model_config();
-  VLOG(2) << "print general model config done.";
+  VLOG(2) << "(logid=" << log_id << ") print general model config done.";
   elem_type.resize(var_num);
   elem_size.resize(var_num);
@@ -79,7 +83,8 @@ int GeneralTextReaderOp::inference() {
   for (int i = 0; i < var_num; ++i) {
     paddle::PaddleTensor lod_tensor;
     elem_type[i] = req->insts(0).tensor_array(i).elem_type();
-    VLOG(2) << "var[" << i << "] has elem type: " << elem_type[i];
+    VLOG(2) << "(logid=" << log_id << ") var[" << i
+            << "] has elem type: " << elem_type[i];
     if (elem_type[i] == 0) {  // int64
       elem_size[i] = sizeof(int64_t);
       lod_tensor.dtype = paddle::PaddleDType::INT64;
@@ -91,17 +96,19 @@ int GeneralTextReaderOp::inference() {
     if (req->insts(0).tensor_array(i).shape(0) == -1) {
       lod_tensor.lod.resize(1);
       lod_tensor.lod[0].push_back(0);
-      VLOG(2) << "var[" << i << "] is lod_tensor";
+      VLOG(2) << "(logid=" << log_id << ") var[" << i << "] is lod_tensor";
     } else {
       lod_tensor.shape.push_back(batch_size);
       capacity[i] = 1;
       for (int k = 0; k < req->insts(0).tensor_array(i).shape_size(); ++k) {
         int dim = req->insts(0).tensor_array(i).shape(k);
-        VLOG(2) << "shape for var[" << i << "]: " << dim;
+        VLOG(2) << "(logid=" << log_id << ") shape for var[" << i
+                << "]: " << dim;
         capacity[i] *= dim;
         lod_tensor.shape.push_back(dim);
       }
-      VLOG(2) << "var[" << i << "] is tensor, capacity: " << capacity[i];
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
+              << "] is tensor, capacity: " << capacity[i];
     }
     lod_tensor.name = model_config->_feed_name[i];
     out->push_back(lod_tensor);
@@ -117,11 +124,11 @@ int GeneralTextReaderOp::inference() {
      }
      out->at(i).data.Resize(out->at(i).lod[0].back() * elem_size[i]);
      out->at(i).shape = {out->at(i).lod[0].back(), 1};
-      VLOG(2) << "var[" << i
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
             << "] is lod_tensor and len=" << out->at(i).lod[0].back();
    } else {
      out->at(i).data.Resize(batch_size * capacity[i] * elem_size[i]);
-      VLOG(2) << "var[" << i
+      VLOG(2) << "(logid=" << log_id << ") var[" << i
             << "] is tensor and capacity=" << batch_size * capacity[i];
    }
  }
@@ -163,7 +170,7 @@ int GeneralTextReaderOp::inference() {
  AddBlobInfo(res, start);
  AddBlobInfo(res, end);
-  VLOG(2) << "read data from client success";
+  VLOG(2) << "(logid=" << log_id << ") read data from client success";
  return 0;
 }
 DEFINE_OP(GeneralTextReaderOp);
......
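Unlike `GeneralReaderOp`, the text reader above still iterates the batch and grows `lod[0]` as it goes: `lod[0]` holds cumulative offsets starting at 0, so `lod[0].back()` is simultaneously the running total length and the next write offset. A minimal sketch of that bookkeeping, with plain vectors standing in for the lod tensor:

```cpp
#include <vector>

int main() {
  std::vector<size_t> lod = {0};             // lod_tensor.lod[0] starts at 0
  std::vector<int> sample_lens = {3, 5, 2};  // per-instance sequence lengths

  for (int len : sample_lens) {
    lod.push_back(lod.back() + len);         // cumulative offsets: 0,3,8,10
  }

  // lod.back() * elem_size is exactly the buffer size to Resize(), and
  // lod[j] / lod[j + 1] bound instance j's slice of the flat buffer.
  return lod.back() == 10 ? 0 : 1;
}
```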
@@ -13,21 +13,13 @@
 // limitations under the License.
 #pragma once
-#include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include <string>
+#include <vector>
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/load_general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
 #include "core/predictor/framework/resource.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -40,6 +40,9 @@ int GeneralTextResponseOp::inference() {
   VLOG(2) << "Going to run inference";
   const std::vector<std::string> pre_node_names = pre_names();
   VLOG(2) << "pre node names size: " << pre_node_names.size();
+  const GeneralBlob *input_blob;
+  uint64_t log_id =
+      get_depend_argument<GeneralBlob>(pre_node_names[0])->GetLogId();
   const Request *req = dynamic_cast<const Request *>(get_request_message());
   // response inst with only fetch_var_names
@@ -48,11 +51,12 @@ int GeneralTextResponseOp::inference() {
   Timer timeline;
   int64_t start = timeline.TimeStampUS();
-  VLOG(2) << "start to call load general model_conf op";
+  VLOG(2) << "(logid=" << log_id
+          << ") start to call load general model_conf op";
   baidu::paddle_serving::predictor::Resource &resource =
       baidu::paddle_serving::predictor::Resource::instance();
-  VLOG(2) << "get resource pointer done.";
+  VLOG(2) << "(logid=" << log_id << ") get resource pointer done.";
   std::shared_ptr<PaddleGeneralModelConfig> model_config =
       resource.get_general_model_config();
@@ -63,20 +67,20 @@ int GeneralTextResponseOp::inference() {
         model_config->_fetch_alias_name_to_index[req->fetch_var_names(i)];
   }
-  const GeneralBlob *input_blob;
   for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
     const std::string &pre_name = pre_node_names[pi];
-    VLOG(2) << "pre names[" << pi << "]: " << pre_name << " ("
-            << pre_node_names.size() << ")";
+    VLOG(2) << "(logid=" << log_id << ") pre names[" << pi << "]: " << pre_name
+            << " (" << pre_node_names.size() << ")";
     input_blob = get_depend_argument<GeneralBlob>(pre_name);
     if (!input_blob) {
-      LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name;
+      LOG(ERROR) << "(logid=" << log_id
+                 << ") Failed mutable depended argument, op: " << pre_name;
       return -1;
     }
     const TensorVector *in = &input_blob->tensor_vector;
     int batch_size = input_blob->GetBatchSize();
-    VLOG(2) << "input batch size: " << batch_size;
+    VLOG(2) << "(logid=" << log_id << ") input batch size: " << batch_size;
     ModelOutput *output = res->add_outputs();
     output->set_engine_name(
@@ -88,12 +92,13 @@ int GeneralTextResponseOp::inference() {
       // currently only response float tensor or lod_tensor
       tensor->set_elem_type(1);
       if (model_config->_is_lod_fetch[idx]) {
-        VLOG(2) << "out[" << idx << " is lod_tensor";
+        VLOG(2) << "(logid=" << log_id << ") out[" << idx << " is lod_tensor";
         tensor->add_shape(-1);
       } else {
-        VLOG(2) << "out[" << idx << "] is tensor";
+        VLOG(2) << "(logid=" << log_id << ") out[" << idx << "] is tensor";
         for (int k = 1; k < in->at(idx).shape.size(); ++k) {
-          VLOG(2) << "shape[" << k - 1 << "]: " << in->at(idx).shape[k];
+          VLOG(2) << "(logid=" << log_id << ") shape[" << k - 1
+                  << "]: " << in->at(idx).shape[k];
           tensor->add_shape(in->at(idx).shape[k]);
         }
       }
@@ -137,7 +142,8 @@ int GeneralTextResponseOp::inference() {
   // a more elegant way.
   for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
     input_blob = get_depend_argument<GeneralBlob>(pre_node_names[pi]);
-    VLOG(2) << "p size for input blob: " << input_blob->p_size;
+    VLOG(2) << "(logid=" << log_id
+            << ") p size for input blob: " << input_blob->p_size;
     int profile_time_idx = -1;
     if (pi == 0) {
       profile_time_idx = 0;
......
@@ -15,17 +15,9 @@
 #pragma once
 #include <string>
 #include <vector>
-#ifdef BCLOUD
-#ifdef WITH_GPU
-#include "paddle/paddle_inference_api.h"
-#else
-#include "paddle/fluid/inference/api/paddle_inference_api.h"
-#endif
-#else
-#include "paddle_inference_api.h"  // NOLINT
-#endif
 #include "core/general-server/general_model_service.pb.h"
 #include "core/general-server/op/general_infer_helper.h"
+#include "paddle_inference_api.h"  // NOLINT
 namespace baidu {
 namespace paddle_serving {
......
@@ -37,6 +37,7 @@ message Request {
   repeated FeedInst insts = 1;
   repeated string fetch_var_names = 2;
   optional bool profile_server = 3 [ default = false ];
+  required uint64 log_id = 4 [ default = 0 ];
 };
 message Response {
......
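For the new `log_id` field, protoc's generated C++ `Request` class gains `log_id()`/`set_log_id()` accessors, which is what the server ops above call. `StandInRequest` below mimics just that generated surface so the example stays self-contained; the real class comes from the generated `general_model_service.pb.h`:

```cpp
#include <cstdint>

class StandInRequest {  // mimics the protoc-generated Request accessors
 public:
  void set_log_id(uint64_t v) { log_id_ = v; }
  uint64_t log_id() const { return log_id_; }

 private:
  uint64_t log_id_ = 0;  // matches [ default = 0 ] in the .proto
};

int main() {
  StandInRequest req;
  req.set_log_id(20201013);  // client stamps the request once
  // Server-side ops read it back and thread it through every blob/log.
  return req.log_id() == 20201013 ? 0 : 1;
}
```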
@@ -21,6 +21,7 @@ option cc_generic_services = true;
 message RequestAndResponse {
   required int32 a = 1;
   required float b = 2;
+  required uint64 log_id = 3 [ default = 0 ];
 };
 service LoadGeneralModelService {
......
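The code-generator patch below makes every emitted service method pull `log_id` off the request, attach it to the brpc controller, and prefix its notice logs with it. A self-contained sketch of that emitted shape; `Controller` and `Request` here are stand-ins for the brpc and protobuf types, so the sketch compiles without either library:

```cpp
#include <cstdint>
#include <iostream>

struct Controller {  // stands in for brpc::Controller
  uint64_t log_id = 0;
  void set_log_id(uint64_t v) { log_id = v; }
};

struct Request {  // stands in for the generated request message
  uint64_t log_id() const { return 7; }
};

void Handle(Controller* cntl, const Request& request) {
  uint64_t log_id = request.log_id();
  cntl->set_log_id(log_id);  // lets the RPC framework carry the same id
  std::cout << "(logid=" << log_id << ") service_name=[Infer]\n";
  // The generated body then calls svr->inference(request, response, log_id)
  // and prefixes its warning/notice logs with the same (logid=...) tag.
}

int main() {
  Controller cntl;
  Handle(&cntl, Request{});
  return cntl.log_id == 7 ? 0 : 1;
}
```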
@@ -280,25 +280,29 @@ class PdsCodeGenerator : public CodeGenerator {
      " baidu::rpc::ClosureGuard done_guard(done);\n"
      " baidu::rpc::Controller* cntl = \n"
      " static_cast<baidu::rpc::Controller*>(cntl_base);\n"
+      " uint64_t log_id = request->log_id();\n"
+      " cntl->set_log_id(log_id);\n"
      " ::baidu::paddle_serving::predictor::InferService* svr = \n"
      " "
      "::baidu::paddle_serving::predictor::InferServiceManager::instance("
      ").item(\"$service$\");\n"
      " if (svr == NULL) {\n"
-      " LOG(ERROR) << \"Not found service: $service$\";\n"
+      " LOG(ERROR) << \"(logid=\" << log_id << \") Not found service: "
+      "$service$\";\n"
      " cntl->SetFailed(404, \"Not found service: $service$\");\n"
      " return ;\n"
      " }\n"
-      " LOG(INFO) << \" remote_side=\[\" << cntl->remote_side() << "  // NOLINT
-      "\"\]\";\n"
-      " LOG(INFO) << \" local_side=\[\" << cntl->local_side() << "  // NOLINT
-      "\"\]\";\n"
-      " LOG(INFO) << \" service_name=\[\" << \"$name$\" << \"\]\";\n"  // NOLINT
-      " LOG(INFO) << \" log_id=\[\" << cntl->log_id() << \"\]\";\n"  // NOLINT
-      " int err_code = svr->inference(request, response);\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") remote_side=\[\" "  // NOLINT
+      "<< cntl->remote_side() << \"\]\";\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") local_side=\[\" "  // NOLINT
+      "<< cntl->local_side() << \"\]\";\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") service_name=\[\" "  // NOLINT
+      "<< \"$name$\" << \"\]\";\n"
+      " int err_code = svr->inference(request, response, log_id);\n"
      " if (err_code != 0) {\n"
      " LOG(WARNING)\n"
-      " << \"Failed call inferservice[$name$], name[$service$]\"\n"
+      " << \"(logid=\" << log_id << \") Failed call "
+      "inferservice[$name$], name[$service$]\"\n"
      " << \", error_code: \" << err_code;\n"
      " cntl->SetFailed(err_code, \"InferService inference "
      "failed!\");\n"
@@ -306,7 +310,8 @@ class PdsCodeGenerator : public CodeGenerator {
      " gettimeofday(&tv, NULL);\n"
      " long end = tv.tv_sec * 1000000 + tv.tv_usec;\n"
      " // flush notice log\n"
-      " LOG(INFO) << \" tc=\[\" << (end - start) << \"\]\";\n",  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") tc=\[\" << (end - "  // NOLINT
+      "start) << \"\]\";\n",  // NOLINT
      "name",
      class_name,
      "service",
@@ -317,26 +322,31 @@ class PdsCodeGenerator : public CodeGenerator {
      " baidu::rpc::ClosureGuard done_guard(done);\n"
      " baidu::rpc::Controller* cntl = \n"
      " static_cast<baidu::rpc::Controller*>(cntl_base);\n"
+      " uint64_t log_id = request->log_id();\n"
+      " cntl->set_log_id(log_id);\n"
      " ::baidu::paddle_serving::predictor::InferService* svr = \n"
      " "
      "::baidu::paddle_serving::predictor::InferServiceManager::instance("
      ").item(\"$service$\");\n"
      " if (svr == NULL) {\n"
-      " LOG(ERROR) << \"Not found service: $service$\";\n"
+      " LOG(ERROR) << \"(logid=\" << log_id << \") Not found service: "
+      "$service$\";\n"
      " cntl->SetFailed(404, \"Not found service: $service$\");\n"
      " return ;\n"
      " }\n"
-      " LOG(INFO) << \" remote_side=\[\" << cntl->remote_side() << "  // NOLINT
-      "\"\]\";\n"
-      " LOG(INFO) << \" local_side=\[\" << cntl->local_side() << "  // NOLINT
-      "\"\]\";\n"
-      " LOG(INFO) << \" service_name=\[\" << \"$name$\" << \"\]\";\n"  // NOLINT
-      " LOG(INFO) << \" log_id=\[\" << cntl->log_id() << \"\]\";\n"  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") remote_side=\[\" "  // NOLINT
+      "<< cntl->remote_side() << \"\]\";\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") local_side=\[\" "  // NOLINT
+      "<< cntl->local_side() << \"\]\";\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") service_name=\[\" "  // NOLINT
+      "<< \"$name$\" << \"\]\";\n"
      " butil::IOBufBuilder debug_os;\n"
-      " int err_code = svr->inference(request, response, &debug_os);\n"
+      " int err_code = svr->inference(request, response, log_id, "
+      "&debug_os);\n"
      " if (err_code != 0) {\n"
      " LOG(WARNING)\n"
-      " << \"Failed call inferservice[$name$], name[$service$]\"\n"
+      " << \"(logid=\" << log_id << \") Failed call "
+      "inferservice[$name$], name[$service$]\"\n"
      " << \", error_code: \" << err_code;\n"
      " cntl->SetFailed(err_code, \"InferService inference "
      "failed!\");\n"
@@ -345,9 +355,11 @@ class PdsCodeGenerator : public CodeGenerator {
      " gettimeofday(&tv, NULL);\n"
      " long end = tv.tv_sec * 1000000 + tv.tv_usec;\n"
      " // flush notice log\n"
-      " LOG(INFO) << \" tc=\[\" << (end - start) << \"\]\";\n"  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") tc=\[\" << (end - "  // NOLINT
+      "start) << \"\]\";\n"
      " LOG(INFO)\n"
-      " << \"TC=[\" << (end - start) << \"] Received debug "
+      " << \"(logid=\" << log_id << \") TC=[\" << (end - start) << "
+      "\"] Received debug "
      "request[log_id=\" << cntl->log_id()\n"
      " << \"] from \" << cntl->remote_side()\n"
      " << \" to \" << cntl->local_side();\n",
@@ -1011,25 +1023,31 @@ class PdsCodeGenerator : public CodeGenerator {
      " brpc::ClosureGuard done_guard(done);\n"
      " brpc::Controller* cntl = \n"
      " static_cast<brpc::Controller*>(cntl_base);\n"
+      " uint64_t log_id = request->log_id();\n"
+      " cntl->set_log_id(log_id);\n"
      " ::baidu::paddle_serving::predictor::InferService* svr = \n"
      " "
      "::baidu::paddle_serving::predictor::InferServiceManager::instance("
      ").item(\"$service$\");\n"
      " if (svr == NULL) {\n"
-      " LOG(ERROR) << \"Not found service: $service$\";\n"
+      " LOG(ERROR) << \"(logid=\" << log_id << \") Not found service: "
+      "$service$\";\n"
      " cntl->SetFailed(404, \"Not found service: $service$\");\n"
      " return ;\n"
      " }\n"
-      " LOG(INFO) << \" remote_side=\[\" << cntl->remote_side() << "  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") "
+      "remote_side=\[\" << cntl->remote_side() << "  // NOLINT
      "\"\]\";\n"
-      " LOG(INFO) << \" local_side=\[\" << cntl->local_side() << "  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") "
+      "local_side=\[\" << cntl->local_side() << "  // NOLINT
      "\"\]\";\n"
-      " LOG(INFO) << \" service_name=\[\" << \"$name$\" << \"\]\";\n"  // NOLINT
-      " LOG(INFO) << \" log_id=\[\" << cntl->log_id() << \"\]\";\n"  // NOLINT
-      " int err_code = svr->inference(request, response);\n"
+      " LOG(INFO) << \"(logid=\" << log_id << \") "
+      "service_name=\[\" << \"$name$\" << \"\]\";\n"  // NOLINT
+      " int err_code = svr->inference(request, response, log_id);\n"
      " if (err_code != 0) {\n"
      " LOG(WARNING)\n"
-      " << \"Failed call inferservice[$name$], name[$service$]\"\n"
+      " << \"(logid=\" << log_id << \") Failed call "
+      "inferservice[$name$], name[$service$]\"\n"
      " << \", error_code: \" << err_code;\n"
      " cntl->SetFailed(err_code, \"InferService inference "
      "failed!\");\n"
@@ -1037,7 +1055,8 @@ class PdsCodeGenerator : public CodeGenerator {
      " gettimeofday(&tv, NULL);\n"
      " long end = tv.tv_sec * 1000000 + tv.tv_usec;\n"
      " // flush notice log\n"
-      " LOG(INFO) << \" tc=\[\" << (end - start) << \"\]\";\n",  // NOLINT
+      " LOG(INFO) << \"(logid=\" << log_id << \") tc=\[\" << (end - "  // NOLINT
+      "start) << \"\]\";\n",  // NOLINT
      "name",
      class_name,
      "service",
@@ -1048,26 +1067,31 @@ class PdsCodeGenerator : public CodeGenerator {
      " brpc::ClosureGuard done_guard(done);\n"
      " brpc::Controller* cntl = \n"
      " static_cast<brpc::Controller*>(cntl_base);\n"
+      " uint64_t log_id = request->log_id();\n"
+      " cntl->set_log_id(log_id);\n"
      " ::baidu::paddle_serving::predictor::InferService* svr = \n"
      " "
      "::baidu::paddle_serving::predictor::InferServiceManager::instance("
      ").item(\"$service$\");\n"
      " if (svr == NULL) {\n"
" LOG(ERROR) << \"Not found service: $service$\";\n" " LOG(ERROR) << \"(logid=\" << log_id << \") Not found service: "
"$service$\";\n"
" cntl->SetFailed(404, \"Not found service: $service$\");\n" " cntl->SetFailed(404, \"Not found service: $service$\");\n"
" return ;\n" " return ;\n"
" }\n" " }\n"
" LOG(INFO) << \" remote_side=\[\" << cntl->remote_side() << " // NOLINT " LOG(INFO) << \"(logid=\" << log_id << \") remote_side=\[\" " // NOLINT
"\"\]\";\n" " << cntl->remote_side() << \"\]\";\n"
" LOG(INFO) << \" local_side=\[\" << cntl->local_side() << " // NOLINT " LOG(INFO) << \"(logid=\" << log_id << \") local_side=\[\" " // NOLINT
"\"\]\";\n" "<< cntl->local_side() << \"\]\";\n"
" LOG(INFO) << \" service_name=\[\" << \"$name$\" << \"\]\";\n" // NOLINT " LOG(INFO) << \"(logid=\" << log_id << \") service_name=\[\" " // NOLINT
" LOG(INFO) << \" log_id=\[\" << cntl->log_id() << \"\]\";\n" // NOLINT "<< \"$name$\" << \"\]\";\n"
" butil::IOBufBuilder debug_os;\n" " butil::IOBufBuilder debug_os;\n"
" int err_code = svr->inference(request, response, &debug_os);\n" " int err_code = svr->inference(request, response, log_id, "
"&debug_os);\n"
" if (err_code != 0) {\n" " if (err_code != 0) {\n"
" LOG(WARNING)\n" " LOG(WARNING)\n"
" << \"Failed call inferservice[$name$], name[$service$]\"\n" " << \"(logid=\" << log_id << \") Failed call "
"inferservice[$name$], name[$service$]\"\n"
" << \", error_code: \" << err_code;\n" " << \", error_code: \" << err_code;\n"
" cntl->SetFailed(err_code, \"InferService inference " " cntl->SetFailed(err_code, \"InferService inference "
"failed!\");\n" "failed!\");\n"
...@@ -1076,9 +1100,11 @@ class PdsCodeGenerator : public CodeGenerator { ...@@ -1076,9 +1100,11 @@ class PdsCodeGenerator : public CodeGenerator {
" gettimeofday(&tv, NULL);\n" " gettimeofday(&tv, NULL);\n"
" long end = tv.tv_sec * 1000000 + tv.tv_usec;\n" " long end = tv.tv_sec * 1000000 + tv.tv_usec;\n"
" // flush notice log\n" " // flush notice log\n"
" LOG(INFO) << \" tc=\[\" << (end - start) << \"\]\";\n" // NOLINT " LOG(INFO) << \"(logid=\" << log_id << \") tc=\[\" << (end - " // NOLINT
"start) << \"\]\";\n" // NOLINT
" LOG(INFO)\n" " LOG(INFO)\n"
" << \"TC=[\" << (end - start) << \"] Received debug " " << \"(logid=\" << log_id << \") TC=[\" << (end - start) << "
"\"] Received debug "
"request[log_id=\" << cntl->log_id()\n" "request[log_id=\" << cntl->log_id()\n"
" << \"] from \" << cntl->remote_side()\n" " << \"] from \" << cntl->remote_side()\n"
" << \" to \" << cntl->local_side();\n", " << \" to \" << cntl->local_side();\n",
......
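
Both generated handlers above time the whole call with a `gettimeofday` pair and log the elapsed microseconds as `tc=[...]`, now prefixed with the request's `log_id`. Below is a minimal, self-contained sketch of that timing pattern (illustrative only, not the generator's verbatim output; uses POSIX `gettimeofday`):

```cpp
#include <sys/time.h>  // gettimeofday

#include <cstdio>

int main() {
  struct timeval tv;
  gettimeofday(&tv, NULL);
  long start = tv.tv_sec * 1000000 + tv.tv_usec;  // wall clock, microseconds
  // ... the generated code calls svr->inference(request, response, log_id,
  //     &debug_os) here ...
  gettimeofday(&tv, NULL);
  long end = tv.tv_sec * 1000000 + tv.tv_usec;
  printf("tc=[%ld]\n", end - start);  // the real code logs this via LOG(INFO)
  return 0;
}
```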
...@@ -6,14 +6,16 @@ include(framework/CMakeLists.txt) ...@@ -6,14 +6,16 @@ include(framework/CMakeLists.txt)
include(tools/CMakeLists.txt) include(tools/CMakeLists.txt)
include(src/CMakeLists.txt) include(src/CMakeLists.txt)
add_definitions(-D__STDC_FORMAT_MACROS)
add_library(pdserving ${pdserving_srcs}) add_library(pdserving ${pdserving_srcs})
set_source_files_properties( set_source_files_properties(
${pdserving_srcs} ${pdserving_srcs}
PROPERTIES PROPERTIES
COMPILE_FLAGS "-Wno-strict-aliasing -Wno-unused-variable -Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor") COMPILE_FLAGS "-Wno-strict-aliasing -Wno-unused-variable -Wno-non-virtual-dtor -Wno-error=non-virtual-dtor -Wno-error=delete-non-virtual-dtor")
add_dependencies(pdserving protobuf boost brpc leveldb pdcodegen configure) add_dependencies(pdserving protobuf boost brpc leveldb pdcodegen configure)
if (WITH_TRT)
add_definitions(-DWITH_TRT)
endif()
target_link_libraries(pdserving target_link_libraries(pdserving
brpc protobuf boost leveldb configure -lpthread -lcrypto -lm -lrt -lssl -ldl -lz) brpc protobuf boost leveldb configure -lpthread -lcrypto -lm -lrt -lssl -ldl -lz)
......
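
The new `WITH_TRT` switch only injects a compile definition; the sources decide with `#ifdef` whether TensorRT codepaths are built. A hedged sketch of how any translation unit in this target can branch on it (compile with and without `-DWITH_TRT` to see both paths):

```cpp
#include <cstdio>

int main() {
#ifdef WITH_TRT  // defined via add_definitions(-DWITH_TRT) above
  std::puts("pdserving built with TensorRT support");
#else
  std::puts("pdserving built without TensorRT support");
#endif
  return 0;
}
```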
...@@ -50,7 +50,7 @@ ...@@ -50,7 +50,7 @@
#include "butil/time.h" #include "butil/time.h"
#endif #endif
#include "glog/raw_logging.h" #define ERROR_STRING_LEN 10240
#include "core/configure/general_model_config.pb.h" #include "core/configure/general_model_config.pb.h"
#include "core/configure/include/configure_parser.h" #include "core/configure/include/configure_parser.h"
......
...@@ -72,9 +72,10 @@ class Channel { ...@@ -72,9 +72,10 @@ class Channel {
const std::string& op() { return _op; } const std::string& op() { return _op; }
int share_to_bus(Bus* bus) { int share_to_bus(Bus* bus, const uint64_t log_id) {
if (bus->regist(_op, this) != 0) { if (bus->regist(_op, this) != 0) {
LOG(ERROR) << "Failed regist channel[" << _op << "] to bus!"; LOG(ERROR) << "(logid=" << log_id << ") Failed regist channel[" << _op
<< "] to bus!";
return -1; return -1;
} }
......
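
This `share_to_bus` change is the template for the whole patch: each layer now accepts a `const uint64_t log_id` and prefixes its log lines with `(logid=...)`, so a single request can be traced across every component it touches. A self-contained analogue, with `Bus` and `Channel` replaced by hypothetical stand-ins:

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Stand-in for Bus::regist; always succeeds here.
static int regist(const std::string& op, const uint64_t log_id) {
  std::cout << "(logid=" << log_id << ") regist channel[" << op << "]\n";
  return 0;
}

// Stand-in for Channel::share_to_bus: same control flow, same log prefix.
static int share_to_bus(const std::string& op, const uint64_t log_id) {
  if (regist(op, log_id) != 0) {
    std::cerr << "(logid=" << log_id << ") Failed regist channel[" << op
              << "] to bus!\n";
    return -1;
  }
  return 0;
}

int main() { return share_to_bus("general_infer_op", 42); }
```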
...@@ -155,13 +155,11 @@ int Dag::init(const configure::Workflow& conf, const std::string& name) { ...@@ -155,13 +155,11 @@ int Dag::init(const configure::Workflow& conf, const std::string& name) {
} }
if (FLAGS_el_log_level == 16) { if (FLAGS_el_log_level == 16) {
LOG(INFO) << "DAG: " << _dag_name; LOG(INFO) << "DAG: " << _dag_name << ", Op Num: " << _index_nodes.size();
LOG(INFO) << ", Op Num: " << _index_nodes.size();
for (uint32_t nid = 0; nid < _index_nodes.size(); nid++) { for (uint32_t nid = 0; nid < _index_nodes.size(); nid++) {
DagNode* node = _index_nodes[nid]; DagNode* node = _index_nodes[nid];
LOG(INFO) << ", OP-" << node->id << "-" << node->name << "-" LOG(INFO) << "OP-" << node->id << "-" << node->name << "-" << node->type
<< node->type; << " depends: " << node->depends.size();
LOG(INFO) << " depends: " << node->depends.size();
boost::unordered_map<std::string, EdgeMode>::iterator it; boost::unordered_map<std::string, EdgeMode>::iterator it;
for (it = node->depends.begin(); it != node->depends.end(); it++) { for (it = node->depends.begin(); it != node->depends.end(); it++) {
...@@ -214,7 +212,7 @@ int Dag::topo_sort() { ...@@ -214,7 +212,7 @@ int Dag::topo_sort() {
} }
} }
for (int i = 0; i < in_degree.size(); ++i) { for (int i = 0; i < in_degree.size(); ++i) {
LOG(INFO) << "(" << _index_nodes[i]->name << ") in_degree[" << i VLOG(2) << "(" << _index_nodes[i]->name << ") in_degree[" << i
<< "]: " << in_degree[i]; << "]: " << in_degree[i];
} }
int sorted_num = 0; int sorted_num = 0;
......
...@@ -26,7 +26,9 @@ namespace baidu { ...@@ -26,7 +26,9 @@ namespace baidu {
namespace paddle_serving { namespace paddle_serving {
namespace predictor { namespace predictor {
int DagView::init(Dag* dag, const std::string& service_name) { int DagView::init(Dag* dag,
const std::string& service_name,
const uint64_t log_id) {
_name = dag->name(); _name = dag->name();
_full_name = service_name + NAME_DELIMITER + dag->name(); _full_name = service_name + NAME_DELIMITER + dag->name();
_bus = butil::get_object<Bus>(); _bus = butil::get_object<Bus>();
...@@ -36,17 +38,20 @@ int DagView::init(Dag* dag, const std::string& service_name) { ...@@ -36,17 +38,20 @@ int DagView::init(Dag* dag, const std::string& service_name) {
for (uint32_t si = 0; si < stage_size; si++) { for (uint32_t si = 0; si < stage_size; si++) {
const DagStage* stage = dag->stage_by_index(si); const DagStage* stage = dag->stage_by_index(si);
if (stage == NULL) { if (stage == NULL) {
LOG(ERROR) << "Failed get stage by index:" << si; LOG(ERROR) << "(logid=" << log_id << ") Failed get stage by index:" << si;
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
ViewStage* vstage = butil::get_object<ViewStage>(); ViewStage* vstage = butil::get_object<ViewStage>();
if (vstage == NULL) { if (vstage == NULL) {
LOG(ERROR) << "Failed get vstage from object pool" LOG(ERROR) << "(logid=" << log_id
<< ") Failed get vstage from object pool"
<< "at:" << si; << "at:" << si;
return ERR_MEM_ALLOC_FAILURE; return ERR_MEM_ALLOC_FAILURE;
} }
VLOG(2) << "stage[" << si << "] name: " << stage->full_name; VLOG(2) << "(logid=" << log_id << ") stage[" << si
VLOG(2) << "stage[" << si << "] node size: " << stage->nodes.size(); << "] name: " << stage->full_name;
VLOG(2) << "(logid=" << log_id << ") stage[" << si
<< "] node size: " << stage->nodes.size();
vstage->full_name = service_name + NAME_DELIMITER + stage->full_name; vstage->full_name = service_name + NAME_DELIMITER + stage->full_name;
uint32_t node_size = stage->nodes.size(); uint32_t node_size = stage->nodes.size();
// create tls view node // create tls view node
...@@ -54,31 +59,39 @@ int DagView::init(Dag* dag, const std::string& service_name) { ...@@ -54,31 +59,39 @@ int DagView::init(Dag* dag, const std::string& service_name) {
DagNode* node = stage->nodes[ni]; DagNode* node = stage->nodes[ni];
ViewNode* vnode = butil::get_object<ViewNode>(); ViewNode* vnode = butil::get_object<ViewNode>();
if (vnode == NULL) { if (vnode == NULL) {
LOG(ERROR) << "Failed get vnode at:" << ni; LOG(ERROR) << "(logid=" << log_id << ") Failed get vnode at:" << ni;
return ERR_MEM_ALLOC_FAILURE; return ERR_MEM_ALLOC_FAILURE;
} }
// factory type // factory type
Op* op = OpRepository::instance().get_op(node->type); Op* op = OpRepository::instance().get_op(node->type);
if (op == NULL) { if (op == NULL) {
LOG(ERROR) << "Failed get op with type:" << node->type; LOG(ERROR) << "(logid=" << log_id
<< ") Failed get op with type:" << node->type;
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
// initialize a TLS op object // initialize a TLS op object
VLOG(2) << "dag view initialized: \n" VLOG(2) << "(logid=" << log_id << ") dag view initialized: \n"
<< "node id: " << node->id << "\n" << "node id: " << node->id << "\n"
<< "node name: " << node->name << "\n" << "node name: " << node->name << "\n"
<< "node type: " << node->type; << "node type: " << node->type;
if (op->init(_bus, dag, node->id, node->name, node->type, node->conf) != if (op->init(_bus,
0) { dag,
LOG(WARNING) << "Failed init op, type:" << node->type; node->id,
node->name,
node->type,
node->conf,
log_id) != 0) {
LOG(WARNING) << "(logid=" << log_id
<< ") Failed init op, type:" << node->type;
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
op->set_full_name(service_name + NAME_DELIMITER + node->full_name); op->set_full_name(service_name + NAME_DELIMITER + node->full_name);
// Set the name of the Op as the key of the matching engine. // Set the name of the Op as the key of the matching engine.
VLOG(2) << "op->set_engine_name(" << node->name.c_str() << ")"; VLOG(2) << "(logid=" << log_id << ") op->set_engine_name("
<< node->name.c_str() << ")";
op->set_engine_name(node->name); op->set_engine_name(node->name);
vnode->conf = node; vnode->conf = node;
...@@ -88,7 +101,7 @@ int DagView::init(Dag* dag, const std::string& service_name) { ...@@ -88,7 +101,7 @@ int DagView::init(Dag* dag, const std::string& service_name) {
it != vnode->conf->depends.end(); it != vnode->conf->depends.end();
++it) { ++it) {
std::string pre_node_name = it->first; std::string pre_node_name = it->first;
VLOG(2) << "add op pre name: \n" VLOG(2) << "(logid=" << log_id << ") add op pre name: \n"
<< "current op name: " << vnode->op->op_name() << "current op name: " << vnode->op->op_name()
<< ", previous op name: " << pre_node_name; << ", previous op name: " << pre_node_name;
vnode->op->add_pre_node_name(pre_node_name); vnode->op->add_pre_node_name(pre_node_name);
...@@ -102,7 +115,7 @@ int DagView::init(Dag* dag, const std::string& service_name) { ...@@ -102,7 +115,7 @@ int DagView::init(Dag* dag, const std::string& service_name) {
//<< " previous op name: " //<< " previous op name: "
//<< _view[si - 1]->nodes.back()->op->op_name(); //<< _view[si - 1]->nodes.back()->op->op_name();
// vstage->nodes.back()->op->set_pre_node_name( // vstage->nodes.back()->op->set_pre_node_name(
//_view[si - 1]->nodes.back()->op->op_name()); // _view[si - 1]->nodes.back()->op->op_name());
/*}*/ /*}*/
_view.push_back(vstage); _view.push_back(vstage);
} }
...@@ -133,14 +146,15 @@ int DagView::deinit() { ...@@ -133,14 +146,15 @@ int DagView::deinit() {
return ERR_OK; return ERR_OK;
} }
int DagView::execute(butil::IOBufBuilder* debug_os) { int DagView::execute(const uint64_t log_id, butil::IOBufBuilder* debug_os) {
uint32_t stage_size = _view.size(); uint32_t stage_size = _view.size();
for (uint32_t si = 0; si < stage_size; si++) { for (uint32_t si = 0; si < stage_size; si++) {
TRACEPRINTF("start to execute stage[%u]", si); TRACEPRINTF("(logid=%" PRIu64 ") start to execute stage[%u]", log_id, si);
int errcode = execute_one_stage(_view[si], debug_os); int errcode = execute_one_stage(_view[si], log_id, debug_os);
TRACEPRINTF("finish to execute stage[%u]", si); TRACEPRINTF("(logid=%" PRIu64 ") finish to execute stage[%u]", log_id, si);
if (errcode < 0) { if (errcode < 0) {
LOG(ERROR) << "failed execute stage[" << _view[si]->debug(); LOG(ERROR) << "(logid=" << log_id << ") Failed execute stage["
<< _view[si]->debug();
return errcode; return errcode;
} }
} }
...@@ -151,29 +165,34 @@ int DagView::execute(butil::IOBufBuilder* debug_os) { ...@@ -151,29 +165,34 @@ int DagView::execute(butil::IOBufBuilder* debug_os) {
// You can derive a subclass to implement this func. // You can derive a subclass to implement this func.
// ParallelDagView maybe the one you want. // ParallelDagView maybe the one you want.
int DagView::execute_one_stage(ViewStage* vstage, int DagView::execute_one_stage(ViewStage* vstage,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
butil::Timer stage_time(butil::Timer::STARTED); butil::Timer stage_time(butil::Timer::STARTED);
uint32_t node_size = vstage->nodes.size(); uint32_t node_size = vstage->nodes.size();
VLOG(2) << "vstage->nodes.size(): " << node_size; VLOG(2) << "(logid=" << log_id << ") vstage->nodes.size(): " << node_size;
for (uint32_t ni = 0; ni < node_size; ni++) { for (uint32_t ni = 0; ni < node_size; ni++) {
ViewNode* vnode = vstage->nodes[ni]; ViewNode* vnode = vstage->nodes[ni];
DagNode* conf = vnode->conf; DagNode* conf = vnode->conf;
Op* op = vnode->op; Op* op = vnode->op;
TRACEPRINTF("start to execute op[%s]", op->name()); TRACEPRINTF(
int errcode = op->process(debug_os != NULL); "(logid=%" PRIu64 ") start to execute op[%s]", log_id, op->name());
TRACEPRINTF("finish to execute op[%s]", op->name()); int errcode = op->process(log_id, debug_os != NULL);
TRACEPRINTF(
"(logid=%" PRIu64 ") finish to execute op[%s]", log_id, op->name());
if (errcode < 0) { if (errcode < 0) {
LOG(ERROR) << "Execute failed, Op:" << op->debug_string(); LOG(ERROR) << "(logid=" << log_id
<< ") Execute failed, Op:" << op->debug_string();
return errcode; return errcode;
} }
if (errcode > 0) { if (errcode > 0) {
LOG(INFO) << "Execute ignore, Op:" << op->debug_string(); LOG(INFO) << "(logid=" << log_id
<< ") Execute ignore, Op:" << op->debug_string();
continue; continue;
} }
if (debug_os) { if (debug_os) {
(*debug_os) << "{\"op_name\": \"" << op->name() (*debug_os) << "(logid=" << log_id << ") {\"op_name\": \"" << op->name()
<< "\", \"debug_str:\": \"" << op->debug_string() << "\", \"debug_str:\": \"" << op->debug_string()
<< "\", \"time_info\": \"" << op->time_info() << "\"}"; << "\", \"time_info\": \"" << op->time_info() << "\"}";
} }
...@@ -186,34 +205,34 @@ int DagView::execute_one_stage(ViewStage* vstage, ...@@ -186,34 +205,34 @@ int DagView::execute_one_stage(ViewStage* vstage,
return ERR_OK; return ERR_OK;
} }
int DagView::set_request_channel(Channel& request) { int DagView::set_request_channel(Channel& request, const uint64_t log_id) {
// Each workflow should get the very beginning // Each workflow should get the very beginning
// request (channel), and commit it to bus, for // request (channel), and commit it to bus, for
// the first stage ops consuming. // the first stage ops consuming.
request.share_to_bus(_bus); request.share_to_bus(_bus, log_id);
return ERR_OK; return ERR_OK;
} }
const Channel* DagView::get_response_channel() const { const Channel* DagView::get_response_channel(const uint64_t log_id) const {
// Caller obtains response channel from bus, and // Caller obtains response channel from bus, and
// writes it to rpc response(protbuf/json) // writes it to rpc response(protbuf/json)
if (_view.size() < 1) { if (_view.size() < 1) {
LOG(ERROR) << "invalid empty view stage!"; LOG(ERROR) << "(logid=" << log_id << ") invalid empty view stage!";
return NULL; return NULL;
} }
ViewStage* last_stage = _view[_view.size() - 1]; ViewStage* last_stage = _view[_view.size() - 1];
if (last_stage->nodes.size() != 1 || last_stage->nodes[0] == NULL) { if (last_stage->nodes.size() != 1 || last_stage->nodes[0] == NULL) {
LOG(ERROR) << "Invalid last stage, size[" << last_stage->nodes.size() LOG(ERROR) << "(logid=" << log_id << ") Invalid last stage, size["
<< "] != 1"; << last_stage->nodes.size() << "] != 1";
return NULL; return NULL;
} }
Op* last_op = last_stage->nodes[0]->op; Op* last_op = last_stage->nodes[0]->op;
if (last_op == NULL) { if (last_op == NULL) {
LOG(ERROR) << "Last op is NULL"; LOG(ERROR) << "(logid=" << log_id << ") Last op is NULL";
return NULL; return NULL;
} }
return last_op->mutable_channel(); return last_op->mutable_channel();
......
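
`DagView::execute` above walks the stages in order, traces each one under the request's `log_id`, and aborts at the first negative error code. A simplified, self-contained analogue of that loop, with stages modeled as plain callables rather than real `ViewStage`s:

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

int execute(const std::vector<std::function<int(uint64_t)>>& stages,
            const uint64_t log_id) {
  for (size_t si = 0; si < stages.size(); ++si) {
    std::cout << "(logid=" << log_id << ") start to execute stage[" << si
              << "]\n";
    int errcode = stages[si](log_id);
    if (errcode < 0) {
      std::cerr << "(logid=" << log_id << ") Failed execute stage[" << si
                << "]\n";
      return errcode;  // first failure aborts the remaining stages
    }
  }
  return 0;
}

int main() {
  std::vector<std::function<int(uint64_t)>> stages = {
      [](uint64_t) { return 0; },
      [](uint64_t) { return -1; }};  // second stage fails, loop stops there
  return execute(stages, 7) < 0 ? 1 : 0;
}
```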
...@@ -47,21 +47,22 @@ class DagView { ...@@ -47,21 +47,22 @@ class DagView {
~DagView() {} ~DagView() {}
int init(Dag* dag, const std::string& service_name); int init(Dag* dag, const std::string& service_name, const uint64_t log_id);
int deinit(); int deinit();
int execute(butil::IOBufBuilder* debug_os); int execute(const uint64_t log_id, butil::IOBufBuilder* debug_os);
// The default execution strategy is in sequencing // The default execution strategy is in sequencing
// You can derive a subclass to implement this func. // You can derive a subclass to implement this func.
// ParallelDagView maybe the one you want. // ParallelDagView maybe the one you want.
virtual int execute_one_stage(ViewStage* vstage, virtual int execute_one_stage(ViewStage* vstage,
const uint64_t log_id,
butil::IOBufBuilder* debug_os); butil::IOBufBuilder* debug_os);
int set_request_channel(Channel& request); // NOLINT int set_request_channel(Channel& request, const uint64_t log_id); // NOLINT
const Channel* get_response_channel() const; const Channel* get_response_channel(const uint64_t log_id) const;
const std::string& name() const { return _name; } const std::string& name() const { return _name; }
......
...@@ -17,7 +17,7 @@ ...@@ -17,7 +17,7 @@
#include <string> #include <string>
#include <utility> #include <utility>
#include "core/predictor/common/inner_common.h" #include "core/predictor/common/inner_common.h"
#include "glog/raw_logging.h"
namespace baidu { namespace baidu {
namespace paddle_serving { namespace paddle_serving {
namespace predictor { namespace predictor {
...@@ -28,7 +28,12 @@ namespace predictor { ...@@ -28,7 +28,12 @@ namespace predictor {
FactoryDerive<D, B>* factory = new (std::nothrow) FactoryDerive<D, B>(); \ FactoryDerive<D, B>* factory = new (std::nothrow) FactoryDerive<D, B>(); \
if (factory == NULL || \ if (factory == NULL || \
FactoryPool<B>::instance().register_factory(tag, factory) != 0) { \ FactoryPool<B>::instance().register_factory(tag, factory) != 0) { \
RAW_LOG_FATAL("Failed regist factory: %s in macro!", #D); \ char err_str[ERROR_STRING_LEN]; \
snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s in macro!", \
#D); \
RAW_LOG(FATAL, err_str); \
return -1; \ return -1; \
} \ } \
return 0; \ return 0; \
...@@ -54,7 +59,13 @@ namespace predictor { ...@@ -54,7 +59,13 @@ namespace predictor {
if (factory == NULL || \ if (factory == NULL || \
::baidu::paddle_serving::predictor::FactoryPool<B>::instance() \ ::baidu::paddle_serving::predictor::FactoryPool<B>::instance() \
.register_factory(#D, factory) != 0) { \ .register_factory(#D, factory) != 0) { \
RAW_LOG_FATAL("Failed regist factory: %s->%s in macro!", #D, #B); \ char err_str[ERROR_STRING_LEN]; \
snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->%s in macro!", \
#D, \
#B); \
RAW_LOG(FATAL, err_str); \
return; \ return; \
} \ } \
return; \ return; \
...@@ -66,15 +77,26 @@ namespace predictor { ...@@ -66,15 +77,26 @@ namespace predictor {
::baidu::paddle_serving::predictor::FactoryDerive<D, B>* factory = new ( \ ::baidu::paddle_serving::predictor::FactoryDerive<D, B>* factory = new ( \
::std::nothrow)::baidu::paddle_serving::predictor::FactoryDerive<D, \ ::std::nothrow)::baidu::paddle_serving::predictor::FactoryDerive<D, \
B>(); \ B>(); \
char err_str[ERROR_STRING_LEN]; \
if (factory == NULL || \ if (factory == NULL || \
::baidu::paddle_serving::predictor::FactoryPool<B>::instance() \ ::baidu::paddle_serving::predictor::FactoryPool<B>::instance() \
.register_factory(N, factory) != 0) { \ .register_factory(N, factory) != 0) { \
RAW_LOG_FATAL( \ snprintf(err_str, \
"Failed regist factory: %s->%s, tag: %s in macro!", #D, #B, N); \ ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->%s, tag: %s in macro!", \
#D, \
#B, \
N); \
RAW_LOG(FATAL, err_str); \
return; \ return; \
} \ } \
RAW_LOG_WARNING( \ snprintf(err_str, \
"Succ regist factory: %s->%s, tag: %s in macro!", #D, #B, N); \ ERROR_STRING_LEN - 1, \
"Succ regist factory: %s->%s, tag: %s in macro!", \
#D, \
#B, \
N); \
RAW_LOG(WARNING, err_str); \
return; \ return; \
} }
...@@ -102,24 +124,35 @@ class FactoryPool { ...@@ -102,24 +124,35 @@ class FactoryPool {
} }
int register_factory(const std::string& tag, FactoryBase<B>* factory) { int register_factory(const std::string& tag, FactoryBase<B>* factory) {
char err_str[ERROR_STRING_LEN];
typename std::map<std::string, FactoryBase<B>*>::iterator it = typename std::map<std::string, FactoryBase<B>*>::iterator it =
_pool.find(tag); _pool.find(tag);
if (it != _pool.end()) { if (it != _pool.end()) {
RAW_LOG_FATAL("Insert duplicate with tag: %s", tag.c_str()); snprintf(err_str,
ERROR_STRING_LEN - 1,
"Insert duplicate with tag: %s",
tag.c_str());
RAW_LOG(FATAL, err_str);
return -1; return -1;
} }
std::pair<typename std::map<std::string, FactoryBase<B>*>::iterator, bool> std::pair<typename std::map<std::string, FactoryBase<B>*>::iterator, bool>
r = _pool.insert(std::make_pair(tag, factory)); r = _pool.insert(std::make_pair(tag, factory));
if (!r.second) { if (!r.second) {
RAW_LOG_FATAL("Failed insert new factory with: %s", tag.c_str()); snprintf(err_str,
ERROR_STRING_LEN - 1,
"Failed insert new factory with: %s",
tag.c_str());
RAW_LOG(FATAL, err_str);
return -1; return -1;
} }
RAW_LOG_INFO("Succ insert one factory, tag: %s, base type %s", snprintf(err_str,
ERROR_STRING_LEN - 1,
"Succ insert one factory, tag: %s, base type %s",
tag.c_str(), tag.c_str(),
typeid(B).name()); typeid(B).name());
RAW_LOG(INFO, err_str);
return 0; return 0;
} }
...@@ -127,9 +160,13 @@ class FactoryPool { ...@@ -127,9 +160,13 @@ class FactoryPool {
typename std::map<std::string, FactoryBase<B>*>::iterator it = typename std::map<std::string, FactoryBase<B>*>::iterator it =
_pool.find(tag); _pool.find(tag);
if (it == _pool.end() || it->second == NULL) { if (it == _pool.end() || it->second == NULL) {
RAW_LOG_FATAL("Not found factory pool, tag: %s, pool size %u", char err_str[ERROR_STRING_LEN];
snprintf(err_str,
ERROR_STRING_LEN - 1,
"Not found factory pool, tag: %s, pool size %u",
tag.c_str(), tag.c_str(),
_pool.size()); _pool.size());
RAW_LOG(FATAL, err_str);
return NULL; return NULL;
} }
......
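
The factory macros now format into a fixed-size stack buffer with `snprintf` and hand the finished string to glog's generic `RAW_LOG`, replacing the level-specific `RAW_LOG_FATAL`/`RAW_LOG_INFO` helpers. A minimal sketch of that pattern, assuming glog is available and linked (`ERROR_STRING_LEN` mirrors the constant this patch defines):

```cpp
#include <cstdio>

#include "glog/raw_logging.h"  // assumed available, as in the patched sources

#define ERROR_STRING_LEN 10240  // mirrors the constant defined in this patch

// Format first, log second: RAW_LOG does not allocate, which matters because
// these factory registrations run from static initializers, before the
// regular logging machinery is configured.
static void report_duplicate(const char* tag) {
  char err_str[ERROR_STRING_LEN];
  snprintf(err_str, ERROR_STRING_LEN - 1, "Insert duplicate with tag: %s", tag);
  RAW_LOG(ERROR, err_str);
}

int main() {
  report_duplicate("fluid_cpu_analysis_dir");  // hypothetical engine tag
  return 0;
}
```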
...@@ -38,6 +38,7 @@ class InferEngineCreationParams { ...@@ -38,6 +38,7 @@ class InferEngineCreationParams {
_enable_ir_optimization = false; _enable_ir_optimization = false;
_static_optimization = false; _static_optimization = false;
_force_update_static_cache = false; _force_update_static_cache = false;
_use_trt = false;
} }
void set_path(const std::string& path) { _path = path; } void set_path(const std::string& path) { _path = path; }
...@@ -50,12 +51,16 @@ class InferEngineCreationParams { ...@@ -50,12 +51,16 @@ class InferEngineCreationParams {
_enable_ir_optimization = enable_ir_optimization; _enable_ir_optimization = enable_ir_optimization;
} }
void set_use_trt(bool use_trt) { _use_trt = use_trt; }
bool enable_memory_optimization() const { bool enable_memory_optimization() const {
return _enable_memory_optimization; return _enable_memory_optimization;
} }
bool enable_ir_optimization() const { return _enable_ir_optimization; } bool enable_ir_optimization() const { return _enable_ir_optimization; }
bool use_trt() const { return _use_trt; }
void set_static_optimization(bool static_optimization = false) { void set_static_optimization(bool static_optimization = false) {
_static_optimization = static_optimization; _static_optimization = static_optimization;
} }
...@@ -86,6 +91,7 @@ class InferEngineCreationParams { ...@@ -86,6 +91,7 @@ class InferEngineCreationParams {
bool _enable_ir_optimization; bool _enable_ir_optimization;
bool _static_optimization; bool _static_optimization;
bool _force_update_static_cache; bool _force_update_static_cache;
bool _use_trt;
}; };
class InferEngine { class InferEngine {
...@@ -172,6 +178,10 @@ class ReloadableInferEngine : public InferEngine { ...@@ -172,6 +178,10 @@ class ReloadableInferEngine : public InferEngine {
force_update_static_cache); force_update_static_cache);
} }
if (conf.has_use_trt()) {
_infer_engine_params.set_use_trt(conf.use_trt());
}
if (!check_need_reload() || load(_infer_engine_params) != 0) { if (!check_need_reload() || load(_infer_engine_params) != 0) {
LOG(ERROR) << "Failed load model_data_path" << _model_data_path; LOG(ERROR) << "Failed load model_data_path" << _model_data_path;
return -1; return -1;
...@@ -553,8 +563,12 @@ class CloneDBReloadableInferEngine ...@@ -553,8 +563,12 @@ class CloneDBReloadableInferEngine
}; };
template <typename FluidFamilyCore> template <typename FluidFamilyCore>
#ifdef WITH_TRT
class FluidInferEngine : public DBReloadableInferEngine<FluidFamilyCore> {
#else
class FluidInferEngine : public CloneDBReloadableInferEngine<FluidFamilyCore> { class FluidInferEngine : public CloneDBReloadableInferEngine<FluidFamilyCore> {
public: #endif
public: // NOLINT
FluidInferEngine() {} FluidInferEngine() {}
~FluidInferEngine() {} ~FluidInferEngine() {}
...@@ -603,14 +617,21 @@ class VersionedInferEngine : public InferEngine { ...@@ -603,14 +617,21 @@ class VersionedInferEngine : public InferEngine {
LOG(ERROR) << "Failed generate engine with type:" << engine_type; LOG(ERROR) << "Failed generate engine with type:" << engine_type;
return -1; return -1;
} }
VLOG(2) << "FLGS_logtostderr " << FLAGS_logtostderr; #ifndef BCLOUD
VLOG(2) << "FLAGS_logtostderr " << FLAGS_logtostderr;
int tmp = FLAGS_logtostderr; int tmp = FLAGS_logtostderr;
if (engine->proc_initialize(conf, version) != 0) { if (engine->proc_initialize(conf, version) != 0) {
LOG(ERROR) << "Failed initialize engine, type:" << engine_type; LOG(ERROR) << "Failed initialize engine, type:" << engine_type;
return -1; return -1;
} }
VLOG(2) << "FLGS_logtostderr " << FLAGS_logtostderr; VLOG(2) << "FLAGS_logtostderr " << FLAGS_logtostderr;
FLAGS_logtostderr = tmp; FLAGS_logtostderr = tmp;
#else
if (engine->proc_initialize(conf, version) != 0) {
LOG(ERROR) << "Failed initialize engine, type:" << engine_type;
return -1;
}
#endif
auto r = _versions.insert(std::make_pair(engine->version(), engine)); auto r = _versions.insert(std::make_pair(engine->version(), engine));
if (!r.second) { if (!r.second) {
LOG(ERROR) << "Failed insert item: " << engine->version() LOG(ERROR) << "Failed insert item: " << engine->version()
......
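
`FluidInferEngine` now selects its base class at compile time: TensorRT builds derive from `DBReloadableInferEngine`, all others keep `CloneDBReloadableInferEngine`. A self-contained sketch of that conditional-inheritance trick, using hypothetical stand-in bases:

```cpp
#include <iostream>

// Hypothetical stand-ins for the two reloadable-engine bases named above.
struct DBReloadableEngine {
  const char* base() const { return "DBReloadableInferEngine"; }
};
struct CloneDBReloadableEngine {
  const char* base() const { return "CloneDBReloadableInferEngine"; }
};

#ifdef WITH_TRT  // same switch the CMake change in this patch defines
class FluidEngine : public DBReloadableEngine {};
#else
class FluidEngine : public CloneDBReloadableEngine {};
#endif

int main() {
  FluidEngine engine;
  std::cout << "FluidEngine derives from " << engine.base() << std::endl;
  return 0;
}
```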
...@@ -62,7 +62,10 @@ class OpRepository { ...@@ -62,7 +62,10 @@ class OpRepository {
template <typename OP_TYPE> template <typename OP_TYPE>
void regist_op(std::string op_type) { void regist_op(std::string op_type) {
_repository[op_type] = &OpFactory<OP_TYPE>::instance(); _repository[op_type] = &OpFactory<OP_TYPE>::instance();
RAW_LOG_INFO("Succ regist op: %s", op_type.c_str()); char err_str[ERROR_STRING_LEN];
snprintf(
err_str, ERROR_STRING_LEN - 1, "Succ regist op: %s", op_type.c_str());
RAW_LOG(INFO, err_str);
} }
Op* get_op(std::string op_type); Op* get_op(std::string op_type);
......
...@@ -17,6 +17,9 @@ ...@@ -17,6 +17,9 @@
#include <string> #include <string>
#include "core/predictor/common/inner_common.h" #include "core/predictor/common/inner_common.h"
#include "core/predictor/framework/kv_manager.h" #include "core/predictor/framework/kv_manager.h"
#ifdef BCLOUD
#include "aipe_sec_client.h" // NOLINT
#endif
namespace baidu { namespace baidu {
namespace paddle_serving { namespace paddle_serving {
namespace predictor { namespace predictor {
...@@ -109,6 +112,42 @@ int Resource::initialize(const std::string& path, const std::string& file) { ...@@ -109,6 +112,42 @@ int Resource::initialize(const std::string& path, const std::string& file) {
} }
LOG(WARNING) << "Successfully proc initialized mempool wrapper"; LOG(WARNING) << "Successfully proc initialized mempool wrapper";
#ifdef WITH_AUTH
std::string product_name_str = resource_conf.auth_product_name();
std::string container_id_str = resource_conf.auth_container_id();
char* product_name = new char[product_name_str.size() + 1];
snprintf(product_name,
product_name_str.size() + 1,
"%s",
product_name_str.c_str());
char* container_id = new char[container_id_str.size() + 1];
snprintf(container_id,
container_id_str.size() + 1,
"%s",
container_id_str.c_str());
aipe_auth_request request;
request.product_name = product_name;
request.container_id = container_id;
request.request_ts = (int64_t)time(NULL);
LOG(INFO) << "\nEasypack info"
<< "\nproduct name: " << request.product_name
<< "\ncontainer_id: " << request.container_id
<< "\nrequest time stamp: " << request.request_ts;
aipe_auth_response response;
response = check_auth(request);
if (response.result == 0) {
LOG(INFO) << "Authentication succeed.";
} else {
LOG(ERROR) << "Authentication failed. Error code: " << response.result;
return -1;
}
#endif
if (FLAGS_enable_model_toolkit) { if (FLAGS_enable_model_toolkit) {
int err = 0; int err = 0;
std::string model_toolkit_path = resource_conf.model_toolkit_path(); std::string model_toolkit_path = resource_conf.model_toolkit_path();
......
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
#include <butil/time.h> // butil::Timer #include <butil/time.h> // butil::Timer
#endif #endif
#include <inttypes.h>
#include <list> #include <list>
#include <string> #include <string>
#include <vector> #include <vector>
...@@ -135,50 +136,63 @@ const std::string& InferService::name() const { return _infer_service_format; } ...@@ -135,50 +136,63 @@ const std::string& InferService::name() const { return _infer_service_format; }
// Execute each workflow serially // Execute each workflow serially
int InferService::inference(const google::protobuf::Message* request, int InferService::inference(const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
TRACEPRINTF("start to inference"); TRACEPRINTF("(logid=%" PRIu64 ") start to inference", log_id);
  // when function call begins, framework will reset // when function call begins, framework will reset
  // thread local variables & resources automatically. // thread local variables & resources automatically.
if (Resource::instance().thread_clear() != 0) { if (Resource::instance().thread_clear() != 0) {
LOG(ERROR) << "Failed thread clear whole resource"; LOG(ERROR) << "(logid=" << log_id << ") Failed thread clear whole resource";
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
TRACEPRINTF("finish to thread clear"); TRACEPRINTF("(logid=%" PRIu64 ") finish to thread clear", log_id);
if (_enable_map_request_to_workflow) { if (_enable_map_request_to_workflow) {
LOG(INFO) << "enable map request == True"; VLOG(2) << "(logid=" << log_id << ") enable map request == True";
std::vector<Workflow*>* workflows = _map_request_to_workflow(request); std::vector<Workflow*>* workflows =
_map_request_to_workflow(request, log_id);
if (!workflows || workflows->size() == 0) { if (!workflows || workflows->size() == 0) {
LOG(ERROR) << "Failed to map request to workflow"; LOG(ERROR) << "(logid=" << log_id
<< ") Failed to map request to workflow";
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
size_t fsize = workflows->size(); size_t fsize = workflows->size();
for (size_t fi = 0; fi < fsize; ++fi) { for (size_t fi = 0; fi < fsize; ++fi) {
Workflow* workflow = (*workflows)[fi]; Workflow* workflow = (*workflows)[fi];
if (workflow == NULL) { if (workflow == NULL) {
LOG(ERROR) << "Failed to get valid workflow at: " << fi; LOG(ERROR) << "(logid=" << log_id
<< ") Failed to get valid workflow at: " << fi;
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
TRACEPRINTF("start to execute workflow[%s]", workflow->name().c_str()); TRACEPRINTF("(logid=%" PRIu64 ") start to execute workflow[%s]",
int errcode = _execute_workflow(workflow, request, response, debug_os); log_id,
TRACEPRINTF("finish to execute workflow[%s]", workflow->name().c_str()); workflow->name().c_str());
int errcode =
_execute_workflow(workflow, request, response, log_id, debug_os);
TRACEPRINTF("(logid=%" PRIu64 ") finish to execute workflow[%s]",
log_id,
workflow->name().c_str());
if (errcode < 0) { if (errcode < 0) {
LOG(ERROR) << "Failed execute workflow[" << workflow->name() LOG(ERROR) << "(logid=" << log_id << ") Failed execute workflow["
<< "] in:" << name(); << workflow->name() << "] in:" << name();
return errcode; return errcode;
} }
} }
} else { } else {
LOG(INFO) << "enable map request == False"; VLOG(2) << "(logid=" << log_id << ") enable map request == False";
TRACEPRINTF("start to execute one workflow"); TRACEPRINTF("(logid=%" PRIu64 ") start to execute one workflow", log_id);
size_t fsize = _flows.size(); size_t fsize = _flows.size();
for (size_t fi = 0; fi < fsize; ++fi) { for (size_t fi = 0; fi < fsize; ++fi) {
TRACEPRINTF("start to execute one workflow-%lu", fi); TRACEPRINTF(
int errcode = execute_one_workflow(fi, request, response, debug_os); "(logid=%" PRIu64 ") start to execute one workflow-%lu", log_id, fi);
TRACEPRINTF("finish to execute one workflow-%lu", fi); int errcode =
execute_one_workflow(fi, request, response, log_id, debug_os);
TRACEPRINTF(
"(logid=%" PRIu64 ") finish to execute one workflow-%lu", log_id, fi);
if (errcode < 0) { if (errcode < 0) {
LOG(ERROR) << "Failed execute 0-th workflow in:" << name(); LOG(ERROR) << "(logid=" << log_id
<< ") Failed execute 0-th workflow in:" << name();
return errcode; return errcode;
} }
} }
...@@ -188,26 +202,30 @@ int InferService::inference(const google::protobuf::Message* request, ...@@ -188,26 +202,30 @@ int InferService::inference(const google::protobuf::Message* request,
int InferService::debug(const google::protobuf::Message* request, int InferService::debug(const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
return inference(request, response, debug_os); return inference(request, response, log_id, debug_os);
} }
int InferService::execute_one_workflow(uint32_t index, int InferService::execute_one_workflow(uint32_t index,
const google::protobuf::Message* request, const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
if (index >= _flows.size()) { if (index >= _flows.size()) {
LOG(ERROR) << "Faield execute workflow, index: " << index LOG(ERROR) << "(logid=" << log_id
<< ") Faield execute workflow, index: " << index
<< " >= max:" << _flows.size(); << " >= max:" << _flows.size();
return ERR_OVERFLOW_FAILURE; return ERR_OVERFLOW_FAILURE;
} }
Workflow* workflow = _flows[index]; Workflow* workflow = _flows[index];
return _execute_workflow(workflow, request, response, debug_os); return _execute_workflow(workflow, request, response, log_id, debug_os);
} }
int InferService::_execute_workflow(Workflow* workflow, int InferService::_execute_workflow(Workflow* workflow,
const google::protobuf::Message* request, const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
butil::Timer workflow_time(butil::Timer::STARTED); butil::Timer workflow_time(butil::Timer::STARTED);
  // create and submit the beginning channel // create and submit the beginning channel
...@@ -215,54 +233,62 @@ int InferService::_execute_workflow(Workflow* workflow, ...@@ -215,54 +233,62 @@ int InferService::_execute_workflow(Workflow* workflow,
req_channel.init(0, START_OP_NAME); req_channel.init(0, START_OP_NAME);
req_channel = request; req_channel = request;
DagView* dv = workflow->fetch_dag_view(full_name()); DagView* dv = workflow->fetch_dag_view(full_name(), log_id);
dv->set_request_channel(req_channel); dv->set_request_channel(req_channel, log_id);
// call actual inference interface // call actual inference interface
int errcode = dv->execute(debug_os); int errcode = dv->execute(log_id, debug_os);
if (errcode < 0) { if (errcode < 0) {
LOG(ERROR) << "Failed execute dag for workflow:" << workflow->name(); LOG(ERROR) << "(logid=" << log_id
<< ") Failed execute dag for workflow:" << workflow->name();
return errcode; return errcode;
} }
TRACEPRINTF("finish to dv execute"); TRACEPRINTF("(logid=%" PRIu64 ") finish to dv execute", log_id);
// create ender channel and copy // create ender channel and copy
const Channel* res_channel = dv->get_response_channel(); const Channel* res_channel = dv->get_response_channel(log_id);
if (res_channel == NULL) {
LOG(ERROR) << "(logid=" << log_id << ") Failed get response channel";
return ERR_INTERNAL_FAILURE;
}
if (!_merger || !_merger->merge(res_channel->message(), response)) { if (!_merger || !_merger->merge(res_channel->message(), response)) {
LOG(ERROR) << "Failed merge channel res to response"; LOG(ERROR) << "(logid=" << log_id
<< ") Failed merge channel res to response";
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
TRACEPRINTF("finish to copy from"); TRACEPRINTF("(logid=%" PRIu64 ") finish to copy from", log_id);
workflow_time.stop(); workflow_time.stop();
LOG(INFO) << "workflow total time: " << workflow_time.u_elapsed(); LOG(INFO) << "(logid=" << log_id
<< ") workflow total time: " << workflow_time.u_elapsed();
PredictorMetric::GetInstance()->update_latency_metric( PredictorMetric::GetInstance()->update_latency_metric(
WORKFLOW_METRIC_PREFIX + dv->full_name(), workflow_time.u_elapsed()); WORKFLOW_METRIC_PREFIX + dv->full_name(), workflow_time.u_elapsed());
// return tls data to object pool // return tls data to object pool
workflow->return_dag_view(dv); workflow->return_dag_view(dv);
TRACEPRINTF("finish to return dag view"); TRACEPRINTF("(logid=%" PRIu64 ") finish to return dag view", log_id);
return ERR_OK; return ERR_OK;
} }
std::vector<Workflow*>* InferService::_map_request_to_workflow( std::vector<Workflow*>* InferService::_map_request_to_workflow(
const google::protobuf::Message* request) { const google::protobuf::Message* request, const uint64_t log_id) {
const google::protobuf::Descriptor* desc = request->GetDescriptor(); const google::protobuf::Descriptor* desc = request->GetDescriptor();
const google::protobuf::FieldDescriptor* field = const google::protobuf::FieldDescriptor* field =
desc->FindFieldByName(_request_field_key); desc->FindFieldByName(_request_field_key);
if (field == NULL) { if (field == NULL) {
LOG(ERROR) << "No field[" << _request_field_key << "] in [" LOG(ERROR) << "(logid=" << log_id << ") No field[" << _request_field_key
<< desc->full_name() << "]."; << "] in [" << desc->full_name() << "].";
return NULL; return NULL;
} }
if (field->is_repeated()) { if (field->is_repeated()) {
LOG(ERROR) << "field[" << desc->full_name() << "." << _request_field_key LOG(ERROR) << "(logid=" << log_id << ") field[" << desc->full_name() << "."
<< "] is repeated."; << _request_field_key << "] is repeated.";
return NULL; return NULL;
} }
if (field->cpp_type() != google::protobuf::FieldDescriptor::CPPTYPE_STRING) { if (field->cpp_type() != google::protobuf::FieldDescriptor::CPPTYPE_STRING) {
LOG(ERROR) << "field[" << desc->full_name() << "." << _request_field_key LOG(ERROR) << "(logid=" << log_id << ") field[" << desc->full_name() << "."
<< "] should be string"; << _request_field_key << "] should be string";
return NULL; return NULL;
} }
const std::string& field_value = const std::string& field_value =
...@@ -270,7 +296,7 @@ std::vector<Workflow*>* InferService::_map_request_to_workflow( ...@@ -270,7 +296,7 @@ std::vector<Workflow*>* InferService::_map_request_to_workflow(
std::vector<Workflow*>* p_workflow = std::vector<Workflow*>* p_workflow =
_request_to_workflow_map.seek(field_value); _request_to_workflow_map.seek(field_value);
if (p_workflow == NULL) { if (p_workflow == NULL) {
LOG(ERROR) << "cannot find key[" << field_value LOG(ERROR) << "(logid=" << log_id << ") cannot find key[" << field_value
<< "] in _request_to_workflow_map"; << "] in _request_to_workflow_map";
return NULL; return NULL;
} }
......
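
The `(logid=%" PRIu64 ")` trace format is why this patch adds `#include <inttypes.h>` here and `-D__STDC_FORMAT_MACROS` to the build: `PRIu64` expands to the correct `printf` length modifier for `uint64_t` on every platform. A self-contained sketch:

```cpp
#include <cinttypes>  // PRIu64 (the C++ header form of <inttypes.h>)
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t log_id = 12345;
  // Portable: PRIu64 becomes "lu" or "llu" as the platform requires.
  printf("(logid=%" PRIu64 ") start to inference\n", log_id);
  return 0;
}
```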
...@@ -52,25 +52,29 @@ class InferService { ...@@ -52,25 +52,29 @@ class InferService {
// Execute each workflow serially // Execute each workflow serially
virtual int inference(const google::protobuf::Message* request, virtual int inference(const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os = NULL); butil::IOBufBuilder* debug_os = NULL);
int debug(const google::protobuf::Message* request, int debug(const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os); butil::IOBufBuilder* debug_os);
int execute_one_workflow(uint32_t index, int execute_one_workflow(uint32_t index,
const google::protobuf::Message* request, const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os); butil::IOBufBuilder* debug_os);
private: private:
int _execute_workflow(Workflow* workflow, int _execute_workflow(Workflow* workflow,
const google::protobuf::Message* request, const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os); butil::IOBufBuilder* debug_os);
std::vector<Workflow*>* _map_request_to_workflow( std::vector<Workflow*>* _map_request_to_workflow(
const google::protobuf::Message* request); const google::protobuf::Message* request, const uint64_t log_id);
private: private:
std::vector<Workflow*> _flows; std::vector<Workflow*> _flows;
...@@ -88,6 +92,7 @@ class ParallelInferService : public InferService { ...@@ -88,6 +92,7 @@ class ParallelInferService : public InferService {
// Execute workflows in parallel // Execute workflows in parallel
int inference(const google::protobuf::Message* request, int inference(const google::protobuf::Message* request,
google::protobuf::Message* response, google::protobuf::Message* response,
const uint64_t log_id,
butil::IOBufBuilder* debug_os) { butil::IOBufBuilder* debug_os) {
return 0; return 0;
} }
......
...@@ -23,17 +23,24 @@ namespace predictor { ...@@ -23,17 +23,24 @@ namespace predictor {
#define REGIST_FORMAT_SERVICE(svr_name, svr) \ #define REGIST_FORMAT_SERVICE(svr_name, svr) \
do { \ do { \
char err_str[ERROR_STRING_LEN]; \
int ret = \ int ret = \
::baidu::paddle_serving::predictor::FormatServiceManager::instance() \ ::baidu::paddle_serving::predictor::FormatServiceManager::instance() \
.regist_service(svr_name, svr); \ .regist_service(svr_name, svr); \
if (ret != 0) { \ if (ret != 0) { \
RAW_LOG_ERROR("Failed regist service[%s][%s]", \ snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist service[%s][%s]", \
svr_name.c_str(), \ svr_name.c_str(), \
typeid(svr).name()); \ typeid(svr).name()); \
RAW_LOG(ERROR, err_str); \
} else { \ } else { \
RAW_LOG_INFO("Success regist service[%s][%s]", \ snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Success regist service[%s][%s]", \
svr_name.c_str(), \ svr_name.c_str(), \
typeid(svr).name()); \ typeid(svr).name()); \
RAW_LOG(INFO, err_str); \
} \ } \
} while (0) } while (0)
...@@ -42,31 +49,46 @@ class FormatServiceManager { ...@@ -42,31 +49,46 @@ class FormatServiceManager {
typedef google::protobuf::Service Service; typedef google::protobuf::Service Service;
int regist_service(const std::string& svr_name, Service* svr) { int regist_service(const std::string& svr_name, Service* svr) {
char err_str[ERROR_STRING_LEN];
if (_service_map.find(svr_name) != _service_map.end()) { if (_service_map.find(svr_name) != _service_map.end()) {
RAW_LOG_ERROR("Service[%s][%s] already exist!", snprintf(err_str,
ERROR_STRING_LEN - 1,
"Service[%s][%s] already exist!",
svr_name.c_str(), svr_name.c_str(),
typeid(svr).name()); typeid(svr).name());
RAW_LOG(ERROR, err_str);
return -1; return -1;
} }
std::pair<boost::unordered_map<std::string, Service*>::iterator, bool> ret; std::pair<boost::unordered_map<std::string, Service*>::iterator, bool> ret;
ret = _service_map.insert(std::make_pair(svr_name, svr)); ret = _service_map.insert(std::make_pair(svr_name, svr));
if (ret.second == false) { if (ret.second == false) {
RAW_LOG_ERROR("Service[%s][%s] insert failed!", snprintf(err_str,
ERROR_STRING_LEN - 1,
"Service[%s][%s] insert failed!",
svr_name.c_str(), svr_name.c_str(),
typeid(svr).name()); typeid(svr).name());
RAW_LOG(ERROR, err_str);
return -1; return -1;
} }
RAW_LOG_INFO("Service[%s] insert successfully!", svr_name.c_str()); snprintf(err_str,
ERROR_STRING_LEN - 1,
"Service[%s] insert successfully!",
svr_name.c_str());
RAW_LOG(INFO, err_str);
return 0; return 0;
} }
Service* get_service(const std::string& svr_name) { Service* get_service(const std::string& svr_name) {
char err_str[ERROR_STRING_LEN];
boost::unordered_map<std::string, Service*>::iterator res; boost::unordered_map<std::string, Service*>::iterator res;
if ((res = _service_map.find(svr_name)) == _service_map.end()) { if ((res = _service_map.find(svr_name)) == _service_map.end()) {
RAW_LOG_WARNING("Service[%s] not found in service manager!", snprintf(err_str,
ERROR_STRING_LEN - 1,
"Service[%s] not found in service manager!",
svr_name.c_str()); svr_name.c_str());
RAW_LOG(WARNING, err_str);
return NULL; return NULL;
} }
return (*res).second; return (*res).second;
......
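
`FormatServiceManager` above is essentially an insert-once map keyed by service name: `regist_service` refuses duplicates rather than overwriting. A bare-bones, hypothetical analogue of that registry:

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>

class ServiceRegistry {  // simplified stand-in for FormatServiceManager
 public:
  int regist_service(const std::string& name, void* svr) {
    if (!_service_map.insert(std::make_pair(name, svr)).second) {
      std::cerr << "Service[" << name << "] already exist!\n";
      return -1;  // duplicate registration is an error; map is unchanged
    }
    return 0;
  }

 private:
  std::map<std::string, void*> _service_map;
};

int main() {
  ServiceRegistry manager;
  int dummy = 0;
  manager.regist_service("GeneralModelService", &dummy);           // ok
  int rc = manager.regist_service("GeneralModelService", &dummy);  // duplicate
  return rc == -1 ? 0 : 1;
}
```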
...@@ -32,21 +32,22 @@ int Workflow::init(const configure::Workflow& conf) { ...@@ -32,21 +32,22 @@ int Workflow::init(const configure::Workflow& conf) {
return 0; return 0;
} }
DagView* Workflow::fetch_dag_view(const std::string& service_name) { DagView* Workflow::fetch_dag_view(const std::string& service_name,
const uint64_t log_id) {
DagView* view = NULL; DagView* view = NULL;
if (_type == "Sequence") { if (_type == "Sequence") {
view = butil::get_object<DagView>(); view = butil::get_object<DagView>();
} else if (_type == "Parallel") { } else if (_type == "Parallel") {
view = butil::get_object<ParallelDagView>(); view = butil::get_object<ParallelDagView>();
} else { } else {
LOG(ERROR) << "Unknown dag type:" << _type << "!"; LOG(ERROR) << "(logid=" << log_id << ") Unknown dag type:" << _type << "!";
return NULL; return NULL;
} }
if (view == NULL) { if (view == NULL) {
LOG(ERROR) << "create dag view from pool failed!"; LOG(ERROR) << "(logid=" << log_id << ") create dag view from pool failed!";
return NULL; return NULL;
} }
view->init(&_dag, service_name); view->init(&_dag, service_name, log_id);
return view; return view;
} }
......
...@@ -36,7 +36,8 @@ class Workflow { ...@@ -36,7 +36,8 @@ class Workflow {
// different apps. // different apps.
int init(const configure::Workflow& conf); int init(const configure::Workflow& conf);
DagView* fetch_dag_view(const std::string& service_name); DagView* fetch_dag_view(const std::string& service_name,
const uint64_t log_id);
int deinit() { return 0; } int deinit() { return 0; }
......
...@@ -35,7 +35,8 @@ int Op::init(Bus* bus, ...@@ -35,7 +35,8 @@ int Op::init(Bus* bus,
uint32_t id, uint32_t id,
const std::string& name, const std::string& name,
const std::string& type, const std::string& type,
void* conf) { void* conf,
const uint64_t log_id) {
_bus = bus; _bus = bus;
_dag = dag; _dag = dag;
_id = id; _id = id;
...@@ -45,7 +46,8 @@ int Op::init(Bus* bus, ...@@ -45,7 +46,8 @@ int Op::init(Bus* bus,
_timer = butil::get_object<TimerFlow>(); _timer = butil::get_object<TimerFlow>();
if (!_timer) { if (!_timer) {
LOG(ERROR) << "Invalid timerflow in op:" << this->name(); LOG(ERROR) << "(logid=" << log_id
<< ") Invalid timerflow in op:" << this->name();
return -1; return -1;
} }
...@@ -55,7 +57,8 @@ int Op::init(Bus* bus, ...@@ -55,7 +57,8 @@ int Op::init(Bus* bus,
Channel* channel = mutable_channel(); Channel* channel = mutable_channel();
if (channel == NULL) { if (channel == NULL) {
LOG(ERROR) << "Failed mutable channel in op: " << this->id() << ", " LOG(ERROR) << "(logid=" << log_id
<< ") Failed mutable channel in op: " << this->id() << ", "
<< this->name() << "!"; << this->name() << "!";
return -1; return -1;
} }
...@@ -96,18 +99,20 @@ int Op::check_time(const char* tag) { ...@@ -96,18 +99,20 @@ int Op::check_time(const char* tag) {
return 0; return 0;
} }
int Op::process(bool debug) { int Op::process(const uint64_t log_id, bool debug) {
butil::Timer op_time(butil::Timer::STARTED); butil::Timer op_time(butil::Timer::STARTED);
if (debug && _timer) { if (debug && _timer) {
_timer->start(); _timer->start();
} }
if (!_has_init) { if (!_has_init) {
LOG(ERROR) << "Make sure op has been init before inference"; LOG(ERROR) << "(logid=" << log_id
<< ") Make sure op has been init before inference";
return ERR_INTERNAL_FAILURE; return ERR_INTERNAL_FAILURE;
} }
if (_has_calc) { if (_has_calc) {
LOG(INFO) << "Op: " << _name << " already processed before"; LOG(INFO) << "(logid=" << log_id << ") Op: " << _name
<< " already processed before";
return ERR_OK; return ERR_OK;
} }
...@@ -143,7 +148,7 @@ int Op::process(bool debug) { ...@@ -143,7 +148,7 @@ int Op::process(bool debug) {
// 3. share output to bus // 3. share output to bus
Channel* channel = mutable_channel(); Channel* channel = mutable_channel();
channel->share_to_bus(_bus); channel->share_to_bus(_bus, log_id);
// 4. mark has calculated // 4. mark has calculated
_has_calc = true; _has_calc = true;
...@@ -156,7 +161,8 @@ int Op::process(bool debug) { ...@@ -156,7 +161,8 @@ int Op::process(bool debug) {
op_time.stop(); op_time.stop();
PredictorMetric::GetInstance()->update_latency_metric( PredictorMetric::GetInstance()->update_latency_metric(
OP_METRIC_PREFIX + full_name(), op_time.u_elapsed()); OP_METRIC_PREFIX + full_name(), op_time.u_elapsed());
LOG(INFO) << " " << name() << "_time=[" << op_time.u_elapsed() << "]"; LOG(INFO) << "(logid=" << log_id << ") " << name() << "_time=["
<< op_time.u_elapsed() << "]";
return ERR_OK; return ERR_OK;
} }
......
...@@ -113,13 +113,14 @@ class Op { ...@@ -113,13 +113,14 @@ class Op {
uint32_t id, uint32_t id,
const std::string& name, const std::string& name,
const std::string& type, const std::string& type,
void* conf); void* conf,
const uint64_t log_id);
int deinit(); int deinit();
int check_time(const char* tag); int check_time(const char* tag);
int process(bool debug); int process(const uint64_t log_id, bool debug);
std::string time_info(); std::string time_info();
......
...@@ -202,8 +202,6 @@ int main(int argc, char** argv) { ...@@ -202,8 +202,6 @@ int main(int argc, char** argv) {
} }
VLOG(2) << "Succ call pthread worker start function"; VLOG(2) << "Succ call pthread worker start function";
#ifndef BCLOUD
if (Resource::instance().general_model_initialize(FLAGS_resource_path, if (Resource::instance().general_model_initialize(FLAGS_resource_path,
FLAGS_resource_file) != 0) { FLAGS_resource_file) != 0) {
LOG(ERROR) << "Failed to initialize general model conf: " LOG(ERROR) << "Failed to initialize general model conf: "
...@@ -213,6 +211,7 @@ int main(int argc, char** argv) { ...@@ -213,6 +211,7 @@ int main(int argc, char** argv) {
VLOG(2) << "Succ initialize general model"; VLOG(2) << "Succ initialize general model";
#ifndef BCLOUD
// FATAL messages are output to stderr // FATAL messages are output to stderr
FLAGS_stderrthreshold = 3; FLAGS_stderrthreshold = 3;
#endif #endif
......
...@@ -12,13 +12,23 @@ ...@@ -12,13 +12,23 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
#include <sys/time.h>
#include <fstream> #include <fstream>
#include <iostream> #include <iostream>
#include <memory> #include <memory>
#include <thread>
#include "core/predictor/framework.pb.h" #include "core/predictor/framework.pb.h"
#include "quant.h" #include "quant.h"
#include "seq_file.h" #include "seq_file.h"
inline uint64_t time_diff(const struct timeval &start_time,
const struct timeval &end_time) {
return (end_time.tv_sec - start_time.tv_sec) * 1000000 +
(end_time.tv_usec - start_time.tv_usec);
}
using paddle::framework::proto::VarType; using paddle::framework::proto::VarType;
std::map<int, size_t> var_type_size; std::map<int, size_t> var_type_size;
void reg_var_types() { void reg_var_types() {
...@@ -100,8 +110,8 @@ int dump_parameter(const char *input_file, const char *output_file) { ...@@ -100,8 +110,8 @@ int dump_parameter(const char *input_file, const char *output_file) {
char *value_buf = new char[value_buf_len]; char *value_buf = new char[value_buf_len];
size_t offset = 0; size_t offset = 0;
for (int64_t i = 0; i < dims[0]; ++i) { for (int64_t i = 0; i < dims[0]; ++i) {
// std::cout << "key_len " << key_len << " value_len " << value_buf_len << // std::cout << "key_len " << key_len << " value_len " << value_buf_len
// std::endl; // << std::endl;
memcpy(value_buf, tensor_buf + offset, value_buf_len); memcpy(value_buf, tensor_buf + offset, value_buf_len);
seq_file_writer.write((char *)&i, sizeof(i), value_buf, value_buf_len); seq_file_writer.write((char *)&i, sizeof(i), value_buf, value_buf_len);
offset += value_buf_len; offset += value_buf_len;
...@@ -109,14 +119,14 @@ int dump_parameter(const char *input_file, const char *output_file) { ...@@ -109,14 +119,14 @@ int dump_parameter(const char *input_file, const char *output_file) {
return 0; return 0;
} }
int compress_parameter(const char *file1, const char *file2, int bits) { float *read_embedding_table(const char *file1, std::vector<int64_t> &dims) {
std::ifstream is(file1); std::ifstream is(file1);
// Step 1: is read version, os write version // Step 1: is read version, os write version
uint32_t version; uint32_t version;
is.read(reinterpret_cast<char *>(&version), sizeof(version)); is.read(reinterpret_cast<char *>(&version), sizeof(version));
if (version != 0) { if (version != 0) {
std::cout << "Version number " << version << " not supported" << std::endl; std::cout << "Version number " << version << " not supported" << std::endl;
return -1; return NULL;
} }
std::cout << "Version size: " << sizeof(version) << std::endl; std::cout << "Version size: " << sizeof(version) << std::endl;
// Step 2: is read LoD level, os write LoD level // Step 2: is read LoD level, os write LoD level
...@@ -138,7 +148,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -138,7 +148,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
is.read(reinterpret_cast<char *>(&version), sizeof(version)); is.read(reinterpret_cast<char *>(&version), sizeof(version));
if (version != 0) { if (version != 0) {
std::cout << "Version number " << version << " not supported" << std::endl; std::cout << "Version number " << version << " not supported" << std::endl;
return -1; return NULL;
} }
// Step 4: is read Tensor Data, os write min/max/quant data // Step 4: is read Tensor Data, os write min/max/quant data
...@@ -149,10 +159,10 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -149,10 +159,10 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
is.read(reinterpret_cast<char *>(buf.get()), size); is.read(reinterpret_cast<char *>(buf.get()), size);
if (!desc.ParseFromArray(buf.get(), size)) { if (!desc.ParseFromArray(buf.get(), size)) {
std::cout << "Cannot parse tensor desc" << std::endl; std::cout << "Cannot parse tensor desc" << std::endl;
return -1; return NULL;
} }
// read tensor // read tensor
std::vector<int64_t> dims; // std::vector<int64_t> dims;
dims.reserve(static_cast<size_t>(desc.dims().size())); dims.reserve(static_cast<size_t>(desc.dims().size()));
std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims)); std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
...@@ -164,7 +174,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -164,7 +174,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
if (dims.size() != 2) { if (dims.size() != 2) {
std::cout << "Parameter dims not 2D" << std::endl; std::cout << "Parameter dims not 2D" << std::endl;
return -1; return NULL;
} }
size_t numel = 1; size_t numel = 1;
...@@ -176,47 +186,96 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -176,47 +186,96 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
char *tensor_buf = new char[buf_size]; char *tensor_buf = new char[buf_size];
is.read(static_cast<char *>(tensor_buf), buf_size); is.read(static_cast<char *>(tensor_buf), buf_size);
float *tensor_float_buf = reinterpret_cast<float *>(tensor_buf); float *tensor_float_buf = reinterpret_cast<float *>(tensor_buf);
size_t per_line_size = dims[1] * 1 + 2 * sizeof(float); return tensor_float_buf;
char *tensor_out = new char[per_line_size * dims[0]]; }
float loss = 0; int compress_parameter_parallel(const char *file1,
float all_loss = 0; const char *file2,
int bits,
int n_threads) {
#define MIN_THREADS (1)
#define MAX_THREADS (80)
std::vector<int64_t> dims;
float *emb_table = read_embedding_table(file1, dims);
if (emb_table == NULL || dims.size() != 2) {
return -1;
}
// int64_t dict_size = dims[0]/100000000;
int64_t dict_size = dims[0];
int64_t emb_size = dims[1];
size_t per_line_size = emb_size * 1 + 2 * sizeof(float);
n_threads = std::min(std::max(MIN_THREADS, n_threads), MAX_THREADS);
int64_t step = dict_size / n_threads;
std::vector<char *> result;
result.resize(dict_size);  // resize (not reserve): worker threads assign result[k] directly
double pow2bits = pow(2, bits);
std::cout << "Start Quant" << std::endl; std::cout << "Start Quant" << std::endl;
SeqFileWriter seq_file_writer(file2); std::vector<std::thread> threads;
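// n_threads + 1 workers are launched: each handles "step" rows, and the extra
// worker picks up the remainder rows when dict_size % n_threads != 0
// (it returns immediately otherwise).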
for (int i = 0; i < n_threads + 1; ++i) {
size_t offset = 0; threads.push_back(std::thread([=, &result]() {
int64_t start = i * step;
for (int64_t i = 0; i < dims[0]; ++i) { int64_t end = (i + 1) * step;
if (i == n_threads) {
if (start == dict_size) {
return;
}
end = dict_size;
}
printf("THREAD[%d], index [%ld, %ld), start Quant table...\n",
i,
start,
end);
struct timeval quant_start;
gettimeofday(&(quant_start), NULL);
for (int64_t k = start; k < end; ++k) {
float xmin = 0, xmax = 0, loss = 0; float xmin = 0, xmax = 0, loss = 0;
size_t scale = dims[1];
char *tensor_temp = new char[per_line_size]; char *tensor_temp = new char[per_line_size];
greedy_search( greedy_search(
tensor_float_buf + i * dims[1], xmin, xmax, loss, scale, bits); emb_table + k * emb_size, xmin, xmax, loss, emb_size, bits);
for (size_t e = 0; e < dims[1]; ++e) { // 得出 loss 最小的时候的 scale
float x = *(tensor_float_buf + i * dims[1] + e); float scale = (xmax - xmin) / (pow2bits - 1);
int val = round((x - xmin) / (xmax - xmin) * (pow(2, bits) - 1));
val = std::max(0, val);
val = std::min((int)pow(2, bits) - 1, val);
char *min_ptr = tensor_temp; char *min_ptr = tensor_temp;
char *max_ptr = tensor_temp + sizeof(float); char *max_ptr = tensor_temp + sizeof(float);
memcpy(min_ptr, &xmin, sizeof(float)); memcpy(min_ptr, &xmin, sizeof(float));
memcpy(max_ptr, &xmax, sizeof(float)); memcpy(max_ptr, &xmax, sizeof(float));
for (size_t e = 0; e < emb_size; ++e) {
float x = *(emb_table + k * emb_size + e);
int val = round((x - xmin) / scale);
val = std::max(0, val);
val = std::min((int)pow2bits - 1, val);
*(tensor_temp + 2 * sizeof(float) + e) = val; *(tensor_temp + 2 * sizeof(float) + e) = val;
float unit = (xmax - xmin) / pow(2, bits);
float trans_val = unit * val + xmin;
} }
seq_file_writer.write((char *)&i, sizeof(i), tensor_temp, per_line_size); result[k] = tensor_temp;
if ((k - start) % 10000 == 0) {
printf("THREAD[%d], handle line: %ld\n", i, k - start);
}
}
struct timeval quant_end;
gettimeofday(&(quant_end), NULL);
printf("THREAD[%d], Quantization finished, cost: %lu us!!!\n",
i,
time_diff(quant_start, quant_end));
}));
}
for (auto &thread : threads) {
thread.join();
}
SeqFileWriter seq_file_writer(file2);
for (int64_t i = 0; i < dict_size; i++) {
seq_file_writer.write((char *)&i, sizeof(i), result[i], per_line_size);
} }
return 0; return 0;
} }
int main(int argc, char **argv) { int main(int argc, char **argv) {
if (argc < 3 || argc > 4) { if (argc < 3 || argc > 5) {
std::cout << "Usage: if no compress, please follow:" << std::endl; std::cout << "Usage:" << std::endl;
std::cout << "seq_generator PARAMETER_FILE OUTPUT_FILE\n" << std::endl; std::cout << "if no compress, please follow:" << std::endl;
std::cout << " seq_generator PARAMETER_FILE OUTPUT_FILE\n" << std::endl;
std::cout << "if compress, please follow: " << std::endl; std::cout << "if compress, please follow: " << std::endl;
std::cout << "seq_generator PARAMETER_FILE OUTPUT_FILE QUANT_BITS" std::cout << " seq_generator PARAMETER_FILE OUTPUT_FILE QUANT_BITS "
"[N_THREADS]"
<< std::endl; << std::endl;
std::cout << "Now it only support 8 bit." << std::endl; std::cout << " Now it only support 8 bit." << std::endl;
return -1; return -1;
} }
reg_var_types(); reg_var_types();
...@@ -227,7 +286,13 @@ int main(int argc, char **argv) { ...@@ -227,7 +286,13 @@ int main(int argc, char **argv) {
} }
if (argc == 4) { if (argc == 4) {
std::cout << "generate compressed sparse param sequence file" << std::endl; std::cout << "generate compressed sparse param sequence file" << std::endl;
compress_parameter(argv[1], argv[2], atoi(argv[3])); compress_parameter_parallel(argv[1], argv[2], atoi(argv[3]), 1);
return 0;
}
if (argc == 5) {
std::cout << "parallel generate compressed sparse param sequence file"
<< std::endl;
compress_parameter_parallel(argv[1], argv[2], atoi(argv[3]), atoi(argv[4]));
return 0; return 0;
} }
} }
......
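To make the record format concrete: each embedding row is written as two float32 values (xmin, xmax) followed by `emb_size` one-byte quantized codes, so `per_line_size = emb_size + 2 * sizeof(float)`. Below is a rough NumPy sketch of the same min/max linear quantization and of decoding one record. It is an illustration of the math, not part of the committed tool: it assumes the little-endian layout above, and it simply uses the row's min and max, whereas the C++ calls `greedy_search` to pick the (xmin, xmax) that minimize quantization loss.

```python
import struct
import numpy as np

def quantize_row(row, bits=8):
    """Min/max linear quantization of one embedding row (sketch)."""
    xmin, xmax = float(row.min()), float(row.max())
    scale = (xmax - xmin) / (2 ** bits - 1) if xmax > xmin else 1.0
    q = np.clip(np.round((row - xmin) / scale), 0, 2 ** bits - 1).astype(np.uint8)
    # Record layout mirrors the C++: xmin (float32), xmax (float32), one byte per value.
    return struct.pack("<2f", xmin, xmax) + q.tobytes()

def dequantize_record(record, emb_size, bits=8):
    """Recover the approximate float row from one quantized record."""
    xmin, xmax = struct.unpack("<2f", record[:8])
    scale = (xmax - xmin) / (2 ** bits - 1)
    q = np.frombuffer(record[8:8 + emb_size], dtype=np.uint8)
    return xmin + scale * q.astype(np.float32)

row = np.random.randn(16).astype(np.float32)
rec = quantize_row(row)
print("record bytes:", len(rec))  # 2 * 4 + 16
print("max abs error:", np.abs(dequantize_record(rec, row.size) - row).max())
```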
...@@ -50,9 +50,9 @@ class WeightedRandomRender : public EndpointRouterBase { ...@@ -50,9 +50,9 @@ class WeightedRandomRender : public EndpointRouterBase {
Factory<WeightedRandomRender, EndpointRouterBase>* factory = Factory<WeightedRandomRender, EndpointRouterBase>* factory =
new (std::nothrow) Factory<WeightedRandomRender, EndpointRouterBase>(); new (std::nothrow) Factory<WeightedRandomRender, EndpointRouterBase>();
if (factory == NULL) { if (factory == NULL) {
RAW_LOG_ERROR( RAW_LOG(ERROR,
"Failed regist factory: WeightedRandomRender->EndpointRouterBase in " "Failed regist factory: WeightedRandomRender->EndpointRouterBase "
"macro!"); "in macro!");
return -1; return -1;
} }
...@@ -62,7 +62,7 @@ class WeightedRandomRender : public EndpointRouterBase { ...@@ -62,7 +62,7 @@ class WeightedRandomRender : public EndpointRouterBase {
// together. // together.
if (FactoryPool<EndpointRouterBase>::instance().register_factory( if (FactoryPool<EndpointRouterBase>::instance().register_factory(
"WeightedRandomRender", factory) != 0) { "WeightedRandomRender", factory) != 0) {
RAW_LOG_INFO( RAW_LOG(INFO,
"Factory has been registed: " "Factory has been registed: "
"WeightedRandomRender->EndpointRouterBase."); "WeightedRandomRender->EndpointRouterBase.");
} }
......
...@@ -18,7 +18,6 @@ ...@@ -18,7 +18,6 @@
#include <utility> #include <utility>
#include "core/sdk-cpp/include/common.h" #include "core/sdk-cpp/include/common.h"
#include "core/sdk-cpp/include/stub_impl.h" #include "core/sdk-cpp/include/stub_impl.h"
#include "glog/raw_logging.h"
namespace baidu { namespace baidu {
namespace paddle_serving { namespace paddle_serving {
...@@ -28,12 +27,20 @@ namespace sdk_cpp { ...@@ -28,12 +27,20 @@ namespace sdk_cpp {
namespace brpc = baidu::rpc; namespace brpc = baidu::rpc;
#endif #endif
#define ERROR_STRING_LEN 10240
#define INLINE_REGIST_OBJECT(D, B, E) \ #define INLINE_REGIST_OBJECT(D, B, E) \
do { \ do { \
Factory<D, B>* factory = new (std::nothrow) Factory<D, B>(); \ Factory<D, B>* factory = new (std::nothrow) Factory<D, B>(); \
if (factory == NULL || \ if (factory == NULL || \
FactoryPool<B>::instance().register_factory(#D, factory) != 0) { \ FactoryPool<B>::instance().register_factory(#D, factory) != 0) { \
RAW_LOG_ERROR("Failed regist factory: %s->%s in macro!", #D, #B); \ char err_str[ERROR_STRING_LEN]; \
snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->%s in macro!", \
#D, \
#B); \
RAW_LOG(ERROR, "%s", err_str); \
return E; \ return E; \
} \ } \
} while (0) } while (0)
...@@ -43,7 +50,12 @@ namespace brpc = baidu::rpc; ...@@ -43,7 +50,12 @@ namespace brpc = baidu::rpc;
Factory<D, B>* factory = new (std::nothrow) Factory<D, B>(); \ Factory<D, B>* factory = new (std::nothrow) Factory<D, B>(); \
if (factory == NULL || \ if (factory == NULL || \
FactoryPool<B>::instance().register_factory(tag, factory) != 0) { \ FactoryPool<B>::instance().register_factory(tag, factory) != 0) { \
RAW_LOG_ERROR("Failed regist factory: %s in macro!", #D); \ char err_str[ERROR_STRING_LEN]; \
snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s in macro!", \
#D); \
RAW_LOG(ERROR, "%s", err_str); \
return -1; \ return -1; \
} \ } \
return 0; \ return 0; \
...@@ -66,7 +78,13 @@ namespace brpc = baidu::rpc; ...@@ -66,7 +78,13 @@ namespace brpc = baidu::rpc;
if (factory == NULL || \ if (factory == NULL || \
::baidu::paddle_serving::sdk_cpp::FactoryPool<B>::instance() \ ::baidu::paddle_serving::sdk_cpp::FactoryPool<B>::instance() \
.register_factory(#D, factory) != 0) { \ .register_factory(#D, factory) != 0) { \
RAW_LOG_ERROR("Failed regist factory: %s->%s in macro!", #D, #B); \ char err_str[ERROR_STRING_LEN]; \
snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->%s in macro!", \
#D, \
#B); \
RAW_LOG(ERROR, "%s", err_str); \
return; \ return; \
} \ } \
return; \ return; \
...@@ -80,8 +98,14 @@ namespace brpc = baidu::rpc; ...@@ -80,8 +98,14 @@ namespace brpc = baidu::rpc;
if (factory == NULL || \ if (factory == NULL || \
::baidu::paddle_serving::sdk_cpp::FactoryPool<B>::instance() \ ::baidu::paddle_serving::sdk_cpp::FactoryPool<B>::instance() \
.register_factory(T, factory) != 0) { \ .register_factory(T, factory) != 0) { \
RAW_LOG_ERROR( \ char err_str[ERROR_STRING_LEN]; \
"Failed regist factory: %s->%s, tag %s in macro!", #D, #B, T); \ snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->%s, tag %s in macro!", \
#D, \
#B, \
T); \
RAW_LOG(ERROR, "%s", err_str); \
return; \ return; \
} \ } \
return; \ return; \
...@@ -108,8 +132,13 @@ namespace brpc = baidu::rpc; ...@@ -108,8 +132,13 @@ namespace brpc = baidu::rpc;
::baidu::paddle_serving::sdk_cpp::FactoryPool< \ ::baidu::paddle_serving::sdk_cpp::FactoryPool< \
::baidu::paddle_serving::sdk_cpp::Stub>::instance() \ ::baidu::paddle_serving::sdk_cpp::Stub>::instance() \
.register_factory(T, factory) != 0) { \ .register_factory(T, factory) != 0) { \
RAW_LOG_ERROR( \ char err_str[ERROR_STRING_LEN]; \
"Failed regist factory: %s->Stub, tag: %s in macro!", #D, T); \ snprintf(err_str, \
ERROR_STRING_LEN - 1, \
"Failed regist factory: %s->Stub, tag: %s in macro!", \
#D, \
T); \
RAW_LOG(ERROR, "%s", err_str); \
return; \ return; \
} \ } \
return; \ return; \
...@@ -146,14 +175,24 @@ class FactoryPool { ...@@ -146,14 +175,24 @@ class FactoryPool {
typename std::map<std::string, FactoryBase<B>*>::iterator it = typename std::map<std::string, FactoryBase<B>*>::iterator it =
_pool.find(tag); _pool.find(tag);
if (it != _pool.end()) { if (it != _pool.end()) {
RAW_LOG_ERROR("Insert duplicate with tag: %s", tag.c_str()); char err_str[ERROR_STRING_LEN];
snprintf(err_str,
ERROR_STRING_LEN - 1,
"Insert duplicate with tag: %s",
tag.c_str());
RAW_LOG(ERROR, "%s", err_str);
return -1; return -1;
} }
std::pair<typename std::map<std::string, FactoryBase<B>*>::iterator, bool> std::pair<typename std::map<std::string, FactoryBase<B>*>::iterator, bool>
r = _pool.insert(std::make_pair(tag, factory)); r = _pool.insert(std::make_pair(tag, factory));
if (!r.second) { if (!r.second) {
RAW_LOG_ERROR("Failed insert new factory with: %s", tag.c_str()); char err_str[ERROR_STRING_LEN];
snprintf(err_str,
ERROR_STRING_LEN - 1,
"Failed insert new factory with: %s",
tag.c_str());
RAW_LOG(ERROR, "%s", err_str);
return -1; return -1;
} }
...@@ -164,9 +203,13 @@ class FactoryPool { ...@@ -164,9 +203,13 @@ class FactoryPool {
typename std::map<std::string, FactoryBase<B>*>::iterator it = typename std::map<std::string, FactoryBase<B>*>::iterator it =
_pool.find(tag); _pool.find(tag);
if (it == _pool.end() || it->second == NULL) { if (it == _pool.end() || it->second == NULL) {
RAW_LOG_ERROR("Not found factory pool, tag: %s, pool size: %u", char err_str[ERROR_STRING_LEN];
snprintf(err_str,
ERROR_STRING_LEN - 1,
"Not found factory pool, tag: %s, pool size: %u",
tag.c_str(), tag.c_str(),
_pool.size()); _pool.size());
RAW_LOG(ERROR, "%s", err_str);
return NULL; return NULL;
} }
......
...@@ -37,6 +37,7 @@ message Request { ...@@ -37,6 +37,7 @@ message Request {
repeated FeedInst insts = 1; repeated FeedInst insts = 1;
repeated string fetch_var_names = 2; repeated string fetch_var_names = 2;
optional bool profile_server = 3 [ default = false ]; optional bool profile_server = 3 [ default = false ];
required uint64 log_id = 4 [ default = 0 ];
}; };
message Response { message Response {
......
...@@ -4,17 +4,27 @@ ...@@ -4,17 +4,27 @@
## Compilation environment requirements ## Compilation environment requirements
- OS: CentOS 7 | module | version |
- GCC: 4.8.2 and later | :--------------------------: | :-------------------------------: |
- Golang: 1.9.2 and later | OS | CentOS 7 |
- Git:2.17.1 and later | gcc | 4.8.5 and later |
- CMake:3.2.2 and later | gcc-c++ | 4.8.5 and later |
- Python:2.7.2 and later / 3.6 and later | make | 3.82 and later |
| cmake | 3.2.0 and later |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you: | Python | 2.7.2 and later / 3.6 and later |
| Go | 1.9.2 and later |
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel) | git | 2.17.1 and later |
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel) | glibc-static | 2.17 |
| openssl-devel | 1.0.2k |
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 and later |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake: This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake:
...@@ -29,6 +39,9 @@ git clone https://github.com/PaddlePaddle/Serving ...@@ -29,6 +39,9 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive cd Serving && git submodule update --init --recursive
``` ```
## PYTHONROOT Setting ## PYTHONROOT Setting
```shell ```shell
...@@ -38,13 +51,49 @@ export PYTHONROOT=/usr/ ...@@ -38,13 +51,49 @@ export PYTHONROOT=/usr/
In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`. In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
## Install Python dependencies
```shell
pip install -r python/requirements.txt
```
If Python3 is used, replace `pip` with `pip3`.
## GOPATH Setting
## Compile Arguments
The default GOPATH is `$HOME/go`, which you can set to other values.
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
```
## Get go packages
```shell
go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.cn,direct
go get -u github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2
go get -u github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2
go get -u github.com/golang/protobuf/protoc-gen-go@v1.4.3
go get -u google.golang.org/grpc@v1.33.0
```
## Compile Server ## Compile Server
### Integrated CPU version paddle inference library ### Integrated CPU version paddle inference library
``` shell ``` shell
mkdir build && cd build mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DSERVER=ON ..
make -j10 make -j10
``` ```
...@@ -53,8 +102,30 @@ you can execute `make install` to put targets under directory `./output`, you ne ...@@ -53,8 +102,30 @@ you can execute `make install` to put targets under directory `./output`, you ne
### Integrated GPU version paddle inference library ### Integrated GPU version paddle inference library
``` shell ``` shell
mkdir build && cd build mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
### Integrated TRT version paddle inference library
```
mkdir server-build-trt && cd server-build-trt
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON \
-DWITH_TRT=ON ..
make -j10 make -j10
``` ```
...@@ -62,33 +133,54 @@ execute `make install` to put targets under directory `./output` ...@@ -62,33 +133,54 @@ execute `make install` to put targets under directory `./output`
**Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details. **Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
## Compile Client ## Compile Client
``` shell ``` shell
mkdir build && cd build mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCLIENT=ON ..
make -j10 make -j10
``` ```
execute `make install` to put targets under directory `./output` execute `make install` to put targets under directory `./output`
## Compile the App ## Compile the App
```bash ```bash
mkdir build && cd build mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DAPP=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DAPP=ON ..
make make
``` ```
## Install wheel package ## Install wheel package
Regardless of whether you compile the client, the server, or the App package, after compiling, install the whl package found under `python/dist/` in the corresponding build directory (`server-build-cpu`, `server-build-gpu`, `client-build`, `app-build`).
## Note ## Note
When running the python server, it will check the `SERVING_BIN` environment variable. If you want to use your own compiled binary file, set the environment variable to the path of the corresponding binary file, usually`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`. When running the python server, it will check the `SERVING_BIN` environment variable. If you want to use your own compiled binary file, set the environment variable to the path of the corresponding binary file, usually`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
## Verify
Please use the example under `python/examples` to verify.
## CMake Option Description ## CMake Option Description
| Compile Options | Description | Default | | Compile Options | Description | Default |
...@@ -96,7 +188,9 @@ When running the python server, it will check the `SERVING_BIN` environment vari ...@@ -96,7 +188,9 @@ When running the python server, it will check the `SERVING_BIN` environment vari
| WITH_AVX | Compile Paddle Serving with AVX intrinsics | OFF | | WITH_AVX | Compile Paddle Serving with AVX intrinsics | OFF |
| WITH_MKL | Compile Paddle Serving with MKL support | OFF | | WITH_MKL | Compile Paddle Serving with MKL support | OFF |
| WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF | | WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF |
| CUDNN_ROOT | Define CuDNN library and header path | | | CUDNN_LIBRARY | Define CuDNN library and header path | |
| CUDA_TOOLKIT_ROOT_DIR | Define CUDA PATH | |
| TENSORRT_ROOT | Define TensorRT PATH | |
| CLIENT | Compile Paddle Serving Client | OFF | | CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF | | SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF | | APP | Compile Paddle Serving App package | OFF |
...@@ -111,7 +205,8 @@ To compile the Paddle Serving GPU version on bare metal, you need to install the ...@@ -111,7 +205,8 @@ To compile the Paddle Serving GPU version on bare metal, you need to install the
- CUDA - CUDA
- CuDNN - CuDNN
- NCCL2
To compile the TensorRT version, you need to install the TensorRT library.
Note here: Note here:
...@@ -121,21 +216,12 @@ Note here: ...@@ -121,21 +216,12 @@ Note here:
The following is the base library version matching relationship used by the PaddlePaddle release version for reference: The following is the base library version matching relationship used by the PaddlePaddle release version for reference:
| | CUDA | CuDNN | NCCL2 | | | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------------------: | :----: | | :----: | :-----: | :----------------------: | :----: |
| CUDA 8 | 8.0.61 | CuDNN 7.1.2 for CUDA 8.0 | 2.1.4 | | post9 | 9.0 | CuDNN 7.3.1 for CUDA 9.0 | |
| CUDA 9 | 9.0.176 | CuDNN 7.3.1 for CUDA 9.0 | 2.2.12 | | post10 | 10.0 | CuDNN 7.5.1 for CUDA 10.0| |
| trt | 10.1 | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5 |
### How to make the compiler detect the CuDNN library ### How to make the compiler detect the CuDNN library
Download the corresponding CuDNN version from the NVIDIA developer website, decompress it, and add `-DCUDNN_LIBRARY` to the cmake command to specify the CuDNN path.
### How to make the compiler detect the nccl library
After downloading the corresponding version of the nccl2 library from the NVIDIA developer official website and decompressing it, add the following environment variables (take nccl2.1.4 as an example):
```shell
export C_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/lib/:$LD_LIBRARY_PATH
```
...@@ -4,17 +4,27 @@ ...@@ -4,17 +4,27 @@
## 编译环境设置 ## 编译环境设置
- OS: CentOS 7 | 组件 | 版本要求 |
- GCC: 4.8.2及以上 | :--------------------------: | :-------------------------------: |
- Golang: 1.9.2及以上 | OS | CentOS 7 |
- Git:2.17.1及以上 | gcc | 4.8.5 and later |
- CMake:3.2.2及以上 | gcc-c++ | 4.8.5 and later |
- Python:2.7.2及以上 / 3.6及以上 | git | 3.82 and later |
| cmake | 3.2.0 and later |
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境: | Python | 2.7.2 and later / 3.6 and later |
| Go | 1.9.2 and later |
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel) | git | 2.17.1 and later |
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel) | glibc-static | 2.17 |
| openssl-devel | 1.0.2k |
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境,详见[该文档](DOCKER_IMAGES_CN.md)
本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可: 本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可:
...@@ -29,6 +39,9 @@ git clone https://github.com/PaddlePaddle/Serving ...@@ -29,6 +39,9 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive cd Serving && git submodule update --init --recursive
``` ```
## PYTHONROOT设置 ## PYTHONROOT设置
```shell ```shell
...@@ -38,13 +51,46 @@ export PYTHONROOT=/usr/ ...@@ -38,13 +51,46 @@ export PYTHONROOT=/usr/
我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/` 我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`
## 安装Python依赖
```shell
pip install -r python/requirements.txt
```
如果使用 Python3,请以 `pip3` 替换 `pip`
## GOPATH 设置
默认 GOPATH 设置为 `$HOME/go`,您也可以设置为其他值。
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
```
## 获取 Go packages
```shell
go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.cn,direct
go get -u github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway@v1.15.2
go get -u github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger@v1.15.2
go get -u github.com/golang/protobuf/protoc-gen-go@v1.4.3
go get -u google.golang.org/grpc@v1.33.0
```
## 编译Server部分 ## 编译Server部分
### 集成CPU版本Paddle Inference Library ### 集成CPU版本Paddle Inference Library
``` shell ``` shell
mkdir build && cd build mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DSERVER=ON ..
make -j10 make -j10
``` ```
...@@ -53,8 +99,30 @@ make -j10 ...@@ -53,8 +99,30 @@ make -j10
### 集成GPU版本Paddle Inference Library ### 集成GPU版本Paddle Inference Library
``` shell ``` shell
mkdir build && cd build mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
### 集成TensorRT版本Paddle Inference Library
```
mkdir server-build-trt && cd server-build-trt
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON \
-DWITH_TRT=ON ..
make -j10 make -j10
``` ```
...@@ -65,29 +133,50 @@ make -j10 ...@@ -65,29 +133,50 @@ make -j10
## 编译Client部分 ## 编译Client部分
``` shell ``` shell
mkdir build && cd build mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCLIENT=ON ..
make -j10 make -j10
``` ```
执行`make install`可以把目标产出放在`./output`目录下。 执行`make install`可以把目标产出放在`./output`目录下。
## 编译App部分 ## 编译App部分
```bash ```bash
mkdir build && cd build mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCMAKE_INSTALL_PREFIX=./output -DAPP=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCMAKE_INSTALL_PREFIX=./output \
-DAPP=ON ..
make make
``` ```
## 安装wheel包 ## 安装wheel包
无论是Client端,Server端还是App部分,编译完成后,安装`python/dist/`下的whl包即可。 无论是Client端,Server端还是App部分,编译完成后,安装编译过程临时目录(`server-build-cpu``server-build-gpu``client-build``app-build`)下的`python/dist/` 中的whl包即可。
## 注意事项 ## 注意事项
运行python端Server时,会检查`SERVING_BIN`环境变量,如果想使用自己编译的二进制文件,请将设置该环境变量为对应二进制文件的路径,通常是`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving` 运行python端Server时,会检查`SERVING_BIN`环境变量,如果想使用自己编译的二进制文件,请将设置该环境变量为对应二进制文件的路径,通常是`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`
## 如何验证
请使用 `python/examples` 下的例子进行验证。
## CMake选项说明 ## CMake选项说明
| 编译选项 | 说明 | 默认 | | 编译选项 | 说明 | 默认 |
...@@ -95,7 +184,10 @@ make ...@@ -95,7 +184,10 @@ make
| WITH_AVX | Compile Paddle Serving with AVX intrinsics | OFF | | WITH_AVX | Compile Paddle Serving with AVX intrinsics | OFF |
| WITH_MKL | Compile Paddle Serving with MKL support | OFF | | WITH_MKL | Compile Paddle Serving with MKL support | OFF |
| WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF | | WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF |
| CUDNN_ROOT | Define CuDNN library and header path | | | WITH_TRT | Compile Paddle Serving with TensorRT | OFF |
| CUDNN_LIBRARY | Define CuDNN library and header path | |
| CUDA_TOOLKIT_ROOT_DIR | Define CUDA PATH | |
| TENSORRT_ROOT | Define TensorRT PATH | |
| CLIENT | Compile Paddle Serving Client | OFF | | CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF | | SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF | | APP | Compile Paddle Serving App package | OFF |
...@@ -110,7 +202,8 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选 ...@@ -110,7 +202,8 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选
- CUDA - CUDA
- CuDNN - CuDNN
- NCCL2
编译TensorRT版本,需要安装TensorRT库。
这里要注意的是: 这里要注意的是:
...@@ -119,21 +212,12 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选 ...@@ -119,21 +212,12 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选
以下是PaddlePaddle发布版本所使用的基础库版本匹配关系,供参考: 以下是PaddlePaddle发布版本所使用的基础库版本匹配关系,供参考:
| | CUDA | CuDNN | NCCL2 | | | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------------------: | :----: | | :----: | :-----: | :----------------------: | :----: |
| CUDA 8 | 8.0.61 | CuDNN 7.1.2 for CUDA 8.0 | 2.1.4 | | post9 | 9.0 | CuDNN 7.3.1 for CUDA 9.0 | |
| CUDA 9 | 9.0.176 | CuDNN 7.3.1 for CUDA 9.0 | 2.2.12 | | post10 | 10.0 | CuDNN 7.5.1 for CUDA 10.0| |
| trt | 10.1 | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5 |
### 如何让Paddle Serving编译系统探测到CuDNN库 ### 如何让Paddle Serving编译系统探测到CuDNN库
从NVIDIA developer官网下载对应版本CuDNN并在本地解压后,在cmake编译命令中增加`-DCUDNN_ROOT`参数,指定CuDNN库所在路径。 从NVIDIA developer官网下载对应版本CuDNN并在本地解压后,在cmake编译命令中增加`-DCUDNN_LIBRARY`参数,指定CuDNN库所在路径。
### 如何让Paddle Serving编译系统探测到nccl库
从NVIDIA developer官网下载对应版本nccl2库并解压后,增加如下环境变量 (以nccl2.1.4为例):
```shell
export C_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=/path/to/nccl2/cuda8/nccl_2.1.4-1+cuda8.0_x86_64/lib/:$LD_LIBRARY_PATH
```
...@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu ...@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu
1. Build and test 1. Build and test
Users can build Paddle Serving natively on Linux, see the [BUILD steps](doc/INSTALL.md). Users can build Paddle Serving natively on Linux, see the [BUILD steps](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md).
1. Keep pulling 1. Keep pulling
......
...@@ -6,7 +6,8 @@ ...@@ -6,7 +6,8 @@
There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter cuts the sparse parameters out and saves the model in two parts, the sparse parameters and the dense parameters. Because the scale of sparse parameters in industrial scenarios is very large, on the order of 10^9, launching large-scale sparse parameter prediction on a single machine is impractical, so we introduce Cube, Baidu's industrial-grade product for sparse parameter indexing, refined over many years, to provide a distributed sparse parameter service.
The local mode of Cube differs from distributed Cube; it is designed so that developers can conveniently run experiments and demos.
<!--If there is a demand for distributed sparse parameter service, please continue reading [Distributed Cube User Guide](./Distributed_Cube) after reading this document (still developing).-->
This document uses the original model without any compression algorithm. If you need to put a quantized model online, please read the [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md) guide.
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
在python/examples下有两个关于CTR的示例,他们分别是criteo_ctr, criteo_ctr_with_cube。前者是在训练时保存整个模型,包括稀疏参数。后者是将稀疏参数裁剪出来,保存成两个部分,一个是稀疏参数,另一个是稠密参数。由于在工业级的场景中,稀疏参数的规模非常大,达到10^9数量级。因此在一台机器上启动大规模稀疏参数预测是不实际的,因此我们引入百度多年来在稀疏参数索引领域的工业级产品Cube,提供分布式的稀疏参数服务。 在python/examples下有两个关于CTR的示例,他们分别是criteo_ctr, criteo_ctr_with_cube。前者是在训练时保存整个模型,包括稀疏参数。后者是将稀疏参数裁剪出来,保存成两个部分,一个是稀疏参数,另一个是稠密参数。由于在工业级的场景中,稀疏参数的规模非常大,达到10^9数量级。因此在一台机器上启动大规模稀疏参数预测是不实际的,因此我们引入百度多年来在稀疏参数索引领域的工业级产品Cube,提供分布式的稀疏参数服务。
单机版Cube是分布式Cube的弱化版本,旨在方便开发者做实验和Demo时使用。如果有分布式稀疏参数服务的需求,请在读完此文档之后,继续阅读 [稀疏参数索引服务Cube使用指南](分布式Cube)(正在建设中)。 <!--单机版Cube是分布式Cube的弱化版本,旨在方便开发者做实验和Demo时使用。如果有分布式稀疏参数服务的需求,请在读完此文档之后,继续阅读 [稀疏参数索引服务Cube使用指南](分布式Cube)(正在建设中)。-->
本文档使用的都是未经过任何压缩算法处理的原始模型,如果有量化模型上线需求,请阅读[Cube稀疏参数索引量化存储使用指南](./CUBE_QUANT_CN.md) 本文档使用的都是未经过任何压缩算法处理的原始模型,如果有量化模型上线需求,请阅读[Cube稀疏参数索引量化存储使用指南](./CUBE_QUANT_CN.md)
......
...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube ...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube
python local_train.py python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/ cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh & sh cube_quant_prepare.sh &
python test_server_quant.py ctr_serving_model_kv & python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
``` ```
......
...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube ...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube
python local_train.py python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/ cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh & sh cube_quant_prepare.sh &
python test_server_quant.py ctr_serving_model_kv & python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
``` ```
......
...@@ -106,7 +106,7 @@ class FluidFamilyCore { ...@@ -106,7 +106,7 @@ class FluidFamilyCore {
![预测服务Service](predict-service.png) ![预测服务Service](predict-service.png)
关于OP之间的依赖关系,以及通过OP组建workflow,可以参考[从零开始写一个预测服务](CREATING.md)的相关章节 关于OP之间的依赖关系,以及通过OP组建workflow,可以参考[从零开始写一个预测服务](https://github.com/PaddlePaddle/Serving/blob/develop/doc/deprecated/CREATING.md)的相关章节
服务端实例透视图 服务端实例透视图
......
# Docker Images
([简体中文](DOCKER_IMAGES_CN.md)|English)
This document maintains a list of docker images provided by Paddle Serving.
## Get docker image
You can get images in two ways:
1. Pull image directly from `hub.baidubce.com` or `docker.io` through TAG:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
```
2. Building image based on dockerfile
Create a new folder, copy the corresponding Dockerfile into it, and run the following command:
```shell
docker build -t <image-name>:<images-tag> .
```
## Image description
Runtime images cannot be used for compilation.
| Description | OS | TAG | Dockerfile |
| :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
| CPU runtime | CentOS7 | latest | [Dockerfile](../tools/Dockerfile) |
| CPU development | CentOS7 | latest-devel | [Dockerfile.devel](../tools/Dockerfile.devel) |
| GPU (cuda9.0-cudnn7) runtime | CentOS7 | latest-cuda9.0-cudnn7 | [Dockerfile.cuda9.0-cudnn7](../tools/Dockerfile.cuda9.0-cudnn7) |
| GPU (cuda9.0-cudnn7) development | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) runtime | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) development | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
# Docker 镜像
(简体中文|[English](DOCKER_IMAGES.md))
该文档维护了 Paddle Serving 提供的镜像列表。
## 获取镜像
您可以通过两种方式获取镜像。
1. 通过 TAG 直接从 `hub.baidubce.com` 或 `docker.io` 拉取镜像:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
```
2. 基于 Dockerfile 构建镜像
建立新目录,复制对应 Dockerfile 内容到该目录下 Dockerfile 文件。执行
```shell
docker build -t <image-name>:<images-tag> .
```
## 镜像说明
运行时镜像不能用于开发编译。
| 镜像说明 | 操作系统 | TAG | Dockerfile |
| -------------------------------------------------- | -------- | ---------------------------- | ------------------------------------------------------------ |
| CPU 运行镜像 | CentOS7 | latest | [Dockerfile](../tools/Dockerfile) |
| CPU 开发镜像 | CentOS7 | latest-devel | [Dockerfile.devel](../tools/Dockerfile.devel) |
| GPU (cuda9.0-cudnn7) 运行镜像 | CentOS7 | latest-cuda9.0-cudnn7 | [Dockerfile.cuda9.0-cudnn7](../tools/Dockerfile.cuda9.0-cudnn7) |
| GPU (cuda9.0-cudnn7) 开发镜像 | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) 运行镜像 | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) 开发镜像 | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU 开发镜像 (用于编译 Ubuntu 包) | CentOS6 | <无> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) 开发镜像 (用于编译 Ubuntu 包) | CentOS6 | <无> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
# FAQ

## Basics

#### Q: How do Paddle Serving, Paddle Inference, and PaddleHub Serving differ, and how are they related?

**A:** Paddle Serving is a remote service: the device that issues the prediction (a phone, a browser, a client, etc.) is not the hardware that actually runs it. Paddle Inference is a library, suited to being embedded in a larger system to keep inference efficient; Paddle Serving calls Paddle Inference to provide the remote service. PaddleHub Serving can be regarded as an example setup; all of them use Paddle Serving as the unified entry point for prediction services. For interaction on the web side, the usual pattern is to call a remote service, which can be built with Paddle Serving's web service.

#### Q: Does paddle-serving support int32?

**A:** In the protobuf definitions, the feed_type and fetch_type codes correspond to data types as follows (see the sketch at the end of this section):

    0 - int64
    1 - float32
    2 - int32

#### Q: Does paddle-serving support multi-threaded calls on Windows and Linux?

**A:** The client can access the server from multiple threads.

#### Q: How does paddle-serving change its message size limit?

**A:** On both the server and the client, raise the data size limit through FLAGS_max_body_size, in bytes; the default is 64 MB.

#### Q: Which client languages does paddle-serving currently support?

**A:** Java, C++, and Python.

#### Q: Which protocols does paddle-serving currently support?

**A:** HTTP and RPC.
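To make the feed_type/fetch_type codes from the Basics section concrete on the Python client, here is a small sketch (the array contents are placeholders):

```python
import numpy as np

# feed_type / fetch_type codes, as listed above: 0 -> int64, 1 -> float32, 2 -> int32
DTYPE_OF = {0: np.int64, 1: np.float32, 2: np.int32}

# A feed variable configured with feed_type=2 should therefore carry int32 data:
ids = np.array([3, 7, 11], dtype=DTYPE_OF[2])
print(ids.dtype)  # int32
```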
## Compilation

#### Q: How do I run predictions with a Paddle Serving I compiled myself?

**A:** Install the whl package you built with pip, and set the SERVING_BIN environment variable to the path of the compiled serving binary.

## Deployment

#### Q: Running Serving in a GPU environment fails with "GPU count is: 0".

```
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what():

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::SetDeviceId(int)
2 paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
4 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: Device id must be less than GPU count, but received id is: 0. GPU count is: 0.
[Hint: Expected id < GetCUDADeviceCount(), but received id:0 >= GetCUDADeviceCount():0.] at (/home/scmbuild/workspaces_cluster.dev/baidu.lib.paddlepaddle/baidu/lib/paddlepaddle/Paddle/paddle/fluid/platform/gpu_info.cc:211)
```

**A:** libcuda.so was not linked successfully. First locate libcuda.so on the machine, use ldd to check that the libnvidia version matches the one shown by nvidia-smi (e.g. libnvidia-fatbinaryloader.so.418.39 against NVIDIA-SMI 418.39, Driver Version: 418.39), then export the directory containing libcuda.so (e.g. if libcuda.so is in /usr/lib64/, run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/).

#### Q: I hit "GPU not found, please check your environment or use cpu version by 'pip install paddle_serving_server'".

**A:** Check whether the machine has an NVIDIA card: ls /dev/ | grep nvidia

#### Q: Which image environments does Paddle Serving currently support?

**A:** Currently (0.4.0) only CentOS; see the full list [here](https://github.com/PaddlePaddle/Serving/blob/develop/doc/DOCKER_IMAGES.md).

#### Q: The GCC version Python was compiled with does not match Serving's.

**A:** 1) Use the [GPU docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md#gpunvidia-docker) to resolve the environment problem;

2) or change the gcc version of the Python installed in the anaconda virtual environment ([reference](https://www.jianshu.com/p/c498b3d86f77)).

#### Q: Does paddle-serving support local offline installation?

**A:** Offline deployment is supported; the relevant [dependencies](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md) need to be prepared and installed in advance.
## Prediction

#### Q: The first GPU prediction is particularly slow; how do I raise the RPC wait time to avoid timeouts?

**A:** The first GPU prediction needs to initialize. Use set_rpc_timeout_ms to set a longer wait time, in milliseconds; the default is 20 seconds.

Example:

```
from paddle_serving_client import Client

client = Client()
client.load_client_config(sys.argv[1])
client.set_rpc_timeout_ms(100000)
client.connect(["127.0.0.1:9393"])
```

#### Q: GPU prediction fails with InvalidArgumentError: Device id must be less than GPU count, but received id is: 0. GPU count is: 0.

**A:** Add the directory of the libcuda.so that ships with the GPU driver to the LD_LIBRARY_PATH environment variable.

#### Q: GPU prediction fails with ExternalError: Cudnn error, CUDNN_STATUS_BAD_PARAM at (../batch_norm_op.cu:198).

**A:** Add cudnn's lib64 path to LD_LIBRARY_PATH. The post9 Paddle Serving packages from pypi use cudnn 7.3 and post10 uses cudnn 7.5; if you use a self-compiled Paddle Serving, the cudnn version it uses is recorded in the log/serving.INFO log file.

#### Q: GPU prediction fails with Error: Failed to find dynamic library: libcublas.so.

**A:** Add cuda's lib64 path to LD_LIBRARY_PATH; the post9 Paddle Serving packages use cuda 9.0 and post10 uses cuda 10.0.

#### Q: How do I find the variable names the client can fetch?

**A:** Check the configuration file serving_server_conf.prototxt for the variable names you need, for instance with the sketch below.
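A rough way to list them (an assumption-laden sketch: it simply scans the text-format config for `alias_name` entries, which is how feed/fetch variables are aliased in the standard serving config layout):

```python
# Sketch: print alias_name entries from a serving_server_conf.prototxt.
with open("serving_server_conf.prototxt") as f:
    aliases = [line.split('"')[1] for line in f if "alias_name" in line]
print(aliases)
```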
#### Q: How do I use the multi-language client?

**A:** The multi-language client must be used together with the multi-language server. In the current version (0.4.0), the server side changes Server to MultiLangServer (when starting from the command line, just add the --use_multilang flag), and the Python client changes Client to MultiLangClient and removes the load_client_config step. [Java client reference](https://github.com/PaddlePaddle/Serving/blob/develop/doc/JAVA_SDK_CN.md)

#### Q: How do I use Paddle Serving on Windows?

**A:** The current version (0.4.0) can run the multi-language RPC client on Windows, or access the service over HTTP. With the multi-language RPC client, the multi-language server has to run in a Linux environment (such as a local container or a remote Linux machine); with HTTP, an ordinary server has to run in a Linux environment.

#### Q: libnvinfer.so: cannot open shared object file: No such file or directory

**A:** Install TensorRT by following this document: https://blog.csdn.net/hesongzefairy/article/details/105343525
## Log Troubleshooting

#### Q: Where can I see the logs produced during deployment and prediction?

**A:** Server-side logs come in two parts: one part is printed to standard output, and one part is written to the log/serving.INFO file under the directory the service was started from.

Client-side logs are printed directly to standard output.

Running 'export GLOG_v=3' before deploying the service makes the logs more detailed.

#### Q: After paddle-serving starts successfully, where are the related logs configured?

**A:** 1) The warning is printed by the glog component, noting that before glog is initialized, logs are printed to STDERR;

2) the usual approach is to start the service with GLOG_v, setting the log level at the same time.

For example:

```
GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
```

#### Q: (Under GLOG_v=2) The server-side logs are all normal, but the client never gets correct prediction results.

**A:** The configuration files may be the problem; check them (is_load_tensor, fetch_type, and so on).

#### Q: How do I pass a log_id to the server?

**A:** log_id defaults to 0 (automatically generated log_ids are planned; the current version is 0.4.0); the client passes it through the log_id parameter of the predict function.
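A sketch of such a call (feed/fetch names are placeholders; this assumes a 0.4.0-style client whose `predict()` accepts `log_id`, as the answer above states):

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
# Tag the request so it can be correlated with the server's "(logid=...)" log lines.
fetch_map = client.predict(feed={"x": [0.5]}, fetch=["price"], log_id=10000)
```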
## Performance Optimization
# gRPC Interface

The gRPC interface is implemented in a form similar to a Web Service:

![](grpc_impl.png)

## Comparison with the bRPC interface

1. The gRPC Server's `load_model_config` function adds a `client_config_path` parameter:

   ```python
   def load_model_config(self, server_config_paths, client_config_path=None)
   ```

   In some examples the bRPC Server and bRPC Client configuration files differ (in the cube local example, the client-side data first goes to cube and only after cube's processing to the inference library), so the gRPC Server needs the gRPC Client's configuration as well; and to remove the gRPC Client's manual config-loading step, the gRPC Server is designed to load both configuration files. `client_config_path` defaults to `<server_config_path>/serving_server_conf.prototxt`.

2. The gRPC Client removes the `load_client_config` step:

   The `connect` step fetches the corresponding prototxt over RPC (from any one endpoint).

3. The gRPC Client needs to set the timeout over RPC (the call form stays consistent with the bRPC Client).

   Because the bRPC Client cannot change its timeout after `connect`, when the gRPC Server receives a timeout-change request it recreates the bRPC Client instance to apply the new bRPC Client timeout, and the gRPC Client sets gRPC's deadline at the same time.

   **Note: the timeout-setting interface and the inference interface must not be called at the same time (not thread safe); for performance reasons no lock is taken for now.**

4. The gRPC Client's `predict` function adds `asyn` and `is_python` parameters:

   ```python
   def predict(self, feed, fetch, need_variant_tag=False, asyn=False, is_python=True)
   ```

   `asyn` selects asynchronous invocation. With `asyn=True` the call is asynchronous and returns a `MultiLangPredictFuture` object; obtain the prediction by blocking on `MultiLangPredictFuture.result()`. With `asyn=False` the call is synchronous.

   `is_python` selects the proto format. With `is_python=True`, data is transferred in numpy bytes format, currently applicable to Python only; with `is_python=False`, an ordinary data format is used, which is more general. Transfers in numpy bytes format take far less time than the ordinary format (see [#654](https://github.com/PaddlePaddle/Serving/pull/654)).

5. Exception handling: when the bRPC Client inside the gRPC Server fails to predict (returns `None`), the gRPC Client likewise returns None. Other gRPC exceptions are caught inside the Client, and a "status_code" field is added to the returned fetch_map to distinguish whether the prediction succeeded (see the timeout example).

6. Since gRPC only supports the pick_first and round_robin load-balancing policies, the ABTEST feature is not fully aligned yet.

7. The gRPC version has been tested to work on Windows and macOS.

8. Planned client languages:

   - [x] Python
   - [ ] Java
   - [ ] Go
   - [ ] JavaScript

## Python examples

See the example files under `python/examples/grpc_impl_example`; a minimal sketch follows below.
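The following is only an illustrative sketch (not one of the bundled examples): it assumes a multi-language server for the UCI housing model used elsewhere in this document, whose feed/fetch names are `x` and `price`.

```python
import numpy as np
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])  # no load_client_config: the prototxt is fetched over RPC

feed = {"x": np.array([0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
                       -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332],
                      dtype=np.float32)}

# Synchronous call.
fetch_map = client.predict(feed=feed, fetch=["price"])

# Asynchronous call: returns a future; result() blocks until the prediction arrives.
future = client.predict(feed=feed, fetch=["price"], asyn=True)
fetch_map = future.result()
```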
...@@ -2,6 +2,20 @@ ...@@ -2,6 +2,20 @@
([简体中文](./INFERENCE_TO_SERVING_CN.md)|English) ([简体中文](./INFERENCE_TO_SERVING_CN.md)|English)
Before converting to a serving model, you should know the following:

**inference_model_dir**: the directory of the Paddle inference model

**serving_server_dir**: the directory of the server-side configuration

**serving_client_dir**: the directory of the client-side configuration

**model_filename**: the model description file, whose default name is `__model__`; if it has a non-default name, set `model_filename` explicitly

**params_filename**: during `save_inference_model`, each Variable is saved as a separate file by default. If your inference model's parameters are combined into one file, set `params_filename` explicitly
## Example ## Example
``` python ``` python
...@@ -12,3 +26,11 @@ serving_server_dir = "serving_server_dir" ...@@ -12,3 +26,11 @@ serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving( feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir) inference_model_dir, serving_client_dir, serving_server_dir)
``` ```
If your model description file and parameter file are standalone files with custom names, please use the following API.
```
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir,
model_filename="model", params_filename="params")
```
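As a follow-up sketch of how the conversion outputs are consumed (paths, the port, and the feed value are placeholders, and the client config filename assumes the standard `serving_client_conf.prototxt` layout): first start a server with `serving_server_dir` (for example `python -m paddle_serving_server.serve --model serving_server_dir --port 9393`), then query it reusing `feed_var_names`/`fetch_var_names` from the conversion above:

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_dir/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
feed = {feed_var_names[0]: [0.5]}  # placeholder input for the first feed variable
fetch_map = client.predict(feed=feed, fetch=fetch_var_names)
```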
...@@ -4,6 +4,19 @@ ...@@ -4,6 +4,19 @@
## 示例 ## 示例
在下列代码中,我们需要知道以下信息。
**模型文件夹**:这个文件夹就是Paddle的inference_model所在的文件夹
**serving_server_dir**: 这个文件夹是inference_model转换成Serving模型后,服务端配置的保存路径
**serving_client_dir**: 这个文件夹是inference_model转换成Serving模型后,客户端配置的保存路径
**模型描述文件**: 模型描述文件也就是`model_filename`默认值为`__model__`,是一个pb2文本文件,如果是别的文件名需要显式指定
**模型参数文件**: 在`save_inference_model`阶段,默认方式是每一个Variable保存一个二进制文件,如果是这种情况就不需要做指定。如果所有参数用压缩成一个文件的形式保存,则需要显式指定`params_filename`
``` python ``` python
from paddle_serving_client.io import inference_model_to_serving from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "your_inference_model" inference_model_dir = "your_inference_model"
...@@ -12,3 +25,9 @@ serving_server_dir = "serving_server_dir" ...@@ -12,3 +25,9 @@ serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving( feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir) inference_model_dir, serving_client_dir, serving_server_dir)
``` ```
如果模型中有模型描述文件`model_filename` 和 模型参数文件`params_filename`,那么请用
```
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir,
model_filename="model", params_filename="params")
```
# Paddle Serving Client Java SDK
([简体中文](JAVA_SDK_CN.md)|English)
Paddle Serving provides a Java SDK, which supports prediction on the client side in Java. This document explains how to use the Java SDK.
## Getting started
### Prerequisites
```
- Java 8 or higher
- Apache Maven
```
The following table shows the compatibility between Paddle Serving Server and the Java SDK.
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
### Install Java SDK
You can download jar and install it to the local Maven repository:
```shell
wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
Or compile from the source code and install it to the local Maven repository:
```shell
cd Serving/java
mvn compile
mvn install
```
### Maven configure
```text
<dependency>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>0.0.1</version>
</dependency>
```
## Example
Here we will show how to use Java SDK for Boston house price prediction. Please refer to [examples](../java/examples) folder for more examples.
### Get model
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
### Start Python Server
```shell
python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --use_multilang
```
#### Client side code example
```java
import io.paddle.serving.client.*;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import java.util.*;
public class PaddleServingClientExample {
    public static void main(String[] args) {
        // The 13 features of one UCI housing sample.
        float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
                        0.0582f, -0.0727f, -0.1583f, -0.0584f,
                        0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
        INDArray npdata = Nd4j.createFromArray(data);
        // Feed map: feed variable name -> NDArray value.
        HashMap<String, INDArray> feed_data
            = new HashMap<String, INDArray>() {{
                put("x", npdata);
            }};
        // Names of the output variables to fetch.
        List<String> fetch = Arrays.asList("price");

        Client client = new Client();
        String target = "localhost:9393";
        if (!client.connect(target)) {
            System.out.println("connect failed.");
            return;
        }
        Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
        if (fetch_map == null) {
            System.out.println("predict failed.");
            return;
        }
        for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
            System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
        }
    }
}
```
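
To read a fetched tensor as a plain Java number, one possible follow-up (assuming the `price` output holds a single scalar; `getFloat` is ND4J's element accessor):

```java
INDArray price = fetch_map.get("price");
if (price != null) {
    // Read the first element; assumes a one-element output tensor.
    float value = price.getFloat(0);
    System.out.println("predicted price: " + value);
}
```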
# Paddle Serving Client Java SDK
(简体中文|[English](JAVA_SDK.md))
Paddle Serving provides a Java SDK, which supports prediction on the client side in Java. This document shows how to use the Java SDK.
## Getting started

### Prerequisites
```
- Java 8 or higher
- Apache Maven
```
The following table shows the compatibility between Paddle Serving Server and Java SDK versions.
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
### Install Java SDK

You can download the jar and install it into your local Maven repository:
```shell
wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
Or compile from the source code and install it into your local Maven repository:
```shell
cd Serving/java
mvn compile
mvn install
```
### Maven configuration
```text
<dependency>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>0.0.1</version>
</dependency>
```
## Example

Here we show how to use the Java SDK for house price prediction. See the [examples](../java/examples) folder for more examples.

### Get the house price prediction model
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
### Start the Python server
```shell
python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --use_multilang
```
### Client side code example
```java
import io.paddle.serving.client.*;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import java.util.*;
public class PaddleServingClientExample {
    public static void main(String[] args) {
        // The 13 features of one UCI housing sample.
        float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
                        0.0582f, -0.0727f, -0.1583f, -0.0584f,
                        0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
        INDArray npdata = Nd4j.createFromArray(data);
        // Feed map: feed variable name -> NDArray value.
        HashMap<String, INDArray> feed_data
            = new HashMap<String, INDArray>() {{
                put("x", npdata);
            }};
        // Names of the output variables to fetch.
        List<String> fetch = Arrays.asList("price");

        Client client = new Client();
        String target = "localhost:9393";
        if (!client.connect(target)) {
            System.out.println("connect failed.");
            return;
        }
        Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
        if (fetch_map == null) {
            System.out.println("predict failed.");
            return;
        }
        for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
            System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
        }
    }
}
```
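
After `mvn compile`, one way to run the example above is through the exec-maven-plugin (an assumption: the `exec` plugin prefix resolves from Maven Central and `PaddleServingClientExample` is on the compiled classpath of the examples module):

```shell
cd Serving/java/examples
mvn exec:java -Dexec.mainClass="PaddleServingClientExample"
```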
## CPU server

### Python 3
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
```

### Python 2
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py2-none-any.whl
```

## GPU server

### Python 3
```
#cuda 9.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py3-none-any.whl
#cuda 10.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py3-none-any.whl
#cuda10.1 with TensorRT 6
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.trt-py3-none-any.whl
```

### Python 2
```
#cuda 9.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py2-none-any.whl
#cuda 10.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py2-none-any.whl
#cuda10.1 with TensorRT 6
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.trt-py2-none-any.whl
```

## Client

### Python 3.7
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
```

### Python 3.6
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp36-none-any.whl
```

### Python 3.5
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp35-none-any.whl
```

### Python 2.7
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp27-none-any.whl
```

## App

### Python 3
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py3-none-any.whl
```

### Python 2
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py2-none-any.whl
```
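
Once downloaded, a wheel can be installed directly with pip, for example:

```shell
wget https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
pip3 install paddle_serving_server-0.0.0-py3-none-any.whl
```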
# How to develop a new Web service?

([简体中文](NEW_WEB_SERVICE_CN.md)|English)

This document takes the Uci service as an example to introduce how to develop a new Web Service. You can check out the complete code [here](../python/examples/pipeline/simple_web_service/web_service.py).

## Op base class

In some services, a single model may not meet business needs, and multiple models have to be concatenated or run in parallel to complete the entire service. We call a single model operation an Op, and we provide a simple set of interfaces to implement the complex logic of concatenating or parallelizing Ops.

Data is passed between Ops as dictionaries. An Op can be started as threads or processes, and the concurrency of each Op can be configured.

Typically, you need to inherit the Op base class and override its `init_op`, `preprocess` and `postprocess` methods, whose default implementations are as follows:

```python
class Op(object):
    def init_op(self):
        pass

    def preprocess(self, input_dicts):
        # multiple previous Op
        if len(input_dicts) != 1:
            _LOGGER.critical(
                "Failed to run preprocess: this Op has multiple previous "
                "inputs. Please override this func.")
            os._exit(-1)
        (_, input_dict), = input_dicts.items()
        return input_dict

    def postprocess(self, input_dicts, fetch_dict):
        return fetch_dict
```
### init_op
This method is used to load user-defined resources such as dictionaries. In the [UciOp](../python/examples/pipeline/simple_web_service/web_service.py), a separator is loaded.

**Note**: When an Op runs with multiple concurrent instances in threaded mode, the different threads of the same Op execute `init_op` only once and share the resources it loads.
### preprocess

This method preprocesses the data before model prediction. It takes one parameter, `input_dicts`: a dictionary whose keys are the `name`s of the previous Ops and whose values are the data transferred from the corresponding previous Ops (each value is itself a dictionary).

The `preprocess` method needs to process the data into an ndarray dictionary (keys are the feed variable names, values are the corresponding ndarray values). The Op takes the return value as the input of the model prediction and passes the output to the `postprocess` method.

**Note**: If the Op has no model configuration file, the return value of `preprocess` is passed directly to `postprocess`.

### postprocess

This method post-processes the data after model prediction. It takes two parameters, `input_dicts` and `fetch_dict`.

The `input_dicts` parameter is the same as in the `preprocess` method, and `fetch_dict` is the output of the model prediction (keys are the names of the fetch variables, values are the corresponding ndarray values). The Op takes the return value of `postprocess` as the input of the subsequent Op's `preprocess`.

**Note**: If the Op has no model configuration file, `fetch_dict` is the return value of `preprocess`.

Here is the Op of the Uci example:
```python
class UciOp(Op):
    def init_op(self):
        self.separator = ","

    def preprocess(self, input_dicts):
        (_, input_dict), = input_dicts.items()
        x_value = input_dict["x"]
        # A comma-separated string is converted to a float ndarray.
        if isinstance(x_value, (str, unicode)):
            input_dict["x"] = np.array(
                [float(x.strip()) for x in x_value.split(self.separator)])
        return input_dict

    def postprocess(self, input_dicts, fetch_dict):
        fetch_dict["price"] = str(fetch_dict["price"][0][0])
        return fetch_dict
```
## WebService base class

Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `get_pipeline_response` method to define the topological relationship between Ops. The default implementation is as follows:

```python
class WebService(object):
    def get_pipeline_response(self, read_op):
        return None
```

Here `read_op` serves as the entry point of the topology map of the whole service (that is, the first Op defined by the user takes `read_op` as its input).

For a single-Op service (single model), take the Uci service as an example (there is only one Uci prediction model in the whole service):
```python
class UciService(WebService):
    def get_pipeline_response(self, read_op):
        uci_op = UciOp(name="uci", input_ops=[read_op])
        return uci_op
```
For a multi-Op service (multiple models), take the Ocr service as an example (the whole service is completed by the Det model and the Rec model in series):

```python
class OcrService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        rec_op = RecOp(name="rec", input_ops=[det_op])
        return rec_op
```

WebService objects need to load a yaml configuration file through `prepare_pipeline_config` to configure each Op and the entire service. The simplest configuration file is as follows (Uci example):

```yaml
http_port: 18080
op:
  uci:
    local_service_conf:
      model_config: uci_housing_model # path
```
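
With the Op and service classes defined, a minimal launch sketch (the method names follow the simple_web_service example; it assumes the configuration above is saved as `config.yml` next to the service code):

```python
uci_service = UciService(name="uci")
uci_service.prepare_pipeline_config("config.yml")
uci_service.run_service()
```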
All field names of the yaml file are as follows:

```yaml
rpc_port: 18080 # gRPC port
build_dag_each_worker: false # Whether to use the process-version server. The default is false
worker_num: 1 # gRPC thread pool size (the number of processes in the process version servicer). The default is 1
http_port: 0 # HTTP service port. The HTTP service is not started when the value is less than or equal to 0. The default value is 0
dag:
  is_thread_op: true # Whether to use the thread version of Op. The default is true
  client_type: brpc # Use brpc or grpc client. The default is brpc
  retry: 1 # The number of times the DAG executor retries after failure. The default value is 1, that is, no retrying
  use_profile: false # Whether to print the profiling log on the server side. The default is false
  tracer:
    interval_s: -1 # Monitoring interval of Tracer (in seconds). Monitoring is not started when the value is less than 1. The default value is -1
op:
  <op_name>: # op name, corresponding to the one defined in the program
    concurrency: 1 # op concurrency number, the default is 1
    timeout: -1 # predict timeout in milliseconds. The default value is -1, that is, no timeout
    retry: 1 # timeout retries. The default value is 1, that is, do not retry
    batch_size: 1 # If this field is set, Op will merge multiple requests into a single batch
    auto_batching_timeout: -1 # auto-batching timeout in milliseconds. The default value is -1, that is, no timeout
    local_service_conf:
      model_config: # the path of the corresponding model file. There is no default value (None). If this item is not configured, the model file will not be loaded
      workdir: "" # working directory of the corresponding model
      thread_num: 2 # the corresponding model is started with thread_num threads
      devices: "" # the device on which the model is launched. You can specify GPU card numbers (such as "0,1,2"); CPU is used by default
      mem_optim: true # memory optimization option, the default is true
      ir_optim: false # ir optimization option, the default is false
```
All fields of an Op can also be set when the Op is created in the program, which overrides the corresponding yaml fields.
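
For a quick test of the Uci service above (with `http_port: 18080`), the pipeline HTTP endpoint can be queried as follows; the `key`/`value` request wrapping and the `/uci/prediction` path follow the simple_web_service example and should be treated as assumptions for other services:

```shell
curl -X POST -k http://localhost:18080/uci/prediction \
    -d '{"key": ["x"], "value": ["0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332"]}'
```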
Performance tuning parameters:

Memory/GPU-memory optimization is enabled by default in Paddle Serving. It reduces memory/GPU-memory usage and usually has no impact on performance; to disable it in command-line launch mode, use `--mem_optim_off`.

`ir_optim` optimizes the computation graph and improves inference speed. It is disabled by default and can be enabled with `--ir_optim` in command-line launch mode.

| Parameter     | Type | Default | Description                                                                   |
| ------------- | ---- | ------- | ----------------------------------------------------------------------------- |
| mem_optim_off | -    | -       | Disable memory/GPU-memory optimization                                         |
| ir_optim      | -    | -       | Enable computation-graph analysis and optimization, including OP fusion, etc. |

When the prediction service is launched from Python code, the interfaces for these two parameters are as follows:

RPC service:
```
from paddle_serving_server import Server
server = Server()
...
server.set_memory_optimize(mem_optim)
server.set_ir_optimize(ir_optim)
...
```
HTTP service:
```
from paddle_serving_server import WebService
class NewService(WebService):
    ...
new_service = NewService(name="new")
...
new_service.prepare_server(mem_optim=True, ir_optim=False)
...
```
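
The same two options in command-line launch mode (flags from the table above):

```shell
# Disable memory/GPU-memory optimization and enable graph optimization.
python -m paddle_serving_server.serve --model uci_housing_model --port 9292 \
    --mem_optim_off --ir_optim
```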