Commit b34831ff authored by MRXLT

Merge remote-tracking branch 'upstream/develop' into release/0.3

...@@ -7,6 +7,7 @@ ...@@ -7,6 +7,7 @@
<p> <p>
<p align="center"> <p align="center">
<br> <br>
<a href="https://travis-ci.com/PaddlePaddle/Serving"> <a href="https://travis-ci.com/PaddlePaddle/Serving">
...@@ -29,7 +30,7 @@ We consider deploying deep learning inference service online to be a user-facing ...@@ -29,7 +30,7 @@ We consider deploying deep learning inference service online to be a user-facing
<h2 align="center">Installation</h2> <h2 align="center">Installation</h2>
We **highly recommend** that you **run Paddle Serving in Docker**; please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md) We **highly recommend** that you **run Paddle Serving in Docker**; please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md). See the [documentation](doc/DOCKER_IMAGES.md) for more Docker images.
``` ```
# Run CPU Docker # Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest docker pull hub.baidubce.com/paddlepaddle/serving:latest
...@@ -38,8 +39,8 @@ docker exec -it test bash ...@@ -38,8 +39,8 @@ docker exec -it test bash
``` ```
``` ```
# Run GPU Docker # Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker exec -it test bash nvidia-docker exec -it test bash
``` ```
...@@ -53,11 +54,23 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua ...@@ -53,11 +54,23 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua
If you need to install modules built from the develop branch, please download the packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command. If you need to install modules built from the develop branch, please download the packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use the HTTP service without installing the client. Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7 and Ubuntu 16/18.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports Python 2.7/3.6/3.7.
Installing paddle >= 1.8.2 is recommended.
<h2 align="center"> Pre-built services with Paddle Serving</h2> <h2 align="center"> Pre-built services with Paddle Serving</h2>
<h3 align="center">Latest release</h4>
<p align="center">
<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/ocr">Optical Character Recognition</a>
<br>
<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/faster_rcnn_model">Object Detection</a>
<br>
<a href="https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/deeplabv3">Image Segmentation</a>
<p>
<h3 align="center">Chinese Word Segmentation</h4> <h3 align="center">Chinese Word Segmentation</h4>
``` shell ``` shell
...@@ -75,7 +88,7 @@ Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use H ...@@ -75,7 +88,7 @@ Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use H
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200"> <img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br> <br>
<p> <p>
``` shell ``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet > python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz > tar -xzf resnet_v2_50_imagenet.tar.gz
...@@ -111,7 +124,7 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po ...@@ -111,7 +124,7 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `port` | int | `9292` | Exposed port of current service to users| | `port` | int | `9292` | Exposed port of current service to users|
| `name` | str | `""` | Service name, can be used to generate HTTP request url | | `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served | | `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | - | - | Enable memory / graphic memory optimization | | `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph | | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL | | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
...@@ -184,6 +197,7 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit ...@@ -184,6 +197,7 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
<h2 align="center">Community</h2> <h2 align="center">Community</h2>
### Slack ### Slack
To connect with other users and contributors, you are welcome to join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ) To connect with other users and contributors, you are welcome to join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
......
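The `client.predict` call mentioned in the hunk above takes a `feed` dict and a `fetch` list. As a minimal, hedged sketch, the config path, server address, and the `x`/`price` names follow the uci_housing quick-start and are placeholders; the API may differ between versions:

```python
# Minimal client-side sketch; config path, address, and variable names are placeholders.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# `feed` maps feed variable names to data, `fetch` lists the variables to return.
fetch_map = client.predict(feed={"x": [0.0137, -0.1136, 0.25, -0.08]}, fetch=["price"])
print(fetch_map)
```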
...@@ -7,6 +7,7 @@ ...@@ -7,6 +7,7 @@
<p> <p>
<p align="center"> <p align="center">
<br> <br>
<a href="https://travis-ci.com/PaddlePaddle/Serving"> <a href="https://travis-ci.com/PaddlePaddle/Serving">
...@@ -31,7 +32,7 @@ Paddle Serving aims to help deep learning developers easily deploy online prediction services ...@@ -31,7 +32,7 @@ Paddle Serving aims to help deep learning developers easily deploy online prediction services
<h2 align="center">安装</h2> <h2 align="center">安装</h2>
We **strongly recommend** that you **build Paddle Serving inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md) We **strongly recommend** that you **build Paddle Serving inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md). For more images, see the [Docker image list](doc/DOCKER_IMAGES_CN.md)
``` ```
# Start CPU Docker # Start CPU Docker
...@@ -41,8 +42,8 @@ docker exec -it test bash ...@@ -41,8 +42,8 @@ docker exec -it test bash
``` ```
``` ```
# Start GPU Docker # Start GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker exec -it test bash nvidia-docker exec -it test bash
``` ```
```shell ```shell
...@@ -55,7 +56,11 @@ pip install paddle-serving-server-gpu # GPU ...@@ -55,7 +56,11 @@ pip install paddle-serving-server-gpu # GPU
If you need packages built from the develop branch, get the download links from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command. If you need packages built from the develop branch, get the download links from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
The Paddle Serving packages support Centos 6/7 and Ubuntu 16/18, or you can use the HTTP service, in which case no client installation is needed. The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7 and Ubuntu 16/18.
The paddle-serving-client and paddle-serving-app packages support Linux and Windows, and paddle-serving-client only supports Python 2.7/3.5/3.6.
Installing paddle 1.8.2 or later is recommended.
<h2 align="center"> Pre-built services with Paddle Serving </h2> <h2 align="center"> Pre-built services with Paddle Serving </h2>
...@@ -76,7 +81,7 @@ The Paddle Serving packages support Centos 6/7 and Ubuntu 16/18, or you can use the HT ...@@ -76,7 +81,7 @@ The Paddle Serving packages support Centos 6/7 and Ubuntu 16/18, or you can use the HT
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200"> <img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br> <br>
<p> <p>
``` shell ``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet > python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz > tar -xzf resnet_v2_50_imagenet.tar.gz
...@@ -115,7 +120,7 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po ...@@ -115,7 +120,7 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `port` | int | `9292` | Exposed port of current service to users| | `port` | int | `9292` | Exposed port of current service to users|
| `name` | str | `""` | Service name, can be used to generate HTTP request url | | `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served | | `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | - | - | Enable memory optimization | | `mem_optim_off` | - | - | Disable memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph | | `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL | | `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
......
...@@ -40,8 +40,8 @@ ExternalProject_Add( ...@@ -40,8 +40,8 @@ ExternalProject_Add(
extern_brpc extern_brpc
${EXTERNAL_PROJECT_LOG_ARGS} ${EXTERNAL_PROJECT_LOG_ARGS}
# TODO(gongwb): change to the newest repo when they changed. # TODO(gongwb): change to the newest repo when they changed.
GIT_REPOSITORY "https://github.com/gongweibao/brpc" GIT_REPOSITORY "https://github.com/wangjiawei04/brpc"
GIT_TAG "e9b67ec1b7458f2af5fae76451afe1e27e01b4b4" GIT_TAG "6d79e0b17f25107c35b705ea58d888083f59ff47"
PREFIX ${BRPC_SOURCES_DIR} PREFIX ${BRPC_SOURCES_DIR}
UPDATE_COMMAND "" UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER} CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
......
...@@ -86,6 +86,63 @@ function(protobuf_generate_python SRCS) ...@@ -86,6 +86,63 @@ function(protobuf_generate_python SRCS)
set(${SRCS} ${${SRCS}} PARENT_SCOPE) set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction() endfunction()
function(grpc_protobuf_generate_python SRCS)
# shameless copy from https://github.com/Kitware/CMake/blob/master/Modules/FindProtobuf.cmake
if(NOT ARGN)
message(SEND_ERROR "Error: GRPC_PROTOBUF_GENERATE_PYTHON() called without any proto files")
return()
endif()
if(PROTOBUF_GENERATE_CPP_APPEND_PATH)
# Create an include path for each file specified
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(ABS_PATH ${ABS_FIL} PATH)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
else()
set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR})
endif()
if(DEFINED PROTOBUF_IMPORT_DIRS AND NOT DEFINED Protobuf_IMPORT_DIRS)
set(Protobuf_IMPORT_DIRS "${PROTOBUF_IMPORT_DIRS}")
endif()
if(DEFINED Protobuf_IMPORT_DIRS)
foreach(DIR ${Protobuf_IMPORT_DIRS})
get_filename_component(ABS_PATH ${DIR} ABSOLUTE)
list(FIND _protobuf_include_path ${ABS_PATH} _contains_already)
if(${_contains_already} EQUAL -1)
list(APPEND _protobuf_include_path -I ${ABS_PATH})
endif()
endforeach()
endif()
set(${SRCS})
foreach(FIL ${ARGN})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
get_filename_component(FIL_WE ${FIL} NAME_WE)
if(NOT PROTOBUF_GENERATE_CPP_APPEND_PATH)
get_filename_component(FIL_DIR ${FIL} DIRECTORY)
if(FIL_DIR)
set(FIL_WE "${FIL_DIR}/${FIL_WE}")
endif()
endif()
list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py")
add_custom_command(
OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2_grpc.py"
COMMAND ${PYTHON_EXECUTABLE} -m grpc_tools.protoc --python_out ${CMAKE_CURRENT_BINARY_DIR} --grpc_python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
DEPENDS ${ABS_FIL}
COMMENT "Running Python grpc protocol buffer compiler on ${FIL}"
VERBATIM )
endforeach()
set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction()
# Print and set the protobuf library information, # Print and set the protobuf library information,
# finish this cmake process and exit from this file. # finish this cmake process and exit from this file.
macro(PROMPT_PROTOBUF_LIB) macro(PROMPT_PROTOBUF_LIB)
......
...@@ -704,6 +704,15 @@ function(py_proto_compile TARGET_NAME) ...@@ -704,6 +704,15 @@ function(py_proto_compile TARGET_NAME)
add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs}) add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs})
endfunction() endfunction()
function(py_grpc_proto_compile TARGET_NAME)
set(oneValueArgs "")
set(multiValueArgs SRCS)
cmake_parse_arguments(py_grpc_proto_compile "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
set(py_srcs)
grpc_protobuf_generate_python(py_srcs ${py_grpc_proto_compile_SRCS})
add_custom_target(${TARGET_NAME} ALL DEPENDS ${py_srcs})
endfunction()
function(py_test TARGET_NAME) function(py_test TARGET_NAME)
if(WITH_TESTING) if(WITH_TESTING)
set(options "") set(options "")
......
...@@ -35,6 +35,10 @@ py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.p ...@@ -35,6 +35,10 @@ py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.p
add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(general_model_config_py_proto general_model_config_py_proto_init) add_dependencies(general_model_config_py_proto general_model_config_py_proto_init)
py_grpc_proto_compile(multi_lang_general_model_service_py_proto SRCS proto/multi_lang_general_model_service.proto)
add_custom_target(multi_lang_general_model_service_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
add_dependencies(multi_lang_general_model_service_py_proto multi_lang_general_model_service_py_proto_init)
if (CLIENT) if (CLIENT)
py_proto_compile(sdk_configure_py_proto SRCS proto/sdk_configure.proto) py_proto_compile(sdk_configure_py_proto SRCS proto/sdk_configure.proto)
add_custom_target(sdk_configure_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py) add_custom_target(sdk_configure_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
...@@ -51,6 +55,11 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD ...@@ -51,6 +55,11 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
COMMENT "Copy generated general_model_config proto file into directory paddle_serving_client/proto." COMMENT "Copy generated general_model_config proto file into directory paddle_serving_client/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto
COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_client/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif() endif()
if (APP) if (APP)
...@@ -77,6 +86,12 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD ...@@ -77,6 +86,12 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server/proto." COMMENT "Copy generated general_model_config proto file into directory paddle_serving_server/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto
COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
else() else()
add_custom_command(TARGET server_config_py_proto POST_BUILD add_custom_command(TARGET server_config_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory COMMAND ${CMAKE_COMMAND} -E make_directory
...@@ -95,5 +110,11 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD ...@@ -95,5 +110,11 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
COMMENT "Copy generated general_model_config proto file into directory COMMENT "Copy generated general_model_config proto file into directory
paddle_serving_server_gpu/proto." paddle_serving_server_gpu/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}) WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
add_custom_command(TARGET multi_lang_general_model_service_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto
COMMENT "Copy generated multi_lang_general_model_service proto file into directory paddle_serving_server_gpu/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif() endif()
endif() endif()
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
option java_multiple_files = true;
option java_package = "io.paddle.serving.grpc";
option java_outer_classname = "ServingProto";
message Tensor {
optional bytes data = 1;
repeated int32 int_data = 2;
repeated int64 int64_data = 3;
repeated float float_data = 4;
optional int32 elem_type = 5;
repeated int32 shape = 6;
repeated int32 lod = 7; // only for fetch tensor currently
};
message FeedInst { repeated Tensor tensor_array = 1; };
message FetchInst { repeated Tensor tensor_array = 1; };
message InferenceRequest {
repeated FeedInst insts = 1;
repeated string feed_var_names = 2;
repeated string fetch_var_names = 3;
required bool is_python = 4 [ default = false ];
};
message InferenceResponse {
repeated ModelOutput outputs = 1;
optional string tag = 2;
required int32 err_code = 3;
};
message ModelOutput {
repeated FetchInst insts = 1;
optional string engine_name = 2;
}
message SetTimeoutRequest { required int32 timeout_ms = 1; }
message SimpleResponse { required int32 err_code = 1; }
message GetClientConfigRequest {}
message GetClientConfigResponse { required string client_config_str = 1; }
service MultiLangGeneralModelService {
rpc Inference(InferenceRequest) returns (InferenceResponse) {}
rpc SetTimeout(SetTimeoutRequest) returns (SimpleResponse) {}
rpc GetClientConfig(GetClientConfigRequest)
returns (GetClientConfigResponse) {}
};
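The proto above defines the multi-language inference protocol. Below is a hedged Python sketch of how the stubs generated by the `grpc_protobuf_generate_python` rule are typically used; the module names follow standard `grpc_tools` output, while the port and the feed/fetch variable names are illustrative assumptions:

```python
# Hypothetical client for the MultiLangGeneralModelService defined above.
# Module names follow grpc_tools conventions; port and variable names are assumptions.
import grpc
import multi_lang_general_model_service_pb2 as pb2
import multi_lang_general_model_service_pb2_grpc as pb2_grpc

channel = grpc.insecure_channel("127.0.0.1:9393")
stub = pb2_grpc.MultiLangGeneralModelServiceStub(channel)

# elem_type follows the server convention: 0 = int64, 1 = float32, 2 = int32.
tensor = pb2.Tensor(int64_data=[8, 233, 52], shape=[1, 3], elem_type=0)
request = pb2.InferenceRequest(
    insts=[pb2.FeedInst(tensor_array=[tensor])],
    feed_var_names=["words"],
    fetch_var_names=["prediction"],
    is_python=True)

response = stub.Inference(request)
print(response.err_code, [output.engine_name for output in response.outputs])
```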
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
// limitations under the License. // limitations under the License.
#include <gflags/gflags.h> #include <gflags/gflags.h>
#include <algorithm>
#include <atomic> #include <atomic>
#include <fstream> #include <fstream>
#include <thread> //NOLINT #include <thread> //NOLINT
...@@ -31,8 +32,9 @@ DEFINE_bool(print_output, false, "print output flag"); ...@@ -31,8 +32,9 @@ DEFINE_bool(print_output, false, "print output flag");
DEFINE_int32(thread_num, 1, "thread num"); DEFINE_int32(thread_num, 1, "thread num");
std::atomic<int> g_concurrency(0); std::atomic<int> g_concurrency(0);
std::vector<uint64_t> time_list; std::vector<std::vector<uint64_t>> time_list;
std::vector<uint64_t> request_list; std::vector<uint64_t> request_list;
int turns = 1000;
namespace { namespace {
inline uint64_t time_diff(const struct timeval& start_time, inline uint64_t time_diff(const struct timeval& start_time,
...@@ -93,14 +95,15 @@ int run(int argc, char** argv, int thread_id) { ...@@ -93,14 +95,15 @@ int run(int argc, char** argv, int thread_id) {
uint64_t file_size = key_list.size(); uint64_t file_size = key_list.size();
uint64_t index = 0; uint64_t index = 0;
uint64_t request = 0; uint64_t request = 0;
while (g_concurrency.load() >= FLAGS_thread_num) { while (g_concurrency.load() >= FLAGS_thread_num) {
} }
g_concurrency++; g_concurrency++;
time_list[thread_id].resize(turns);
while (index < file_size) { while (request < turns) {
// uint64_t key = strtoul(buffer, NULL, 10); // uint64_t key = strtoul(buffer, NULL, 10);
if (index >= file_size) {
index = 0;
}
keys.push_back(key_list[index]); keys.push_back(key_list[index]);
index += 1; index += 1;
int ret = 0; int ret = 0;
...@@ -121,47 +124,12 @@ int run(int argc, char** argv, int thread_id) { ...@@ -121,47 +124,12 @@ int run(int argc, char** argv, int thread_id) {
} }
++seek_counter; ++seek_counter;
uint64_t seek_cost = time_diff(seek_start, seek_end); uint64_t seek_cost = time_diff(seek_start, seek_end);
seek_cost_total += seek_cost; time_list[thread_id][request - 1] = seek_cost;
if (seek_cost > seek_cost_max) {
seek_cost_max = seek_cost;
}
if (seek_cost < seek_cost_min) {
seek_cost_min = seek_cost;
}
keys.clear(); keys.clear();
values.clear(); values.clear();
} }
} }
/*
if (keys.size() > 0) {
int ret = 0;
values.resize(keys.size());
TIME_FLAG(seek_start);
ret = cube->seek(FLAGS_dict, keys, &values);
TIME_FLAG(seek_end);
if (ret != 0) {
LOG(WARNING) << "cube seek failed";
} else if (FLAGS_print_output) {
for (size_t i = 0; i < keys.size(); ++i) {
fprintf(stdout,
"key:%lu value:%s\n",
keys[i],
string_to_hex(values[i].buff).c_str());
}
}
++seek_counter;
uint64_t seek_cost = time_diff(seek_start, seek_end);
seek_cost_total += seek_cost;
if (seek_cost > seek_cost_max) {
seek_cost_max = seek_cost;
}
if (seek_cost < seek_cost_min) {
seek_cost_min = seek_cost;
}
}
*/
g_concurrency--; g_concurrency--;
// fclose(key_file); // fclose(key_file);
...@@ -171,12 +139,6 @@ int run(int argc, char** argv, int thread_id) { ...@@ -171,12 +139,6 @@ int run(int argc, char** argv, int thread_id) {
LOG(WARNING) << "destroy cube api failed err=" << ret; LOG(WARNING) << "destroy cube api failed err=" << ret;
} }
uint64_t seek_cost_avg = seek_cost_total / seek_counter;
LOG(INFO) << "seek cost avg = " << seek_cost_avg;
LOG(INFO) << "seek cost max = " << seek_cost_max;
LOG(INFO) << "seek cost min = " << seek_cost_min;
time_list[thread_id] = seek_cost_avg;
request_list[thread_id] = request; request_list[thread_id] = request;
return 0; return 0;
...@@ -188,6 +150,7 @@ int run_m(int argc, char** argv) { ...@@ -188,6 +150,7 @@ int run_m(int argc, char** argv) {
request_list.resize(thread_num); request_list.resize(thread_num);
time_list.resize(thread_num); time_list.resize(thread_num);
std::vector<std::thread*> thread_pool; std::vector<std::thread*> thread_pool;
TIME_FLAG(main_start);
for (int i = 0; i < thread_num; i++) { for (int i = 0; i < thread_num; i++) {
thread_pool.push_back(new std::thread(run, argc, argv, i)); thread_pool.push_back(new std::thread(run, argc, argv, i));
} }
...@@ -195,28 +158,43 @@ int run_m(int argc, char** argv) { ...@@ -195,28 +158,43 @@ int run_m(int argc, char** argv) {
thread_pool[i]->join(); thread_pool[i]->join();
delete thread_pool[i]; delete thread_pool[i];
} }
TIME_FLAG(main_end);
uint64_t sum_time = 0; uint64_t sum_time = 0;
uint64_t max_time = 0; uint64_t max_time = 0;
uint64_t min_time = 1000000; uint64_t min_time = 1000000;
uint64_t request_num = 0; std::vector<uint64_t> all_time_list;
for (int i = 0; i < thread_num; i++) { for (int i = 0; i < thread_num; i++) {
sum_time += time_list[i]; for (int j = 0; j < request_list[i]; j++) {
if (time_list[i] > max_time) { sum_time += time_list[i][j];
max_time = time_list[i]; if (time_list[i][j] > max_time) {
} max_time = time_list[i][j];
if (time_list[i] < min_time) { }
min_time = time_list[i]; if (time_list[i][j] < min_time) {
min_time = time_list[i][j];
}
all_time_list.push_back(time_list[i][j]);
} }
request_num += request_list[i];
} }
uint64_t mean_time = sum_time / thread_num; std::sort(all_time_list.begin(), all_time_list.end());
LOG(INFO) << thread_num << " thread seek cost" uint64_t mean_time = sum_time / (thread_num * turns);
<< " avg = " << std::to_string(mean_time) uint64_t main_time = time_diff(main_start, main_end);
<< " max = " << std::to_string(max_time) uint64_t request_num = turns * thread_num;
<< " min = " << std::to_string(min_time); LOG(INFO)
LOG(INFO) << " total_request = " << std::to_string(request_num) << " speed = " << "\n"
<< std::to_string(1000000 * thread_num / mean_time) // mean_time us << thread_num << " thread seek cost"
<< " query per second"; << "\navg: " << std::to_string(mean_time) << "\n50 percent: "
<< std::to_string(all_time_list[static_cast<int>(0.5 * request_num)])
<< "\n80 percent: "
<< std::to_string(all_time_list[static_cast<int>(0.8 * request_num)])
<< "\n90 percent: "
<< std::to_string(all_time_list[static_cast<int>(0.9 * request_num)])
<< "\n99 percent: "
<< std::to_string(all_time_list[static_cast<int>(0.99 * request_num)])
<< "\n99.9 percent: "
<< std::to_string(all_time_list[static_cast<int>(0.999 * request_num)])
<< "\ntotal_request: " << std::to_string(request_num) << "\nspeed: "
<< std::to_string(turns * 1000000 / main_time) // mean_time us
<< " query per second";
return 0; return 0;
} }
......
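The benchmark rewrite above stops averaging per thread and instead collects every request's seek latency, sorts the combined list, and reads tail percentiles by index. A small Python sketch of that reporting scheme (latency samples are made up):

```python
# Percentile-by-index reporting, mirroring the sorted all_time_list above.
latencies_us = sorted([420, 431, 398, 510, 475, 389, 402, 460, 445, 700])  # made-up samples

def percentile(sorted_values, fraction):
    # Same scheme as the C++ code: element at floor(fraction * N), clamped to the last index.
    idx = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

print("avg:", sum(latencies_us) // len(latencies_us), "us")
for p in (0.5, 0.8, 0.9, 0.99, 0.999):
    print(f"{p * 100:g} percent: {percentile(latencies_us, p)} us")
```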
...@@ -49,6 +49,8 @@ class ModelRes { ...@@ -49,6 +49,8 @@ class ModelRes {
res._int64_value_map.end()); res._int64_value_map.end());
_float_value_map.insert(res._float_value_map.begin(), _float_value_map.insert(res._float_value_map.begin(),
res._float_value_map.end()); res._float_value_map.end());
_int32_value_map.insert(res._int32_value_map.begin(),
res._int32_value_map.end());
_shape_map.insert(res._shape_map.begin(), res._shape_map.end()); _shape_map.insert(res._shape_map.begin(), res._shape_map.end());
_lod_map.insert(res._lod_map.begin(), res._lod_map.end()); _lod_map.insert(res._lod_map.begin(), res._lod_map.end());
} }
...@@ -60,6 +62,9 @@ class ModelRes { ...@@ -60,6 +62,9 @@ class ModelRes {
_float_value_map.insert( _float_value_map.insert(
std::make_move_iterator(std::begin(res._float_value_map)), std::make_move_iterator(std::begin(res._float_value_map)),
std::make_move_iterator(std::end(res._float_value_map))); std::make_move_iterator(std::end(res._float_value_map)));
_int32_value_map.insert(
std::make_move_iterator(std::begin(res._int32_value_map)),
std::make_move_iterator(std::end(res._int32_value_map)));
_shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)), _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
std::make_move_iterator(std::end(res._shape_map))); std::make_move_iterator(std::end(res._shape_map)));
_lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)), _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
...@@ -78,6 +83,12 @@ class ModelRes { ...@@ -78,6 +83,12 @@ class ModelRes {
std::vector<float>&& get_float_by_name_with_rv(const std::string& name) { std::vector<float>&& get_float_by_name_with_rv(const std::string& name) {
return std::move(_float_value_map[name]); return std::move(_float_value_map[name]);
} }
const std::vector<int32_t>& get_int32_by_name(const std::string& name) {
return _int32_value_map[name];
}
std::vector<int32_t>&& get_int32_by_name_with_rv(const std::string& name) {
return std::move(_int32_value_map[name]);
}
const std::vector<int>& get_shape_by_name(const std::string& name) { const std::vector<int>& get_shape_by_name(const std::string& name) {
return _shape_map[name]; return _shape_map[name];
} }
...@@ -103,6 +114,9 @@ class ModelRes { ...@@ -103,6 +114,9 @@ class ModelRes {
_float_value_map.insert( _float_value_map.insert(
std::make_move_iterator(std::begin(res._float_value_map)), std::make_move_iterator(std::begin(res._float_value_map)),
std::make_move_iterator(std::end(res._float_value_map))); std::make_move_iterator(std::end(res._float_value_map)));
_int32_value_map.insert(
std::make_move_iterator(std::begin(res._int32_value_map)),
std::make_move_iterator(std::end(res._int32_value_map)));
_shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)), _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
std::make_move_iterator(std::end(res._shape_map))); std::make_move_iterator(std::end(res._shape_map)));
_lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)), _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
...@@ -115,6 +129,7 @@ class ModelRes { ...@@ -115,6 +129,7 @@ class ModelRes {
std::string _engine_name; std::string _engine_name;
std::map<std::string, std::vector<int64_t>> _int64_value_map; std::map<std::string, std::vector<int64_t>> _int64_value_map;
std::map<std::string, std::vector<float>> _float_value_map; std::map<std::string, std::vector<float>> _float_value_map;
std::map<std::string, std::vector<int32_t>> _int32_value_map;
std::map<std::string, std::vector<int>> _shape_map; std::map<std::string, std::vector<int>> _shape_map;
std::map<std::string, std::vector<int>> _lod_map; std::map<std::string, std::vector<int>> _lod_map;
}; };
...@@ -145,6 +160,14 @@ class PredictorRes { ...@@ -145,6 +160,14 @@ class PredictorRes {
const std::string& name) { const std::string& name) {
return std::move(_models[model_idx].get_float_by_name_with_rv(name)); return std::move(_models[model_idx].get_float_by_name_with_rv(name));
} }
const std::vector<int32_t>& get_int32_by_name(const int model_idx,
const std::string& name) {
return _models[model_idx].get_int32_by_name(name);
}
std::vector<int32_t>&& get_int32_by_name_with_rv(const int model_idx,
const std::string& name) {
return std::move(_models[model_idx].get_int32_by_name_with_rv(name));
}
const std::vector<int>& get_shape_by_name(const int model_idx, const std::vector<int>& get_shape_by_name(const int model_idx,
const std::string& name) { const std::string& name) {
return _models[model_idx].get_shape_by_name(name); return _models[model_idx].get_shape_by_name(name);
......
...@@ -207,17 +207,28 @@ int PredictorClient::batch_predict( ...@@ -207,17 +207,28 @@ int PredictorClient::batch_predict(
for (auto &name : int_feed_name) { for (auto &name : int_feed_name) {
int idx = _feed_name_to_idx[name]; int idx = _feed_name_to_idx[name];
Tensor *tensor = tensor_vec[idx]; Tensor *tensor = tensor_vec[idx];
VLOG(2) << "prepare int feed " << name << " shape size " if (_type[idx] == 0) {
<< int_shape[vec_idx].size(); VLOG(2) << "prepare int64 feed " << name << " shape size "
<< int_shape[vec_idx].size();
VLOG(3) << "feed var name " << name << " index " << vec_idx
<< "first data " << int_feed[vec_idx][0];
for (uint32_t j = 0; j < int_feed[vec_idx].size(); ++j) {
tensor->add_int64_data(int_feed[vec_idx][j]);
}
} else if (_type[idx] == 2) {
VLOG(2) << "prepare int32 feed " << name << " shape size "
<< int_shape[vec_idx].size();
VLOG(3) << "feed var name " << name << " index " << vec_idx
<< "first data " << int32_t(int_feed[vec_idx][0]);
for (uint32_t j = 0; j < int_feed[vec_idx].size(); ++j) {
tensor->add_int_data(int32_t(int_feed[vec_idx][j]));
}
}
for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) { for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) {
tensor->add_shape(int_shape[vec_idx][j]); tensor->add_shape(int_shape[vec_idx][j]);
} }
tensor->set_elem_type(0); tensor->set_elem_type(_type[idx]);
VLOG(3) << "feed var name " << name << " index " << vec_idx
<< "first data " << int_feed[vec_idx][0];
for (uint32_t j = 0; j < int_feed[vec_idx].size(); ++j) {
tensor->add_int64_data(int_feed[vec_idx][j]);
}
vec_idx++; vec_idx++;
} }
...@@ -284,24 +295,25 @@ int PredictorClient::batch_predict( ...@@ -284,24 +295,25 @@ int PredictorClient::batch_predict(
for (auto &name : fetch_name) { for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name]; // int idx = _fetch_name_to_idx[name];
if (_fetch_name_to_type[name] == 0) { if (_fetch_name_to_type[name] == 0) {
VLOG(2) << "ferch var " << name << "type int"; VLOG(2) << "ferch var " << name << "type int64";
model._int64_value_map[name].resize(
output.insts(0).tensor_array(idx).int64_data_size());
int size = output.insts(0).tensor_array(idx).int64_data_size(); int size = output.insts(0).tensor_array(idx).int64_data_size();
for (int i = 0; i < size; ++i) { model._int64_value_map[name] = std::vector<int64_t>(
model._int64_value_map[name][i] = output.insts(0).tensor_array(idx).int64_data().begin(),
output.insts(0).tensor_array(idx).int64_data(i); output.insts(0).tensor_array(idx).int64_data().begin() + size);
} } else if (_fetch_name_to_type[name] == 1) {
} else {
VLOG(2) << "fetch var " << name << "type float"; VLOG(2) << "fetch var " << name << "type float";
model._float_value_map[name].resize(
output.insts(0).tensor_array(idx).float_data_size());
int size = output.insts(0).tensor_array(idx).float_data_size(); int size = output.insts(0).tensor_array(idx).float_data_size();
for (int i = 0; i < size; ++i) { model._float_value_map[name] = std::vector<float>(
model._float_value_map[name][i] = output.insts(0).tensor_array(idx).float_data().begin(),
output.insts(0).tensor_array(idx).float_data(i); output.insts(0).tensor_array(idx).float_data().begin() + size);
} } else if (_fetch_name_to_type[name] == 2) {
VLOG(2) << "fetch var " << name << "type int32";
int size = output.insts(0).tensor_array(idx).int_data_size();
model._int32_value_map[name] = std::vector<int32_t>(
output.insts(0).tensor_array(idx).int_data().begin(),
output.insts(0).tensor_array(idx).int_data().begin() + size);
} }
idx += 1; idx += 1;
} }
predict_res_batch.add_model_res(std::move(model)); predict_res_batch.add_model_res(std::move(model));
...@@ -448,12 +460,19 @@ int PredictorClient::numpy_predict( ...@@ -448,12 +460,19 @@ int PredictorClient::numpy_predict(
for (auto &name : int_feed_name) { for (auto &name : int_feed_name) {
int idx = _feed_name_to_idx[name]; int idx = _feed_name_to_idx[name];
Tensor *tensor = tensor_vec[idx]; Tensor *tensor = tensor_vec[idx];
VLOG(2) << "prepare int feed " << name << " shape size "
<< int_shape[vec_idx].size();
for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) { for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) {
tensor->add_shape(int_shape[vec_idx][j]); tensor->add_shape(int_shape[vec_idx][j]);
} }
tensor->set_elem_type(0); tensor->set_elem_type(_type[idx]);
if (_type[idx] == 0) {
VLOG(2) << "prepare int feed " << name << " shape size "
<< int_shape[vec_idx].size();
} else {
VLOG(2) << "prepare int32 feed " << name << " shape size "
<< int_shape[vec_idx].size();
}
const int int_shape_size = int_shape[vec_idx].size(); const int int_shape_size = int_shape[vec_idx].size();
switch (int_shape_size) { switch (int_shape_size) {
...@@ -463,7 +482,11 @@ int PredictorClient::numpy_predict( ...@@ -463,7 +482,11 @@ int PredictorClient::numpy_predict(
for (ssize_t j = 0; j < int_array.shape(1); j++) { for (ssize_t j = 0; j < int_array.shape(1); j++) {
for (ssize_t k = 0; k < int_array.shape(2); k++) { for (ssize_t k = 0; k < int_array.shape(2); k++) {
for (ssize_t l = 0; k < int_array.shape(3); l++) { for (ssize_t l = 0; k < int_array.shape(3); l++) {
tensor->add_int64_data(int_array(i, j, k, l)); if (_type[idx] == 0) {
tensor->add_int64_data(int_array(i, j, k, l));
} else {
tensor->add_int_data(int_array(i, j, k, l));
}
} }
} }
} }
...@@ -475,7 +498,11 @@ int PredictorClient::numpy_predict( ...@@ -475,7 +498,11 @@ int PredictorClient::numpy_predict(
for (ssize_t i = 0; i < int_array.shape(0); i++) { for (ssize_t i = 0; i < int_array.shape(0); i++) {
for (ssize_t j = 0; j < int_array.shape(1); j++) { for (ssize_t j = 0; j < int_array.shape(1); j++) {
for (ssize_t k = 0; k < int_array.shape(2); k++) { for (ssize_t k = 0; k < int_array.shape(2); k++) {
tensor->add_int64_data(int_array(i, j, k)); if (_type[idx] == 0) {
tensor->add_int64_data(int_array(i, j, k));
} else {
tensor->add_int_data(int_array(i, j, k));
}
} }
} }
} }
...@@ -485,7 +512,11 @@ int PredictorClient::numpy_predict( ...@@ -485,7 +512,11 @@ int PredictorClient::numpy_predict(
auto int_array = int_feed[vec_idx].unchecked<2>(); auto int_array = int_feed[vec_idx].unchecked<2>();
for (ssize_t i = 0; i < int_array.shape(0); i++) { for (ssize_t i = 0; i < int_array.shape(0); i++) {
for (ssize_t j = 0; j < int_array.shape(1); j++) { for (ssize_t j = 0; j < int_array.shape(1); j++) {
tensor->add_int64_data(int_array(i, j)); if (_type[idx] == 0) {
tensor->add_int64_data(int_array(i, j));
} else {
tensor->add_int_data(int_array(i, j));
}
} }
} }
break; break;
...@@ -493,7 +524,11 @@ int PredictorClient::numpy_predict( ...@@ -493,7 +524,11 @@ int PredictorClient::numpy_predict(
case 1: { case 1: {
auto int_array = int_feed[vec_idx].unchecked<1>(); auto int_array = int_feed[vec_idx].unchecked<1>();
for (ssize_t i = 0; i < int_array.shape(0); i++) { for (ssize_t i = 0; i < int_array.shape(0); i++) {
tensor->add_int64_data(int_array(i)); if (_type[idx] == 0) {
tensor->add_int64_data(int_array(i));
} else {
tensor->add_int_data(int_array(i));
}
} }
break; break;
} }
...@@ -563,23 +598,23 @@ int PredictorClient::numpy_predict( ...@@ -563,23 +598,23 @@ int PredictorClient::numpy_predict(
for (auto &name : fetch_name) { for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name]; // int idx = _fetch_name_to_idx[name];
if (_fetch_name_to_type[name] == 0) { if (_fetch_name_to_type[name] == 0) {
VLOG(2) << "ferch var " << name << "type int"; VLOG(2) << "ferch var " << name << "type int64";
model._int64_value_map[name].resize(
output.insts(0).tensor_array(idx).int64_data_size());
int size = output.insts(0).tensor_array(idx).int64_data_size(); int size = output.insts(0).tensor_array(idx).int64_data_size();
for (int i = 0; i < size; ++i) { model._int64_value_map[name] = std::vector<int64_t>(
model._int64_value_map[name][i] = output.insts(0).tensor_array(idx).int64_data().begin(),
output.insts(0).tensor_array(idx).int64_data(i); output.insts(0).tensor_array(idx).int64_data().begin() + size);
} } else if (_fetch_name_to_type[name] == 1) {
} else {
VLOG(2) << "fetch var " << name << "type float"; VLOG(2) << "fetch var " << name << "type float";
model._float_value_map[name].resize(
output.insts(0).tensor_array(idx).float_data_size());
int size = output.insts(0).tensor_array(idx).float_data_size(); int size = output.insts(0).tensor_array(idx).float_data_size();
for (int i = 0; i < size; ++i) { model._float_value_map[name] = std::vector<float>(
model._float_value_map[name][i] = output.insts(0).tensor_array(idx).float_data().begin(),
output.insts(0).tensor_array(idx).float_data(i); output.insts(0).tensor_array(idx).float_data().begin() + size);
} } else if (_fetch_name_to_type[name] == 2) {
VLOG(2) << "fetch var " << name << "type int32";
int size = output.insts(0).tensor_array(idx).int_data_size();
model._int32_value_map[name] = std::vector<int32_t>(
output.insts(0).tensor_array(idx).int_data().begin(),
output.insts(0).tensor_array(idx).int_data().begin() + size);
} }
idx += 1; idx += 1;
} }
...@@ -613,7 +648,6 @@ int PredictorClient::numpy_predict( ...@@ -613,7 +648,6 @@ int PredictorClient::numpy_predict(
_api.thrd_clear(); _api.thrd_clear();
return 0; return 0;
} }
} // namespace general_model } // namespace general_model
} // namespace paddle_serving } // namespace paddle_serving
} // namespace baidu } // namespace baidu
...@@ -90,6 +90,9 @@ int GeneralDistKVInferOp::inference() { ...@@ -90,6 +90,9 @@ int GeneralDistKVInferOp::inference() {
keys.begin() + key_idx); keys.begin() + key_idx);
key_idx += dataptr_size_pairs[i].second; key_idx += dataptr_size_pairs[i].second;
} }
Timer timeline;
int64_t cube_start = timeline.TimeStampUS();
timeline.Start();
rec::mcube::CubeAPI *cube = rec::mcube::CubeAPI::instance(); rec::mcube::CubeAPI *cube = rec::mcube::CubeAPI::instance();
std::vector<std::string> table_names = cube->get_table_names(); std::vector<std::string> table_names = cube->get_table_names();
if (table_names.size() == 0) { if (table_names.size() == 0) {
...@@ -97,7 +100,7 @@ int GeneralDistKVInferOp::inference() { ...@@ -97,7 +100,7 @@ int GeneralDistKVInferOp::inference() {
return -1; return -1;
} }
int ret = cube->seek(table_names[0], keys, &values); int ret = cube->seek(table_names[0], keys, &values);
int64_t cube_end = timeline.TimeStampUS();
if (values.size() != keys.size() || values[0].buff.size() == 0) { if (values.size() != keys.size() || values[0].buff.size() == 0) {
LOG(ERROR) << "cube value return null"; LOG(ERROR) << "cube value return null";
} }
...@@ -153,9 +156,7 @@ int GeneralDistKVInferOp::inference() { ...@@ -153,9 +156,7 @@ int GeneralDistKVInferOp::inference() {
VLOG(2) << "infer batch size: " << batch_size; VLOG(2) << "infer batch size: " << batch_size;
Timer timeline;
int64_t start = timeline.TimeStampUS(); int64_t start = timeline.TimeStampUS();
timeline.Start();
if (InferManager::instance().infer( if (InferManager::instance().infer(
engine_name().c_str(), &infer_in, out, batch_size)) { engine_name().c_str(), &infer_in, out, batch_size)) {
...@@ -165,6 +166,8 @@ int GeneralDistKVInferOp::inference() { ...@@ -165,6 +166,8 @@ int GeneralDistKVInferOp::inference() {
int64_t end = timeline.TimeStampUS(); int64_t end = timeline.TimeStampUS();
CopyBlobInfo(input_blob, output_blob); CopyBlobInfo(input_blob, output_blob);
AddBlobInfo(output_blob, cube_start);
AddBlobInfo(output_blob, cube_end);
AddBlobInfo(output_blob, start); AddBlobInfo(output_blob, start);
AddBlobInfo(output_blob, end); AddBlobInfo(output_blob, end);
return 0; return 0;
......
...@@ -126,9 +126,12 @@ int GeneralReaderOp::inference() { ...@@ -126,9 +126,12 @@ int GeneralReaderOp::inference() {
if (elem_type[i] == 0) { // int64 if (elem_type[i] == 0) { // int64
elem_size[i] = sizeof(int64_t); elem_size[i] = sizeof(int64_t);
lod_tensor.dtype = paddle::PaddleDType::INT64; lod_tensor.dtype = paddle::PaddleDType::INT64;
} else { } else if (elem_type[i] == 1) {
elem_size[i] = sizeof(float); elem_size[i] = sizeof(float);
lod_tensor.dtype = paddle::PaddleDType::FLOAT32; lod_tensor.dtype = paddle::PaddleDType::FLOAT32;
} else if (elem_type[i] == 2) {
elem_size[i] = sizeof(int32_t);
lod_tensor.dtype = paddle::PaddleDType::INT32;
} }
if (model_config->_is_lod_feed[i]) { if (model_config->_is_lod_feed[i]) {
...@@ -159,8 +162,10 @@ int GeneralReaderOp::inference() { ...@@ -159,8 +162,10 @@ int GeneralReaderOp::inference() {
int data_len = 0; int data_len = 0;
if (tensor.int64_data_size() > 0) { if (tensor.int64_data_size() > 0) {
data_len = tensor.int64_data_size(); data_len = tensor.int64_data_size();
} else { } else if (tensor.float_data_size() > 0) {
data_len = tensor.float_data_size(); data_len = tensor.float_data_size();
} else if (tensor.int_data_size() > 0) {
data_len = tensor.int_data_size();
} }
VLOG(2) << "tensor size for var[" << i << "]: " << data_len; VLOG(2) << "tensor size for var[" << i << "]: " << data_len;
tensor_size += data_len; tensor_size += data_len;
...@@ -198,6 +203,8 @@ int GeneralReaderOp::inference() { ...@@ -198,6 +203,8 @@ int GeneralReaderOp::inference() {
for (int i = 0; i < var_num; ++i) { for (int i = 0; i < var_num; ++i) {
if (elem_type[i] == 0) { if (elem_type[i] == 0) {
int64_t *dst_ptr = static_cast<int64_t *>(out->at(i).data.data()); int64_t *dst_ptr = static_cast<int64_t *>(out->at(i).data.data());
VLOG(2) << "first element data in var[" << i << "] is "
<< req->insts(0).tensor_array(i).int64_data(0);
int offset = 0; int offset = 0;
for (int j = 0; j < batch_size; ++j) { for (int j = 0; j < batch_size; ++j) {
int elem_num = req->insts(j).tensor_array(i).int64_data_size(); int elem_num = req->insts(j).tensor_array(i).int64_data_size();
...@@ -210,8 +217,10 @@ int GeneralReaderOp::inference() { ...@@ -210,8 +217,10 @@ int GeneralReaderOp::inference() {
offset += capacity[i]; offset += capacity[i];
} }
} }
} else { } else if (elem_type[i] == 1) {
float *dst_ptr = static_cast<float *>(out->at(i).data.data()); float *dst_ptr = static_cast<float *>(out->at(i).data.data());
VLOG(2) << "first element data in var[" << i << "] is "
<< req->insts(0).tensor_array(i).float_data(0);
int offset = 0; int offset = 0;
for (int j = 0; j < batch_size; ++j) { for (int j = 0; j < batch_size; ++j) {
int elem_num = req->insts(j).tensor_array(i).float_data_size(); int elem_num = req->insts(j).tensor_array(i).float_data_size();
...@@ -224,6 +233,22 @@ int GeneralReaderOp::inference() { ...@@ -224,6 +233,22 @@ int GeneralReaderOp::inference() {
offset += capacity[i]; offset += capacity[i];
} }
} }
} else if (elem_type[i] == 2) {
int32_t *dst_ptr = static_cast<int32_t *>(out->at(i).data.data());
VLOG(2) << "first element data in var[" << i << "] is "
<< req->insts(0).tensor_array(i).int_data(0);
int offset = 0;
for (int j = 0; j < batch_size; ++j) {
int elem_num = req->insts(j).tensor_array(i).int_data_size();
for (int k = 0; k < elem_num; ++k) {
dst_ptr[offset + k] = req->insts(j).tensor_array(i).int_data(k);
}
if (out->at(i).lod.size() == 1) {
offset = out->at(i).lod[0][j + 1];
} else {
offset += capacity[i];
}
}
} }
} }
......
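The reader op above dispatches on `elem_type`: 0 is int64 carried in `int64_data`, 1 is float32 in `float_data`, and 2 is int32 in `int_data`. A tiny Python sketch of that convention (the helper itself is hypothetical):

```python
# Hypothetical helper mirroring the elem_type convention used by the reader op above.
ELEM_TYPE_TO_FIELD = {
    0: ("int64", "int64_data"),
    1: ("float32", "float_data"),
    2: ("int32", "int_data"),
}

def tensor_values(tensor):
    """Return (dtype name, values) for a Tensor message according to its elem_type."""
    dtype, field = ELEM_TYPE_TO_FIELD[tensor.elem_type]
    return dtype, list(getattr(tensor, field))
```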
...@@ -91,7 +91,6 @@ int GeneralResponseOp::inference() { ...@@ -91,7 +91,6 @@ int GeneralResponseOp::inference() {
for (auto &idx : fetch_index) { for (auto &idx : fetch_index) {
Tensor *tensor = fetch_inst->add_tensor_array(); Tensor *tensor = fetch_inst->add_tensor_array();
tensor->set_elem_type(1);
if (model_config->_is_lod_fetch[idx]) { if (model_config->_is_lod_fetch[idx]) {
VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx] VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
<< " is lod_tensor"; << " is lod_tensor";
...@@ -115,49 +114,48 @@ int GeneralResponseOp::inference() { ...@@ -115,49 +114,48 @@ int GeneralResponseOp::inference() {
for (int j = 0; j < in->at(idx).shape.size(); ++j) { for (int j = 0; j < in->at(idx).shape.size(); ++j) {
cap *= in->at(idx).shape[j]; cap *= in->at(idx).shape[j];
} }
if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx] FetchInst *fetch_p = output->mutable_insts(0);
auto dtype = in->at(idx).dtype;
if (dtype == paddle::PaddleDType::INT64) {
VLOG(2) << "Prepare int64 var [" << model_config->_fetch_name[idx]
<< "]."; << "].";
int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data()); int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
if (model_config->_is_lod_fetch[idx]) { // from
FetchInst *fetch_p = output->mutable_insts(0); // https://stackoverflow.com/questions/15499641/copy-a-stdvector-to-a-repeated-field-from-protobuf-with-memcpy
for (int j = 0; j < in->at(idx).lod[0].size(); ++j) { // `Swap` method is faster than `{}` method.
fetch_p->mutable_tensor_array(var_idx)->add_lod( google::protobuf::RepeatedField<int64_t> tmp_data(data_ptr,
in->at(idx).lod[0][j]); data_ptr + cap);
} fetch_p->mutable_tensor_array(var_idx)->mutable_int64_data()->Swap(
for (int j = 0; j < cap; ++j) { &tmp_data);
fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]); } else if (dtype == paddle::PaddleDType::FLOAT32) {
}
} else {
FetchInst *fetch_p = output->mutable_insts(0);
for (int j = 0; j < cap; ++j) {
fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
}
}
VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
var_idx++;
} else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx] VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
<< "]."; << "].";
float *data_ptr = static_cast<float *>(in->at(idx).data.data()); float *data_ptr = static_cast<float *>(in->at(idx).data.data());
if (model_config->_is_lod_fetch[idx]) { google::protobuf::RepeatedField<float> tmp_data(data_ptr,
FetchInst *fetch_p = output->mutable_insts(0); data_ptr + cap);
for (int j = 0; j < in->at(idx).lod[0].size(); ++j) { fetch_p->mutable_tensor_array(var_idx)->mutable_float_data()->Swap(
fetch_p->mutable_tensor_array(var_idx)->add_lod( &tmp_data);
in->at(idx).lod[0][j]); } else if (dtype == paddle::PaddleDType::INT32) {
} VLOG(2) << "Prepare int32 var [" << model_config->_fetch_name[idx]
for (int j = 0; j < cap; ++j) { << "].";
fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]); int32_t *data_ptr = static_cast<int32_t *>(in->at(idx).data.data());
} google::protobuf::RepeatedField<int32_t> tmp_data(data_ptr,
} else { data_ptr + cap);
FetchInst *fetch_p = output->mutable_insts(0); fetch_p->mutable_tensor_array(var_idx)->mutable_int_data()->Swap(
for (int j = 0; j < cap; ++j) { &tmp_data);
fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]); }
}
if (model_config->_is_lod_fetch[idx]) {
for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
fetch_p->mutable_tensor_array(var_idx)->add_lod(
in->at(idx).lod[0][j]);
} }
VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
var_idx++;
} }
VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
var_idx++;
} }
} }
......
...@@ -603,13 +603,13 @@ class VersionedInferEngine : public InferEngine { ...@@ -603,13 +603,13 @@ class VersionedInferEngine : public InferEngine {
LOG(ERROR) << "Failed generate engine with type:" << engine_type; LOG(ERROR) << "Failed generate engine with type:" << engine_type;
return -1; return -1;
} }
VLOG(2) << "FLGS_logtostderr " << FLAGS_logtostderr; VLOG(2) << "FLAGS_logtostderr " << FLAGS_logtostderr;
int tmp = FLAGS_logtostderr; int tmp = FLAGS_logtostderr;
if (engine->proc_initialize(conf, version) != 0) { if (engine->proc_initialize(conf, version) != 0) {
LOG(ERROR) << "Failed initialize engine, type:" << engine_type; LOG(ERROR) << "Failed initialize engine, type:" << engine_type;
return -1; return -1;
} }
VLOG(2) << "FLGS_logtostderr " << FLAGS_logtostderr; VLOG(2) << "FLAGS_logtostderr " << FLAGS_logtostderr;
FLAGS_logtostderr = tmp; FLAGS_logtostderr = tmp;
auto r = _versions.insert(std::make_pair(engine->version(), engine)); auto r = _versions.insert(std::make_pair(engine->version(), engine));
if (!r.second) { if (!r.second) {
......
...@@ -12,13 +12,23 @@ ...@@ -12,13 +12,23 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
#include <sys/time.h>
#include <fstream> #include <fstream>
#include <iostream> #include <iostream>
#include <memory> #include <memory>
#include <thread>
#include "core/predictor/framework.pb.h" #include "core/predictor/framework.pb.h"
#include "quant.h" #include "quant.h"
#include "seq_file.h" #include "seq_file.h"
inline uint64_t time_diff(const struct timeval &start_time,
const struct timeval &end_time) {
return (end_time.tv_sec - start_time.tv_sec) * 1000000 +
(end_time.tv_usec - start_time.tv_usec);
}
using paddle::framework::proto::VarType; using paddle::framework::proto::VarType;
std::map<int, size_t> var_type_size; std::map<int, size_t> var_type_size;
void reg_var_types() { void reg_var_types() {
...@@ -100,8 +110,8 @@ int dump_parameter(const char *input_file, const char *output_file) { ...@@ -100,8 +110,8 @@ int dump_parameter(const char *input_file, const char *output_file) {
char *value_buf = new char[value_buf_len]; char *value_buf = new char[value_buf_len];
size_t offset = 0; size_t offset = 0;
for (int64_t i = 0; i < dims[0]; ++i) { for (int64_t i = 0; i < dims[0]; ++i) {
// std::cout << "key_len " << key_len << " value_len " << value_buf_len << // std::cout << "key_len " << key_len << " value_len " << value_buf_len
// std::endl; // << std::endl;
memcpy(value_buf, tensor_buf + offset, value_buf_len); memcpy(value_buf, tensor_buf + offset, value_buf_len);
seq_file_writer.write((char *)&i, sizeof(i), value_buf, value_buf_len); seq_file_writer.write((char *)&i, sizeof(i), value_buf, value_buf_len);
offset += value_buf_len; offset += value_buf_len;
...@@ -109,14 +119,14 @@ int dump_parameter(const char *input_file, const char *output_file) { ...@@ -109,14 +119,14 @@ int dump_parameter(const char *input_file, const char *output_file) {
return 0; return 0;
} }
int compress_parameter(const char *file1, const char *file2, int bits) { float *read_embedding_table(const char *file1, std::vector<int64_t> &dims) {
std::ifstream is(file1); std::ifstream is(file1);
// Step 1: is read version, os write version // Step 1: is read version, os write version
uint32_t version; uint32_t version;
is.read(reinterpret_cast<char *>(&version), sizeof(version)); is.read(reinterpret_cast<char *>(&version), sizeof(version));
if (version != 0) { if (version != 0) {
std::cout << "Version number " << version << " not supported" << std::endl; std::cout << "Version number " << version << " not supported" << std::endl;
return -1; return NULL;
} }
std::cout << "Version size: " << sizeof(version) << std::endl; std::cout << "Version size: " << sizeof(version) << std::endl;
// Step 2: is read LoD level, os write LoD level // Step 2: is read LoD level, os write LoD level
...@@ -138,7 +148,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -138,7 +148,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
is.read(reinterpret_cast<char *>(&version), sizeof(version)); is.read(reinterpret_cast<char *>(&version), sizeof(version));
if (version != 0) { if (version != 0) {
std::cout << "Version number " << version << " not supported" << std::endl; std::cout << "Version number " << version << " not supported" << std::endl;
return -1; return NULL;
} }
// Step 4: is read Tensor Data, os write min/max/quant data // Step 4: is read Tensor Data, os write min/max/quant data
...@@ -149,10 +159,10 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -149,10 +159,10 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
is.read(reinterpret_cast<char *>(buf.get()), size); is.read(reinterpret_cast<char *>(buf.get()), size);
if (!desc.ParseFromArray(buf.get(), size)) { if (!desc.ParseFromArray(buf.get(), size)) {
std::cout << "Cannot parse tensor desc" << std::endl; std::cout << "Cannot parse tensor desc" << std::endl;
return -1; return NULL;
} }
// read tensor // read tensor
std::vector<int64_t> dims; // std::vector<int64_t> dims;
dims.reserve(static_cast<size_t>(desc.dims().size())); dims.reserve(static_cast<size_t>(desc.dims().size()));
std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims)); std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
...@@ -164,7 +174,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -164,7 +174,7 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
if (dims.size() != 2) { if (dims.size() != 2) {
std::cout << "Parameter dims not 2D" << std::endl; std::cout << "Parameter dims not 2D" << std::endl;
return -1; return NULL;
} }
size_t numel = 1; size_t numel = 1;
...@@ -176,47 +186,96 @@ int compress_parameter(const char *file1, const char *file2, int bits) { ...@@ -176,47 +186,96 @@ int compress_parameter(const char *file1, const char *file2, int bits) {
char *tensor_buf = new char[buf_size]; char *tensor_buf = new char[buf_size];
is.read(static_cast<char *>(tensor_buf), buf_size); is.read(static_cast<char *>(tensor_buf), buf_size);
float *tensor_float_buf = reinterpret_cast<float *>(tensor_buf); float *tensor_float_buf = reinterpret_cast<float *>(tensor_buf);
size_t per_line_size = dims[1] * 1 + 2 * sizeof(float); return tensor_float_buf;
char *tensor_out = new char[per_line_size * dims[0]]; }
float loss = 0; int compress_parameter_parallel(const char *file1,
float all_loss = 0; const char *file2,
int bits,
int n_threads) {
#define MIN_THREADS (1)
#define MAX_THREADS (80)
std::vector<int64_t> dims;
float *emb_table = read_embedding_table(file1, dims);
if (emb_table == NULL || dims.size() != 2) {
return -1;
}
// int64_t dict_size = dims[0]/100000000;
int64_t dict_size = dims[0];
int64_t emb_size = dims[1];
size_t per_line_size = emb_size * 1 + 2 * sizeof(float);
n_threads = std::min(std::max(MIN_THREADS, n_threads), MAX_THREADS);
int64_t step = dict_size / n_threads;
std::vector<char *> result;
result.reserve(dict_size + 1);
double pow2bits = pow(2, bits);
std::cout << "Start Quant" << std::endl; std::cout << "Start Quant" << std::endl;
std::vector<std::thread> threads;
for (int i = 0; i < n_threads + 1; ++i) {
threads.push_back(std::thread([=, &result]() {
int64_t start = i * step;
int64_t end = (i + 1) * step;
if (i == n_threads) {
if (start == dict_size) {
return;
}
end = dict_size;
}
printf("THREAD[%d], index [%ld, %ld), start Quant table...\n",
i,
start,
end);
struct timeval quant_start;
gettimeofday(&(quant_start), NULL);
for (int64_t k = start; k < end; ++k) {
float xmin = 0, xmax = 0, loss = 0;
char *tensor_temp = new char[per_line_size];
greedy_search(
emb_table + k * emb_size, xmin, xmax, loss, emb_size, bits);
// 得出 loss 最小的时候的 scale
float scale = (xmax - xmin) / (pow2bits - 1);
char *min_ptr = tensor_temp;
char *max_ptr = tensor_temp + sizeof(float);
memcpy(min_ptr, &xmin, sizeof(float));
memcpy(max_ptr, &xmax, sizeof(float));
for (size_t e = 0; e < emb_size; ++e) {
float x = *(emb_table + k * emb_size + e);
int val = round((x - xmin) / scale);
val = std::max(0, val);
val = std::min((int)pow2bits - 1, val);
*(tensor_temp + 2 * sizeof(float) + e) = val;
}
result[k] = tensor_temp;
if ((k - start) % 10000 == 0) {
printf("THREAD[%d], handle line: %ld\n", i, k - start);
}
}
struct timeval quant_end;
gettimeofday(&(quant_end), NULL);
printf("THREAD[%d], Quantization finished, cost: %lu us!!!\n",
i,
time_diff(quant_start, quant_end));
}));
}
for (auto &thread : threads) {
thread.join();
}
SeqFileWriter seq_file_writer(file2); SeqFileWriter seq_file_writer(file2);
for (int64_t i = 0; i < dict_size; i++) {
size_t offset = 0; seq_file_writer.write((char *)&i, sizeof(i), result[i], per_line_size);
for (int64_t i = 0; i < dims[0]; ++i) {
float xmin = 0, xmax = 0, loss = 0;
size_t scale = dims[1];
char *tensor_temp = new char[per_line_size];
greedy_search(
tensor_float_buf + i * dims[1], xmin, xmax, loss, scale, bits);
for (size_t e = 0; e < dims[1]; ++e) {
float x = *(tensor_float_buf + i * dims[1] + e);
int val = round((x - xmin) / (xmax - xmin) * (pow(2, bits) - 1));
val = std::max(0, val);
val = std::min((int)pow(2, bits) - 1, val);
char *min_ptr = tensor_temp;
char *max_ptr = tensor_temp + sizeof(float);
memcpy(min_ptr, &xmin, sizeof(float));
memcpy(max_ptr, &xmax, sizeof(float));
*(tensor_temp + 2 * sizeof(float) + e) = val;
float unit = (xmax - xmin) / pow(2, bits);
float trans_val = unit * val + xmin;
}
seq_file_writer.write((char *)&i, sizeof(i), tensor_temp, per_line_size);
} }
return 0; return 0;
} }
int main(int argc, char **argv) { int main(int argc, char **argv) {
if (argc < 3 || argc > 4) { if (argc < 3 || argc > 5) {
std::cout << "Usage: if no compress, please follow:" << std::endl; std::cout << "Usage:" << std::endl;
std::cout << "seq_generator PARAMETER_FILE OUTPUT_FILE\n" << std::endl; std::cout << "if no compress, please follow:" << std::endl;
std::cout << " seq_generator PARAMETER_FILE OUTPUT_FILE\n" << std::endl;
std::cout << "if compress, please follow: " << std::endl; std::cout << "if compress, please follow: " << std::endl;
std::cout << "seq_generator PARAMETER_FILE OUTPUT_FILE QUANT_BITS" std::cout << " seq_generator PARAMETER_FILE OUTPUT_FILE QUANT_BITS "
"[N_THREADS]"
<< std::endl; << std::endl;
std::cout << "Now it only support 8 bit." << std::endl; std::cout << " Now it only support 8 bit." << std::endl;
return -1; return -1;
} }
reg_var_types(); reg_var_types();
...@@ -227,7 +286,13 @@ int main(int argc, char **argv) { ...@@ -227,7 +286,13 @@ int main(int argc, char **argv) {
} }
if (argc == 4) { if (argc == 4) {
std::cout << "generate compressed sparse param sequence file" << std::endl; std::cout << "generate compressed sparse param sequence file" << std::endl;
compress_parameter(argv[1], argv[2], atoi(argv[3])); compress_parameter_parallel(argv[1], argv[2], atoi(argv[3]), 1);
return 0;
}
if (argc == 5) {
std::cout << "parallel generate compressed sparse param sequence file"
<< std::endl;
compress_parameter_parallel(argv[1], argv[2], atoi(argv[3]), atoi(argv[4]));
return 0; return 0;
} }
} }
......
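For readers skimming the diff above: each embedding row is quantized independently with a simple min/max linear scheme, and `greedy_search` additionally tunes `xmin`/`xmax` to minimize reconstruction loss. Below is a rough NumPy sketch of that per-row math only (function names are illustrative and not part of the tool; the greedy search and the SequenceFile output are omitted):

```python
import numpy as np

def quantize_row(row, bits=8):
    """Quantize one embedding row to `bits` bits, mimicking the per-row loop above.

    The C++ tool stores xmin/xmax as two floats followed by one byte per element.
    """
    xmin, xmax = float(row.min()), float(row.max())
    levels = 2 ** bits - 1
    scale = (xmax - xmin) / levels if xmax > xmin else 1.0
    codes = np.clip(np.round((row - xmin) / scale), 0, levels).astype(np.uint8)
    return xmin, xmax, codes

def dequantize_row(xmin, xmax, codes, bits=8):
    """Approximate reconstruction on the serving side (illustrative)."""
    levels = 2 ** bits - 1
    scale = (xmax - xmin) / levels if xmax > xmin else 1.0
    return codes.astype(np.float32) * scale + xmin

# Quantize a random "embedding" row and check the reconstruction error.
row = np.random.randn(9).astype(np.float32)
xmin, xmax, codes = quantize_row(row, bits=8)
print(np.abs(row - dequantize_row(xmin, xmax, codes)).max())
```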
...@@ -4,17 +4,28 @@ ...@@ -4,17 +4,28 @@
## Compilation environment requirements ## Compilation environment requirements
- OS: CentOS 7 | module | version |
- GCC: 4.8.2 and later | :--------------------------: | :----------------------------------------------------------: |
- Golang: 1.9.2 and later | OS | CentOS 7 |
- Git:2.17.1 and later | gcc | 4.8.5 and later |
- CMake:3.2.2 and later | gcc-c++ | 4.8.5 and later |
- Python:2.7.2 and later / 3.6 and later | git | 3.82 and later |
| cmake | 3.2.0 and later |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you: | Python | 2.7.2 and later / 3.6 and later |
| Go | 1.9.2 and later |
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel) | git | 2.17.1 and later |
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel) | glibc-static | 2.17 |
| openssl-devel | 1.0.2k |
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 and later |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
| python-whl | numpy>=1.12, <=1.16.4<br/>google>=2.0.3<br/>protobuf>=3.12.2<br/>grpcio-tools>=1.28.1<br/>grpcio>=1.28.1<br/>func-timeout>=4.3.5<br/>pyyaml>=1.3.0<br/>sentencepiece==0.1.92<br>flask>=1.1.2<br>ujson>=2.0.3 |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake: This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake:
...@@ -29,6 +40,9 @@ git clone https://github.com/PaddlePaddle/Serving ...@@ -29,6 +40,9 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive cd Serving && git submodule update --init --recursive
``` ```
## PYTHONROOT Setting ## PYTHONROOT Setting
```shell ```shell
...@@ -38,12 +52,24 @@ export PYTHONROOT=/usr/ ...@@ -38,12 +52,24 @@ export PYTHONROOT=/usr/
In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`. In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
## Install Python dependencies
```shell
pip install -r python/requirements.txt
```
If Python3 is used, replace `pip` with `pip3`.
## Compile Server ## Compile Server
### Integrated CPU version paddle inference library ### Integrated CPU version paddle inference library
``` shell ``` shell
mkdir build && cd build mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
make -j10 make -j10
``` ```
...@@ -53,7 +79,7 @@ you can execute `make install` to put targets under directory `./output`, you ne ...@@ -53,7 +79,7 @@ you can execute `make install` to put targets under directory `./output`, you ne
### Integrated GPU version paddle inference library ### Integrated GPU version paddle inference library
``` shell ``` shell
mkdir build && cd build mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON ..
make -j10 make -j10
``` ```
...@@ -62,33 +88,42 @@ execute `make install` to put targets under directory `./output` ...@@ -62,33 +88,42 @@ execute `make install` to put targets under directory `./output`
**Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details. **Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
## Compile Client ## Compile Client
``` shell ``` shell
mkdir build && cd build mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON ..
make -j10 make -j10
``` ```
execute `make install` to put targets under directory `./output` execute `make install` to put targets under directory `./output`
## Compile the App ## Compile the App
```bash ```bash
mkdir build && cd build mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DAPP=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DAPP=ON ..
make make
``` ```
## Install wheel package ## Install wheel package
Regardless of the client, server or App part, after compiling, install the whl package under `python/dist/`. Regardless of the client, server or App part, after compiling, install the whl package under `python/dist/`.
## Note ## Note
When running the python server, it will check the `SERVING_BIN` environment variable. If you want to use your own compiled binary file, set the environment variable to the path of the corresponding binary file, usually`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`. When running the python server, it will check the `SERVING_BIN` environment variable. If you want to use your own compiled binary file, set the environment variable to the path of the corresponding binary file, usually`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
## CMake Option Description ## CMake Option Description
| Compile Options | Description | Default | | Compile Options | Description | Default |
......
...@@ -4,17 +4,28 @@ ...@@ -4,17 +4,28 @@
## 编译环境设置 ## 编译环境设置
- OS: CentOS 7 | 组件 | 版本要求 |
- GCC: 4.8.2及以上 | :--------------------------: | :----------------------------------------------------------: |
- Golang: 1.9.2及以上 | OS | CentOS 7 |
- Git:2.17.1及以上 | gcc | 4.8.5 and later |
- CMake:3.2.2及以上 | gcc-c++ | 4.8.5 and later |
- Python:2.7.2及以上 / 3.6及以上 | git | 3.82 and later |
| cmake | 3.2.0 and later |
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境: | Python | 2.7.2 and later / 3.6 and later |
| Go | 1.9.2 and later |
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel) | git | 2.17.1 and later |
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel) | glibc-static | 2.17 |
| openssl-devel | 1.0.2k |
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
| python-whl | numpy>=1.12, <=1.16.4<br/>google>=2.0.3<br/>protobuf>=3.12.2<br/>grpcio-tools>=1.28.1<br/>grpcio>=1.28.1<br/>func-timeout>=4.3.5<br/>pyyaml>=1.3.0<br/>sentencepiece==0.1.92<br/>flask>=1.1.2<br/>ujson>=2.0.3 |
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境,详见[该文档](DOCKER_IMAGES_CN.md)
本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可: 本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可:
...@@ -29,6 +40,9 @@ git clone https://github.com/PaddlePaddle/Serving ...@@ -29,6 +40,9 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive cd Serving && git submodule update --init --recursive
``` ```
## PYTHONROOT设置 ## PYTHONROOT设置
```shell ```shell
...@@ -38,12 +52,24 @@ export PYTHONROOT=/usr/ ...@@ -38,12 +52,24 @@ export PYTHONROOT=/usr/
我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/` 我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`
## 安装Python依赖
```shell
pip install -r python/requirements.txt
```
如果使用 Python3,请以 `pip3` 替换 `pip`
## 编译Server部分 ## 编译Server部分
### 集成CPU版本Paddle Inference Library ### 集成CPU版本Paddle Inference Library
``` shell ``` shell
mkdir build && cd build mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
make -j10 make -j10
``` ```
...@@ -53,7 +79,7 @@ make -j10 ...@@ -53,7 +79,7 @@ make -j10
### 集成GPU版本Paddle Inference Library ### 集成GPU版本Paddle Inference Library
``` shell ``` shell
mkdir build && cd build mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON -DWITH_GPU=ON ..
make -j10 make -j10
``` ```
...@@ -62,32 +88,42 @@ make -j10 ...@@ -62,32 +88,42 @@ make -j10
**注意:** 编译成功后,需要设置`SERVING_BIN`路径,详见后面的[注意事项](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE_CN.md#注意事项) **注意:** 编译成功后,需要设置`SERVING_BIN`路径,详见后面的[注意事项](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE_CN.md#注意事项)
## 编译Client部分 ## 编译Client部分
``` shell ``` shell
mkdir build && cd build mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCLIENT=ON ..
make -j10 make -j10
``` ```
执行`make install`可以把目标产出放在`./output`目录下。 执行`make install`可以把目标产出放在`./output`目录下。
## 编译App部分 ## 编译App部分
```bash ```bash
mkdir build && cd build mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCMAKE_INSTALL_PREFIX=./output -DAPP=ON .. cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DCMAKE_INSTALL_PREFIX=./output -DAPP=ON ..
make make
``` ```
## 安装wheel包 ## 安装wheel包
无论是Client端,Server端还是App部分,编译完成后,安装`python/dist/`下的whl包即可。 无论是Client端,Server端还是App部分,编译完成后,安装`python/dist/`下的whl包即可。
## 注意事项 ## 注意事项
运行python端Server时,会检查`SERVING_BIN`环境变量,如果想使用自己编译的二进制文件,请将设置该环境变量为对应二进制文件的路径,通常是`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving` 运行python端Server时,会检查`SERVING_BIN`环境变量,如果想使用自己编译的二进制文件,请将设置该环境变量为对应二进制文件的路径,通常是`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`
## CMake选项说明 ## CMake选项说明
| 编译选项 | 说明 | 默认 | | 编译选项 | 说明 | 默认 |
......
...@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu ...@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu
1. Build and test 1. Build and test
Users can build Paddle Serving natively on Linux, see the [BUILD steps](doc/INSTALL.md). Users can build Paddle Serving natively on Linux, see the [BUILD steps](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md).
1. Keep pulling 1. Keep pulling
......
...@@ -6,7 +6,8 @@ ...@@ -6,7 +6,8 @@
There are two examples on CTR under python / examples, they are criteo_ctr, criteo_ctr_with_cube. The former is to save the entire model during training, including sparse parameters. The latter is to cut out the sparse parameters and save them into two parts, one is the sparse parameter and the other is the dense parameter. Because the scale of sparse parameters is very large in industrial cases, reaching the order of 10 ^ 9. Therefore, it is not practical to start large-scale sparse parameter prediction on one machine. Therefore, we introduced Baidu's industrial-grade product Cube to provide the sparse parameter service for many years to provide distributed sparse parameter services. There are two examples on CTR under python / examples, they are criteo_ctr, criteo_ctr_with_cube. The former is to save the entire model during training, including sparse parameters. The latter is to cut out the sparse parameters and save them into two parts, one is the sparse parameter and the other is the dense parameter. Because the scale of sparse parameters is very large in industrial cases, reaching the order of 10 ^ 9. Therefore, it is not practical to start large-scale sparse parameter prediction on one machine. Therefore, we introduced Baidu's industrial-grade product Cube to provide the sparse parameter service for many years to provide distributed sparse parameter services.
The local mode of Cube is different from distributed Cube, which is designed to be convenient for developers to use in experiments and demos. If there is a demand for distributed sparse parameter service, please continue reading [Distributed Cube User Guide](./Distributed_Cube) after reading this document (still developing). The local mode of Cube is different from distributed Cube, which is designed to be convenient for developers to use in experiments and demos.
<!--If there is a demand for distributed sparse parameter service, please continue reading [Distributed Cube User Guide](./Distributed_Cube) after reading this document (still developing).-->
This document uses the original model without any compression algorithm. If there is a need for a quantitative model to go online, please read the [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md) This document uses the original model without any compression algorithm. If there is a need for a quantitative model to go online, please read the [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md)
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
在python/examples下有两个关于CTR的示例,他们分别是criteo_ctr, criteo_ctr_with_cube。前者是在训练时保存整个模型,包括稀疏参数。后者是将稀疏参数裁剪出来,保存成两个部分,一个是稀疏参数,另一个是稠密参数。由于在工业级的场景中,稀疏参数的规模非常大,达到10^9数量级。因此在一台机器上启动大规模稀疏参数预测是不实际的,因此我们引入百度多年来在稀疏参数索引领域的工业级产品Cube,提供分布式的稀疏参数服务。 在python/examples下有两个关于CTR的示例,他们分别是criteo_ctr, criteo_ctr_with_cube。前者是在训练时保存整个模型,包括稀疏参数。后者是将稀疏参数裁剪出来,保存成两个部分,一个是稀疏参数,另一个是稠密参数。由于在工业级的场景中,稀疏参数的规模非常大,达到10^9数量级。因此在一台机器上启动大规模稀疏参数预测是不实际的,因此我们引入百度多年来在稀疏参数索引领域的工业级产品Cube,提供分布式的稀疏参数服务。
单机版Cube是分布式Cube的弱化版本,旨在方便开发者做实验和Demo时使用。如果有分布式稀疏参数服务的需求,请在读完此文档之后,继续阅读 [稀疏参数索引服务Cube使用指南](分布式Cube)(正在建设中)。 <!--单机版Cube是分布式Cube的弱化版本,旨在方便开发者做实验和Demo时使用。如果有分布式稀疏参数服务的需求,请在读完此文档之后,继续阅读 [稀疏参数索引服务Cube使用指南](分布式Cube)(正在建设中)。-->
本文档使用的都是未经过任何压缩算法处理的原始模型,如果有量化模型上线需求,请阅读[Cube稀疏参数索引量化存储使用指南](./CUBE_QUANT_CN.md) 本文档使用的都是未经过任何压缩算法处理的原始模型,如果有量化模型上线需求,请阅读[Cube稀疏参数索引量化存储使用指南](./CUBE_QUANT_CN.md)
......
...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube ...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube
python local_train.py python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/ cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh & sh cube_quant_prepare.sh &
python test_server_quant.py ctr_serving_model_kv & python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
``` ```
......
...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube ...@@ -42,7 +42,7 @@ cd python/examples/criteo_ctr_with_cube
python local_train.py python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/ cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh & sh cube_quant_prepare.sh &
python test_server_quant.py ctr_serving_model_kv & python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
``` ```
......
...@@ -106,7 +106,7 @@ class FluidFamilyCore { ...@@ -106,7 +106,7 @@ class FluidFamilyCore {
![预测服务Service](predict-service.png) ![预测服务Service](predict-service.png)
关于OP之间的依赖关系,以及通过OP组建workflow,可以参考[从零开始写一个预测服务](CREATING.md)的相关章节 关于OP之间的依赖关系,以及通过OP组建workflow,可以参考[从零开始写一个预测服务](https://github.com/PaddlePaddle/Serving/blob/develop/doc/deprecated/CREATING.md)的相关章节
服务端实例透视图 服务端实例透视图
......
# Docker Images
([简体中文](DOCKER_IMAGES_CN.md)|English)
This document maintains a list of docker images provided by Paddle Serving.
## Get docker image
You can get images in two ways:
1. Pull image directly from `hub.baidubce.com ` or `docker.io` through TAG:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
```
2. Building image based on dockerfile
Create a new folder and copy Dockerfile to this folder, and run the following command:
```shell
docker build -t <image-name>:<images-tag> .
```
## Image description
Runtime images cannot be used for compilation.
| Description | OS | TAG | Dockerfile |
| :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
| CPU runtime | CentOS7 | latest | [Dockerfile](../tools/Dockerfile) |
| CPU development | CentOS7 | latest-devel | [Dockerfile.devel](../tools/Dockerfile.devel) |
| GPU (cuda9.0-cudnn7) runtime | CentOS7 | latest-cuda9.0-cudnn7 | [Dockerfile.cuda9.0-cudnn7](../tools/Dockerfile.cuda9.0-cudnn7) |
| GPU (cuda9.0-cudnn7) development | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) runtime | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) development | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) development (Used to compile packages on Ubuntu) | CentOS6 | <None> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
# Docker 镜像
(简体中文|[English](DOCKER_IMAGES.md))
该文档维护了 Paddle Serving 提供的镜像列表。
## 获取镜像
您可以通过两种方式获取镜像。
1. 通过 TAG 直接从 `hub.baidubce.com ``docker.io` 拉取镜像:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
```
2. 基于 Dockerfile 构建镜像
建立新目录,复制对应 Dockerfile 内容到该目录下 Dockerfile 文件。执行
```shell
docker build -t <image-name>:<images-tag> .
```
## 镜像说明
运行时镜像不能用于开发编译。
| 镜像说明 | 操作系统 | TAG | Dockerfile |
| -------------------------------------------------- | -------- | ---------------------------- | ------------------------------------------------------------ |
| CPU 运行镜像 | CentOS7 | latest | [Dockerfile](../tools/Dockerfile) |
| CPU 开发镜像 | CentOS7 | latest-devel | [Dockerfile.devel](../tools/Dockerfile.devel) |
| GPU (cuda9.0-cudnn7) 运行镜像 | CentOS7 | latest-cuda9.0-cudnn7 | [Dockerfile.cuda9.0-cudnn7](../tools/Dockerfile.cuda9.0-cudnn7) |
| GPU (cuda9.0-cudnn7) 开发镜像 | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) 运行镜像 | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) 开发镜像 | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| CPU 开发镜像 (用于编译 Ubuntu 包) | CentOS6 | <无> | [Dockerfile.centos6.devel](../tools/Dockerfile.centos6.devel) |
| GPU (cuda9.0-cudnn7) 开发镜像 (用于编译 Ubuntu 包) | CentOS6 | <无> | [Dockerfile.centos6.cuda9.0-cudnn7.devel](../tools/Dockerfile.centos6.cuda9.0-cudnn7.devel) |
...@@ -12,4 +12,7 @@ ...@@ -12,4 +12,7 @@
client.load_client_config(sys.argv[1]) client.load_client_config(sys.argv[1])
client.set_rpc_timeout_ms(100000) client.set_rpc_timeout_ms(100000)
client.connect(["127.0.0.1:9393"]) client.connect(["127.0.0.1:9393"])
``` ```
- Q: 如何使用自己编译的Paddle Serving进行预测?
A:通过pip命令安装自己编译出的whl包,并设置SERVING_BIN环境变量为编译出的serving二进制文件路径。
# gRPC接口
gRPC 接口实现形式类似 Web Service:
![](grpc_impl.png)
## 与bRPC接口对比
1. gRPC Server 端 `load_model_config` 函数添加 `client_config_path` 参数:
```python
def load_model_config(self, server_config_paths, client_config_path=None)
```
在一些例子中 bRPC Server 端与 bRPC Client 端的配置文件可能是不同的(如 cube local 例子中,Client 端的数据先交给 cube,经过 cube 处理后再交给预测库),所以 gRPC Server 端需要获取 gRPC Client 端的配置;同时为了取消 gRPC Client 端手动加载配置文件的过程,所以设计 gRPC Server 端同时加载两个配置文件。`client_config_path` 默认为 `<server_config_path>/serving_server_conf.prototxt`
2. gRPC Client 端取消 `load_client_config` 步骤:
`connect` 步骤通过 RPC 获取相应的 prototxt(从任意一个 endpoint 获取即可)。
3. gRPC Client 需要通过 RPC 方式设置 timeout 时间(调用形式与 bRPC Client保持一致)
因为 bRPC Client 在 `connect` 后无法更改 timeout 时间,所以当 gRPC Server 收到变更 timeout 的调用请求时会重新创建 bRPC Client 实例以变更 bRPC Client timeout时间,同时 gRPC Client 会设置 gRPC 的 deadline 时间。
**注意,设置 timeout 接口和 Inference 接口不能同时调用(非线程安全),出于性能考虑暂时不加锁。**
4. gRPC Client 端 `predict` 函数添加 `asyn``is_python` 参数:
```python
def predict(self, feed, fetch, need_variant_tag=False, asyn=False, is_python=True)
```
其中,`asyn` 为异步调用选项。当 `asyn=True` 时为异步调用,返回 `MultiLangPredictFuture` 对象,通过 `MultiLangPredictFuture.result()` 阻塞获取预测值;当 `asyn=False` 时为同步调用。
`is_python` 为 proto 格式选项。当 `is_python=True` 时,基于 numpy bytes 格式进行数据传输,目前只适用于 Python;当 `is_python=False` 时,以普通数据格式传输,更加通用。使用 numpy bytes 格式传输耗时比普通数据格式小很多(详见 [#654](https://github.com/PaddlePaddle/Serving/pull/654))。
5. 异常处理:当 gRPC Server 端的 bRPC Client 预测失败(返回 `None`)时,gRPC Client 端同样返回None。其他 gRPC 异常会在 Client 内部捕获,并在返回的 fetch_map 中添加一个 "status_code" 字段来区分是否预测正常(参考 timeout 样例)。
6. 由于 gRPC 只支持 pick_first 和 round_robin 负载均衡策略,ABTEST 特性还未打齐。
7. 经测试,gRPC 版本可以在 Windows、macOS 平台使用。
8. 计划支持的客户端语言:
- [x] Python
- [ ] Java
- [ ] Go
- [ ] JavaScript
## Python 端的一些例子
详见 `python/examples/grpc_impl_example` 下的示例文件。
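下面给出一个结合上述说明的最小示意(仅为草稿示例:客户端类名 `MultiLangClient` 与 feed/fetch 字段名均为假设,实际用法请以上述示例目录中的代码为准):

```python
from paddle_serving_client import MultiLangClient  # 类名为假设
import numpy as np

client = MultiLangClient()
# connect 阶段通过 RPC 从 Server 端获取 prototxt,无需手动 load_client_config
client.connect(["127.0.0.1:9393"])

# asyn=True 时为异步调用,返回 future;is_python=True 时以 numpy bytes 格式传输
future = client.predict(
    feed={"x": np.array([0.1, 0.2, 0.3], dtype=np.float32)},  # 字段名为示意
    fetch=["price"],
    asyn=True,
    is_python=True)
fetch_map = future.result()
# 返回的 fetch_map 中包含 "status_code" 字段,可用于判断预测是否正常
print(fetch_map)
```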
# Paddle Serving Client Java SDK
([简体中文](JAVA_SDK_CN.md)|English)
Paddle Serving provides a Java SDK, which supports prediction on the Client side in Java. This document shows how to use the Java SDK.
## Getting started
### Prerequisites
```
- Java 8 or higher
- Apache Maven
```
The following table shows compatibilities between Paddle Serving Server and Java SDK.
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
### Install Java SDK
You can download jar and install it to the local Maven repository:
```shell
wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
Or compile from the source code and install it to the local Maven repository:
```shell
cd Serving/java
mvn compile
mvn install
```
### Maven configure
```text
<dependency>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>0.0.1</version>
</dependency>
```
## Example
Here we show how to use the Java SDK for Boston house price prediction. Please refer to the [examples](../java/examples) folder for more examples.
### Get model
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
### Start Python Server
```shell
python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --use_multilang
```
### Client side code example
```java
import io.paddle.serving.client.*;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import java.util.*;
public class PaddleServingClientExample {
public static void main( String[] args ) {
float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
}};
List<String> fetch = Arrays.asList("price");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return ;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
System.out.println("predict failed.");
return ;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return ;
}
}
```
# Paddle Serving Client Java SDK
(简体中文|[English](JAVA_SDK.md))
Paddle Serving 提供了 Java SDK,支持 Client 端用 Java 语言进行预测,本文档说明了如何使用 Java SDK。
## 快速开始
### 环境要求
```
- Java 8 or higher
- Apache Maven
```
下表显示了 Paddle Serving Server 和 Java SDK 之间的兼容性
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
### 安装
您可以直接下载 jar,安装到本地 Maven 库:
```shell
wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
或者从源码进行编译,安装到本地 Maven 库:
```shell
cd Serving/java
mvn compile
mvn install
```
### Maven 配置
```text
<dependency>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>0.0.1</version>
</dependency>
```
## 使用样例
这里将展示如何使用 Java SDK 进行房价预测,更多例子详见 [examples](../java/examples) 文件夹。
### 获取房价预测模型
```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
### 启动 Python 端 Server
```shell
python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --use_multilang
```
### Client 端代码示例
```java
import io.paddle.serving.client.*;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import java.util.*;
public class PaddleServingClientExample {
public static void main( String[] args ) {
float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
}};
List<String> fetch = Arrays.asList("price");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return ;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
System.out.println("predict failed.");
return ;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return ;
}
}
```
...@@ -3,45 +3,51 @@ ...@@ -3,45 +3,51 @@
## CPU server ## CPU server
### Python 3 ### Python 3
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py3-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.2-py3-none-any.whl
``` ```
### Python 2 ### Python 2
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py2-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.2-py2-none-any.whl
``` ```
## GPU server ## GPU server
### Python 3 ### Python 3
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py3-none-any.whl #cuda 9.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.2.post9-py3-none-any.whl
#cuda 10.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.2.post10-py3-none-any.whl
``` ```
### Python 2 ### Python 2
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py2-none-any.whl #cuda 9.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.2.post9-py2-none-any.whl
#cuda 10.0
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.2.post10-py2-none-any.whl
``` ```
## Client ## Client
### Python 3.7 ### Python 3.7
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp37-none-manylinux1_x86_64.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.2-cp37-none-any.whl
``` ```
### Python 3.6 ### Python 3.6
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp36-none-manylinux1_x86_64.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.2-cp36-none-any.whl
``` ```
### Python 2.7 ### Python 2.7
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp27-none-manylinux1_x86_64.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.2-cp27-none-any.whl
``` ```
## App ## App
### Python 3 ### Python 3
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py3-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.2-py3-none-any.whl
``` ```
### Python 2 ### Python 2
``` ```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py2-none-any.whl https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.2-py2-none-any.whl
``` ```
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
([简体中文](NEW_WEB_SERVICE_CN.md)|English) ([简体中文](NEW_WEB_SERVICE_CN.md)|English)
This document will take the image classification service based on the Imagenet data set as an example to introduce how to develop a new web service. The complete code can be visited at [here](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imagenet/image_classification_service.py). This document will take the image classification service based on the Imagenet data set as an example to introduce how to develop a new web service. The complete code can be visited at [here](../python/examples/imagenet/resnet50_web_service.py).
## WebService base class ## WebService base class
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
(简体中文|[English](NEW_WEB_SERVICE.md)) (简体中文|[English](NEW_WEB_SERVICE.md))
本文档将以Imagenet图像分类服务为例,来介绍如何开发一个新的Web Service。您可以在[这里](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imagenet/image_classification_service.py)查阅完整的代码。 本文档将以Imagenet图像分类服务为例,来介绍如何开发一个新的Web Service。您可以在[这里](../python/examples/imagenet/resnet50_web_service.py)查阅完整的代码。
## WebService基类 ## WebService基类
......
...@@ -14,7 +14,35 @@ Under the same conditions, the communication time of the HTTP prediction service ...@@ -14,7 +14,35 @@ Under the same conditions, the communication time of the HTTP prediction service
Parameters for performance optimization: Parameters for performance optimization:
The memory/graphics-memory optimization option is enabled by default in Paddle Serving. It reduces memory (or GPU memory) usage and usually does not affect performance. If you need to turn it off, pass `--mem_optim_off` on the command line.
`ir_optim` optimizes the computation graph and increases inference speed. It is turned off by default and can be turned on with `--ir_optim` on the command line.
| Parameters | Type | Default | Description |
| ------------- | ---- | ------- | ------------------------------------------------------------ |
| mem_optim_off | - | - | Disable memory / graphic memory optimization |
| ir_optim | - | - | Enable analysis and optimization of the calculation graph, including OP fusion, etc. |
For the mode of using Python code to start the prediction service, the API of the above two parameters is as follows:
RPC Service
```
from paddle_serving_server import Server
server = Server()
...
server.set_memory_optimize(mem_optim)
server.set_ir_optimize(ir_optim)
...
```
HTTP Service
```
from paddle_serving_server import WebService
class NewService(WebService):
...
new_service = NewService(name="new")
...
new_service.prepare_server(mem_optim=True, ir_optim=False)
...
```
...@@ -14,7 +14,33 @@ ...@@ -14,7 +14,33 @@
性能优化相关参数: 性能优化相关参数:
Paddle Serving中默认开启内存/显存优化选项,可以减少对内存/显存的占用,通常不会对性能造成影响,如果需要关闭可以在命令行启动模式中使用--mem_optim_off。
ir_optim可以优化计算图,提升推理速度,默认关闭,在命令行启动的模式中通过--ir_optim开启。
| 参数 | 类型 | 默认值 | 含义 |
| ------------- | ---- | ------ | -------------------------------- |
| mem_optim_off | - | - | 关闭内存/显存优化 |
| ir_optim | - | - | 开启计算图分析优化,包括OP融合等 |
对于使用Python代码启动预测服务的模式,以上两个参数的接口如下:
RPC服务
```
from paddle_serving_server import Server
server = Server()
...
server.set_memory_optimize(mem_optim)
server.set_ir_optimize(ir_optim)
...
```
HTTP服务
```
from paddle_serving_server import WebService
class NewService(WebService):
...
new_service = NewService(name="new")
...
new_service.prepare_server(mem_optim=True, ir_optim=False)
...
```
# Pipeline Serving
([简体中文](PIPELINE_SERVING_CN.md)|English)
Paddle Serving is usually used to deploy a single model, but a single end-to-end deep learning model often cannot solve a real-world problem by itself; in practice, several models are combined.
Paddle Serving therefore provides Pipeline Serving, a user-friendly programming framework for multi-model composite services. It aims to lower the programming barrier, improve resource utilization (especially of GPUs), and increase overall prediction efficiency.
## Architecture Design
The Server side is built based on gRPC and graph execution engine. The relationship between them is shown in the following figure.
<center>
<img src='pipeline_serving-image1.png' height = "250" align="middle"/>
</center>
### Graph Execution Engine
The graph execution engine consists of OPs and Channels, and the connected OPs share one Channel.
- A Channel can be understood as a buffer queue. Each OP accepts input from only one Channel and may have multiple output Channels (every output carries the same data); a Channel can hold the outputs of multiple OPs, and data in the same Channel can serve as input to multiple OPs.
- Users only need to define the relationships between OPs. The graph engine analyzes the dependencies of the whole graph and declares the Channels at compile time.
- After request data enters the graph execution engine service, the engine generates a Request ID, and the Response is returned through the corresponding Request ID.
- When large data needs to be transferred between OPs, consider an external RAM DB for global storage and pass index keys through the Channel instead of the data itself.
<center>
<img src='pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
### OP Design
- The default function of a single OP is to access a single Paddle Serving Service based on the input Channel data and put the result into the output Channel.
- OP supports user customization, including preprocess, process, postprocess functions that can be inherited and implemented by the user.
- An OP can set its concurrency to increase the number of requests processed in parallel.
- OP can be started by a thread or process.
### Channel Design
- Channel is the data structure for sharing data between OPs, responsible for sharing data or sharing data status information.
- Outputs from multiple OPs can be stored in the same Channel, and data from the same Channel can be used by multiple OPs.
- The following illustration shows the design of Channel in the graph execution engine, using input buffer and output buffer to align data between multiple OP inputs and multiple OP outputs, with a queue in the middle to buffer.
<center>
<img src='pipeline_serving-image3.png' height = "500" align="middle"/>
</center>
### Extreme Case Consideration
- Request timeout
Any step of the graph execution engine may time out. The engine controls this with the `timeout` value; a request that times out at any step returns a timeout response.
- Channel stores too much data
Channels may store too much data, causing copy time to be too high. Graph execution engines can store OP calculation results in external memory, such as high-speed memory KV systems.
- Whether input buffers and output buffers in Channel will increase indefinitely
- It will not increase indefinitely. The input to the entire graph execution engine is placed inside a Channel's internal queue, directly acting as a traffic control buffer queue for the entire service.
- For input buffer, adjust the number of concurrencies of OP1 and OP2 according to the amount of computation, so that the number of input buffers from each input OP is relatively balanced.
- For output buffer, you can use a similar process as input buffer, which adjusts the concurrency of OP3 and OP4 to control the buffer length of output buffer.
- Note: The length of the input buffer depends on the speed at which each item in the internal queue is ready, and the length of the output buffer depends on the speed at which downstream OPs obtain data from the output buffer.
## Detailed Design
### User Interface Design
#### 1. General OP Definition
As the basic unit of graph execution engine, the general OP constructor is as follows:
```python
def __init__(name=None,
input_ops=[],
server_endpoints=[],
fetch_list=[],
client_config=None,
concurrency=1,
timeout=-1,
retry=1)
```
The meaning of each parameter is as follows:
| Parameter | Meaning |
| :--------------: | :----------------------------------------------------------: |
| name | (str) String used to identify the OP type, which must be globally unique. |
| input_ops | (list) A list of all previous OPs of the current Op. |
| server_endpoints | (list) List of endpoints for remote Paddle Serving Service. If this parameter is not set, the OP will not access the remote Paddle Serving Service, that is, the process operation will not be performed. |
| fetch_list | (list) List of fetch variable names for remote Paddle Serving Service. |
| client_config | (str) The path of the client configuration file corresponding to the Paddle Serving Service. |
| concurrency | (int) The number of concurrent OPs. |
| timeout | (int) The timeout time of the process operation, in seconds. If the value is less than zero, no timeout is considered. |
| retry | (int) Timeout number of retries. When the value is 1, no retries are made. |
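As a quick usage sketch of these parameters (consistent with the IMDB ensemble example later in this document, from which the endpoint, fetch name, and client config path below are taken):

```python
from paddle_serving_server.pipeline import Op, RequestOp

read_op = RequestOp()                      # default RequestOp feeding the graph
bow_op = Op(name="bow",
            input_ops=[read_op],
            server_endpoints=["127.0.0.1:9393"],
            fetch_list=["prediction"],
            client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
            concurrency=1,                 # number of concurrent workers for this OP
            timeout=-1,                    # <0 means the process step never times out
            retry=1)                       # 1 means no retry on timeout
```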
#### 2. General OP Secondary Development Interface
| Interface or Variable | Explain |
| :--------------------------------------------: | :----------------------------------------------------------: |
| def preprocess(self, input_dicts) | Process the data obtained from the channel, and the processed data will be used as the input of the **process** function. |
| def process(self, feed_dict) | The RPC prediction process is based on the Paddle Serving Client, and the processed data will be used as the input of the **postprocess** function. |
| def postprocess(self, input_dicts, fetch_dict) | After processing the prediction results, the processed data will be put into the subsequent Channel to be obtained by the subsequent OP. |
| def init_op(self) | Used to load resources (such as word dictionary). |
| self.concurrency_idx | Concurrency index of current thread / process (different kinds of OP are calculated separately). |
In a running cycle, OP will execute three operations: preprocess, process, and postprocess (when the `server_endpoints` parameter is not set, the process operation is not executed). Users can rewrite these three functions. The default implementation is as follows:
```python
def preprocess(self, input_dicts):
    # multiple previous Op
    if len(input_dicts) != 1:
        raise NotImplementedError(
            'this Op has multiple previous inputs. Please override this func.')
    (_, input_dict), = input_dicts.items()
    return input_dict

def process(self, feed_dict):
    err, err_info = ChannelData.check_npdata(feed_dict)
    if err != 0:
        raise NotImplementedError(
            "{} Please override preprocess func.".format(err_info))
    call_result = self.client.predict(
        feed=feed_dict, fetch=self._fetch_names)
    return call_result

def postprocess(self, input_dicts, fetch_dict):
    return fetch_dict
```
The parameter of **preprocess** is the data `input_dicts` in the previous Channel. This variable is a dictionary with the name of the previous OP as key and the output of the corresponding OP as value.
The parameter of **process** is `feed_dict` (the return value of the preprocess function), which is the input to the Paddle Serving Client prediction interface. This variable is a dictionary with feed names as keys and ndarray-format data as values.
The parameters of **postprocess** are `input_dicts` and `fetch_dict`. `input_dicts` is consistent with the parameter of preprocess, and `fetch_dict` is the return value of the process function (if process is not executed, this value is the return value of preprocess).
Users can also rewrite the **init_op** function to load some custom resources (such as word dictionary). The default implementation is as follows:
```python
def init_op(self):
pass
```
It should be noted that in the threaded version of OP, each OP will only call this function once, so the loaded resources must be thread safe.
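As a minimal sketch of such an override (the class, vocabulary, and field names below are illustrative only and are not taken from the repository):

```python
import numpy as np
from paddle_serving_server.pipeline import Op

class WordIdOp(Op):
    """Illustrative OP: loads a small vocabulary in init_op, maps words to ids in
    preprocess, and renames the fetched variable in postprocess."""

    def init_op(self):
        # Loaded once per OP; must be thread safe in the threaded version.
        self.vocab = {"good": 0, "bad": 1, "<unk>": 2}

    def preprocess(self, input_dicts):
        (_, input_dict), = input_dicts.items()   # single previous OP
        words = input_dict["words"].split()
        ids = [self.vocab.get(w, self.vocab["<unk>"]) for w in words]
        return {"words": np.array(ids)}          # becomes the feed dict of process

    def postprocess(self, input_dicts, fetch_dict):
        # Rename the model output before it is put into the next Channel.
        return {"score": fetch_dict["prediction"]}
```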
#### 3. RequestOp Definition
RequestOp is used to process RPC data received by Pipeline Server, and the processed data will be added to the graph execution engine. Its constructor is as follows:
```python
def __init__(self)
```
#### 4. RequestOp Secondary Development Interface
| Interface or Variable | Explain |
| :---------------------------------------: | :----------------------------------------------------------: |
| def init_op(self) | It is used to load resources (such as dictionaries), and is consistent with general OP. |
| def unpack_request_package(self, request) | Process received RPC data. |
The default implementation of **unpack_request_package** is to make the key and value in RPC request into a dictionary:
```python
def unpack_request_package(self, request):
    dictdata = {}
    for idx, key in enumerate(request.key):
        data = request.value[idx]
        try:
            data = eval(data)
        except Exception as e:
            pass
        dictdata[key] = data
    return dictdata
```
The return value is required to be a dictionary type.
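For instance, a RequestOp could decode each request value as JSON instead of relying on `eval` (an illustrative sketch only; the IMDB example later in this document shows a real override):

```python
import json
from paddle_serving_server.pipeline import RequestOp

class JsonRequestOp(RequestOp):
    """Illustrative RequestOp: decode every request value as JSON."""

    def unpack_request_package(self, request):
        dictdata = {}
        for idx, key in enumerate(request.key):
            try:
                dictdata[key] = json.loads(request.value[idx])
            except ValueError:
                dictdata[key] = request.value[idx]   # keep the raw string on failure
        # The return value must be a dictionary.
        return dictdata
```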
#### 5. ResponseOp Definition
ResponseOp is used to process the prediction results of the graph execution engine. The processed data will be used as the RPC return value of Pipeline Server. Its constructor is as follows:
```python
def __init__(self, input_ops)
```
`input_ops` is the last OP of graph execution engine. Users can construct different DAGs by setting different `input_ops` without modifying the topology of OPs.
#### 6. ResponseOp Secondary Development Interface
| Interface or Variable | Explain |
| :------------------------------------------: | :----------------------------------------------------------: |
| def init_op(self) | It is used to load resources (such as dictionaries), and is consistent with general OP. |
| def pack_response_package(self, channeldata) | Process the prediction results of the graph execution engine as the return of RPC. |
The default implementation of **pack_response_package** is to convert the dictionary of prediction results into key and value in RPC response:
```python
def pack_response_package(self, channeldata):
    resp = pipeline_service_pb2.Response()
    resp.ecode = channeldata.ecode
    if resp.ecode == ChannelDataEcode.OK.value:
        if channeldata.datatype == ChannelDataType.CHANNEL_NPDATA.value:
            feed = channeldata.parse()
            np.set_printoptions(threshold=np.nan)
            for name, var in feed.items():
                resp.value.append(var.__repr__())
                resp.key.append(name)
        elif channeldata.datatype == ChannelDataType.DICT.value:
            feed = channeldata.parse()
            for name, var in feed.items():
                if not isinstance(var, str):
                    resp.ecode = ChannelDataEcode.TYPE_ERROR.value
                    resp.error_info = self._log(
                        "fetch var type must be str({}).".format(type(var)))
                    break
                resp.value.append(var)
                resp.key.append(name)
        else:
            resp.ecode = ChannelDataEcode.TYPE_ERROR.value
            resp.error_info = self._log(
                "Error type({}) in datatype.".format(channeldata.datatype))
    else:
        resp.error_info = channeldata.error_info
    return resp
```
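As a sketch of a custom override (illustrative only; it reuses the proto fields shown above and is not code from the repository), a ResponseOp could serialize every fetched variable to a JSON string:

```python
import json
import numpy as np
from paddle_serving_server.pipeline import ResponseOp
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataEcode

class JsonResponseOp(ResponseOp):
    """Illustrative ResponseOp: return every fetched variable as a JSON string."""

    def pack_response_package(self, channeldata):
        resp = pipeline_service_pb2.Response()
        resp.ecode = channeldata.ecode
        if resp.ecode != ChannelDataEcode.OK.value:
            resp.error_info = channeldata.error_info
            return resp
        for name, var in channeldata.parse().items():
            value = var.tolist() if isinstance(var, np.ndarray) else var
            resp.key.append(name)
            resp.value.append(json.dumps(value))
        return resp
```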
#### 7. PipelineServer Definition
The definition of PipelineServer is relatively simple, as follows:
```python
server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server(config_yml_path)
server.run_server()
```
Where `response_op` is the ResponseOp mentioned above. PipelineServer initializes the Channels according to the topology of the OPs and builds the computation graph. `config_yml_path` is the configuration file of PipelineServer; an example file is shown below:
```yaml
port: 18080 # gRPC port
worker_num: 1 # gRPC thread pool size (the number of processes in the process version servicer). The default is 1
build_dag_each_worker: false # Whether to use process server or not. The default is false
dag:
    is_thread_op: true # Whether to use the thread version of OP. The default is true
    client_type: brpc # Use brpc or grpc client. The default is brpc
    retry: 1 # The number of times DAG executor retries after failure. The default value is 1, that is, no retrying
    use_profile: false # Whether to print the log on the server side. The default is false
```
## Example
Here we build a simple IMDB model-ensemble example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:
<center>
<img src='pipeline_serving-image4.png' height = "200" align="middle"/>
</center>
### Get the model file and start the Paddle Serving Service
```shell
cd python/examples/pipeline/imdb_model_ensemble
sh get_data.sh
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.log &
```
### Start PipelineServer
Run the following code
```python
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server.pipeline import PipelineServer
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataEcode
import numpy as np
import logging
from paddle_serving_app.reader import IMDBDataset

logging.basicConfig(level=logging.DEBUG)

_LOGGER = logging.getLogger()


class ImdbRequestOp(RequestOp):
    def init_op(self):
        self.imdb_dataset = IMDBDataset()
        self.imdb_dataset.load_resource('imdb.vocab')

    def unpack_request_package(self, request):
        dictdata = {}
        for idx, key in enumerate(request.key):
            if key != "words":
                continue
            words = request.value[idx]
            word_ids, _ = self.imdb_dataset.get_words_and_label(words)
            dictdata[key] = np.array(word_ids)
        return dictdata


class CombineOp(Op):
    def preprocess(self, input_data):
        combined_prediction = 0
        for op_name, data in input_data.items():
            _LOGGER.info("{}: {}".format(op_name, data["prediction"]))
            combined_prediction += data["prediction"]
        data = {"prediction": combined_prediction / 2}
        return data


read_op = ImdbRequestOp()
bow_op = Op(name="bow",
            input_ops=[read_op],
            server_endpoints=["127.0.0.1:9393"],
            fetch_list=["prediction"],
            client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
            concurrency=1,
            timeout=-1,
            retry=1)
cnn_op = Op(name="cnn",
            input_ops=[read_op],
            server_endpoints=["127.0.0.1:9292"],
            fetch_list=["prediction"],
            client_config="imdb_cnn_client_conf/serving_client_conf.prototxt",
            concurrency=1,
            timeout=-1,
            retry=1)
combine_op = CombineOp(
    name="combine",
    input_ops=[bow_op, cnn_op],
    concurrency=5,
    timeout=-1,
    retry=1)

# use default ResponseOp implementation
response_op = ResponseOp(input_ops=[combine_op])

server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
```
### Perform prediction through PipelineClient
```python
from paddle_serving_client.pipeline import PipelineClient
import numpy as np

client = PipelineClient()
client.connect(['127.0.0.1:18080'])

words = 'i am very sad | 0'

futures = []
for i in range(3):
    futures.append(
        client.predict(
            feed_dict={"words": words},
            fetch=["prediction"],
            asyn=True))

for f in futures:
    res = f.result()
    if res["ecode"] != 0:
        print(res)
        exit(1)
```
## How to optimize through the timeline tool
In order to better optimize performance, Pipeline Serving provides a timeline tool to monitor the time spent in each stage of the whole service.
### Output profile information on server side
The server is controlled by the `use_profile` field in yaml:
```yaml
dag:
    use_profile: true
```
After the function is enabled, the server will print the corresponding log information to the standard output in the process of prediction. In order to show the time consumption of each stage more intuitively, scripts are provided for further analysis and processing of log files.
The output of the server is first saved to a file. Taking profile as an example, the script converts the time monitoring information in the log into JSON format and saves it to the trace file. The trace file can be visualized through the tracing function of Chrome browser.
```shell
python timeline_trace.py profile trace
```
Specifically: open the Chrome browser, enter `chrome://tracing/` in the address bar to open the tracing page, click the Load button, and open the saved trace file to visualize the time spent in each stage of the prediction service.
### Output profile information on client side
The profile function can be enabled by setting `profile=True` in the `predict` interface on the client side.
After the function is enabled, the client will print the log information corresponding to the prediction to the standard output during the prediction process, and the subsequent analysis and processing are the same as that of the server.
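For example, reusing the IMDB client shown earlier, a profiled call might look like the following sketch (it assumes `profile=True` can be combined with the asynchronous call used above):

```python
from paddle_serving_client.pipeline import PipelineClient

client = PipelineClient()
client.connect(['127.0.0.1:18080'])

# profile=True makes the client print per-stage timing logs to stdout,
# which can then be post-processed the same way as the server-side profile.
future = client.predict(
    feed_dict={"words": 'i am very sad | 0'},
    fetch=["prediction"],
    asyn=True,
    profile=True)
print(future.result())
```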
# Pipeline Serving
(简体中文|[English](PIPELINE_SERVING.md))
Paddle Serving 通常用于单模型的一键部署,但端到端的深度学习模型当前还不能解决所有问题,多个深度学习模型配合起来使用还是解决现实问题的常规手段。
Paddle Serving 提供了用户友好的多模型组合服务编程框架,Pipeline Serving,旨在降低编程门槛,提高资源使用率(尤其是GPU设备),提升整体的预估效率。
## 整体架构设计
Server端基于 gRPC 和图执行引擎构建,两者的关系如下图所示。
<center>
<img src='pipeline_serving-image1.png' height = "250" align="middle"/>
</center>
### 图执行引擎
图执行引擎由 OP 和 Channel 构成,相连接的 OP 之间会共享一个 Channel。
- Channel 可以理解为一个缓冲队列。每个 OP 只接受一个 Channel 的输入和多个 Channel 的输出(每个输出相同);一个 Channel 可以包含来自多个 OP 的输出,同一个 Channel 的数据可以作为多个 OP 的输入Channel
- 用户只需要定义 OP 间的关系,在编译期图引擎负责分析整个图的依赖关系,并声明Channel
- Request 进入图执行引擎服务后会产生一个 Request Id,Reponse 会通过 Request Id 进行对应的返回
- 对于 OP 之间需要传输过大数据的情况,可以考虑 RAM DB 外存进行全局存储,通过在 Channel 中传递索引的 Key 来进行数据传输
<center>
<img src='pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
### OP的设计
- 单个OP默认的功能是根据输入的 Channel 数据,访问一个 Paddle Serving 的单模型服务,并将结果存在输出的 Channel
- 单个 OP 可以支持用户自定义,包括 preprocess,process,postprocess 三个函数都可以由用户继承和实现
- 单个 OP 可以控制并发数,从而增加处理并发数
- OP 可以由线程或进程启动
### Channel的设计
- Channel 是 OP 之间共享数据的数据结构,负责共享数据或者共享数据状态信息
- Channel 可以支持多个OP的输出存储在同一个 Channel,同一个 Channel 中的数据可以被多个 OP 使用
- 下图为图执行引擎中 Channel 的设计,采用 input buffer 和 output buffer 进行多 OP 输入或多 OP 输出的数据对齐,中间采用一个 Queue 进行缓冲
<center>
<img src='pipeline_serving-image3.png' height = "500" align="middle"/>
</center>
### 极端情况的考虑
- 请求超时的处理
整个图执行引擎每一步都有可能发生超时,图执行引擎里面通过设置 timeout 值来控制,任何环节超时的请求都会返回超时响应。
- Channel 存储的数据过大
Channel 中可能会存储过大的数据,导致拷贝等耗时过高,图执行引擎里面可以通过将 OP 计算结果数据存储到外存,如高速的内存 KV 系统
- Channel 设计中的 input buffer 和 output buffer 是否会无限增加
- 不会。整个图执行引擎的输入会放到一个 Channel 的 internal queue 里面,直接作为整个服务的流量控制缓冲队列
- 对于 input buffer,根据计算量的情况调整 OP1 和 OP2 的并发数,使得 input buffer 来自各个输入 OP 的数量相对平衡
- 对于 output buffer,可以采用和 input buffer 类似的处理方法,即调整 OP3 和 OP4 的并发数,使得 output buffer 的缓冲长度得到控制
- 注:input buffer 的长度取决于 internal queue 中每个 item 完全 ready 的速度,output buffer 的长度取决于下游 OP 从 output buffer 获取数据的速度
## 详细设计
### 用户接口设计
#### 1. 普通 OP 定义
普通 OP 作为图执行引擎中的基本单元,其构造函数如下:
```python
def __init__(name=None,
input_ops=[],
server_endpoints=[],
fetch_list=[],
client_config=None,
concurrency=1,
timeout=-1,
retry=1)
```
各参数含义如下
| 参数名 | 含义 |
| :--------------: | :----------------------------------------------------------: |
| name | (str)用于标识 OP 类型的字符串,该字段必须全局唯一。 |
| input_ops | (list)当前 OP 的所有前继 OP 的列表。 |
| server_endpoints | (list)远程 Paddle Serving Service 的 endpoints 列表。如果不设置该参数,则不访问远程 Paddle Serving Service,即 不会执行 process 操作。 |
| fetch_list | (list)远程 Paddle Serving Service 的 fetch 列表。 |
| client_config | (str)Paddle Serving Service 对应的 Client 端配置文件路径。 |
| concurrency | (int)OP 的并发数。 |
| timeout | (int)process 操作的超时时间,单位为秒。若该值小于零,则视作不超时。 |
| retry | (int)超时重试次数。当该值为 1 时,不进行重试。 |
#### 2. 普通 OP二次开发接口
| 变量或接口 | 说明 |
| :--------------------------------------------: | :----------------------------------------------------------: |
| def preprocess(self, input_dicts) | 对从 Channel 中获取的数据进行处理,处理完的数据将作为 **process** 函数的输入。 |
| def process(self, feed_dict) | 基于 Paddle Serving Client 进行 RPC 预测,处理完的数据将作为 **postprocess** 函数的输入。 |
| def postprocess(self, input_dicts, fetch_dict) | 处理预测结果,处理完的数据将被放入后继 Channel 中,以被后继 OP 获取。 |
| def init_op(self) | 用于加载资源(如字典等)。 |
| self.concurrency_idx | 当前线程(进程)的并发数索引(不同种类的 OP 单独计算)。 |
OP 在一个运行周期中会依次执行 preprocess,process,postprocess 三个操作(当不设置 `server_endpoints` 参数时,不执行 process 操作),用户可以对这三个函数进行重写,默认实现如下:
```python
def preprocess(self, input_dicts):
# multiple previous Op
if len(input_dicts) != 1:
raise NotImplementedError(
            'this Op has multiple previous inputs. Please override this func.')
    (_, input_dict), = input_dicts.items()
return input_dict
def process(self, feed_dict):
err, err_info = ChannelData.check_npdata(feed_dict)
if err != 0:
raise NotImplementedError(
"{} Please override preprocess func.".format(err_info))
call_result = self.client.predict(
feed=feed_dict, fetch=self._fetch_names)
return call_result
def postprocess(self, input_dicts, fetch_dict):
return fetch_dict
```
**preprocess** 的参数是前继 Channel 中的数据 `input_dicts`,该变量是一个以前继 OP 的 name 为 Key,对应 OP 的输出为 Value 的字典。
**process** 的参数是 Paddle Serving Client 预测接口的输入变量 `feed_dict`(preprocess 函数的返回值),该变量是一个以 feed_name 为 Key,对应 ndarray 格式的数据为 Value 的字典。
**postprocess** 的参数是 `input_dicts``fetch_dict``input_dicts` 与 preprocess 的参数一致,`fetch_dict` 是 process 函数的返回值(如果没有执行 process ,则该值为 preprocess 的返回值)。
用户还可以对 **init_op** 函数进行重写,以加载自定义的一些资源(比如字典等),默认实现如下:
```python
def init_op(self):
pass
```
需要注意的是,在线程版 OP 中,每个 OP 只会调用一次该函数,故加载的资源必须要求是线程安全的。
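The sketch below illustrates these override points. The Op name, the threshold resource, the extra `is_positive` field, and the endpoint/client-config values are hypothetical placeholders; only the constructor arguments and method signatures come from this document.
```python
from paddle_serving_server.pipeline import Op

class LabelOp(Op):
    def init_op(self):
        # Called once per thread/process before serving starts; in the thread
        # version the loaded resource must be thread-safe.
        self.threshold = 0.5

    def postprocess(self, input_dicts, fetch_dict):
        # Post-process the model output before it enters the output Channel;
        # the "prediction" key and its layout depend on the served model.
        fetch_dict["is_positive"] = str(fetch_dict["prediction"] > self.threshold)
        return fetch_dict

# constructed like a plain Op; predecessor, endpoint and client config are placeholders
label_op = LabelOp(name="label",
                   input_ops=[read_op],
                   server_endpoints=["127.0.0.1:9292"],
                   fetch_list=["prediction"],
                   client_config="serving_client_conf.prototxt",
                   concurrency=1,
                   timeout=-1,
                   retry=1)
```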
#### 3. RequestOp 定义
RequestOp 用于处理 Pipeline Server 接收到的 RPC 数据,处理后的数据将会被加入到图执行引擎中。其构造函数如下:
```python
def __init__(self)
```
#### 4. RequestOp 二次开发接口
| 变量或接口 | 说明 |
| :---------------------------------------: | :----------------------------------------: |
| def init_op(self) | 用于加载资源(如字典等),与普通 OP 一致。 |
| def unpack_request_package(self, request) | 处理接收到的 RPC 数据。 |
**unpack_request_package** 的默认实现是将 RPC request 中的 key 和 value 做成字典:
```python
def unpack_request_package(self, request):
dictdata = {}
for idx, key in enumerate(request.key):
data = request.value[idx]
try:
data = eval(data)
except Exception as e:
pass
dictdata[key] = data
return dictdata
```
要求返回值是一个字典类型。
#### 5. ResponseOp 定义
ResponseOp 用于处理图执行引擎的预测结果,处理后的数据将会作为 Pipeline Server 的RPC 返回值,其构造函数如下:
```python
def __init__(self, input_ops)
```
其中,`input_ops` 是图执行引擎的最后一个 OP,用户可以通过设置不同的 `input_ops` 以在不修改 OP 的拓扑关系下构造不同的 DAG。
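For example, reusing the `cnn_op` and `combine_op` names from the IMDB example later in this document, switching the tail of the DAG only requires changing `input_ops`; the Op definitions themselves stay untouched:
```python
# serve only the CNN branch
response_op = ResponseOp(input_ops=[cnn_op])

# or serve the ensemble result merged by combine_op
response_op = ResponseOp(input_ops=[combine_op])
```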
#### 6. ResponseOp 二次开发接口
| 变量或接口 | 说明 |
| :------------------------------------------: | :-----------------------------------------: |
| def init_op(self) | 用于加载资源(如字典等),与普通 OP 一致。 |
| def pack_response_package(self, channeldata) | 处理图执行引擎的预测结果,作为 RPC 的返回。 |
**pack_response_package** 的默认实现是将预测结果的字典转化为 RPC response 中的 key 和 value:
```python
def pack_response_package(self, channeldata):
resp = pipeline_service_pb2.Response()
resp.ecode = channeldata.ecode
if resp.ecode == ChannelDataEcode.OK.value:
if channeldata.datatype == ChannelDataType.CHANNEL_NPDATA.value:
feed = channeldata.parse()
np.set_printoptions(threshold=np.nan)
for name, var in feed.items():
resp.value.append(var.__repr__())
resp.key.append(name)
elif channeldata.datatype == ChannelDataType.DICT.value:
feed = channeldata.parse()
for name, var in feed.items():
if not isinstance(var, str):
resp.ecode = ChannelDataEcode.TYPE_ERROR.value
resp.error_info = self._log(
"fetch var type must be str({}).".format(type(var)))
break
resp.value.append(var)
resp.key.append(name)
else:
resp.ecode = ChannelDataEcode.TYPE_ERROR.value
resp.error_info = self._log(
"Error type({}) in datatype.".format(channeldata.datatype))
else:
resp.error_info = channeldata.error_info
return resp
```
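As a sketch of a customized ResponseOp (assuming the parsed channel data is a dict of numpy arrays, and keeping the `ecode`/`error_info` handling of the default implementation), the fetch values could be serialized as JSON instead of `repr()` strings:
```python
import json
import numpy as np
from paddle_serving_server.pipeline import ResponseOp
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataEcode

class JsonResponseOp(ResponseOp):
    def pack_response_package(self, channeldata):
        resp = pipeline_service_pb2.Response()
        resp.ecode = channeldata.ecode
        if resp.ecode != ChannelDataEcode.OK.value:
            resp.error_info = channeldata.error_info
            return resp
        for name, var in channeldata.parse().items():
            # JSON is easier for non-Python clients to consume than repr() strings
            value = var.tolist() if isinstance(var, np.ndarray) else var
            resp.key.append(name)
            resp.value.append(json.dumps(value))
        return resp
```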
#### 7. PipelineServer定义
PipelineServer 的定义比较简单,如下所示:
```python
server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server(config_yml_path)
server.run_server()
```
其中,`response_op` 为上面提到的 ResponseOp,PipelineServer 将会根据各个 OP 的拓扑关系初始化 Channel 并构建计算图。`config_yml_path` 为 PipelineServer 的配置文件,示例文件如下:
```yaml
port: 18080 # gRPC端口号
worker_num: 1 # gRPC线程池大小(进程版 Servicer 中为进程数),默认为 1
build_dag_each_worker: false # 是否使用进程版 Servicer,默认为 false
dag:
is_thread_op: true # 是否使用线程版Op,默认为 true
client_type: brpc # 使用 brpc 或 grpc client,默认为 brpc
retry: 1 # DAG Executor 在失败后重试次数,默认为 1,即不重试
use_profile: false # 是否在 Server 端打印日志,默认为 false
```
## 例子
这里通过搭建简单的 imdb model ensemble 例子来展示如何使用 Pipeline Serving,相关代码在 `python/examples/pipeline/imdb_model_ensemble` 文件夹下可以找到,例子中的 Server 端结构如下图所示:
<center>
<img src='pipeline_serving-image4.png' height = "200" align="middle"/>
</center>
### 获取模型文件并启动 Paddle Serving Service
```shell
cd python/examples/pipeline/imdb_model_ensemble
sh get_data.sh
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 &> bow.log &
```
### 启动 PipelineServer
运行下面代码
```python
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server.pipeline import PipelineServer
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataEcode
import numpy as np
import logging
from paddle_serving_app.reader import IMDBDataset
logging.basicConfig(level=logging.DEBUG)
_LOGGER = logging.getLogger()
class ImdbRequestOp(RequestOp):
def init_op(self):
self.imdb_dataset = IMDBDataset()
self.imdb_dataset.load_resource('imdb.vocab')
def unpack_request_package(self, request):
dictdata = {}
for idx, key in enumerate(request.key):
if key != "words":
continue
words = request.value[idx]
word_ids, _ = self.imdb_dataset.get_words_and_label(words)
dictdata[key] = np.array(word_ids)
return dictdata
class CombineOp(Op):
def preprocess(self, input_data):
combined_prediction = 0
for op_name, data in input_data.items():
_LOGGER.info("{}: {}".format(op_name, data["prediction"]))
combined_prediction += data["prediction"]
data = {"prediction": combined_prediction / 2}
return data
read_op = ImdbRequestOp()
bow_op = Op(name="bow",
input_ops=[read_op],
server_endpoints=["127.0.0.1:9393"],
fetch_list=["prediction"],
client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
concurrency=1,
timeout=-1,
retry=1)
cnn_op = Op(name="cnn",
input_ops=[read_op],
server_endpoints=["127.0.0.1:9292"],
fetch_list=["prediction"],
client_config="imdb_cnn_client_conf/serving_client_conf.prototxt",
concurrency=1,
timeout=-1,
retry=1)
combine_op = CombineOp(
name="combine",
input_ops=[bow_op, cnn_op],
concurrency=5,
timeout=-1,
retry=1)
# use default ResponseOp implementation
response_op = ResponseOp(input_ops=[combine_op])
server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
```
### 通过 PipelineClient 执行预测
```python
from paddle_serving_client.pipeline import PipelineClient
import numpy as np
client = PipelineClient()
client.connect(['127.0.0.1:18080'])
words = 'i am very sad | 0'
futures = []
for i in range(3):
futures.append(
client.predict(
feed_dict={"words": words},
fetch=["prediction"],
asyn=True))
for f in futures:
res = f.result()
if res["ecode"] != 0:
print(res)
exit(1)
```
## 如何通过 Timeline 工具进行优化
为了更好地对性能进行优化,PipelineServing 提供了 Timeline 工具,对整个服务的各个阶段时间进行打点。
### 在 Server 端输出 Profile 信息
Server 端用 yaml 中的 `use_profile` 字段进行控制:
```yaml
dag:
use_profile: true
```
开启该功能后,Server 端在预测的过程中会将对应的日志信息打印到标准输出,为了更直观地展现各阶段的耗时,提供脚本对日志文件做进一步的分析处理。
使用时先将 Server 的输出保存到文件,以 profile 为例,脚本将日志中的时间打点信息转换成 json 格式保存到trace 文件,trace 文件可以通过 chrome 浏览器的 tracing 功能进行可视化。
```shell
python timeline_trace.py profile trace
```
具体操作:打开 chrome 浏览器,在地址栏输入 chrome://tracing/ ,跳转至 tracing 页面,点击 load 按钮,打开保存的 trace 文件,即可将预测服务的各阶段时间信息可视化。
### 在 Client 端输出 Profile 信息
Client 端在 `predict` 接口设置 `profile=True`,即可开启 Profile 功能。
开启该功能后,Client 端在预测的过程中会将该次预测对应的日志信息打印到标准输出,后续分析处理同 Server。
# Paddle Serving
([简体中文](./README_CN.md)|English)
Paddle Serving is PaddlePaddle's online inference service framework, which helps developers easily deploy remote prediction services that call deep learning models from mobile devices and servers. At present, Paddle Serving mainly supports models trained with PaddlePaddle and can be used together with the Paddle training framework to quickly deploy inference services. Paddle Serving is designed around common industrial-level deployment scenarios for deep learning models. Its features include multi-model management, model hot loading, high-concurrency and low-latency responses based on [Baidu-rpc](https://github.com/apache/incubator-brpc), and online model A/B testing. The APIs that cooperate with the Paddle training framework let users move seamlessly from training to remote deployment, improving the efficiency of putting deep learning models into production.
------------
## Quick Start
Paddle Serving's current develop version provides a lightweight Python API for fast predictions and works directly with models trained in Paddle. We take the classic Boston house price prediction task as an example to walk through model training on a single machine and model deployment with Paddle Serving.
#### Install
It is highly recommended that you run Paddle Serving inside Docker; please read [How to run PaddleServing in Docker](RUN_IN_DOCKER.md)
```
pip install paddle-serving-client
pip install paddle-serving-server
```
#### Training Script
``` python
import sys
import paddle
import paddle.fluid as fluid
train_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500), batch_size=16)
test_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500), batch_size=16)
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
import paddle_serving_client.io as serving_io
for pass_id in range(30):
for data_train in train_reader():
avg_loss_value, = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data_train),
fetch_list=[avg_loss])
serving_io.save_model(
"serving_server_model", "serving_client_conf",
{"x": x}, {"y": y_predict}, fluid.default_main_program())
```
#### Server Side Code
``` python
import sys
from paddle_serving.serving_server import OpMaker
from paddle_serving.serving_server import OpSeqMaker
from paddle_serving.serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config(sys.argv[1])
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
```
#### Launch Server End
``` shell
python test_server.py serving_server_model
```
#### Client Prediction
``` python
from paddle_serving_client import Client
import paddle
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
test_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500), batch_size=1)
for data in test_reader():
fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["y"])
print("{} {}".format(fetch_map["y"][0], data[0][1][0]))
```
### Document
[Design Doc](DESIGN.md)
[FAQ](./deprecated/FAQ.md)
### Senior Developer Guidelines
[Compile Tutorial](COMPILE.md)
## Contribution
If you want to make contributions to Paddle Serving, please refer to [CONTRIBUTE](CONTRIBUTE.md)
# Paddle Serving
(简体中文|[English](./README.md))
Paddle Serving是PaddlePaddle的在线预估服务框架,能够帮助开发者轻松实现从移动端、服务器端调用深度学习模型的远程预测服务。当前Paddle Serving以支持PaddlePaddle训练的模型为主,可以与Paddle训练框架联合使用,快速部署预估服务。Paddle Serving围绕常见的工业级深度学习模型部署场景进行设计,一些常见的功能包括多模型管理、模型热加载、基于[Baidu-rpc](https://github.com/apache/incubator-brpc)的高并发低延迟响应能力、在线模型A/B实验等。与Paddle训练框架互相配合的API可以使用户在训练与远程部署之间无缝过渡,提升深度学习模型的落地效率。
------------
## 快速上手指南
Paddle Serving当前的develop版本支持轻量级Python API进行快速预测,并且与Paddle的训练可以打通。我们以最经典的波士顿房价预测为示例,完整说明在单机进行模型训练以及使用Paddle Serving进行模型部署的过程。
#### 安装
强烈建议您在Docker内构建Paddle Serving,请查看[如何在Docker中运行PaddleServing](RUN_IN_DOCKER_CN.md)
```
pip install paddle-serving-client
pip install paddle-serving-server
```
#### 训练脚本
``` python
import sys
import paddle
import paddle.fluid as fluid
train_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500), batch_size=16)
test_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500), batch_size=16)
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
import paddle_serving_client.io as serving_io
for pass_id in range(30):
for data_train in train_reader():
avg_loss_value, = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data_train),
fetch_list=[avg_loss])
serving_io.save_model(
"serving_server_model", "serving_client_conf",
{"x": x}, {"y": y_predict}, fluid.default_main_program())
```
#### 服务器端代码
``` python
import sys
from paddle_serving.serving_server import OpMaker
from paddle_serving.serving_server import OpSeqMaker
from paddle_serving.serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config(sys.argv[1])
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
```
#### 服务器端启动
``` shell
python test_server.py serving_server_model
```
#### 客户端预测
``` python
from paddle_serving_client import Client
import paddle
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
test_reader = paddle.batch(paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500), batch_size=1)
for data in test_reader():
fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["y"])
print("{} {}".format(fetch_map["y"][0], data[0][1][0]))
```
### 文档
[设计文档](DESIGN_CN.md)
[FAQ](./deprecated/FAQ.md)
### 资深开发者使用指南
[编译指南](COMPILE_CN.md)
## 贡献
如果你想要给Paddle Serving做贡献,请参考[贡献指南](CONTRIBUTE.md)
...@@ -12,21 +12,12 @@ This document takes Python2 as an example to show how to run Paddle Serving in d ...@@ -12,21 +12,12 @@ This document takes Python2 as an example to show how to run Paddle Serving in d
### Get docker image ### Get docker image
You can get images in two ways: Refer to [this document](DOCKER_IMAGES.md) for a docker image:
1. Pull image directly ```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```bash ```
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```
2. Building image based on dockerfile
Create a new folder and copy [Dockerfile](../tools/Dockerfile) to this folder, and run the following command:
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
```
### Create container ### Create container
...@@ -104,26 +95,16 @@ The GPU version is basically the same as the CPU version, with only some differe ...@@ -104,26 +95,16 @@ The GPU version is basically the same as the CPU version, with only some differe
### Get docker image ### Get docker image
You can also get images in two ways: Refer to [this document](DOCKER_IMAGES.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:
1. Pull image directly
```bash ```shell
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
``` ```
2. Building image based on dockerfile
Create a new folder and copy [Dockerfile.gpu](../tools/Dockerfile.gpu) to this folder, and run the following command:
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
```
### Create container ### Create container
```bash ```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker exec -it test bash nvidia-docker exec -it test bash
``` ```
...@@ -200,4 +181,4 @@ tar -xzf uci_housing.tar.gz ...@@ -200,4 +181,4 @@ tar -xzf uci_housing.tar.gz
## Attention ## Attention
The images provided by this document are all runtime images, which do not support compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md). Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md).
...@@ -12,21 +12,12 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker) ...@@ -12,21 +12,12 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
### 获取镜像 ### 获取镜像
可以通过两种方式获取镜像。 参考[该文档](DOCKER_IMAGES_CN.md)获取镜像:
1. 直接拉取镜像 ```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```bash ```
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```
2. 基于Dockerfile构建镜像
建立新目录,复制[Dockerfile](../tools/Dockerfile)内容到该目录下Dockerfile文件。执行
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
```
### 创建容器并进入 ### 创建容器并进入
...@@ -102,26 +93,16 @@ GPU版本与CPU版本基本一致,只有部分接口命名的差别(GPU版 ...@@ -102,26 +93,16 @@ GPU版本与CPU版本基本一致,只有部分接口命名的差别(GPU版
### 获取镜像 ### 获取镜像
可以通过两种方式获取镜像。 参考[该文档](DOCKER_IMAGES_CN.md)获取镜像,这里以 `cuda9.0-cudnn7` 的镜像为例:
1. 直接拉取镜像
```bash ```shell
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
``` ```
2. 基于Dockerfile构建镜像
建立新目录,复制[Dockerfile.gpu](../tools/Dockerfile.gpu)内容到该目录下Dockerfile文件。执行
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
```
### 创建容器并进入 ### 创建容器并进入
```bash ```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker exec -it test bash nvidia-docker exec -it test bash
``` ```
...@@ -195,4 +176,4 @@ tar -xzf uci_housing.tar.gz ...@@ -195,4 +176,4 @@ tar -xzf uci_housing.tar.gz
## 注意事项 ## 注意事项
该文档提供的镜像均为运行镜像,不支持开发编译。如果想要从源码编译,请查看[如何编译PaddleServing](COMPILE.md) 运行时镜像不能用于开发编译。如果想要从源码编译,请查看[如何编译PaddleServing](COMPILE.md)
...@@ -77,7 +77,7 @@ service ImageClassifyService { ...@@ -77,7 +77,7 @@ service ImageClassifyService {
关于Serving端的配置的详细信息,可以参考[Serving端配置](SERVING_CONFIGURE.md) 关于Serving端的配置的详细信息,可以参考[Serving端配置](SERVING_CONFIGURE.md)
以下配置文件将ReaderOP, ClassifyOP和WriteJsonOP串联成一个workflow (关于OP/workflow等概念,可参考[设计文档](DESIGN.md)) 以下配置文件将ReaderOP, ClassifyOP和WriteJsonOP串联成一个workflow (关于OP/workflow等概念,可参考[设计文档](../DESIGN.md))
- 配置文件示例: - 配置文件示例:
......
...@@ -26,7 +26,7 @@ ...@@ -26,7 +26,7 @@
第1) - 第5)步裁剪完毕后的模型网络配置如下: 第1) - 第5)步裁剪完毕后的模型网络配置如下:
![Pruned CTR prediction network](pruned-ctr-network.png) ![Pruned CTR prediction network](../pruned-ctr-network.png)
整个裁剪过程具体说明如下: 整个裁剪过程具体说明如下:
......
# Docker compilation environment preparation
([简体中文](./DOCKER_CN.md)|English)
## Environmental requirements
+ Docker is installed on the development machine.
+ Compiling the GPU version requires nvidia-docker.
## Dockerfile
[CPU Version Dockerfile](../tools/Dockerfile)
[GPU Version Dockerfile](../tools/Dockerfile.gpu)
## Instructions
### Building Docker Image
Create a new directory and copy the Dockerfile to this directory.
Run
```bash
docker build -t serving_compile:cpu .
```
Or
```bash
docker build -t serving_compile:cuda9 .
```
## Enter Docker Container
For the CPU version, run
```bash
docker run -it serving_compile:cpu bash
```
For the GPU version, run
```bash
docker run -it --runtime=nvidia serving_compile:cuda9 bash
```
## List of environments supported by Docker-compiled executables
The list of supported environments is as follows:
| System Environment Supported by CPU Docker Compiled Executables |
| -------------------------- |
| Centos6 |
| Centos7 |
| Ubuntu16.04 |
| Ubuntu18.04 |
| System Environment Supported by GPU Docker Compiled Executables |
| ---------------------------------- |
| Centos6_cuda9_cudnn7 |
| Centos7_cuda9_cudnn7 |
| Ubuntu16.04_cuda9_cudnn7 |
| Ubuntu16.04_cuda10_cudnn7 |
**Remarks:**
+ If you cannot find libcrypto.so.10 and libssl.so.10 when executing the pre-compiled version, copy /usr/lib64/libssl.so.10 and /usr/lib64/libcrypto.so.10 from the Docker environment to the directory where the executable is located.
+ The CPU pre-compiled version can only be executed on CPU machines, and the GPU pre-compiled version can only be executed on GPU machines.
# Docker编译环境准备
(简体中文|[English](./DOCKER.md))
## 环境要求
+ 开发机上已安装Docker。
+ 编译GPU版本需要安装nvidia-docker。
## Dockerfile文件
[CPU版本Dockerfile](../tools/Dockerfile)
[GPU版本Dockerfile](../tools/Dockerfile.gpu)
## 使用方法
### 构建Docker镜像
建立新目录,复制Dockerfile内容到该目录下Dockerfile文件。
执行
```bash
docker build -t serving_compile:cpu .
```
或者
```bash
docker build -t serving_compile:cuda9 .
```
## 进入Docker
CPU版本请执行
```bash
docker run -it serving_compile:cpu bash
```
GPU版本请执行
```bash
docker run -it --runtime=nvidia serving_compile:cuda9 bash
```
## Docker编译出的可执行文件支持的环境列表
经过验证的环境列表如下:
| CPU Docker编译出的可执行文件支持的系统环境 |
| -------------------------- |
| Centos6 |
| Centos7 |
| Ubuntu16.04 |
| Ubuntu18.04 |
| GPU Docker编译出的可执行文件支持的系统环境 |
| ---------------------------------- |
| Centos6_cuda9_cudnn7 |
| Centos7_cuda9_cudnn7 |
| Ubuntu16.04_cuda9_cudnn7 |
| Ubuntu16.04_cuda10_cudnn7 |
**备注:**
+ 若执行预编译版本出现找不到libcrypto.so.10、libssl.so.10的情况,可以将Docker环境中的/usr/lib64/libssl.so.10与/usr/lib64/libcrypto.so.10复制到可执行文件所在目录。
+ CPU预编译版本仅可在CPU机器上执行,GPU预编译版本仅可在GPU机器上执行。
# Getting Started
请先按照[编译安装说明](INSTALL.md)完成编译
## 运行示例
说明:Imagenet图像分类模型,默认采用CPU模式(GPU模式当前版本暂未提供支持)
Step1:启动Server端:
```shell
cd /path/to/paddle-serving/output/demo/serving/ && ./bin/serving &
```
默认启动后日志写在./log/下,可tail日志查看serving端接收请求的日志:
```shell
tail -f log/serving.INFO
```
Step2:启动Client端:
```shell
cd path/to/paddle-serving/output/demo/client/image_classification && ./bin/ximage &
```
默认启动后日志写在./log/下,可tail日志查看分类结果:
```shell
tail -f log/ximage.INFO
```
...@@ -72,7 +72,7 @@ for i in range(0, len(samples) - BATCH_SIZE, BATCH_SIZE): ...@@ -72,7 +72,7 @@ for i in range(0, len(samples) - BATCH_SIZE, BATCH_SIZE):
print e.reason print e.reason
``` ```
完整示例请参考[text_classification.py](../demo-client/python/text_classification.py) 完整示例请参考[text_classification.py](https://github.com/PaddlePaddle/Serving/blob/develop/tools/cpp_examples/demo-client/python/text_classification.py)
## 3. PHP访问HTTP Serving ## 3. PHP访问HTTP Serving
...@@ -128,4 +128,4 @@ for ($i = 0; $i < count($samples) - BATCH_SIZE; $i += BATCH_SIZE) { ...@@ -128,4 +128,4 @@ for ($i = 0; $i < count($samples) - BATCH_SIZE; $i += BATCH_SIZE) {
curl_close($ch); curl_close($ch);
``` ```
完整代码请参考[text_classification.php](../demo-client/php/text_classification.php) 完整代码请参考[text_classification.php](https://github.com/PaddlePaddle/Serving/blob/develop/tools/cpp_examples/demo-client/php/text_classification.php)
[Design](DESIGN.md)
[Installation](INSTALL.md)
[Getting Started](GETTING_STARTED.md)
[Creating a Prediction Service](CREATING.md)
[Client Configure](CLIENT_CONFIGURE.md)
[Server Side Configuration](SERVING_CONFIGURE.md)
[How to Configure a Clustered Service](CLUSTERING.md)
[Multiple Serving Instances over Single GPU Card](MULTI_SERVING_OVER_SINGLE_GPU_CARD.md)
[Benchmarking](BENCHMARKING.md)
[GPU Benchmarking](GPU_BENCHMARKING.md)
[FAQ](FAQ.md)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java-examples</artifactId>
<version>0.0.1</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
<version>3.8.1</version>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<mainClass>my.fully.qualified.class.Main</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-my-jar-with-dependencies</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<nd4j.backend>nd4j-native</nd4j.backend>
<nd4j.version>1.0.0-beta7</nd4j.version>
<datavec.version>1.0.0-beta7</datavec.version>
<paddle.serving.client.version>0.0.1</paddle.serving.client.version>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>${paddle.serving.client.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.30</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>${nd4j.backend}</artifactId>
<version>${nd4j.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.datavec</groupId>
<artifactId>datavec-data-image</artifactId>
<version>${datavec.version}</version>
</dependency>
</dependencies>
</project>
import io.paddle.serving.client.*;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.nd4j.linalg.api.iter.NdIndexIterator;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.datavec.image.loader.NativeImageLoader;
import org.nd4j.linalg.api.ops.CustomOp;
import org.nd4j.linalg.api.ops.DynamicCustomOp;
import org.nd4j.linalg.factory.Nd4j;
import java.util.*;
public class PaddleServingClientExample {
boolean fit_a_line() {
float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
}};
List<String> fetch = Arrays.asList("price");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
boolean yolov4(String filename) {
// https://deeplearning4j.konduit.ai/
int height = 608;
int width = 608;
int channels = 3;
NativeImageLoader loader = new NativeImageLoader(height, width, channels);
INDArray BGRimage = null;
try {
BGRimage = loader.asMatrix(new File(filename));
} catch (java.io.IOException e) {
System.out.println("load image fail.");
return false;
}
// shape: (channels, height, width)
BGRimage = BGRimage.reshape(channels, height, width);
INDArray RGBimage = Nd4j.create(BGRimage.shape());
// BGR2RGB
CustomOp op = DynamicCustomOp.builder("reverse")
.addInputs(BGRimage)
.addOutputs(RGBimage)
.addIntegerArguments(0)
.build();
Nd4j.getExecutioner().exec(op);
// Div(255.0)
INDArray image = RGBimage.divi(255.0);
INDArray im_size = Nd4j.createFromArray(new int[]{height, width});
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("image", image);
put("im_size", im_size);
}};
List<String> fetch = Arrays.asList("save_infer_model/scale_0.tmp_0");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
succ = client.setRpcTimeoutMs(20000); // cpu
if (succ != true) {
System.out.println("set timeout failed.");
return false;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
boolean batch_predict() {
float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
}};
List<HashMap<String, INDArray>> feed_batch
= new ArrayList<HashMap<String, INDArray>>() {{
add(feed_data);
add(feed_data);
}};
List<String> fetch = Arrays.asList("price");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
Map<String, INDArray> fetch_map = client.predict(feed_batch, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
boolean asyn_predict() {
float[] data = {0.0137f, -0.1136f, 0.2553f, -0.0692f,
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
}};
List<String> fetch = Arrays.asList("price");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
PredictFuture future = client.asyn_predict(feed_data, fetch);
Map<String, INDArray> fetch_map = future.get();
if (fetch_map == null) {
System.out.println("Get future reslut failed");
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
boolean model_ensemble() {
long[] data = {8, 233, 52, 601};
INDArray npdata = Nd4j.createFromArray(data);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("words", npdata);
}};
List<String> fetch = Arrays.asList("prediction");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
Map<String, HashMap<String, INDArray>> fetch_map
= client.ensemble_predict(feed_data, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, HashMap<String, INDArray>> entry : fetch_map.entrySet()) {
System.out.println("Model = " + entry.getKey());
HashMap<String, INDArray> tt = entry.getValue();
for (Map.Entry<String, INDArray> e : tt.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
}
return true;
}
boolean bert() {
float[] input_mask = {1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
long[] position_ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
long[] input_ids = {101, 6843, 3241, 749, 8024, 7662, 2533, 1391, 2533, 2523, 7676, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
long[] segment_ids = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("input_mask", Nd4j.createFromArray(input_mask));
put("position_ids", Nd4j.createFromArray(position_ids));
put("input_ids", Nd4j.createFromArray(input_ids));
put("segment_ids", Nd4j.createFromArray(segment_ids));
}};
List<String> fetch = Arrays.asList("pooled_output");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
boolean cube_local() {
long[] embedding_14 = {250644};
long[] embedding_2 = {890346};
long[] embedding_10 = {3939};
long[] embedding_17 = {421122};
long[] embedding_23 = {664215};
long[] embedding_6 = {704846};
float[] dense_input = {0.0f, 0.006633499170812604f, 0.03f, 0.0f,
0.145078125f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
long[] embedding_24 = {269955};
long[] embedding_12 = {295309};
long[] embedding_7 = {437731};
long[] embedding_3 = {990128};
long[] embedding_1 = {7753};
long[] embedding_4 = {286835};
long[] embedding_8 = {27346};
long[] embedding_9 = {636474};
long[] embedding_18 = {880474};
long[] embedding_16 = {681378};
long[] embedding_22 = {410878};
long[] embedding_13 = {255651};
long[] embedding_5 = {25207};
long[] embedding_11 = {10891};
long[] embedding_20 = {238459};
long[] embedding_21 = {26235};
long[] embedding_15 = {691460};
long[] embedding_25 = {544187};
long[] embedding_19 = {537425};
long[] embedding_0 = {737395};
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("embedding_14.tmp_0", Nd4j.createFromArray(embedding_14));
put("embedding_2.tmp_0", Nd4j.createFromArray(embedding_2));
put("embedding_10.tmp_0", Nd4j.createFromArray(embedding_10));
put("embedding_17.tmp_0", Nd4j.createFromArray(embedding_17));
put("embedding_23.tmp_0", Nd4j.createFromArray(embedding_23));
put("embedding_6.tmp_0", Nd4j.createFromArray(embedding_6));
put("dense_input", Nd4j.createFromArray(dense_input));
put("embedding_24.tmp_0", Nd4j.createFromArray(embedding_24));
put("embedding_12.tmp_0", Nd4j.createFromArray(embedding_12));
put("embedding_7.tmp_0", Nd4j.createFromArray(embedding_7));
put("embedding_3.tmp_0", Nd4j.createFromArray(embedding_3));
put("embedding_1.tmp_0", Nd4j.createFromArray(embedding_1));
put("embedding_4.tmp_0", Nd4j.createFromArray(embedding_4));
put("embedding_8.tmp_0", Nd4j.createFromArray(embedding_8));
put("embedding_9.tmp_0", Nd4j.createFromArray(embedding_9));
put("embedding_18.tmp_0", Nd4j.createFromArray(embedding_18));
put("embedding_16.tmp_0", Nd4j.createFromArray(embedding_16));
put("embedding_22.tmp_0", Nd4j.createFromArray(embedding_22));
put("embedding_13.tmp_0", Nd4j.createFromArray(embedding_13));
put("embedding_5.tmp_0", Nd4j.createFromArray(embedding_5));
put("embedding_11.tmp_0", Nd4j.createFromArray(embedding_11));
put("embedding_20.tmp_0", Nd4j.createFromArray(embedding_20));
put("embedding_21.tmp_0", Nd4j.createFromArray(embedding_21));
put("embedding_15.tmp_0", Nd4j.createFromArray(embedding_15));
put("embedding_25.tmp_0", Nd4j.createFromArray(embedding_25));
put("embedding_19.tmp_0", Nd4j.createFromArray(embedding_19));
put("embedding_0.tmp_0", Nd4j.createFromArray(embedding_0));
}};
List<String> fetch = Arrays.asList("prob");
Client client = new Client();
String target = "localhost:9393";
boolean succ = client.connect(target);
if (succ != true) {
System.out.println("connect failed.");
return false;
}
Map<String, INDArray> fetch_map = client.predict(feed_data, fetch);
if (fetch_map == null) {
return false;
}
for (Map.Entry<String, INDArray> e : fetch_map.entrySet()) {
System.out.println("Key = " + e.getKey() + ", Value = " + e.getValue());
}
return true;
}
public static void main( String[] args ) {
// DL4J(Deep Learning for Java)Document:
// https://www.bookstack.cn/read/deeplearning4j/bcb48e8eeb38b0c6.md
PaddleServingClientExample e = new PaddleServingClientExample();
boolean succ = false;
if (args.length < 1) {
System.out.println("Usage: java -cp <jar> PaddleServingClientExample <test-type>.");
System.out.println("<test-type>: fit_a_line bert model_ensemble asyn_predict batch_predict cube_local cube_quant yolov4");
return;
}
String testType = args[0];
System.out.format("[Example] %s\n", testType);
if ("fit_a_line".equals(testType)) {
succ = e.fit_a_line();
} else if ("bert".equals(testType)) {
succ = e.bert();
} else if ("model_ensemble".equals(testType)) {
succ = e.model_ensemble();
} else if ("asyn_predict".equals(testType)) {
succ = e.asyn_predict();
} else if ("batch_predict".equals(testType)) {
succ = e.batch_predict();
} else if ("cube_local".equals(testType)) {
succ = e.cube_local();
} else if ("cube_quant".equals(testType)) {
succ = e.cube_local();
} else if ("yolov4".equals(testType)) {
if (args.length < 2) {
System.out.println("Usage: java -cp <jar> PaddleServingClientExample yolov4 <image-filepath>.");
return;
}
succ = e.yolov4(args[1]);
} else {
System.out.format("test-type(%s) not match.\n", testType);
return;
}
if (succ == true) {
System.out.println("[Example] succ.");
} else {
System.out.println("[Example] fail.");
}
}
}
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java</artifactId>
<version>0.0.1</version>
<packaging>jar</packaging>
<name>paddle-serving-sdk-java</name>
    <description>Java SDK for Paddle Serving Client.</description>
<url>https://github.com/PaddlePaddle/Serving</url>
<licenses>
<license>
<name>Apache License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<developers>
<developer>
<name>PaddlePaddle Author</name>
<email>guru4elephant@gmail.com</email>
<organization>PaddlePaddle</organization>
<organizationUrl>https://github.com/PaddlePaddle/Serving</organizationUrl>
</developer>
</developers>
<scm>
<connection>scm:git:https://github.com/PaddlePaddle/Serving.git</connection>
<developerConnection>scm:git:https://github.com/PaddlePaddle/Serving.git</developerConnection>
<url>https://github.com/PaddlePaddle/Serving</url>
</scm>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<grpc.version>1.27.2</grpc.version>
<protobuf.version>3.11.0</protobuf.version>
<protoc.version>3.11.0</protoc.version>
<nd4j.backend>nd4j-native</nd4j.backend>
<nd4j.version>1.0.0-beta7</nd4j.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-bom</artifactId>
<version>${grpc.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.6</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
</dependency>
<dependency>
<groupId>javax.annotation</groupId>
<artifactId>javax.annotation-api</artifactId>
<version>1.2</version>
<scope>provided</scope> <!-- not needed at runtime -->
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-testing</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java-util</artifactId>
<version>${protobuf.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.errorprone</groupId>
<artifactId>error_prone_annotations</artifactId>
<version>2.3.4</version> <!-- prefer to use 2.3.3 or later -->
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.5.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.6</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>4.4</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20190722</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.30</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.12.1</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>${nd4j.backend}</artifactId>
<version>${nd4j.version}</version>
</dependency>
</dependencies>
<profiles>
<profile>
<id>release</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.1.1</version>
<configuration>
<javadocExecutable>${java.home}/bin/javadoc</javadocExecutable>
</configuration>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.6</version>
<executions>
<execution>
<id>sign-artifacts</id>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
<build>
<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.6.2</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.6.8</version>
<extensions>true</extensions>
<configuration>
<serverId>ossrh</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<autoReleaseAfterClose>true</autoReleaseAfterClose>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-release-plugin</artifactId>
<version>2.5.3</version>
<configuration>
<autoVersionSubmodules>true</autoVersionSubmodules>
<useReleaseProfile>false</useReleaseProfile>
<releaseProfiles>release</releaseProfiles>
<goals>deploy</goals>
</configuration>
</plugin>
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocArtifact>com.google.protobuf:protoc:${protoc.version}:exe:${os.detected.classifier}
</protocArtifact>
<pluginId>grpc-java</pluginId>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:${grpc.version}:exe:${os.detected.classifier}
</pluginArtifact>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-enforcer-plugin</artifactId>
<version>3.0.0-M2</version>
<executions>
<execution>
<id>enforce</id>
<configuration>
<rules>
<requireUpperBoundDeps/>
</rules>
</configuration>
<goals>
<goal>enforce</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
package io.paddle.serving.client;
import java.util.*;
import java.util.function.Function;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.StatusRuntimeException;
import com.google.protobuf.ByteString;
import com.google.common.util.concurrent.ListenableFuture;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.api.iter.NdIndexIterator;
import org.nd4j.linalg.factory.Nd4j;
import io.paddle.serving.grpc.*;
import io.paddle.serving.configure.*;
import io.paddle.serving.client.PredictFuture;
class Profiler {
int pid_;
String print_head_ = null;
List<String> time_record_ = null;
boolean enable_ = false;
Profiler() {
RuntimeMXBean runtimeMXBean = ManagementFactory.getRuntimeMXBean();
pid_ = Integer.valueOf(runtimeMXBean.getName().split("@")[0]).intValue();
print_head_ = "\nPROFILE\tpid:" + pid_ + "\t";
time_record_ = new ArrayList<String>();
time_record_.add(print_head_);
}
void record(String name) {
if (enable_) {
long ctime = System.currentTimeMillis() * 1000;
time_record_.add(name + ":" + String.valueOf(ctime) + " ");
}
}
void printProfile() {
if (enable_) {
String profile_str = String.join("", time_record_);
time_record_ = new ArrayList<String>();
time_record_.add(print_head_);
}
}
void enable(boolean flag) {
enable_ = flag;
}
}
public class Client {
private ManagedChannel channel_;
private MultiLangGeneralModelServiceGrpc.MultiLangGeneralModelServiceBlockingStub blockingStub_;
private MultiLangGeneralModelServiceGrpc.MultiLangGeneralModelServiceFutureStub futureStub_;
private double rpcTimeoutS_;
private List<String> feedNames_;
private Map<String, Integer> feedTypes_;
private Map<String, List<Integer>> feedShapes_;
private List<String> fetchNames_;
private Map<String, Integer> fetchTypes_;
private Set<String> lodTensorSet_;
private Map<String, Integer> feedTensorLen_;
private Profiler profiler_;
public Client() {
channel_ = null;
blockingStub_ = null;
futureStub_ = null;
rpcTimeoutS_ = 2;
feedNames_ = null;
feedTypes_ = null;
feedShapes_ = null;
fetchNames_ = null;
fetchTypes_ = null;
lodTensorSet_ = null;
feedTensorLen_ = null;
profiler_ = new Profiler();
boolean is_profile = false;
String FLAGS_profile_client = System.getenv("FLAGS_profile_client");
if (FLAGS_profile_client != null && FLAGS_profile_client.equals("1")) {
is_profile = true;
}
profiler_.enable(is_profile);
}
public boolean setRpcTimeoutMs(int rpc_timeout) {
if (futureStub_ == null || blockingStub_ == null) {
System.out.println("set timeout must be set after connect.");
return false;
}
rpcTimeoutS_ = rpc_timeout / 1000.0;
SetTimeoutRequest timeout_req = SetTimeoutRequest.newBuilder()
.setTimeoutMs(rpc_timeout)
.build();
SimpleResponse resp;
try {
resp = blockingStub_.setTimeout(timeout_req);
} catch (StatusRuntimeException e) {
System.out.format("Set RPC timeout failed: %s\n", e.toString());
return false;
}
return resp.getErrCode() == 0;
}
public boolean connect(String target) {
// TODO: target must be NameResolver-compliant URI
// https://grpc.github.io/grpc-java/javadoc/io/grpc/ManagedChannelBuilder.html
try {
channel_ = ManagedChannelBuilder.forTarget(target)
.defaultLoadBalancingPolicy("round_robin")
.maxInboundMessageSize(Integer.MAX_VALUE)
.usePlaintext()
.build();
blockingStub_ = MultiLangGeneralModelServiceGrpc.newBlockingStub(channel_);
futureStub_ = MultiLangGeneralModelServiceGrpc.newFutureStub(channel_);
} catch (Exception e) {
System.out.format("Connect failed: %s\n", e.toString());
return false;
}
GetClientConfigRequest get_client_config_req = GetClientConfigRequest.newBuilder().build();
GetClientConfigResponse resp;
try {
resp = blockingStub_.getClientConfig(get_client_config_req);
} catch (Exception e) {
System.out.format("Get Client config failed: %s\n", e.toString());
return false;
}
String model_config_str = resp.getClientConfigStr();
_parseModelConfig(model_config_str);
return true;
}
private void _parseModelConfig(String model_config_str) {
GeneralModelConfig.Builder model_conf_builder = GeneralModelConfig.newBuilder();
try {
com.google.protobuf.TextFormat.getParser().merge(model_config_str, model_conf_builder);
} catch (com.google.protobuf.TextFormat.ParseException e) {
System.out.format("Parse client config failed: %s\n", e.toString());
}
GeneralModelConfig model_conf = model_conf_builder.build();
feedNames_ = new ArrayList<String>();
fetchNames_ = new ArrayList<String>();
feedTypes_ = new HashMap<String, Integer>();
feedShapes_ = new HashMap<String, List<Integer>>();
fetchTypes_ = new HashMap<String, Integer>();
lodTensorSet_ = new HashSet<String>();
feedTensorLen_ = new HashMap<String, Integer>();
List<FeedVar> feed_var_list = model_conf.getFeedVarList();
for (FeedVar feed_var : feed_var_list) {
feedNames_.add(feed_var.getAliasName());
}
List<FetchVar> fetch_var_list = model_conf.getFetchVarList();
for (FetchVar fetch_var : fetch_var_list) {
fetchNames_.add(fetch_var.getAliasName());
}
for (int i = 0; i < feed_var_list.size(); ++i) {
FeedVar feed_var = feed_var_list.get(i);
String var_name = feed_var.getAliasName();
feedTypes_.put(var_name, feed_var.getFeedType());
feedShapes_.put(var_name, feed_var.getShapeList());
if (feed_var.getIsLodTensor()) {
lodTensorSet_.add(var_name);
} else {
int counter = 1;
for (int dim : feedShapes_.get(var_name)) {
counter *= dim;
}
feedTensorLen_.put(var_name, counter);
}
}
for (int i = 0; i < fetch_var_list.size(); i++) {
FetchVar fetch_var = fetch_var_list.get(i);
String var_name = fetch_var.getAliasName();
fetchTypes_.put(var_name, fetch_var.getFetchType());
if (fetch_var.getIsLodTensor()) {
lodTensorSet_.add(var_name);
}
}
}
private InferenceRequest _packInferenceRequest(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch) throws IllegalArgumentException {
List<String> feed_var_names = new ArrayList<String>();
feed_var_names.addAll(feed_batch.get(0).keySet());
InferenceRequest.Builder req_builder = InferenceRequest.newBuilder()
.addAllFeedVarNames(feed_var_names)
.addAllFetchVarNames(fetch)
.setIsPython(false);
for (HashMap<String, INDArray> feed_data: feed_batch) {
FeedInst.Builder inst_builder = FeedInst.newBuilder();
for (String name: feed_var_names) {
Tensor.Builder tensor_builder = Tensor.newBuilder();
INDArray variable = feed_data.get(name);
long[] flattened_shape = {-1};
INDArray flattened_list = variable.reshape(flattened_shape);
int v_type = feedTypes_.get(name);
NdIndexIterator iter = new NdIndexIterator(flattened_list.shape());
if (v_type == 0) { // int64
while (iter.hasNext()) {
long[] next_index = iter.next();
long x = flattened_list.getLong(next_index);
tensor_builder.addInt64Data(x);
}
} else if (v_type == 1) { // float32
while (iter.hasNext()) {
long[] next_index = iter.next();
float x = flattened_list.getFloat(next_index);
tensor_builder.addFloatData(x);
}
} else if (v_type == 2) { // int32
while (iter.hasNext()) {
long[] next_index = iter.next();
// the interface of INDArray is strange:
// https://deeplearning4j.org/api/latest/org/nd4j/linalg/api/ndarray/INDArray.html
int[] int_next_index = new int[next_index.length];
for(int i = 0; i < next_index.length; i++) {
int_next_index[i] = (int)next_index[i];
}
int x = flattened_list.getInt(int_next_index);
tensor_builder.addIntData(x);
}
} else {
throw new IllegalArgumentException("error tensor value type.");
}
tensor_builder.addAllShape(feedShapes_.get(name));
inst_builder.addTensorArray(tensor_builder.build());
}
req_builder.addInsts(inst_builder.build());
}
return req_builder.build();
}
private Map<String, HashMap<String, INDArray>>
_unpackInferenceResponse(
InferenceResponse resp,
Iterable<String> fetch,
Boolean need_variant_tag) throws IllegalArgumentException {
return Client._staticUnpackInferenceResponse(
resp, fetch, fetchTypes_, lodTensorSet_, need_variant_tag);
}
private static Map<String, HashMap<String, INDArray>>
_staticUnpackInferenceResponse(
InferenceResponse resp,
Iterable<String> fetch,
Map<String, Integer> fetchTypes,
Set<String> lodTensorSet,
Boolean need_variant_tag) throws IllegalArgumentException {
if (resp.getErrCode() != 0) {
return null;
}
String tag = resp.getTag();
HashMap<String, HashMap<String, INDArray>> multi_result_map
= new HashMap<String, HashMap<String, INDArray>>();
for (ModelOutput model_result: resp.getOutputsList()) {
String engine_name = model_result.getEngineName();
FetchInst inst = model_result.getInsts(0);
HashMap<String, INDArray> result_map
= new HashMap<String, INDArray>();
int index = 0;
for (String name: fetch) {
Tensor variable = inst.getTensorArray(index);
int v_type = fetchTypes.get(name);
INDArray data = null;
if (v_type == 0) { // int64
List<Long> list = variable.getInt64DataList();
long[] array = new long[list.size()];
for (int i = 0; i < list.size(); i++) {
array[i] = list.get(i);
}
data = Nd4j.createFromArray(array);
} else if (v_type == 1) { // float32
List<Float> list = variable.getFloatDataList();
float[] array = new float[list.size()];
for (int i = 0; i < list.size(); i++) {
array[i] = list.get(i);
}
data = Nd4j.createFromArray(array);
} else if (v_type == 2) { // int32
List<Integer> list = variable.getIntDataList();
int[] array = new int[list.size()];
for (int i = 0; i < list.size(); i++) {
array[i] = list.get(i);
}
data = Nd4j.createFromArray(array);
} else {
throw new IllegalArgumentException("error tensor value type.");
}
// shape
List<Integer> shape_list = variable.getShapeList();
int[] shape_array = new int[shape_list.size()];
for (int i = 0; i < shape_list.size(); ++i) {
shape_array[i] = shape_list.get(i);
}
data = data.reshape(shape_array);
// put data to result_map
result_map.put(name, data);
// lod
if (lodTensorSet.contains(name)) {
List<Integer> list = variable.getLodList();
int[] array = new int[list.size()];
for (int i = 0; i < list.size(); i++) {
array[i] = list.get(i);
}
result_map.put(name + ".lod", Nd4j.createFromArray(array));
}
index += 1;
}
multi_result_map.put(engine_name, result_map);
}
// TODO: tag(ABtest not support now)
return multi_result_map;
}
public Map<String, INDArray> predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch) {
return predict(feed, fetch, false);
}
public Map<String, HashMap<String, INDArray>> ensemble_predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch) {
return ensemble_predict(feed, fetch, false);
}
public PredictFuture asyn_predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch) {
return asyn_predict(feed, fetch, false);
}
public Map<String, INDArray> predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch,
Boolean need_variant_tag) {
List<HashMap<String, INDArray>> feed_batch
= new ArrayList<HashMap<String, INDArray>>();
feed_batch.add(feed);
return predict(feed_batch, fetch, need_variant_tag);
}
public Map<String, HashMap<String, INDArray>> ensemble_predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch,
Boolean need_variant_tag) {
List<HashMap<String, INDArray>> feed_batch
= new ArrayList<HashMap<String, INDArray>>();
feed_batch.add(feed);
return ensemble_predict(feed_batch, fetch, need_variant_tag);
}
public PredictFuture asyn_predict(
HashMap<String, INDArray> feed,
Iterable<String> fetch,
Boolean need_variant_tag) {
List<HashMap<String, INDArray>> feed_batch
= new ArrayList<HashMap<String, INDArray>>();
feed_batch.add(feed);
return asyn_predict(feed_batch, fetch, need_variant_tag);
}
public Map<String, INDArray> predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch) {
return predict(feed_batch, fetch, false);
}
public Map<String, HashMap<String, INDArray>> ensemble_predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch) {
return ensemble_predict(feed_batch, fetch, false);
}
public PredictFuture asyn_predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch) {
return asyn_predict(feed_batch, fetch, false);
}
public Map<String, INDArray> predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch,
Boolean need_variant_tag) {
try {
profiler_.record("java_prepro_0");
InferenceRequest req = _packInferenceRequest(feed_batch, fetch);
profiler_.record("java_prepro_1");
profiler_.record("java_client_infer_0");
InferenceResponse resp = blockingStub_.inference(req);
profiler_.record("java_client_infer_1");
profiler_.record("java_postpro_0");
Map<String, HashMap<String, INDArray>> ensemble_result
= _unpackInferenceResponse(resp, fetch, need_variant_tag);
List<Map.Entry<String, HashMap<String, INDArray>>> list
= new ArrayList<Map.Entry<String, HashMap<String, INDArray>>>(
ensemble_result.entrySet());
if (list.size() != 1) {
System.out.format("predict failed: please use ensemble_predict impl.\n");
return null;
}
profiler_.record("java_postpro_1");
profiler_.printProfile();
return list.get(0).getValue();
} catch (StatusRuntimeException e) {
System.out.format("predict failed: %s\n", e.toString());
return null;
}
}
public Map<String, HashMap<String, INDArray>> ensemble_predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch,
Boolean need_variant_tag) {
try {
profiler_.record("java_prepro_0");
InferenceRequest req = _packInferenceRequest(feed_batch, fetch);
profiler_.record("java_prepro_1");
profiler_.record("java_client_infer_0");
InferenceResponse resp = blockingStub_.inference(req);
profiler_.record("java_client_infer_1");
profiler_.record("java_postpro_0");
Map<String, HashMap<String, INDArray>> ensemble_result
= _unpackInferenceResponse(resp, fetch, need_variant_tag);
profiler_.record("java_postpro_1");
profiler_.printProfile();
return ensemble_result;
} catch (StatusRuntimeException e) {
System.out.format("predict failed: %s\n", e.toString());
return null;
}
}
public PredictFuture asyn_predict(
List<HashMap<String, INDArray>> feed_batch,
Iterable<String> fetch,
Boolean need_variant_tag) {
InferenceRequest req = _packInferenceRequest(feed_batch, fetch);
ListenableFuture<InferenceResponse> future = futureStub_.inference(req);
PredictFuture predict_future = new PredictFuture(future,
(InferenceResponse resp) -> {
return Client._staticUnpackInferenceResponse(
resp, fetch, fetchTypes_, lodTensorSet_, need_variant_tag);
}
);
return predict_future;
}
}
package io.paddle.serving.client;
import java.util.*;
import java.util.function.Function;
import io.grpc.StatusRuntimeException;
import com.google.common.util.concurrent.ListenableFuture;
import org.nd4j.linalg.api.ndarray.INDArray;
import io.paddle.serving.client.Client;
import io.paddle.serving.grpc.*;
public class PredictFuture {
private ListenableFuture<InferenceResponse> callFuture_;
private Function<InferenceResponse,
Map<String, HashMap<String, INDArray>>> callBackFunc_;
PredictFuture(ListenableFuture<InferenceResponse> call_future,
Function<InferenceResponse,
Map<String, HashMap<String, INDArray>>> call_back_func) {
callFuture_ = call_future;
callBackFunc_ = call_back_func;
}
public Map<String, INDArray> get() {
InferenceResponse resp = null;
try {
resp = callFuture_.get();
} catch (Exception e) {
System.out.format("predict failed: %s\n", e.toString());
return null;
}
Map<String, HashMap<String, INDArray>> ensemble_result
= callBackFunc_.apply(resp);
List<Map.Entry<String, HashMap<String, INDArray>>> list
= new ArrayList<Map.Entry<String, HashMap<String, INDArray>>>(
ensemble_result.entrySet());
if (list.size() != 1) {
System.out.format("predict failed: please use get_ensemble impl.\n");
return null;
}
return list.get(0).getValue();
}
public Map<String, HashMap<String, INDArray>> ensemble_get() {
InferenceResponse resp = null;
try {
resp = callFuture_.get();
} catch (Exception e) {
System.out.format("predict failed: %s\n", e.toString());
return null;
}
return callBackFunc_.apply(resp);
}
}
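For comparison, the Python client later in this document exposes the same call surface as the Java `Client`/`PredictFuture` pair above (a blocking `predict` and an asynchronous call that returns a future). A minimal sketch, assuming a multi-language server is already listening on `127.0.0.1:9393` and serves the uci_housing model with feed variable `"x"` and fetch variable `"price"`, as in the fit_a_line example further below:

```python
# Minimal sketch: Python counterpart of the Java Client / PredictFuture usage above.
# Assumes a multi-lang server on 127.0.0.1:9393 serving the uci_housing model
# (feed var "x", fetch var "price"), as in the fit_a_line example in this document.
from paddle_serving_client import MultiLangClient as Client

client = Client()
client.connect(["127.0.0.1:9393"])

x = [
    0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
    0.4919, 0.1856, 0.0795, -0.0332
]

# Synchronous call, mirroring Client.predict(feed, fetch) in the Java code.
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)

# Asynchronous call, mirroring Client.asyn_predict(...) followed by PredictFuture.get();
# result() blocks until the gRPC future completes.
future = client.predict(feed={"x": x}, fetch=["price"], asyn=True)
print(future.result())
```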
// Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
option java_multiple_files = true;
option java_package = "io.paddle.serving.configure";
option java_outer_classname = "ConfigureProto";
package paddle.serving.configure;
message FeedVar {
optional string name = 1;
optional string alias_name = 2;
optional bool is_lod_tensor = 3 [ default = false ];
optional int32 feed_type = 4 [ default = 0 ];
repeated int32 shape = 5;
}
message FetchVar {
optional string name = 1;
optional string alias_name = 2;
optional bool is_lod_tensor = 3 [ default = false ];
optional int32 fetch_type = 4 [ default = 0 ];
repeated int32 shape = 5;
}
message GeneralModelConfig {
repeated FeedVar feed_var = 1;
repeated FetchVar fetch_var = 2;
};
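The feed/fetch metadata that the Java client reads at startup (alias_name, feed_type/fetch_type, is_lod_tensor, shape) is carried by this `GeneralModelConfig` message, typically serialized as `serving_client_conf.prototxt`. A minimal sketch of inspecting such a file with the standard protobuf text-format API; the generated module name `general_model_config_pb2` is an assumption and may differ in your build:

```python
# Minimal sketch: read a serving_client_conf.prototxt into GeneralModelConfig.
# Assumption: the proto above was compiled into a module named general_model_config_pb2;
# adjust the import to whatever name your build generates.
from google.protobuf import text_format
import general_model_config_pb2 as model_config_pb2

config = model_config_pb2.GeneralModelConfig()
with open("serving_client_conf.prototxt") as f:
    text_format.Parse(f.read(), config)

for feed_var in config.feed_var:
    # feed_type 0/1/2 correspond to int64/float32/int32, as in the Java client above.
    print(feed_var.alias_name, feed_var.feed_type, list(feed_var.shape),
          feed_var.is_lod_tensor)
for fetch_var in config.fetch_var:
    print(fetch_var.alias_name, fetch_var.fetch_type, list(fetch_var.shape))
```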
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
option java_multiple_files = true;
option java_package = "io.paddle.serving.grpc";
option java_outer_classname = "ServingProto";
message Tensor {
optional bytes data = 1;
repeated int32 int_data = 2;
repeated int64 int64_data = 3;
repeated float float_data = 4;
optional int32 elem_type = 5;
repeated int32 shape = 6;
repeated int32 lod = 7; // only for fetch tensor currently
};
message FeedInst { repeated Tensor tensor_array = 1; };
message FetchInst { repeated Tensor tensor_array = 1; };
message InferenceRequest {
repeated FeedInst insts = 1;
repeated string feed_var_names = 2;
repeated string fetch_var_names = 3;
required bool is_python = 4 [ default = false ];
};
message InferenceResponse {
repeated ModelOutput outputs = 1;
optional string tag = 2;
required int32 err_code = 3;
};
message ModelOutput {
repeated FetchInst insts = 1;
optional string engine_name = 2;
}
message SetTimeoutRequest { required int32 timeout_ms = 1; }
message SimpleResponse { required int32 err_code = 1; }
message GetClientConfigRequest {}
message GetClientConfigResponse { required string client_config_str = 1; }
service MultiLangGeneralModelService {
rpc Inference(InferenceRequest) returns (InferenceResponse) {}
rpc SetTimeout(SetTimeoutRequest) returns (SimpleResponse) {}
rpc GetClientConfig(GetClientConfigRequest)
returns (GetClientConfigResponse) {}
};
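Clients that do not go through `paddle_serving_client` can also talk to this service with plain gRPC stubs generated from the proto above. A minimal sketch; the module names `multi_lang_general_model_service_pb2` / `_pb2_grpc` are assumptions that depend on the proto file name in your build, and only the message, field, and service names come from the definition above:

```python
# Minimal sketch of calling MultiLangGeneralModelService with raw gRPC stubs.
# Assumption: the proto above was compiled with grpcio-tools into these module names;
# adjust the imports to match your generated files.
import grpc
import multi_lang_general_model_service_pb2 as service_pb2
import multi_lang_general_model_service_pb2_grpc as service_pb2_grpc

channel = grpc.insecure_channel("127.0.0.1:9292")
stub = service_pb2_grpc.MultiLangGeneralModelServiceStub(channel)

# Fetch the serialized client config string served by GetClientConfig.
config = stub.GetClientConfig(service_pb2.GetClientConfigRequest())
print(config.client_config_str)

# Raise the server-side RPC timeout to 2 seconds via SetTimeout.
print(stub.SetTimeout(service_pb2.SetTimeoutRequest(timeout_ms=2000)).err_code)
```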
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="INFO">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%highlight{%d{yyyy-MM-dd HH:mm:ss} %C %M %n%p: %m%n}{STYLE=Logback}"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
if (CLIENT)
  file(INSTALL pipeline DESTINATION paddle_serving_client)
  execute_process(COMMAND ${PYTHON_EXECUTABLE} run_codegen.py
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/paddle_serving_client/pipeline/proto)
  file(GLOB_RECURSE SERVING_CLIENT_PY_FILES paddle_serving_client/*.py)
  set(PY_FILES ${SERVING_CLIENT_PY_FILES})
  SET(PACKAGE_NAME "serving_client")
...@@ -7,8 +10,14 @@ endif()
if (SERVER)
  if (NOT WITH_GPU)
    file(INSTALL pipeline DESTINATION paddle_serving_server)
    execute_process(COMMAND ${PYTHON_EXECUTABLE} run_codegen.py
      WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/paddle_serving_server/pipeline/proto)
    file(GLOB_RECURSE SERVING_SERVER_PY_FILES paddle_serving_server/*.py)
  else()
    file(INSTALL pipeline DESTINATION paddle_serving_server_gpu)
    execute_process(COMMAND ${PYTHON_EXECUTABLE} run_codegen.py
      WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/paddle_serving_server_gpu/pipeline/proto)
    file(GLOB_RECURSE SERVING_SERVER_PY_FILES paddle_serving_server_gpu/*.py)
  endif()
  set(PY_FILES ${SERVING_SERVER_PY_FILES})
...@@ -74,6 +83,7 @@ if (SERVER)
    OUTPUT ${PADDLE_SERVING_BINARY_DIR}/.timestamp
    COMMAND cp -r
      ${CMAKE_CURRENT_SOURCE_DIR}/paddle_serving_server_gpu/ ${PADDLE_SERVING_BINARY_DIR}/python/
    COMMAND env ${py_env} ${PYTHON_EXECUTABLE} paddle_serving_server_gpu/gen_cuda_version.py ${CUDA_VERSION_MAJOR}
    COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel
    DEPENDS ${SERVING_SERVER_CORE} server_config_py_proto ${PY_FILES})
  add_custom_target(paddle_python ALL DEPENDS ${PADDLE_SERVING_BINARY_DIR}/.timestamp)
......
...@@ -116,8 +116,10 @@ def single_func(idx, resource):
if __name__ == '__main__':
    multi_thread_runner = MultiThreadRunner()
    endpoint_list = [
        "127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295"
    ]
    turns = 100
    start = time.time()
    result = multi_thread_runner.run(
        single_func, args.thread, {"endpoint": endpoint_list,
...@@ -130,9 +132,9 @@ if __name__ == '__main__':
        avg_cost += result[0][i]
    avg_cost = avg_cost / args.thread
    print("total cost: {}s".format(total_cost))
    print("each thread cost: {}s. ".format(avg_cost))
    print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
                                    total_cost))
    if os.getenv("FLAGS_serving_latency"):
        show_latency(result[1])
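To make the reported qps figure concrete: it is simply batch_size × thread × turns divided by the total wall-clock cost. A tiny illustration with assumed (not measured) timings:

```python
# Illustration of the qps formula used above; the timing value is hypothetical.
batch_size, thread, turns = 1, 8, 100   # 8 client threads, 100 turns each
total_cost = 20.0                       # seconds of wall-clock time (assumed)
qps = batch_size * thread * turns / total_cost
print("qps: {} samples/s".format(qps))  # 40.0 samples/s
```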
rm profile_log*
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1

gpu_id=0
#save cpu and gpu utilization log
if [ -d utilization ];then
    rm -rf utilization
else
    mkdir utilization
fi

#start server
$PYTHONROOT/bin/python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim --ir_optim > elog 2>&1 &
sleep 5

#warm up
$PYTHONROOT/bin/python3 benchmark.py --thread 4 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo -e "import psutil\ncpu_utilization=psutil.cpu_percent(1,False)\nprint('CPU_UTILIZATION:', cpu_utilization)\n" > cpu_utilization.py

for thread_num in 1 4 8 16
do
for batch_size in 1 4 16 64
do
    job_bt=`date '+%Y%m%d%H%M%S'`
    nvidia-smi --id=0 --query-compute-apps=used_memory --format=csv -lms 100 > gpu_use.log 2>&1 &
    nvidia-smi --id=0 --query-gpu=utilization.gpu --format=csv -lms 100 > gpu_utilization.log 2>&1 &
    gpu_memory_pid=$!
    $PYTHONROOT/bin/python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
    kill ${gpu_memory_pid}
    kill `ps -ef|grep used_memory|awk '{print $2}'`
    echo "model_name:" $1
    echo "thread_num:" $thread_num
    echo "batch_size:" $batch_size
    echo "=================Done===================="
    echo "model_name:$1" >> profile_log_$1
    echo "batch_size:$batch_size" >> profile_log_$1
    $PYTHONROOT/bin/python3 cpu_utilization.py >> profile_log_$1
    job_et=`date '+%Y%m%d%H%M%S'`
    awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "MAX_GPU_MEMORY:", max}' gpu_use.log >> profile_log_$1
    awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "GPU_UTILIZATION:", max}' gpu_utilization.log >> profile_log_$1
    rm -rf gpu_use.log gpu_utilization.log
    $PYTHONROOT/bin/python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
    tail -n 8 profile >> profile_log_$1
    echo "" >> profile_log_$1
done
done

#Divided log
awk 'BEGIN{RS="\n\n"}{i++}{print > "bert_log_"i}' profile_log_$1
mkdir bert_log && mv bert_log_* bert_log
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
# Blazeface
## Get Model
```
python -m paddle_serving_app.package --get_model blazeface
tar -xzvf blazeface.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server.serve --model serving_server --port 9494
```
### Client Prediction
```
python test_client.py serving_client/serving_client_conf.prototxt test.jpg
```
The result is saved in the `output` folder, including a JSON file and an image file with bounding boxes.
...@@ -13,19 +13,26 @@
# limitations under the License.

from paddle_serving_client import Client
from paddle_serving_app.reader import *
import sys
import numpy as np

preprocess = Sequential([
    File2Image(),
    Normalize([104, 117, 123], [127.502231, 127.502231, 127.502231], False)
])

postprocess = BlazeFacePostprocess("label_list.txt", "output")

client = Client()
client.load_client_config(sys.argv[1])
client.connect(['127.0.0.1:9494'])

im_0 = preprocess(sys.argv[2])
tmp = Transpose((2, 0, 1))
im = tmp(im_0)

fetch_map = client.predict(
    feed={"image": im}, fetch=["detection_output_0.tmp_0"])
fetch_map["image"] = sys.argv[2]
fetch_map["im_shape"] = im_0.shape
postprocess(fetch_map)
...@@ -27,7 +27,7 @@ mv cube_app/cube* ./cube/
sh cube_prepare.sh &
```

Here, the sparse parameters of the model are loaded by the Cube sparse parameter indexing service.

### Start RPC Predictor, the number of serving threads is 4 (configurable in test_server.py)
...@@ -45,7 +45,7 @@ python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
CPU: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/criteo_ctr_with_cube/network_conf.py)
server core/thread num: 4/8
......
...@@ -25,7 +25,7 @@ mv cube_app/cube* ./cube/
sh cube_prepare.sh &
```

Here, the sparse parameters of the model are stored in the Cube sparse parameter indexing service.

### Start the RPC prediction service with 4 server threads (configurable in test_server.py)
...@@ -43,7 +43,7 @@ python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
Device: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/criteo_ctr_with_cube/network_conf.py)
server core/thread num: 4/8
......
...@@ -24,11 +24,13 @@ from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
from paddle_serving_client.metric import auc

py_version = sys.version_info[0]

args = benchmark_args()


def single_func(idx, resource):
    client = Client()
    print([resource["endpoint"][idx % len(resource["endpoint"])]])
    client.load_client_config('ctr_client_conf/serving_client_conf.prototxt')
    client.connect(['127.0.0.1:9292'])
    batch = 1
...@@ -40,27 +42,32 @@ def single_func(idx, resource):
    ]
    reader = dataset.infer_reader(test_filelists[len(test_filelists) - 40:],
                                  batch, buf_size)
    args.batch_size = 1
    if args.request == "rpc":
        fetch = ["prob"]
        start = time.time()
        itr = 1000
        for ei in range(itr):
            if args.batch_size > 0:
                feed_batch = []
                for bi in range(args.batch_size):
                    if py_version == 2:
                        data = reader().next()
                    else:
                        data = reader().__next__()
                    feed_dict = {}
                    feed_dict['dense_input'] = data[0][0]
                    for i in range(1, 27):
                        feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][
                            i]
                    feed_batch.append(feed_dict)
                result = client.predict(feed=feed_batch, fetch=fetch)
            else:
                print("unsupport batch size {}".format(args.batch_size))

    elif args.request == "http":
        raise ("Not support http service.")
    end = time.time()
    qps = itr * args.batch_size / (end - start)
    return [[end - start, qps]]
...@@ -68,13 +75,17 @@ if __name__ == '__main__':
    multi_thread_runner = MultiThreadRunner()
    endpoint_list = ["127.0.0.1:9292"]
    #result = single_func(0, {"endpoint": endpoint_list})
    start = time.time()
    result = multi_thread_runner.run(single_func, args.thread,
                                     {"endpoint": endpoint_list})
    end = time.time()
    total_cost = end - start
    avg_cost = 0
    qps = 0
    for i in range(args.thread):
        avg_cost += result[0][i * 2 + 0]
        qps += result[0][i * 2 + 1]
    avg_cost = avg_cost / args.thread
    print("total cost: {}".format(total_cost))
    print("average total cost {} s.".format(avg_cost))
    print("qps {} ins/s".format(qps))
rm profile_log
export FLAGS_profile_client=1
export FLAGS_profile_server=1

wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz --no-check-certificate
tar xf ctr_cube_unittest.tar.gz
mv models/ctr_client_conf ./
mv models/ctr_serving_model_kv ./
mv models/data ./cube/

wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz --no-check-certificate
tar xf cube_app.tar.gz
mv cube_app/cube* ./cube/
sh cube_prepare.sh &

python test_server.py ctr_serving_model_kv > serving_log 2>&1 &

for thread_num in 1 4 16
do
for batch_size in 1 4 16 64
do
    $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
    echo "batch size : $batch_size"
    echo "thread num : $thread_num"
    echo "========================================"
    echo "batch size : $batch_size" >> profile_log
    $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
    tail -n 3 profile >> profile_log
done
done
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
# -*- coding: utf-8 -*-
#
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
import sys
import os
import criteo as criteo
import time
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
from paddle_serving_client.metric import auc
args = benchmark_args()
def single_func(idx, resource):
client = Client()
print([resource["endpoint"][idx % len(resource["endpoint"])]])
client.load_client_config('ctr_client_conf/serving_client_conf.prototxt')
client.connect(['127.0.0.1:9292'])
batch = 1
buf_size = 100
dataset = criteo.CriteoDataset()
dataset.setup(1000001)
test_filelists = [
"./raw_data/part-%d" % x for x in range(len(os.listdir("./raw_data")))
]
reader = dataset.infer_reader(test_filelists[len(test_filelists) - 40:],
batch, buf_size)
if args.request == "rpc":
fetch = ["prob"]
start = time.time()
itr = 1000
for ei in range(itr):
if args.batch_size > 1:
feed_batch = []
for bi in range(args.batch_size):
data = reader().next()
feed_dict = {}
feed_dict['dense_input'] = data[0][0]
for i in range(1, 27):
feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][
i]
feed_batch.append(feed_dict)
result = client.predict(feed=feed_batch, fetch=fetch)
else:
print("unsupport batch size {}".format(args.batch_size))
elif args.request == "http":
raise ("Not support http service.")
end = time.time()
qps = itr * args.batch_size / (end - start)
return [[end - start, qps]]
if __name__ == '__main__':
multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:9292"]
#result = single_func(0, {"endpoint": endpoint_list})
result = multi_thread_runner.run(single_func, args.thread,
{"endpoint": endpoint_list})
print(result)
avg_cost = 0
qps = 0
for i in range(args.thread):
avg_cost += result[0][i * 2 + 0]
qps += result[0][i * 2 + 1]
avg_cost = avg_cost / args.thread
print("average total cost {} s.".format(avg_cost))
print("qps {} ins/s".format(qps))
rm profile_log
for thread_num in 1 2 4 8 16
do
for batch_size in 1 2 4 8 16 32 64 128 256 512
do
$PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model serving_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "========================================"
echo "batch size : $batch_size" >> profile_log
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
tail -n 2 profile >> profile_log
done
done
rm profile_log
#wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz --no-check-certificate
#tar xf ctr_cube_unittest.tar.gz
mv models/ctr_client_conf ./
mv models/ctr_serving_model_kv ./
mv models/data ./cube/
#wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz --no-check-certificate
#tar xf cube_app.tar.gz
mv cube_app/cube* ./cube/
sh cube_prepare.sh &
cp ../../../build_server/core/cube/cube-api/cube-cli .
python gen_key.py
for thread_num in 1 4 16 32
do
for batch_size in 1000
do
./cube-cli -config_file ./cube/conf/cube.conf -keys key -dict test_dict -thread_num $thread_num --batch $batch_size > profile 2>&1
echo "batch size : $batch_size"
echo "thread num : $thread_num"
echo "========================================"
echo "batch size : $batch_size" >> profile_log
echo "thread num : $thread_num" >> profile_log
tail -n 8 profile >> profile_log
done
done
ps -ef|grep 'cube'|grep -v grep|cut -c 9-15 | xargs kill -9
...@@ -16,7 +16,5 @@
mkdir -p cube_model
mkdir -p cube/data
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false
cd cube && ./cube
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import random
with open("key", "w") as f:
for i in range(1000000):
f.write("{}\n".format(random.randint(0, 999999)))
...@@ -20,6 +20,8 @@ import criteo as criteo
import time
from paddle_serving_client.metric import auc

py_version = sys.version_info[0]

client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
...@@ -34,7 +36,10 @@ label_list = []
prob_list = []
start = time.time()
for ei in range(10000):
    if py_version == 2:
        data = reader().next()
    else:
        data = reader().__next__()
    feed_dict = {}
    feed_dict['dense_input'] = data[0][0]
    for i in range(1, 27):
......
...@@ -33,5 +33,9 @@ server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1])
server.prepare_server(
    workdir="work_dir1",
    port=9292,
    device="cpu",
    cube_conf="./cube/conf/cube.conf")
server.run_server()
...@@ -33,5 +33,9 @@ server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1])
server.prepare_server(
    workdir="work_dir1",
    port=9292,
    device="cpu",
    cube_conf="./cube/conf/cube.conf")
server.run_server()
...@@ -33,5 +33,9 @@ server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1])
server.prepare_server(
    workdir="work_dir1",
    port=9292,
    device="cpu",
    cube_conf="./cube/conf/cube.conf")
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import argparse
def parse_args():
parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
parser.add_argument(
'--train_data_path',
type=str,
default='./data/raw/train.txt',
help="The path of training dataset")
parser.add_argument(
'--sparse_only',
type=bool,
default=False,
help="Whether we use sparse features only")
parser.add_argument(
'--test_data_path',
type=str,
default='./data/raw/valid.txt',
help="The path of testing dataset")
parser.add_argument(
'--batch_size',
type=int,
default=1000,
help="The size of mini-batch (default:1000)")
parser.add_argument(
'--embedding_size',
type=int,
default=10,
help="The size for embedding layer (default:10)")
parser.add_argument(
'--num_passes',
type=int,
default=10,
help="The number of passes to train (default: 10)")
parser.add_argument(
'--model_output_dir',
type=str,
default='models',
help='The path for model to store (default: models)')
parser.add_argument(
'--sparse_feature_dim',
type=int,
default=1000001,
help='sparse feature hashing space for index processing')
parser.add_argument(
'--is_local',
type=int,
default=1,
help='Local train or distributed train (default: 1)')
parser.add_argument(
'--cloud_train',
type=int,
default=0,
help='Local train or distributed train on paddlecloud (default: 0)')
parser.add_argument(
'--async_mode',
action='store_true',
default=False,
help='Whether start pserver in async mode to support ASGD')
parser.add_argument(
'--no_split_var',
action='store_true',
default=False,
help='Whether split variables into blocks when update_method is pserver')
parser.add_argument(
'--role',
type=str,
default='pserver', # trainer or pserver
help='The role of this node: trainer or pserver (default: pserver)')
parser.add_argument(
'--endpoints',
type=str,
default='127.0.0.1:6000',
help='The pserver endpoints, like: 127.0.0.1:6000,127.0.0.1:6001')
parser.add_argument(
'--current_endpoint',
type=str,
default='127.0.0.1:6000',
help='The endpoint of the current pserver (default: 127.0.0.1:6000)')
parser.add_argument(
'--trainer_id',
type=int,
default=0,
help='The id of the current trainer (default: 0)')
parser.add_argument(
'--trainers',
type=int,
default=1,
help='The number of trainers (default: 1)')
return parser.parse_args()
ps -ef | grep cube | awk {'print $2'} | xargs kill -9
rm -rf cube/cube_data cube/data cube/log* cube/nohup* cube/output/ cube/donefile cube/input cube/monitor cube/cube-builder.INFO
ps -ef | grep test | awk {'print $2'} | xargs kill -9
ps -ef | grep serving | awk {'print $2'} | xargs kill -9
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
class CriteoDataset(object):
def setup(self, sparse_feature_dim):
self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
self.cont_max_ = [
20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
]
self.cont_diff_ = [
20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
]
self.hash_dim_ = sparse_feature_dim
# here, training data are lines with line_index < train_idx_
self.train_idx_ = 41256555
self.continuous_range_ = range(1, 14)
self.categorical_range_ = range(14, 40)
def _process_line(self, line):
features = line.rstrip('\n').split('\t')
dense_feature = []
sparse_feature = []
for idx in self.continuous_range_:
if features[idx] == '':
dense_feature.append(0.0)
else:
dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
self.cont_diff_[idx - 1])
for idx in self.categorical_range_:
sparse_feature.append(
[hash(str(idx) + features[idx]) % self.hash_dim_])
return dense_feature, sparse_feature, [int(features[0])]
def infer_reader(self, filelist, batch, buf_size):
def local_iter():
for fname in filelist:
with open(fname.strip(), "r") as fin:
for line in fin:
dense_feature, sparse_feature, label = self._process_line(
line)
#yield dense_feature, sparse_feature, label
yield [dense_feature] + sparse_feature + [label]
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def data_iter():
dense_feature, sparse_feature, label = self._process_line(line)
feature_name = ["dense_input"]
for idx in self.categorical_range_:
feature_name.append("C" + str(idx - 13))
feature_name.append("label")
yield zip(feature_name, [dense_feature] + sparse_feature + [label])
return data_iter
if __name__ == "__main__":
criteo_dataset = CriteoDataset()
criteo_dataset.setup(int(sys.argv[1]))
criteo_dataset.run_from_stdin()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import sys
import paddle.fluid.incubate.data_generator as dg
class CriteoDataset(dg.MultiSlotDataGenerator):
def setup(self, sparse_feature_dim):
self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
self.cont_max_ = [
20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
]
self.cont_diff_ = [
20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
]
self.hash_dim_ = sparse_feature_dim
# here, training data are lines with line_index < train_idx_
self.train_idx_ = 41256555
self.continuous_range_ = range(1, 14)
self.categorical_range_ = range(14, 40)
def _process_line(self, line):
features = line.rstrip('\n').split('\t')
dense_feature = []
sparse_feature = []
for idx in self.continuous_range_:
if features[idx] == '':
dense_feature.append(0.0)
else:
dense_feature.append((float(features[idx]) - self.cont_min_[idx - 1]) / \
self.cont_diff_[idx - 1])
for idx in self.categorical_range_:
sparse_feature.append(
[hash(str(idx) + features[idx]) % self.hash_dim_])
return dense_feature, sparse_feature, [int(features[0])]
def infer_reader(self, filelist, batch, buf_size):
def local_iter():
for fname in filelist:
with open(fname.strip(), "r") as fin:
for line in fin:
dense_feature, sparse_feature, label = self._process_line(
line)
#yield dense_feature, sparse_feature, label
yield [dense_feature] + sparse_feature + [label]
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def data_iter():
dense_feature, sparse_feature, label = self._process_line(line)
feature_name = ["dense_input"]
for idx in self.categorical_range_:
feature_name.append("C" + str(idx - 13))
feature_name.append("label")
yield zip(feature_name, [dense_feature] + sparse_feature + [label])
return data_iter
if __name__ == "__main__":
criteo_dataset = CriteoDataset()
criteo_dataset.setup(int(sys.argv[1]))
criteo_dataset.run_from_stdin()
[{
"dict_name": "test_dict",
"shard": 1,
"dup": 1,
"timeout": 200,
"retry": 3,
"backup_request": 100,
"type": "ipport_list",
"load_balancer": "rr",
"nodes": [{
"ipport_list": "list://127.0.0.1:8027"
}]
}]
--port=8027
--dict_split=1
--in_mem=true
--log_dir=./log/
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
#! /bin/bash
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
#! /bin/bash
mkdir -p cube_model
mkdir -p cube/data
./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8
./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=${PWD}/cube/data -shard_num=1 -only_build=false
mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
cd cube && ./cube
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/data/ctr_prediction/ctr_data.tar.gz
tar -zxvf ctr_data.tar.gz
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from __future__ import print_function
from args import parse_args
import os
import paddle.fluid as fluid
import sys
from network_conf import dnn_model
dense_feature_dim = 13
def train():
args = parse_args()
sparse_only = args.sparse_only
if not os.path.isdir(args.model_output_dir):
os.mkdir(args.model_output_dir)
dense_input = fluid.layers.data(
name="dense_input", shape=[dense_feature_dim], dtype='float32')
sparse_input_ids = [
fluid.layers.data(
name="C" + str(i), shape=[1], lod_level=1, dtype="int64")
for i in range(1, 27)
]
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
#nn_input = None if sparse_only else dense_input
nn_input = dense_input
predict_y, loss, auc_var, batch_auc_var, infer_vars = dnn_model(
nn_input, sparse_input_ids, label, args.embedding_size,
args.sparse_feature_dim)
optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
optimizer.minimize(loss)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
dataset = fluid.DatasetFactory().create_dataset("InMemoryDataset")
dataset.set_use_var([dense_input] + sparse_input_ids + [label])
python_executable = "python"
pipe_command = "{} criteo_reader.py {}".format(python_executable,
args.sparse_feature_dim)
dataset.set_pipe_command(pipe_command)
dataset.set_batch_size(128)
thread_num = 10
dataset.set_thread(thread_num)
whole_filelist = [
"raw_data/part-%d" % x for x in range(len(os.listdir("raw_data")))
]
print(whole_filelist)
dataset.set_filelist(whole_filelist[:100])
dataset.load_into_memory()
fluid.layers.Print(auc_var)
epochs = 1
for i in range(epochs):
exe.train_from_dataset(
program=fluid.default_main_program(), dataset=dataset, debug=True)
print("epoch {} finished".format(i))
import paddle_serving_client.io as server_io
feed_var_dict = {}
feed_var_dict['dense_input'] = dense_input
for i, sparse in enumerate(sparse_input_ids):
feed_var_dict["embedding_{}.tmp_0".format(i)] = sparse
fetch_var_dict = {"prob": predict_y}
feed_kv_dict = {}
feed_kv_dict['dense_input'] = dense_input
for i, emb in enumerate(infer_vars):
feed_kv_dict["embedding_{}.tmp_0".format(i)] = emb
fetch_var_dict = {"prob": predict_y}
server_io.save_model("ctr_serving_model", "ctr_client_conf", feed_var_dict,
fetch_var_dict, fluid.default_main_program())
server_io.save_model("ctr_serving_model_kv", "ctr_client_conf_kv",
feed_kv_dict, fetch_var_dict,
fluid.default_main_program())
if __name__ == '__main__':
train()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import paddle.fluid as fluid
import math
def dnn_model(dense_input, sparse_inputs, label, embedding_size,
sparse_feature_dim):
def embedding_layer(input):
emb = fluid.layers.embedding(
input=input,
is_sparse=True,
is_distributed=False,
size=[sparse_feature_dim, embedding_size],
param_attr=fluid.ParamAttr(
name="SparseFeatFactors",
initializer=fluid.initializer.Uniform()))
x = fluid.layers.sequence_pool(input=emb, pool_type='sum')
return emb, x
def mlp_input_tensor(emb_sums, dense_tensor):
#if isinstance(dense_tensor, fluid.Variable):
# return fluid.layers.concat(emb_sums, axis=1)
#else:
return fluid.layers.concat(emb_sums + [dense_tensor], axis=1)
def mlp(mlp_input):
fc1 = fluid.layers.fc(input=mlp_input,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(mlp_input.shape[1]))))
fc2 = fluid.layers.fc(input=fc1,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc1.shape[1]))))
fc3 = fluid.layers.fc(input=fc2,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc2.shape[1]))))
pre = fluid.layers.fc(input=fc3,
size=2,
act='softmax',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc3.shape[1]))))
return pre
emb_pair_sums = list(map(embedding_layer, sparse_inputs))
emb_sums = [x[1] for x in emb_pair_sums]
infer_vars = [x[0] for x in emb_pair_sums]
mlp_in = mlp_input_tensor(emb_sums, dense_input)
predict = mlp(mlp_in)
cost = fluid.layers.cross_entropy(input=predict, label=label)
avg_cost = fluid.layers.reduce_sum(cost)
accuracy = fluid.layers.accuracy(input=predict, label=label)
auc_var, batch_auc_var, auc_states = \
fluid.layers.auc(input=predict, label=label, num_thresholds=2 ** 12, slide_steps=20)
return predict, avg_cost, auc_var, batch_auc_var, infer_vars
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
import sys
import os
import criteo as criteo
import time
from paddle_serving_client.metric import auc
import grpc
client = Client()
client.connect(["127.0.0.1:9292"])
batch = 1
buf_size = 100
dataset = criteo.CriteoDataset()
dataset.setup(1000001)
test_filelists = ["{}/part-0".format(sys.argv[1])]
reader = dataset.infer_reader(test_filelists, batch, buf_size)
label_list = []
prob_list = []
start = time.time()
for ei in range(10000):
data = reader().next()
feed_dict = {}
feed_dict['dense_input'] = data[0][0]
for i in range(1, 27):
feed_dict["embedding_{}.tmp_0".format(i - 1)] = data[0][i]
fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
if fetch_map["serving_status_code"] == 0:
prob_list.append(fetch_map['prob'][0][1])
label_list.append(data[0][-1][0])
print(auc(label_list, prob_list))
end = time.time()
print(end - start)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
from paddle_serving_server import MultiLangServer as Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_dist_kv_infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1], sys.argv[2])
server.prepare_server(
workdir="work_dir1",
port=9292,
device="cpu",
cube_conf="./cube/conf/cube.conf")
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server_gpu import OpMaker
from paddle_serving_server_gpu import OpSeqMaker
from paddle_serving_server_gpu import MultiLangServer as Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_dist_kv_infer_op = op_maker.create('general_dist_kv_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_dist_kv_infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1], sys.argv[2])
server.prepare_server(
workdir="work_dir1",
port=9292,
device="cpu",
cube_conf="./cube/conf/cube.conf")
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
from paddle_serving_server import MultiLangServer as Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_dist_kv_infer_op = op_maker.create('general_dist_kv_quant_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_dist_kv_infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config(sys.argv[1], sys.argv[2])
server.prepare_server(
workdir="work_dir1",
port=9292,
device="cpu",
cube_conf="./cube/conf/cube.conf")
server.run_server()
# Linear Regression Prediction Service Example (gRPC)

## Get data
```shell
sh get_data.sh
```

## Start the gRPC server
``` shell
python test_server.py uci_housing_model/
```

You can also start the default gRPC service with a single command:
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang
```

## Client prediction

### Synchronous prediction (a minimal sketch is given after this README)
``` shell
python test_sync_client.py
```

### Asynchronous prediction
``` shell
python test_asyn_client.py
```

### Batch prediction
``` shell
python test_batch_client.py
```

### General pb prediction
``` shell
python test_general_pb_client.py
```

### Prediction timeout
``` shell
python test_timeout_client.py
```

### List input
``` shell
python test_list_input_client.py
```
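The synchronous client referenced above (test_sync_client.py) is not reproduced in this excerpt. A minimal sketch of what such a call looks like, modeled on the asynchronous, batch, and general-pb clients that follow; the server address 127.0.0.1:9393 and the "x"/"price" variable names are taken from those examples:

```python
# Minimal synchronous prediction sketch, modeled on the clients shown below.
# Assumes the gRPC server started above is listening on 127.0.0.1:9393.
from paddle_serving_client import MultiLangClient as Client

client = Client()
client.connect(["127.0.0.1:9393"])

x = [
    0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
    0.4919, 0.1856, 0.0795, -0.0332
]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map["serving_status_code"] == 0:
    print(fetch_map)
else:
    print(fetch_map["serving_status_code"])
```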
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
import functools
import time
import threading
import grpc
client = Client()
client.connect(["127.0.0.1:9393"])
complete_task_count = [0]
lock = threading.Lock()
def call_back(call_future):
try:
fetch_map = call_future.result()
print(fetch_map)
except grpc.RpcError as e:
print(e.code())
finally:
with lock:
complete_task_count[0] += 1
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
task_count = 0
for i in range(3):
future = client.predict(feed={"x": x}, fetch=["price"], asyn=True)
task_count += 1
future.add_done_callback(functools.partial(call_back))
while complete_task_count[0] != task_count:
time.sleep(0.1)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
client = Client()
client.connect(["127.0.0.1:9393"])
batch_size = 2
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(3):
batch_feed = [{"x": x} for j in range(batch_size)]
fetch_map = client.predict(feed=batch_feed, fetch=["price"])
if fetch_map["serving_status_code"] == 0:
print(fetch_map)
else:
print(fetch_map["serving_status_code"])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
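# Note: is_python=False in client.predict() below requests the general protobuf
# (de)serialization path rather than the Python-specific one; this is the
# "Generic Protobuf Prediction" case referred to in the README.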
from paddle_serving_client import MultiLangClient as Client
client = Client()
client.connect(["127.0.0.1:9393"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(3):
fetch_map = client.predict(feed={"x": x}, fetch=["price"], is_python=False)
if fetch_map["serving_status_code"] == 0:
print(fetch_map)
else:
print(fetch_map["serving_status_code"])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
import numpy as np
client = Client()
client.connect(["127.0.0.1:9393"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(3):
fetch_map = client.predict(feed={"x": np.array(x)}, fetch=["price"])
if fetch_map["serving_status_code"] == 0:
print(fetch_map)
else:
print(fetch_map["serving_status_code"])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
from paddle_serving_server import MultiLangServer as Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config(sys.argv[1])
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
from paddle_serving_server_gpu import OpMaker
from paddle_serving_server_gpu import OpSeqMaker
from paddle_serving_server_gpu import MultiLangServer as Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.load_model_config(sys.argv[1])
server.set_gpuid(0)
server.prepare_server(workdir="work_dir1", port=9393, device="gpu")
server.run_server()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
client = Client()
client.connect(["127.0.0.1:9393"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(3):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map["serving_status_code"] == 0:
print(fetch_map)
else:
print(fetch_map["serving_status_code"])
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient as Client
import grpc
client = Client()
client.connect(["127.0.0.1:9393"])
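# A deliberately tiny RPC timeout (1 ms) below makes the request very likely to
# fail with DEADLINE_EXCEEDED, which the loop then checks for.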
client.set_rpc_timeout_ms(1)
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283,
0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(3):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map["serving_status_code"] == 0:
print(fetch_map)
elif fetch_map["serving_status_code"] == grpc.StatusCode.DEADLINE_EXCEEDED:
print('timeout')
else:
print(fetch_map["serving_status_code"])
wget --no-check-certificate https://fleet.bj.bcebos.com/text_classification_data.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz
tar -zxvf text_classification_data.tar.gz
tar -zxvf imdb_model.tar.gz
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg
py_version = sys.version_info[0]
class IMDBDataset(dg.MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
if py_version == 2:
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
else:
with open(dictfile, encoding="utf-8") as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
def get_words_only(self, line):
sent = line.lower().replace("<br />", " ").strip()
words = [x for x in self._pattern.split(sent) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas
def get_words_and_label(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas, label
def infer_reader(self, infer_filelist, batch, buf_size):
def local_iter():
for fname in infer_filelist:
with open(fname, "r") as fin:
for line in fin:
feas, label = self.get_words_and_label(line)
yield feas, label
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def memory_iter():
for i in range(1000):
yield self.return_value
def data_iter():
feas, label = self.get_words_and_label(line)
yield ("words", feas), ("label", label)
return data_iter
if __name__ == "__main__":
imdb = IMDBDataset()
imdb.load_resource("imdb.vocab")
imdb.run_from_stdin()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import MultiLangClient
from imdb_reader import IMDBDataset
client = MultiLangClient()
# If you have more than one model, make sure that the input
# and output of more than one model are the same.
client.connect(["127.0.0.1:9393"])
# you can define any english sentence or dataset here
# This example reuses imdb reader in training, you
# can define your own data preprocessing easily.
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
for i in range(3):
line = 'i am very sad | 0'
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["prediction"]
fetch_maps = client.predict(feed=feed, fetch=fetch)
for model, fetch_map in fetch_maps.items():
if model == "serving_status_code":
continue
print("step: {}, model: {}, res: {}".format(i, model, fetch_map))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import MultiLangServer
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
server = MultiLangServer()
server.set_op_graph(op_graph_maker.get_op_graph())
model_config = {cnn_infer_op: 'imdb_cnn_model', bow_infer_op: 'imdb_bow_model'}
server.load_model_config(model_config)
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
# Yolov4 Detection Service
([简体中文](README_CN.md)|English)
## Get Model
```
python -m paddle_serving_app.package --get_model yolov4
tar -xzvf yolov4.tar.gz
```
## Start RPC Service
```
python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang
```
## Prediction
```
python test_client.py 000000570688.jpg
```
After the prediction is completed, a JSON file containing the prediction results and an image with the detected bounding boxes drawn on it will be generated in the `./output` folder.
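For reference, the client call boils down to the following sketch (a trimmed version of test_client.py; the preprocessing pipeline and the fetch variable name are taken from that script, and the service is assumed to be on port 9393 as started above).

```python
# Minimal sketch of the Yolov4 RPC client (see test_client.py for the full script).
import sys
import numpy as np
import cv2
from paddle_serving_client import MultiLangClient as Client
from paddle_serving_app.reader import (Sequential, File2Image, BGR2RGB, Resize,
                                       Div, Transpose, RCNNPostprocess)

# Resize to 608x608, scale to [0, 1] and convert HWC -> CHW, as the model expects.
preprocess = Sequential([
    File2Image(), BGR2RGB(),
    Resize((608, 608), interpolation=cv2.INTER_LINEAR),
    Div(255.0), Transpose((2, 0, 1))
])
# Writes the JSON result and the annotated image into ./output.
postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608])

client = Client()
client.connect(["127.0.0.1:9393"])

im = preprocess(sys.argv[1])  # e.g. 000000570688.jpg
fetch_map = client.predict(
    feed={"image": im, "im_size": np.array(list(im.shape[1:]))},
    fetch=["save_infer_model/scale_0.tmp_0"])
fetch_map.pop("serving_status_code")
fetch_map["image"] = sys.argv[1]
postprocess(fetch_map)
```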
# Yolov4 Detection Service
(Simplified Chinese|[English](README.md))
## Get Model
```
python -m paddle_serving_app.package --get_model yolov4
tar -xzvf yolov4.tar.gz
```
## Start the RPC Service
```
python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0 --use_multilang
```
## Prediction
```
python test_client.py 000000570688.jpg
```
After the prediction is completed, a JSON file containing the prediction results and an image with the detected bounding boxes drawn on it will be generated in the `./output` folder.
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import numpy as np
from paddle_serving_client import MultiLangClient as Client
from paddle_serving_app.reader import *
import cv2
preprocess = Sequential([
File2Image(), BGR2RGB(), Resize(
(608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose(
(2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608])
client = Client()
client.connect(['127.0.0.1:9393'])
# client.set_rpc_timeout_ms(10000)
im = preprocess(sys.argv[1])
fetch_map = client.predict(
feed={
"image": im,
"im_size": np.array(list(im.shape[1:])),
},
fetch=["save_infer_model/scale_0.tmp_0"])
fetch_map.pop("serving_status_code")
fetch_map["image"] = sys.argv[1]
postprocess(fetch_map)
...@@ -24,38 +24,43 @@ import json ...@@ -24,38 +24,43 @@ import json
import base64 import base64
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args from paddle_serving_client.utils import benchmark_args, show_latency
from paddle_serving_app.reader import Sequential, URL2Image, Resize from paddle_serving_app.reader import Sequential, File2Image, Resize
from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
args = benchmark_args() args = benchmark_args()
seq_preprocess = Sequential([ seq_preprocess = Sequential([
URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)), File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True) Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
]) ])
def single_func(idx, resource): def single_func(idx, resource):
file_list = [] file_list = []
turns = resource["turns"]
latency_flags = False
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
for file_name in os.listdir("./image_data/n01440764"): for file_name in os.listdir("./image_data/n01440764"):
file_list.append(file_name) file_list.append(file_name)
img_list = [] img_list = []
for i in range(1000): for i in range(1000):
img_list.append(open("./image_data/n01440764/" + file_list[i]).read()) img_list.append("./image_data/n01440764/" + file_list[i])
profile_flags = False profile_flags = False
if "FLAGS_profile_client" in os.environ and os.environ[ if "FLAGS_profile_client" in os.environ and os.environ[
"FLAGS_profile_client"]: "FLAGS_profile_client"]:
profile_flags = True profile_flags = True
if args.request == "rpc": if args.request == "rpc":
reader = ImageReader()
fetch = ["score"] fetch = ["score"]
client = Client() client = Client()
client.load_client_config(args.model) client.load_client_config(args.model)
client.connect([resource["endpoint"][idx % len(resource["endpoint"])]]) client.connect([resource["endpoint"][idx % len(resource["endpoint"])]])
start = time.time() start = time.time()
for i in range(1000): for i in range(turns):
if args.batch_size >= 1: if args.batch_size >= 1:
l_start = time.time()
feed_batch = [] feed_batch = []
i_start = time.time() i_start = time.time()
for bi in range(args.batch_size): for bi in range(args.batch_size):
...@@ -69,6 +74,9 @@ def single_func(idx, resource): ...@@ -69,6 +74,9 @@ def single_func(idx, resource):
int(round(i_end * 1000000)))) int(round(i_end * 1000000))))
result = client.predict(feed=feed_batch, fetch=fetch) result = client.predict(feed=feed_batch, fetch=fetch)
l_end = time.time()
if latency_flags:
latency_list.append(l_end * 1000 - l_start * 1000)
else: else:
print("unsupport batch size {}".format(args.batch_size)) print("unsupport batch size {}".format(args.batch_size))
...@@ -77,7 +85,7 @@ def single_func(idx, resource): ...@@ -77,7 +85,7 @@ def single_func(idx, resource):
server = "http://" + resource["endpoint"][idx % len(resource[ server = "http://" + resource["endpoint"][idx % len(resource[
"endpoint"])] + "/image/prediction" "endpoint"])] + "/image/prediction"
start = time.time() start = time.time()
for i in range(1000): for i in range(turns):
if py_version == 2: if py_version == 2:
image = base64.b64encode( image = base64.b64encode(
open("./image_data/n01440764/" + file_list[i]).read()) open("./image_data/n01440764/" + file_list[i]).read())
...@@ -88,18 +96,31 @@ def single_func(idx, resource): ...@@ -88,18 +96,31 @@ def single_func(idx, resource):
r = requests.post( r = requests.post(
server, data=req, headers={"Content-Type": "application/json"}) server, data=req, headers={"Content-Type": "application/json"})
end = time.time() end = time.time()
if latency_flags:
return [[end - start], latency_list]
return [[end - start]] return [[end - start]]
if __name__ == '__main__': if __name__ == '__main__':
multi_thread_runner = MultiThreadRunner() multi_thread_runner = MultiThreadRunner()
endpoint_list = ["127.0.0.1:9393"] endpoint_list = [
#endpoint_list = endpoint_list + endpoint_list + endpoint_list "127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295"
result = multi_thread_runner.run(single_func, args.thread, ]
{"endpoint": endpoint_list}) turns = 100
start = time.time()
result = multi_thread_runner.run(
single_func, args.thread, {"endpoint": endpoint_list,
"turns": turns})
#result = single_func(0, {"endpoint": endpoint_list}) #result = single_func(0, {"endpoint": endpoint_list})
end = time.time()
total_cost = end - start
avg_cost = 0 avg_cost = 0
for i in range(args.thread): for i in range(args.thread):
avg_cost += result[0][i] avg_cost += result[0][i]
avg_cost = avg_cost / args.thread avg_cost = avg_cost / args.thread
print("average total cost {} s.".format(avg_cost)) print("total cost: {}s".format(end - start))
print("each thread cost: {}s.".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[1])
rm profile_log rm profile_log*
export CUDA_VISIBLE_DEVICES=0,1,2,3 export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_profile_server=1 export FLAGS_profile_server=1
export FLAGS_profile_client=1 export FLAGS_profile_client=1
python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog & python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim --ir_optim 2> elog > stdlog &
sleep 5 sleep 5
gpu_id=0
#save cpu and gpu utilization log
if [ -d utilization ];then
rm -rf utilization
else
mkdir utilization
fi
#warm up #warm up
$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 $PYTHONROOT/bin/python3 benchmark.py --thread 4 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo -e "import psutil\ncpu_utilization=psutil.cpu_percent(1,False)\nprint('CPU_UTILIZATION:', cpu_utilization)\n" > cpu_utilization.py
for thread_num in 4 8 16 for thread_num in 1 4 8 16
do do
for batch_size in 1 4 16 64 256 for batch_size in 1 4 16 64
do do
job_bt=`date '+%Y%m%d%H%M%S'`
nvidia-smi --id=0 --query-compute-apps=used_memory --format=csv -lms 100 > gpu_use.log 2>&1 &
nvidia-smi --id=0 --query-gpu=utilization.gpu --format=csv -lms 100 > gpu_utilization.log 2>&1 &
gpu_memory_pid=$!
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1 $PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
kill ${gpu_memory_pid}
kill `ps -ef|grep used_memory|awk '{print $2}'`
echo "model name :" $1 echo "model name :" $1
echo "thread num :" $thread_num echo "thread num :" $thread_num
echo "batch size :" $batch_size echo "batch size :" $batch_size
echo "=================Done====================" echo "=================Done===================="
echo "model name :$1" >> profile_log echo "model name :$1" >> profile_log
echo "batch size :$batch_size" >> profile_log echo "batch size :$batch_size" >> profile_log
job_et=`date '+%Y%m%d%H%M%S'`
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "MAX_GPU_MEMORY:", max}' gpu_use.log >> profile_log_$1
awk 'BEGIN {max = 0} {if(NR>1){if ($1 > max) max=$1}} END {print "GPU_UTILIZATION:", max}' gpu_utilization.log >> profile_log_$1
rm -rf gpu_use.log gpu_utilization.log
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log $PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
tail -n 8 profile >> profile_log tail -n 8 profile >> profile_log
echo "" >> profile_log_$1
done done
done done
#Divided log
awk 'BEGIN{RS="\n\n"}{i++}{print > "ResNet_log_"i}' profile_log_$1
mkdir $1_log && mv ResNet_log_* $1_log
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9 ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
...@@ -54,6 +54,7 @@ class ImageService(WebService): ...@@ -54,6 +54,7 @@ class ImageService(WebService):
score_list = fetch_map["score"] score_list = fetch_map["score"]
result = {"label": [], "prob": []} result = {"label": [], "prob": []}
for score in score_list: for score in score_list:
score = score.tolist()
max_score = max(score) max_score = max(score)
result["label"].append(self.label_dict[score.index(max_score)] result["label"].append(self.label_dict[score.index(max_score)]
.strip().replace(",", "")) .strip().replace(",", ""))
...@@ -65,7 +66,7 @@ image_service = ImageService(name="image") ...@@ -65,7 +66,7 @@ image_service = ImageService(name="image")
image_service.load_model_config(sys.argv[1]) image_service.load_model_config(sys.argv[1])
image_service.init_imagenet_setting() image_service.init_imagenet_setting()
if device == "gpu": if device == "gpu":
image_service.set_gpus("0,1") image_service.set_gpus("0")
image_service.prepare_server( image_service.prepare_server(
workdir="workdir", port=int(sys.argv[3]), device=device) workdir="workdir", port=int(sys.argv[3]), device=device)
image_service.run_rpc_service() image_service.run_rpc_service()
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_app.reader.image_reader import String2Image, Base64ToImage, Sequential
import base64
def test_String2Image():
with open("./daisy.jpg") as f:
img_str = f.read()
seq = Sequential([String2Image()])
img = seq(img_str)
assert (img.shape == (563, 500, 3))
def test_Base64ToImage():
with open("./daisy.jpg") as f:
img_str = f.read()
seq = Sequential([Base64ToImage()])
img = seq(base64.b64encode(img_str))
assert (img.shape == (563, 500, 3))
if __name__ == "__main__":
test_String2Image()
test_Base64ToImage()
...@@ -13,13 +13,14 @@ ...@@ -13,13 +13,14 @@
# limitations under the License. # limitations under the License.
# pylint: disable=doc-string-missing # pylint: disable=doc-string-missing
import os
import sys import sys
import time import time
import requests import requests
from paddle_serving_app.reader import IMDBDataset from paddle_serving_app.reader import IMDBDataset
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args from paddle_serving_client.utils import MultiThreadRunner, benchmark_args, show_latency
args = benchmark_args() args = benchmark_args()
...@@ -31,6 +32,13 @@ def single_func(idx, resource): ...@@ -31,6 +32,13 @@ def single_func(idx, resource):
with open("./test_data/part-0") as fin: with open("./test_data/part-0") as fin:
for line in fin: for line in fin:
dataset.append(line.strip()) dataset.append(line.strip())
profile_flags = False
latency_flags = False
if os.getenv("FLAGS_profile_client"):
profile_flags = True
if os.getenv("FLAGS_serving_latency"):
latency_flags = True
latency_list = []
start = time.time() start = time.time()
if args.request == "rpc": if args.request == "rpc":
client = Client() client = Client()
...@@ -67,9 +75,26 @@ def single_func(idx, resource): ...@@ -67,9 +75,26 @@ def single_func(idx, resource):
return [[end - start]] return [[end - start]]
multi_thread_runner = MultiThreadRunner() if __name__ == '__main__':
result = multi_thread_runner.run(single_func, args.thread, {}) multi_thread_runner = MultiThreadRunner()
avg_cost = 0 endpoint_list = [
for cost in result[0]: "127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295"
avg_cost += cost ]
print("total cost {} s of each thread".format(avg_cost / args.thread)) turns = 100
start = time.time()
result = multi_thread_runner.run(
single_func, args.thread, {"endpoint": endpoint_list,
"turns": turns})
end = time.time()
total_cost = end - start
avg_cost = 0
for i in range(args.thread):
avg_cost += result[0][i]
avg_cost = avg_cost / args.thread
print("total cost: {}".format(total_cost))
print("each thread cost: {}".format(avg_cost))
print("qps: {}samples/s".format(args.batch_size * args.thread * turns /
total_cost))
if os.getenv("FLAGS_serving_latency"):
show_latency(result[0])
rm profile_log rm profile_log*
for thread_num in 1 2 4 8 16 export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1
$PYTHONROOT/bin/python3 -m paddle_serving_server.serve --model $1 --port 9292 --thread 4 --mem_optim --ir_optim 2> elog > stdlog &
hostname=`echo $(hostname)|awk -F '.baidu.com' '{print $1}'`
#save cpu and gpu utilization log
if [ -d utilization ];then
rm -rf utilization
else
mkdir utilization
fi
sleep 5
#warm up
$PYTHONROOT/bin/python3 benchmark.py --thread 4 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo -e "import psutil\ncpu_utilization=psutil.cpu_percent(1,False)\nprint('CPU_UTILIZATION:', cpu_utilization)\n" > cpu_utilization.py
for thread_num in 1 4 8 16
do do
for batch_size in 1 2 4 8 16 32 64 128 256 512 for batch_size in 1 4 16 64
do do
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model imdb_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1 job_bt=`date '+%Y%m%d%H%M%S'`
echo "========================================" $PYTHONROOT/bin/python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "batch size : $batch_size" >> profile_log echo "model_name:" $1
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log echo "thread_num:" $thread_num
tail -n 1 profile >> profile_log echo "batch_size:" $batch_size
echo "=================Done===================="
echo "model_name:$1" >> profile_log_$1
echo "batch_size:$batch_size" >> profile_log_$1
job_et=`date '+%Y%m%d%H%M%S'`
$PYTHONROOT/bin/python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
$PYTHONROOT/bin/python3 cpu_utilization.py >> profile_log_$1
tail -n 8 profile >> profile_log_$1
echo "" >> profile_log_$1
done done
done done
#Divided log
awk 'BEGIN{RS="\n\n"}{i++}{print > "imdb_log_"i}' profile_log_$1
mkdir $1_log && mv imdb_log_* $1_log
ps -ef|grep 'serving'|grep -v grep|cut -c 9-15 | xargs kill -9
...@@ -29,6 +29,6 @@ imdb_dataset.load_resource(sys.argv[2]) ...@@ -29,6 +29,6 @@ imdb_dataset.load_resource(sys.argv[2])
for line in sys.stdin: for line in sys.stdin:
word_ids, label = imdb_dataset.get_words_and_label(line) word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids} feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"] fetch = ["prediction"]
fetch_map = client.predict(feed=feed, fetch=fetch) fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][0], label[0])) print("{} {}".format(fetch_map["prediction"][0], label[0]))
...@@ -32,11 +32,7 @@ for i in range(3): ...@@ -32,11 +32,7 @@ for i in range(3):
line = 'i am very sad | 0' line = 'i am very sad | 0'
word_ids, label = imdb_dataset.get_words_and_label(line) word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids} feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"] fetch = ["prediction"]
fetch_maps = client.predict(feed=feed, fetch=fetch) fetch_maps = client.predict(feed=feed, fetch=fetch)
if len(fetch_maps) == 1: for model, fetch_map in fetch_maps.items():
print("step: {}, res: {}".format(i, fetch_maps['prediction'][0][1])) print("step: {}, model: {}, res: {}".format(i, model, fetch_map))
else:
for model, fetch_map in fetch_maps.items():
print("step: {}, model: {}, res: {}".format(i, model, fetch_map[
'prediction'][0][1]))
# OCR # OCR
(English|[简体中文](./README_CN.md))
## Get Model ## Get Model
``` ```
python -m paddle_serving_app.package --get_model ocr_rec python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz tar -xzvf ocr_rec.tar.gz
python -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
## Get Dataset (Optional)
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/ocr/test_imgs.tar
tar xf test_imgs.tar
``` ```
## RPC Service ## Web Service
### Start Service ### Start Service
``` ```
python -m paddle_serving_server.serve --model ocr_rec_model --port 9292 python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
python ocr_web_server.py
``` ```
### Client Prediction ### Client Prediction
```
python ocr_web_client.py
```
If you want a faster web service, please try the Web Debugger Service below.
## Web Debugger Service
```
python ocr_debugger_server.py
```
## Web Debugger Client Prediction
```
python ocr_web_client.py
```
## Benchmark
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz * 40
GPU: Nvidia Tesla V100 * 1
Dataset: RCTW 500 sample images
| engine | client read image(ms) | client-server trans time(ms) | server read image(ms) | det pre(ms) | det infer(ms) | det post(ms) | rec pre(ms) | rec infer(ms) | rec post(ms) | server-client trans time(ms) | server side time consumption(ms) | server side overhead(ms) | total time(ms) |
|------------------------------|----------------|----------------------------|------------------|--------------------|------------------|--------------------|--------------------|------------------|--------------------|--------------------------|--------------------|--------------|---------------|
| Serving web service | 8.69 | 13.41 | 109.97 | 2.82 | 87.76 | 4.29 | 3.98 | 78.51 | 3.66 | 4.12 | 181.02 | 136.49 | 317.51 |
| Serving Debugger web service | 8.73 | 16.42 | 115.27 | 2.93 | 20.63 | 3.97 | 4.48 | 13.84 | 3.60 | 6.91 | 49.45 | 147.33 | 196.78 |
## Appendix: Det or Rec only
If you only need to detect text regions in images, or only need to recognize text from already-cropped images, we also provide standalone Det and Rec servers.
### Det Server
```
python det_web_server.py
#or
python det_debugger_server.py
```
### Det Client
```
# also use ocr_web_client.py
python ocr_web_client.py
```
### Rec Server
```
python rec_web_server.py
#or
python rec_debugger_server.py
```
### Rec Client
``` ```
python test_ocr_rec_client.py python rec_web_client.py
``` ```
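For reference, the web client above is essentially the following HTTP request loop (a trimmed sketch of ocr_web_client.py; it assumes the service is listening on port 9292 and that test images are under imgs/).

```python
# Minimal sketch of the OCR web client (see ocr_web_client.py for the full script).
import os
import json
import base64
import requests

url = "http://127.0.0.1:9292/ocr/prediction"
headers = {"Content-type": "application/json"}
test_img_dir = "imgs/"

for img_file in os.listdir(test_img_dir):
    with open(os.path.join(test_img_dir, img_file), 'rb') as f:
        image = base64.b64encode(f.read()).decode('utf8')
    # The web service expects base64-encoded images under the "image" feed key.
    data = {"feed": [{"image": image}], "fetch": ["res"]}
    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    print(r.json())
```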
# OCR Service
([English](./README.md)|简体中文)
## Get Model
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
python -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
## Get Dataset (Optional)
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/ocr/test_imgs.tar
tar xf test_imgs.tar
```
### Client Prediction
```
python ocr_rpc_client.py
```
## Web Service
### Start the Service
```
python -m paddle_serving_server_gpu.serve --model ocr_det_model --port 9293 --gpu_id 0
python ocr_web_server.py
```
### Start the Client
```
python ocr_web_client.py
```
If you need faster execution, please try the Debugger version of the web service.
## Start the Debugger Web Service
```
python ocr_debugger_server.py
```
## Start the Client
```
python ocr_web_client.py
```
## Benchmark
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz * 40
GPU: Nvidia Tesla V100 * 1
Dataset: RCTW, 500 test images
| engine | client read image(ms) | client-server trans time(ms) | server read image(ms) | det pre(ms) | det infer(ms) | det post(ms) | rec pre(ms) | rec infer(ms) | rec post(ms) | server-client trans time(ms) | server side time consumption(ms) | server side overhead(ms) | total time(ms) |
|------------------------------|----------------|----------------------------|------------------|--------------------|------------------|--------------------|--------------------|------------------|--------------------|--------------------------|--------------------|--------------|---------------|
| Serving web service | 8.69 | 13.41 | 109.97 | 2.82 | 87.76 | 4.29 | 3.98 | 78.51 | 3.66 | 4.12 | 181.02 | 136.49 | 317.51 |
| Serving Debugger web service | 8.73 | 16.42 | 115.27 | 2.93 | 20.63 | 3.97 | 4.48 | 13.84 | 3.60 | 6.91 | 49.45 | 147.33 | 196.78 |
## Appendix: Start the Detection or Recognition Service Alone
If you want to start only the detection or only the recognition service, we also provide code for launching each service separately.
### Start the Detection Service
```
python det_web_server.py
#or
python det_debugger_server.py
```
### Detection Service Client
```
# also use ocr_web_client.py
python ocr_web_client.py
```
### Start the Recognition Service
```
python rec_web_server.py
#or
python rec_debugger_server.py
```
### Recognition Service Client
```
python rec_web_client.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
from paddle_serving_server_gpu.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_det(self):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, feed=[], fetch=[]):
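# Decode the base64 image from the request, apply the detection preprocessing
# pipeline, and record the original/resized shapes so postprocess can rescale the boxes.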
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = im.shape
det_img = self.det_preprocess(im)
_, self.new_h, self.new_w = det_img.shape
return {"image": det_img[np.newaxis, :].copy()}, ["concat_1.tmp_0"]
def postprocess(self, feed={}, fetch=[], fetch_map=None):
det_out = fetch_map["concat_1.tmp_0"]
ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
return {"dt_boxes": dt_boxes.tolist()}
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_det_model")
ocr_service.set_gpus("0")
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.init_det()
ocr_service.run_debugger_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
from paddle_serving_server_gpu.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_det(self):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.filter_func = FilterBoxes(10, 10)
self.post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
def preprocess(self, feed=[], fetch=[]):
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
self.ori_h, self.ori_w, _ = im.shape
det_img = self.det_preprocess(im)
_, self.new_h, self.new_w = det_img.shape
print(det_img)
return {"image": det_img}, ["concat_1.tmp_0"]
def postprocess(self, feed={}, fetch=[], fetch_map=None):
det_out = fetch_map["concat_1.tmp_0"]
ratio_list = [
float(self.new_h) / self.ori_h, float(self.new_w) / self.ori_w
]
dt_boxes_list = self.post_func(det_out, [ratio_list])
dt_boxes = self.filter_func(dt_boxes_list[0], [self.ori_h, self.ori_w])
return {"dt_boxes": dt_boxes.tolist()}
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_det_model")
ocr_service.set_gpus("0")
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.init_det()
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from paddle_serving_server_gpu.web_service import WebService
from paddle_serving_app.local_predict import Debugger
import time
import re
import base64
class OCRService(WebService):
def init_det_debugger(self, det_model_config):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.det_client = Debugger()
self.det_client.load_model_config(
det_model_config, gpu=True, profile=False)
self.ocr_reader = OCRReader()
def preprocess(self, feed=[], fetch=[]):
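# End-to-end pipeline in one request: run the detection model locally via Debugger,
# crop every detected text box from the original image, and batch the normalized
# crops as the feed for the recognition model behind this service.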
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
ori_h, ori_w, _ = im.shape
det_img = self.det_preprocess(im)
_, new_h, new_w = det_img.shape
det_img = det_img[np.newaxis, :]
det_img = det_img.copy()
det_out = self.det_client.predict(
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
filter_func = FilterBoxes(10, 10)
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
sorted_boxes = SortedBoxes()
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
dt_boxes = sorted_boxes(dt_boxes)
get_rotate_crop_image = GetRotateCropImage()
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
if len(img_list) == 0:
return [], []
_, w, h = self.ocr_reader.resize_norm_img(img_list[0],
max_wh_ratio).shape
imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
for id, img in enumerate(img_list):
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
imgs[id] = norm_img
feed = {"image": imgs.copy()}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": res_lst}
return res
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_rec_model")
ocr_service.prepare_server(workdir="workdir", port=9292)
ocr_service.init_det_debugger(det_model_config="ocr_det_model")
ocr_service.run_debugger_service(gpu=True)
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import os, sys
import time
def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "imgs/"
for img_file in os.listdir(test_img_dir):
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read()
image = cv2_to_base64(image_data1)
data = {"feed": [{"image": image}], "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from paddle_serving_server_gpu.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_det_client(self, det_port, det_client_config):
self.det_preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
self.det_client = Client()
self.det_client.load_client_config(det_client_config)
self.det_client.connect(["127.0.0.1:{}".format(det_port)])
self.ocr_reader = OCRReader()
def preprocess(self, feed=[], fetch=[]):
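# Decode the image, call the standalone detection service over RPC (see
# init_det_client), then crop each detected box and build one recognition feed per crop.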
data = base64.b64decode(feed[0]["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
ori_h, ori_w, _ = im.shape
det_img = self.det_preprocess(im)
det_out = self.det_client.predict(
feed={"image": det_img}, fetch=["concat_1.tmp_0"])
_, new_h, new_w = det_img.shape
filter_func = FilterBoxes(10, 10)
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
sorted_boxes = SortedBoxes()
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
dt_boxes_list = post_func(det_out["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
dt_boxes = sorted_boxes(dt_boxes)
get_rotate_crop_image = GetRotateCropImage()
feed_list = []
img_list = []
max_wh_ratio = 0
for i, dtbox in enumerate(dt_boxes):
boximg = get_rotate_crop_image(im, dt_boxes[i])
img_list.append(boximg)
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed_list, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": res_lst}
return res
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_rec_model")
ocr_service.set_gpus("0")
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.init_det_client(
det_port=9293,
det_client_config="ocr_det_client/serving_client_conf.prototxt")
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from paddle_serving_server_gpu.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_rec(self):
self.ocr_reader = OCRReader()
def preprocess(self, feed=[], fetch=[]):
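# Decode every base64 image in the request and pack them into a single batch
# tensor sized to the widest crop (by width/height ratio) for the recognition model.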
img_list = []
for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im)
max_wh_ratio = 0
for i, boximg in enumerate(img_list):
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
_, w, h = self.ocr_reader.resize_norm_img(img_list[0],
max_wh_ratio).shape
imgs = np.zeros((len(img_list), 3, w, h)).astype('float32')
for i, img in enumerate(img_list):
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
imgs[i] = norm_img
feed = {"image": imgs.copy()}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": res_lst}
return res
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_rec_model")
ocr_service.set_gpus("0")
ocr_service.init_rec()
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.run_debugger_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# -*- coding: utf-8 -*-
import requests
import json
import cv2
import base64
import os, sys
import time
def cv2_to_base64(image):
#data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(image).decode(
'utf8') #data.tostring()).decode('utf8')
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:9292/ocr/prediction"
test_img_dir = "rec_img/"
for img_file in os.listdir(test_img_dir):
with open(os.path.join(test_img_dir, img_file), 'rb') as file:
image_data1 = file.read()
image = cv2_to_base64(image_data1)
#data = {"feed": [{"image": image}], "fetch": ["res"]}
data = {"feed": [{"image": image}] * 3, "fetch": ["res"]}
r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import OCRReader
import cv2
import sys
import numpy as np
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from paddle_serving_server_gpu.web_service import WebService
import time
import re
import base64
class OCRService(WebService):
def init_rec(self):
self.ocr_reader = OCRReader()
def preprocess(self, feed=[], fetch=[]):
# TODO: to handle batch rec images
img_list = []
for feed_data in feed:
data = base64.b64decode(feed_data["image"].encode('utf8'))
data = np.fromstring(data, np.uint8)
im = cv2.imdecode(data, cv2.IMREAD_COLOR)
img_list.append(im)
feed_list = []
max_wh_ratio = 0
for i, boximg in enumerate(img_list):
h, w = boximg.shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for img in img_list:
norm_img = self.ocr_reader.resize_norm_img(img, max_wh_ratio)
feed = {"image": norm_img}
feed_list.append(feed)
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
return feed_list, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
rec_res = self.ocr_reader.postprocess(fetch_map, with_score=True)
res_lst = []
for res in rec_res:
res_lst.append(res[0])
res = {"res": res_lst}
return res
ocr_service = OCRService(name="ocr")
ocr_service.load_model_config("ocr_rec_model")
ocr_service.set_gpus("0")
ocr_service.init_rec()
ocr_service.prepare_server(workdir="workdir", port=9292, device="gpu", gpuid=0)
ocr_service.run_rpc_service()
ocr_service.run_web_service()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, ResizeByFactor
from paddle_serving_app.reader import Div, Normalize, Transpose
from paddle_serving_app.reader import DBPostProcess, FilterBoxes
client = Client()
client.load_client_config("ocr_det_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9494"])
read_image_file = File2Image()
preprocess = Sequential([
ResizeByFactor(32, 960), Div(255),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]), Transpose(
(2, 0, 1))
])
post_func = DBPostProcess({
"thresh": 0.3,
"box_thresh": 0.5,
"max_candidates": 1000,
"unclip_ratio": 1.5,
"min_size": 3
})
filter_func = FilterBoxes(10, 10)
img_path = sys.argv[1]  # path of the image to run detection on, passed on the command line
img = read_image_file(img_path)
ori_h, ori_w, _ = img.shape
img = preprocess(img)
new_h, new_w, _ = img.shape
ratio_list = [float(new_h) / ori_h, float(new_w) / ori_w]
outputs = client.predict(feed={"image": img}, fetch=["concat_1.tmp_0"])
dt_boxes_list = post_func(outputs["concat_1.tmp_0"], [ratio_list])
dt_boxes = filter_func(dt_boxes_list[0], [ori_h, ori_w])
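# --- Optional follow-up (sketch, not part of the original example) ---
# The detected boxes are normally handed to a recognition model next. The lines
# below show one way to turn `dt_boxes` into rotated text crops using the
# SortedBoxes and GetRotateCropImage readers from paddle_serving_app; they assume
# the original (unpreprocessed) image is simply re-read from `img_path`.
from paddle_serving_app.reader import SortedBoxes, GetRotateCropImage
import numpy as np

ori_im = read_image_file(img_path)  # original BGR image, before resize/normalize
sorter = SortedBoxes()
cropper = GetRotateCropImage()
crop_list = []
for box in sorter(dt_boxes):
    # GetRotateCropImage expects float32 corner points and modifies them in place,
    # so pass a fresh float32 copy of each box.
    crop_list.append(cropper(ori_im, np.array(box, dtype=np.float32)))
print("got {} text crops".format(len(crop_list)))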
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.pipeline import Analyst
import json
import logging
import sys
logging.basicConfig(level=logging.INFO)
if __name__ == "__main__":
if len(sys.argv) < 3:
print("Usage: python analyse.py <log_filename> <trace_filename>")
exit(1)
log_filename = sys.argv[1]
trace_filename = sys.argv[2]
analyst = Analyst(log_filename)
analyst.save_trace(trace_filename)
op_analyst = analyst.get_op_analyst()
op_concurrency = op_analyst.concurrency_analysis("analyse.yaml")
print(json.dumps(op_concurrency, indent=2, separators=(',', ':')))
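# Note (not part of the original script): the trace written by save_trace() follows the
# Chrome trace-event format, so it can typically be visualized by loading it into
# chrome://tracing.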
port: 18080
worker_num: 1
build_dag_each_worker: false
dag:
is_thread_op: true
client_type: brpc
retry: 1
use_profile: false
wget --no-check-certificate https://fleet.bj.bcebos.com/text_classification_data.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz
tar -zxvf text_classification_data.tar.gz
tar -zxvf imdb_model.tar.gz
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client.pipeline import PipelineClient
import numpy as np
client = PipelineClient()
client.connect(['127.0.0.1:18080'])
words = 'i am very sad | 0'
futures = []
for i in range(100):
futures.append(
client.predict(
feed_dict={"words": words}, fetch=["prediction"], asyn=True))
for f in futures:
res = f.result()
if res["ecode"] != 0:
print("predict failed: {}".format(res))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server.pipeline import PipelineServer
from paddle_serving_server.pipeline.proto import pipeline_service_pb2
from paddle_serving_server.pipeline.channel import ChannelDataEcode
import numpy as np
import logging
from paddle_serving_app.reader import IMDBDataset
logging.basicConfig(level=logging.DEBUG)
_LOGGER = logging.getLogger()
class ImdbRequestOp(RequestOp):
def init_op(self):
self.imdb_dataset = IMDBDataset()
self.imdb_dataset.load_resource('imdb.vocab')
def unpack_request_package(self, request):
dictdata = {}
for idx, key in enumerate(request.key):
if key != "words":
continue
words = request.value[idx]
word_ids, _ = self.imdb_dataset.get_words_and_label(words)
dictdata[key] = np.array(word_ids)
return dictdata
class CombineOp(Op):
def preprocess(self, input_data):
combined_prediction = 0
for op_name, data in input_data.items():
_LOGGER.info("{}: {}".format(op_name, data["prediction"]))
combined_prediction += data["prediction"]
data = {"prediction": combined_prediction / 2}
return data
class ImdbResponseOp(ResponseOp):
# Here ImdbResponseOp is consistent with the default ResponseOp implementation
def pack_response_package(self, channeldata):
resp = pipeline_service_pb2.Response()
resp.ecode = channeldata.ecode
if resp.ecode == ChannelDataEcode.OK.value:
feed = channeldata.parse()
# ndarray to string
for name, var in feed.items():
resp.value.append(var.__repr__())
resp.key.append(name)
else:
resp.error_info = channeldata.error_info
return resp
read_op = ImdbRequestOp()
bow_op = Op(name="bow",
input_ops=[read_op],
server_endpoints=["127.0.0.1:9393"],
fetch_list=["prediction"],
client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
concurrency=1,
timeout=-1,
retry=1)
cnn_op = Op(name="cnn",
input_ops=[read_op],
server_endpoints=["127.0.0.1:9292"],
fetch_list=["prediction"],
client_config="imdb_cnn_client_conf/serving_client_conf.prototxt",
concurrency=1,
timeout=-1,
retry=1)
combine_op = CombineOp(
name="combine",
input_ops=[bow_op, cnn_op],
concurrency=5,
timeout=-1,
retry=1)
# fetch output of bow_op
# response_op = ImdbResponseOp(input_ops=[bow_op])
# fetch output of combine_op
response_op = ImdbResponseOp(input_ops=[combine_op])
# use default ResponseOp implementation
# response_op = ResponseOp(input_ops=[combine_op])
server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server('config.yml')
server.run_server()
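# Note (sketch, not part of the original script): each Op above points at an ordinary
# Paddle Serving model service ("bow" on 127.0.0.1:9393, "cnn" on 127.0.0.1:9292),
# and those services must already be running before this pipeline server starts. A
# backend can be sanity-checked directly with the plain client, run as a separate
# script since run_server() above blocks; the feed/fetch alias names ("words",
# "prediction") are assumptions based on the RequestOp and Op definitions above:
#
#   from paddle_serving_client import Client
#   from paddle_serving_app.reader import IMDBDataset
#   dataset = IMDBDataset()
#   dataset.load_resource('imdb.vocab')
#   word_ids, _ = dataset.get_words_and_label('i am very sad | 0')
#   client = Client()
#   client.load_client_config('imdb_cnn_client_conf/serving_client_conf.prototxt')
#   client.connect(['127.0.0.1:9292'])
#   print(client.predict(feed={"words": word_ids}, fetch=["prediction"]))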
...@@ -31,7 +31,7 @@ with open(profile_file) as f: ...@@ -31,7 +31,7 @@ with open(profile_file) as f:
if line[0] == "PROFILE": if line[0] == "PROFILE":
prase(line[2]) prase(line[2])
print("thread num :{}".format(thread_num)) print("thread_num: {}".format(thread_num))
for name in time_dict: for name in time_dict:
print("{} cost :{} s in each thread ".format(name, time_dict[name] / ( print("{} cost: {}s in each thread ".format(name, time_dict[name] / (
1000000.0 * float(thread_num)))) 1000000.0 * float(thread_num))))
...@@ -16,10 +16,16 @@ def prase(pid_str, time_str, counter): ...@@ -16,10 +16,16 @@ def prase(pid_str, time_str, counter):
if len(name_list) == 2: if len(name_list) == 2:
name = name_list[0] name = name_list[0]
else: else:
name = name_list[0] + "_" + name_list[1] name = "_".join(name_list[:-1])
name_list = name.split("#")
if len(name_list) > 1:
tid = name_list[-1]
name = "#".join(name_list[:-1])
else:
tid = 0
event_dict = {} event_dict = {}
event_dict["name"] = name event_dict["name"] = name
event_dict["tid"] = 0 event_dict["tid"] = tid
event_dict["pid"] = pid event_dict["pid"] = pid
event_dict["ts"] = ts event_dict["ts"] = ts
event_dict["ph"] = ph event_dict["ph"] = ph
...@@ -37,6 +43,8 @@ if __name__ == "__main__": ...@@ -37,6 +43,8 @@ if __name__ == "__main__":
for line in f.readlines(): for line in f.readlines():
line = line.strip().split("\t") line = line.strip().split("\t")
if line[0] == "PROFILE": if line[0] == "PROFILE":
if len(line) < 2:
continue
trace_list = prase(line[1], line[2], counter) trace_list = prase(line[1], line[2], counter)
counter += 1 counter += 1
for trace in trace_list: for trace in trace_list:
......
# Yolov4 Detection Service
([简体中文](README_CN.md)|English)
## Get Model
```
python -m paddle_serving_app.package --get_model yolov4
tar -xzvf yolov4.tar.gz
```
## Start RPC Service
```
python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0
```
## Prediction
```
python test_client.py 000000570688.jpg
```
After the prediction is completed, a JSON file with the prediction results and an image with the detected bounding boxes drawn on it will be generated in the `./output` folder.
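If you want to inspect the generated files programmatically, a minimal sketch (assuming only that the JSON results land under `./output`, as described above, and nothing about their file names or schema) is:
```
import glob, json

# print every JSON result file produced by the prediction step
for path in glob.glob("output/*.json"):
    with open(path) as f:
        print(path, json.load(f))
```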
# Yolov4 Detection Service
(简体中文|[English](README.md))
## Get Model
```
python -m paddle_serving_app.package --get_model yolov4
tar -xzvf yolov4.tar.gz
```
## Start RPC Service
```
python -m paddle_serving_server_gpu.serve --model yolov4_model --port 9393 --gpu_ids 0
```
## Prediction
```
python test_client.py 000000570688.jpg
```
After the prediction is completed, a JSON file with the prediction results and an image with the detected bounding boxes drawn on it will be generated in the `./output` folder.
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import numpy as np
from paddle_serving_client import Client
from paddle_serving_app.reader import *
import cv2
preprocess = Sequential([
File2Image(), BGR2RGB(), Resize(
(608, 608), interpolation=cv2.INTER_LINEAR), Div(255.0), Transpose(
(2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output", [608, 608])
client = Client()
client.load_client_config("yolov4_client/serving_client_conf.prototxt")
client.connect(['127.0.0.1:9393'])
im = preprocess(sys.argv[1])
print(im.shape)
fetch_map = client.predict(
feed={
"image": im,
"im_size": np.array(list(im.shape[1:])),
},
fetch=["save_infer_model/scale_0.tmp_0"])
fetch_map["image"] = sys.argv[1]
postprocess(fetch_map)
...@@ -70,9 +70,10 @@ class Debugger(object): ...@@ -70,9 +70,10 @@ class Debugger(object):
config.enable_use_gpu(100, 0) config.enable_use_gpu(100, 0)
if profile: if profile:
config.enable_profile() config.enable_profile()
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.set_cpu_math_library_num_threads(cpu_num) config.set_cpu_math_library_num_threads(cpu_num)
config.switch_ir_optim(False) config.switch_ir_optim(False)
config.switch_use_feed_fetch_ops(False)
self.predictor = create_paddle_predictor(config) self.predictor = create_paddle_predictor(config)
def predict(self, feed=None, fetch=None): def predict(self, feed=None, fetch=None):
...@@ -113,20 +114,30 @@ class Debugger(object): ...@@ -113,20 +114,30 @@ class Debugger(object):
"Fetch names should not be empty or out of saved fetch list.") "Fetch names should not be empty or out of saved fetch list.")
return {} return {}
inputs = [] input_names = self.predictor.get_input_names()
for name in self.feed_names_: for name in input_names:
if isinstance(feed[name], list): if isinstance(feed[name], list):
feed[name] = np.array(feed[name]).reshape(self.feed_shapes_[ feed[name] = np.array(feed[name]).reshape(self.feed_shapes_[
name]) name])
if self.feed_types_[name] == 0: if self.feed_types_[name] == 0:
feed[name] = feed[name].astype("int64") feed[name] = feed[name].astype("int64")
else: else:
feed[name] = feed[name].astype("float32") feed[name] = feed[name].astype("float32")
inputs.append(PaddleTensor(feed[name][np.newaxis, :])) input_tensor = self.predictor.get_input_tensor(name)
input_tensor.copy_from_cpu(feed[name])
outputs = self.predictor.run(inputs) output_tensors = []
output_names = self.predictor.get_output_names()
for output_name in output_names:
output_tensor = self.predictor.get_output_tensor(output_name)
output_tensors.append(output_tensor)
outputs = []
self.predictor.zero_copy_run()
for output_tensor in output_tensors:
output = output_tensor.copy_to_cpu()
outputs.append(output)
fetch_map = {} fetch_map = {}
for name in fetch: for i, name in enumerate(fetch):
fetch_map[name] = outputs[self.fetch_names_to_idx_[ fetch_map[name] = outputs[i]
name]].as_ndarray() if len(output_tensors[i].lod()) > 0:
fetch_map[name + ".lod"] = output_tensors[i].lod()[0]
return fetch_map return fetch_map
...@@ -24,13 +24,15 @@ class ServingModels(object): ...@@ -24,13 +24,15 @@ class ServingModels(object):
"SentimentAnalysis"] = ["senta_bilstm", "senta_bow", "senta_cnn"] "SentimentAnalysis"] = ["senta_bilstm", "senta_bow", "senta_cnn"]
self.model_dict["SemanticRepresentation"] = ["ernie"] self.model_dict["SemanticRepresentation"] = ["ernie"]
self.model_dict["ChineseWordSegmentation"] = ["lac"] self.model_dict["ChineseWordSegmentation"] = ["lac"]
self.model_dict["ObjectDetection"] = ["faster_rcnn"] self.model_dict[
"ObjectDetection"] = ["faster_rcnn", "yolov4", "blazeface"]
self.model_dict["ImageSegmentation"] = [ self.model_dict["ImageSegmentation"] = [
"unet", "deeplabv3", "deeplabv3+cityscapes" "unet", "deeplabv3", "deeplabv3+cityscapes"
] ]
self.model_dict["ImageClassification"] = [ self.model_dict["ImageClassification"] = [
"resnet_v2_50_imagenet", "mobilenet_v2_imagenet" "resnet_v2_50_imagenet", "mobilenet_v2_imagenet"
] ]
self.model_dict["TextDetection"] = ["ocr_det"]
self.model_dict["OCR"] = ["ocr_rec"] self.model_dict["OCR"] = ["ocr_rec"]
image_class_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/" image_class_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/"
...@@ -40,6 +42,7 @@ class ServingModels(object): ...@@ -40,6 +42,7 @@ class ServingModels(object):
senta_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/" senta_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/"
semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/" semantic_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/"
wordseg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/" wordseg_url = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/"
ocr_det_url = "https://paddle-serving.bj.bcebos.com/ocr/"
self.url_dict = {} self.url_dict = {}
...@@ -55,6 +58,7 @@ class ServingModels(object): ...@@ -55,6 +58,7 @@ class ServingModels(object):
pack_url(self.model_dict, "ImageSegmentation", image_seg_url) pack_url(self.model_dict, "ImageSegmentation", image_seg_url)
pack_url(self.model_dict, "ImageClassification", image_class_url) pack_url(self.model_dict, "ImageClassification", image_class_url)
pack_url(self.model_dict, "OCR", ocr_url) pack_url(self.model_dict, "OCR", ocr_url)
pack_url(self.model_dict, "TextDetection", ocr_det_url)
def get_model_list(self): def get_model_list(self):
return self.model_dict return self.model_dict
......
...@@ -13,8 +13,9 @@ ...@@ -13,8 +13,9 @@
# limitations under the License. # limitations under the License.
from .chinese_bert_reader import ChineseBertReader from .chinese_bert_reader import ChineseBertReader
from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize
from .image_reader import CenterCrop, Resize, Transpose, Div, RGB2BGR, BGR2RGB from .image_reader import CenterCrop, Resize, Transpose, Div, RGB2BGR, BGR2RGB, ResizeByFactor
from .image_reader import RCNNPostprocess, SegPostprocess, PadStride from .image_reader import RCNNPostprocess, SegPostprocess, PadStride
from .image_reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
from .lac_reader import LACReader from .lac_reader import LACReader
from .senta_reader import SentaReader from .senta_reader import SentaReader
from .imdb_reader import IMDBDataset from .imdb_reader import IMDBDataset
......
...@@ -29,6 +29,7 @@ def normalize(img, mean, std, channel_first): ...@@ -29,6 +29,7 @@ def normalize(img, mean, std, channel_first):
else: else:
img_mean = np.array(mean).reshape((1, 1, 3)) img_mean = np.array(mean).reshape((1, 1, 3))
img_std = np.array(std).reshape((1, 1, 3)) img_std = np.array(std).reshape((1, 1, 3))
img = np.array(img).astype("float32")
img -= img_mean img -= img_mean
img /= img_std img /= img_std
return img return img
......
...@@ -11,6 +11,9 @@ ...@@ -11,6 +11,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2 import cv2
import os import os
import numpy as np import numpy as np
...@@ -18,6 +21,8 @@ import base64 ...@@ -18,6 +21,8 @@ import base64
import sys import sys
from . import functional as F from . import functional as F
from PIL import Image, ImageDraw from PIL import Image, ImageDraw
from shapely.geometry import Polygon
import pyclipper
import json import json
_cv2_interpolation_to_str = {cv2.INTER_LINEAR: "cv2.INTER_LINEAR", None: "None"} _cv2_interpolation_to_str = {cv2.INTER_LINEAR: "cv2.INTER_LINEAR", None: "None"}
...@@ -43,6 +48,196 @@ def generate_colormap(num_classes): ...@@ -43,6 +48,196 @@ def generate_colormap(num_classes):
return color_map return color_map
class DBPostProcess(object):
"""
The post process for Differentiable Binarization (DB).
"""
def __init__(self, params):
self.thresh = params['thresh']
self.box_thresh = params['box_thresh']
self.max_candidates = params['max_candidates']
self.unclip_ratio = params['unclip_ratio']
self.min_size = 3
def boxes_from_bitmap(self, pred, _bitmap, dest_width, dest_height):
'''
_bitmap: single map with shape (1, H, W),
whose values are binarized as {0, 1}
'''
bitmap = _bitmap
height, width = bitmap.shape
outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
cv2.CHAIN_APPROX_SIMPLE)
if len(outs) == 3:
img, contours, _ = outs[0], outs[1], outs[2]
elif len(outs) == 2:
contours, _ = outs[0], outs[1]
num_contours = min(len(contours), self.max_candidates)
boxes = np.zeros((num_contours, 4, 2), dtype=np.int16)
scores = np.zeros((num_contours, ), dtype=np.float32)
for index in range(num_contours):
contour = contours[index]
points, sside = self.get_mini_boxes(contour)
if sside < self.min_size:
continue
points = np.array(points)
score = self.box_score_fast(pred, points.reshape(-1, 2))
if self.box_thresh > score:
continue
box = self.unclip(points).reshape(-1, 1, 2)
box, sside = self.get_mini_boxes(box)
if sside < self.min_size + 2:
continue
box = np.array(box)
if not isinstance(dest_width, int):
dest_width = dest_width.item()
dest_height = dest_height.item()
box[:, 0] = np.clip(
np.round(box[:, 0] / width * dest_width), 0, dest_width)
box[:, 1] = np.clip(
np.round(box[:, 1] / height * dest_height), 0, dest_height)
boxes[index, :, :] = box.astype(np.int16)
scores[index] = score
return boxes, scores
def unclip(self, box):
unclip_ratio = self.unclip_ratio
poly = Polygon(box)
distance = poly.area * unclip_ratio / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.array(offset.Execute(distance))
return expanded
def get_mini_boxes(self, contour):
bounding_box = cv2.minAreaRect(contour)
points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
index_1, index_2, index_3, index_4 = 0, 1, 2, 3
if points[1][1] > points[0][1]:
index_1 = 0
index_4 = 1
else:
index_1 = 1
index_4 = 0
if points[3][1] > points[2][1]:
index_2 = 2
index_3 = 3
else:
index_2 = 3
index_3 = 2
box = [
points[index_1], points[index_2], points[index_3], points[index_4]
]
return box, min(bounding_box[1])
def box_score_fast(self, bitmap, _box):
h, w = bitmap.shape[:2]
box = _box.copy()
xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int), 0, w - 1)
xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int), 0, w - 1)
ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int), 0, h - 1)
ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int), 0, h - 1)
mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
box[:, 0] = box[:, 0] - xmin
box[:, 1] = box[:, 1] - ymin
cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0]
def __call__(self, pred, ratio_list):
pred = pred[:, 0, :, :]
segmentation = pred > self.thresh
boxes_batch = []
for batch_index in range(pred.shape[0]):
height, width = pred.shape[-2:]
tmp_boxes, tmp_scores = self.boxes_from_bitmap(
pred[batch_index], segmentation[batch_index], width, height)
boxes = []
for k in range(len(tmp_boxes)):
if tmp_scores[k] > self.box_thresh:
boxes.append(tmp_boxes[k])
if len(boxes) > 0:
boxes = np.array(boxes)
ratio_h, ratio_w = ratio_list[batch_index]
boxes[:, :, 0] = boxes[:, :, 0] / ratio_w
boxes[:, :, 1] = boxes[:, :, 1] / ratio_h
boxes_batch.append(boxes)
return boxes_batch
def __repr__(self):
return self.__class__.__name__ + \
" thresh: {1}, box_thresh: {2}, max_candidates: {3}, unclip_ratio: {4}, min_size: {5}".format(
self.thresh, self.box_thresh, self.max_candidates, self.unclip_ratio, self.min_size)
class FilterBoxes(object):
def __init__(self, width, height):
self.filter_width = width
self.filter_height = height
def order_points_clockwise(self, pts):
"""
reference from: https://github.com/jrosebr1/imutils/blob/master/imutils/perspective.py
# sort the points based on their x-coordinates
"""
xSorted = pts[np.argsort(pts[:, 0]), :]
# grab the left-most and right-most points from the sorted
        # x-coordinate points
leftMost = xSorted[:2, :]
rightMost = xSorted[2:, :]
# now, sort the left-most coordinates according to their
# y-coordinates so we can grab the top-left and bottom-left
# points, respectively
leftMost = leftMost[np.argsort(leftMost[:, 1]), :]
(tl, bl) = leftMost
rightMost = rightMost[np.argsort(rightMost[:, 1]), :]
(tr, br) = rightMost
rect = np.array([tl, tr, br, bl], dtype="float32")
return rect
def clip_det_res(self, points, img_height, img_width):
for pno in range(4):
points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
points[pno, 1] = int(min(max(points[pno, 1], 0), img_height - 1))
return points
def __call__(self, dt_boxes, image_shape):
img_height, img_width = image_shape[0:2]
dt_boxes_new = []
for box in dt_boxes:
box = self.order_points_clockwise(box)
box = self.clip_det_res(box, img_height, img_width)
rect_width = int(np.linalg.norm(box[0] - box[1]))
rect_height = int(np.linalg.norm(box[0] - box[3]))
if rect_width <= self.filter_width or \
rect_height <= self.filter_height:
continue
dt_boxes_new.append(box)
dt_boxes = np.array(dt_boxes_new)
return dt_boxes
def __repr__(self):
return self.__class__.__name__ + " filter_width: {1}, filter_height: {2}".format(
self.filter_width, self.filter_height)
class SegPostprocess(object): class SegPostprocess(object):
def __init__(self, class_num): def __init__(self, class_num):
self.class_num = class_num self.class_num = class_num
...@@ -77,8 +272,7 @@ class SegPostprocess(object): ...@@ -77,8 +272,7 @@ class SegPostprocess(object):
result_png = score_png result_png = score_png
result_png = cv2.resize( result_png = cv2.resize(
result_png, result_png, (ori_shape[1], ori_shape[0]),
ori_shape[:2],
fx=0, fx=0,
fy=0, fy=0,
interpolation=cv2.INTER_CUBIC) interpolation=cv2.INTER_CUBIC)
...@@ -86,10 +280,11 @@ class SegPostprocess(object): ...@@ -86,10 +280,11 @@ class SegPostprocess(object):
class RCNNPostprocess(object): class RCNNPostprocess(object):
def __init__(self, label_file, output_dir): def __init__(self, label_file, output_dir, resize_shape=None):
self.output_dir = output_dir self.output_dir = output_dir
self.label_file = label_file self.label_file = label_file
self.label_list = [] self.label_list = []
self.resize_shape = resize_shape
with open(label_file) as fin: with open(label_file) as fin:
for line in fin: for line in fin:
self.label_list.append(line.strip()) self.label_list.append(line.strip())
...@@ -184,6 +379,13 @@ class RCNNPostprocess(object): ...@@ -184,6 +379,13 @@ class RCNNPostprocess(object):
xmax = xmin + w xmax = xmin + w
ymax = ymin + h ymax = ymin + h
img_w, img_h = image.size
if self.resize_shape is not None:
xmin = xmin * img_w / self.resize_shape[0]
xmax = xmax * img_w / self.resize_shape[0]
ymin = ymin * img_h / self.resize_shape[1]
ymax = ymax * img_h / self.resize_shape[1]
color = tuple(color_list[catid]) color = tuple(color_list[catid])
# draw bbox # draw bbox
...@@ -238,6 +440,30 @@ class RCNNPostprocess(object): ...@@ -238,6 +440,30 @@ class RCNNPostprocess(object):
self.label_file, self.output_dir) self.label_file, self.output_dir)
class BlazeFacePostprocess(RCNNPostprocess):
def clip_bbox(self, bbox, im_size=None):
h = 1. if im_size is None else im_size[0]
w = 1. if im_size is None else im_size[1]
xmin = max(min(bbox[0], w), 0.)
ymin = max(min(bbox[1], h), 0.)
xmax = max(min(bbox[2], w), 0.)
ymax = max(min(bbox[3], h), 0.)
return xmin, ymin, xmax, ymax
def _get_bbox_result(self, fetch_map, fetch_name, clsid2catid):
result = {}
is_bbox_normalized = True #for blaze face, set true here
output = fetch_map[fetch_name]
lod = [fetch_map[fetch_name + '.lod']]
lengths = self._offset_to_lengths(lod)
np_data = np.array(output)
result['bbox'] = (np_data, lengths)
result['im_id'] = np.array([[0]])
result["im_shape"] = np.array(fetch_map["im_shape"]).astype(np.int32)
bbox_results = self._bbox2out([result], clsid2catid, is_bbox_normalized)
return bbox_results
class Sequential(object): class Sequential(object):
""" """
Args: Args:
...@@ -291,6 +517,19 @@ class BGR2RGB(object): ...@@ -291,6 +517,19 @@ class BGR2RGB(object):
return self.__class__.__name__ + "()" return self.__class__.__name__ + "()"
class String2Image(object):
def __init__(self):
pass
def __call__(self, img_buffer):
data = np.fromstring(img_buffer, np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)
return img
def __repr__(self):
return self.__class__.__name__ + "()"
class File2Image(object): class File2Image(object):
def __init__(self): def __init__(self):
pass pass
...@@ -335,7 +574,9 @@ class Base64ToImage(object): ...@@ -335,7 +574,9 @@ class Base64ToImage(object):
pass pass
def __call__(self, img_base64): def __call__(self, img_base64):
img = base64.b64decode(img_base64) sample = base64.b64decode(img_base64)
data = np.fromstring(sample, np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)
return img return img
def __repr__(self): def __repr__(self):
...@@ -451,7 +692,7 @@ class Resize(object): ...@@ -451,7 +692,7 @@ class Resize(object):
Args: Args:
size (sequence or int): Desired output size. If size is a sequence like size (sequence or int): Desired output size. If size is a sequence like
(h, w), output size will be matched to this. If size is an int, (w, h), output size will be matched to this. If size is an int,
smaller edge of the image will be matched to this number. smaller edge of the image will be matched to this number.
i.e, if height > width, then image will be rescaled to i.e, if height > width, then image will be rescaled to
(size * height / width, size) (size * height / width, size)
...@@ -473,6 +714,57 @@ class Resize(object): ...@@ -473,6 +714,57 @@ class Resize(object):
_cv2_interpolation_to_str[self.interpolation]) _cv2_interpolation_to_str[self.interpolation])
class ResizeByFactor(object):
"""Resize the input numpy array Image to a size multiple of factor which is usually required by a network
Args:
factor (int): Resize factor. make width and height multiple factor of the value of factor. Default is 32
max_side_len (int): max size of width and height. if width or height is larger than max_side_len, just resize the width or the height. Default is 2400
"""
def __init__(self, factor=32, max_side_len=2400):
self.factor = factor
self.max_side_len = max_side_len
def __call__(self, img):
h, w, _ = img.shape
resize_w = w
resize_h = h
if max(resize_h, resize_w) > self.max_side_len:
if resize_h > resize_w:
ratio = float(self.max_side_len) / resize_h
else:
ratio = float(self.max_side_len) / resize_w
else:
ratio = 1.
resize_h = int(resize_h * ratio)
resize_w = int(resize_w * ratio)
if resize_h % self.factor == 0:
resize_h = resize_h
elif resize_h // self.factor <= 1:
resize_h = self.factor
else:
            resize_h = (resize_h // self.factor - 1) * self.factor
if resize_w % self.factor == 0:
resize_w = resize_w
elif resize_w // self.factor <= 1:
resize_w = self.factor
else:
resize_w = (resize_w // self.factor - 1) * self.factor
try:
if int(resize_w) <= 0 or int(resize_h) <= 0:
return None, (None, None)
im = cv2.resize(img, (int(resize_w), int(resize_h)))
except:
print(resize_w, resize_h)
sys.exit(0)
return im
def __repr__(self):
return self.__class__.__name__ + '(factor={0}, max_side_len={1})'.format(
self.factor, self.max_side_len)
class PadStride(object): class PadStride(object):
def __init__(self, stride): def __init__(self, stride):
self.coarsest_stride = stride self.coarsest_stride = stride
...@@ -505,6 +797,59 @@ class Transpose(object): ...@@ -505,6 +797,59 @@ class Transpose(object):
return format_string return format_string
class SortedBoxes(object):
"""
    Sort the detected bounding boxes top-to-bottom, then left-to-right
"""
def __init__(self):
pass
def __call__(self, dt_boxes):
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i+1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
class GetRotateCropImage(object):
"""
    Rotate and crop a text region from the original image, given the corner points output by OCR detection
"""
def __init__(self):
pass
def __call__(self, img, points):
img_height, img_width = img.shape[0:2]
left = int(np.min(points[:, 0]))
right = int(np.max(points[:, 0]))
top = int(np.min(points[:, 1]))
bottom = int(np.max(points[:, 1]))
img_crop = img[top:bottom, left:right, :].copy()
points[:, 0] = points[:, 0] - left
points[:, 1] = points[:, 1] - top
img_crop_width = int(np.linalg.norm(points[0] - points[1]))
img_crop_height = int(np.linalg.norm(points[0] - points[3]))
pts_std = np.float32([[0, 0], [img_crop_width, 0], \
[img_crop_width, img_crop_height], [0, img_crop_height]])
M = cv2.getPerspectiveTransform(points, pts_std)
dst_img = cv2.warpPerspective(
img_crop,
M, (img_crop_width, img_crop_height),
borderMode=cv2.BORDER_REPLICATE)
dst_img_height, dst_img_width = dst_img.shape[0:2]
if dst_img_height * 1.0 / dst_img_width >= 1.5:
dst_img = np.rot90(dst_img)
return dst_img
class ImageReader(): class ImageReader():
def __init__(self, def __init__(self,
image_shape=[3, 224, 224], image_shape=[3, 224, 224],
......
...@@ -120,29 +120,21 @@ class CharacterOps(object): ...@@ -120,29 +120,21 @@ class CharacterOps(object):
class OCRReader(object): class OCRReader(object):
def __init__(self): def __init__(self,
args = self.parse_args() algorithm="CRNN",
image_shape = [int(v) for v in args.rec_image_shape.split(",")] image_shape=[3, 32, 320],
char_type="ch",
batch_num=1,
char_dict_path="./ppocr_keys_v1.txt"):
self.rec_image_shape = image_shape self.rec_image_shape = image_shape
self.character_type = args.rec_char_type self.character_type = char_type
self.rec_batch_num = args.rec_batch_num self.rec_batch_num = batch_num
char_ops_params = {} char_ops_params = {}
char_ops_params["character_type"] = args.rec_char_type char_ops_params["character_type"] = char_type
char_ops_params["character_dict_path"] = args.rec_char_dict_path char_ops_params["character_dict_path"] = char_dict_path
char_ops_params['loss_type'] = 'ctc' char_ops_params['loss_type'] = 'ctc'
self.char_ops = CharacterOps(char_ops_params) self.char_ops = CharacterOps(char_ops_params)
def parse_args(self):
parser = argparse.ArgumentParser()
parser.add_argument("--rec_algorithm", type=str, default='CRNN')
parser.add_argument("--rec_model_dir", type=str)
parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
parser.add_argument("--rec_char_type", type=str, default='ch')
parser.add_argument("--rec_batch_num", type=int, default=1)
parser.add_argument(
"--rec_char_dict_path", type=str, default="./ppocr_keys_v1.txt")
return parser.parse_args()
def resize_norm_img(self, img, max_wh_ratio): def resize_norm_img(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape imgC, imgH, imgW = self.rec_image_shape
if self.character_type == "ch": if self.character_type == "ch":
...@@ -154,15 +146,14 @@ class OCRReader(object): ...@@ -154,15 +146,14 @@ class OCRReader(object):
resized_w = imgW resized_w = imgW
else: else:
resized_w = int(math.ceil(imgH * ratio)) resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
seq = Sequential([ resized_image = resized_image.astype('float32')
Resize(imgH, resized_w), Transpose((2, 0, 1)), Div(255), resized_image = resized_image.transpose((2, 0, 1)) / 255
Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5], True) resized_image -= 0.5
]) resized_image /= 0.5
resized_image = seq(img)
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32) padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
padding_im[:, :, 0:resized_w] = resized_image
return padding_im return padding_im
def preprocess(self, img_list): def preprocess(self, img_list):
...@@ -182,22 +173,32 @@ class OCRReader(object): ...@@ -182,22 +173,32 @@ class OCRReader(object):
return norm_img_batch[0] return norm_img_batch[0]
def postprocess(self, outputs): def postprocess(self, outputs, with_score=False):
rec_res = [] rec_res = []
rec_idx_lod = outputs["ctc_greedy_decoder_0.tmp_0.lod"] rec_idx_lod = outputs["ctc_greedy_decoder_0.tmp_0.lod"]
predict_lod = outputs["softmax_0.tmp_0.lod"]
rec_idx_batch = outputs["ctc_greedy_decoder_0.tmp_0"] rec_idx_batch = outputs["ctc_greedy_decoder_0.tmp_0"]
if with_score:
predict_lod = outputs["softmax_0.tmp_0.lod"]
for rno in range(len(rec_idx_lod) - 1): for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno] beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1] end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0] if isinstance(rec_idx_batch, list):
rec_idx_tmp = [x[0] for x in rec_idx_batch[beg:end]]
else: #nd array
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp) preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno] if with_score:
end = predict_lod[rno + 1] beg = predict_lod[rno]
probs = outputs["softmax_0.tmp_0"][beg:end, :] end = predict_lod[rno + 1]
ind = np.argmax(probs, axis=1) if isinstance(outputs["softmax_0.tmp_0"], list):
blank = probs.shape[1] outputs["softmax_0.tmp_0"] = np.array(outputs[
valid_ind = np.where(ind != (blank - 1))[0] "softmax_0.tmp_0"]).astype(np.float32)
score = np.mean(probs[valid_ind, ind[valid_ind]]) probs = outputs["softmax_0.tmp_0"][beg:end, :]
rec_res.append([preds_text, score]) ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
score = np.mean(probs[valid_ind, ind[valid_ind]])
rec_res.append([preds_text, score])
else:
rec_res.append([preds_text])
return rec_res return rec_res
...@@ -12,4 +12,4 @@ ...@@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
""" Paddle Serving App version string """ """ Paddle Serving App version string """
serving_app_version = "0.1.0" serving_app_version = "0.1.2"
...@@ -21,10 +21,18 @@ import google.protobuf.text_format ...@@ -21,10 +21,18 @@ import google.protobuf.text_format
import numpy as np import numpy as np
import time import time
import sys import sys
from .serving_client import PredictorRes
int_type = 0 import grpc
float_type = 1 from .proto import multi_lang_general_model_service_pb2
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
int64_type = 0
float32_type = 1
int32_type = 2
int_type = set([int64_type, int32_type])
float_type = set([float32_type])
class _NOPProfiler(object): class _NOPProfiler(object):
...@@ -125,6 +133,8 @@ class Client(object): ...@@ -125,6 +133,8 @@ class Client(object):
self.all_numpy_input = True self.all_numpy_input = True
self.has_numpy_input = False self.has_numpy_input = False
self.rpc_timeout_ms = 20000 self.rpc_timeout_ms = 20000
from .serving_client import PredictorRes
self.predictorres_constructor = PredictorRes
def load_client_config(self, path): def load_client_config(self, path):
from .serving_client import PredictorClient from .serving_client import PredictorClient
...@@ -272,7 +282,7 @@ class Client(object): ...@@ -272,7 +282,7 @@ class Client(object):
raise ValueError("Wrong feed name: {}.".format(key)) raise ValueError("Wrong feed name: {}.".format(key))
#if not isinstance(feed_i[key], np.ndarray): #if not isinstance(feed_i[key], np.ndarray):
self.shape_check(feed_i, key) self.shape_check(feed_i, key)
if self.feed_types_[key] == int_type: if self.feed_types_[key] in int_type:
if i == 0: if i == 0:
int_feed_names.append(key) int_feed_names.append(key)
if isinstance(feed_i[key], np.ndarray): if isinstance(feed_i[key], np.ndarray):
...@@ -285,7 +295,7 @@ class Client(object): ...@@ -285,7 +295,7 @@ class Client(object):
else: else:
int_slot.append(feed_i[key]) int_slot.append(feed_i[key])
self.all_numpy_input = False self.all_numpy_input = False
elif self.feed_types_[key] == float_type: elif self.feed_types_[key] in float_type:
if i == 0: if i == 0:
float_feed_names.append(key) float_feed_names.append(key)
if isinstance(feed_i[key], np.ndarray): if isinstance(feed_i[key], np.ndarray):
...@@ -304,7 +314,7 @@ class Client(object): ...@@ -304,7 +314,7 @@ class Client(object):
self.profile_.record('py_prepro_1') self.profile_.record('py_prepro_1')
self.profile_.record('py_client_infer_0') self.profile_.record('py_client_infer_0')
result_batch_handle = PredictorRes() result_batch_handle = self.predictorres_constructor()
if self.all_numpy_input: if self.all_numpy_input:
res = self.client_handle_.numpy_predict( res = self.client_handle_.numpy_predict(
float_slot_batch, float_feed_names, float_shape, int_slot_batch, float_slot_batch, float_feed_names, float_shape, int_slot_batch,
...@@ -332,7 +342,7 @@ class Client(object): ...@@ -332,7 +342,7 @@ class Client(object):
result_map = {} result_map = {}
# result map needs to be a numpy array # result map needs to be a numpy array
for i, name in enumerate(fetch_names): for i, name in enumerate(fetch_names):
if self.fetch_names_to_type_[name] == int_type: if self.fetch_names_to_type_[name] == int64_type:
# result_map[name] will be py::array(numpy array) # result_map[name] will be py::array(numpy array)
result_map[name] = result_batch_handle.get_int64_by_name( result_map[name] = result_batch_handle.get_int64_by_name(
mi, name) mi, name)
...@@ -341,7 +351,7 @@ class Client(object): ...@@ -341,7 +351,7 @@ class Client(object):
if name in self.lod_tensor_set: if name in self.lod_tensor_set:
result_map["{}.lod".format( result_map["{}.lod".format(
name)] = result_batch_handle.get_lod(mi, name) name)] = result_batch_handle.get_lod(mi, name)
elif self.fetch_names_to_type_[name] == float_type: elif self.fetch_names_to_type_[name] == float32_type:
result_map[name] = result_batch_handle.get_float_by_name( result_map[name] = result_batch_handle.get_float_by_name(
mi, name) mi, name)
shape = result_batch_handle.get_shape(mi, name) shape = result_batch_handle.get_shape(mi, name)
...@@ -349,6 +359,16 @@ class Client(object): ...@@ -349,6 +359,16 @@ class Client(object):
if name in self.lod_tensor_set: if name in self.lod_tensor_set:
result_map["{}.lod".format( result_map["{}.lod".format(
name)] = result_batch_handle.get_lod(mi, name) name)] = result_batch_handle.get_lod(mi, name)
elif self.fetch_names_to_type_[name] == int32_type:
# result_map[name] will be py::array(numpy array)
result_map[name] = result_batch_handle.get_int32_by_name(
mi, name)
shape = result_batch_handle.get_shape(mi, name)
result_map[name].shape = shape
if name in self.lod_tensor_set:
result_map["{}.lod".format(
name)] = result_batch_handle.get_lod(mi, name)
multi_result_map.append(result_map) multi_result_map.append(result_map)
ret = None ret = None
if len(model_engine_names) == 1: if len(model_engine_names) == 1:
...@@ -372,3 +392,266 @@ class Client(object): ...@@ -372,3 +392,266 @@ class Client(object):
def release(self): def release(self):
self.client_handle_.destroy_predictor() self.client_handle_.destroy_predictor()
self.client_handle_ = None self.client_handle_ = None
class MultiLangClient(object):
def __init__(self):
self.channel_ = None
self.stub_ = None
self.rpc_timeout_s_ = 2
self.profile_ = _Profiler()
def add_variant(self, tag, cluster, variant_weight):
# TODO
raise Exception("cannot support ABtest yet")
def set_rpc_timeout_ms(self, rpc_timeout):
if self.stub_ is None:
raise Exception("set timeout must be set after connect.")
if not isinstance(rpc_timeout, int):
# for bclient
raise ValueError("rpc_timeout must be int type.")
self.rpc_timeout_s_ = rpc_timeout / 1000.0
timeout_req = multi_lang_general_model_service_pb2.SetTimeoutRequest()
timeout_req.timeout_ms = rpc_timeout
resp = self.stub_.SetTimeout(timeout_req)
return resp.err_code == 0
def connect(self, endpoints):
# https://github.com/tensorflow/serving/issues/1382
options = [('grpc.max_receive_message_length', 512 * 1024 * 1024),
('grpc.max_send_message_length', 512 * 1024 * 1024),
('grpc.lb_policy_name', 'round_robin')]
# TODO: weight round robin
g_endpoint = 'ipv4:{}'.format(','.join(endpoints))
self.channel_ = grpc.insecure_channel(g_endpoint, options=options)
self.stub_ = multi_lang_general_model_service_pb2_grpc.MultiLangGeneralModelServiceStub(
self.channel_)
# get client model config
get_client_config_req = multi_lang_general_model_service_pb2.GetClientConfigRequest(
)
resp = self.stub_.GetClientConfig(get_client_config_req)
model_config_str = resp.client_config_str
self._parse_model_config(model_config_str)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _parse_model_config(self, model_config_str):
model_conf = m_config.GeneralModelConfig()
model_conf = google.protobuf.text_format.Merge(model_config_str,
model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
else:
counter = 1
for dim in self.feed_shapes_[var.alias_name]:
counter *= dim
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _pack_inference_request(self, feed, fetch, is_python):
req = multi_lang_general_model_service_pb2.InferenceRequest()
req.fetch_var_names.extend(fetch)
req.is_python = is_python
feed_batch = None
if isinstance(feed, dict):
feed_batch = [feed]
elif isinstance(feed, list):
feed_batch = feed
else:
raise Exception("{} not support".format(type(feed)))
req.feed_var_names.extend(feed_batch[0].keys())
init_feed_names = False
for feed_data in feed_batch:
inst = multi_lang_general_model_service_pb2.FeedInst()
for name in req.feed_var_names:
tensor = multi_lang_general_model_service_pb2.Tensor()
var = feed_data[name]
v_type = self.feed_types_[name]
if is_python:
data = None
if isinstance(var, list):
if v_type == 0: # int64
data = np.array(var, dtype="int64")
elif v_type == 1: # float32
data = np.array(var, dtype="float32")
elif v_type == 2: # int32
data = np.array(var, dtype="int32")
else:
raise Exception("error tensor value type.")
elif isinstance(var, np.ndarray):
data = var
if v_type == 0:
if data.dtype != 'int64':
data = data.astype("int64")
elif v_type == 1:
if data.dtype != 'float32':
data = data.astype("float32")
elif v_type == 2:
if data.dtype != 'int32':
data = data.astype("int32")
else:
raise Exception("error tensor value type.")
else:
raise Exception("var must be list or ndarray.")
tensor.data = data.tobytes()
else:
if isinstance(var, np.ndarray):
if v_type == 0: # int64
tensor.int64_data.extend(
var.reshape(-1).astype("int64").tolist())
elif v_type == 1:
tensor.float_data.extend(
var.reshape(-1).astype('float32').tolist())
elif v_type == 2:
tensor.int_data.extend(
var.reshape(-1).astype('int32').tolist())
else:
raise Exception("error tensor value type.")
elif isinstance(var, list):
if v_type == 0:
tensor.int64_data.extend(self._flatten_list(var))
elif v_type == 1:
tensor.float_data.extend(self._flatten_list(var))
elif v_type == 2:
tensor.int_data.extend(self._flatten_list(var))
else:
raise Exception("error tensor value type.")
else:
raise Exception("var must be list or ndarray.")
if isinstance(var, np.ndarray):
tensor.shape.extend(list(var.shape))
else:
tensor.shape.extend(self.feed_shapes_[name])
inst.tensor_array.append(tensor)
req.insts.append(inst)
return req
def _unpack_inference_response(self, resp, fetch, is_python,
need_variant_tag):
if resp.err_code != 0:
return None
tag = resp.tag
multi_result_map = {}
for model_result in resp.outputs:
inst = model_result.insts[0]
result_map = {}
for i, name in enumerate(fetch):
var = inst.tensor_array[i]
v_type = self.fetch_types_[name]
if is_python:
if v_type == 0: # int64
result_map[name] = np.frombuffer(
var.data, dtype="int64")
elif v_type == 1: # float32
result_map[name] = np.frombuffer(
var.data, dtype="float32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
result_map[name] = np.array(
list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
result_map[name] = np.array(
list(var.float_data), dtype="float32")
else:
raise Exception("error type.")
result_map[name].shape = list(var.shape)
if name in self.lod_tensor_set_:
result_map["{}.lod".format(name)] = np.array(list(var.lod))
multi_result_map[model_result.engine_name] = result_map
ret = None
if len(resp.outputs) == 1:
ret = list(multi_result_map.values())[0]
else:
ret = multi_result_map
ret["serving_status_code"] = 0
return ret if not need_variant_tag else [ret, tag]
def _done_callback_func(self, fetch, is_python, need_variant_tag):
def unpack_resp(resp):
return self._unpack_inference_response(resp, fetch, is_python,
need_variant_tag)
return unpack_resp
def get_feed_names(self):
return self.feed_names_
def predict(self,
feed,
fetch,
need_variant_tag=False,
asyn=False,
is_python=True):
if not asyn:
try:
self.profile_.record('py_prepro_0')
req = self._pack_inference_request(
feed, fetch, is_python=is_python)
self.profile_.record('py_prepro_1')
self.profile_.record('py_client_infer_0')
resp = self.stub_.Inference(req, timeout=self.rpc_timeout_s_)
self.profile_.record('py_client_infer_1')
self.profile_.record('py_postpro_0')
ret = self._unpack_inference_response(
resp,
fetch,
is_python=is_python,
need_variant_tag=need_variant_tag)
self.profile_.record('py_postpro_1')
self.profile_.print_profile()
return ret
except grpc.RpcError as e:
return {"serving_status_code": e.code()}
else:
req = self._pack_inference_request(feed, fetch, is_python=is_python)
call_future = self.stub_.Inference.future(
req, timeout=self.rpc_timeout_s_)
return MultiLangPredictFuture(
call_future,
self._done_callback_func(
fetch,
is_python=is_python,
need_variant_tag=need_variant_tag))
class MultiLangPredictFuture(object):
def __init__(self, call_future, callback_func):
self.call_future_ = call_future
self.callback_func_ = callback_func
def result(self):
try:
resp = self.call_future_.result()
except grpc.RpcError as e:
return {"serving_status_code": e.code()}
return self.callback_func_(resp)
def add_done_callback(self, fn):
def __fn__(call_future):
assert call_future == self.call_future_
fn(self)
self.call_future_.add_done_callback(__fn__)
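# Example usage of MultiLangClient (sketch only; it assumes a multi-language gRPC
# server is listening on 127.0.0.1:9393, and the feed/fetch names below are
# placeholders that depend on the served model's configuration):
#
#   client = MultiLangClient()
#   client.connect(["127.0.0.1:9393"])
#   fetch_map = client.predict(feed={"x": np.ones((1, 13), dtype="float32")},
#                              fetch=["price"])
#   print(fetch_map)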
...@@ -48,16 +48,18 @@ def save_model(server_model_folder, ...@@ -48,16 +48,18 @@ def save_model(server_model_folder,
config = model_conf.GeneralModelConfig() config = model_conf.GeneralModelConfig()
#int64 = 0; float32 = 1; int32 = 2;
for key in feed_var_dict: for key in feed_var_dict:
feed_var = model_conf.FeedVar() feed_var = model_conf.FeedVar()
feed_var.alias_name = key feed_var.alias_name = key
feed_var.name = feed_var_dict[key].name feed_var.name = feed_var_dict[key].name
feed_var.is_lod_tensor = feed_var_dict[key].lod_level >= 1 feed_var.is_lod_tensor = feed_var_dict[key].lod_level >= 1
if feed_var_dict[key].dtype == core.VarDesc.VarType.INT32 or \ if feed_var_dict[key].dtype == core.VarDesc.VarType.INT64:
feed_var_dict[key].dtype == core.VarDesc.VarType.INT64:
feed_var.feed_type = 0 feed_var.feed_type = 0
if feed_var_dict[key].dtype == core.VarDesc.VarType.FP32: if feed_var_dict[key].dtype == core.VarDesc.VarType.FP32:
feed_var.feed_type = 1 feed_var.feed_type = 1
if feed_var_dict[key].dtype == core.VarDesc.VarType.INT32:
feed_var.feed_type = 2
if feed_var.is_lod_tensor: if feed_var.is_lod_tensor:
feed_var.shape.extend([-1]) feed_var.shape.extend([-1])
else: else:
...@@ -73,13 +75,12 @@ def save_model(server_model_folder, ...@@ -73,13 +75,12 @@ def save_model(server_model_folder,
fetch_var.alias_name = key fetch_var.alias_name = key
fetch_var.name = fetch_var_dict[key].name fetch_var.name = fetch_var_dict[key].name
fetch_var.is_lod_tensor = fetch_var_dict[key].lod_level >= 1 fetch_var.is_lod_tensor = fetch_var_dict[key].lod_level >= 1
if fetch_var_dict[key].dtype == core.VarDesc.VarType.INT32 or \ if fetch_var_dict[key].dtype == core.VarDesc.VarType.INT64:
fetch_var_dict[key].dtype == core.VarDesc.VarType.INT64:
fetch_var.fetch_type = 0 fetch_var.fetch_type = 0
if fetch_var_dict[key].dtype == core.VarDesc.VarType.FP32: if fetch_var_dict[key].dtype == core.VarDesc.VarType.FP32:
fetch_var.fetch_type = 1 fetch_var.fetch_type = 1
if fetch_var_dict[key].dtype == core.VarDesc.VarType.INT32:
fetch_var.fetch_type = 2
if fetch_var.is_lod_tensor: if fetch_var.is_lod_tensor:
fetch_var.shape.extend([-1]) fetch_var.shape.extend([-1])
else: else:
......
...@@ -39,11 +39,11 @@ def benchmark_args(): ...@@ -39,11 +39,11 @@ def benchmark_args():
def show_latency(latency_list): def show_latency(latency_list):
latency_array = np.array(latency_list) latency_array = np.array(latency_list)
info = "latency:\n" info = "latency:\n"
info += "mean :{} ms\n".format(np.mean(latency_array)) info += "mean: {}ms\n".format(np.mean(latency_array))
info += "median :{} ms\n".format(np.median(latency_array)) info += "median: {}ms\n".format(np.median(latency_array))
info += "80 percent :{} ms\n".format(np.percentile(latency_array, 80)) info += "80 percent: {}ms\n".format(np.percentile(latency_array, 80))
info += "90 percent :{} ms\n".format(np.percentile(latency_array, 90)) info += "90 percent: {}ms\n".format(np.percentile(latency_array, 90))
info += "99 percent :{} ms\n".format(np.percentile(latency_array, 99)) info += "99 percent: {}ms\n".format(np.percentile(latency_array, 99))
sys.stderr.write(info) sys.stderr.write(info)
......
...@@ -12,6 +12,6 @@ ...@@ -12,6 +12,6 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
""" Paddle Serving Client version string """ """ Paddle Serving Client version string """
serving_client_version = "0.3.0" serving_client_version = "0.3.2"
serving_server_version = "0.3.0" serving_server_version = "0.3.2"
module_proto_version = "0.3.0" module_proto_version = "0.3.2"
...@@ -25,6 +25,17 @@ from contextlib import closing ...@@ -25,6 +25,17 @@ from contextlib import closing
import collections import collections
import fcntl import fcntl
import shutil
import numpy as np
import grpc
from .proto import multi_lang_general_model_service_pb2
import sys
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
from multiprocessing import Pool, Process
from concurrent import futures
class OpMaker(object): class OpMaker(object):
def __init__(self): def __init__(self):
...@@ -220,7 +231,8 @@ class Server(object): ...@@ -220,7 +231,8 @@ class Server(object):
infer_service.workflows.extend(["workflow1"]) infer_service.workflows.extend(["workflow1"])
self.infer_service_conf.services.extend([infer_service]) self.infer_service_conf.services.extend([infer_service])
def _prepare_resource(self, workdir): def _prepare_resource(self, workdir, cube_conf):
self.workdir = workdir
if self.resource_conf == None: if self.resource_conf == None:
with open("{}/{}".format(workdir, self.general_model_config_fn), with open("{}/{}".format(workdir, self.general_model_config_fn),
"w") as fout: "w") as fout:
...@@ -231,6 +243,11 @@ class Server(object): ...@@ -231,6 +243,11 @@ class Server(object):
if "dist_kv" in node.name: if "dist_kv" in node.name:
self.resource_conf.cube_config_path = workdir self.resource_conf.cube_config_path = workdir
self.resource_conf.cube_config_file = self.cube_config_fn self.resource_conf.cube_config_file = self.cube_config_fn
if cube_conf == None:
raise ValueError(
"Please set the path of cube.conf while use dist_kv op."
)
shutil.copy(cube_conf, workdir)
if "quant" in node.name: if "quant" in node.name:
self.resource_conf.cube_quant_bits = 8 self.resource_conf.cube_quant_bits = 8
self.resource_conf.model_toolkit_path = workdir self.resource_conf.model_toolkit_path = workdir
...@@ -318,10 +335,10 @@ class Server(object): ...@@ -318,10 +335,10 @@ class Server(object):
os.chdir(self.module_path) os.chdir(self.module_path)
need_download = False need_download = False
device_version = self.get_device_version() device_version = self.get_device_version()
floder_name = device_version + serving_server_version folder_name = device_version + serving_server_version
tar_name = floder_name + ".tar.gz" tar_name = folder_name + ".tar.gz"
bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name
self.server_path = os.path.join(self.module_path, floder_name) self.server_path = os.path.join(self.module_path, folder_name)
#acquire lock #acquire lock
version_file = open("{}/version.py".format(self.module_path), "r") version_file = open("{}/version.py".format(self.module_path), "r")
...@@ -347,7 +364,7 @@ class Server(object): ...@@ -347,7 +364,7 @@ class Server(object):
os.remove(exe_path) os.remove(exe_path)
raise SystemExit( raise SystemExit(
'Decompressing failed, please check your permission of {} or disk space left.'. 'Decompressing failed, please check your permission of {} or disk space left.'.
foemat(self.module_path)) format(self.module_path))
finally: finally:
os.remove(tar_name) os.remove(tar_name)
#release lock #release lock
...@@ -355,7 +372,11 @@ class Server(object): ...@@ -355,7 +372,11 @@ class Server(object):
os.chdir(self.cur_path) os.chdir(self.cur_path)
self.bin_path = self.server_path + "/serving" self.bin_path = self.server_path + "/serving"
def prepare_server(self, workdir=None, port=9292, device="cpu"): def prepare_server(self,
workdir=None,
port=9292,
device="cpu",
cube_conf=None):
if workdir == None: if workdir == None:
workdir = "./tmp" workdir = "./tmp"
os.system("mkdir {}".format(workdir)) os.system("mkdir {}".format(workdir))
...@@ -364,11 +385,11 @@ class Server(object): ...@@ -364,11 +385,11 @@ class Server(object):
os.system("touch {}/fluid_time_file".format(workdir)) os.system("touch {}/fluid_time_file".format(workdir))
if not self.port_is_available(port): if not self.port_is_available(port):
raise SystemExit("Prot {} is already used".format(port)) raise SystemExit("Port {} is already used".format(port))
self._prepare_resource(workdir) self.set_port(port)
self._prepare_resource(workdir, cube_conf)
self._prepare_engine(self.model_config_paths, device) self._prepare_engine(self.model_config_paths, device)
self._prepare_infer_service(port) self._prepare_infer_service(port)
self.port = port
self.workdir = workdir self.workdir = workdir
infer_service_fn = "{}/{}".format(workdir, self.infer_service_fn) infer_service_fn = "{}/{}".format(workdir, self.infer_service_fn)
...@@ -428,3 +449,258 @@ class Server(object): ...@@ -428,3 +449,258 @@ class Server(object):
print("Going to Run Command") print("Going to Run Command")
print(command) print(command)
os.system(command) os.system(command)
class MultiLangServerServiceServicer(multi_lang_general_model_service_pb2_grpc.
MultiLangGeneralModelServiceServicer):
def __init__(self, model_config_path, is_multi_model, endpoints):
self.is_multi_model_ = is_multi_model
self.model_config_path_ = model_config_path
self.endpoints_ = endpoints
with open(self.model_config_path_) as f:
self.model_config_str_ = str(f.read())
self._parse_model_config(self.model_config_str_)
self._init_bclient(self.model_config_path_, self.endpoints_)
def _init_bclient(self, model_config_path, endpoints, timeout_ms=None):
from paddle_serving_client import Client
self.bclient_ = Client()
if timeout_ms is not None:
self.bclient_.set_rpc_timeout_ms(timeout_ms)
self.bclient_.load_client_config(model_config_path)
self.bclient_.connect(endpoints)
def _parse_model_config(self, model_config_str):
model_conf = m_config.GeneralModelConfig()
model_conf = google.protobuf.text_format.Merge(model_config_str,
model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _unpack_inference_request(self, request):
feed_names = list(request.feed_var_names)
fetch_names = list(request.fetch_var_names)
is_python = request.is_python
feed_batch = []
for feed_inst in request.insts:
feed_dict = {}
for idx, name in enumerate(feed_names):
var = feed_inst.tensor_array[idx]
v_type = self.feed_types_[name]
data = None
if is_python:
if v_type == 0: # int64
data = np.frombuffer(var.data, dtype="int64")
elif v_type == 1: # float32
data = np.frombuffer(var.data, dtype="float32")
elif v_type == 2: # int32
data = np.frombuffer(var.data, dtype="int32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
data = np.array(list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
data = np.array(list(var.float_data), dtype="float32")
elif v_type == 2: # int32
data = np.array(list(var.int_data), dtype="int32")
else:
raise Exception("error type.")
data.shape = list(feed_inst.tensor_array[idx].shape)
feed_dict[name] = data
feed_batch.append(feed_dict)
return feed_batch, fetch_names, is_python
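# Illustrative sketch (not part of this commit): the byte layout the unpacking
# above expects when is_python is True. A caller packs a numpy array with
# tobytes() and records its shape; the server recovers it with np.frombuffer()
# plus that shape. The dtype follows the feed_type code (0 = int64,
# 1 = float32, 2 = int32), mirroring the branches above.
import numpy as np

original = np.arange(6, dtype="float32").reshape(2, 3)
wire_bytes = original.tobytes()    # what Tensor.data would carry
wire_shape = list(original.shape)  # what Tensor.shape would carry

recovered = np.frombuffer(wire_bytes, dtype="float32")
recovered.shape = wire_shape       # same reshape trick used above
assert (recovered == original).all()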
def _pack_inference_response(self, ret, fetch_names, is_python):
resp = multi_lang_general_model_service_pb2.InferenceResponse()
if ret is None:
resp.err_code = 1
return resp
results, tag = ret
resp.tag = tag
resp.err_code = 0
if not self.is_multi_model_:
results = {'general_infer_0': results}
for model_name, model_result in results.items():
model_output = multi_lang_general_model_service_pb2.ModelOutput()
inst = multi_lang_general_model_service_pb2.FetchInst()
for idx, name in enumerate(fetch_names):
tensor = multi_lang_general_model_service_pb2.Tensor()
v_type = self.fetch_types_[name]
if is_python:
tensor.data = model_result[name].tobytes()
else:
if v_type == 0: # int64
tensor.int64_data.extend(model_result[name].reshape(-1)
.tolist())
elif v_type == 1: # float32
tensor.float_data.extend(model_result[name].reshape(-1)
.tolist())
elif v_type == 2: # int32
tensor.int_data.extend(model_result[name].reshape(-1)
.tolist())
else:
raise Exception("error type.")
tensor.shape.extend(list(model_result[name].shape))
if name in self.lod_tensor_set_:
tensor.lod.extend(model_result["{}.lod".format(name)]
.tolist())
inst.tensor_array.append(tensor)
model_output.insts.append(inst)
model_output.engine_name = model_name
resp.outputs.append(model_output)
return resp
def SetTimeout(self, request, context):
# This process and the Inference process cannot run at the same time.
# For performance reasons, no thread lock is added for now.
timeout_ms = request.timeout_ms
self._init_bclient(self.model_config_path_, self.endpoints_, timeout_ms)
resp = multi_lang_general_model_service_pb2.SimpleResponse()
resp.err_code = 0
return resp
def Inference(self, request, context):
feed_dict, fetch_names, is_python = self._unpack_inference_request(
request)
ret = self.bclient_.predict(
feed=feed_dict, fetch=fetch_names, need_variant_tag=True)
return self._pack_inference_response(ret, fetch_names, is_python)
def GetClientConfig(self, request, context):
resp = multi_lang_general_model_service_pb2.GetClientConfigResponse()
resp.client_config_str = self.model_config_str_
return resp
class MultiLangServer(object):
def __init__(self):
self.bserver_ = Server()
self.worker_num_ = 4
self.body_size_ = 64 * 1024 * 1024
self.concurrency_ = 100000
self.is_multi_model_ = False # for model ensemble
def set_max_concurrency(self, concurrency):
self.concurrency_ = concurrency
self.bserver_.set_max_concurrency(concurrency)
def set_num_threads(self, threads):
self.worker_num_ = threads
self.bserver_.set_num_threads(threads)
def set_max_body_size(self, body_size):
self.bserver_.set_max_body_size(body_size)
if body_size >= self.body_size_:
self.body_size_ = body_size
else:
print(
"max_body_size is less than default value, will use default value in service."
)
def set_port(self, port):
self.gport_ = port
def set_reload_interval(self, interval):
self.bserver_.set_reload_interval(interval)
def set_op_sequence(self, op_seq):
self.bserver_.set_op_sequence(op_seq)
def set_op_graph(self, op_graph):
self.bserver_.set_op_graph(op_graph)
def set_memory_optimize(self, flag=False):
self.bserver_.set_memory_optimize(flag)
def set_ir_optimize(self, flag=False):
self.bserver_.set_ir_optimize(flag)
def set_op_sequence(self, op_seq):
self.bserver_.set_op_sequence(op_seq)
def use_mkl(self, flag):
self.bserver_.use_mkl(flag)
def load_model_config(self, server_config_paths, client_config_path=None):
self.bserver_.load_model_config(server_config_paths)
if client_config_path is None:
if isinstance(server_config_paths, dict):
self.is_multi_model_ = True
client_config_path = '{}/serving_server_conf.prototxt'.format(
list(server_config_paths.items())[0][1])
else:
client_config_path = '{}/serving_server_conf.prototxt'.format(
server_config_paths)
self.bclient_config_path_ = client_config_path
def prepare_server(self,
workdir=None,
port=9292,
device="cpu",
cube_conf=None):
if not self._port_is_available(port):
raise SystemExit("Prot {} is already used".format(port))
default_port = 12000
self.port_list_ = []
for i in range(1000):
if default_port + i != port and self._port_is_available(default_port
+ i):
self.port_list_.append(default_port + i)
break
self.bserver_.prepare_server(
workdir=workdir,
port=self.port_list_[0],
device=device,
cube_conf=cube_conf)
self.set_port(port)
def _launch_brpc_service(self, bserver):
bserver.run_server()
def _port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
return result != 0
def run_server(self):
p_bserver = Process(
target=self._launch_brpc_service, args=(self.bserver_, ))
p_bserver.start()
options = [('grpc.max_send_message_length', self.body_size_),
('grpc.max_receive_message_length', self.body_size_)]
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=self.worker_num_),
options=options,
maximum_concurrent_rpcs=self.concurrency_)
multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server(
MultiLangServerServiceServicer(
self.bclient_config_path_, self.is_multi_model_,
["0.0.0.0:{}".format(self.port_list_[0])]), server)
server.add_insecure_port('[::]:{}'.format(self.gport_))
server.start()
p_bserver.join()
server.wait_for_termination()
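# A minimal end-to-end sketch (not part of this commit) of driving the new
# MultiLangServer, mirroring the op sequence built in serve.py below. The
# model path "uci_housing_model" and the 'general_reader' op name are
# placeholders / assumptions; the method calls appear in the class above.
from paddle_serving_server import OpMaker, OpSeqMaker, MultiLangServer

op_maker = OpMaker()
read_op = op_maker.create('general_reader')
infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')

op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)

server = MultiLangServer()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(4)
server.load_model_config("uci_housing_model")
server.prepare_server(workdir="workdir", port=9393, device="cpu")
server.run_server()  # launches the brpc worker process plus the gRPC front end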
...@@ -40,7 +40,7 @@ def parse_args(): # pylint: disable=doc-string-missing ...@@ -40,7 +40,7 @@ def parse_args(): # pylint: disable=doc-string-missing
parser.add_argument( parser.add_argument(
"--device", type=str, default="cpu", help="Type of device") "--device", type=str, default="cpu", help="Type of device")
parser.add_argument( parser.add_argument(
"--mem_optim", "--mem_optim_off",
default=False, default=False,
action="store_true", action="store_true",
help="Memory optimize") help="Memory optimize")
...@@ -53,6 +53,11 @@ def parse_args(): # pylint: disable=doc-string-missing ...@@ -53,6 +53,11 @@ def parse_args(): # pylint: disable=doc-string-missing
type=int, type=int,
default=512 * 1024 * 1024, default=512 * 1024 * 1024,
help="Limit sizes of messages") help="Limit sizes of messages")
parser.add_argument(
"--use_multilang",
default=False,
action="store_true",
help="Use Multi-language-service")
return parser.parse_args() return parser.parse_args()
...@@ -63,10 +68,11 @@ def start_standard_model(): # pylint: disable=doc-string-missing ...@@ -63,10 +68,11 @@ def start_standard_model(): # pylint: disable=doc-string-missing
port = args.port port = args.port
workdir = args.workdir workdir = args.workdir
device = args.device device = args.device
mem_optim = args.mem_optim mem_optim = args.mem_optim_off is False
ir_optim = args.ir_optim ir_optim = args.ir_optim
max_body_size = args.max_body_size max_body_size = args.max_body_size
use_mkl = args.use_mkl use_mkl = args.use_mkl
use_multilang = args.use_multilang
if model == "": if model == "":
print("You must specify your serving model") print("You must specify your serving model")
...@@ -83,7 +89,11 @@ def start_standard_model(): # pylint: disable=doc-string-missing ...@@ -83,7 +89,11 @@ def start_standard_model(): # pylint: disable=doc-string-missing
op_seq_maker.add_op(general_infer_op) op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
server = serving.Server() server = None
if use_multilang:
server = serving.MultiLangServer()
else:
server = serving.Server()
server.set_op_sequence(op_seq_maker.get_op_sequence()) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(thread_num) server.set_num_threads(thread_num)
server.set_memory_optimize(mem_optim) server.set_memory_optimize(mem_optim)
......
...@@ -12,6 +12,6 @@ ...@@ -12,6 +12,6 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
""" Paddle Serving Client version string """ """ Paddle Serving Client version string """
serving_client_version = "0.3.0" serving_client_version = "0.3.2"
serving_server_version = "0.3.0" serving_server_version = "0.3.2"
module_proto_version = "0.3.0" module_proto_version = "0.3.2"
...@@ -41,6 +41,8 @@ class WebService(object): ...@@ -41,6 +41,8 @@ class WebService(object):
server = Server() server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence()) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(16) server.set_num_threads(16)
server.set_memory_optimize(self.mem_optim)
server.set_ir_optimize(self.ir_optim)
server.load_model_config(self.model_config) server.load_model_config(self.model_config)
server.prepare_server( server.prepare_server(
workdir=self.workdir, port=self.port_list[0], device=self.device) workdir=self.workdir, port=self.port_list[0], device=self.device)
...@@ -55,12 +57,19 @@ class WebService(object): ...@@ -55,12 +57,19 @@ class WebService(object):
else: else:
return False return False
def prepare_server(self, workdir="", port=9393, device="cpu"): def prepare_server(self,
workdir="",
port=9393,
device="cpu",
mem_optim=True,
ir_optim=False):
self.workdir = workdir self.workdir = workdir
self.port = port self.port = port
self.device = device self.device = device
default_port = 12000 default_port = 12000
self.port_list = [] self.port_list = []
self.mem_optim = mem_optim
self.ir_optim = ir_optim
for i in range(1000): for i in range(1000):
if self.port_is_available(default_port + i): if self.port_is_available(default_port + i):
self.port_list.append(default_port + i) self.port_list.append(default_port + i)
...@@ -83,13 +92,11 @@ class WebService(object): ...@@ -83,13 +92,11 @@ class WebService(object):
if isinstance(feed, dict) and "fetch" in feed: if isinstance(feed, dict) and "fetch" in feed:
del feed["fetch"] del feed["fetch"]
fetch_map = self.client.predict(feed=feed, fetch=fetch) fetch_map = self.client.predict(feed=feed, fetch=fetch)
for key in fetch_map: result = self.postprocess(
fetch_map[key] = fetch_map[key].tolist()
fetch_map = self.postprocess(
feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": fetch_map} result = {"result": result}
except ValueError: except ValueError as err:
result = {"result": "Request Value Error"} result = {"result": err}
return result return result
def run_rpc_service(self): def run_rpc_service(self):
...@@ -128,4 +135,6 @@ class WebService(object): ...@@ -128,4 +135,6 @@ class WebService(object):
return feed, fetch return feed, fetch
def postprocess(self, feed=[], fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist()
return fetch_map return fetch_map
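# A hedged usage sketch (not part of this commit): a WebService subclass that
# uses the new mem_optim / ir_optim switches of prepare_server() above and
# overrides the preprocess / postprocess hooks. The name keyword,
# load_model_config() and run_web_service() are assumed to keep their usual
# signatures; "lac_model" is a placeholder model path.
from paddle_serving_server.web_service import WebService

class MyService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # hook point: feed is the parsed "feed" field of the JSON request
        return feed, fetch

    def postprocess(self, feed=[], fetch=[], fetch_map=None):
        # numpy outputs are converted so the JSON response can serialize them
        for key in fetch_map:
            fetch_map[key] = fetch_map[key].tolist()
        return fetch_map

service = MyService(name="my_service")
service.load_model_config("lac_model")
service.prepare_server(
    workdir="workdir", port=9393, device="cpu",
    mem_optim=True, ir_optim=False)
service.run_rpc_service()
service.run_web_service()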
...@@ -26,12 +26,22 @@ from contextlib import closing ...@@ -26,12 +26,22 @@ from contextlib import closing
import argparse import argparse
import collections import collections
import fcntl import fcntl
import shutil
import numpy as np
import grpc
from .proto import multi_lang_general_model_service_pb2
import sys
sys.path.append(
os.path.join(os.path.abspath(os.path.dirname(__file__)), 'proto'))
from .proto import multi_lang_general_model_service_pb2_grpc
from multiprocessing import Pool, Process
from concurrent import futures
def serve_args(): def serve_args():
parser = argparse.ArgumentParser("serve") parser = argparse.ArgumentParser("serve")
parser.add_argument( parser.add_argument(
"--thread", type=int, default=10, help="Concurrency of server") "--thread", type=int, default=2, help="Concurrency of server")
parser.add_argument( parser.add_argument(
"--model", type=str, default="", help="Model for serving") "--model", type=str, default="", help="Model for serving")
parser.add_argument( parser.add_argument(
...@@ -47,7 +57,7 @@ def serve_args(): ...@@ -47,7 +57,7 @@ def serve_args():
parser.add_argument( parser.add_argument(
"--name", type=str, default="None", help="Default service name") "--name", type=str, default="None", help="Default service name")
parser.add_argument( parser.add_argument(
"--mem_optim", "--mem_optim_off",
default=False, default=False,
action="store_true", action="store_true",
help="Memory optimize") help="Memory optimize")
...@@ -58,6 +68,11 @@ def serve_args(): ...@@ -58,6 +68,11 @@ def serve_args():
type=int, type=int,
default=512 * 1024 * 1024, default=512 * 1024 * 1024,
help="Limit sizes of messages") help="Limit sizes of messages")
parser.add_argument(
"--use_multilang",
default=False,
action="store_true",
help="Use Multi-language-service")
return parser.parse_args() return parser.parse_args()
...@@ -172,7 +187,7 @@ class Server(object): ...@@ -172,7 +187,7 @@ class Server(object):
self.cube_config_fn = "cube.conf" self.cube_config_fn = "cube.conf"
self.workdir = "" self.workdir = ""
self.max_concurrency = 0 self.max_concurrency = 0
self.num_threads = 4 self.num_threads = 2
self.port = 8080 self.port = 8080
self.reload_interval_s = 10 self.reload_interval_s = 10
self.max_body_size = 64 * 1024 * 1024 self.max_body_size = 64 * 1024 * 1024
...@@ -220,15 +235,11 @@ class Server(object): ...@@ -220,15 +235,11 @@ class Server(object):
self.bin_path = os.environ["SERVING_BIN"] self.bin_path = os.environ["SERVING_BIN"]
def check_cuda(self): def check_cuda(self):
cuda_flag = False if os.system("ls /dev/ | grep nvidia > /dev/null") == 0:
r = os.popen("ldd {} | grep cudart".format(self.bin_path)) pass
r = r.read().split("=") else:
if len(r) >= 2 and "cudart" in r[1] and os.system(
"ls /dev/ | grep nvidia > /dev/null") == 0:
cuda_flag = True
if not cuda_flag:
raise SystemExit( raise SystemExit(
"CUDA not found, please check your environment or use cpu version by \"pip install paddle_serving_server\"" "GPU not found, please check your environment or use cpu version by \"pip install paddle_serving_server\""
) )
def set_gpuid(self, gpuid=0): def set_gpuid(self, gpuid=0):
...@@ -270,7 +281,7 @@ class Server(object): ...@@ -270,7 +281,7 @@ class Server(object):
infer_service.workflows.extend(["workflow1"]) infer_service.workflows.extend(["workflow1"])
self.infer_service_conf.services.extend([infer_service]) self.infer_service_conf.services.extend([infer_service])
def _prepare_resource(self, workdir): def _prepare_resource(self, workdir, cube_conf):
self.workdir = workdir self.workdir = workdir
if self.resource_conf == None: if self.resource_conf == None:
with open("{}/{}".format(workdir, self.general_model_config_fn), with open("{}/{}".format(workdir, self.general_model_config_fn),
...@@ -282,6 +293,11 @@ class Server(object): ...@@ -282,6 +293,11 @@ class Server(object):
if "dist_kv" in node.name: if "dist_kv" in node.name:
self.resource_conf.cube_config_path = workdir self.resource_conf.cube_config_path = workdir
self.resource_conf.cube_config_file = self.cube_config_fn self.resource_conf.cube_config_file = self.cube_config_fn
if cube_conf == None:
raise ValueError(
"Please set the path of cube.conf while use dist_kv op."
)
shutil.copy(cube_conf, workdir)
self.resource_conf.model_toolkit_path = workdir self.resource_conf.model_toolkit_path = workdir
self.resource_conf.model_toolkit_file = self.model_toolkit_fn self.resource_conf.model_toolkit_file = self.model_toolkit_fn
self.resource_conf.general_model_path = workdir self.resource_conf.general_model_path = workdir
...@@ -343,7 +359,15 @@ class Server(object): ...@@ -343,7 +359,15 @@ class Server(object):
def download_bin(self): def download_bin(self):
os.chdir(self.module_path) os.chdir(self.module_path)
need_download = False need_download = False
device_version = "serving-gpu-"
#acquire lock
version_file = open("{}/version.py".format(self.module_path), "r")
import re
for line in version_file.readlines():
if re.match("cuda_version", line):
cuda_version = line.split("\"")[1]
device_version = "serving-gpu-cuda" + cuda_version + "-"
folder_name = device_version + serving_server_version folder_name = device_version + serving_server_version
tar_name = folder_name + ".tar.gz" tar_name = folder_name + ".tar.gz"
bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name bin_url = "https://paddle-serving.bj.bcebos.com/bin/" + tar_name
...@@ -352,8 +376,6 @@ class Server(object): ...@@ -352,8 +376,6 @@ class Server(object):
download_flag = "{}/{}.is_download".format(self.module_path, download_flag = "{}/{}.is_download".format(self.module_path,
folder_name) folder_name)
#acquire lock
version_file = open("{}/version.py".format(self.module_path), "r")
fcntl.flock(version_file, fcntl.LOCK_EX) fcntl.flock(version_file, fcntl.LOCK_EX)
if os.path.exists(download_flag): if os.path.exists(download_flag):
...@@ -365,6 +387,7 @@ class Server(object): ...@@ -365,6 +387,7 @@ class Server(object):
os.system("touch {}/{}.is_download".format(self.module_path, os.system("touch {}/{}.is_download".format(self.module_path,
folder_name)) folder_name))
print('Frist time run, downloading PaddleServing components ...') print('Frist time run, downloading PaddleServing components ...')
r = os.system('wget ' + bin_url + ' --no-check-certificate') r = os.system('wget ' + bin_url + ' --no-check-certificate')
if r != 0: if r != 0:
if os.path.exists(tar_name): if os.path.exists(tar_name):
...@@ -391,7 +414,11 @@ class Server(object): ...@@ -391,7 +414,11 @@ class Server(object):
os.chdir(self.cur_path) os.chdir(self.cur_path)
self.bin_path = self.server_path + "/serving" self.bin_path = self.server_path + "/serving"
def prepare_server(self, workdir=None, port=9292, device="cpu"): def prepare_server(self,
workdir=None,
port=9292,
device="cpu",
cube_conf=None):
if workdir == None: if workdir == None:
workdir = "./tmp" workdir = "./tmp"
os.system("mkdir {}".format(workdir)) os.system("mkdir {}".format(workdir))
...@@ -400,10 +427,10 @@ class Server(object): ...@@ -400,10 +427,10 @@ class Server(object):
os.system("touch {}/fluid_time_file".format(workdir)) os.system("touch {}/fluid_time_file".format(workdir))
if not self.port_is_available(port): if not self.port_is_available(port):
raise SystemExit("Prot {} is already used".format(port)) raise SystemExit("Port {} is already used".format(port))
self.set_port(port) self.set_port(port)
self._prepare_resource(workdir) self._prepare_resource(workdir, cube_conf)
self._prepare_engine(self.model_config_paths, device) self._prepare_engine(self.model_config_paths, device)
self._prepare_infer_service(port) self._prepare_infer_service(port)
self.workdir = workdir self.workdir = workdir
...@@ -472,3 +499,255 @@ class Server(object): ...@@ -472,3 +499,255 @@ class Server(object):
print(command) print(command)
os.system(command) os.system(command)
class MultiLangServerServiceServicer(multi_lang_general_model_service_pb2_grpc.
MultiLangGeneralModelServiceServicer):
def __init__(self, model_config_path, is_multi_model, endpoints):
self.is_multi_model_ = is_multi_model
self.model_config_path_ = model_config_path
self.endpoints_ = endpoints
with open(self.model_config_path_) as f:
self.model_config_str_ = str(f.read())
self._parse_model_config(self.model_config_str_)
self._init_bclient(self.model_config_path_, self.endpoints_)
def _init_bclient(self, model_config_path, endpoints, timeout_ms=None):
from paddle_serving_client import Client
self.bclient_ = Client()
if timeout_ms is not None:
self.bclient_.set_rpc_timeout_ms(timeout_ms)
self.bclient_.load_client_config(model_config_path)
self.bclient_.connect(endpoints)
def _parse_model_config(self, model_config_str):
model_conf = m_config.GeneralModelConfig()
model_conf = google.protobuf.text_format.Merge(model_config_str,
model_conf)
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.feed_types_ = {}
self.feed_shapes_ = {}
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.fetch_types_ = {}
self.lod_tensor_set_ = set()
for i, var in enumerate(model_conf.feed_var):
self.feed_types_[var.alias_name] = var.feed_type
self.feed_shapes_[var.alias_name] = var.shape
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
for i, var in enumerate(model_conf.fetch_var):
self.fetch_types_[var.alias_name] = var.fetch_type
if var.is_lod_tensor:
self.lod_tensor_set_.add(var.alias_name)
def _flatten_list(self, nested_list):
for item in nested_list:
if isinstance(item, (list, tuple)):
for sub_item in self._flatten_list(item):
yield sub_item
else:
yield item
def _unpack_inference_request(self, request):
feed_names = list(request.feed_var_names)
fetch_names = list(request.fetch_var_names)
is_python = request.is_python
feed_batch = []
for feed_inst in request.insts:
feed_dict = {}
for idx, name in enumerate(feed_names):
var = feed_inst.tensor_array[idx]
v_type = self.feed_types_[name]
data = None
if is_python:
if v_type == 0:
data = np.frombuffer(var.data, dtype="int64")
elif v_type == 1:
data = np.frombuffer(var.data, dtype="float32")
elif v_type == 2:
data = np.frombuffer(var.data, dtype="int32")
else:
raise Exception("error type.")
else:
if v_type == 0: # int64
data = np.array(list(var.int64_data), dtype="int64")
elif v_type == 1: # float32
data = np.array(list(var.float_data), dtype="float32")
elif v_type == 2:
data = np.array(list(var.int_data), dtype="int32")
else:
raise Exception("error type.")
data.shape = list(feed_inst.tensor_array[idx].shape)
feed_dict[name] = data
feed_batch.append(feed_dict)
return feed_batch, fetch_names, is_python
def _pack_inference_response(self, ret, fetch_names, is_python):
resp = multi_lang_general_model_service_pb2.InferenceResponse()
if ret is None:
resp.err_code = 1
return resp
results, tag = ret
resp.tag = tag
resp.err_code = 0
if not self.is_multi_model_:
results = {'general_infer_0': results}
for model_name, model_result in results.items():
model_output = multi_lang_general_model_service_pb2.ModelOutput()
inst = multi_lang_general_model_service_pb2.FetchInst()
for idx, name in enumerate(fetch_names):
tensor = multi_lang_general_model_service_pb2.Tensor()
v_type = self.fetch_types_[name]
if is_python:
tensor.data = model_result[name].tobytes()
else:
if v_type == 0: # int64
tensor.int64_data.extend(model_result[name].reshape(-1)
.tolist())
elif v_type == 1: # float32
tensor.float_data.extend(model_result[name].reshape(-1)
.tolist())
elif v_type == 2: # int32
tensor.int_data.extend(model_result[name].reshape(-1)
.tolist())
else:
raise Exception("error type.")
tensor.shape.extend(list(model_result[name].shape))
if name in self.lod_tensor_set_:
tensor.lod.extend(model_result["{}.lod".format(name)]
.tolist())
inst.tensor_array.append(tensor)
model_output.insts.append(inst)
model_output.engine_name = model_name
resp.outputs.append(model_output)
return resp
def SetTimeout(self, request, context):
# This process and the Inference process cannot run at the same time.
# For performance reasons, no thread lock is added for now.
timeout_ms = request.timeout_ms
self._init_bclient(self.model_config_path_, self.endpoints_, timeout_ms)
resp = multi_lang_general_model_service_pb2.SimpleResponse()
resp.err_code = 0
return resp
def Inference(self, request, context):
feed_dict, fetch_names, is_python = self._unpack_inference_request(
request)
ret = self.bclient_.predict(
feed=feed_dict, fetch=fetch_names, need_variant_tag=True)
return self._pack_inference_response(ret, fetch_names, is_python)
def GetClientConfig(self, request, context):
resp = multi_lang_general_model_service_pb2.GetClientConfigResponse()
resp.client_config_str = self.model_config_str_
return resp
class MultiLangServer(object):
def __init__(self):
self.bserver_ = Server()
self.worker_num_ = 4
self.body_size_ = 64 * 1024 * 1024
self.concurrency_ = 100000
self.is_multi_model_ = False # for model ensemble
def set_max_concurrency(self, concurrency):
self.concurrency_ = concurrency
self.bserver_.set_max_concurrency(concurrency)
def set_num_threads(self, threads):
self.worker_num_ = threads
self.bserver_.set_num_threads(threads)
def set_max_body_size(self, body_size):
self.bserver_.set_max_body_size(body_size)
if body_size >= self.body_size_:
self.body_size_ = body_size
else:
print(
"max_body_size is less than default value, will use default value in service."
)
def set_port(self, port):
self.gport_ = port
def set_reload_interval(self, interval):
self.bserver_.set_reload_interval(interval)
def set_op_sequence(self, op_seq):
self.bserver_.set_op_sequence(op_seq)
def set_op_graph(self, op_graph):
self.bserver_.set_op_graph(op_graph)
def set_memory_optimize(self, flag=False):
self.bserver_.set_memory_optimize(flag)
def set_ir_optimize(self, flag=False):
self.bserver_.set_ir_optimize(flag)
def set_gpuid(self, gpuid=0):
self.bserver_.set_gpuid(gpuid)
def load_model_config(self, server_config_paths, client_config_path=None):
self.bserver_.load_model_config(server_config_paths)
if client_config_path is None:
if isinstance(server_config_paths, dict):
self.is_multi_model_ = True
client_config_path = '{}/serving_server_conf.prototxt'.format(
list(server_config_paths.items())[0][1])
else:
client_config_path = '{}/serving_server_conf.prototxt'.format(
server_config_paths)
self.bclient_config_path_ = client_config_path
def prepare_server(self,
workdir=None,
port=9292,
device="cpu",
cube_conf=None):
if not self._port_is_available(port):
raise SystemExit("Prot {} is already used".format(port))
default_port = 12000
self.port_list_ = []
for i in range(1000):
if default_port + i != port and self._port_is_available(default_port
+ i):
self.port_list_.append(default_port + i)
break
self.bserver_.prepare_server(
workdir=workdir,
port=self.port_list_[0],
device=device,
cube_conf=cube_conf)
self.set_port(port)
def _launch_brpc_service(self, bserver):
bserver.run_server()
def _port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
return result != 0
def run_server(self):
p_bserver = Process(
target=self._launch_brpc_service, args=(self.bserver_, ))
p_bserver.start()
options = [('grpc.max_send_message_length', self.body_size_),
('grpc.max_receive_message_length', self.body_size_)]
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=self.worker_num_),
options=options,
maximum_concurrent_rpcs=self.concurrency_)
multi_lang_general_model_service_pb2_grpc.add_MultiLangGeneralModelServiceServicer_to_server(
MultiLangServerServiceServicer(
self.bclient_config_path_, self.is_multi_model_,
["0.0.0.0:{}".format(self.port_list_[0])]), server)
server.add_insecure_port('[::]:{}'.format(self.gport_))
server.start()
p_bserver.join()
server.wait_for_termination()
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import re
import os
new_str = ""
with open("paddle_serving_server_gpu/version.py", "r") as f:
for line in f.readlines():
if re.match("cuda_version", line):
line = re.sub(r"\d+", sys.argv[1], line)
new_str = new_str + line
with open("paddle_serving_server_gpu/version.py", "w") as f:
f.write(new_str)
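# Minimal illustration (not part of this commit) of the substitution done by
# the script above: the run of digits on the cuda_version line is replaced by
# the version string passed as the script's only argument (sys.argv[1]).
import re

line = 'cuda_version = "9"\n'
print(re.sub(r"\d+", "10", line))  # prints: cuda_version = "10"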
...@@ -34,9 +34,10 @@ def start_gpu_card_model(index, gpuid, args): # pylint: disable=doc-string-miss ...@@ -34,9 +34,10 @@ def start_gpu_card_model(index, gpuid, args): # pylint: disable=doc-string-miss
port = args.port + index port = args.port + index
thread_num = args.thread thread_num = args.thread
model = args.model model = args.model
mem_optim = args.mem_optim mem_optim = args.mem_optim_off is False
ir_optim = args.ir_optim ir_optim = args.ir_optim
max_body_size = args.max_body_size max_body_size = args.max_body_size
use_multilang = args.use_multilang
workdir = "{}_{}".format(args.workdir, gpuid) workdir = "{}_{}".format(args.workdir, gpuid)
if model == "": if model == "":
...@@ -54,7 +55,10 @@ def start_gpu_card_model(index, gpuid, args): # pylint: disable=doc-string-miss ...@@ -54,7 +55,10 @@ def start_gpu_card_model(index, gpuid, args): # pylint: disable=doc-string-miss
op_seq_maker.add_op(general_infer_op) op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
server = serving.Server() if use_multilang:
server = serving.MultiLangServer()
else:
server = serving.Server()
server.set_op_sequence(op_seq_maker.get_op_sequence()) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(thread_num) server.set_num_threads(thread_num)
server.set_memory_optimize(mem_optim) server.set_memory_optimize(mem_optim)
......
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
""" Paddle Serving Client version string """ """ Paddle Serving Client version string """
serving_client_version = "0.3.0" serving_client_version = "0.3.2"
serving_server_version = "0.3.0" serving_server_version = "0.3.2"
module_proto_version = "0.3.0" module_proto_version = "0.3.2"
cuda_version = "9"
...@@ -41,7 +41,9 @@ class WebService(object): ...@@ -41,7 +41,9 @@ class WebService(object):
workdir="conf", workdir="conf",
port=9292, port=9292,
gpuid=0, gpuid=0,
thread_num=10): thread_num=2,
mem_optim=True,
ir_optim=False):
device = "gpu" device = "gpu"
if gpuid == -1: if gpuid == -1:
device = "cpu" device = "cpu"
...@@ -50,14 +52,16 @@ class WebService(object): ...@@ -50,14 +52,16 @@ class WebService(object):
general_infer_op = op_maker.create('general_infer') general_infer_op = op_maker.create('general_infer')
general_response_op = op_maker.create('general_response') general_response_op = op_maker.create('general_response')
op_seq_maker = serving.OpSeqMaker() op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op) op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op) op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
server = serving.Server() server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence()) server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(thread_num) server.set_num_threads(thread_num)
server.set_memory_optimize(mem_optim)
server.set_ir_optimize(ir_optim)
server.load_model_config(self.model_config) server.load_model_config(self.model_config)
if gpuid >= 0: if gpuid >= 0:
...@@ -77,7 +81,13 @@ class WebService(object): ...@@ -77,7 +81,13 @@ class WebService(object):
else: else:
return False return False
def prepare_server(self, workdir="", port=9393, device="gpu", gpuid=0): def prepare_server(self,
workdir="",
port=9393,
device="gpu",
gpuid=0,
mem_optim=True,
ir_optim=False):
self.workdir = workdir self.workdir = workdir
self.port = port self.port = port
self.device = device self.device = device
...@@ -94,7 +104,12 @@ class WebService(object): ...@@ -94,7 +104,12 @@ class WebService(object):
# init cpu service # init cpu service
self.rpc_service_list.append( self.rpc_service_list.append(
self.default_rpc_service( self.default_rpc_service(
self.workdir, self.port_list[0], -1, thread_num=10)) self.workdir,
self.port_list[0],
-1,
thread_num=2,
mem_optim=mem_optim,
ir_optim=ir_optim))
else: else:
for i, gpuid in enumerate(self.gpus): for i, gpuid in enumerate(self.gpus):
self.rpc_service_list.append( self.rpc_service_list.append(
...@@ -102,7 +117,9 @@ class WebService(object): ...@@ -102,7 +117,9 @@ class WebService(object):
"{}_{}".format(self.workdir, i), "{}_{}".format(self.workdir, i),
self.port_list[i], self.port_list[i],
gpuid, gpuid,
thread_num=10)) thread_num=2,
mem_optim=mem_optim,
ir_optim=ir_optim))
def _launch_web_service(self): def _launch_web_service(self):
gpu_num = len(self.gpus) gpu_num = len(self.gpus)
...@@ -127,14 +144,14 @@ class WebService(object): ...@@ -127,14 +144,14 @@ class WebService(object):
request.json["fetch"]) request.json["fetch"])
if isinstance(feed, dict) and "fetch" in feed: if isinstance(feed, dict) and "fetch" in feed:
del feed["fetch"] del feed["fetch"]
if len(feed) == 0:
raise ValueError("empty input")
fetch_map = self.client.predict(feed=feed, fetch=fetch) fetch_map = self.client.predict(feed=feed, fetch=fetch)
for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist()
result = self.postprocess( result = self.postprocess(
feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map) feed=request.json["feed"], fetch=fetch, fetch_map=fetch_map)
result = {"result": result} result = {"result": result}
except ValueError: except ValueError as err:
result = {"result": "Request Value Error"} result = {"result": err}
return result return result
def run_rpc_service(self): def run_rpc_service(self):
...@@ -164,6 +181,33 @@ class WebService(object): ...@@ -164,6 +181,33 @@ class WebService(object):
self.app_instance = app_instance self.app_instance = app_instance
# TODO: maybe rename this API to run_local_predictor?
def run_debugger_service(self, gpu=False):
import socket
localIP = socket.gethostbyname(socket.gethostname())
print("web service address:")
print("http://{}:{}/{}/prediction".format(localIP, self.port,
self.name))
app_instance = Flask(__name__)
@app_instance.before_first_request
def init():
self._launch_local_predictor(gpu)
service_name = "/" + self.name + "/prediction"
@app_instance.route(service_name, methods=["POST"])
def run():
return self.get_prediction(request)
self.app_instance = app_instance
def _launch_local_predictor(self, gpu):
from paddle_serving_app.local_predict import Debugger
self.client = Debugger()
self.client.load_model_config(
"{}".format(self.model_config), gpu=gpu, profile=False)
def run_web_service(self): def run_web_service(self):
self.app_instance.run(host="0.0.0.0", self.app_instance.run(host="0.0.0.0",
port=self.port, port=self.port,
...@@ -171,10 +215,12 @@ class WebService(object): ...@@ -171,10 +215,12 @@ class WebService(object):
processes=1) processes=1)
def get_app_instance(self): def get_app_instance(self):
return app_instance return self.app_instance
def preprocess(self, feed=[], fetch=[]): def preprocess(self, feed=[], fetch=[]):
return feed, fetch return feed, fetch
def postprocess(self, feed=[], fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist()
return fetch_map return fetch_map
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from operator import Op, RequestOp, ResponseOp
from pipeline_server import PipelineServer
from pipeline_client import PipelineClient
from analyse import Analyst
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import json
import copy
import re
import logging
_LOGGER = logging.getLogger()
class Analyst(object):
def __init__(self, profile_file):
self._profile_file = profile_file
self._trace = None
self.ave_call = None
self.ave_prepack = None
self.ave_postpack = None
self.op_analyst = None
self.start_time = None
self.end_time = None
def _prase_line(self, pid_str, time_str, counter):
pid = pid_str.split(":")[1]
event_list = time_str.split(" ")
trace_list = []
for event in event_list:
name, ts = event.split(":")
name_list = name.split("_")
ph = "B" if (name_list[-1] == "0") else "E"
if len(name_list) == 2:
name = name_list[0]
else:
name = "_".join(name_list[:-1])
name_list = name.split("#")
if len(name_list) > 1:
tid = name_list[-1]
name = "#".join(name_list[:-1])
else:
tid = 0
event_dict = {}
event_dict["name"] = name
event_dict["tid"] = tid
event_dict["pid"] = pid
event_dict["ts"] = ts
event_dict["ph"] = ph
trace_list.append(event_dict)
return trace_list
def get_trace(self):
if self._trace is not None:
return self._trace
all_list = []
counter = 0
with open(self._profile_file) as f:
for line in f.readlines():
line = line.strip().split("\t")
if line[0] == "PROFILE":
trace_list = self._prase_line(line[1], line[2], counter)
counter += 1
for trace in trace_list:
all_list.append(trace)
self._trace = all_list
return self._trace
def save_trace(self, trace_file):
self.get_trace()
trace = json.dumps(self._trace, indent=2, separators=(',', ':'))
with open(trace_file, "w") as f:
f.write(trace)
def print_profile(self):
self.get_profile()
print("graph engine call: {}".format(self.ave_call))
print("rpc prepack: {}".format(self.ave_prepack))
print("rpc postpack: {}".format(self.ave_postpack))
print("OP: {}".format(self.op_analyst))
def get_op_analyst(self):
self.get_profile()
return self.op_analyst
def get_profile(self):
if self.ave_call is not None and \
self.ave_prepack is not None and \
self.ave_postpack is not None and \
self.op_analyst is not None:
return (self.ave_call, self.ave_prepack, self.ave_postpack,
self.op_analyst)
trace = self.get_trace()
time_dict = {}
time_list_dict = {}
start, end = None, None
for event in trace:
name = "{}#{}".format(event["name"], event["tid"])
event_t = int(event["ts"])
if name in time_dict:
ts = event_t - time_dict.pop(name)
ts = ts / 1e3 # ms
if name not in time_list_dict:
time_list_dict[name] = []
time_list_dict[name].append(ts)
else:
time_dict[name] = event_t
if start is None:
start = event_t
elif start > event_t:
start = event_t
if end is None:
end = event_t
elif end < event_t:
end = event_t
self.start_time = start
self.end_time = end
op_analyst = OpAnalyst(start, end)
# reduce prepack_n, postpack_n, call_n
pat_prepack = re.compile(r"prepack_\d+#@G")
prepack_time_list = []
pat_postpack = re.compile(r"postpack_\d+#@G")
postpack_time_list = []
pat_call = re.compile(r"call_\d+#DAG")
call_time_list = []
for name in time_list_dict:
if pat_prepack.match(name):
prepack_time_list.extend(time_list_dict[name])
elif pat_postpack.match(name):
postpack_time_list.extend(time_list_dict[name])
elif pat_call.match(name):
call_time_list.extend(time_list_dict[name])
else:
op_analyst.add(name, time_list_dict[name])
self.ave_call = sum(call_time_list) * 1.0 / len(call_time_list)
self.ave_prepack = sum(prepack_time_list) * 1.0 / len(prepack_time_list)
self.ave_postpack = sum(postpack_time_list) * 1.0 / len(
postpack_time_list)
self.op_analyst = op_analyst
return (self.ave_call, self.ave_prepack, self.ave_postpack,
self.op_analyst)
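# A small usage sketch (not part of this commit) for the Analyst helper above.
# "profile.txt" is a placeholder for a log containing the tab-separated
# PROFILE lines parsed by get_trace(); the saved JSON follows the trace-event
# layout built there, so it can usually be inspected with a trace viewer such
# as chrome://tracing.
analyst = Analyst("profile.txt")
analyst.save_trace("trace.json")    # dump events for offline inspection
analyst.print_profile()             # average call / prepack / postpack times
op_analyst = analyst.get_op_analyst()
print(op_analyst.qps())             # per-OP, per-step QPS estimates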
class OpAnalyst(object):
def __init__(self, start_time, end_time):
self.op_time_list_dict = {}
self._qps = None
self._close = False
self.start_time = start_time
self.end_time = end_time
def add(self, name_str, ts_list):
if self._close:
_LOGGER.error("OpAnalyst is closed.")
return
op_name, curr_idx, step = self._parse(name_str)
if op_name not in self.op_time_list_dict:
self.op_time_list_dict[op_name] = {}
if curr_idx not in self.op_time_list_dict[op_name]:
self.op_time_list_dict[op_name][curr_idx] = {}
if step not in self.op_time_list_dict[op_name][curr_idx]:
self.op_time_list_dict[op_name][curr_idx][step] = []
self.op_time_list_dict[op_name][curr_idx][step].extend(ts_list)
def _parse(self, name):
step, name_str = name.split("#")
name_str = name_str[1:-1]
op_name, curr_idx = name_str.split("|")
return op_name, curr_idx, step
def _reduce_profile(self):
"""
Calculate the average time consumed by multiple concurrent OPs.
"""
if self._close:
return
for op_name in self.op_time_list_dict:
total_time = None
for curr_idx in self.op_time_list_dict[op_name]:
ave_dict = {}
for step in self.op_time_list_dict[op_name][curr_idx]:
ave_dict[step] = sum(self.op_time_list_dict[op_name][
curr_idx][step]) * 1.0 / len(self.op_time_list_dict[
op_name][curr_idx][step])
if total_time is None:
total_time = ave_dict
else:
for step in ave_dict:
total_time[step] += ave_dict[step]
for step in total_time:
total_time[step] = total_time[step] * 1.0 / len(
self.op_time_list_dict[op_name])
self.op_time_list_dict[op_name] = total_time
self._close = True
def _get_qps(self):
"""
Calculating QPS for each step based on the time
consumed in each step of OP.
"""
if self._qps is not None:
return self._qps
self._reduce_profile()
self._qps = {}
for op_name, times in self.op_time_list_dict.items():
self._qps[op_name] = {
step: 1000.0 / ts
for step, ts in times.items()
}
return self._qps
def __str__(self):
self._reduce_profile()
return json.dumps(
self.op_time_list_dict, indent=2, separators=(', ', ':'))
def qps(self, op_name=None):
"""
Get the average QPS of each step of each OP (in q/s)
"""
self._get_qps()
if op_name is None:
return self._qps
else:
return self._qps[op_name]
def times(self, op_name=None):
"""
Get the average time of each step of each OP (in ms)
"""
self._reduce_profile()
if op_name is None:
return self.op_time_list_dict
else:
return self.op_time_list_dict[op_name]
def concurrency_analysis(self, op_config_yaml):
"""
Use the per-OP time consumption and op_config_yaml to
calculate the theoretical QPS, as well as the
concurrency required by each OP.
Note that since multiple models deployed on the same
card affect each other, only the case where each model
runs on a different card can be analysed.
The format of the yaml file is as follows:
```yaml
<op_name>:
<step(prep, midp or postp)>: <GPU id>
```
For example:
```yaml
cnn:
midp: 0
bow:
midp: 1
```
"""
import yaml
with open(op_config_yaml) as f:
op_config = yaml.load(f)
# check that each model is deployed on a different card
card_set = set()
# and finding the most time consuming part (GPU)
op_times = self.times()
most_time = 0
most_time_op_name = None
for op in op_config:
for step, cards in op_config[op].items():
if isinstance(cards, int):
cards = [cards]
elif isinstance(cards, str):
cards = [int(x) for x in cards.split(',')]
else:
raise Exception("Error cards type.")
for card in cards:
if card in card_set:
raise Exception(
"Analysis is failed because "
"different services interact when different"
" models are deployed on one card.")
else:
card_set.add(card)
times_each_card = op_times[op][step] / len(cards)
if most_time < times_each_card:
most_time = times_each_card
most_time_op_name = op
# calculate base qps
base_qps = 1.0 / most_time # q/ms
_LOGGER.info("Most Time Consuming (GPU): {} ms (op: {})"
.format(most_time, most_time_op_name))
_LOGGER.info("Theoretically Expected QPS: {} q/s".format(base_qps *
1000))
# reduce op times
op_times = {
op_name: sum(step_times.values())
for op_name, step_times in op_times.items()
}
# calculate op concurrency
op_concurrency = {
op_name: round(base_qps * times, 3)
for op_name, times in op_times.items()
}
return op_concurrency
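# A hedged sketch (not part of this commit) of concurrency_analysis(), using
# the yaml layout documented in the docstring above. "op_config.yml" and the
# GPU ids are placeholders, and each model must sit on its own card:
#
#     # op_config.yml
#     cnn:
#         midp: 0
#     bow:
#         midp: 1
#
op_analyst = Analyst("profile.txt").get_op_analyst()
print(op_analyst.concurrency_analysis("op_config.yml"))
# -> dict mapping each OP name to the concurrency suggested for it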
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import threading
import multiprocessing
import multiprocessing.queues
import sys
if sys.version_info.major == 2:
import Queue
elif sys.version_info.major == 3:
import queue as Queue
else:
raise Exception("Error Python version")
import numpy as np
import logging
import enum
import copy
_LOGGER = logging.getLogger()
class ChannelDataEcode(enum.Enum):
OK = 0
TIMEOUT = 1
NOT_IMPLEMENTED = 2
TYPE_ERROR = 3
RPC_PACKAGE_ERROR = 4
CLIENT_ERROR = 5
CLOSED_ERROR = 6
UNKNOW = 7
class ChannelDataType(enum.Enum):
DICT = 0
CHANNEL_NPDATA = 1
ERROR = 2
class ChannelData(object):
def __init__(self,
datatype=None,
npdata=None,
dictdata=None,
data_id=None,
ecode=None,
error_info=None,
client_need_profile=False):
'''
There are several ways to use it:
1. ChannelData(ChannelDataType.CHANNEL_NPDATA.value, npdata, data_id)
2. ChannelData(ChannelDataType.DICT.value, dictdata, data_id)
3. ChannelData(ecode, error_info, data_id)
Protobufs are not pickle-able:
https://stackoverflow.com/questions/55344376/how-to-import-protobuf-module
'''
if ecode is not None:
if data_id is None or error_info is None:
raise ValueError("data_id and error_info cannot be None")
datatype = ChannelDataType.ERROR.value
else:
if datatype == ChannelDataType.CHANNEL_NPDATA.value:
ecode, error_info = ChannelData.check_npdata(npdata)
if ecode != ChannelDataEcode.OK.value:
datatype = ChannelDataType.ERROR.value
_LOGGER.error(error_info)
elif datatype == ChannelDataType.DICT.value:
ecode, error_info = ChannelData.check_dictdata(dictdata)
if ecode != ChannelDataEcode.OK.value:
datatype = ChannelDataType.ERROR.value
_LOGGER.error(error_info)
else:
raise ValueError("datatype not match")
self.datatype = datatype
self.npdata = npdata
self.dictdata = dictdata
self.id = data_id
self.ecode = ecode
self.error_info = error_info
self.client_need_profile = client_need_profile
self.profile_data_set = set()
def add_profile(self, profile_set):
if self.client_need_profile is False:
self.client_need_profile = True
self.profile_data_set |= profile_set
@staticmethod
def check_dictdata(dictdata):
ecode = ChannelDataEcode.OK.value
error_info = None
if isinstance(dictdata, list):
# batch data
for sample in dictdata:
if not isinstance(sample, dict):
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be dict, but get {}.".format(type(sample))
break
elif not isinstance(dictdata, dict):
# batch size = 1
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be dict, but get {}.".format(type(dictdata))
return ecode, error_info
@staticmethod
def check_npdata(npdata):
ecode = ChannelDataEcode.OK.value
error_info = None
if isinstance(npdata, list):
# batch data
for sample in npdata:
if not isinstance(sample, dict):
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be dict, but get {}.".format(type(sample))
break
for _, value in sample.items():
if not isinstance(value, np.ndarray):
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be np.ndarray, but get {}.".format(type(value))
return ecode, error_info
elif isinstance(npdata, dict):
# batch_size = 1
for _, value in npdata.items():
if not isinstance(value, np.ndarray):
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be np.ndarray, but get {}.".format(type(value))
break
else:
ecode = ChannelDataEcode.TYPE_ERROR.value
error_info = "the value of data must " \
"be dict, but get {}.".format(type(npdata))
return ecode, error_info
def parse(self):
feed = None
if self.datatype == ChannelDataType.CHANNEL_NPDATA.value:
# return narray
feed = self.npdata
elif self.datatype == ChannelDataType.DICT.value:
# return dict
feed = self.dictdata
else:
raise TypeError("Error type({}) in datatype.".format(self.datatype))
return feed
def __str__(self):
return "type[{}], ecode[{}], id[{}]".format(
ChannelDataType(self.datatype).name, self.ecode, self.id)
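# Illustrative construction sketch (not part of this commit), following the
# three usage patterns listed in the ChannelData docstring above; the field
# values are placeholders.
import numpy as np

np_sample = ChannelData(
    datatype=ChannelDataType.CHANNEL_NPDATA.value,
    npdata={"image": np.zeros((1, 3, 224, 224), dtype="float32")},
    data_id=0)
dict_sample = ChannelData(
    datatype=ChannelDataType.DICT.value,
    dictdata={"words": "hello world"},
    data_id=1)
error_sample = ChannelData(
    ecode=ChannelDataEcode.TYPE_ERROR.value,
    error_info="feed must be dict or np.ndarray",
    data_id=2)
print(np_sample, dict_sample, error_sample)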
class ProcessChannel(object):
"""
(Process version) The channel used for communication between Ops.
1. Supports multiple different Ops feeding data (multiple producers):
data from different producers is packaged together by data ID.
2. Supports multiple different Ops fetching data (multiple consumers):
an item is popped only after every type of consumer Op has fetched
the data with that ID; Ops of the same type will not
get the data of the same ID.
3. (TODO) Timeout and BatchSize are not fully supported.
Note:
1. The ID of the data in the channel must be different.
2. The function add_producer() and add_consumer() are not thread safe,
and can only be called during initialization.
There are two buffers and one queue in Channel:
op_A \ / op_D
op_B - a. input_buf -> b. queue -> c. output_buf - op_E
op_C / \ op_F
a. In input_buf, the input of multiple predecessor Ops is packed by data ID.
b. The packed data will be stored in queue.
c. In order to support multiple successor Ops to retrieve data, output_buf
maintains the data obtained from queue.
"""
def __init__(self, manager, name=None, maxsize=0, timeout=None):
# For queue multiprocess: after putting an object on
# an empty queue there may be an infinitesimal delay
# before the queue's :meth:`~Queue.empty`
# see more:
# - https://bugs.python.org/issue18277
# - https://hg.python.org/cpython/rev/860fc6a2bd21
self._que = manager.Queue(maxsize=maxsize)
self._maxsize = maxsize
self._timeout = timeout
self.name = name
self._stop = manager.Value('i', 0)
self._cv = multiprocessing.Condition()
self._producers = []
self._pushed_producer_count = manager.dict() # {data_id: count}
self._input_buf = manager.dict() # {data_id: {op_name: data}}
self._reset_max_cursor = 1000000000000000000
self._consumer_cursors = manager.dict() # {op_name: cursor}
self._cursor_count = manager.dict() # {cursor: count}
self._base_cursor = manager.Value('i', 0)
self._output_buf = manager.list()
def get_producers(self):
return self._producers
def get_consumers(self):
return self._consumer_cursors.keys()
def _log(self, info_str):
return "[{}] {}".format(self.name, info_str)
def debug(self):
return self._log("p: {}, c: {}".format(self.get_producers(),
self.get_consumers()))
def add_producer(self, op_name):
""" not thread safe, and can only be called during initialization. """
if op_name in self._producers:
raise ValueError(
self._log("producer({}) is already in channel".format(op_name)))
self._producers.append(op_name)
def add_consumer(self, op_name):
""" not thread safe, and can only be called during initialization. """
if op_name in self._consumer_cursors:
raise ValueError(
self._log("consumer({}) is already in channel".format(op_name)))
self._consumer_cursors[op_name] = 0
if self._cursor_count.get(0) is None:
self._cursor_count[0] = 0
self._cursor_count[0] += 1
def push(self, channeldata, op_name=None):
_LOGGER.debug(
self._log("{} try to push data: {}".format(op_name,
channeldata.__str__())))
if len(self._producers) == 0:
raise Exception(
self._log(
"expected number of producers to be greater than 0, but the it is 0."
))
elif len(self._producers) == 1:
with self._cv:
while self._stop.value == 0:
try:
self._que.put({op_name: channeldata}, timeout=0)
break
except Queue.Full:
self._cv.wait()
if self._stop.value == 1:
raise ChannelStopError()
_LOGGER.debug(
self._log("{} channel size: {}".format(op_name,
self._que.qsize())))
self._cv.notify_all()
_LOGGER.debug(self._log("{} notify all".format(op_name)))
_LOGGER.debug(self._log("{} push data succ!".format(op_name)))
return True
elif op_name is None:
raise Exception(
self._log(
"There are multiple producers, so op_name cannot be None."))
producer_num = len(self._producers)
data_id = channeldata.id
put_data = None
with self._cv:
_LOGGER.debug(self._log("{} get lock".format(op_name)))
if data_id not in self._input_buf:
self._input_buf[data_id] = {
name: None
for name in self._producers
}
self._pushed_producer_count[data_id] = 0
# see: https://docs.python.org/3.6/library/multiprocessing.html?highlight=multiprocess#proxy-objects
# self._input_buf[data_id][op_name] = channeldata
tmp_input_buf = self._input_buf[data_id]
tmp_input_buf[op_name] = channeldata
self._input_buf[data_id] = tmp_input_buf
if self._pushed_producer_count[data_id] + 1 == producer_num:
put_data = self._input_buf[data_id]
self._input_buf.pop(data_id)
self._pushed_producer_count.pop(data_id)
else:
self._pushed_producer_count[data_id] += 1
if put_data is None:
_LOGGER.debug(
self._log("{} push data succ, but not push to queue.".
format(op_name)))
else:
while self._stop.value == 0:
try:
_LOGGER.debug(
self._log("{} push data succ: {}".format(
op_name, put_data.__str__())))
self._que.put(put_data, timeout=0)
break
                    except Queue.Full:  # put() raises Full, not Empty, when the queue is full
self._cv.wait()
if self._stop.value == 1:
raise ChannelStopError()
_LOGGER.debug(
self._log("multi | {} push data succ!".format(op_name)))
self._cv.notify_all()
return True
def front(self, op_name=None):
_LOGGER.debug(self._log("{} try to get data...".format(op_name)))
if len(self._consumer_cursors) == 0:
raise Exception(
self._log(
"expected number of consumers to be greater than 0, but the it is 0."
))
elif len(self._consumer_cursors) == 1:
resp = None
with self._cv:
while self._stop.value == 0 and resp is None:
try:
_LOGGER.debug(
self._log("{} try to get(with channel empty: {})".
format(op_name, self._que.empty())))
resp = self._que.get(timeout=0)
break
except Queue.Empty:
_LOGGER.debug(
self._log(
"{} wait for empty queue(with channel empty: {})".
format(op_name, self._que.empty())))
self._cv.wait()
if self._stop.value == 1:
raise ChannelStopError()
_LOGGER.debug(
self._log("{} get data succ: {}".format(op_name, resp.__str__(
))))
return resp
elif op_name is None:
raise Exception(
self._log(
"There are multiple consumers, so op_name cannot be None."))
# In output_buf, different Ops (according to op_name) have different
# cursors. In addition, there is a base_cursor. Their difference is
# the data_idx to be taken by the corresponding Op at the current
# time: data_idx = consumer_cursor - base_cursor
#
# base_cursor consumer_B_cursor (data_idx: 3)
# | |
# output_buf: | data0 | data1 | data2 | data3 |
# |
# consumer_A_cursor (data_idx: 0)
with self._cv:
# When the data required by the current Op is not in output_buf,
# it is necessary to obtain a data from queue and add it to output_buf.
while self._stop.value == 0 and self._consumer_cursors[
op_name] - self._base_cursor.value >= len(self._output_buf):
_LOGGER.debug(
self._log(
"({}) B self._consumer_cursors: {}, self._base_cursor: {}, len(self._output_buf): {}".
format(op_name, self._consumer_cursors,
self._base_cursor.value, len(self._output_buf))))
try:
_LOGGER.debug(
self._log("{} try to get(with channel size: {})".format(
op_name, self._que.qsize())))
channeldata = self._que.get(timeout=0)
self._output_buf.append(channeldata)
break
except Queue.Empty:
_LOGGER.debug(
self._log(
"{} wait for empty queue(with channel size: {})".
format(op_name, self._que.qsize())))
self._cv.wait()
if self._stop.value == 1:
raise ChannelStopError()
consumer_cursor = self._consumer_cursors[op_name]
base_cursor = self._base_cursor.value
data_idx = consumer_cursor - base_cursor
resp = self._output_buf[data_idx]
_LOGGER.debug(self._log("{} get data: {}".format(op_name, resp)))
self._cursor_count[consumer_cursor] -= 1
if consumer_cursor == base_cursor and self._cursor_count[
consumer_cursor] == 0:
# When all the different Ops get the data that data_idx points
# to, pop the data from output_buf.
self._cursor_count.pop(consumer_cursor)
self._output_buf.pop(0)
self._base_cursor.value += 1
# to avoid cursor overflow
if self._base_cursor.value >= self._reset_max_cursor:
self._base_cursor.value -= self._reset_max_cursor
for name in self._consumer_cursors.keys():
self._consumer_cursors[name] -= self._reset_max_cursor
cursor_count_tmp = {
cursor - self._reset_max_cursor: count
for cursor, count in self._cursor_count.copy().items()
}
self._cursor_count.clear()
for cursor, count in cursor_count_tmp.items():
self._cursor_count[cursor] = count
self._consumer_cursors[op_name] += 1
new_consumer_cursor = self._consumer_cursors[op_name]
if self._cursor_count.get(new_consumer_cursor) is None:
self._cursor_count[new_consumer_cursor] = 0
self._cursor_count[new_consumer_cursor] += 1
_LOGGER.debug(
self._log(
"({}) A self._consumer_cursors: {}, self._base_cursor: {}, len(self._output_buf): {}".
format(op_name, self._consumer_cursors,
self._base_cursor.value, len(self._output_buf))))
_LOGGER.debug(self._log("{} notify all".format(op_name)))
self._cv.notify_all()
_LOGGER.debug(self._log("multi | {} get data succ!".format(op_name)))
return resp # reference, read only
def stop(self):
_LOGGER.debug(self._log("stop."))
self._stop.value = 1
with self._cv:
self._cv.notify_all()
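# --- Illustrative note (not part of the original sources): a worked example of
# the multi-consumer bookkeeping in ProcessChannel.front() above. With two
# consumers A and B and base_cursor == 0, if A has taken 0 items and B has
# taken 3, then:
#     data_idx(A) = consumer_cursor(A) - base_cursor = 0 - 0 = 0  -> A reads output_buf[0]
#     data_idx(B) = consumer_cursor(B) - base_cursor = 3 - 0 = 3  -> B reads output_buf[3]
# output_buf[0] is popped and base_cursor advanced only when the last consumer
# still sitting at base_cursor takes it, so no consumer ever misses a data ID.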
class ThreadChannel(Queue.Queue):
"""
    (Thread version) The channel used for communication between Ops.
1. Support multiple different Op feed data (multiple producer)
Different types of data will be packaged through the data ID
2. Support multiple different Op fetch data (multiple consumer)
Only when all types of Ops get the data of the same ID,
        the data will be popped; the Op of the same type will not
get the data of the same ID.
3. (TODO) Timeout and BatchSize are not fully supported.
Note:
1. The ID of the data in the channel must be different.
2. The function add_producer() and add_consumer() are not thread safe,
and can only be called during initialization.
There are two buffers and one queue in Channel:
op_A \ / op_D
op_B - a. input_buf -> b. queue -> c. output_buf - op_E
op_C / \ op_F
a. In input_buf, the input of multiple predecessor Ops is packed by data ID.
b. The packed data will be stored in queue.
c. In order to support multiple successor Ops to retrieve data, output_buf
maintains the data obtained from queue.
"""
def __init__(self, name=None, maxsize=-1, timeout=None):
Queue.Queue.__init__(self, maxsize=maxsize)
self._maxsize = maxsize
self._timeout = timeout
self.name = name
self._stop = False
self._cv = threading.Condition()
self._producers = []
self._pushed_producer_count = {} # {data_id: count}
self._input_buf = {} # {data_id: {op_name: data}}
self._reset_max_cursor = 1000000000000000000
self._consumer_cursors = {} # {op_name: idx}
self._cursor_count = {} # {cursor: count}
self._base_cursor = 0
self._output_buf = []
def get_producers(self):
return self._producers
def get_consumers(self):
return self._consumer_cursors.keys()
def _log(self, info_str):
return "[{}] {}".format(self.name, info_str)
def debug(self):
return self._log("p: {}, c: {}".format(self.get_producers(),
self.get_consumers()))
def add_producer(self, op_name):
""" not thread safe, and can only be called during initialization. """
if op_name in self._producers:
raise ValueError(
self._log("producer({}) is already in channel".format(op_name)))
self._producers.append(op_name)
def add_consumer(self, op_name):
""" not thread safe, and can only be called during initialization. """
if op_name in self._consumer_cursors:
raise ValueError(
self._log("consumer({}) is already in channel".format(op_name)))
self._consumer_cursors[op_name] = 0
if self._cursor_count.get(0) is None:
self._cursor_count[0] = 0
self._cursor_count[0] += 1
def push(self, channeldata, op_name=None):
_LOGGER.debug(
self._log("{} try to push data: {}".format(op_name,
channeldata.__str__())))
if len(self._producers) == 0:
raise Exception(
self._log(
"expected number of producers to be greater than 0, but the it is 0."
))
elif len(self._producers) == 1:
with self._cv:
while self._stop is False:
try:
self.put({op_name: channeldata}, timeout=0)
break
except Queue.Full:
self._cv.wait()
if self._stop:
raise ChannelStopError()
self._cv.notify_all()
_LOGGER.debug(self._log("{} push data succ!".format(op_name)))
return True
elif op_name is None:
raise Exception(
self._log(
"There are multiple producers, so op_name cannot be None."))
producer_num = len(self._producers)
data_id = channeldata.id
put_data = None
with self._cv:
_LOGGER.debug(self._log("{} get lock".format(op_name)))
if data_id not in self._input_buf:
self._input_buf[data_id] = {
name: None
for name in self._producers
}
self._pushed_producer_count[data_id] = 0
self._input_buf[data_id][op_name] = channeldata
if self._pushed_producer_count[data_id] + 1 == producer_num:
put_data = self._input_buf[data_id]
self._input_buf.pop(data_id)
self._pushed_producer_count.pop(data_id)
else:
self._pushed_producer_count[data_id] += 1
if put_data is None:
_LOGGER.debug(
self._log("{} push data succ, but not push to queue.".
format(op_name)))
else:
while self._stop is False:
try:
self.put(put_data, timeout=0)
break
                    except Queue.Full:  # put() raises Full, not Empty, when the queue is full
self._cv.wait()
if self._stop:
raise ChannelStopError()
_LOGGER.debug(
self._log("multi | {} push data succ!".format(op_name)))
self._cv.notify_all()
return True
def front(self, op_name=None):
_LOGGER.debug(self._log("{} try to get data".format(op_name)))
if len(self._consumer_cursors) == 0:
raise Exception(
self._log(
"expected number of consumers to be greater than 0, but the it is 0."
))
elif len(self._consumer_cursors) == 1:
resp = None
with self._cv:
while self._stop is False and resp is None:
try:
resp = self.get(timeout=0)
break
except Queue.Empty:
self._cv.wait()
if self._stop:
raise ChannelStopError()
_LOGGER.debug(
self._log("{} get data succ: {}".format(op_name, resp.__str__(
))))
return resp
elif op_name is None:
raise Exception(
self._log(
"There are multiple consumers, so op_name cannot be None."))
# In output_buf, different Ops (according to op_name) have different
# cursors. In addition, there is a base_cursor. Their difference is
# the data_idx to be taken by the corresponding Op at the current
# time: data_idx = consumer_cursor - base_cursor
#
# base_cursor consumer_B_cursor (data_idx: 3)
# | |
# output_buf: | data0 | data1 | data2 | data3 |
# |
# consumer_A_cursor (data_idx: 0)
with self._cv:
# When the data required by the current Op is not in output_buf,
# it is necessary to obtain a data from queue and add it to output_buf.
while self._stop is False and self._consumer_cursors[
op_name] - self._base_cursor >= len(self._output_buf):
try:
channeldata = self.get(timeout=0)
self._output_buf.append(channeldata)
break
except Queue.Empty:
self._cv.wait()
if self._stop:
raise ChannelStopError()
consumer_cursor = self._consumer_cursors[op_name]
base_cursor = self._base_cursor
data_idx = consumer_cursor - base_cursor
resp = None
self._cursor_count[consumer_cursor] -= 1
if consumer_cursor == base_cursor and self._cursor_count[
consumer_cursor] == 0:
# When all the different Ops get the data that data_idx points
# to, pop the data from output_buf.
self._cursor_count.pop(consumer_cursor)
resp = self._output_buf.pop(0)
self._base_cursor += 1
# to avoid cursor overflow
if self._base_cursor >= self._reset_max_cursor:
self._base_cursor -= self._reset_max_cursor
for name in self._consumer_cursors:
self._consumer_cursors[name] -= self._reset_max_cursor
self._cursor_count = {
cursor - self._reset_max_cursor: count
for cursor, count in self._cursor_count.items()
}
else:
resp = copy.deepcopy(self._output_buf[data_idx])
_LOGGER.debug(self._log("{} get data: {}".format(op_name, resp)))
self._consumer_cursors[op_name] += 1
new_consumer_cursor = self._consumer_cursors[op_name]
if self._cursor_count.get(new_consumer_cursor) is None:
self._cursor_count[new_consumer_cursor] = 0
self._cursor_count[new_consumer_cursor] += 1
self._cv.notify_all()
_LOGGER.debug(self._log("multi | {} get data succ!".format(op_name)))
return resp
def stop(self):
_LOGGER.debug(self._log("stop."))
self._stop = True
with self._cv:
self._cv.notify_all()
class ChannelStopError(RuntimeError):
def __init__(self):
pass
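# --- Illustrative sketch (not part of the original sources): a minimal,
# single-producer / single-consumer use of ThreadChannel with the API defined
# above. The names "op_a"/"op_b" and the feed dict are assumptions made for
# this example only; in practice this module is imported as part of the
# package rather than run directly.
if __name__ == "__main__":
    demo_channel = ThreadChannel(name="demo_channel", maxsize=2)
    demo_channel.add_producer("op_a")
    demo_channel.add_consumer("op_b")

    # wrap a dict payload into ChannelData and push it as producer "op_a"
    demo_data = ChannelData(
        datatype=ChannelDataType.DICT.value,
        dictdata={"words": "hello"},
        data_id=0)
    demo_channel.push(demo_data, "op_a")

    # consumer "op_b" fetches {producer_name: ChannelData} and parses it
    fetched = demo_channel.front("op_b")
    print(fetched["op_a"].parse())  # -> {'words': 'hello'}
    demo_channel.stop()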
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import threading
import multiprocessing
import sys
import copy
if sys.version_info.major == 2:
import Queue
elif sys.version_info.major == 3:
import queue as Queue
else:
raise Exception("Error Python version")
import os
import logging
from .operator import Op, RequestOp, ResponseOp, VirtualOp
from .channel import (ThreadChannel, ProcessChannel, ChannelData,
ChannelDataEcode, ChannelDataType, ChannelStopError)
from .profiler import TimeProfiler
from .util import NameGenerator
_LOGGER = logging.getLogger()
class DAGExecutor(object):
def __init__(self, response_op, dag_config, show_info):
self._retry = dag_config.get('retry', 1)
client_type = dag_config.get('client_type', 'brpc')
self._server_use_profile = dag_config.get('use_profile', False)
channel_size = dag_config.get('channel_size', 0)
self._is_thread_op = dag_config.get('is_thread_op', True)
if show_info and self._server_use_profile:
_LOGGER.info("================= PROFILER ================")
if self._is_thread_op:
_LOGGER.info("op: thread")
_LOGGER.info("profile mode: sync")
else:
_LOGGER.info("op: process")
_LOGGER.info("profile mode: asyn")
_LOGGER.info("-------------------------------------------")
self.name = "@G"
self._profiler = TimeProfiler()
self._profiler.enable(True)
self._dag = DAG(self.name, response_op, self._server_use_profile,
self._is_thread_op, client_type, channel_size,
show_info)
(in_channel, out_channel, pack_rpc_func,
unpack_rpc_func) = self._dag.build()
self._dag.start()
self._set_in_channel(in_channel)
self._set_out_channel(out_channel)
self._pack_rpc_func = pack_rpc_func
self._unpack_rpc_func = unpack_rpc_func
_LOGGER.debug(self._log(in_channel.debug()))
_LOGGER.debug(self._log(out_channel.debug()))
self._id_lock = threading.Lock()
self._id_counter = 0
self._reset_max_id = 1000000000000000000
self._cv_pool = {}
self._cv_for_cv_pool = threading.Condition()
self._fetch_buffer = None
self._recive_func = None
self._client_profile_key = "pipeline.profile"
self._client_profile_value = "1"
def start(self):
self._recive_func = threading.Thread(
target=DAGExecutor._recive_out_channel_func, args=(self, ))
self._recive_func.start()
def stop(self):
self._dag.stop()
self._dag.join()
def _get_next_data_id(self):
with self._id_lock:
if self._id_counter >= self._reset_max_id:
self._id_counter -= self._reset_max_id
self._id_counter += 1
return self._id_counter - 1
def _set_in_channel(self, in_channel):
if not isinstance(in_channel, (ThreadChannel, ProcessChannel)):
raise TypeError(
self._log('in_channel must be Channel type, but get {}'.format(
type(in_channel))))
in_channel.add_producer(self.name)
self._in_channel = in_channel
def _set_out_channel(self, out_channel):
if not isinstance(out_channel, (ThreadChannel, ProcessChannel)):
raise TypeError(
self._log('out_channel must be Channel type, but get {}'.format(
type(out_channel))))
out_channel.add_consumer(self.name)
self._out_channel = out_channel
def _recive_out_channel_func(self):
cv = None
while True:
try:
channeldata_dict = self._out_channel.front(self.name)
except ChannelStopError:
_LOGGER.debug(self._log("stop."))
with self._cv_for_cv_pool:
for data_id, cv in self._cv_pool.items():
                        closed_error_data = ChannelData(
ecode=ChannelDataEcode.CLOSED_ERROR.value,
error_info="dag closed.",
data_id=data_id)
with cv:
                            self._fetch_buffer = closed_error_data
cv.notify_all()
break
if len(channeldata_dict) != 1:
_LOGGER.error("out_channel cannot have multiple input ops")
os._exit(-1)
(_, channeldata), = channeldata_dict.items()
if not isinstance(channeldata, ChannelData):
raise TypeError(
self._log('data must be ChannelData type, but get {}'.
format(type(channeldata))))
data_id = channeldata.id
_LOGGER.debug("recive thread fetch data: {}".format(data_id))
with self._cv_for_cv_pool:
cv = self._cv_pool[data_id]
with cv:
self._fetch_buffer = channeldata
cv.notify_all()
def _get_channeldata_from_fetch_buffer(self, data_id):
resp = None
cv = threading.Condition()
with self._cv_for_cv_pool:
self._cv_pool[data_id] = cv
with cv:
cv.wait()
_LOGGER.debug("resp func get lock (data_id: {})".format(data_id))
resp = copy.deepcopy(self._fetch_buffer)
with self._cv_for_cv_pool:
self._cv_pool.pop(data_id)
return resp
def _pack_channeldata(self, rpc_request, data_id):
        _LOGGER.debug(self._log('start inference'))
dictdata = None
try:
dictdata = self._unpack_rpc_func(rpc_request)
except Exception as e:
return ChannelData(
ecode=ChannelDataEcode.RPC_PACKAGE_ERROR.value,
error_info="rpc package error: {}".format(e),
data_id=data_id)
else:
            # because unpack_rpc_func may be overridden by the user, we need
            # to look for the client_profile_key field in rpc_request
profile_value = None
for idx, key in enumerate(rpc_request.key):
if key == self._client_profile_key:
profile_value = rpc_request.value[idx]
break
return ChannelData(
datatype=ChannelDataType.DICT.value,
dictdata=dictdata,
data_id=data_id,
client_need_profile=(
profile_value == self._client_profile_value))
def call(self, rpc_request):
data_id = self._get_next_data_id()
if not self._is_thread_op:
self._profiler.record("call_{}#DAG-{}_0".format(data_id, data_id))
else:
self._profiler.record("call_{}#DAG_0".format(data_id))
self._profiler.record("prepack_{}#{}_0".format(data_id, self.name))
req_channeldata = self._pack_channeldata(rpc_request, data_id)
self._profiler.record("prepack_{}#{}_1".format(data_id, self.name))
resp_channeldata = None
for i in range(self._retry):
_LOGGER.debug(self._log('push data'))
#self._profiler.record("push_{}#{}_0".format(data_id, self.name))
try:
self._in_channel.push(req_channeldata, self.name)
except ChannelStopError:
_LOGGER.debug(self._log("stop."))
return self._pack_for_rpc_resp(
ChannelData(
ecode=ChannelDataEcode.CLOSED_ERROR.value,
error_info="dag closed.",
data_id=data_id))
#self._profiler.record("push_{}#{}_1".format(data_id, self.name))
_LOGGER.debug(self._log('wait for infer'))
#self._profiler.record("fetch_{}#{}_0".format(data_id, self.name))
resp_channeldata = self._get_channeldata_from_fetch_buffer(data_id)
#self._profiler.record("fetch_{}#{}_1".format(data_id, self.name))
if resp_channeldata.ecode == ChannelDataEcode.OK.value:
break
if i + 1 < self._retry:
_LOGGER.warn("retry({}): {}".format(
i + 1, resp_channeldata.error_info))
self._profiler.record("postpack_{}#{}_0".format(data_id, self.name))
rpc_resp = self._pack_for_rpc_resp(resp_channeldata)
self._profiler.record("postpack_{}#{}_1".format(data_id, self.name))
if not self._is_thread_op:
self._profiler.record("call_{}#DAG-{}_1".format(data_id, data_id))
else:
self._profiler.record("call_{}#DAG_1".format(data_id))
#self._profiler.print_profile()
profile_str = self._profiler.gen_profile_str()
if self._server_use_profile:
sys.stderr.write(profile_str)
# add profile info into rpc_resp
profile_value = ""
if resp_channeldata.client_need_profile:
profile_set = resp_channeldata.profile_data_set
profile_set.add(profile_str)
profile_value = "".join(list(profile_set))
rpc_resp.key.append(self._client_profile_key)
rpc_resp.value.append(profile_value)
return rpc_resp
def _pack_for_rpc_resp(self, channeldata):
_LOGGER.debug(self._log('get channeldata'))
return self._pack_rpc_func(channeldata)
def _log(self, info_str):
return "[{}] {}".format(self.name, info_str)
class DAG(object):
def __init__(self, request_name, response_op, use_profile, is_thread_op,
client_type, channel_size, show_info):
self._request_name = request_name
self._response_op = response_op
self._use_profile = use_profile
self._is_thread_op = is_thread_op
self._channel_size = channel_size
self._client_type = client_type
self._show_info = show_info
if not self._is_thread_op:
self._manager = multiprocessing.Manager()
def get_use_ops(self, response_op):
unique_names = set()
used_ops = set()
succ_ops_of_use_op = {} # {op_name: succ_ops}
que = Queue.Queue()
que.put(response_op)
while que.qsize() != 0:
op = que.get()
for pred_op in op.get_input_ops():
if pred_op.name not in succ_ops_of_use_op:
succ_ops_of_use_op[pred_op.name] = []
if op != response_op:
succ_ops_of_use_op[pred_op.name].append(op)
if pred_op not in used_ops:
que.put(pred_op)
used_ops.add(pred_op)
# check the name of op is globally unique
if pred_op.name in unique_names:
raise Exception("the name of Op must be unique: {}".
format(pred_op.name))
unique_names.add(pred_op.name)
return used_ops, succ_ops_of_use_op
def _gen_channel(self, name_gen):
channel = None
if self._is_thread_op:
channel = ThreadChannel(
name=name_gen.next(), maxsize=self._channel_size)
else:
channel = ProcessChannel(
self._manager, name=name_gen.next(), maxsize=self._channel_size)
return channel
def _gen_virtual_op(self, name_gen):
return VirtualOp(name=name_gen.next())
def _topo_sort(self, used_ops, response_op, out_degree_ops):
out_degree_num = {
name: len(ops)
for name, ops in out_degree_ops.items()
}
que_idx = 0 # scroll queue
ques = [Queue.Queue() for _ in range(2)]
zero_indegree_num = 0
for op in used_ops:
if len(op.get_input_ops()) == 0:
zero_indegree_num += 1
if zero_indegree_num != 1:
raise Exception("DAG contains multiple input Ops")
last_op = response_op.get_input_ops()[0]
ques[que_idx].put(last_op)
# topo sort to get dag_views
dag_views = []
sorted_op_num = 0
while True:
que = ques[que_idx]
next_que = ques[(que_idx + 1) % 2]
dag_view = []
while que.qsize() != 0:
op = que.get()
dag_view.append(op)
sorted_op_num += 1
for pred_op in op.get_input_ops():
out_degree_num[pred_op.name] -= 1
if out_degree_num[pred_op.name] == 0:
next_que.put(pred_op)
dag_views.append(dag_view)
if next_que.qsize() == 0:
break
que_idx = (que_idx + 1) % 2
if sorted_op_num < len(used_ops):
raise Exception("not legal DAG")
return dag_views, last_op
def _build_dag(self, response_op):
if response_op is None:
raise Exception("response_op has not been set.")
used_ops, out_degree_ops = self.get_use_ops(response_op)
if self._show_info:
_LOGGER.info("================= USED OP =================")
for op in used_ops:
if op.name != self._request_name:
_LOGGER.info(op.name)
_LOGGER.info("-------------------------------------------")
if len(used_ops) <= 1:
raise Exception(
"Besides RequestOp and ResponseOp, there should be at least one Op in DAG."
)
dag_views, last_op = self._topo_sort(used_ops, response_op,
out_degree_ops)
dag_views = list(reversed(dag_views))
if self._show_info:
_LOGGER.info("================== DAG ====================")
for idx, view in enumerate(dag_views):
_LOGGER.info("(VIEW {})".format(idx))
for op in view:
_LOGGER.info(" [{}]".format(op.name))
for out_op in out_degree_ops[op.name]:
_LOGGER.info(" - {}".format(out_op.name))
_LOGGER.info("-------------------------------------------")
# create channels and virtual ops
virtual_op_name_gen = NameGenerator("vir")
channel_name_gen = NameGenerator("chl")
virtual_ops = []
channels = []
input_channel = None
actual_view = None
for v_idx, view in enumerate(dag_views):
if v_idx + 1 >= len(dag_views):
break
next_view = dag_views[v_idx + 1]
if actual_view is None:
actual_view = view
actual_next_view = []
pred_op_of_next_view_op = {}
for op in actual_view:
# find actual succ op in next view and create virtual op
for succ_op in out_degree_ops[op.name]:
if succ_op in next_view:
if succ_op not in actual_next_view:
actual_next_view.append(succ_op)
if succ_op.name not in pred_op_of_next_view_op:
pred_op_of_next_view_op[succ_op.name] = []
pred_op_of_next_view_op[succ_op.name].append(op)
else:
# create virtual op
virtual_op = self._gen_virtual_op(virtual_op_name_gen)
virtual_ops.append(virtual_op)
out_degree_ops[virtual_op.name] = [succ_op]
actual_next_view.append(virtual_op)
pred_op_of_next_view_op[virtual_op.name] = [op]
virtual_op.add_virtual_pred_op(op)
actual_view = actual_next_view
# create channel
processed_op = set()
for o_idx, op in enumerate(actual_next_view):
if op.name in processed_op:
continue
channel = self._gen_channel(channel_name_gen)
channels.append(channel)
_LOGGER.debug("{} => {}".format(channel.name, op.name))
op.add_input_channel(channel)
pred_ops = pred_op_of_next_view_op[op.name]
if v_idx == 0:
input_channel = channel
else:
# if pred_op is virtual op, it will use ancestors as producers to channel
for pred_op in pred_ops:
_LOGGER.debug("{} => {}".format(pred_op.name,
channel.name))
pred_op.add_output_channel(channel)
processed_op.add(op.name)
# find same input op to combine channel
for other_op in actual_next_view[o_idx + 1:]:
if other_op.name in processed_op:
continue
other_pred_ops = pred_op_of_next_view_op[other_op.name]
if len(other_pred_ops) != len(pred_ops):
continue
same_flag = True
for pred_op in pred_ops:
if pred_op not in other_pred_ops:
same_flag = False
break
if same_flag:
_LOGGER.debug("{} => {}".format(channel.name,
other_op.name))
other_op.add_input_channel(channel)
processed_op.add(other_op.name)
output_channel = self._gen_channel(channel_name_gen)
channels.append(output_channel)
last_op.add_output_channel(output_channel)
pack_func, unpack_func = None, None
pack_func = response_op.pack_response_package
actual_ops = virtual_ops
for op in used_ops:
if len(op.get_input_ops()) == 0:
unpack_func = op.unpack_request_package
continue
actual_ops.append(op)
for c in channels:
_LOGGER.debug(c.debug())
return (actual_ops, channels, input_channel, output_channel, pack_func,
unpack_func)
def build(self):
(actual_ops, channels, input_channel, output_channel, pack_func,
unpack_func) = self._build_dag(self._response_op)
self._actual_ops = actual_ops
self._channels = channels
self._input_channel = input_channel
self._output_channel = output_channel
self._pack_func = pack_func
self._unpack_func = unpack_func
return self._input_channel, self._output_channel, self._pack_func, self._unpack_func
def start(self):
self._threads_or_proces = []
for op in self._actual_ops:
op.use_profiler(self._use_profile)
if self._is_thread_op:
self._threads_or_proces.extend(
op.start_with_thread(self._client_type))
else:
self._threads_or_proces.extend(
op.start_with_process(self._client_type))
# not join yet
return self._threads_or_proces
def join(self):
for x in self._threads_or_proces:
x.join()
def stop(self):
for chl in self._channels:
chl.stop()
for op in self._actual_ops:
op.clean_input_channel()
op.clean_output_channels()
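# --- Illustrative note (not part of the original sources): DAGExecutor reads
# its options from the "dag" section of the server yaml via dict.get(); a
# config dict equivalent to the defaults used above would look like:
#
#     dag_config = {
#         "retry": 1,
#         "client_type": "brpc",
#         "use_profile": False,
#         "channel_size": 0,
#         "is_thread_op": True,
#     }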
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import threading
import multiprocessing
from paddle_serving_client import MultiLangClient, Client
from concurrent import futures
import logging
import func_timeout
import os
import sys
import numpy as np
from numpy import *
from .proto import pipeline_service_pb2
from .channel import (ThreadChannel, ProcessChannel, ChannelDataEcode,
ChannelData, ChannelDataType, ChannelStopError)
from .util import NameGenerator
from .profiler import TimeProfiler
_LOGGER = logging.getLogger()
_op_name_gen = NameGenerator("Op")
class Op(object):
def __init__(self,
name=None,
input_ops=[],
server_endpoints=[],
fetch_list=[],
client_config=None,
concurrency=1,
timeout=-1,
retry=1):
if name is None:
name = _op_name_gen.next()
self.name = name # to identify the type of OP, it must be globally unique
self.concurrency = concurrency # amount of concurrency
self.set_input_ops(input_ops)
self._server_endpoints = server_endpoints
self.with_serving = False
if len(self._server_endpoints) != 0:
self.with_serving = True
self._client_config = client_config
self._fetch_names = fetch_list
self._timeout = timeout
self._retry = max(1, retry)
self._input = None
self._outputs = []
self._server_use_profile = False
# only for multithread
self._for_init_op_lock = threading.Lock()
self._for_close_op_lock = threading.Lock()
self._succ_init_op = False
self._succ_close_op = False
def use_profiler(self, use_profile):
self._server_use_profile = use_profile
def _profiler_record(self, string):
if self._profiler is None:
return
self._profiler.record(string)
def init_client(self, client_type, client_config, server_endpoints,
fetch_names):
        if not self.with_serving:
_LOGGER.debug("{} no client".format(self.name))
return None
_LOGGER.debug("{} client_config: {}".format(self.name, client_config))
_LOGGER.debug("{} fetch_names: {}".format(self.name, fetch_names))
if client_type == 'brpc':
client = Client()
client.load_client_config(client_config)
elif client_type == 'grpc':
client = MultiLangClient()
else:
raise ValueError("unknow client type: {}".format(client_type))
client.connect(server_endpoints)
self._fetch_names = fetch_names
return client
def get_input_ops(self):
return self._input_ops
def set_input_ops(self, ops):
if not isinstance(ops, list):
ops = [] if ops is None else [ops]
self._input_ops = []
for op in ops:
if not isinstance(op, Op):
raise TypeError(
self._log('input op must be Op type, not {}'.format(
type(op))))
self._input_ops.append(op)
def add_input_channel(self, channel):
if not isinstance(channel, (ThreadChannel, ProcessChannel)):
raise TypeError(
self._log('input channel must be Channel type, not {}'.format(
type(channel))))
channel.add_consumer(self.name)
self._input = channel
def clean_input_channel(self):
self._input = None
def _get_input_channel(self):
return self._input
def add_output_channel(self, channel):
if not isinstance(channel, (ThreadChannel, ProcessChannel)):
raise TypeError(
self._log('output channel must be Channel type, not {}'.format(
type(channel))))
channel.add_producer(self.name)
self._outputs.append(channel)
def clean_output_channels(self):
self._outputs = []
def _get_output_channels(self):
return self._outputs
def preprocess(self, input_dicts):
# multiple previous Op
if len(input_dicts) != 1:
raise NotImplementedError(
'this Op has multiple previous inputs. Please override this func.'
)
(_, input_dict), = input_dicts.items()
return input_dict
def process(self, feed_dict):
err, err_info = ChannelData.check_npdata(feed_dict)
if err != 0:
raise NotImplementedError(
"{} Please override preprocess func.".format(err_info))
call_result = self.client.predict(
feed=feed_dict, fetch=self._fetch_names)
_LOGGER.debug(self._log("get call_result"))
return call_result
def postprocess(self, input_dict, fetch_dict):
return fetch_dict
def _parse_channeldata(self, channeldata_dict):
data_id, error_channeldata = None, None
client_need_profile, profile_set = False, set()
parsed_data = {}
key = list(channeldata_dict.keys())[0]
data_id = channeldata_dict[key].id
client_need_profile = channeldata_dict[key].client_need_profile
for name, data in channeldata_dict.items():
if data.ecode != ChannelDataEcode.OK.value:
error_channeldata = data
break
parsed_data[name] = data.parse()
if client_need_profile:
profile_set |= data.profile_data_set
return (data_id, error_channeldata, parsed_data, client_need_profile,
profile_set)
def _push_to_output_channels(self,
data,
channels,
name=None,
client_need_profile=False,
profile_set=None):
if name is None:
name = self.name
self._add_profile_into_channeldata(data, client_need_profile,
profile_set)
for channel in channels:
channel.push(data, name)
def _add_profile_into_channeldata(self, data, client_need_profile,
profile_set):
profile_str = self._profiler.gen_profile_str()
if self._server_use_profile:
sys.stderr.write(profile_str)
if client_need_profile and profile_set is not None:
profile_set.add(profile_str)
data.add_profile(profile_set)
def start_with_process(self, client_type):
proces = []
for concurrency_idx in range(self.concurrency):
p = multiprocessing.Process(
target=self._run,
args=(concurrency_idx, self._get_input_channel(),
self._get_output_channels(), client_type, False))
p.start()
proces.append(p)
return proces
def start_with_thread(self, client_type):
threads = []
for concurrency_idx in range(self.concurrency):
t = threading.Thread(
target=self._run,
args=(concurrency_idx, self._get_input_channel(),
self._get_output_channels(), client_type, True))
t.start()
threads.append(t)
return threads
def init_op(self):
pass
def _run_preprocess(self, parsed_data, data_id, log_func):
preped_data, error_channeldata = None, None
try:
preped_data = self.preprocess(parsed_data)
except NotImplementedError as e:
# preprocess function not implemented
error_info = log_func(e)
_LOGGER.error(error_info)
error_channeldata = ChannelData(
ecode=ChannelDataEcode.NOT_IMPLEMENTED.value,
error_info=error_info,
data_id=data_id)
except TypeError as e:
# Error type in channeldata.datatype
error_info = log_func(e)
_LOGGER.error(error_info)
error_channeldata = ChannelData(
ecode=ChannelDataEcode.TYPE_ERROR.value,
error_info=error_info,
data_id=data_id)
except Exception as e:
error_info = log_func(e)
_LOGGER.error(error_info)
error_channeldata = ChannelData(
ecode=ChannelDataEcode.UNKNOW.value,
error_info=error_info,
data_id=data_id)
return preped_data, error_channeldata
def _run_process(self, preped_data, data_id, log_func):
midped_data, error_channeldata = None, None
if self.with_serving:
ecode = ChannelDataEcode.OK.value
if self._timeout <= 0:
try:
midped_data = self.process(preped_data)
except Exception as e:
ecode = ChannelDataEcode.UNKNOW.value
error_info = log_func(e)
_LOGGER.error(error_info)
else:
for i in range(self._retry):
try:
midped_data = func_timeout.func_timeout(
self._timeout, self.process, args=(preped_data, ))
except func_timeout.FunctionTimedOut as e:
if i + 1 >= self._retry:
ecode = ChannelDataEcode.TIMEOUT.value
error_info = log_func(e)
_LOGGER.error(error_info)
else:
_LOGGER.warn(
log_func("timeout, retry({})".format(i + 1)))
except Exception as e:
ecode = ChannelDataEcode.UNKNOW.value
error_info = log_func(e)
_LOGGER.error(error_info)
break
else:
break
if ecode != ChannelDataEcode.OK.value:
error_channeldata = ChannelData(
ecode=ecode, error_info=error_info, data_id=data_id)
elif midped_data is None:
                # the op client returned None
error_channeldata = ChannelData(
ecode=ChannelDataEcode.CLIENT_ERROR.value,
error_info=log_func(
"predict failed. pls check the server side."),
data_id=data_id)
else:
midped_data = preped_data
return midped_data, error_channeldata
def _run_postprocess(self, input_dict, midped_data, data_id, log_func):
output_data, error_channeldata = None, None
try:
postped_data = self.postprocess(input_dict, midped_data)
except Exception as e:
error_info = log_func(e)
_LOGGER.error(error_info)
error_channeldata = ChannelData(
ecode=ChannelDataEcode.UNKNOW.value,
error_info=error_info,
data_id=data_id)
return output_data, error_channeldata
if not isinstance(postped_data, dict):
error_info = log_func("output of postprocess funticon must be " \
"dict type, but get {}".format(type(postped_data)))
_LOGGER.error(error_info)
error_channeldata = ChannelData(
ecode=ChannelDataEcode.UNKNOW.value,
error_info=error_info,
data_id=data_id)
return output_data, error_channeldata
err, _ = ChannelData.check_npdata(postped_data)
if err == 0:
output_data = ChannelData(
ChannelDataType.CHANNEL_NPDATA.value,
npdata=postped_data,
data_id=data_id)
else:
output_data = ChannelData(
ChannelDataType.DICT.value,
dictdata=postped_data,
data_id=data_id)
return output_data, error_channeldata
def _run(self, concurrency_idx, input_channel, output_channels, client_type,
is_thread_op):
def get_log_func(op_info_prefix):
def log_func(info_str):
return "{} {}".format(op_info_prefix, info_str)
return log_func
op_info_prefix = "[{}|{}]".format(self.name, concurrency_idx)
log = get_log_func(op_info_prefix)
tid = threading.current_thread().ident
# init op
self.concurrency_idx = concurrency_idx
try:
if is_thread_op:
with self._for_init_op_lock:
if not self._succ_init_op:
# init profiler
self._profiler = TimeProfiler()
self._profiler.enable(True)
# init client
self.client = self.init_client(
client_type, self._client_config,
self._server_endpoints, self._fetch_names)
# user defined
self.init_op()
self._succ_init_op = True
self._succ_close_op = False
else:
# init profiler
self._profiler = TimeProfiler()
self._profiler.enable(True)
# init client
self.client = self.init_client(client_type, self._client_config,
self._server_endpoints,
self._fetch_names)
# user defined
self.init_op()
except Exception as e:
_LOGGER.error(log(e))
os._exit(-1)
while True:
#self._profiler_record("get#{}_0".format(op_info_prefix))
try:
channeldata_dict = input_channel.front(self.name)
except ChannelStopError:
_LOGGER.debug(log("stop."))
if is_thread_op:
with self._for_close_op_lock:
if not self._succ_close_op:
self._profiler = None
self.client = None
self._succ_init_op = False
self._succ_close_op = True
break
#self._profiler_record("get#{}_1".format(op_info_prefix))
_LOGGER.debug(log("input_data: {}".format(channeldata_dict)))
(data_id, error_channeldata, parsed_data, client_need_profile,
profile_set) = self._parse_channeldata(channeldata_dict)
# error data in predecessor Op
if error_channeldata is not None:
try:
# error_channeldata with profile info
self._push_to_output_channels(error_channeldata,
output_channels)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
continue
# preprecess
self._profiler_record("prep#{}_0".format(op_info_prefix))
preped_data, error_channeldata = self._run_preprocess(parsed_data,
data_id, log)
self._profiler_record("prep#{}_1".format(op_info_prefix))
if error_channeldata is not None:
try:
self._push_to_output_channels(
error_channeldata,
output_channels,
client_need_profile=client_need_profile,
profile_set=profile_set)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
continue
# process
self._profiler_record("midp#{}_0".format(op_info_prefix))
midped_data, error_channeldata = self._run_process(preped_data,
data_id, log)
self._profiler_record("midp#{}_1".format(op_info_prefix))
if error_channeldata is not None:
try:
self._push_to_output_channels(
error_channeldata,
output_channels,
client_need_profile=client_need_profile,
profile_set=profile_set)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
continue
# postprocess
self._profiler_record("postp#{}_0".format(op_info_prefix))
output_data, error_channeldata = self._run_postprocess(
parsed_data, midped_data, data_id, log)
self._profiler_record("postp#{}_1".format(op_info_prefix))
if error_channeldata is not None:
try:
self._push_to_output_channels(
error_channeldata,
output_channels,
client_need_profile=client_need_profile,
profile_set=profile_set)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
continue
# push data to channel (if run succ)
#self._profiler_record("push#{}_0".format(op_info_prefix))
try:
self._push_to_output_channels(
output_data,
output_channels,
client_need_profile=client_need_profile,
profile_set=profile_set)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
#self._profiler_record("push#{}_1".format(op_info_prefix))
def _log(self, info):
return "{} {}".format(self.name, info)
class RequestOp(Op):
""" RequestOp do not run preprocess, process, postprocess. """
def __init__(self):
# PipelineService.name = "@G"
super(RequestOp, self).__init__(name="@G", input_ops=[])
# init op
try:
self.init_op()
except Exception as e:
_LOGGER.error(e)
os._exit(-1)
def unpack_request_package(self, request):
dictdata = {}
for idx, key in enumerate(request.key):
data = request.value[idx]
            try:
                # values arrive as strings; eval recovers Python/numpy objects
                # (e.g. "array([...])" works thanks to the wildcard numpy import above)
                data = eval(data)
            except Exception as e:
                pass
dictdata[key] = data
return dictdata
class ResponseOp(Op):
""" ResponseOp do not run preprocess, process, postprocess. """
def __init__(self, input_ops):
super(ResponseOp, self).__init__(name="@R", input_ops=input_ops)
# init op
try:
self.init_op()
except Exception as e:
_LOGGER.error(e)
os._exit(-1)
def pack_response_package(self, channeldata):
resp = pipeline_service_pb2.Response()
resp.ecode = channeldata.ecode
if resp.ecode == ChannelDataEcode.OK.value:
if channeldata.datatype == ChannelDataType.CHANNEL_NPDATA.value:
feed = channeldata.parse()
# ndarray to string:
# https://stackoverflow.com/questions/30167538/convert-a-numpy-ndarray-to-stringor-bytes-and-convert-it-back-to-numpy-ndarray
                # print complete arrays (np.nan is rejected as a threshold by newer numpy)
                np.set_printoptions(threshold=sys.maxsize)
for name, var in feed.items():
resp.value.append(var.__repr__())
resp.key.append(name)
elif channeldata.datatype == ChannelDataType.DICT.value:
feed = channeldata.parse()
for name, var in feed.items():
if not isinstance(var, str):
resp.ecode = ChannelDataEcode.TYPE_ERROR.value
resp.error_info = self._log(
"fetch var type must be str({}).".format(
type(var)))
break
resp.value.append(var)
resp.key.append(name)
else:
resp.ecode = ChannelDataEcode.TYPE_ERROR.value
resp.error_info = self._log(
"Error type({}) in datatype.".format(channeldata.datatype))
_LOGGER.error(resp.error_info)
else:
resp.error_info = channeldata.error_info
return resp
class VirtualOp(Op):
''' For connecting two channels. '''
def __init__(self, name, concurrency=1):
super(VirtualOp, self).__init__(
name=name, input_ops=None, concurrency=concurrency)
self._virtual_pred_ops = []
def add_virtual_pred_op(self, op):
self._virtual_pred_ops.append(op)
def _actual_pred_op_names(self, op):
if not isinstance(op, VirtualOp):
return [op.name]
names = []
for x in op._virtual_pred_ops:
names.extend(self._actual_pred_op_names(x))
return names
def add_output_channel(self, channel):
if not isinstance(channel, (ThreadChannel, ProcessChannel)):
raise TypeError(
self._log('output channel must be Channel type, not {}'.format(
type(channel))))
for op in self._virtual_pred_ops:
for op_name in self._actual_pred_op_names(op):
channel.add_producer(op_name)
self._outputs.append(channel)
def _run(self, concurrency_idx, input_channel, output_channels, client_type,
is_thread_op):
def get_log_func(op_info_prefix):
def log_func(info_str):
return "{} {}".format(op_info_prefix, info_str)
return log_func
op_info_prefix = "[{}|{}]".format(self.name, concurrency_idx)
log = get_log_func(op_info_prefix)
tid = threading.current_thread().ident
while True:
try:
channeldata_dict = input_channel.front(self.name)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
try:
for name, data in channeldata_dict.items():
self._push_to_output_channels(
data, channels=output_channels, name=name)
except ChannelStopError:
_LOGGER.debug(log("stop."))
break
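# --- Illustrative sketch (not part of the original sources): wiring a custom
# Op between RequestOp and ResponseOp using the API defined above. The
# endpoint, client config path and fetch name below are placeholders, not
# values shipped with the library; in practice this module is imported as part
# of the package rather than run directly.
if __name__ == "__main__":
    class ExampleOp(Op):
        def preprocess(self, input_dicts):
            # single predecessor: unpack {op_name: data} and use it as the feed
            (_, input_dict), = input_dicts.items()
            return input_dict

        def postprocess(self, input_dict, fetch_dict):
            # values returned to the client must be strings (see ResponseOp above)
            return {k: str(v) for k, v in fetch_dict.items()}

    read_op = RequestOp()
    example_op = ExampleOp(
        name="example",
        input_ops=[read_op],
        server_endpoints=["127.0.0.1:9393"],           # placeholder endpoint
        fetch_list=["price"],                          # placeholder fetch variable
        client_config="serving_client_conf.prototxt",  # placeholder config path
        concurrency=1)
    response_op = ResponseOp(input_ops=[example_op])
    # response_op would then be handed to PipelineServer.set_response_op()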
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import grpc
import sys
import numpy as np
from numpy import *
import logging
import functools
from .proto import pipeline_service_pb2
from .proto import pipeline_service_pb2_grpc
_LOGGER = logging.getLogger()
class PipelineClient(object):
def __init__(self):
self._channel = None
self._profile_key = "pipeline.profile"
self._profile_value = "1"
def connect(self, endpoints):
options = [('grpc.max_receive_message_length', 512 * 1024 * 1024),
('grpc.max_send_message_length', 512 * 1024 * 1024),
('grpc.lb_policy_name', 'round_robin')]
g_endpoint = 'ipv4:{}'.format(','.join(endpoints))
self._channel = grpc.insecure_channel(g_endpoint, options=options)
self._stub = pipeline_service_pb2_grpc.PipelineServiceStub(
self._channel)
def _pack_request_package(self, feed_dict, profile):
req = pipeline_service_pb2.Request()
for key, value in feed_dict.items():
req.key.append(key)
if isinstance(value, np.ndarray):
req.value.append(value.__repr__())
elif isinstance(value, str):
req.value.append(value)
elif isinstance(value, list):
req.value.append(np.array(value).__repr__())
else:
raise TypeError("only str and np.ndarray type is supported: {}".
format(type(value)))
if profile:
req.key.append(self._profile_key)
req.value.append(self._profile_value)
return req
def _unpack_response_package(self, resp, fetch):
if resp.ecode != 0:
return {"ecode": resp.ecode, "error_info": resp.error_info}
fetch_map = {"ecode": resp.ecode}
for idx, key in enumerate(resp.key):
if key == self._profile_key:
if resp.value[idx] != "":
sys.stderr.write(resp.value[idx])
continue
if fetch is not None and key not in fetch:
continue
data = resp.value[idx]
try:
data = eval(data)
except Exception as e:
pass
fetch_map[key] = data
return fetch_map
def predict(self, feed_dict, fetch=None, asyn=False, profile=False):
if not isinstance(feed_dict, dict):
raise TypeError(
"feed must be dict type with format: {name: value}.")
if fetch is not None and not isinstance(fetch, list):
raise TypeError("fetch must be list type with format: [name].")
req = self._pack_request_package(feed_dict, profile)
if not asyn:
resp = self._stub.inference(req)
return self._unpack_response_package(resp, fetch)
else:
call_future = self._stub.inference.future(req)
return PipelinePredictFuture(
call_future,
functools.partial(
self._unpack_response_package, fetch=fetch))
class PipelinePredictFuture(object):
def __init__(self, call_future, callback_func):
self.call_future_ = call_future
self.callback_func_ = callback_func
def result(self):
resp = self.call_future_.result()
return self.callback_func_(resp)
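# --- Illustrative sketch (not part of the original sources): calling a running
# pipeline service with PipelineClient. The endpoint and feed/fetch names are
# placeholders for whatever the deployed DAG expects; in practice this module
# is imported as part of the package rather than run directly.
if __name__ == "__main__":
    client = PipelineClient()
    client.connect(["127.0.0.1:18080"])  # placeholder endpoint

    # synchronous call
    result = client.predict(feed_dict={"words": "hello"}, fetch=["prediction"])
    print(result)

    # asynchronous call: returns a PipelinePredictFuture
    future = client.predict(
        feed_dict={"words": "hello"}, fetch=["prediction"], asyn=True)
    print(future.result())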
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from concurrent import futures
import grpc
import logging
import socket
import contextlib
from contextlib import closing
import multiprocessing
import yaml
from .proto import pipeline_service_pb2_grpc
from .operator import ResponseOp
from .dag import DAGExecutor
_LOGGER = logging.getLogger()
class PipelineService(pipeline_service_pb2_grpc.PipelineServiceServicer):
def __init__(self, response_op, dag_config, show_info):
super(PipelineService, self).__init__()
# init dag executor
self._dag_executor = DAGExecutor(
response_op, dag_config, show_info=show_info)
self._dag_executor.start()
def inference(self, request, context):
resp = self._dag_executor.call(request)
return resp
def __del__(self):
self._dag_executor.stop()
@contextlib.contextmanager
def _reserve_port(port):
"""Find and reserve a port for all subprocesses to use."""
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
if sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT) == 0:
raise RuntimeError("Failed to set SO_REUSEPORT.")
sock.bind(('', port))
try:
yield sock.getsockname()[1]
finally:
sock.close()
class PipelineServer(object):
def __init__(self):
self._port = None
self._worker_num = None
self._response_op = None
def set_response_op(self, response_op):
if not isinstance(response_op, ResponseOp):
raise Exception("response_op must be ResponseOp type.")
if len(response_op.get_input_ops()) != 1:
raise Exception("response_op can only have one previous op.")
self._response_op = response_op
def _port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
return result != 0
def prepare_server(self, yml_file):
with open(yml_file) as f:
            yml_config = yaml.load(f.read(), Loader=yaml.SafeLoader)
self._port = yml_config.get('port')
if self._port is None:
raise SystemExit("Please set *port* in [{}] yaml file.".format(
yml_file))
if not self._port_is_available(self._port):
raise SystemExit("Prot {} is already used".format(self._port))
self._worker_num = yml_config.get('worker_num', 1)
self._build_dag_each_worker = yml_config.get('build_dag_each_worker',
False)
_LOGGER.info("============= PIPELINE SERVER =============")
_LOGGER.info("port: {}".format(self._port))
_LOGGER.info("worker_num: {}".format(self._worker_num))
servicer_info = "build_dag_each_worker: {}".format(
self._build_dag_each_worker)
if self._build_dag_each_worker is True:
servicer_info += " (Make sure that install grpcio whl with --no-binary flag)"
_LOGGER.info(servicer_info)
_LOGGER.info("-------------------------------------------")
self._dag_config = yml_config.get("dag", {})
def run_server(self):
if self._build_dag_each_worker:
with _reserve_port(self._port) as port:
bind_address = 'localhost:{}'.format(port)
workers = []
for i in range(self._worker_num):
show_info = (i == 0)
worker = multiprocessing.Process(
target=self._run_server_func,
args=(bind_address, self._response_op,
self._dag_config))
worker.start()
workers.append(worker)
for worker in workers:
worker.join()
else:
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=self._worker_num))
pipeline_service_pb2_grpc.add_PipelineServiceServicer_to_server(
PipelineService(self._response_op, self._dag_config, True),
server)
server.add_insecure_port('[::]:{}'.format(self._port))
server.start()
server.wait_for_termination()
def _run_server_func(self, bind_address, response_op, dag_config):
options = (('grpc.so_reuseport', 1), )
server = grpc.server(
futures.ThreadPoolExecutor(
max_workers=1, ), options=options)
pipeline_service_pb2_grpc.add_PipelineServiceServicer_to_server(
PipelineService(response_op, dag_config, False), server)
server.add_insecure_port(bind_address)
server.start()
server.wait_for_termination()
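# --- Illustrative config (not part of the original sources): a yaml file
# consumed by PipelineServer.prepare_server(); the keys mirror the reads above
# and the values are examples only.
#
#     port: 18080
#     worker_num: 4
#     build_dag_each_worker: false
#     dag:
#         is_thread_op: true
#         client_type: brpc
#         retry: 1
#         use_profile: false
#         channel_size: 0
#
# With such a file, a typical launch (response_op built as in operator.py) is:
#     server = PipelineServer()
#     server.set_response_op(response_op)
#     server.prepare_server("config.yml")
#     server.run_server()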
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import os
import sys
import logging
if sys.version_info.major == 2:
import Queue
elif sys.version_info.major == 3:
import queue as Queue
else:
raise Exception("Error Python version")
import time
import threading
_LOGGER = logging.getLogger()
class TimeProfiler(object):
def __init__(self):
self._pid = os.getpid()
self._print_head = 'PROFILE\tpid:{}\t'.format(self._pid)
self._time_record = Queue.Queue()
self._enable = False
self._lock = threading.Lock()
def enable(self, enable):
self._enable = enable
def record(self, name_with_tag):
if self._enable is False:
return
timestamp = int(round(time.time() * 1000000))
name_with_tag = name_with_tag.split("_")
tag = name_with_tag[-1]
name = '_'.join(name_with_tag[:-1])
with self._lock:
self._time_record.put((name, tag, timestamp))
def print_profile(self):
if self._enable is False:
return
sys.stderr.write(self.gen_profile_str())
def gen_profile_str(self):
if self._enable is False:
return
print_str = self._print_head
tmp = {}
with self._lock:
while not self._time_record.empty():
name, tag, timestamp = self._time_record.get()
if name in tmp:
ptag, ptimestamp = tmp.pop(name)
print_str += "{}_{}:{} ".format(name, ptag, ptimestamp)
print_str += "{}_{}:{} ".format(name, tag, timestamp)
else:
tmp[name] = (tag, timestamp)
print_str = "\n{}\n".format(print_str)
for name, item in tmp.items():
tag, timestamp = item
self._time_record.put((name, tag, timestamp))
return print_str
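# --- Illustrative usage (not part of the original sources), mirroring the
# "<name>#<prefix>_<tag>" records emitted by Op._run() and DAGExecutor.call():
#
#     profiler = TimeProfiler()
#     profiler.enable(True)
#     profiler.record("prep#[op|0]_0")   # tag "0": start of the span
#     ...                                # work being measured
#     profiler.record("prep#[op|0]_1")   # tag "1": end of the span
#     sys.stderr.write(profiler.gen_profile_str())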
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package baidu.paddle_serving.pipeline_serving;
message Request {
repeated string key = 1;
repeated string value = 2;
};
message Response {
repeated string key = 1;
repeated string value = 2;
required int32 ecode = 3;
optional string error_info = 4;
};
service PipelineService {
rpc inference(Request) returns (Response) {}
};
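// --- Illustrative example (not part of the original proto): a Request as
// built by PipelineClient._pack_request_package() with profiling enabled;
// "words"/"hello" are placeholder feed data.
//
//   key:   ["words", "pipeline.profile"]
//   value: ["hello", "1"]
//
// The matching Response carries parallel key/value lists plus ecode (0 on
// success) and an optional error_info string.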
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2015 gRPC authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Runs protoc with the gRPC plugin to generate messages and gRPC stubs."""
from grpc_tools import protoc
protoc.main((
'',
'-I.',
'--python_out=.',
'--grpc_python_out=.',
'pipeline_service.proto', ))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
class NameGenerator(object):
def __init__(self, prefix):
self._idx = -1
self._prefix = prefix
def next(self):
self._idx += 1
return "{}{}".format(self._prefix, self._idx)
numpy>=1.12, <=1.16.4 ; python_version<"3.5"
google>=2.0.3
protobuf>=3.12.2
grpcio-tools>=1.28.1
grpcio>=1.28.1
func-timeout>=4.3.5
pyyaml>=1.3.0
sentencepiece==0.1.92
flask>=1.1.2
ujson>=2.0.3
@@ -42,7 +42,8 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow'
+    'six >= 1.10.0', 'sentencepiece', 'opencv-python', 'pillow',
+    'shapely', 'pyclipper'
]
packages=['paddle_serving_app',
...
@@ -58,17 +58,21 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.1.0', 'numpy >= 1.12'
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'numpy >= 1.12', 'grpcio >= 1.28.1',
+    'grpcio-tools >= 1.28.1'
]
if not find_package("paddlepaddle") and not find_package("paddlepaddle-gpu"):
    REQUIRED_PACKAGES.append("paddlepaddle")
packages=['paddle_serving_client',
          'paddle_serving_client.proto',
          'paddle_serving_client.io',
          'paddle_serving_client.metric',
-          'paddle_serving_client.utils',]
+          'paddle_serving_client.utils',
+          'paddle_serving_client.pipeline',
+          'paddle_serving_client.pipeline.proto']
package_data={'paddle_serving_client': ['serving_client.so','lib/*'],}
package_dir={'paddle_serving_client':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client',
@@ -76,10 +80,14 @@ package_dir={'paddle_serving_client':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/proto',
             'paddle_serving_client.io':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/io',
             'paddle_serving_client.metric':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/metric',
             'paddle_serving_client.utils':
-             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/utils',}
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/utils',
+             'paddle_serving_client.pipeline':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/pipeline',
+             'paddle_serving_client.pipeline.proto':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/pipeline/proto'}
setup(
    name='paddle-serving-client',
...
@@ -37,17 +37,23 @@ def python_version():
max_version, mid_version, min_version = python_version()
REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.1.0',
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
]
packages=['paddle_serving_server',
-          'paddle_serving_server.proto']
+          'paddle_serving_server.proto',
+          'paddle_serving_server.pipeline',
+          'paddle_serving_server.pipeline.proto']
package_dir={'paddle_serving_server':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server',
             'paddle_serving_server.proto':
-             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto'}
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/proto',
+             'paddle_serving_server.pipeline':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/pipeline',
+             'paddle_serving_server.pipeline.proto':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server/pipeline/proto'}
setup(
    name='paddle-serving-server',
...
@@ -37,22 +37,27 @@ def python_version():
max_version, mid_version, min_version = python_version()
REQUIRED_PACKAGES = [
-    'six >= 1.10.0', 'protobuf >= 3.1.0',
+    'six >= 1.10.0', 'protobuf >= 3.11.0', 'grpcio >= 1.28.1', 'grpcio-tools >= 1.28.1',
    'paddle_serving_client', 'flask >= 1.1.1', 'paddle_serving_app'
]
packages=['paddle_serving_server_gpu',
-          'paddle_serving_server_gpu.proto']
+          'paddle_serving_server_gpu.proto',
+          'paddle_serving_server_gpu.pipeline',
+          'paddle_serving_server_gpu.pipeline.proto']
package_dir={'paddle_serving_server_gpu':
             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu',
             'paddle_serving_server_gpu.proto':
-             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto'}
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/proto',
+             'paddle_serving_server_gpu.pipeline':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/pipeline',
+             'paddle_serving_server_gpu.pipeline.proto':
+             '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_server_gpu/pipeline/proto'}
setup(
    name='paddle-serving-server-gpu',
-    version=serving_server_version.replace('-', ''),
+    version=serving_server_version.replace('-', '') + '.post@CUDA_VERSION_MAJOR@',
    description=
    ('Paddle Serving Package for saved model with PaddlePaddle'),
    url='https://github.com/PaddlePaddle/Serving',
...
@@ -2,13 +2,14 @@ FROM centos:7.3.1611
RUN yum -y install wget && \
    yum -y install epel-release && yum -y install patchelf && \
-    yum -y install gcc make python-devel && \
+    yum -y install gcc gcc-c++ make python-devel && \
    yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install python3 python3-devel && \
-    yum clean all && \
-    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
+    yum clean all
+RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
    python get-pip.py && rm get-pip.py && \
    localedef -c -i en_US -f UTF-8 en_US.UTF-8 && \
    echo "export LANG=en_US.utf8" >> /root/.bashrc
FROM centos:7.3.1611
RUN yum -y install wget >/dev/null \
    && yum -y install gcc gcc-c++ make glibc-static which >/dev/null \
    && yum -y install git openssl-devel curl-devel bzip2-devel python-devel >/dev/null \
    && yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false \
    && yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false \
-    && yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false \
-    && wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
+    && yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false
+RUN wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
    && tar xzf cmake-3.2.0-Linux-x86_64.tar.gz \
    && mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 \
    && echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc \
-    && rm cmake-3.2.0-Linux-x86_64.tar.gz \
-    && wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
+    && rm cmake-3.2.0-Linux-x86_64.tar.gz
+RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
    && tar xzf go1.14.linux-amd64.tar.gz \
    && mv go /usr/local/go \
    && echo 'export GOROOT=/usr/local/go' >> /root/.bashrc \
    && echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc \
-    && rm go1.14.linux-amd64.tar.gz \
-    && yum -y install python-devel sqlite-devel >/dev/null \
+    && rm go1.14.linux-amd64.tar.gz
+RUN yum -y install python-devel sqlite-devel >/dev/null \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
    && pip install google protobuf setuptools wheel flask >/dev/null \
-    && rm get-pip.py \
-    && wget http://nixos.org/releases/patchelf/patchelf-0.10/patchelf-0.10.tar.bz2 \
+    && rm get-pip.py
+RUN wget http://nixos.org/releases/patchelf/patchelf-0.10/patchelf-0.10.tar.bz2 \
    && yum -y install bzip2 >/dev/null \
    && tar -jxf patchelf-0.10.tar.bz2 \
    && cd patchelf-0.10 \
    && ./configure --prefix=/usr \
    && make >/dev/null && make install >/dev/null \
    && cd .. \
-    && rm -rf patchelf-0.10* \
-    && yum install -y python3 python3-devel \
-    && pip3 install google protobuf setuptools wheel flask \
-    && yum -y update >/dev/null \
+    && rm -rf patchelf-0.10*
+RUN yum install -y python3 python3-devel \
+    && pip3 install google protobuf setuptools wheel flask
+RUN yum -y update >/dev/null \
    && yum -y install dnf >/dev/null \
    && yum -y install dnf-plugins-core >/dev/null \
    && dnf copr enable alonid/llvm-3.8.0 -y \
    && dnf install llvm-3.8.0 clang-3.8.0 compiler-rt-3.8.0 -y \
    && echo 'export PATH=/opt/llvm-3.8.0/bin:$PATH' >> /root/.bashrc
+RUN yum install -y java \
+    && wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo \
+    && yum install -y apache-maven
+RUN yum install -y lsof
FROM nvidia/cuda:10.0-cudnn7-devel-centos7 as builder
FROM nvidia/cuda:10.0-cudnn7-runtime-centos7
RUN yum -y install wget && \
yum -y install epel-release && yum -y install patchelf && \
yum -y install gcc gcc-c++ make python-devel && \
yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false && \
yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false && \
yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false && \
yum -y install python3 python3-devel && \
yum clean all
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python get-pip.py && rm get-pip.py
RUN ln -s /usr/local/cuda-10.0/lib64/libcublas.so.10.0 /usr/local/cuda-10.0/lib64/libcublas.so && \
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \
ln -s /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so && \
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
echo "export LANG=en_US.utf8" >> /root/.bashrc && \
mkdir -p /usr/local/cuda/extras
COPY --from=builder /usr/local/cuda/extras/CUPTI /usr/local/cuda/extras/CUPTI
FROM nvidia/cuda:10.0-cudnn7-devel-centos7
RUN yum -y install wget >/dev/null \
&& yum -y install gcc gcc-c++ make glibc-static which \
&& yum -y install git openssl-devel curl-devel bzip2-devel python-devel \
&& yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false \
&& yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false \
&& yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false
RUN wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
&& tar xzf cmake-3.2.0-Linux-x86_64.tar.gz \
&& mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 \
&& echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc \
&& rm cmake-3.2.0-Linux-x86_64.tar.gz
RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
&& tar xzf go1.14.linux-amd64.tar.gz \
&& mv go /usr/local/go \
&& echo 'export GOROOT=/usr/local/go' >> /root/.bashrc \
&& echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc \
&& rm go1.14.linux-amd64.tar.gz
RUN yum -y install python-devel sqlite-devel \
&& curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
&& python get-pip.py >/dev/null \
&& pip install google protobuf setuptools wheel flask >/dev/null \
&& rm get-pip.py
RUN yum install -y python3 python3-devel \
&& pip3 install google protobuf setuptools wheel flask \
&& yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
&& yum clean all
RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
&& echo "export LANG=en_US.utf8" >> /root/.bashrc
@@ -3,15 +3,17 @@ FROM nvidia/cuda:9.0-cudnn7-devel-centos7 as builder
FROM nvidia/cuda:9.0-cudnn7-runtime-centos7
RUN yum -y install wget && \
    yum -y install epel-release && yum -y install patchelf && \
-    yum -y install gcc make python-devel && \
+    yum -y install gcc gcc-c++ make python-devel && \
    yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false && \
    yum -y install python3 python3-devel && \
-    yum clean all && \
-    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
-    python get-pip.py && rm get-pip.py && \
-    ln -s /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/cuda-9.0/lib64/libcublas.so && \
+    yum clean all
+RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
+    python get-pip.py && rm get-pip.py
+RUN ln -s /usr/local/cuda-9.0/lib64/libcublas.so.9.0 /usr/local/cuda-9.0/lib64/libcublas.so && \
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> /root/.bashrc && \
    ln -s /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so && \
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/targets/x86_64-linux/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
...
FROM nvidia/cuda:9.0-cudnn7-devel-centos7
RUN yum -y install wget >/dev/null \
-    && yum -y install gcc gcc-c++ make glibc-static which >/dev/null \
-    && yum -y install git openssl-devel curl-devel bzip2-devel python-devel >/dev/null \
-    && wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
+    && yum -y install gcc gcc-c++ make glibc-static which \
+    && yum -y install git openssl-devel curl-devel bzip2-devel python-devel \
+    && yum -y install libSM-1.2.2-2.el7.x86_64 --setopt=protected_multilib=false \
+    && yum -y install libXrender-0.9.10-1.el7.x86_64 --setopt=protected_multilib=false \
+    && yum -y install libXext-1.3.3-3.el7.x86_64 --setopt=protected_multilib=false
+RUN wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
    && tar xzf cmake-3.2.0-Linux-x86_64.tar.gz \
    && mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 \
    && echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc \
-    && rm cmake-3.2.0-Linux-x86_64.tar.gz \
-    && wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
+    && rm cmake-3.2.0-Linux-x86_64.tar.gz
+RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
    && tar xzf go1.14.linux-amd64.tar.gz \
    && mv go /usr/local/go \
    && echo 'export GOROOT=/usr/local/go' >> /root/.bashrc \
    && echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc \
-    && rm go1.14.linux-amd64.tar.gz \
-    && yum -y install python-devel sqlite-devel >/dev/null \
+    && rm go1.14.linux-amd64.tar.gz
+RUN yum -y install python-devel sqlite-devel \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
    && pip install google protobuf setuptools wheel flask >/dev/null \
-    && rm get-pip.py \
-    && yum install -y python3 python3-devel \
+    && rm get-pip.py
+RUN yum install -y python3 python3-devel \
    && pip3 install google protobuf setuptools wheel flask \
    && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all \
+    && yum clean all
+RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
    && echo "export LANG=en_US.utf8" >> /root/.bashrc
FROM centos:7.3.1611
-RUN yum -y install wget >/dev/null \
-    && yum -y install gcc gcc-c++ make glibc-static which >/dev/null \
-    && yum -y install git openssl-devel curl-devel bzip2-devel python-devel >/dev/null \
-    && wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
+RUN yum -y install wget \
+    && yum -y install gcc gcc-c++ make glibc-static which \
+    && yum -y install git openssl-devel curl-devel bzip2-devel python-devel
+RUN wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz >/dev/null \
    && tar xzf cmake-3.2.0-Linux-x86_64.tar.gz \
    && mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 \
    && echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc \
-    && rm cmake-3.2.0-Linux-x86_64.tar.gz \
-    && wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
+    && rm cmake-3.2.0-Linux-x86_64.tar.gz
+RUN wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz >/dev/null \
    && tar xzf go1.14.linux-amd64.tar.gz \
    && mv go /usr/local/go \
    && echo 'export GOROOT=/usr/local/go' >> /root/.bashrc \
    && echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc \
-    && rm go1.14.linux-amd64.tar.gz \
-    && yum -y install python-devel sqlite-devel >/dev/null \
+    && rm go1.14.linux-amd64.tar.gz
+RUN yum -y install python-devel sqlite-devel \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py >/dev/null \
    && python get-pip.py >/dev/null \
    && pip install google protobuf setuptools wheel flask >/dev/null \
-    && rm get-pip.py \
-    && yum install -y python3 python3-devel \
+    && rm get-pip.py
+RUN yum install -y python3 python3-devel \
    && pip3 install google protobuf setuptools wheel flask \
    && yum -y install epel-release && yum -y install patchelf libXext libSM libXrender\
-    && yum clean all \
-    && localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
+    && yum clean all
+RUN localedef -c -i en_US -f UTF-8 en_US.UTF-8 \
    && echo "export LANG=en_US.utf8" >> /root/.bashrc
@@ -15,6 +15,6 @@
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
import re
with open("setup.cfg", "w") as f:
-    line = "[bdist_wheel]\npython-tag={0}{1}\nplat-name=manylinux1_x86_64".format(
-        get_abbr_impl(), get_impl_ver())
+    line = "[bdist_wheel]\npython-tag={0}{1}".format(get_abbr_impl(),
+                                                     get_impl_ver())
    f.write(line)
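For reference, run under CPython 2.7 the rewritten script would now emit roughly the following setup.cfg (the manylinux1 plat-name line is what this change drops; the exact python-tag depends on the interpreter used for the build):

```
[bdist_wheel]
python-tag=cp27
```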
@@ -54,14 +54,13 @@ function build_app() {
    local DIRNAME=build-app-$TYPE
    mkdir $DIRNAME # pwd: /Serving
    cd $DIRNAME # pwd: /Serving/build-app-$TYPE
-    pip install numpy sentencepiece
    case $TYPE in
    CPU|GPU)
        cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
            -DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
            -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
            -DAPP=ON ..
-        rerun "make -j2 >/dev/null" 3 # due to some network reasons, compilation may fail
+        rerun "make -j10 >/dev/null" 3 # due to some network reasons, compilation may fail
        pip install -U python/dist/paddle_serving_app* >/dev/null
        ;;
    *)
@@ -84,7 +83,7 @@ function build_client() {
            -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so \
            -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
            -DCLIENT=ON ..
-        rerun "make -j2 >/dev/null" 3 # due to some network reasons, compilation may fail
+        rerun "make -j10 >/dev/null" 3 # due to some network reasons, compilation may fail
        pip install -U python/dist/paddle_serving_client* >/dev/null
        ;;
    *)
@@ -108,7 +107,7 @@ function build_server() {
            -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so \
            -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
            -DSERVER=ON ..
-        rerun "make -j2 >/dev/null" 3 # due to some network reasons, compilation may fail
+        rerun "make -j10 >/dev/null" 3 # due to some network reasons, compilation may fail
        check_cmd "make install -j2 >/dev/null"
        pip install -U python/dist/paddle_serving_server* >/dev/null
        ;;
@@ -118,7 +117,7 @@ function build_server() {
            -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
            -DSERVER=ON \
            -DWITH_GPU=ON ..
-        rerun "make -j2 >/dev/null" 3 # due to some network reasons, compilation may fail
+        rerun "make -j10 >/dev/null" 3 # due to some network reasons, compilation may fail
        check_cmd "make install -j2 >/dev/null"
        pip install -U python/dist/paddle_serving_server* >/dev/null
        ;;
@@ -134,6 +133,16 @@ function build_server() {
function kill_server_process() {
    ps -ef | grep "serving" | grep -v serving_build | grep -v grep | awk '{print $2}' | xargs kill
+    sleep 1
+}
+
+function kill_process_by_port() {
+    if [ $# != 1 ]; then
+        echo "usage: kill_process_by_port <PID>"
+        exit 1
+    fi
+    local PID=$1
+    lsof -i:$PID | awk 'NR == 1 {next} {print $2}' | xargs kill
}
function python_test_fit_a_line() {
@@ -181,26 +190,26 @@ function python_test_fit_a_line() {
    kill_server_process
    # test web
-    unsetproxy # maybe the proxy is used on iPipe, which makes web-test failed.
-    check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9393 --thread 2 --gpu_ids 0 --name uci > /dev/null &"
-    sleep 5 # wait for the server to start
-    check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], \"fetch\":[\"price\"]}' http://127.0.0.1:9393/uci/prediction"
+    #unsetproxy # maybe the proxy is used on iPipe, which makes web-test failed.
+    #check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9393 --thread 2 --gpu_ids 0 --name uci > /dev/null &"
+    #sleep 5 # wait for the server to start
+    #check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], \"fetch\":[\"price\"]}' http://127.0.0.1:9393/uci/prediction"
    # check http code
-    http_code=`curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' -s -w "%{http_code}" -o /dev/null http://127.0.0.1:9393/uci/prediction`
-    if [ ${http_code} -ne 200 ]; then
-        echo "HTTP status code -ne 200"
-        exit 1
-    fi
+    #http_code=`curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' -s -w "%{http_code}" -o /dev/null http://127.0.0.1:9393/uci/prediction`
+    #if [ ${http_code} -ne 200 ]; then
+    #    echo "HTTP status code -ne 200"
+    #    exit 1
+    #fi
    # test web batch
-    check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, {\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], \"fetch\":[\"price\"]}' http://127.0.0.1:9393/uci/prediction"
+    #check_cmd "curl -H \"Content-Type:application/json\" -X POST -d '{\"feed\":[{\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, {\"x\": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], \"fetch\":[\"price\"]}' http://127.0.0.1:9393/uci/prediction"
    # check http code
-    http_code=`curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, {"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' -s -w "%{http_code}" -o /dev/null http://127.0.0.1:9393/uci/prediction`
-    if [ ${http_code} -ne 200 ]; then
-        echo "HTTP status code -ne 200"
-        exit 1
-    fi
-    setproxy # recover proxy state
-    kill_server_process
+    #http_code=`curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, {"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' -s -w "%{http_code}" -o /dev/null http://127.0.0.1:9393/uci/prediction`
+    #if [ ${http_code} -ne 200 ]; then
+    #    echo "HTTP status code -ne 200"
+    #    exit 1
+    #fi
+    #setproxy # recover proxy state
+    #kill_server_process
    ;;
*)
    echo "error type"
@@ -228,10 +237,7 @@ function python_run_criteo_ctr_with_cube() {
    check_cmd "mv models/data ./cube/"
    check_cmd "mv models/ut_data ./"
    cp ../../../build-server-$TYPE/output/bin/cube* ./cube/
-    mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/
-    yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/
    sh cube_prepare.sh &
-    check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
    python test_server.py ctr_serving_model_kv &
    sleep 5
    check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score"
@@ -246,6 +252,7 @@ function python_run_criteo_ctr_with_cube() {
    echo "criteo_ctr_with_cube inference auc test success"
    kill_server_process
    ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
+    sleep 1
    ;;
GPU)
    check_cmd "wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz"
@@ -255,12 +262,11 @@ function python_run_criteo_ctr_with_cube() {
    check_cmd "mv models/data ./cube/"
    check_cmd "mv models/ut_data ./"
    cp ../../../build-server-$TYPE/output/bin/cube* ./cube/
-    mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/
-    yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/
    sh cube_prepare.sh &
-    check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
    python test_server_gpu.py ctr_serving_model_kv &
    sleep 5
+    # for warm up
+    python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data > /dev/null || true
    check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score"
    tail -n 2 score | awk 'NR==1'
    AUC=$(tail -n 2 score | awk 'NR==1')
@@ -273,6 +279,7 @@ function python_run_criteo_ctr_with_cube() {
    echo "criteo_ctr_with_cube inference auc test success"
    kill_server_process
    ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
+    sleep 1
    ;;
*)
    echo "error type"
@@ -287,8 +294,6 @@ function python_run_criteo_ctr_with_cube() {
function python_test_bert() {
    # pwd: /Serving/python/examples
    local TYPE=$1
-    yum install -y libXext libSM libXrender >/dev/null
-    pip install ujson
    export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
    cd bert # pwd: /Serving/python/examples/bert
    case $TYPE in
@@ -484,6 +489,7 @@ function python_test_lac() {
    setproxy # recover proxy state
    kill_server_process
    ps -ef | grep "lac_web_service" | grep -v grep | awk '{print $2}' | xargs kill
+    sleep 1
    echo "lac CPU HTTP inference pass"
    ;;
GPU)
@@ -499,6 +505,403 @@ function python_test_lac() {
    cd ..
}
function java_run_test() {
# pwd: /Serving
local TYPE=$1
export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
unsetproxy
case $TYPE in
CPU)
# compile java sdk
cd java # pwd: /Serving/java
mvn compile > /dev/null
mvn install > /dev/null
# compile java sdk example
cd examples # pwd: /Serving/java/examples
mvn compile > /dev/null
mvn install > /dev/null
# fit_a_line (general, asyn_predict, batch_predict)
cd ../../python/examples/grpc_impl_example/fit_a_line # pwd: /Serving/python/examples/grpc_impl_example/fit_a_line
sh get_data.sh
check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --thread 4 --use_multilang > /dev/null &"
sleep 5 # wait for the server to start
cd ../../../java/examples # /Serving/java/examples
java -cp target/paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample fit_a_line
java -cp target/paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample asyn_predict
java -cp target/paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample batch_predict
kill_server_process
# imdb (model_ensemble)
cd ../../python/examples/grpc_impl_example/imdb # pwd: /Serving/python/examples/grpc_impl_example/imdb
sh get_data.sh > /dev/null
check_cmd "python test_multilang_ensemble_server.py > /dev/null &"
sleep 5 # wait for the server to start
cd ../../../java/examples # /Serving/java/examples
java -cp target/paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample model_ensemble
kill_server_process
# yolov4 (int32)
cd ../../python/examples/grpc_impl_example/yolov4 # pwd: /Serving/python/examples/grpc_impl_example/yolov4
python -m paddle_serving_app.package --get_model yolov4 > /dev/null
tar -xzf yolov4.tar.gz > /dev/null
check_cmd "python -m paddle_serving_server.serve --model yolov4_model --port 9393 --use_multilang --mem_optim > /dev/null &"
cd ../../../java/examples # /Serving/java/examples
java -cp target/paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PaddleServingClientExample yolov4 src/main/resources/000000570688.jpg
kill_server_process
cd ../../ # pwd: /Serving
;;
GPU)
;;
*)
echo "error type"
exit 1
;;
esac
echo "java-sdk $TYPE part finished as expected."
setproxy
unset SERVING_BIN
}
function python_test_grpc_impl() {
# pwd: /Serving/python/examples
cd grpc_impl_example # pwd: /Serving/python/examples/grpc_impl_example
local TYPE=$1
export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
unsetproxy
case $TYPE in
CPU)
# test general case
cd fit_a_line # pwd: /Serving/python/examples/grpc_impl_example/fit_a_line
sh get_data.sh
# one line command start
check_cmd "python -m paddle_serving_server.serve --model uci_housing_model --port 9393 --thread 4 --use_multilang > /dev/null &"
sleep 5 # wait for the server to start
check_cmd "python test_sync_client.py > /dev/null"
check_cmd "python test_asyn_client.py > /dev/null"
check_cmd "python test_general_pb_client.py > /dev/null"
check_cmd "python test_numpy_input_client.py > /dev/null"
check_cmd "python test_batch_client.py > /dev/null"
check_cmd "python test_timeout_client.py > /dev/null"
kill_server_process
kill_process_by_port 9393
check_cmd "python test_server.py uci_housing_model > /dev/null &"
sleep 5 # wait for the server to start
check_cmd "python test_sync_client.py > /dev/null"
check_cmd "python test_asyn_client.py > /dev/null"
check_cmd "python test_general_pb_client.py > /dev/null"
check_cmd "python test_numpy_input_client.py > /dev/null"
check_cmd "python test_batch_client.py > /dev/null"
check_cmd "python test_timeout_client.py > /dev/null"
kill_server_process
kill_process_by_port 9393
cd .. # pwd: /Serving/python/examples/grpc_impl_example
# test load server config and client config in Server side
cd criteo_ctr_with_cube # pwd: /Serving/python/examples/grpc_impl_example/criteo_ctr_with_cube
check_cmd "wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz > /dev/null"
check_cmd "tar xf ctr_cube_unittest.tar.gz"
check_cmd "mv models/ctr_client_conf ./"
check_cmd "mv models/ctr_serving_model_kv ./"
check_cmd "mv models/data ./cube/"
check_cmd "mv models/ut_data ./"
cp ../../../../build-server-$TYPE/output/bin/cube* ./cube/
sh cube_prepare.sh &
check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
python test_server.py ctr_serving_model_kv ctr_client_conf/serving_client_conf.prototxt &
sleep 5
check_cmd "python test_client.py ./ut_data >score"
tail -n 2 score | awk 'NR==1'
AUC=$(tail -n 2 score | awk 'NR==1')
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "grpc impl test success"
kill_server_process
ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
cd .. # pwd: /Serving/python/examples/grpc_impl_example
;;
GPU)
export CUDA_VISIBLE_DEVICES=0
# test general case
cd fit_a_line # pwd: /Serving/python/examples/grpc_impl_example/fit_a_line
sh get_data.sh
# one line command start
check_cmd "python -m paddle_serving_server_gpu.serve --model uci_housing_model --port 9393 --thread 4 --gpu_ids 0 --use_multilang > /dev/null &"
sleep 5 # wait for the server to start
check_cmd "python test_sync_client.py > /dev/null"
check_cmd "python test_asyn_client.py > /dev/null"
check_cmd "python test_general_pb_client.py > /dev/null"
check_cmd "python test_numpy_input_client.py > /dev/null"
check_cmd "python test_batch_client.py > /dev/null"
check_cmd "python test_timeout_client.py > /dev/null"
kill_server_process
kill_process_by_port 9393
check_cmd "python test_server_gpu.py uci_housing_model > /dev/null &"
sleep 5 # wait for the server to start
check_cmd "python test_sync_client.py > /dev/null"
check_cmd "python test_asyn_client.py > /dev/null"
check_cmd "python test_general_pb_client.py > /dev/null"
check_cmd "python test_numpy_input_client.py > /dev/null"
check_cmd "python test_batch_client.py > /dev/null"
check_cmd "python test_timeout_client.py > /dev/null"
kill_server_process
kill_process_by_port 9393
#ps -ef | grep "test_server_gpu" | grep -v serving_build | grep -v grep | awk '{print $2}' | xargs kill
cd .. # pwd: /Serving/python/examples/grpc_impl_example
# test load server config and client config in Server side
cd criteo_ctr_with_cube # pwd: /Serving/python/examples/grpc_impl_example/criteo_ctr_with_cube
check_cmd "wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz"
check_cmd "tar xf ctr_cube_unittest.tar.gz"
check_cmd "mv models/ctr_client_conf ./"
check_cmd "mv models/ctr_serving_model_kv ./"
check_cmd "mv models/data ./cube/"
check_cmd "mv models/ut_data ./"
cp ../../../../build-server-$TYPE/output/bin/cube* ./cube/
sh cube_prepare.sh &
check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
python test_server_gpu.py ctr_serving_model_kv ctr_client_conf/serving_client_conf.prototxt &
sleep 5
# for warm up
python test_client.py ./ut_data &> /dev/null || true
check_cmd "python test_client.py ./ut_data >score"
tail -n 2 score | awk 'NR==1'
AUC=$(tail -n 2 score | awk 'NR==1')
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "grpc impl test success"
kill_server_process
ps -ef | grep "test_server_gpu" | grep -v serving_build | grep -v grep | awk '{print $2}' | xargs kill
ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
cd .. # pwd: /Serving/python/examples/grpc_impl_example
;;
*)
echo "error type"
exit 1
;;
esac
echo "test grpc impl $TYPE part finished as expected."
setproxy
unset SERVING_BIN
cd .. # pwd: /Serving/python/examples
}
function python_test_yolov4(){
    # pwd: /Serving/python/examples
local TYPE=$1
export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
cd yolov4
case $TYPE in
CPU)
echo "no implement for cpu type"
;;
GPU)
python -m paddle_serving_app.package --get_model yolov4
tar -xzvf yolov4.tar.gz
check_cmd "python -m paddle_serving_server_gpu.serve --model yolov4_model/ --port 9393 --gpu_ids 0 &"
sleep 5
check_cmd "python test_client.py 000000570688.jpg"
echo "yolov4 GPU RPC inference pass"
kill_server_process
;;
*)
echo "error type"
exit 1
;;
esac
echo "test yolov4 $TYPE finished as expected."
unset SERVING_BIN
cd ..
}
function python_test_resnet50(){
    # pwd: /Serving/python/examples
local TYPE=$1
export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
cd imagenet
case $TYPE in
CPU)
echo "no implement for cpu type"
;;
GPU)
sh get_model.sh
check_cmd"python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9696 --gpu_ids 0"
sleep 5
check_cmd"python resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt"
echo "resnet50 GPU RPC inference pass"
kill_server_process
;;
*)
echo "error type"
exit 1
;;
esac
echo "test resnet $TYPE finished as expected"
unset SERVING_BIN
cd ..
}
function python_test_pipeline(){
    # pwd: /Serving/python/examples
local TYPE=$1
export SERVING_BIN=${SERVING_WORKDIR}/build-server-${TYPE}/core/general-server/serving
unsetproxy
cd pipeline/imdb_model_ensemble
case $TYPE in
CPU)
# start paddle serving service (brpc)
sh get_data.sh
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 --workdir test9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 --workdir test9393 &> bow.log &
sleep 5
# test: thread servicer & thread op
cat << EOF > config.yml
port: 18080
worker_num: 2
build_dag_each_worker: false
dag:
    is_thread_op: true
    client_type: brpc
    retry: 1
    use_profile: false
EOF
python test_pipeline_server.py > /dev/null &
sleep 5
check_cmd "python test_pipeline_client.py"
ps -ef | grep "pipeline_server" | grep -v grep | awk '{print $2}' | xargs kill
kill_process_by_port 18080
# test: thread servicer & process op
cat << EOF > config.yml
port: 18080
worker_num: 2
build_dag_each_worker: false
dag:
    is_thread_op: false
    client_type: brpc
    retry: 1
    use_profile: false
EOF
python test_pipeline_server.py > /dev/null &
sleep 5
check_cmd "python test_pipeline_client.py"
ps -ef | grep "pipeline_server" | grep -v grep | awk '{print $2}' | xargs kill
kill_process_by_port 18080
# test: process servicer & thread op
cat << EOF > config.yml
port: 18080
worker_num: 2
build_dag_each_worker: true
dag:
    is_thread_op: true
    client_type: brpc
    retry: 1
    use_profile: false
EOF
python test_pipeline_server.py > /dev/null &
sleep 5
check_cmd "python test_pipeline_client.py"
ps -ef | grep "pipeline_server" | grep -v grep | awk '{print $2}' | xargs kill
kill_process_by_port 18080
# test: process servicer & process op
cat << EOF > config.yml
port: 18080
worker_num: 2
build_dag_each_worker: false
dag:
    is_thread_op: false
    client_type: brpc
    retry: 1
    use_profile: false
EOF
python test_pipeline_server.py > /dev/null &
sleep 5
check_cmd "python test_pipeline_client.py"
ps -ef | grep "pipeline_server" | grep -v grep | awk '{print $2}' | xargs kill
kill_process_by_port 18080
kill_server_process
kill_process_by_port 9292
kill_process_by_port 9393
# start paddle serving service (grpc)
python -m paddle_serving_server.serve --model imdb_cnn_model --port 9292 --use_multilang --workdir test9292 &> cnn.log &
python -m paddle_serving_server.serve --model imdb_bow_model --port 9393 --use_multilang --workdir test9393 &> bow.log &
sleep 5
cat << EOF > config.yml
port: 18080
worker_num: 2
build_dag_each_worker: false
dag:
    is_thread_op: false
    client_type: grpc
    retry: 1
    use_profile: false
EOF
python test_pipeline_server.py > /dev/null &
sleep 5
check_cmd "python test_pipeline_client.py"
ps -ef | grep "pipeline_server" | grep -v grep | awk '{print $2}' | xargs kill
kill_process_by_port 18080
kill_server_process
kill_process_by_port 9292
kill_process_by_port 9393
;;
GPU)
echo "pipeline ignore GPU test"
;;
*)
echo "error type"
exit 1
;;
esac
cd ../../
setproxy
unset SERVING_BIN
}
function python_app_api_test(){
    # pwd: /Serving/python/examples
    # test image reader
local TYPE=$1
cd imagenet
case $TYPE in
CPU)
check_cmd "python test_image_reader.py"
;;
GPU)
echo "no implement for cpu type"
;;
*)
echo "error type"
exit 1
;;
esac
echo "test app api finised as expected"
cd ..
}
function python_run_test() {
    # Using the compiled binary
    local TYPE=$1 # pwd: /Serving
@@ -510,6 +913,10 @@ function python_run_test() {
    python_test_lac $TYPE # pwd: /Serving/python/examples
    python_test_multi_process $TYPE # pwd: /Serving/python/examples
    python_test_multi_fetch $TYPE # pwd: /Serving/python/examples
+    python_test_yolov4 $TYPE # pwd: /Serving/python/examples
+    python_test_grpc_impl $TYPE # pwd: /Serving/python/examples
+    python_test_resnet50 $TYPE # pwd: /Serving/python/examples
+    python_test_pipeline $TYPE # pwd: /Serving/python/examples
    echo "test python $TYPE part finished as expected."
    cd ../.. # pwd: /Serving
}
@@ -762,9 +1169,11 @@ function main() {
    build_client $TYPE # pwd: /Serving
    build_server $TYPE # pwd: /Serving
    build_app $TYPE # pwd: /Serving
+    java_run_test $TYPE # pwd: /Serving
    python_run_test $TYPE # pwd: /Serving
    monitor_test $TYPE # pwd: /Serving
    echo "serving $TYPE part finished as expected."
}
main $@
+exit 0