Commit 85bf8a57 authored by barrierye

merge

<p align="center">
    <br>
-<img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130">
+<img src='doc/serving_logo.png' width = "600" height = "130">
    <br>
<p>

@@ -34,7 +34,7 @@ We consider deploying deep learning inference service online to be a user-facing
<h2 align="center">Installation</h2>

-We highly recommend you to run Paddle Serving in Docker, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
+We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
@@ -55,7 +55,7 @@ pip install paddle-serving-server-gpu # GPU
```
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.

Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client.

<h2 align="center">Quick Start Example</h2>
@@ -88,7 +88,7 @@ Here, we use `curl` to send a HTTP POST request to the service we just started.
</center>

``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```

### RPC service
@@ -133,7 +133,7 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **Request sample**:
``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **Request result**:
``` shell
@@ -166,7 +166,7 @@ python image_classification_service_demo.py resnet50_serving_model
<p>
``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg", "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **Request result**:
``` shell
@@ -256,6 +256,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
### Developers
- [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
+- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
- [Compile from source code](doc/COMPILE.md)
...
@@ -35,7 +35,7 @@ Paddle Serving aims to help deep learning developers easily deploy online prediction services
<h2 align="center">Installation</h2>

-We strongly recommend building Paddle Serving inside Docker; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md)
+We **strongly recommend** building Paddle Serving **inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md)
```
# Start CPU Docker
@@ -92,7 +92,7 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
</center>

``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```

<h3 align="center">RPC Service</h3>
@@ -138,7 +138,7 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **Client request example**:
``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **Response example**:
``` shell
@@ -171,7 +171,7 @@ python image_classification_service_demo.py resnet50_serving_model
<p>
``` shell
-curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg", "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **Response example**:
``` shell
@@ -262,6 +262,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
### Developer tutorials
- [How to configure the computation graph on the server side?](doc/SERVER_DAG_CN.md)
- [How to develop a new General Op?](doc/NEW_OPERATOR_CN.md)
+- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [How to use the Go Client with Paddle Serving?](doc/IMDB_GO_CLIENT_CN.md)
- [How to compile PaddleServing?](doc/COMPILE_CN.md)
...
@@ -217,6 +217,7 @@ int run_m(int argc, char** argv) {
  LOG(INFO) << " total_request = " << std::to_string(request_num) << " speed = "
            << std::to_string(1000000 * thread_num / mean_time)  // mean_time us
            << " query per second";
+  return 0;
}
}  // namespace mcube
...
@@ -21,6 +21,7 @@
#include <fstream>
#include <map>
#include <string>
+#include <utility>  // move
#include <vector>
#include "core/sdk-cpp/builtin_format.pb.h"
@@ -39,12 +40,32 @@ namespace baidu {
namespace paddle_serving {
namespace general_model {

-class PredictorRes {
- public:
-  PredictorRes() {}
-  ~PredictorRes() {}
-
+class ModelRes {
 public:
+  ModelRes() {}
+  ModelRes(const ModelRes& res) {
+    _engine_name = res._engine_name;
+    _int64_value_map.insert(res._int64_value_map.begin(),
+                            res._int64_value_map.end());
+    _float_value_map.insert(res._float_value_map.begin(),
+                            res._float_value_map.end());
+    _shape_map.insert(res._shape_map.begin(), res._shape_map.end());
+    _lod_map.insert(res._lod_map.begin(), res._lod_map.end());
+  }
+  ModelRes(ModelRes&& res) {
+    _engine_name = std::move(res._engine_name);
+    _int64_value_map.insert(
+        std::make_move_iterator(std::begin(res._int64_value_map)),
+        std::make_move_iterator(std::end(res._int64_value_map)));
+    _float_value_map.insert(
+        std::make_move_iterator(std::begin(res._float_value_map)),
+        std::make_move_iterator(std::end(res._float_value_map)));
+    _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
+                      std::make_move_iterator(std::end(res._shape_map)));
+    _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
+                    std::make_move_iterator(std::end(res._lod_map)));
+  }
+  ~ModelRes() {}
  const std::vector<int64_t>& get_int64_by_name(const std::string& name) {
    return _int64_value_map[name];
  }
@@ -57,19 +78,75 @@ class PredictorRes {
  const std::vector<int>& get_lod(const std::string& name) {
    return _lod_map[name];
  }
-  void set_variant_tag(const std::string& variant_tag) {
-    _variant_tag = variant_tag;
+  void set_engine_name(const std::string& engine_name) {
+    _engine_name = engine_name;
  }
-  const std::string& variant_tag() { return _variant_tag; }
+  const std::string& engine_name() { return _engine_name; }
+  ModelRes& operator=(ModelRes&& res) {
+    if (this != &res) {
+      _engine_name = std::move(res._engine_name);
+      _int64_value_map.insert(
+          std::make_move_iterator(std::begin(res._int64_value_map)),
+          std::make_move_iterator(std::end(res._int64_value_map)));
+      _float_value_map.insert(
+          std::make_move_iterator(std::begin(res._float_value_map)),
+          std::make_move_iterator(std::end(res._float_value_map)));
+      _shape_map.insert(std::make_move_iterator(std::begin(res._shape_map)),
+                        std::make_move_iterator(std::end(res._shape_map)));
+      _lod_map.insert(std::make_move_iterator(std::begin(res._lod_map)),
+                      std::make_move_iterator(std::end(res._lod_map)));
+    }
+    return *this;
+  }

 public:
+  std::string _engine_name;
  std::map<std::string, std::vector<int64_t>> _int64_value_map;
  std::map<std::string, std::vector<float>> _float_value_map;
  std::map<std::string, std::vector<int>> _shape_map;
  std::map<std::string, std::vector<int>> _lod_map;
+};
+
+class PredictorRes {
+ public:
+  PredictorRes() {}
+  ~PredictorRes() {}
+
+ public:
+  void clear() {
+    _models.clear();
+    _engine_names.clear();
+  }
+  const std::vector<int64_t>& get_int64_by_name(const int model_idx,
+                                                const std::string& name) {
+    return _models[model_idx].get_int64_by_name(name);
+  }
+  const std::vector<float>& get_float_by_name(const int model_idx,
+                                              const std::string& name) {
+    return _models[model_idx].get_float_by_name(name);
+  }
+  const std::vector<int>& get_shape(const int model_idx,
+                                    const std::string& name) {
+    return _models[model_idx].get_shape(name);
+  }
+  const std::vector<int>& get_lod(const int model_idx,
+                                  const std::string& name) {
+    return _models[model_idx].get_lod(name);
+  }
+  void add_model_res(ModelRes&& res) {
+    _engine_names.push_back(res.engine_name());
+    _models.emplace_back(std::move(res));
+  }
+  void set_variant_tag(const std::string& variant_tag) {
+    _variant_tag = variant_tag;
+  }
+  const std::string& variant_tag() { return _variant_tag; }
+  const std::vector<std::string>& get_engine_names() { return _engine_names; }

 private:
+  std::vector<ModelRes> _models;
  std::string _variant_tag;
+  std::vector<std::string> _engine_names;
};

class PredictorClient {
...
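The new client-side layout above groups each model's outputs into a `ModelRes` held by `PredictorRes`. A small, hypothetical usage sketch of how a caller could walk those results (not part of this commit; the include path and the fetch name "score" are assumptions for illustration):

```cpp
// Hypothetical consumer of the multi-model PredictorRes (illustration only).
#include <iostream>
#include <string>
#include <vector>

#include "core/general-client/include/general_model.h"  // assumed header path

using baidu::paddle_serving::general_model::PredictorRes;

// Print how many float values each engine returned for an assumed
// fetch variable named "score".
void print_scores(PredictorRes& res) {
  const std::vector<std::string>& engines = res.get_engine_names();
  for (int i = 0; i < static_cast<int>(engines.size()); ++i) {
    // model_idx i corresponds to the i-th ModelRes added via add_model_res().
    const std::vector<float>& score = res.get_float_by_name(i, "score");
    std::cout << engines[i] << ": " << score.size() << " values" << std::endl;
  }
}
```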
@@ -111,6 +111,7 @@ void PredictorClient::set_predictor_conf(const std::string &conf_path,
int PredictorClient::destroy_predictor() {
  _api.thrd_finalize();
  _api.destroy();
+  return 0;
}

int PredictorClient::create_predictor_by_desc(const std::string &sdk_desc) {
@@ -118,7 +119,8 @@ int PredictorClient::create_predictor_by_desc(const std::string &sdk_desc) {
    LOG(ERROR) << "Predictor Creation Failed";
    return -1;
  }
-  _api.thrd_initialize();
+  // _api.thrd_initialize();
+  return 0;
}

int PredictorClient::create_predictor() {
@@ -128,7 +130,8 @@ int PredictorClient::create_predictor() {
    LOG(ERROR) << "Predictor Creation Failed";
    return -1;
  }
-  _api.thrd_initialize();
+  // _api.thrd_initialize();
+  return 0;
}

int PredictorClient::batch_predict(
@@ -143,16 +146,13 @@ int PredictorClient::batch_predict(
    const int &pid) {
  int batch_size = std::max(float_feed_batch.size(), int_feed_batch.size());

-  predict_res_batch._int64_value_map.clear();
-  predict_res_batch._float_value_map.clear();
-  predict_res_batch._shape_map.clear();
-  predict_res_batch._lod_map.clear();
+  predict_res_batch.clear();

  Timer timeline;
  int64_t preprocess_start = timeline.TimeStampUS();

  int fetch_name_num = fetch_name.size();

-  _api.thrd_clear();
+  _api.thrd_initialize();
  std::string variant_tag;
  _predictor = _api.fetch_predictor("general_model", &variant_tag);
  predict_res_batch.set_variant_tag(variant_tag);
@@ -189,11 +189,11 @@ int PredictorClient::batch_predict(
      Tensor *tensor = tensor_vec[idx];
      VLOG(2) << "prepare float feed " << name << " shape size "
              << float_shape[vec_idx].size();
-      for (int j = 0; j < float_shape[vec_idx].size(); ++j) {
+      for (uint32_t j = 0; j < float_shape[vec_idx].size(); ++j) {
        tensor->add_shape(float_shape[vec_idx][j]);
      }
      tensor->set_elem_type(1);
-      for (int j = 0; j < float_feed[vec_idx].size(); ++j) {
+      for (uint32_t j = 0; j < float_feed[vec_idx].size(); ++j) {
        tensor->add_float_data(float_feed[vec_idx][j]);
      }
      vec_idx++;
@@ -208,13 +208,13 @@ int PredictorClient::batch_predict(
      Tensor *tensor = tensor_vec[idx];
      VLOG(2) << "prepare int feed " << name << " shape size "
              << int_shape[vec_idx].size();
-      for (int j = 0; j < int_shape[vec_idx].size(); ++j) {
+      for (uint32_t j = 0; j < int_shape[vec_idx].size(); ++j) {
        tensor->add_shape(int_shape[vec_idx][j]);
      }
      tensor->set_elem_type(0);
      VLOG(3) << "feed var name " << name << " index " << vec_idx
              << "first data " << int_feed[vec_idx][0];
-      for (int j = 0; j < int_feed[vec_idx].size(); ++j) {
+      for (uint32_t j = 0; j < int_feed[vec_idx].size(); ++j) {
        tensor->add_int64_data(int_feed[vec_idx][j]);
      }
      vec_idx++;
@@ -247,52 +247,61 @@ int PredictorClient::batch_predict(
  } else {
    client_infer_end = timeline.TimeStampUS();
    postprocess_start = client_infer_end;
-    for (auto &name : fetch_name) {
-      // int idx = _fetch_name_to_idx[name];
-      int idx = 0;
-      int shape_size = res.insts(0).tensor_array(idx).shape_size();
-      VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
-              << shape_size;
-      predict_res_batch._shape_map[name].resize(shape_size);
-      for (int i = 0; i < shape_size; ++i) {
-        predict_res_batch._shape_map[name][i] =
-            res.insts(0).tensor_array(idx).shape(i);
-      }
-      int lod_size = res.insts(0).tensor_array(idx).lod_size();
-      if (lod_size > 0) {
-        predict_res_batch._lod_map[name].resize(lod_size);
-        for (int i = 0; i < lod_size; ++i) {
-          predict_res_batch._lod_map[name][i] =
-              res.insts(0).tensor_array(idx).lod(i);
-        }
-      }
-      idx += 1;
-    }
-    for (auto &name : fetch_name) {
-      // int idx = _fetch_name_to_idx[name];
-      int idx = 0;
-      if (_fetch_name_to_type[name] == 0) {
-        VLOG(2) << "ferch var " << name << "type int";
-        predict_res_batch._int64_value_map[name].resize(
-            res.insts(0).tensor_array(idx).int64_data_size());
-        int size = res.insts(0).tensor_array(idx).int64_data_size();
-        for (int i = 0; i < size; ++i) {
-          predict_res_batch._int64_value_map[name][i] =
-              res.insts(0).tensor_array(idx).int64_data(i);
-        }
-      } else {
-        VLOG(2) << "fetch var " << name << "type float";
-        predict_res_batch._float_value_map[name].resize(
-            res.insts(0).tensor_array(idx).float_data_size());
-        int size = res.insts(0).tensor_array(idx).float_data_size();
-        for (int i = 0; i < size; ++i) {
-          predict_res_batch._float_value_map[name][i] =
-              res.insts(0).tensor_array(idx).float_data(i);
-        }
-      }
-      idx += 1;
-    }
+    VLOG(2) << "get model output num";
+    uint32_t model_num = res.outputs_size();
+    VLOG(2) << "model num: " << model_num;
+    for (uint32_t m_idx = 0; m_idx < model_num; ++m_idx) {
+      VLOG(2) << "process model output index: " << m_idx;
+      auto output = res.outputs(m_idx);
+      ModelRes model;
+      model.set_engine_name(output.engine_name());
+
+      for (auto &name : fetch_name) {
+        // int idx = _fetch_name_to_idx[name];
+        int idx = 0;
+        int shape_size = output.insts(0).tensor_array(idx).shape_size();
+        VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
+                << shape_size;
+        model._shape_map[name].resize(shape_size);
+        for (int i = 0; i < shape_size; ++i) {
+          model._shape_map[name][i] =
+              output.insts(0).tensor_array(idx).shape(i);
+        }
+        int lod_size = output.insts(0).tensor_array(idx).lod_size();
+        if (lod_size > 0) {
+          model._lod_map[name].resize(lod_size);
+          for (int i = 0; i < lod_size; ++i) {
+            model._lod_map[name][i] = output.insts(0).tensor_array(idx).lod(i);
+          }
+        }
+        idx += 1;
+      }
+
+      for (auto &name : fetch_name) {
+        // int idx = _fetch_name_to_idx[name];
+        int idx = 0;
+        if (_fetch_name_to_type[name] == 0) {
+          VLOG(2) << "ferch var " << name << "type int";
+          model._int64_value_map[name].resize(
+              output.insts(0).tensor_array(idx).int64_data_size());
+          int size = output.insts(0).tensor_array(idx).int64_data_size();
+          for (int i = 0; i < size; ++i) {
+            model._int64_value_map[name][i] =
+                output.insts(0).tensor_array(idx).int64_data(i);
+          }
+        } else {
+          VLOG(2) << "fetch var " << name << "type float";
+          model._float_value_map[name].resize(
+              output.insts(0).tensor_array(idx).float_data_size());
+          int size = output.insts(0).tensor_array(idx).float_data_size();
+          for (int i = 0; i < size; ++i) {
+            model._float_value_map[name][i] =
+                output.insts(0).tensor_array(idx).float_data(i);
+          }
+        }
+        idx += 1;
+      }
+      predict_res_batch.add_model_res(std::move(model));
+    }
    postprocess_end = timeline.TimeStampUS();
  }
@@ -305,7 +314,6 @@ int PredictorClient::batch_predict(
          << "prepro_1:" << preprocess_end << " "
          << "client_infer_0:" << client_infer_start << " "
          << "client_infer_1:" << client_infer_end << " ";
  if (FLAGS_profile_server) {
    int op_num = res.profile_time_size() / 2;
    for (int i = 0; i < op_num; ++i) {
@@ -319,6 +327,8 @@ int PredictorClient::batch_predict(
    fprintf(stderr, "%s\n", oss.str().c_str());
  }

+  _api.thrd_clear();
+
  return 0;
}
...
@@ -31,27 +31,28 @@ PYBIND11_MODULE(serving_client, m) {
  py::class_<PredictorRes>(m, "PredictorRes", py::buffer_protocol())
      .def(py::init())
      .def("get_int64_by_name",
-           [](PredictorRes &self, std::string &name) {
-             return self.get_int64_by_name(name);
+           [](PredictorRes &self, int model_idx, std::string &name) {
+             return self.get_int64_by_name(model_idx, name);
           },
           py::return_value_policy::reference)
      .def("get_float_by_name",
-           [](PredictorRes &self, std::string &name) {
-             return self.get_float_by_name(name);
+           [](PredictorRes &self, int model_idx, std::string &name) {
+             return self.get_float_by_name(model_idx, name);
           },
           py::return_value_policy::reference)
      .def("get_shape",
-           [](PredictorRes &self, std::string &name) {
-             return self.get_shape(name);
+           [](PredictorRes &self, int model_idx, std::string &name) {
+             return self.get_shape(model_idx, name);
           },
           py::return_value_policy::reference)
      .def("get_lod",
-           [](PredictorRes &self, std::string &name) {
-             return self.get_lod(name);
+           [](PredictorRes &self, int model_idx, std::string &name) {
+             return self.get_lod(model_idx, name);
           },
           py::return_value_policy::reference)
-      .def("variant_tag",
-           [](PredictorRes &self) { return self.variant_tag(); });
+      .def("variant_tag", [](PredictorRes &self) { return self.variant_tag(); })
+      .def("get_engine_names",
+           [](PredictorRes &self) { return self.get_engine_names(); });

  py::class_<PredictorClient>(m, "PredictorClient", py::buffer_protocol())
      .def(py::init())
@@ -77,7 +78,6 @@ PYBIND11_MODULE(serving_client, m) {
           [](PredictorClient &self) { self.create_predictor(); })
      .def("destroy_predictor",
           [](PredictorClient &self) { self.destroy_predictor(); })
      .def("batch_predict",
           [](PredictorClient &self,
              const std::vector<std::vector<std::vector<float>>>
...
@@ -35,8 +35,17 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralCopyOp::inference() {
  // reade request from client
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-  VLOG(2) << "precedent name: " << pre_name();
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
+  VLOG(2) << "precedent name: " << pre_name;
  const TensorVector *in = &input_blob->tensor_vector;
  VLOG(2) << "input size: " << in->size();
  int batch_size = input_blob->GetBatchSize();
...
@@ -40,12 +40,21 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralDistKVInferOp::inference() {
  VLOG(2) << "Going to run inference";
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-  VLOG(2) << "Get precedent op name: " << pre_name();
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
+  VLOG(2) << "Get precedent op name: " << pre_name;
  GeneralBlob *output_blob = mutable_data<GeneralBlob>();

  if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name();
+    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name;
    return -1;
  }
@@ -149,8 +158,8 @@ int GeneralDistKVInferOp::inference() {
  timeline.Start();

  if (InferManager::instance().infer(
-          GENERAL_MODEL_NAME, &infer_in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << GENERAL_MODEL_NAME;
+          engine_name().c_str(), &infer_in, out, batch_size)) {
+    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name();
    return -1;
  }
...
@@ -41,12 +41,21 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralDistKVQuantInferOp::inference() {
  VLOG(2) << "Going to run inference";
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-  VLOG(2) << "Get precedent op name: " << pre_name();
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
+  VLOG(2) << "Get precedent op name: " << pre_name;
  GeneralBlob *output_blob = mutable_data<GeneralBlob>();

  if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name();
+    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name;
    return -1;
  }
@@ -180,8 +189,8 @@ int GeneralDistKVQuantInferOp::inference() {
  timeline.Start();

  if (InferManager::instance().infer(
-          GENERAL_MODEL_NAME, &infer_in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << GENERAL_MODEL_NAME;
+          engine_name().c_str(), &infer_in, out, batch_size)) {
+    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name();
    return -1;
  }
...
@@ -31,8 +31,6 @@ namespace baidu {
namespace paddle_serving {
namespace serving {

-static const char* GENERAL_MODEL_NAME = "general_model";
-
struct GeneralBlob {
  std::vector<paddle::PaddleTensor> tensor_vector;
  int64_t time_stamp[20];
@@ -63,6 +61,7 @@ static void CopyBlobInfo(const GeneralBlob* src, GeneralBlob* tgt) {
  memcpy(&(tgt->time_stamp[0]),
         &(src->time_stamp[0]),
         src->p_size * sizeof(int64_t));
+  tgt->p_size = src->p_size;
}

static void CopyLod(const paddle::PaddleTensor* src,
...
@@ -37,12 +37,21 @@ using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralInferOp::inference() {
  VLOG(2) << "Going to run inference";
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-  VLOG(2) << "Get precedent op name: " << pre_name();
+  const std::vector<std::string> pre_node_names = pre_names();
+  if (pre_node_names.size() != 1) {
+    LOG(ERROR) << "This op(" << op_name()
+               << ") can only have one predecessor op, but received "
+               << pre_node_names.size();
+    return -1;
+  }
+  const std::string pre_name = pre_node_names[0];
+
+  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name);
+  VLOG(2) << "Get precedent op name: " << pre_name;
  GeneralBlob *output_blob = mutable_data<GeneralBlob>();

  if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name();
+    LOG(ERROR) << "Failed mutable depended argument, op:" << pre_name;
    return -1;
  }
@@ -59,8 +68,9 @@ int GeneralInferOp::inference() {
  int64_t start = timeline.TimeStampUS();
  timeline.Start();

-  if (InferManager::instance().infer(GENERAL_MODEL_NAME, in, out, batch_size)) {
-    LOG(ERROR) << "Failed do infer in fluid model: " << GENERAL_MODEL_NAME;
+  if (InferManager::instance().infer(
+          engine_name().c_str(), in, out, batch_size)) {
+    LOG(ERROR) << "Failed do infer in fluid model: " << engine_name().c_str();
    return -1;
  }
...
@@ -33,23 +33,17 @@ using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Response;
using baidu::paddle_serving::predictor::general_model::Request;
using baidu::paddle_serving::predictor::general_model::FetchInst;
+using baidu::paddle_serving::predictor::general_model::ModelOutput;
using baidu::paddle_serving::predictor::InferManager;
using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;

int GeneralResponseOp::inference() {
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-
-  if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name();
-    return -1;
-  }
-
-  const TensorVector *in = &input_blob->tensor_vector;
-  int batch_size = input_blob->GetBatchSize();
-
-  VLOG(2) << "input batch size: " << batch_size;
+  const std::vector<std::string> pre_node_names = pre_names();
+  VLOG(2) << "pre node names size: " << pre_node_names.size();

  const Request *req = dynamic_cast<const Request *>(get_request_message());

+  // response inst with only fetch_var_names
+  Response *res = mutable_data<Response>();
+
  Timer timeline;
  // double response_time = 0.0;
@@ -73,77 +67,107 @@ int GeneralResponseOp::inference() {
        model_config->_fetch_alias_name_to_index[req->fetch_var_names(i)];
  }

-  // response inst with only fetch_var_names
-  Response *res = mutable_data<Response>();
-
-  FetchInst *fetch_inst = res->add_insts();
-  for (auto &idx : fetch_index) {
-    Tensor *tensor = fetch_inst->add_tensor_array();
-    tensor->set_elem_type(1);
-    if (model_config->_is_lod_fetch[idx]) {
-      VLOG(2) << "out[" << idx << "] is lod_tensor";
-      for (int k = 0; k < in->at(idx).shape.size(); ++k) {
-        VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
-        tensor->add_shape(in->at(idx).shape[k]);
-      }
-    } else {
-      VLOG(2) << "out[" << idx << "] is tensor";
-      for (int k = 0; k < in->at(idx).shape.size(); ++k) {
-        VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
-        tensor->add_shape(in->at(idx).shape[k]);
-      }
-    }
-  }
-
-  int var_idx = 0;
-  for (auto &idx : fetch_index) {
-    int cap = 1;
-    for (int j = 0; j < in->at(idx).shape.size(); ++j) {
-      cap *= in->at(idx).shape[j];
-    }
-    if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
-      int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
-      if (model_config->_is_lod_fetch[idx]) {
-        FetchInst *fetch_p = res->mutable_insts(0);
-        for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_lod(
-              in->at(idx).lod[0][j]);
-        }
-        for (int j = 0; j < cap; ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
-        }
-      } else {
-        FetchInst *fetch_p = res->mutable_insts(0);
-        for (int j = 0; j < cap; ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
-        }
-      }
-      var_idx++;
-    } else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
-      float *data_ptr = static_cast<float *>(in->at(idx).data.data());
-      if (model_config->_is_lod_fetch[idx]) {
-        FetchInst *fetch_p = res->mutable_insts(0);
-        for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_lod(
-              in->at(idx).lod[0][j]);
-        }
-        for (int j = 0; j < cap; ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
-        }
-      } else {
-        FetchInst *fetch_p = res->mutable_insts(0);
-        for (int j = 0; j < cap; ++j) {
-          fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
-        }
-      }
-      var_idx++;
-    }
-  }
+  const GeneralBlob *input_blob;
+  for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
+    const std::string &pre_name = pre_node_names[pi];
+    VLOG(2) << "pre names[" << pi << "]: " << pre_name << " ("
+            << pre_node_names.size() << ")";
+    input_blob = get_depend_argument<GeneralBlob>(pre_name);
+    // fprintf(stderr, "input(%s) blob address %x\n", pre_names.c_str(),
+    //         input_blob);
+    if (!input_blob) {
+      LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name;
+      return -1;
+    }
+
+    const TensorVector *in = &input_blob->tensor_vector;
+
+    ModelOutput *output = res->add_outputs();
+    // To get the order of model return values
+    output->set_engine_name(pre_name);
+    FetchInst *fetch_inst = output->add_insts();
+    for (auto &idx : fetch_index) {
+      Tensor *tensor = fetch_inst->add_tensor_array();
+      tensor->set_elem_type(1);
+      if (model_config->_is_lod_fetch[idx]) {
+        VLOG(2) << "out[" << idx << "] is lod_tensor";
+        for (int k = 0; k < in->at(idx).shape.size(); ++k) {
+          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
+          tensor->add_shape(in->at(idx).shape[k]);
+        }
+      } else {
+        VLOG(2) << "out[" << idx << "] is tensor";
+        for (int k = 0; k < in->at(idx).shape.size(); ++k) {
+          VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
+          tensor->add_shape(in->at(idx).shape[k]);
+        }
+      }
+    }
+
+    int var_idx = 0;
+    for (auto &idx : fetch_index) {
+      int cap = 1;
+      for (int j = 0; j < in->at(idx).shape.size(); ++j) {
+        cap *= in->at(idx).shape[j];
+      }
+      if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
+        int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
+        if (model_config->_is_lod_fetch[idx]) {
+          FetchInst *fetch_p = output->mutable_insts(0);
+          for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_lod(
+                in->at(idx).lod[0][j]);
+          }
+          for (int j = 0; j < cap; ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
+          }
+        } else {
+          FetchInst *fetch_p = output->mutable_insts(0);
+          for (int j = 0; j < cap; ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
+          }
+        }
+        var_idx++;
+      } else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
+        float *data_ptr = static_cast<float *>(in->at(idx).data.data());
+        if (model_config->_is_lod_fetch[idx]) {
+          FetchInst *fetch_p = output->mutable_insts(0);
+          for (int j = 0; j < in->at(idx).lod[0].size(); ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_lod(
+                in->at(idx).lod[0][j]);
+          }
+          for (int j = 0; j < cap; ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
+          }
+        } else {
+          FetchInst *fetch_p = output->mutable_insts(0);
+          for (int j = 0; j < cap; ++j) {
+            fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
+          }
+        }
+        var_idx++;
+      }
+    }
+  }

  if (req->profile_server()) {
    int64_t end = timeline.TimeStampUS();
-    VLOG(2) << "p size for input blob: " << input_blob->p_size;
-    for (int i = 0; i < input_blob->p_size; ++i) {
-      res->add_profile_time(input_blob->time_stamp[i]);
+    // TODO(barriery): multi-model profile_time.
+    // At present, only the response_op is multi-input, so here we get
+    // the profile_time by hard coding. It needs to be replaced with
+    // a more elegant way.
+    for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
+      input_blob = get_depend_argument<GeneralBlob>(pre_node_names[pi]);
+      VLOG(2) << "p size for input blob: " << input_blob->p_size;
+      int profile_time_idx = -1;
+      if (pi == 0) {
+        profile_time_idx = 0;
+      } else {
+        profile_time_idx = input_blob->p_size - 2;
+      }
+      for (; profile_time_idx < input_blob->p_size; ++profile_time_idx) {
+        res->add_profile_time(input_blob->time_stamp[profile_time_idx]);
+      }
    }
    // TODO(guru4elephant): find more elegant way to do this
    res->add_profile_time(start);
...
@@ -32,22 +32,18 @@ using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Response;
using baidu::paddle_serving::predictor::general_model::Request;
using baidu::paddle_serving::predictor::general_model::FetchInst;
+using baidu::paddle_serving::predictor::general_model::ModelOutput;
using baidu::paddle_serving::predictor::InferManager;
using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;

int GeneralTextResponseOp::inference() {
-  const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
-
-  if (!input_blob) {
-    LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name();
-    return -1;
-  }
-
-  const TensorVector *in = &input_blob->tensor_vector;
-  int batch_size = input_blob->GetBatchSize();
-
-  VLOG(2) << "infer batch size: " << batch_size;
+  VLOG(2) << "Going to run inference";
+  const std::vector<std::string> pre_node_names = pre_names();
+  VLOG(2) << "pre node names size: " << pre_node_names.size();

  const Request *req = dynamic_cast<const Request *>(get_request_message());

+  // response inst with only fetch_var_names
+  Response *res = mutable_data<Response>();
+
  Timer timeline;
  int64_t start = timeline.TimeStampUS();
@@ -67,59 +63,90 @@ int GeneralTextResponseOp::inference() {
        model_config->_fetch_alias_name_to_index[req->fetch_var_names(i)];
  }

-  // response inst with only fetch_var_names
-  Response *res = mutable_data<Response>();
-
-  for (int i = 0; i < batch_size; ++i) {
-    FetchInst *fetch_inst = res->add_insts();
-    for (auto &idx : fetch_index) {
-      Tensor *tensor = fetch_inst->add_tensor_array();
-      // currently only response float tensor or lod_tensor
-      tensor->set_elem_type(1);
-      if (model_config->_is_lod_fetch[idx]) {
-        VLOG(2) << "out[" << idx << " is lod_tensor";
-        tensor->add_shape(-1);
-      } else {
-        VLOG(2) << "out[" << idx << "] is tensor";
-        for (int k = 1; k < in->at(idx).shape.size(); ++k) {
-          VLOG(2) << "shape[" << k - 1 << "]: " << in->at(idx).shape[k];
-          tensor->add_shape(in->at(idx).shape[k]);
-        }
-      }
-    }
-  }
-
-  int var_idx = 0;
-  for (auto &idx : fetch_index) {
-    float *data_ptr = static_cast<float *>(in->at(idx).data.data());
-    int cap = 1;
-    for (int j = 1; j < in->at(idx).shape.size(); ++j) {
-      cap *= in->at(idx).shape[j];
-    }
-    if (model_config->_is_lod_fetch[idx]) {
-      for (int j = 0; j < batch_size; ++j) {
-        for (int k = in->at(idx).lod[0][j]; k < in->at(idx).lod[0][j + 1];
-             k++) {
-          res->mutable_insts(j)->mutable_tensor_array(var_idx)->add_float_data(
-              data_ptr[k]);
-        }
-      }
-    } else {
-      for (int j = 0; j < batch_size; ++j) {
-        for (int k = j * cap; k < (j + 1) * cap; ++k) {
-          res->mutable_insts(j)->mutable_tensor_array(var_idx)->add_float_data(
-              data_ptr[k]);
-        }
-      }
-    }
-    var_idx++;
-  }
+  const GeneralBlob *input_blob;
+  for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
+    const std::string &pre_name = pre_node_names[pi];
+    VLOG(2) << "pre names[" << pi << "]: " << pre_name << " ("
+            << pre_node_names.size() << ")";
+    input_blob = get_depend_argument<GeneralBlob>(pre_name);
+    if (!input_blob) {
+      LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name;
+      return -1;
+    }
+
+    const TensorVector *in = &input_blob->tensor_vector;
+    int batch_size = input_blob->GetBatchSize();
+    VLOG(2) << "input batch size: " << batch_size;
+
+    ModelOutput *output = res->add_outputs();
+    output->set_engine_name(
+        pre_name);  // To get the order of model return values
+    for (int i = 0; i < batch_size; ++i) {
+      FetchInst *fetch_inst = output->add_insts();
+      for (auto &idx : fetch_index) {
+        Tensor *tensor = fetch_inst->add_tensor_array();
+        // currently only response float tensor or lod_tensor
+        tensor->set_elem_type(1);
+        if (model_config->_is_lod_fetch[idx]) {
+          VLOG(2) << "out[" << idx << " is lod_tensor";
+          tensor->add_shape(-1);
+        } else {
+          VLOG(2) << "out[" << idx << "] is tensor";
+          for (int k = 1; k < in->at(idx).shape.size(); ++k) {
+            VLOG(2) << "shape[" << k - 1 << "]: " << in->at(idx).shape[k];
+            tensor->add_shape(in->at(idx).shape[k]);
+          }
+        }
+      }
+    }
+
+    int var_idx = 0;
+    for (auto &idx : fetch_index) {
+      float *data_ptr = static_cast<float *>(in->at(idx).data.data());
+      int cap = 1;
+      for (int j = 1; j < in->at(idx).shape.size(); ++j) {
+        cap *= in->at(idx).shape[j];
+      }
+      if (model_config->_is_lod_fetch[idx]) {
+        for (int j = 0; j < batch_size; ++j) {
+          for (int k = in->at(idx).lod[0][j]; k < in->at(idx).lod[0][j + 1];
+               k++) {
+            output->mutable_insts(j)
+                ->mutable_tensor_array(var_idx)
+                ->add_float_data(data_ptr[k]);
+          }
+        }
+      } else {
+        for (int j = 0; j < batch_size; ++j) {
+          for (int k = j * cap; k < (j + 1) * cap; ++k) {
+            output->mutable_insts(j)
+                ->mutable_tensor_array(var_idx)
+                ->add_float_data(data_ptr[k]);
+          }
+        }
+      }
+      var_idx++;
+    }
+  }

  if (req->profile_server()) {
    int64_t end = timeline.TimeStampUS();
-
-    for (int i = 0; i < input_blob->p_size; ++i) {
-      res->add_profile_time(input_blob->time_stamp[i]);
+    // TODO(barriery): multi-model profile_time.
+    // At present, only the response_op is multi-input, so here we get
+    // the profile_time by hard coding. It needs to be replaced with
+    // a more elegant way.
+    for (uint32_t pi = 0; pi < pre_node_names.size(); ++pi) {
+      input_blob = get_depend_argument<GeneralBlob>(pre_node_names[pi]);
+      VLOG(2) << "p size for input blob: " << input_blob->p_size;
+      int profile_time_idx = -1;
+      if (pi == 0) {
+        profile_time_idx = 0;
+      } else {
+        profile_time_idx = input_blob->p_size - 2;
+      }
+      for (; profile_time_idx < input_blob->p_size; ++profile_time_idx) {
+        res->add_profile_time(input_blob->time_stamp[profile_time_idx]);
+      }
    }
    // TODO(guru4elephant): find more elegant way to do this
    res->add_profile_time(start);
...
@@ -40,10 +40,15 @@ message Request {
};

message Response {
-  repeated FetchInst insts = 1;
+  repeated ModelOutput outputs = 1;
  repeated int64 profile_time = 2;
};

+message ModelOutput {
+  repeated FetchInst insts = 1;
+  optional string engine_name = 2;
+}
+
service GeneralModelService {
  rpc inference(Request) returns (Response);
  rpc debug(Request) returns (Response);
...
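For reference, a minimal sketch of how the reshaped `Response` nests results, one `ModelOutput` per engine (illustration only, not part of this commit; the generated header name and the engine name "general_infer_0" are assumptions):

```cpp
#include <iostream>

#include "general_model_service.pb.h"  // assumed name of the generated header

using baidu::paddle_serving::predictor::general_model::FetchInst;
using baidu::paddle_serving::predictor::general_model::ModelOutput;
using baidu::paddle_serving::predictor::general_model::Response;

int main() {
  Response res;
  // One ModelOutput per model/engine in the DAG.
  ModelOutput* out = res.add_outputs();
  out->set_engine_name("general_infer_0");  // hypothetical engine name
  FetchInst* inst = out->add_insts();
  inst->add_tensor_array()->add_float_data(0.5f);

  for (int m = 0; m < res.outputs_size(); ++m) {
    std::cout << res.outputs(m).engine_name() << " -> "
              << res.outputs(m).insts_size() << " inst(s)" << std::endl;
  }
  return 0;
}
```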
@@ -27,6 +27,10 @@ namespace predictor {
}
#endif

+// #ifdef WITH_GPU
+// #define USE_PTHREAD
+// #endif
+
#ifdef USE_PTHREAD

#define THREAD_T pthread_t
...
@@ -14,6 +14,7 @@
#include "core/predictor/framework/dag.h"
#include <string>
+#include <utility>  // make_pair
#include <vector>
#include "core/predictor/common/inner_common.h"
#include "core/predictor/framework/predictor_metric.h"  // PredictorMetric
@@ -199,25 +200,81 @@ const DagStage* Dag::stage_by_index(uint32_t index) { return _stages[index]; }
int Dag::topo_sort() {
  std::stringstream ss;
-  for (uint32_t nid = 0; nid < _index_nodes.size(); nid++) {
-    DagStage* stage = new (std::nothrow) DagStage();
-    if (stage == NULL) {
-      LOG(ERROR) << "Invalid stage!";
-      return ERR_MEM_ALLOC_FAILURE;
-    }
-    stage->nodes.push_back(_index_nodes[nid]);
+  uint32_t nodes_size = _index_nodes.size();
+  std::vector<uint32_t> in_degree(nodes_size, 0);
+  std::vector<std::vector<uint32_t>> in_egde(nodes_size);
+  for (uint32_t nid = 0; nid < nodes_size; nid++) {
+    in_degree[nid] += _index_nodes[nid]->depends.size();
+    for (auto it = _index_nodes[nid]->depends.begin();
+         it != _index_nodes[nid]->depends.end();
+         ++it) {
+      uint32_t pnid = Dag::node_by_name(it->first)->id -
+                      1;  // 0 is reserved for begginer-op
+      in_egde[pnid].push_back(nid);
+    }
+  }
+  for (int i = 0; i < in_degree.size(); ++i) {
+    LOG(INFO) << "(" << _index_nodes[i]->name << ") in_degree[" << i
+              << "]: " << in_degree[i];
+  }
+
+  int sorted_num = 0;
+  DagStage* stage = new (std::nothrow) DagStage();
+  if (stage == NULL) {
+    LOG(ERROR) << "Invalid stage!";
+    return ERR_MEM_ALLOC_FAILURE;
+  }
+  ss.str("");
+  ss << _stages.size();
+  stage->name = ss.str();
+  stage->full_name = full_name() + NAME_DELIMITER + stage->name;
+  for (uint32_t nid = 0; nid < nodes_size; ++nid) {
+    if (in_degree[nid] == 0) {
+      ++sorted_num;
+      stage->nodes.push_back(_index_nodes[nid]);
+      // assign stage number after stage created
+      _index_nodes[nid]->stage = _stages.size();
+      // assign dag node full name after stage created
+      _index_nodes[nid]->full_name =
+          stage->full_name + NAME_DELIMITER + _index_nodes[nid]->name;
+    }
+  }
+  if (stage->nodes.size() == 0) {
+    LOG(ERROR) << "Invalid Dag!";
+    return ERR_INTERNAL_FAILURE;
+  }
+  _stages.push_back(stage);
+
+  while (sorted_num < nodes_size) {
+    auto pre_nodes = _stages.back()->nodes;
+    DagStage* stage = new (std::nothrow) DagStage();
    ss.str("");
    ss << _stages.size();
    stage->name = ss.str();
    stage->full_name = full_name() + NAME_DELIMITER + stage->name;
+    for (uint32_t pi = 0; pi < pre_nodes.size(); ++pi) {
+      uint32_t pnid = pre_nodes[pi]->id - 1;
+      for (uint32_t ei = 0; ei < in_egde[pnid].size(); ++ei) {
+        uint32_t nid = in_egde[pnid][ei];
+        --in_degree[nid];
+        if (in_degree[nid] == 0) {
+          ++sorted_num;
+          stage->nodes.push_back(_index_nodes[nid]);
+          // assign stage number after stage created
+          _index_nodes[nid]->stage = _stages.size();
+          // assign dag node full name after stage created
+          _index_nodes[nid]->full_name =
+              stage->full_name + NAME_DELIMITER + _index_nodes[nid]->name;
+        }
+      }
+    }
+    if (stage->nodes.size() == 0) {
+      LOG(ERROR) << "Invalid Dag!";
+      return ERR_INTERNAL_FAILURE;
+    }
    _stages.push_back(stage);
-
-    // assign stage number after stage created
-    _index_nodes[nid]->stage = nid;
-    // assign dag node full name after stage created
-    _index_nodes[nid]->full_name =
-        stage->full_name + NAME_DELIMITER + _index_nodes[nid]->name;
  }
  return ERR_OK;
}
...
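The rewritten `Dag::topo_sort()` above is essentially Kahn's algorithm, with the twist that every round of zero-in-degree nodes becomes one `DagStage`, so independent ops land in the same stage and can run side by side. A self-contained sketch of that stage-grouping idea (illustration only, with a made-up four-node DAG):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
  std::vector<std::vector<uint32_t>> out_edge = {{1, 2}, {3}, {3}, {}};
  std::vector<uint32_t> in_degree = {0, 1, 1, 2};

  // Seed with all nodes whose in-degree is already zero.
  std::vector<uint32_t> current;
  for (uint32_t n = 0; n < in_degree.size(); ++n) {
    if (in_degree[n] == 0) current.push_back(n);
  }

  uint32_t stage = 0;
  while (!current.empty()) {
    std::cout << "stage " << stage++ << ":";
    std::vector<uint32_t> next;
    for (uint32_t n : current) {
      std::cout << " node" << n;
      // Releasing n may drop a successor's in-degree to zero; those nodes
      // form the next stage and may run in parallel with each other.
      for (uint32_t m : out_edge[n]) {
        if (--in_degree[m] == 0) next.push_back(m);
      }
    }
    std::cout << "\n";
    current.swap(next);
  }
  return 0;
}
// Prints: stage 0: node0 / stage 1: node1 node2 / stage 2: node3
```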
@@ -76,19 +76,34 @@ int DagView::init(Dag* dag, const std::string& service_name) {
      }
      op->set_full_name(service_name + NAME_DELIMITER + node->full_name);
+      // Set the name of the Op as the key of the matching engine.
+      VLOG(2) << "op->set_engine_name(" << node->name.c_str() << ")";
+      op->set_engine_name(node->name);
+
      vnode->conf = node;
      vnode->op = op;
+      // Add depends
+      for (auto it = vnode->conf->depends.begin();
+           it != vnode->conf->depends.end();
+           ++it) {
+        std::string pre_node_name = it->first;
+        VLOG(2) << "add op pre name: \n"
+                << "current op name: " << vnode->op->op_name()
+                << ", previous op name: " << pre_node_name;
+        vnode->op->add_pre_node_name(pre_node_name);
+      }
      vstage->nodes.push_back(vnode);
    }
    // TODO(guru4elephant): this seems buggy, please review later
-    if (si > 0) {
-      VLOG(2) << "set op pre name: \n"
-              << "current op name: " << vstage->nodes.back()->op->op_name()
-              << " previous op name: "
-              << _view[si - 1]->nodes.back()->op->op_name();
-      vstage->nodes.back()->op->set_pre_node_name(
-          _view[si - 1]->nodes.back()->op->op_name());
-    }
+    /*if (si > 0) {*/
+    //    VLOG(2) << "set op pre name: \n"
+    //            << "current op name: " << vstage->nodes.back()->op->op_name()
+    //            << " previous op name: "
+    //            << _view[si - 1]->nodes.back()->op->op_name();
+    //    vstage->nodes.back()->op->set_pre_node_name(
+    //        _view[si - 1]->nodes.back()->op->op_name());
+    /*}*/
    _view.push_back(vstage);
  }
@@ -139,6 +154,7 @@ int DagView::execute_one_stage(ViewStage* vstage,
                               butil::IOBufBuilder* debug_os) {
  butil::Timer stage_time(butil::Timer::STARTED);
  uint32_t node_size = vstage->nodes.size();
+  VLOG(2) << "vstage->nodes.size(): " << node_size;
  for (uint32_t ni = 0; ni < node_size; ni++) {
    ViewNode* vnode = vstage->nodes[ni];
    DagNode* conf = vnode->conf;
...
@@ -765,6 +765,8 @@ class InferManager {
  }
  size_t engine_num = model_toolkit_conf.engines_size();
  for (size_t ei = 0; ei < engine_num; ++ei) {
+    LOG(INFO) << "model_toolkit_conf.engines(" << ei
+              << ").name: " << model_toolkit_conf.engines(ei).name();
    std::string engine_name = model_toolkit_conf.engines(ei).name();
    VersionedInferEngine* engine = new (std::nothrow) VersionedInferEngine();
    if (!engine) {
...
@@ -56,11 +56,11 @@ int MempoolWrapper::thread_initialize() {
  im::fugue::memory::Region* region = new im::fugue::memory::Region();
  region->init();
  im::Mempool* mempool = new (std::nothrow) im::Mempool(region);
-  MempoolRegion* mempool_region = new MempoolRegion(region, mempool);
  if (mempool == NULL) {
    LOG(ERROR) << "Failed create thread mempool";
    return -1;
  }
+  MempoolRegion* mempool_region = new MempoolRegion(region, mempool);

  if (THREAD_SETSPECIFIC(_bspec_key, mempool_region) != 0) {
    LOG(ERROR) << "unable to set the thrd_data";
...
@@ -60,6 +60,7 @@ int Op::init(Bus* bus,
    return -1;
  }

+  _pre_node_names.clear();
  return custom_init();
}
...
@@ -14,7 +14,9 @@
#pragma once
#include <bvar/bvar.h>  // bvar::LatencyRecorder
+#include <cstdlib>
#include <string>
+#include <vector>
#include "core/predictor/common/inner_common.h"
#include "core/predictor/framework/channel.h"
#include "core/predictor/framework/op_repository.h"
@@ -132,18 +134,28 @@ class Op {
  const std::string& full_name() const { return _full_name; }

-  const std::string& pre_name() const { return _pre_node_name; }
+  const std::vector<std::string>& pre_names() const { return _pre_node_names; }

  void set_full_name(const std::string full_name) { _full_name = full_name; }
-  void set_pre_node_name(const std::string pre_name) {
-    _pre_node_name = pre_name;
+  void add_pre_node_name(const std::string pre_name) {
+    _pre_node_names.push_back(pre_name);
  }

  const std::string& type() const;

  uint32_t id() const;

+  // Set the name of the Op as the key of the matching engine.
+  // Notes that this key is only used by infer_op (only the
+  // infer_op needs to find the corresponding engine).
+  // At present, there is only general_infer_op.
+  void set_engine_name(const std::string engine_name) {
+    _engine_name = engine_name;
+  }
+  const std::string& engine_name() const { return _engine_name; }
+
  // --------------- Default implements ----------------

  virtual int custom_init() { return 0; }
@@ -189,13 +201,14 @@ class Op {
  Bus* _bus;
  Dag* _dag;
  uint32_t _id;
-  std::string _pre_node_name;  // only for sequential execution
+  std::vector<std::string> _pre_node_names;  // for DAG execution
  std::string _name;
  std::string _full_name;  // service_workflow_stageindex_opname
  std::string _type;
  bool _has_calc;
  bool _has_init;
  TimerFlow* _timer;
+  std::string _engine_name;  // only for infer_op
};

template <typename T>
@@ -215,7 +228,10 @@ class OpWithChannel : public Op {
    return _channel;
  }

-  _channel = butil::get_object<ChannelType>();
+  // TODO(barriery): There are some problems in using butil::get_object
+  // _channel = butil::get_object<ChannelType>();
+  _channel = new ChannelType();
+
  if (!_channel) {
    LOG(ERROR) << "Failed mutable channel of type:" << typeid(T).name();
    return NULL;
@@ -229,8 +245,14 @@ class OpWithChannel : public Op {
  int release_channel() {
    if (_channel) {
      _channel->deinit();
-      butil::return_object<ChannelType>(_channel);
+      delete _channel;
    }
+
+    // TODO(barriery): There are some problems in using butil::get_object
+    /*
+    if (_channel) {
+      _channel->deinit();
+      butil::return_object<ChannelType>(_channel);
+    } */

    _channel = NULL;
    return 0;
...
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include "core/sdk-cpp/include/common.h"
namespace baidu {
namespace paddle_serving {
namespace sdk_cpp {
#ifndef CATCH_ANY_AND_RET
#define CATCH_ANY_AND_RET(errno) \
catch (...) { \
LOG(ERROR) << "exception catched"; \
return errno; \
}
#endif
#define USE_PTHREAD
#ifdef USE_PTHREAD
#define THREAD_T pthread_t
#define THREAD_KEY_T pthread_key_t
#define THREAD_MUTEX_T pthread_mutex_t
#define THREAD_KEY_CREATE pthread_key_create
#define THREAD_SETSPECIFIC pthread_setspecific
#define THREAD_GETSPECIFIC pthread_getspecific
#define THREAD_CREATE pthread_create
#define THREAD_CANCEL pthread_cancel
#define THREAD_JOIN pthread_join
#define THREAD_KEY_DELETE pthread_key_delete
#define THREAD_MUTEX_INIT pthread_mutex_init
#define THREAD_MUTEX_LOCK pthread_mutex_lock
#define THREAD_MUTEX_UNLOCK pthread_mutex_unlock
#define THREAD_MUTEX_DESTROY pthread_mutex_destroy
#define THREAD_COND_T pthread_cond_t
#define THREAD_COND_INIT pthread_cond_init
#define THREAD_COND_SIGNAL pthread_cond_signal
#define THREAD_COND_WAIT pthread_cond_wait
#define THREAD_COND_DESTROY pthread_cond_destroy
#else
#define THREAD_T bthread_t
#define THREAD_KEY_T bthread_key_t
#define THREAD_MUTEX_T bthread_mutex_t
#define THREAD_KEY_CREATE bthread_key_create
#define THREAD_SETSPECIFIC bthread_setspecific
#define THREAD_GETSPECIFIC bthread_getspecific
#define THREAD_CREATE bthread_start_background
#define THREAD_CANCEL bthread_stop
#define THREAD_JOIN bthread_join
#define THREAD_KEY_DELETE bthread_key_delete
#define THREAD_MUTEX_INIT bthread_mutex_init
#define THREAD_MUTEX_LOCK bthread_mutex_lock
#define THREAD_MUTEX_UNLOCK bthread_mutex_unlock
#define THREAD_MUTEX_DESTROY bthread_mutex_destroy
#define THREAD_COND_T bthread_cond_t
#define THREAD_COND_INIT bthread_cond_init
#define THREAD_COND_SIGNAL bthread_cond_signal
#define THREAD_COND_WAIT bthread_cond_wait
#define THREAD_COND_DESTROY bthread_cond_destroy
#endif
} // namespace sdk_cpp
} // namespace paddle_serving
} // namespace baidu
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
#include <vector> #include <vector>
#include "core/sdk-cpp/include/common.h" #include "core/sdk-cpp/include/common.h"
#include "core/sdk-cpp/include/endpoint_config.h" #include "core/sdk-cpp/include/endpoint_config.h"
#include "core/sdk-cpp/include/macros.h"
#include "core/sdk-cpp/include/predictor.h" #include "core/sdk-cpp/include/predictor.h"
#include "core/sdk-cpp/include/stub.h" #include "core/sdk-cpp/include/stub.h"
...@@ -245,7 +246,7 @@ class StubImpl : public Stub { ...@@ -245,7 +246,7 @@ class StubImpl : public Stub {
const brpc::ChannelOptions& options); const brpc::ChannelOptions& options);
StubTLS* get_tls() { StubTLS* get_tls() {
return static_cast<StubTLS*>(bthread_getspecific(_bthread_key)); return static_cast<StubTLS*>(THREAD_GETSPECIFIC(_bthread_key));
} }
private: private:
...@@ -262,7 +263,8 @@ class StubImpl : public Stub { ...@@ -262,7 +263,8 @@ class StubImpl : public Stub {
uint32_t _package_size; uint32_t _package_size;
// tls handlers // tls handlers
bthread_key_t _bthread_key; // bthread_key_t _bthread_key;
THREAD_KEY_T _bthread_key;
// bvar variables // bvar variables
std::map<std::string, BvarWrapper*> _ltc_bvars; std::map<std::string, BvarWrapper*> _ltc_bvars;
......
...@@ -70,7 +70,7 @@ int StubImpl<T, C, R, I, O>::initialize(const VariantInfo& var, ...@@ -70,7 +70,7 @@ int StubImpl<T, C, R, I, O>::initialize(const VariantInfo& var,
_endpoint = ep; _endpoint = ep;
if (bthread_key_create(&_bthread_key, NULL) != 0) { if (THREAD_KEY_CREATE(&_bthread_key, NULL) != 0) {
LOG(FATAL) << "Failed create key for stub tls"; LOG(FATAL) << "Failed create key for stub tls";
return -1; return -1;
} }
...@@ -132,13 +132,13 @@ int StubImpl<T, C, R, I, O>::initialize(const VariantInfo& var, ...@@ -132,13 +132,13 @@ int StubImpl<T, C, R, I, O>::initialize(const VariantInfo& var,
template <typename T, typename C, typename R, typename I, typename O> template <typename T, typename C, typename R, typename I, typename O>
int StubImpl<T, C, R, I, O>::thrd_initialize() { int StubImpl<T, C, R, I, O>::thrd_initialize() {
if (bthread_getspecific(_bthread_key) != NULL) { if (THREAD_GETSPECIFIC(_bthread_key) != NULL) {
LOG(WARNING) << "Already thread initialized for stub"; LOG(WARNING) << "Already thread initialized for stub";
return 0; return 0;
} }
StubTLS* tls = new (std::nothrow) StubTLS(); StubTLS* tls = new (std::nothrow) StubTLS();
if (!tls || bthread_setspecific(_bthread_key, tls) != 0) { if (!tls || THREAD_SETSPECIFIC(_bthread_key, tls) != 0) {
LOG(FATAL) << "Failed binding tls data to bthread_key"; LOG(FATAL) << "Failed binding tls data to bthread_key";
return -1; return -1;
} }
......
...@@ -40,10 +40,15 @@ message Request { ...@@ -40,10 +40,15 @@ message Request {
}; };
message Response { message Response {
repeated FetchInst insts = 1; repeated ModelOutput outputs = 1;
repeated int64 profile_time = 2; repeated int64 profile_time = 2;
}; };
message ModelOutput {
repeated FetchInst insts = 1;
optional string engine_name = 2;
}
service GeneralModelService { service GeneralModelService {
rpc inference(Request) returns (Response); rpc inference(Request) returns (Response);
rpc debug(Request) returns (Response); rpc debug(Request) returns (Response);
......
...@@ -12,23 +12,20 @@ Paddle Serving支持基于Paddle进行训练的各种模型,并通过指定模 ...@@ -12,23 +12,20 @@ Paddle Serving支持基于Paddle进行训练的各种模型,并通过指定模
import paddlehub as hub import paddlehub as hub
model_name = "bert_chinese_L-12_H-768_A-12" model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name) module = hub.Module(model_name)
inputs, outputs, program = module.context( inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
trainable=True, max_seq_len=20) feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask", "pooled_output", "sequence_output"]
feed_keys = ["input_ids", "position_ids", "segment_ids",
"input_mask", "pooled_output", "sequence_output"]
fetch_keys = ["pooled_output", "sequence_output"] fetch_keys = ["pooled_output", "sequence_output"]
feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys])) feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
import paddle_serving_client.io as serving_io import paddle_serving_client.io as serving_io
serving_io.save_model("bert_seq20_model", "bert_seq20_client", serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
feed_dict, fetch_dict, program)
``` ```
#### Step2:启动服务 #### Step2:启动服务
``` shell ``` shell
python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 --port 9292 --gpu_ids 0 python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
``` ```
| 参数 | 含义 | | 参数 | 含义 |
...@@ -53,7 +50,6 @@ pip install paddle_serving_app ...@@ -53,7 +50,6 @@ pip install paddle_serving_app
客户端脚本 bert_client.py内容如下 客户端脚本 bert_client.py内容如下
``` python ``` python
import os
import sys import sys
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader from paddle_serving_app import ChineseBertReader
......
...@@ -2,6 +2,8 @@ ...@@ -2,6 +2,8 @@
(简体中文|[English](./DESIGN.md)) (简体中文|[English](./DESIGN.md))
注意:本页内容已经过期,请查看:[设计文档](https://github.com/PaddlePaddle/Serving/blob/develop/doc/DESIGN_DOC_CN.md)
## 1. 项目背景 ## 1. 项目背景
PaddlePaddle是百度开源的机器学习框架,广泛支持各种深度学习模型的定制化开发; Paddle Serving是Paddle的在线预测部分,与Paddle模型训练环节无缝衔接,提供机器学习预测云服务。本文将从模型、服务、接入等层面,自底向上描述Paddle Serving设计方案。 PaddlePaddle是百度开源的机器学习框架,广泛支持各种深度学习模型的定制化开发; Paddle Serving是Paddle的在线预测部分,与Paddle模型训练环节无缝衔接,提供机器学习预测云服务。本文将从模型、服务、接入等层面,自底向上描述Paddle Serving设计方案。
......
...@@ -164,12 +164,26 @@ Distributed Sparse Parameter Indexing is commonly seen in advertising and recomm ...@@ -164,12 +164,26 @@ Distributed Sparse Parameter Indexing is commonly seen in advertising and recomm
<img src='cube_eng.png' width = "450" height = "230"> <img src='cube_eng.png' width = "450" height = "230">
<br> <br>
<p> <p>
Why do we need to support distributed sparse parameter indexing in Paddle Serving? 1) In some recommendation scenarios, the number of features can be up to hundreds of billions that a single node can not hold the parameters within random access memory. 2) Paddle Serving supports distributed sparse parameter indexing that can couple with paddle inference. Users do not need to do extra work to have a low latency inference engine with hundreds of billions of parameters. Why do we need to support distributed sparse parameter indexing in Paddle Serving? 1) In some recommendation scenarios, the number of features can be up to hundreds of billions that a single node can not hold the parameters within random access memory. 2) Paddle Serving supports distributed sparse parameter indexing that can couple with paddle inference. Users do not need to do extra work to have a low latency inference engine with hundreds of billions of parameters.
### 3.2 Model Management, online A/B test, Model Online Reloading
Paddle Serving's C++ engine supports model management, online A/B test and model online reloading. Currently, python API is not released yet, please wait for the next release. ### 3.2 Online A/B test
After sufficient offline evaluation of the model, online A/B test is usually needed to decide whether to enable the service on a large scale. The following figure shows the basic structure of A/B test with Paddle Serving. After the client is configured with the corresponding configuration, the traffic will be automatically distributed to different servers to achieve A/B test. Please refer to [ABTEST in Paddle Serving](ABTEST_IN_PADDLE_SERVING.md) for specific examples.
<p align="center">
<br>
<img src='abtest.png' width = "345" height = "230">
<br>
<p>
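As a rough illustration of the client-side configuration, the sketch below assumes the `add_variant` client API described in the A/B test document, two running server instances, and a 50/50 traffic split; the tags, addresses, weights and config path are placeholders only.

``` python
from paddle_serving_client import Client

client = Client()
client.load_client_config("your_client_conf/serving_client_conf.prototxt")
# Each variant takes a tag, a list of server addresses and a weight;
# traffic is distributed across variants according to the weights.
client.add_variant("model_v1", ["127.0.0.1:9393"], 50)
client.add_variant("model_v2", ["127.0.0.1:9494"], 50)
client.connect()
```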
### 3.3 Model Online Reloading
To keep the service available, models need to be hot-loaded without interrupting the service. Paddle Serving supports this feature and provides a tool that monitors newly produced models and updates the local model accordingly. Please refer to [Hot loading in Paddle Serving](HOT_LOADING_IN_SERVING.md) for specific examples.
### 3.4 Model Management
Paddle Serving's C++ engine supports model management. The corresponding Python API has not been released yet; please wait for the next release.
## 4. User Types ## 4. User Types
Paddle Serving provides RPC and HTTP protocol for users. For HTTP service, we recommend users with median or small traffic services to use, and the latency is not a strict requirement. For RPC protocol, we recommend high traffic services and low latency required services to use. For users who use distributed sparse parameter indexing built-in service, it is not necessary to care about the underlying details of communication. The following figure gives out several scenarios that user may want to use Paddle Serving. Paddle Serving provides RPC and HTTP protocol for users. For HTTP service, we recommend users with median or small traffic services to use, and the latency is not a strict requirement. For RPC protocol, we recommend high traffic services and low latency required services to use. For users who use distributed sparse parameter indexing built-in service, it is not necessary to care about the underlying details of communication. The following figure gives out several scenarios that user may want to use Paddle Serving.
......
...@@ -159,14 +159,30 @@ Paddle Serving的核心执行引擎是一个有向无环图,图中的每个节 ...@@ -159,14 +159,30 @@ Paddle Serving的核心执行引擎是一个有向无环图,图中的每个节
<img src='cube_eng.png' width = "450" height = "230"> <img src='cube_eng.png' width = "450" height = "230">
<br> <br>
<p> <p>
为什么要使用Paddle Serving提供的分布式稀疏参数索引服务?1)在一些推荐场景中,模型的输入特征规模通常可以达到上千亿,单台机器无法支撑T级别模型在内存的保存,因此需要进行分布式存储。2)Paddle Serving提供的分布式稀疏参数索引服务,具有并发请求多个节点的能力,从而以较低的延时完成预估服务。 为什么要使用Paddle Serving提供的分布式稀疏参数索引服务?1)在一些推荐场景中,模型的输入特征规模通常可以达到上千亿,单台机器无法支撑T级别模型在内存的保存,因此需要进行分布式存储。2)Paddle Serving提供的分布式稀疏参数索引服务,具有并发请求多个节点的能力,从而以较低的延时完成预估服务。
### 3.2 模型管理、在线A/B流量测试、模型热加载 ### 3.2 在线A/B流量测试
在对模型进行充分的离线评估后,通常需要进行在线A/B测试,来决定是否大规模上线服务。下图为使用Paddle Serving做A/B测试的基本结构,Client端做好相应的配置后,自动将流量分发给不同的Server,从而完成A/B测试。具体例子请参考[如何使用Paddle Serving做ABTEST](ABTEST_IN_PADDLE_SERVING_CN.md)
<p align="center">
<br>
<img src='abtest.png' width = "345" height = "230">
<br>
<p>
### 3.3 模型热加载
Paddle Serving的C++引擎支持模型管理、在线A/B流量测试、模型热加载等功能,当前在Python API还有没完全开放这部分功能的配置,敬请期待。 为了保证服务的可用性,需要在服务不中断的情况下对模型进行热加载。Paddle Serving对该特性进行了支持,并提供了一个监控产出模型更新本地模型的工具,具体例子请参考[Paddle Serving中的模型热加载](HOT_LOADING_IN_SERVING_CN.md)
### 3.4 模型管理
Paddle Serving的C++引擎支持模型管理功能,当前在Python API还没有完全开放这部分功能的配置,敬请期待。
## 4. 用户类型 ## 4. 用户类型
Paddle Serving面向的用户提供RPC和HTTP两种访问协议。对于HTTP协议,我们更倾向于流量中小型的服务使用,并且对延时没有严格要求的AI服务开发者。对于RPC协议,我们面向流量较大,对延时要求更高的用户,此外RPC的客户端可能也处在一个大系统的服务中,这种情况下非常适合使用Paddle Serving提供的RPC服务。对于使用分布式稀疏参数索引服务而言,Paddle Serving的用户不需要关心底层的细节,其调用本质也是通过RPC服务再调用RPC服务。下图给出了当前设计的Paddle Serving可能会使用Serving服务的几种场景。 Paddle Serving面向的用户提供RPC和HTTP两种访问协议。对于HTTP协议,我们更倾向于流量中小型的服务使用,并且对延时没有严格要求的AI服务开发者。对于RPC协议,我们面向流量较大,对延时要求更高的用户,此外RPC的客户端可能也处在一个大系统的服务中,这种情况下非常适合使用Paddle Serving提供的RPC服务。对于使用分布式稀疏参数索引服务而言,Paddle Serving的用户不需要关心底层的细节,其调用本质也是通过RPC服务再调用RPC服务。下图给出了当前设计的Paddle Serving可能会使用Serving服务的几种场景。
<p align="center"> <p align="center">
......
# How to Convert Paddle Inference Model To Paddle Serving Format
([简体中文](./INFERENCE_TO_SERVING_CN.md)|English)
## Example
``` python
from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir)
```
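After the conversion, `serving_server_dir` can be deployed with the standard launching command used elsewhere in this repository, and `serving_client_dir` holds the client-side configuration. A minimal client sketch, assuming a server has already been started from `serving_server_dir` on port 9292:

``` python
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_dir/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
# The feed_var_names / fetch_var_names returned by inference_model_to_serving
# are the keys to pass to client.predict(feed=..., fetch=...).
```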
# 如何从Paddle保存的预测模型转为Paddle Serving格式可部署的模型
([English](./INFERENCE_TO_SERVING.md)|简体中文)
## 示例
``` python
from paddle_serving_client.io import inference_model_to_serving
inference_model_dir = "your_inference_model"
serving_client_dir = "serving_client_dir"
serving_server_dir = "serving_server_dir"
feed_var_names, fetch_var_names = inference_model_to_serving(
inference_model_dir, serving_client_dir, serving_server_dir)
```
# Model Ensemble in Paddle Serving
([简体中文](MODEL_ENSEMBLE_IN_PADDLE_SERVING_CN.md)|English)
In some scenarios, multiple models that share the same input may be used together and their predictions combined to achieve a better result. Paddle Serving also supports this feature.
Next, we take a text classification task as an example to show model ensemble in Paddle Serving (for now the models are still predicted serially; parallel prediction will be supported as soon as possible).
## Simple example
In this example (see the figure below), the server side runs the BOW and CNN models on the same input within a single service, and the client side fetches the prediction results of the two models and post-processes them to get the final result.
![simple example](model_ensemble_example.png)
Note that at present, a service can only combine multiple models whose input and output formats are identical. In this example, the CNN and BOW models share the same input and output formats.
The code used in the example is saved in the `python/examples/imdb` path:
```shell
.
├── get_data.sh
├── imdb_reader.py
├── test_ensemble_client.py
└── test_ensemble_server.py
```
### Prepare data
Get the pre-trained CNN and BOW models with the following commands (you can also run the `get_data.sh` script):
```shell
wget --no-check-certificate https://fleet.bj.bcebos.com/text_classification_data.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz
tar -zxvf text_classification_data.tar.gz
tar -zxvf imdb_model.tar.gz
```
### Start server
Start the server with the following Python code (you can also run the `test_ensemble_server.py` script):
```python
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
server = Server()
server.set_op_graph(op_graph_maker.get_op_graph())
model_config = {cnn_infer_op: 'imdb_cnn_model', bow_infer_op: 'imdb_bow_model'}
server.load_model_config(model_config)
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
```
Unlike the normal prediction service, here we need a DAG to describe the server-side logic.
When creating an Op, you need to specify its predecessors (in this example, the predecessor of `cnn_infer_op` and `bow_infer_op` is `read_op`, and the predecessors of `response_op` are `cnn_infer_op` and `bow_infer_op`). For an infer Op you also need to define the prediction engine name `engine_name` (you can keep the default value, but setting it explicitly is recommended so that the client side can tell the prediction results apart).
When configuring the model path, you also need to build a model configuration dictionary with each infer Op as the key and the corresponding model path as the value, to tell Serving which model each infer Op uses.
### Start client
Start the client with the following Python code (you can also run the `test_ensemble_client.py` script):
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
client = Client()
# If you have more than one model, make sure that the input
# and output of more than one model are the same.
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
client.connect(["127.0.0.1:9393"])
# you can define any english sentence or dataset here
# This example reuses imdb reader in training, you
# can define your own data preprocessing easily.
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
for i in range(3):
line = 'i am very sad | 0'
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_maps = client.predict(feed=feed, fetch=fetch)
if len(fetch_maps) == 1:
print("step: {}, res: {}".format(i, fetch_maps['prediction'][0][1]))
else:
for model, fetch_map in fetch_maps.items():
print("step: {}, model: {}, res: {}".format(i, model, fetch_map[
'prediction'][0][1]))
```
Compared with the normal prediction service, the client side changes very little. When multiple models are used, the prediction service returns a dictionary keyed by the engine name `engine_name` (defined on the server side), with the corresponding model's prediction result as the value.
### Expected result
```shell
step: 0, model: cnn, res: 0.560272455215
step: 0, model: bow, res: 0.633530199528
step: 1, model: cnn, res: 0.560272455215
step: 1, model: bow, res: 0.633530199528
step: 2, model: cnn, res: 0.560272455215
step: 2, model: bow, res: 0.633530199528
```
# Paddle Serving中的集成预测
(简体中文|[English](MODEL_ENSEMBLE_IN_PADDLE_SERVING.md))
在一些场景中,可能使用多个相同输入的模型并行集成预测以获得更好的预测效果,Paddle Serving提供了这项功能。
下面将以文本分类任务为例,来展示Paddle Serving的集成预测功能(暂时还是串行预测,我们会尽快支持并行化)。
## 集成预测样例
该样例中(见下图),Server端在一项服务中并行预测相同输入的BOW和CNN模型,Client端获取两个模型的预测结果并进行后处理,得到最终的预测结果。
![simple example](model_ensemble_example.png)
需要注意的是,目前只支持在同一个服务中使用多个相同格式输入输出的模型。在该例子中,CNN模型和BOW模型的输入输出格式是相同的。
样例中用到的代码保存在`python/examples/imdb`路径下:
```shell
.
├── get_data.sh
├── imdb_reader.py
├── test_ensemble_client.py
└── test_ensemble_server.py
```
### 数据准备
通过下面命令获取预训练的CNN和BOW模型(您也可以直接运行`get_data.sh`脚本):
```shell
wget --no-check-certificate https://fleet.bj.bcebos.com/text_classification_data.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz
tar -zxvf text_classification_data.tar.gz
tar -zxvf imdb_model.tar.gz
```
### 启动Server
通过下面的Python代码启动Server端(您也可以直接运行`test_ensemble_server.py`脚本):
```python
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
server = Server()
server.set_op_graph(op_graph_maker.get_op_graph())
model_config = {cnn_infer_op: 'imdb_cnn_model', bow_infer_op: 'imdb_bow_model'}
server.load_model_config(model_config)
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
```
与普通预测服务不同的是,这里我们需要用DAG来描述Server端的运行逻辑。
在创建Op的时候需要指定当前Op的前继(在该例子中,`cnn_infer_op``bow_infer_op`的前继均是`read_op``response_op`的前继是`cnn_infer_op``bow_infer_op`),对于预测Op`infer_op`还需要定义预测引擎名称`engine_name`(也可以使用默认值,建议设置该值方便Client端获取预测结果)。
同时在配置模型路径时,需要以预测Op为key,对应的模型路径为value,创建模型配置字典,来告知Serving每个预测Op使用哪个模型。
### 启动Client
通过下面的Python代码运行Client端(您也可以直接运行`test_ensemble_client.py`脚本):
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
client = Client()
# If you have more than one model, make sure that the input
# and output of more than one model are the same.
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
client.connect(["127.0.0.1:9393"])
# you can define any english sentence or dataset here
# This example reuses imdb reader in training, you
# can define your own data preprocessing easily.
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
for i in range(3):
line = 'i am very sad | 0'
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_maps = client.predict(feed=feed, fetch=fetch)
if len(fetch_maps) == 1:
print("step: {}, res: {}".format(i, fetch_maps['prediction'][0][1]))
else:
for model, fetch_map in fetch_maps.items():
print("step: {}, model: {}, res: {}".format(i, model, fetch_map[
'prediction'][0][1]))
```
Client端与普通预测服务没有发生太大的变化。当使用多个模型预测时,预测服务将返回一个key为Server端定义的引擎名称`engine_name`,value为对应的模型预测结果的字典。
### 预期结果
```txt
step: 0, model: cnn, res: 0.560272455215
step: 0, model: bow, res: 0.633530199528
step: 1, model: cnn, res: 0.560272455215
step: 1, model: bow, res: 0.633530199528
step: 2, model: cnn, res: 0.560272455215
step: 2, model: bow, res: 0.633530199528
```
# How to develop a new Web service?
([简体中文](NEW_WEB_SERVICE_CN.md)|English)
This document takes the image classification service based on the ImageNet dataset as an example to introduce how to develop a new web service. The complete code can be found [here](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imagenet/image_classification_service.py).
## WebService base class
Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods. The default implementations are as follows:
```python
class WebService(object):
def preprocess(self, feed={}, fetch=[]):
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
return fetch_map
```
### preprocess
The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
- The value of `feed` is request data `request.json`
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
The return values are the feed and fetch values used in the prediction.
### postprocess
The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
- The value of `feed` is request data `request.json`
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
- The value of `fetch_map` is the model output value.
The return value will be wrapped as `{"result": fetch_map}` and used as the response of the HTTP request.
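As a minimal sketch of a custom `postprocess` (the service class name and the extra key below are hypothetical, not part of the released examples):

```python
from paddle_serving_server.web_service import WebService

class MyService(WebService):
    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        # Attach the requested fetch names next to the raw model outputs;
        # the framework then wraps the returned dict as {"result": fetch_map}.
        fetch_map["requested_fetch"] = fetch
        return fetch_map
```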
## Develop ImageService class
```python
class ImageService(WebService):
def preprocess(self, feed={}, fetch=[]):
reader = ImageReader()
if "image" not in feed:
            raise ValueError("feed data error!")
if isinstance(feed["image"], list):
feed_batch = []
for image in feed["image"]:
sample = base64.b64decode(image)
img = reader.process_image(sample)
res_feed = {}
res_feed["image"] = img.reshape(-1)
feed_batch.append(res_feed)
return feed_batch, fetch
else:
sample = base64.b64decode(feed["image"])
img = reader.process_image(sample)
res_feed = {}
res_feed["image"] = img.reshape(-1)
return res_feed, fetch
```
For the `ImageService` above, only the `preprocess` method is overridden, converting the Base64-encoded image data into the format the model expects for prediction.
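To actually serve the class above, the usual WebService boilerplate applies; the sketch below mirrors the launch code used by the image classification demo in this repository (the model path, workdir and port are taken from the command line, and the service name `image` is assumed from the demo):

```python
import sys

image_service = ImageService(name="image")
image_service.load_model_config(sys.argv[1])
image_service.prepare_server(
    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
image_service.run_server()
```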
# 如何开发一个新的Web Service?
(简体中文|[English](NEW_WEB_SERVICE.md))
本文档将以Imagenet图像分类服务为例,来介绍如何开发一个新的Web Service。您可以在[这里](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imagenet/image_classification_service.py)查阅完整的代码。
## WebService基类
Paddle Serving实现了[WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23)基类,您需要重写它的`preprocess`方法和`postprocess`方法,默认实现如下:
```python
class WebService(object):
def preprocess(self, feed={}, fetch=[]):
return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None):
return fetch_map
```
### preprocess方法
preprocess方法有两个输入参数,`feed``fetch`。对于一个HTTP请求`request`
- `feed`的值为请求数据`request.json`
- `fetch`的值为请求数据中的fetch部分`request.json["fetch"]`
返回值分别是预测过程中用到的feed和fetch值。
### postprocess方法
postprocess方法有三个输入参数,`feed``fetch``fetch_map`
- `feed`的值为请求数据`request.json`
- `fetch`的值为请求数据中的fetch部分`request.json["fetch"]`
- `fetch_map`的值为fetch到的模型输出值
返回值将会被处理成`{"result": fetch_map}`作为HTTP请求的返回。
## 开发ImageService类
```python
class ImageService(WebService):
def preprocess(self, feed={}, fetch=[]):
reader = ImageReader()
if "image" not in feed:
            raise ValueError("feed data error!")
if isinstance(feed["image"], list):
feed_batch = []
for image in feed["image"]:
sample = base64.b64decode(image)
img = reader.process_image(sample)
res_feed = {}
res_feed["image"] = img.reshape(-1)
feed_batch.append(res_feed)
return feed_batch, fetch
else:
sample = base64.b64decode(feed["image"])
img = reader.process_image(sample)
res_feed = {}
res_feed["image"] = img.reshape(-1)
return res_feed, fetch
```
对于上述的`ImageService`,只重写了前处理方法,将base64格式的图片数据处理成模型预测需要的数据格式。
...@@ -14,13 +14,19 @@ Deep neural nets often have some preprocessing steps on input data, and postproc ...@@ -14,13 +14,19 @@ Deep neural nets often have some preprocessing steps on input data, and postproc
## How to define Node ## How to define Node
### Simple series structure
PaddleServing has some predefined Computation Node in the framework. A very commonly used Computation Graph is the simple reader-inference-response mode that can cover most of the single model inference scenarios. A example graph and the corresponding DAG definition code is as follows. PaddleServing has some predefined Computation Node in the framework. A very commonly used Computation Graph is the simple reader-inference-response mode that can cover most of the single model inference scenarios. A example graph and the corresponding DAG definition code is as follows.
<center> <center>
<img src='simple_dag.png' width = "260" height = "370" align="middle"/> <img src='simple_dag.png' width = "260" height = "370" align="middle"/>
</center> </center>
``` python ``` python
import paddle_serving_server as serving import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker() op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader') read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer') general_infer_op = op_maker.create('general_infer')
...@@ -32,18 +38,54 @@ op_seq_maker.add_op(general_infer_op) ...@@ -32,18 +38,54 @@ op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
``` ```
For simple series logic, we simplify it and build it with `OpSeqMaker`. You do not have to specify the predecessor of each node; by default, the predecessor is determined by the order in which nodes are added to `OpSeqMaker`.
Since the code will be commonly used and users do not have to change the code, PaddleServing releases a easy-to-use launching command for service startup. An example is as follows: Since the code will be commonly used and users do not have to change the code, PaddleServing releases a easy-to-use launching command for service startup. An example is as follows:
``` python ``` python
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
``` ```
### Nodes with multiple inputs
An example containing nodes with multiple inputs is given in [MODEL_ENSEMBLE_IN_PADDLE_SERVING](MODEL_ENSEMBLE_IN_PADDLE_SERVING.md). An example graph and the corresponding DAG definition code are as follows.
<center>
<img src='complex_dag.png' width = "480" height = "400" align="middle"/>
</center>
```python
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
```
A graph containing nodes with multiple inputs must be built with `OpGraphMaker`, and the predecessors of every node must be given explicitly.
## More Examples ## More Examples
If a user has sparse features as inputs, and the model will do embedding lookup for each feature, we can do distributed embedding lookup operation which is not in the Paddle training computation graph. An example is as follows: If a user has sparse features as inputs, and the model will do embedding lookup for each feature, we can do distributed embedding lookup operation which is not in the Paddle training computation graph. An example is as follows:
``` python ``` python
import paddle_serving_server as serving import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker() op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader') read_op = op_maker.create('general_reader')
dist_kv_op = op_maker.create('general_dist_kv') dist_kv_op = op_maker.create('general_dist_kv')
......
...@@ -14,6 +14,8 @@ ...@@ -14,6 +14,8 @@
## 如何定义节点 ## 如何定义节点
### 简单的串联结构
PaddleServing在框架中具有一些预定义的计算节点。 一种非常常用的计算图是简单的reader-infer-response模式,可以涵盖大多数单一模型推理方案。 示例图和相应的DAG定义代码如下。 PaddleServing在框架中具有一些预定义的计算节点。 一种非常常用的计算图是简单的reader-infer-response模式,可以涵盖大多数单一模型推理方案。 示例图和相应的DAG定义代码如下。
<center> <center>
<img src='simple_dag.png' width = "260" height = "370" align="middle"/> <img src='simple_dag.png' width = "260" height = "370" align="middle"/>
...@@ -21,6 +23,9 @@ PaddleServing在框架中具有一些预定义的计算节点。 一种非常常 ...@@ -21,6 +23,9 @@ PaddleServing在框架中具有一些预定义的计算节点。 一种非常常
``` python ``` python
import paddle_serving_server as serving import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker() op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader') read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer') general_infer_op = op_maker.create('general_infer')
...@@ -32,18 +37,54 @@ op_seq_maker.add_op(general_infer_op) ...@@ -32,18 +37,54 @@ op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(general_response_op) op_seq_maker.add_op(general_response_op)
``` ```
对于简单的串联逻辑,我们将其简化为`Sequence`,使用`OpSeqMaker`进行构建。用户可以不指定每个节点的前继,默认按加入`OpSeqMaker`的顺序来确定前继。
由于该代码在大多数情况下都会被使用,并且用户不必更改代码,因此PaddleServing会发布一个易于使用的启动命令来启动服务。 示例如下: 由于该代码在大多数情况下都会被使用,并且用户不必更改代码,因此PaddleServing会发布一个易于使用的启动命令来启动服务。 示例如下:
``` python ``` python
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
``` ```
### 包含多个输入的节点
[Paddle Serving中的集成预测](MODEL_ENSEMBLE_IN_PADDLE_SERVING_CN.md)文档中给出了一个包含多个输入节点的样例,示意图和代码如下。
<center>
<img src='complex_dag.png' width = "480" height = "400" align="middle"/>
</center>
```python
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
```
对于含有多输入节点的计算图,需要使用`OpGraphMaker`来构建,同时必须给出每个节点的前继。
## 更多示例 ## 更多示例
如果用户将稀疏特征作为输入,并且模型将对每个特征进行嵌入查找,则我们可以进行分布式嵌入查找操作,该操作不在Paddle训练计算图中。 示例如下: 如果用户将稀疏特征作为输入,并且模型将对每个特征进行嵌入查找,则我们可以进行分布式嵌入查找操作,该操作不在Paddle训练计算图中。 示例如下:
``` python ``` python
import paddle_serving_server as serving import paddle_serving_server as serving
from paddle_serving_server import OpMaker
from paddle_serving_server import OpSeqMaker
op_maker = serving.OpMaker() op_maker = serving.OpMaker()
read_op = op_maker.create('general_reader') read_op = op_maker.create('general_reader')
dist_kv_op = op_maker.create('general_dist_kv') dist_kv_op = op_maker.create('general_dist_kv')
......
doc/abtest.png (binary image changed: 291.5 KB → 295.1 KB)
...@@ -69,7 +69,7 @@ set environmental variable to specify which gpus are used, the command above mea ...@@ -69,7 +69,7 @@ set environmental variable to specify which gpus are used, the command above mea
### HTTP Inference ### HTTP Inference
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
``` ```
### Benchmark ### Benchmark
......
...@@ -65,7 +65,7 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien ...@@ -65,7 +65,7 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien
### 执行预测 ### 执行预测
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
``` ```
### Benchmark ### Benchmark
......
...@@ -36,3 +36,4 @@ bert_service.set_gpus(gpu_ids) ...@@ -36,3 +36,4 @@ bert_service.set_gpus(gpu_ids)
bert_service.prepare_server( bert_service.prepare_server(
workdir="workdir", port=int(sys.argv[2]), device="gpu") workdir="workdir", port=int(sys.argv[2]), device="gpu")
bert_service.run_server() bert_service.run_server()
bert_service.run_flask()
# Faster RCNN model on Paddle Serving
([简体中文](./README_CN.md)|English)
### Get The Faster RCNN Model
```
wget https://paddle-serving.bj.bcebos.com/pddet_demo/faster_rcnn_model.tar.gz
wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml
```
If you want more detection models, please refer to the [Paddle Detection Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
### Start the service
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet* .
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
```
### Perform prediction
```
python test_client.py pddet_client_conf/serving_client_conf.prototxt infer_cfg.yml 000000570688.jpg
```
### Result analysis
<p align="center">
<br>
<img src='000000570688.jpg' >
<br>
<p>
This is the input picture.
<p align="center">
<br>
<img src='000000570688_bbox.jpg' >
<br>
<p>
This is the picture with the bounding boxes added. You can see that the client has done the post-processing for the picture. In addition, `output/bbox.json` contains the id and coordinates of each box.
# 使用Paddle Serving部署Faster RCNN模型
(简体中文|[English](./README.md))
### 获得Faster RCNN模型
```
wget https://paddle-serving.bj.bcebos.com/pddet_demo/faster_rcnn_model.tar.gz
wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml
```
如果你想要更多的检测模型,请参考[Paddle检测模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
### 启动服务
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet* .
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
```
### 执行预测
```
python test_client.py pddet_client_conf/serving_client_conf.prototxt infer_cfg.yml 000000570688.jpg
```
### 结果分析
<p align="center">
<br>
<img src='000000570688.jpg' >
<br>
<p>
这是输入图片
<p align="center">
<br>
<img src='000000570688_bbox.jpg' >
<br>
<p>
这是添加了bbox之后的图片,可以看到客户端已经为图片做好了后处理,此外在output/bbox.json中也有各个框的编号和坐标信息。
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import sys
import os
import time
from paddle_serving_app.reader.pddet import Detection
import numpy as np
py_version = sys.version_info[0]
feed_var_names = ['image', 'im_shape', 'im_info']
fetch_var_names = ['multiclass_nms']
pddet = Detection(config_path=sys.argv[2], output_dir="./output")
feed_dict = pddet.preprocess(feed_var_names, sys.argv[3])
client = Client()
client.load_client_config(sys.argv[1])
client.connect(['127.0.0.1:9494'])
fetch_map = client.predict(feed=feed_dict, fetch=fetch_var_names)
outs = fetch_map.values()
pddet.postprocess(fetch_map, fetch_var_names)
...@@ -46,5 +46,5 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po ...@@ -46,5 +46,5 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
### Client prediction ### Client prediction
``` shell ``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
``` ```
...@@ -47,5 +47,5 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po ...@@ -47,5 +47,5 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
### 客户端预测 ### 客户端预测
``` shell ``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
``` ```
...@@ -17,7 +17,7 @@ import sys ...@@ -17,7 +17,7 @@ import sys
import cv2 import cv2
import base64 import base64
import numpy as np import numpy as np
from image_reader import ImageReader from paddle_serving_app import ImageReader
class ImageService(WebService): class ImageService(WebService):
...@@ -31,14 +31,14 @@ class ImageService(WebService): ...@@ -31,14 +31,14 @@ class ImageService(WebService):
sample = base64.b64decode(image) sample = base64.b64decode(image)
img = reader.process_image(sample) img = reader.process_image(sample)
res_feed = {} res_feed = {}
res_feed["image"] = img.reshape(-1) res_feed["image"] = img
feed_batch.append(res_feed) feed_batch.append(res_feed)
return feed_batch, fetch return feed_batch, fetch
else: else:
sample = base64.b64decode(feed["image"]) sample = base64.b64decode(feed["image"])
img = reader.process_image(sample) img = reader.process_image(sample)
res_feed = {} res_feed = {}
res_feed["image"] = img.reshape(-1) res_feed["image"] = img
return res_feed, fetch return res_feed, fetch
...@@ -47,3 +47,4 @@ image_service.load_model_config(sys.argv[1]) ...@@ -47,3 +47,4 @@ image_service.load_model_config(sys.argv[1])
image_service.prepare_server( image_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu") workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
image_service.run_server() image_service.run_server()
image_service.run_flask()
...@@ -12,12 +12,12 @@ ...@@ -12,12 +12,12 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from paddle_serving_server_gpu.web_service import WebService
import sys import sys
import cv2 import cv2
import base64 import base64
import numpy as np import numpy as np
from image_reader import ImageReader from paddle_serving_app import ImageReader
from paddle_serving_server_gpu.web_service import WebService
class ImageService(WebService): class ImageService(WebService):
...@@ -32,14 +32,14 @@ class ImageService(WebService): ...@@ -32,14 +32,14 @@ class ImageService(WebService):
sample = base64.b64decode(image) sample = base64.b64decode(image)
img = reader.process_image(sample) img = reader.process_image(sample)
res_feed = {} res_feed = {}
res_feed["image"] = img.reshape(-1) res_feed["image"] = img
feed_batch.append(res_feed) feed_batch.append(res_feed)
return feed_batch, fetch return feed_batch, fetch
else: else:
sample = base64.b64decode(feed["image"]) sample = base64.b64decode(feed["image"])
img = reader.process_image(sample) img = reader.process_image(sample)
res_feed = {} res_feed = {}
res_feed["image"] = img.reshape(-1) res_feed["image"] = img
return res_feed, fetch return res_feed, fetch
...@@ -49,3 +49,4 @@ image_service.set_gpus("0,1") ...@@ -49,3 +49,4 @@ image_service.set_gpus("0,1")
image_service.prepare_server( image_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="gpu") workdir=sys.argv[2], port=int(sys.argv[3]), device="gpu")
image_service.run_server() image_service.run_server()
image_service.run_flask()
...@@ -27,11 +27,11 @@ def predict(image_path, server): ...@@ -27,11 +27,11 @@ def predict(image_path, server):
image = base64.b64encode(open(image_path).read()) image = base64.b64encode(open(image_path).read())
else: else:
image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8") image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
req = json.dumps({"image": image, "fetch": ["score"]}) req = json.dumps({"feed": [{"image": image}], "fetch": ["score"]})
r = requests.post( r = requests.post(
server, data=req, headers={"Content-Type": "application/json"}) server, data=req, headers={"Content-Type": "application/json"})
try: try:
print(r.json()["score"][0]) print(r.json()["result"]["score"])
except ValueError: except ValueError:
print(r.text) print(r.text)
return r return r
......
...@@ -26,7 +26,7 @@ start = time.time() ...@@ -26,7 +26,7 @@ start = time.time()
for i in range(1000): for i in range(1000):
with open("./data/n01440764_10026.JPEG", "rb") as f: with open("./data/n01440764_10026.JPEG", "rb") as f:
img = f.read() img = f.read()
img = reader.process_image(img).reshape(-1) img = reader.process_image(img)
fetch_map = client.predict(feed={"image": img}, fetch=["score"]) fetch_map = client.predict(feed={"image": img}, fetch=["score"])
end = time.time() end = time.time()
print(end - start) print(end - start)
......
...@@ -28,7 +28,7 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab ...@@ -28,7 +28,7 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
### HTTP Infer ### HTTP Infer
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
``` ```
### Benchmark ### Benchmark
......
...@@ -27,7 +27,7 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab ...@@ -27,7 +27,7 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
### 执行预测 ### 执行预测
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
``` ```
### Benchmark ### Benchmark
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
client = Client()
# If you have more than one model, make sure that the input
# and output of more than one model are the same.
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
client.connect(["127.0.0.1:9393"])
# you can define any english sentence or dataset here
# This example reuses imdb reader in training, you
# can define your own data preprocessing easily.
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
for i in range(3):
line = 'i am very sad | 0'
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_maps = client.predict(feed=feed, fetch=fetch)
if len(fetch_maps) == 1:
print("step: {}, res: {}".format(i, fetch_maps['prediction'][0][1]))
else:
for model, fetch_map in fetch_maps.items():
print("step: {}, model: {}, res: {}".format(i, model, fetch_map[
'prediction'][0][1]))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server import OpMaker
from paddle_serving_server import OpGraphMaker
from paddle_serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
cnn_infer_op = op_maker.create(
'general_infer', engine_name='cnn', inputs=[read_op])
bow_infer_op = op_maker.create(
'general_infer', engine_name='bow', inputs=[read_op])
response_op = op_maker.create(
'general_response', inputs=[cnn_infer_op, bow_infer_op])
op_graph_maker = OpGraphMaker()
op_graph_maker.add_op(read_op)
op_graph_maker.add_op(cnn_infer_op)
op_graph_maker.add_op(bow_infer_op)
op_graph_maker.add_op(response_op)
server = Server()
server.set_op_graph(op_graph_maker.get_op_graph())
model_config = {cnn_infer_op: 'imdb_cnn_model', bow_infer_op: 'imdb_bow_model'}
server.load_model_config(model_config)
server.prepare_server(workdir="work_dir1", port=9393, device="cpu")
server.run_server()
...@@ -39,3 +39,4 @@ imdb_service.prepare_server( ...@@ -39,3 +39,4 @@ imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu") workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]}) imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server() imdb_service.run_server()
imdb_service.run_flask()
...@@ -28,5 +28,5 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292 ...@@ -28,5 +28,5 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292
### HTTP Infer ### HTTP Infer
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
``` ```
...@@ -28,5 +28,5 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292 ...@@ -28,5 +28,5 @@ python lac_web_service.py jieba_server_model/ lac_workdir 9292
### 执行HTTP预测 ### 执行HTTP预测
``` ```
curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
``` ```
...@@ -101,7 +101,7 @@ class LACReader(object): ...@@ -101,7 +101,7 @@ class LACReader(object):
return word_ids return word_ids
def parse_result(self, words, crf_decode): def parse_result(self, words, crf_decode):
tags = [self.id2label_dict[str(x)] for x in crf_decode] tags = [self.id2label_dict[str(x[0])] for x in crf_decode]
sent_out = [] sent_out = []
tags_out = [] tags_out = []
......
# Chinese sentence sentiment classification
([简体中文](./README_CN.md)|English)
## Get model files and sample data
```
sh get_data.sh
```
## Start http service
```
python senta_web_service.py senta_bilstm_model/ workdir 9292
```
In the Chinese sentiment classification task, Chinese word segmentation is first performed by the [LAC task](../lac). Set the LAC model path with ```lac_model_path``` and the dictionary path with ```lac_dict_path```.
In this demo, the LAC task serves as the preprocessing step of the sentiment classification HTTP prediction service. The LAC prediction service is deployed on the CPU and the sentiment classification task on the GPU; both can be changed according to your actual setup.
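For reference, the demo script `senta_web_service.py` configures these paths with the following call (an excerpt from the script):

``` python
senta_service.set_config(
    lac_model_path="./lac_model",
    lac_dict_path="./lac_dict",
    senta_dict_path="./vocab.txt")
```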
## Client prediction
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
```
# 中文语句情感分类
(简体中文|[English](./README.md))
## 获取模型文件和样例数据
```
sh get_data.sh
```
## 启动HTTP服务
```
python senta_web_service.py senta_bilstm_model/ workdir 9292
```
中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词,在脚本中通过```lac_model_path```参数配置LAC任务的模型文件路径,```lac_dict_path```参数配置LAC任务词典路径。
示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分,LAC预测服务部署在CPU上,情感分类任务部署在GPU上,可以根据实际情况进行更改。
## 客户端预测
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
```
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/senta_bilstm.tar.gz --no-check-certificate
tar -xzvf senta_bilstm.tar.gz
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac_model.tar.gz --no-check-certificate
tar -xzvf lac_model.tar.gz
wget https://paddle-serving.bj.bcebos.com/reader/lac/lac_dict.tar.gz --no-check-certificate
tar -xzvf lac_dict.tar.gz
wget https://paddle-serving.bj.bcebos.com/reader/senta/vocab.txt --no-check-certificate
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server_gpu.web_service import WebService
from paddle_serving_client import Client
from paddle_serving_app import LACReader, SentaReader
import numpy as np
import os
import io
import sys
import subprocess
from multiprocessing import Process, Queue
class SentaService(WebService):
def set_config(
self,
lac_model_path,
lac_dict_path,
senta_dict_path, ):
self.lac_model_path = lac_model_path
self.lac_client_config_path = lac_model_path + "/serving_server_conf.prototxt"
self.lac_dict_path = lac_dict_path
self.senta_dict_path = senta_dict_path
self.show = False
def show_detail(self, show=False):
self.show = show
def start_lac_service(self):
os.chdir('./lac_serving')
self.lac_port = self.port + 100
r = os.popen(
"python -m paddle_serving_server.serve --model {} --port {} &".
format("../" + self.lac_model_path, self.lac_port))
os.chdir('..')
def init_lac_service(self):
ps = Process(target=self.start_lac_service())
ps.start()
#self.init_lac_client()
def lac_predict(self, feed_data):
self.init_lac_client()
lac_result = self.lac_client.predict(
feed={"words": feed_data}, fetch=["crf_decode"])
self.lac_client.release()
return lac_result
def init_lac_client(self):
self.lac_client = Client()
self.lac_client.load_client_config(self.lac_client_config_path)
self.lac_client.connect(["127.0.0.1:{}".format(self.lac_port)])
def init_lac_reader(self):
self.lac_reader = LACReader(self.lac_dict_path)
def init_senta_reader(self):
self.senta_reader = SentaReader(vocab_path=self.senta_dict_path)
def preprocess(self, feed=[], fetch=[]):
feed_data = self.lac_reader.process(feed[0]["words"])
if self.show:
print("---- lac reader ----")
print(feed_data)
lac_result = self.lac_predict(feed_data)
if self.show:
print("---- lac out ----")
print(lac_result)
segs = self.lac_reader.parse_result(feed[0]["words"],
lac_result["crf_decode"])
if self.show:
print("---- lac parse ----")
print(segs)
feed_data = self.senta_reader.process(segs)
if self.show:
print("---- senta reader ----")
print("feed_data", feed_data)
return {"words": feed_data}, fetch
senta_service = SentaService(name="senta")
#senta_service.show_detail(True)
senta_service.set_config(
lac_model_path="./lac_model",
lac_dict_path="./lac_dict",
senta_dict_path="./vocab.txt")
senta_service.load_model_config(sys.argv[1])
senta_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
senta_service.init_lac_reader()
senta_service.init_senta_reader()
senta_service.init_lac_service()
senta_service.run_server()
senta_service.run_flask()
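# A minimal launch sketch (paths follow get_data.sh; 9292 is the HTTP port the web service listens on):
#   python senta_web_service.py senta_bilstm_model/ workdir 9292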
...@@ -12,3 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .reader.chinese_bert_reader import ChineseBertReader
from .reader.image_reader import ImageReader
from .reader.lac_reader import LACReader
from .reader.senta_reader import SentaReader
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
from collections import OrderedDict
class ServingModels(object):
def __init__(self):
self.model_dict = OrderedDict()
#senta
for key in [
"senta_bilstm", "senta_bow", "senta_cnn", "senta_gru",
"senta_lstm"
]:
self.model_dict[
key] = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/" + key + ".tar.gz"
#image classification
for key in [
"alexnet_imagenet",
"darknet53-imagenet",
"densenet121_imagenet",
"densenet161_imagenet",
"densenet169_imagenet",
"densenet201_imagenet",
"densenet264_imagenet"
"dpn107_imagenet",
"dpn131_imagenet",
"dpn68_imagenet",
"dpn92_imagenet",
"dpn98_imagenet",
"efficientnetb0_imagenet",
"efficientnetb1_imagenet",
"efficientnetb2_imagenet",
"efficientnetb3_imagenet",
"efficientnetb4_imagenet",
"efficientnetb5_imagenet",
"efficientnetb6_imagenet",
"googlenet_imagenet",
"inception_v4_imagenet",
"inception_v2_imagenet",
"nasnet_imagenet",
"pnasnet_imagenet",
"resnet_v2_101_imagenet",
"resnet_v2_151_imagenet",
"resnet_v2_18_imagenet",
"resnet_v2_34_imagenet",
" resnet_v2_50_imagenet",
"resnext101_32x16d_wsl",
"resnext101_32x32d_wsl",
"resnext101_32x48d_wsl",
"resnext101_32x8d_wsl",
"resnext101_32x4d_imagenet",
"resnext101_64x4d_imagenet",
"resnext101_vd_32x4d_imagenet",
"resnext101_vd_64x4d_imagenet",
"resnext152_64x4d_imagenet",
"resnext152_vd_64x4d_imagenet",
"resnext50_64x4d_imagenet",
"resnext50_vd_32x4d_imagenet",
"resnext50_vd_64x4d_imagenet",
"se_resnext101_32x4d_imagenet",
"se_resnext50_32x4d_imagenet",
"shufflenet_v2_imagenet",
"vgg11_imagenet",
"vgg13_imagenet",
"vgg16_imagenet",
"vgg19_imagenet",
"xception65_imagenet",
"xception71_imagenet",
]:
self.model_dict[
key] = "https://paddle-serving.bj.bcebos.com/paddle_hub_models/image/ImageClassification/" + key + ".tar.gz"
def get_model_list(self):
return (self.model_dict.keys())
def download(self, model_name):
if model_name in self.model_dict:
url = self.model_dict[model_name]
r = os.system('wget ' + url + ' --no-check-certificate')
if __name__ == "__main__":
models = ServingModels()
print(models.get_model_list())
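# A minimal usage sketch (assumes network access to the BOS bucket above):
#   models = ServingModels()
#   models.download("senta_bilstm")  # runs wget on the mapped URL and leaves senta_bilstm.tar.gz in the cwd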
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
class ImageReader():
def __init__(self,
image_shape=[3, 224, 224],
image_mean=[0.485, 0.456, 0.406],
image_std=[0.229, 0.224, 0.225],
resize_short_size=256,
interpolation=None,
crop_center=True):
self.image_mean = image_mean
self.image_std = image_std
self.image_shape = image_shape
self.resize_short_size = resize_short_size
self.interpolation = interpolation
self.crop_center = crop_center
def resize_short(self, img, target_size, interpolation=None):
"""resize image
Args:
img: image data
target_size: resize short target size
interpolation: interpolation mode
Returns:
resized image data
"""
percent = float(target_size) / min(img.shape[0], img.shape[1])
resized_width = int(round(img.shape[1] * percent))
resized_height = int(round(img.shape[0] * percent))
if interpolation:
resized = cv2.resize(
img, (resized_width, resized_height),
interpolation=interpolation)
else:
resized = cv2.resize(img, (resized_width, resized_height))
return resized
def crop_image(self, img, target_size, center):
"""crop image
Args:
img: images data
target_size: crop target size
center: crop mode
Returns:
img: cropped image data
"""
height, width = img.shape[:2]
size = target_size
if center == True:
w_start = (width - size) // 2
h_start = (height - size) // 2
else:
w_start = np.random.randint(0, width - size + 1)
h_start = np.random.randint(0, height - size + 1)
w_end = w_start + size
h_end = h_start + size
img = img[h_start:h_end, w_start:w_end, :]
return img
def process_image(self, sample):
""" process_image """
mean = self.image_mean
std = self.image_std
crop_size = self.image_shape[1]
data = np.fromstring(sample, np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)
if img is None:
print("img is None, pass it.")
return None
if crop_size > 0:
target_size = self.resize_short_size
img = self.resize_short(
img, target_size, interpolation=self.interpolation)
img = self.crop_image(
img, target_size=crop_size, center=self.crop_center)
img = img[:, :, ::-1]
img = img.astype('float32').transpose((2, 0, 1)) / 255
img_mean = np.array(mean).reshape((3, 1, 1))
img_std = np.array(std).reshape((3, 1, 1))
img -= img_mean
img /= img_std
return img
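# A minimal usage sketch (the file name is hypothetical; process_image expects raw image bytes):
#   reader = ImageReader()
#   with open("daisy.jpg", "rb") as f:
#       img = reader.process_image(f.read())  # normalized float32 CHW array of shape (3, 224, 224)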
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import os
import io
def load_kv_dict(dict_path,
reverse=False,
delimiter="\t",
key_func=None,
value_func=None):
result_dict = {}
for line in io.open(dict_path, "r", encoding="utf8"):
terms = line.strip("\n").split(delimiter)
if len(terms) != 2:
continue
if reverse:
value, key = terms
else:
key, value = terms
if key in result_dict:
raise KeyError("key duplicated with [%s]" % (key))
if key_func:
key = key_func(key)
if value_func:
value = value_func(value)
result_dict[key] = value
return result_dict
class LACReader(object):
"""data reader"""
def __init__(self, dict_folder):
# read dict
#basepath = os.path.abspath(__file__)
#folder = os.path.dirname(basepath)
word_dict_path = os.path.join(dict_folder, "word.dic")
label_dict_path = os.path.join(dict_folder, "tag.dic")
replace_dict_path = os.path.join(dict_folder, "q2b.dic")
self.word2id_dict = load_kv_dict(
word_dict_path, reverse=True, value_func=int)
self.id2word_dict = load_kv_dict(word_dict_path)
self.label2id_dict = load_kv_dict(
label_dict_path, reverse=True, value_func=int)
self.id2label_dict = load_kv_dict(label_dict_path)
self.word_replace_dict = load_kv_dict(replace_dict_path)
@property
def vocab_size(self):
"""vocabulary size"""
return max(self.word2id_dict.values()) + 1
@property
def num_labels(self):
"""num_labels"""
return max(self.label2id_dict.values()) + 1
def word_to_ids(self, words):
"""convert word to word index"""
word_ids = []
idx = 0
try:
words = unicode(words, 'utf-8')
except:
pass
for word in words:
word = self.word_replace_dict.get(word, word)
if word not in self.word2id_dict:
word = "OOV"
word_id = self.word2id_dict[word]
word_ids.append(word_id)
return word_ids
def label_to_ids(self, labels):
"""convert label to label index"""
label_ids = []
for label in labels:
if label not in self.label2id_dict:
label = "O"
label_id = self.label2id_dict[label]
label_ids.append(label_id)
return label_ids
def process(self, sent):
words = sent.strip()
word_ids = self.word_to_ids(words)
return word_ids
def parse_result(self, words, crf_decode):
tags = [self.id2label_dict[str(x[0])] for x in crf_decode]
sent_out = []
tags_out = []
partial_word = ""
for ind, tag in enumerate(tags):
if partial_word == "":
partial_word = words[ind]
tags_out.append(tag.split('-')[0])
continue
if tag.endswith("-B") or (tag == "O" and tag[ind - 1] != "O"):
sent_out.append(partial_word)
tags_out.append(tag.split('-')[0])
partial_word = words[ind]
continue
partial_word += words[ind]
if len(sent_out) < len(tags_out):
sent_out.append(partial_word)
return sent_out
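# A minimal usage sketch (lac_dict is the dictionary folder from get_data.sh; crf_decode would be
# the "crf_decode" tensor fetched from the LAC serving endpoint):
#   reader = LACReader("lac_dict")
#   word_ids = reader.process("我爱北京天安门")               # word ids fed to the LAC model
#   segs = reader.parse_result("我爱北京天安门", crf_decode)  # list of segmented words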
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
from .image_tool import Resize, Detection
(This diff has been collapsed.)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import io
class SentaReader():
def __init__(self, vocab_path, max_seq_len=20):
self.max_seq_len = max_seq_len
self.word_dict = self.load_vocab(vocab_path)
def load_vocab(self, vocab_path):
"""
load the given vocabulary
"""
vocab = {}
with io.open(vocab_path, 'r', encoding='utf8') as f:
for line in f:
if line.strip() not in vocab:
data = line.strip().split("\t")
if len(data) < 2:
word = ""
wid = data[0]
else:
word = data[0]
wid = data[1]
vocab[word] = int(wid)
vocab["<unk>"] = len(vocab)
return vocab
def process(self, cols):
unk_id = len(self.word_dict)
pad_id = 0
wids = [
self.word_dict[x] if x in self.word_dict else unk_id for x in cols
]
'''
seq_len = len(wids)
if seq_len < self.max_seq_len:
for i in range(self.max_seq_len - seq_len):
wids.append(pad_id)
else:
wids = wids[:self.max_seq_len]
seq_len = self.max_seq_len
'''
return wids
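# A minimal usage sketch (vocab.txt is the Senta vocabulary from get_data.sh; the input is the list
# of segmented words produced by LACReader.parse_result):
#   reader = SentaReader(vocab_path="vocab.txt")
#   wids = reader.process(["天气", "不错"])  # vocabulary ids, with OOV words mapped to an unknown-word id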
...@@ -26,6 +26,34 @@ int_type = 0
float_type = 1
class _NOPProfiler(object):
def record(self, name):
pass
def print_profile(self):
pass
class _TimeProfiler(object):
def __init__(self):
self.pid = os.getpid()
self.print_head = 'PROFILE\tpid:{}\t'.format(self.pid)
self.time_record = [self.print_head]
def record(self, name):
self.time_record.append('{}:{} '.format(
name, int(round(time.time() * 1000000))))
def print_profile(self):
self.time_record.append('\n')
sys.stderr.write(''.join(self.time_record))
self.time_record = [self.print_head]
_is_profile = int(os.environ.get('FLAGS_profile_client', 0))
_Profiler = _TimeProfiler if _is_profile else _NOPProfiler
class SDKConfig(object): class SDKConfig(object):
def __init__(self): def __init__(self):
self.sdk_desc = sdk.SDKConf() self.sdk_desc = sdk.SDKConf()
...@@ -89,6 +117,7 @@ class Client(object): ...@@ -89,6 +117,7 @@ class Client(object):
self.predictor_sdk_ = None self.predictor_sdk_ = None
self.producers = [] self.producers = []
self.consumer = None self.consumer = None
self.profile_ = _Profiler()
def rpath(self): def rpath(self):
lib_path = os.path.dirname(paddle_serving_client.__file__) lib_path = os.path.dirname(paddle_serving_client.__file__)
...@@ -184,6 +213,8 @@ class Client(object): ...@@ -184,6 +213,8 @@ class Client(object):
key)) key))
def predict(self, feed=None, fetch=None, need_variant_tag=False): def predict(self, feed=None, fetch=None, need_variant_tag=False):
self.profile_.record('py_prepro_0')
if feed is None or fetch is None: if feed is None or fetch is None:
raise ValueError("You should specify feed and fetch for prediction") raise ValueError("You should specify feed and fetch for prediction")
...@@ -256,36 +287,62 @@ class Client(object): ...@@ -256,36 +287,62 @@ class Client(object):
int_slot_batch.append(int_slot) int_slot_batch.append(int_slot)
float_slot_batch.append(float_slot) float_slot_batch.append(float_slot)
self.profile_.record('py_prepro_1')
self.profile_.record('py_client_infer_0')
result_batch = self.result_handle_ result_batch = self.result_handle_
res = self.client_handle_.batch_predict( res = self.client_handle_.batch_predict(
float_slot_batch, float_feed_names, float_shape, int_slot_batch, float_slot_batch, float_feed_names, float_shape, int_slot_batch,
int_feed_names, int_shape, fetch_names, result_batch, self.pid) int_feed_names, int_shape, fetch_names, result_batch, self.pid)
self.profile_.record('py_client_infer_1')
self.profile_.record('py_postpro_0')
if res == -1: if res == -1:
return None return None
result_map_batch = [] multi_result_map = []
result_map = {} model_engine_names = result_batch.get_engine_names()
# result map needs to be a numpy array for mi, engine_name in enumerate(model_engine_names):
for i, name in enumerate(fetch_names): result_map = {}
if self.fetch_names_to_type_[name] == int_type: # result map needs to be a numpy array
result_map[name] = result_batch.get_int64_by_name(name) for i, name in enumerate(fetch_names):
shape = result_batch.get_shape(name) if self.fetch_names_to_type_[name] == int_type:
result_map[name] = np.array(result_map[name]) result_map[name] = result_batch.get_int64_by_name(mi, name)
result_map[name].shape = shape shape = result_batch.get_shape(mi, name)
if name in self.lod_tensor_set: result_map[name] = np.array(result_map[name], dtype='int64')
result_map["{}.lod".format(name)] = result_batch.get_lod( result_map[name].shape = shape
name) if name in self.lod_tensor_set:
elif self.fetch_names_to_type_[name] == float_type: result_map["{}.lod".format(name)] = np.array(
result_map[name] = result_batch.get_float_by_name(name) result_batch.get_lod(mi, name))
shape = result_batch.get_shape(name) elif self.fetch_names_to_type_[name] == float_type:
result_map[name] = np.array(result_map[name]) result_map[name] = result_batch.get_float_by_name(mi, name)
result_map[name].shape = shape shape = result_batch.get_shape(mi, name)
if name in self.lod_tensor_set: result_map[name] = np.array(
result_map["{}.lod".format(name)] = result_batch.get_lod( result_map[name], dtype='float32')
name) result_map[name].shape = shape
if name in self.lod_tensor_set:
return result_map result_map["{}.lod".format(name)] = np.array(
result_batch.get_lod(mi, name))
multi_result_map.append(result_map)
ret = None
if len(model_engine_names) == 1:
# If only one model result is returned, the format of ret is result_map
ret = multi_result_map[0]
else:
# If multiple model results are returned, the format of ret is {name: result_map}
ret = {
engine_name: multi_result_map[mi]
for mi, engine_name in enumerate(model_engine_names)
}
self.profile_.record('py_postpro_1')
self.profile_.print_profile()
# When using the A/B test, the tag of variant needs to be returned
return ret if not need_variant_tag else [
ret, self.result_handle_.variant_tag()
]
def release(self): def release(self):
self.client_handle_.destroy_predictor() self.client_handle_.destroy_predictor()
......
...@@ -20,6 +20,7 @@ from paddle.fluid.framework import default_main_program ...@@ -20,6 +20,7 @@ from paddle.fluid.framework import default_main_program
from paddle.fluid.framework import Program from paddle.fluid.framework import Program
from paddle.fluid import CPUPlace from paddle.fluid import CPUPlace
from paddle.fluid.io import save_inference_model from paddle.fluid.io import save_inference_model
import paddle.fluid as fluid
from ..proto import general_model_config_pb2 as model_conf from ..proto import general_model_config_pb2 as model_conf
import os import os
...@@ -100,3 +101,20 @@ def save_model(server_model_folder, ...@@ -100,3 +101,20 @@ def save_model(server_model_folder,
with open("{}/serving_server_conf.stream.prototxt".format( with open("{}/serving_server_conf.stream.prototxt".format(
server_model_folder), "wb") as fout: server_model_folder), "wb") as fout:
fout.write(config.SerializeToString()) fout.write(config.SerializeToString())
def inference_model_to_serving(infer_model, serving_client, serving_server):
place = fluid.CPUPlace()
exe = fluid.Executor(place)
inference_program, feed_target_names, fetch_targets = \
fluid.io.load_inference_model(dirname=infer_model, executor=exe)
feed_dict = {
x: inference_program.global_block().var(x)
for x in feed_target_names
}
fetch_dict = {x.name: x for x in fetch_targets}
save_model(serving_client, serving_server, feed_dict, fetch_dict,
inference_program)
feed_names = feed_dict.keys()
fetch_names = fetch_dict.keys()
return feed_names, fetch_names
...@@ -11,6 +11,7 @@ ...@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# pylint: disable=doc-string-missing
import os import os
from .proto import server_configure_pb2 as server_sdk from .proto import server_configure_pb2 as server_sdk
...@@ -21,6 +22,7 @@ import socket ...@@ -21,6 +22,7 @@ import socket
import paddle_serving_server as paddle_serving_server import paddle_serving_server as paddle_serving_server
from .version import serving_server_version from .version import serving_server_version
from contextlib import closing from contextlib import closing
import collections
class OpMaker(object): class OpMaker(object):
...@@ -36,17 +38,35 @@ class OpMaker(object): ...@@ -36,17 +38,35 @@ class OpMaker(object):
"general_dist_kv_quant_infer": "GeneralDistKVQuantInferOp", "general_dist_kv_quant_infer": "GeneralDistKVQuantInferOp",
"general_copy": "GeneralCopyOp" "general_copy": "GeneralCopyOp"
} }
self.node_name_suffix_ = collections.defaultdict(int)
# currently, inputs and outputs are not used def create(self, node_type, engine_name=None, inputs=[], outputs=[]):
# when we have OpGraphMaker, inputs and outputs are necessary if node_type not in self.op_dict:
def create(self, name, inputs=[], outputs=[]): raise Exception("Op type {} is not supported right now".format(
if name not in self.op_dict: node_type))
raise Exception("Op name {} is not supported right now".format(
name))
node = server_sdk.DAGNode() node = server_sdk.DAGNode()
node.name = "{}_op".format(name) # node.name will be used as the infer engine name
node.type = self.op_dict[name] if engine_name:
return node node.name = engine_name
else:
node.name = '{}_{}'.format(node_type,
self.node_name_suffix_[node_type])
self.node_name_suffix_[node_type] += 1
node.type = self.op_dict[node_type]
if inputs:
for dep_node_str in inputs:
dep_node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(dep_node_str, dep_node)
dep = server_sdk.DAGNodeDependency()
dep.name = dep_node.name
dep.mode = "RO"
node.dependencies.extend([dep])
# Because the return value will be used as the key value of the
# dict, and the proto object is variable which cannot be hashed,
# so it is processed into a string. This has little effect on
# overall efficiency.
return google.protobuf.text_format.MessageToString(node)
class OpSeqMaker(object): class OpSeqMaker(object):
...@@ -55,12 +75,25 @@ class OpSeqMaker(object): ...@@ -55,12 +75,25 @@ class OpSeqMaker(object):
self.workflow.name = "workflow1" self.workflow.name = "workflow1"
self.workflow.workflow_type = "Sequence" self.workflow.workflow_type = "Sequence"
def add_op(self, node): def add_op(self, node_str):
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
if len(node.dependencies) > 1:
raise Exception(
'Set more than one predecessor for op in OpSeqMaker is not allowed.'
)
if len(self.workflow.nodes) >= 1: if len(self.workflow.nodes) >= 1:
dep = server_sdk.DAGNodeDependency() if len(node.dependencies) == 0:
dep.name = self.workflow.nodes[-1].name dep = server_sdk.DAGNodeDependency()
dep.mode = "RO" dep.name = self.workflow.nodes[-1].name
node.dependencies.extend([dep]) dep.mode = "RO"
node.dependencies.extend([dep])
elif len(node.dependencies) == 1:
if node.dependencies[0].name != self.workflow.nodes[-1].name:
raise Exception(
'You must add op in order in OpSeqMaker. The previous op is {}, but the current op is followed by {}.'.
format(node.dependencies[0].name, self.workflow.nodes[
-1].name))
self.workflow.nodes.extend([node]) self.workflow.nodes.extend([node])
def get_op_sequence(self): def get_op_sequence(self):
...@@ -69,13 +102,30 @@ class OpSeqMaker(object): ...@@ -69,13 +102,30 @@ class OpSeqMaker(object):
return workflow_conf return workflow_conf
class OpGraphMaker(object):
def __init__(self):
self.workflow = server_sdk.Workflow()
self.workflow.name = "workflow1"
# Currently, SDK only supports "Sequence"
self.workflow.workflow_type = "Sequence"
def add_op(self, node_str):
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
self.workflow.nodes.extend([node])
def get_op_graph(self):
workflow_conf = server_sdk.WorkflowConf()
workflow_conf.workflows.extend([self.workflow])
return workflow_conf
class Server(object): class Server(object):
def __init__(self): def __init__(self):
self.server_handle_ = None self.server_handle_ = None
self.infer_service_conf = None self.infer_service_conf = None
self.model_toolkit_conf = None self.model_toolkit_conf = None
self.resource_conf = None self.resource_conf = None
self.engine = None
self.memory_optimization = False self.memory_optimization = False
self.model_conf = None self.model_conf = None
self.workflow_fn = "workflow.prototxt" self.workflow_fn = "workflow.prototxt"
...@@ -94,6 +144,7 @@ class Server(object): ...@@ -94,6 +144,7 @@ class Server(object):
self.cur_path = os.getcwd() self.cur_path = os.getcwd()
self.use_local_bin = False self.use_local_bin = False
self.mkl_flag = False self.mkl_flag = False
self.model_config_paths = None # for multi-model in a workflow
def set_max_concurrency(self, concurrency): def set_max_concurrency(self, concurrency):
self.max_concurrency = concurrency self.max_concurrency = concurrency
...@@ -118,6 +169,9 @@ class Server(object): ...@@ -118,6 +169,9 @@ class Server(object):
def set_op_sequence(self, op_seq): def set_op_sequence(self, op_seq):
self.workflow_conf = op_seq self.workflow_conf = op_seq
def set_op_graph(self, op_graph):
self.workflow_conf = op_graph
def set_memory_optimize(self, flag=False): def set_memory_optimize(self, flag=False):
self.memory_optimization = flag self.memory_optimization = flag
...@@ -126,32 +180,30 @@ class Server(object): ...@@ -126,32 +180,30 @@ class Server(object):
self.use_local_bin = True self.use_local_bin = True
self.bin_path = os.environ["SERVING_BIN"] self.bin_path = os.environ["SERVING_BIN"]
def _prepare_engine(self, model_config_path, device): def _prepare_engine(self, model_config_paths, device):
if self.model_toolkit_conf == None: if self.model_toolkit_conf == None:
self.model_toolkit_conf = server_sdk.ModelToolkitConf() self.model_toolkit_conf = server_sdk.ModelToolkitConf()
if self.engine == None: for engine_name, model_config_path in model_config_paths.items():
self.engine = server_sdk.EngineDesc() engine = server_sdk.EngineDesc()
engine.name = engine_name
self.model_config_path = model_config_path engine.reloadable_meta = model_config_path + "/fluid_time_file"
self.engine.name = "general_model" os.system("touch {}".format(engine.reloadable_meta))
self.engine.reloadable_meta = model_config_path + "/fluid_time_file" engine.reloadable_type = "timestamp_ne"
os.system("touch {}".format(self.engine.reloadable_meta)) engine.runtime_thread_num = 0
self.engine.reloadable_type = "timestamp_ne" engine.batch_infer_size = 0
self.engine.runtime_thread_num = 0 engine.enable_batch_align = 0
self.engine.batch_infer_size = 0 engine.model_data_path = model_config_path
self.engine.enable_batch_align = 0 engine.enable_memory_optimization = self.memory_optimization
self.engine.model_data_path = model_config_path engine.static_optimization = False
self.engine.enable_memory_optimization = self.memory_optimization engine.force_update_static_cache = False
self.engine.static_optimization = False
self.engine.force_update_static_cache = False if device == "cpu":
engine.type = "FLUID_CPU_ANALYSIS_DIR"
if device == "cpu": elif device == "gpu":
self.engine.type = "FLUID_CPU_ANALYSIS_DIR" engine.type = "FLUID_GPU_ANALYSIS_DIR"
elif device == "gpu":
self.engine.type = "FLUID_GPU_ANALYSIS_DIR" self.model_toolkit_conf.engines.extend([engine])
self.model_toolkit_conf.engines.extend([self.engine])
def _prepare_infer_service(self, port): def _prepare_infer_service(self, port):
if self.infer_service_conf == None: if self.infer_service_conf == None:
...@@ -184,10 +236,49 @@ class Server(object): ...@@ -184,10 +236,49 @@ class Server(object):
with open(filepath, "w") as fout: with open(filepath, "w") as fout:
fout.write(str(pb_obj)) fout.write(str(pb_obj))
def load_model_config(self, path): def load_model_config(self, model_config_paths):
self.model_config_path = path # At present, Serving needs to configure the model path in
# the resource.prototxt file to determine the input and output
# format of the workflow. To ensure that the input and output
# of multiple models are the same.
workflow_oi_config_path = None
if isinstance(model_config_paths, str):
# If there is only one model path, use the default infer_op.
# Because there are several infer_op type, we need to find
# it from workflow_conf.
default_engine_names = [
'general_infer_0', 'general_dist_kv_infer_0',
'general_dist_kv_quant_infer_0'
]
engine_name = None
for node in self.workflow_conf.workflows[0].nodes:
if node.name in default_engine_names:
engine_name = node.name
break
if engine_name is None:
raise Exception(
"You have set the engine_name of Op. Please use the form {op: model_path} to configure model path"
)
self.model_config_paths = {engine_name: model_config_paths}
workflow_oi_config_path = self.model_config_paths[engine_name]
elif isinstance(model_config_paths, dict):
self.model_config_paths = {}
for node_str, path in model_config_paths.items():
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
self.model_config_paths[node.name] = path
print("You have specified multiple model paths, please ensure "
"that the input and output of multiple models are the same.")
workflow_oi_config_path = self.model_config_paths.items()[0][1]
else:
raise Exception("The type of model_config_paths must be str or "
"dict({op: model_path}), not {}.".format(
type(model_config_paths)))
self.model_conf = m_config.GeneralModelConfig() self.model_conf = m_config.GeneralModelConfig()
f = open("{}/serving_server_conf.prototxt".format(path), 'r') f = open(
"{}/serving_server_conf.prototxt".format(workflow_oi_config_path),
'r')
self.model_conf = google.protobuf.text_format.Merge( self.model_conf = google.protobuf.text_format.Merge(
str(f.read()), self.model_conf) str(f.read()), self.model_conf)
# check config here # check config here
...@@ -258,8 +349,9 @@ class Server(object): ...@@ -258,8 +349,9 @@ class Server(object):
if not self.port_is_available(port): if not self.port_is_available(port):
raise SystemExit("Prot {} is already used".format(port)) raise SystemExit("Prot {} is already used".format(port))
self._prepare_resource(workdir) self._prepare_resource(workdir)
self._prepare_engine(self.model_config_path, device) self._prepare_engine(self.model_config_paths, device)
self._prepare_infer_service(port) self._prepare_infer_service(port)
self.port = port
self.workdir = workdir self.workdir = workdir
infer_service_fn = "{}/{}".format(workdir, self.infer_service_fn) infer_service_fn = "{}/{}".format(workdir, self.infer_service_fn)
......
...@@ -79,6 +79,7 @@ def start_standard_model(): # pylint: disable=doc-string-missing ...@@ -79,6 +79,7 @@ def start_standard_model(): # pylint: disable=doc-string-missing
server.set_num_threads(thread_num) server.set_num_threads(thread_num)
server.set_memory_optimize(mem_optim) server.set_memory_optimize(mem_optim)
server.set_max_body_size(max_body_size) server.set_max_body_size(max_body_size)
server.set_port(port)
server.load_model_config(model) server.load_model_config(model)
server.prepare_server(workdir=workdir, port=port, device=device) server.prepare_server(workdir=workdir, port=port, device=device)
......
...@@ -18,6 +18,8 @@ from flask import Flask, request, abort ...@@ -18,6 +18,8 @@ from flask import Flask, request, abort
from multiprocessing import Pool, Process from multiprocessing import Pool, Process
from paddle_serving_server import OpMaker, OpSeqMaker, Server from paddle_serving_server import OpMaker, OpSeqMaker, Server
from paddle_serving_client import Client from paddle_serving_client import Client
from contextlib import closing
import socket
class WebService(object): class WebService(object):
...@@ -41,19 +43,34 @@ class WebService(object): ...@@ -41,19 +43,34 @@ class WebService(object):
server.set_num_threads(16) server.set_num_threads(16)
server.load_model_config(self.model_config) server.load_model_config(self.model_config)
server.prepare_server( server.prepare_server(
workdir=self.workdir, port=self.port + 1, device=self.device) workdir=self.workdir, port=self.port_list[0], device=self.device)
server.run_server() server.run_server()
def port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
if result != 0:
return True
else:
return False
def prepare_server(self, workdir="", port=9393, device="cpu"): def prepare_server(self, workdir="", port=9393, device="cpu"):
self.workdir = workdir self.workdir = workdir
self.port = port self.port = port
self.device = device self.device = device
default_port = 12000
self.port_list = []
for i in range(1000):
if self.port_is_available(default_port + i):
self.port_list.append(default_port + i)
break
def _launch_web_service(self): def _launch_web_service(self):
self.client_service = Client() self.client = Client()
self.client_service.load_client_config( self.client.load_client_config("{}/serving_server_conf.prototxt".format(
"{}/serving_server_conf.prototxt".format(self.model_config)) self.model_config))
self.client_service.connect(["0.0.0.0:{}".format(self.port + 1)]) self.client.connect(["0.0.0.0:{}".format(self.port_list[0])])
def get_prediction(self, request): def get_prediction(self, request):
if not request.json: if not request.json:
...@@ -61,23 +78,16 @@ class WebService(object): ...@@ -61,23 +78,16 @@ class WebService(object):
if "fetch" not in request.json: if "fetch" not in request.json:
abort(400) abort(400)
try: try:
feed, fetch = self.preprocess(request.json, request.json["fetch"]) feed, fetch = self.preprocess(request.json["feed"],
if isinstance(feed, list): request.json["fetch"])
fetch_map_batch = self.client_service.predict( if isinstance(feed, dict) and "fetch" in feed:
feed_batch=feed, fetch=fetch) del feed["fetch"]
fetch_map_batch = self.postprocess( fetch_map = self.client.predict(feed=feed, fetch=fetch)
feed=request.json, fetch=fetch, fetch_map=fetch_map_batch) for key in fetch_map:
for key in fetch_map_batch: fetch_map[key] = fetch_map[key].tolist()
fetch_map_batch[key] = fetch_map_batch[key].tolist() fetch_map = self.postprocess(
result = {"result": fetch_map_batch} feed=feed, fetch=fetch, fetch_map=fetch_map)
elif isinstance(feed, dict): result = {"result": fetch_map}
if "fetch" in feed:
del feed["fetch"]
fetch_map = self.client_service.predict(feed=feed, fetch=fetch)
for key in fetch_map:
fetch_map[key] = fetch_map[key][0].tolist()
result = self.postprocess(
feed=request.json, fetch=fetch, fetch_map=fetch_map)
except ValueError: except ValueError:
result = {"result": "Request Value Error"} result = {"result": "Request Value Error"}
return result return result
...@@ -91,8 +101,26 @@ class WebService(object): ...@@ -91,8 +101,26 @@ class WebService(object):
p_rpc = Process(target=self._launch_rpc_service) p_rpc = Process(target=self._launch_rpc_service)
p_rpc.start() p_rpc.start()
def preprocess(self, feed={}, fetch=[]): def run_flask(self):
app_instance = Flask(__name__)
@app_instance.before_first_request
def init():
self._launch_web_service()
service_name = "/" + self.name + "/prediction"
@app_instance.route(service_name, methods=["POST"])
def run():
return self.get_prediction(request)
app_instance.run(host="0.0.0.0",
port=self.port,
threaded=False,
processes=4)
def preprocess(self, feed=[], fetch=[]):
return feed, fetch return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
return fetch_map return fetch_map
...@@ -24,6 +24,7 @@ import time ...@@ -24,6 +24,7 @@ import time
from .version import serving_server_version from .version import serving_server_version
from contextlib import closing from contextlib import closing
import argparse import argparse
import collections
def serve_args(): def serve_args():
...@@ -66,17 +67,35 @@ class OpMaker(object): ...@@ -66,17 +67,35 @@ class OpMaker(object):
"general_dist_kv_infer": "GeneralDistKVInferOp", "general_dist_kv_infer": "GeneralDistKVInferOp",
"general_dist_kv": "GeneralDistKVOp" "general_dist_kv": "GeneralDistKVOp"
} }
self.node_name_suffix_ = collections.defaultdict(int)
# currently, inputs and outputs are not used def create(self, node_type, engine_name=None, inputs=[], outputs=[]):
# when we have OpGraphMaker, inputs and outputs are necessary if node_type not in self.op_dict:
def create(self, name, inputs=[], outputs=[]): raise Exception("Op type {} is not supported right now".format(
if name not in self.op_dict: node_type))
raise Exception("Op name {} is not supported right now".format(
name))
node = server_sdk.DAGNode() node = server_sdk.DAGNode()
node.name = "{}_op".format(name) # node.name will be used as the infer engine name
node.type = self.op_dict[name] if engine_name:
return node node.name = engine_name
else:
node.name = '{}_{}'.format(node_type,
self.node_name_suffix_[node_type])
self.node_name_suffix_[node_type] += 1
node.type = self.op_dict[node_type]
if inputs:
for dep_node_str in inputs:
dep_node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(dep_node_str, dep_node)
dep = server_sdk.DAGNodeDependency()
dep.name = dep_node.name
dep.mode = "RO"
node.dependencies.extend([dep])
# Because the return value will be used as the key value of the
# dict, and the proto object is variable which cannot be hashed,
# so it is processed into a string. This has little effect on
# overall efficiency.
return google.protobuf.text_format.MessageToString(node)
class OpSeqMaker(object): class OpSeqMaker(object):
...@@ -85,12 +104,25 @@ class OpSeqMaker(object): ...@@ -85,12 +104,25 @@ class OpSeqMaker(object):
self.workflow.name = "workflow1" self.workflow.name = "workflow1"
self.workflow.workflow_type = "Sequence" self.workflow.workflow_type = "Sequence"
def add_op(self, node): def add_op(self, node_str):
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
if len(node.dependencies) > 1:
raise Exception(
'Set more than one predecessor for op in OpSeqMaker is not allowed.'
)
if len(self.workflow.nodes) >= 1: if len(self.workflow.nodes) >= 1:
dep = server_sdk.DAGNodeDependency() if len(node.dependencies) == 0:
dep.name = self.workflow.nodes[-1].name dep = server_sdk.DAGNodeDependency()
dep.mode = "RO" dep.name = self.workflow.nodes[-1].name
node.dependencies.extend([dep]) dep.mode = "RO"
node.dependencies.extend([dep])
elif len(node.dependencies) == 1:
if node.dependencies[0].name != self.workflow.nodes[-1].name:
raise Exception(
'You must add op in order in OpSeqMaker. The previous op is {}, but the current op is followed by {}.'.
format(node.dependencies[0].name, self.workflow.nodes[
-1].name))
self.workflow.nodes.extend([node]) self.workflow.nodes.extend([node])
def get_op_sequence(self): def get_op_sequence(self):
...@@ -99,13 +131,30 @@ class OpSeqMaker(object): ...@@ -99,13 +131,30 @@ class OpSeqMaker(object):
return workflow_conf return workflow_conf
class OpGraphMaker(object):
def __init__(self):
self.workflow = server_sdk.Workflow()
self.workflow.name = "workflow1"
# Currently, SDK only supports "Sequence"
self.workflow.workflow_type = "Sequence"
def add_op(self, node_str):
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
self.workflow.nodes.extend([node])
def get_op_graph(self):
workflow_conf = server_sdk.WorkflowConf()
workflow_conf.workflows.extend([self.workflow])
return workflow_conf
class Server(object): class Server(object):
def __init__(self): def __init__(self):
self.server_handle_ = None self.server_handle_ = None
self.infer_service_conf = None self.infer_service_conf = None
self.model_toolkit_conf = None self.model_toolkit_conf = None
self.resource_conf = None self.resource_conf = None
self.engine = None
self.memory_optimization = False self.memory_optimization = False
self.model_conf = None self.model_conf = None
self.workflow_fn = "workflow.prototxt" self.workflow_fn = "workflow.prototxt"
...@@ -122,9 +171,9 @@ class Server(object): ...@@ -122,9 +171,9 @@ class Server(object):
self.max_body_size = 64 * 1024 * 1024 self.max_body_size = 64 * 1024 * 1024
self.module_path = os.path.dirname(paddle_serving_server.__file__) self.module_path = os.path.dirname(paddle_serving_server.__file__)
self.cur_path = os.getcwd() self.cur_path = os.getcwd()
self.check_cuda()
self.use_local_bin = False self.use_local_bin = False
self.gpuid = 0 self.gpuid = 0
self.model_config_paths = None # for multi-model in a workflow
def set_max_concurrency(self, concurrency): def set_max_concurrency(self, concurrency):
self.max_concurrency = concurrency self.max_concurrency = concurrency
...@@ -149,6 +198,9 @@ class Server(object): ...@@ -149,6 +198,9 @@ class Server(object):
def set_op_sequence(self, op_seq): def set_op_sequence(self, op_seq):
self.workflow_conf = op_seq self.workflow_conf = op_seq
def set_op_graph(self, op_graph):
self.workflow_conf = op_graph
def set_memory_optimize(self, flag=False): def set_memory_optimize(self, flag=False):
self.memory_optimization = flag self.memory_optimization = flag
...@@ -158,8 +210,13 @@ class Server(object): ...@@ -158,8 +210,13 @@ class Server(object):
self.bin_path = os.environ["SERVING_BIN"] self.bin_path = os.environ["SERVING_BIN"]
def check_cuda(self): def check_cuda(self):
r = os.system("cat /usr/local/cuda/version.txt") cuda_flag = False
if r != 0: r = os.popen("ldd {} | grep cudart".format(self.bin_path))
r = r.read().split("=")
if len(r) >= 2 and "cudart" in r[1] and os.system(
"ls /dev/ | grep nvidia > /dev/null") == 0:
cuda_flag = True
if not cuda_flag:
raise SystemExit( raise SystemExit(
"CUDA not found, please check your environment or use cpu version by \"pip install paddle_serving_server\"" "CUDA not found, please check your environment or use cpu version by \"pip install paddle_serving_server\""
) )
...@@ -167,33 +224,31 @@ class Server(object): ...@@ -167,33 +224,31 @@ class Server(object):
def set_gpuid(self, gpuid=0): def set_gpuid(self, gpuid=0):
self.gpuid = gpuid self.gpuid = gpuid
def _prepare_engine(self, model_config_path, device): def _prepare_engine(self, model_config_paths, device):
if self.model_toolkit_conf == None: if self.model_toolkit_conf == None:
self.model_toolkit_conf = server_sdk.ModelToolkitConf() self.model_toolkit_conf = server_sdk.ModelToolkitConf()
if self.engine == None: for engine_name, model_config_path in model_config_paths.items():
self.engine = server_sdk.EngineDesc() engine = server_sdk.EngineDesc()
engine.name = engine_name
self.model_config_path = model_config_path # engine.reloadable_meta = model_config_path + "/fluid_time_file"
self.engine.name = "general_model" engine.reloadable_meta = self.workdir + "/fluid_time_file"
#self.engine.reloadable_meta = model_config_path + "/fluid_time_file" os.system("touch {}".format(engine.reloadable_meta))
self.engine.reloadable_meta = self.workdir + "/fluid_time_file" engine.reloadable_type = "timestamp_ne"
os.system("touch {}".format(self.engine.reloadable_meta)) engine.runtime_thread_num = 0
self.engine.reloadable_type = "timestamp_ne" engine.batch_infer_size = 0
self.engine.runtime_thread_num = 0 engine.enable_batch_align = 0
self.engine.batch_infer_size = 0 engine.model_data_path = model_config_path
self.engine.enable_batch_align = 0 engine.enable_memory_optimization = self.memory_optimization
self.engine.model_data_path = model_config_path engine.static_optimization = False
self.engine.enable_memory_optimization = self.memory_optimization engine.force_update_static_cache = False
self.engine.static_optimization = False
self.engine.force_update_static_cache = False if device == "cpu":
engine.type = "FLUID_CPU_ANALYSIS_DIR"
if device == "cpu": elif device == "gpu":
self.engine.type = "FLUID_CPU_ANALYSIS_DIR" engine.type = "FLUID_GPU_ANALYSIS_DIR"
elif device == "gpu":
self.engine.type = "FLUID_GPU_ANALYSIS_DIR" self.model_toolkit_conf.engines.extend([engine])
self.model_toolkit_conf.engines.extend([self.engine])
def _prepare_infer_service(self, port): def _prepare_infer_service(self, port):
if self.infer_service_conf == None: if self.infer_service_conf == None:
...@@ -225,10 +280,49 @@ class Server(object): ...@@ -225,10 +280,49 @@ class Server(object):
with open(filepath, "w") as fout: with open(filepath, "w") as fout:
fout.write(str(pb_obj)) fout.write(str(pb_obj))
def load_model_config(self, path): def load_model_config(self, model_config_paths):
self.model_config_path = path # At present, Serving needs to configure the model path in
# the resource.prototxt file to determine the input and output
# format of the workflow. To ensure that the input and output
# of multiple models are the same.
workflow_oi_config_path = None
if isinstance(model_config_paths, str):
# If there is only one model path, use the default infer_op.
# Because there are several infer_op type, we need to find
# it from workflow_conf.
default_engine_names = [
'general_infer_0', 'general_dist_kv_infer_0',
'general_dist_kv_quant_infer_0'
]
engine_name = None
for node in self.workflow_conf.workflows[0].nodes:
if node.name in default_engine_names:
engine_name = node.name
break
if engine_name is None:
raise Exception(
"You have set the engine_name of Op. Please use the form {op: model_path} to configure model path"
)
self.model_config_paths = {engine_name: model_config_paths}
workflow_oi_config_path = self.model_config_paths[engine_name]
elif isinstance(model_config_paths, dict):
self.model_config_paths = {}
for node_str, path in model_config_paths.items():
node = server_sdk.DAGNode()
google.protobuf.text_format.Parse(node_str, node)
self.model_config_paths[node.name] = path
print("You have specified multiple model paths, please ensure "
"that the input and output of multiple models are the same.")
workflow_oi_config_path = self.model_config_paths.items()[0][1]
else:
raise Exception("The type of model_config_paths must be str or "
"dict({op: model_path}), not {}.".format(
type(model_config_paths)))
self.model_conf = m_config.GeneralModelConfig() self.model_conf = m_config.GeneralModelConfig()
f = open("{}/serving_server_conf.prototxt".format(path), 'r') f = open(
"{}/serving_server_conf.prototxt".format(workflow_oi_config_path),
'r')
self.model_conf = google.protobuf.text_format.Merge( self.model_conf = google.protobuf.text_format.Merge(
str(f.read()), self.model_conf) str(f.read()), self.model_conf)
# check config here # check config here
...@@ -291,7 +385,7 @@ class Server(object): ...@@ -291,7 +385,7 @@ class Server(object):
self.set_port(port) self.set_port(port)
self._prepare_resource(workdir) self._prepare_resource(workdir)
self._prepare_engine(self.model_config_path, device) self._prepare_engine(self.model_config_paths, device)
self._prepare_infer_service(port) self._prepare_infer_service(port)
self.workdir = workdir self.workdir = workdir
...@@ -325,6 +419,7 @@ class Server(object): ...@@ -325,6 +419,7 @@ class Server(object):
time.sleep(1) time.sleep(1)
else: else:
print("Use local bin : {}".format(self.bin_path)) print("Use local bin : {}".format(self.bin_path))
self.check_cuda()
command = "{} " \ command = "{} " \
"-enable_model_toolkit " \ "-enable_model_toolkit " \
"-inferservice_path {} " \ "-inferservice_path {} " \
......
...@@ -11,16 +11,18 @@ ...@@ -11,16 +11,18 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# pylint: disable=doc-string-missing
from flask import Flask, request, abort from flask import Flask, request, abort
from paddle_serving_server_gpu import OpMaker, OpSeqMaker, Server from contextlib import closing
import paddle_serving_server_gpu as serving
from multiprocessing import Pool, Process, Queue from multiprocessing import Pool, Process, Queue
from paddle_serving_client import Client from paddle_serving_client import Client
from paddle_serving_server_gpu import OpMaker, OpSeqMaker, Server
from paddle_serving_server_gpu.serve import start_multi_card from paddle_serving_server_gpu.serve import start_multi_card
import socket
import sys import sys
import numpy as np import numpy as np
import paddle_serving_server_gpu as serving
class WebService(object): class WebService(object):
...@@ -66,22 +68,39 @@ class WebService(object): ...@@ -66,22 +68,39 @@ class WebService(object):
def _launch_rpc_service(self, service_idx): def _launch_rpc_service(self, service_idx):
self.rpc_service_list[service_idx].run_server() self.rpc_service_list[service_idx].run_server()
def port_is_available(self, port):
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
sock.settimeout(2)
result = sock.connect_ex(('0.0.0.0', port))
if result != 0:
return True
else:
return False
def prepare_server(self, workdir="", port=9393, device="gpu", gpuid=0): def prepare_server(self, workdir="", port=9393, device="gpu", gpuid=0):
self.workdir = workdir self.workdir = workdir
self.port = port self.port = port
self.device = device self.device = device
self.gpuid = gpuid self.gpuid = gpuid
self.port_list = []
default_port = 12000
for i in range(1000):
if self.port_is_available(default_port + i):
self.port_list.append(default_port + i)
if len(self.port_list) > len(self.gpus):
break
if len(self.gpus) == 0: if len(self.gpus) == 0:
# init cpu service # init cpu service
self.rpc_service_list.append( self.rpc_service_list.append(
self.default_rpc_service( self.default_rpc_service(
self.workdir, self.port + 1, -1, thread_num=10)) self.workdir, self.port_list[0], -1, thread_num=10))
else: else:
for i, gpuid in enumerate(self.gpus): for i, gpuid in enumerate(self.gpus):
self.rpc_service_list.append( self.rpc_service_list.append(
self.default_rpc_service( self.default_rpc_service(
"{}_{}".format(self.workdir, i), "{}_{}".format(self.workdir, i),
self.port + 1 + i, self.port_list[i],
gpuid, gpuid,
thread_num=10)) thread_num=10))
...@@ -93,9 +112,9 @@ class WebService(object): ...@@ -93,9 +112,9 @@ class WebService(object):
endpoints = "" endpoints = ""
if gpu_num > 0: if gpu_num > 0:
for i in range(gpu_num): for i in range(gpu_num):
endpoints += "127.0.0.1:{},".format(self.port + i + 1) endpoints += "127.0.0.1:{},".format(self.port_list[i])
else: else:
endpoints = "127.0.0.1:{}".format(self.port + 1) endpoints = "127.0.0.1:{}".format(self.port_list[0])
self.client.connect([endpoints]) self.client.connect([endpoints])
def get_prediction(self, request): def get_prediction(self, request):
...@@ -103,13 +122,20 @@ class WebService(object): ...@@ -103,13 +122,20 @@ class WebService(object):
abort(400) abort(400)
if "fetch" not in request.json: if "fetch" not in request.json:
abort(400) abort(400)
feed, fetch = self.preprocess(request.json, request.json["fetch"]) try:
fetch_map_batch = self.client.predict(feed=feed, fetch=fetch) feed, fetch = self.preprocess(request.json["feed"],
fetch_map_batch = self.postprocess( request.json["fetch"])
feed=request.json, fetch=fetch, fetch_map=fetch_map_batch) if isinstance(feed, dict) and "fetch" in feed:
for key in fetch_map_batch: del feed["fetch"]
fetch_map_batch[key] = fetch_map_batch[key].tolist() fetch_map = self.client.predict(feed=feed, fetch=fetch)
result = {"result": fetch_map_batch} for key in fetch_map:
fetch_map[key] = fetch_map[key].tolist()
result = self.postprocess(
feed=feed, fetch=fetch, fetch_map=fetch_map)
result = {"result": result}
result = {"result": fetch_map}
except ValueError:
result = {"result": "Request Value Error"}
return result return result
def run_server(self): def run_server(self):
...@@ -125,8 +151,26 @@ class WebService(object): ...@@ -125,8 +151,26 @@ class WebService(object):
for p in server_pros: for p in server_pros:
p.start() p.start()
def preprocess(self, feed={}, fetch=[]): def run_flask(self):
app_instance = Flask(__name__)
@app_instance.before_first_request
def init():
self._launch_web_service()
service_name = "/" + self.name + "/prediction"
@app_instance.route(service_name, methods=["POST"])
def run():
return self.get_prediction(request)
app_instance.run(host="0.0.0.0",
port=self.port,
threaded=False,
processes=4)
def preprocess(self, feed=[], fetch=[]):
return feed, fetch return feed, fetch
def postprocess(self, feed={}, fetch=[], fetch_map=None): def postprocess(self, feed=[], fetch=[], fetch_map=None):
return fetch_map return fetch_map
...@@ -47,7 +47,8 @@ REQUIRED_PACKAGES = [ ...@@ -47,7 +47,8 @@ REQUIRED_PACKAGES = [
packages=['paddle_serving_app', packages=['paddle_serving_app',
'paddle_serving_app.reader', 'paddle_serving_app.reader',
'paddle_serving_app.utils'] 'paddle_serving_app.utils',
'paddle_serving_app.reader.pddet']
package_data={} package_data={}
package_dir={'paddle_serving_app': package_dir={'paddle_serving_app':
...@@ -55,7 +56,9 @@ package_dir={'paddle_serving_app': ...@@ -55,7 +56,9 @@ package_dir={'paddle_serving_app':
'paddle_serving_app.reader': 'paddle_serving_app.reader':
'${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/reader', '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/reader',
'paddle_serving_app.utils': 'paddle_serving_app.utils':
'${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/utils',} '${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/utils',
'paddle_serving_app.reader.pddet':
'${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/reader/pddet',}
setup( setup(
name='paddle-serving-app', name='paddle-serving-app',
......
FROM centos:6
RUN yum -y install wget && \
wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtoolset-2.repo && \
yum -y install devtoolset-2-gcc devtoolset-2-gcc-c++ devtoolset-2-binutils && \
source /opt/rh/devtoolset-2/enable && \
echo 'source /opt/rh/devtoolset-2/enable' >> /root/.bashrc && \
yum -y install git openssl-devel curl-devel bzip2-devel && \
wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz && \
tar xzf cmake-3.2.0-Linux-x86_64.tar.gz && \
mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 && \
echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc && \
rm cmake-3.2.0-Linux-x86_64.tar.gz && \
wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz && \
tar xzf go1.14.linux-amd64.tar.gz && \
mv go /usr/local/go && \
echo 'export GOROOT=/usr/local/go' >> /root/.bashrc && \
echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc && \
rm go1.14.linux-amd64.tar.gz && \
yum -y install python-devel sqlite-devel && \
wget https://www.python.org/ftp/python/2.7.5/Python-2.7.5.tgz && \
tar -zxf Python-2.7.5.tgz && \
cd Python-2.7.5 && \
./configure --prefix=/usr/local/python2.7 --enable-shared && \
make all && make install && \
make clean && \
echo 'export PATH=/usr/local/python2.7/bin:$PATH' >> /root/.bashrc && \
echo 'export LD_LIBRARY_PATH=/usr/local/python2.7/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
cd .. && rm -rf Python-2.7.5* && \
source /root/.bashrc && \
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python get-pip.py && \
rm get-pip.py && \
pip install google protobuf setuptools wheel flask numpy==1.16.4 && \
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz && \
tar -zxf Python-3.6.8.tgz && \
cd Python-3.6.8 && \
./configure --prefix=/usr/local/python3.6 --enable-shared && \
make all && make install && \
make clean && \
echo 'export PATH=/usr/local/python3.6/bin:$PATH' >> /root/.bashrc && \
echo 'export LD_LIBRARY_PATH=/usr/local/python3.6/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
source /root/.bashrc && \
cd .. && rm -rf Python-3.6.8* && \
pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
yum -y install epel-release && yum -y install patchelf && \
yum clean all
FROM nvidia/cuda:9.0-cudnn7-devel-centos6
RUN yum -y install wget && \
wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtoolset-2.repo && \
yum -y install devtoolset-2-gcc devtoolset-2-gcc-c++ devtoolset-2-binutils && \
source /opt/rh/devtoolset-2/enable && \
echo 'source /opt/rh/devtoolset-2/enable' >> /root/.bashrc && \
yum -y install git openssl-devel curl-devel bzip2-devel && \
wget https://cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.tar.gz && \
tar xzf cmake-3.2.0-Linux-x86_64.tar.gz && \
mv cmake-3.2.0-Linux-x86_64 /usr/local/cmake3.2.0 && \
echo 'export PATH=/usr/local/cmake3.2.0/bin:$PATH' >> /root/.bashrc && \
rm cmake-3.2.0-Linux-x86_64.tar.gz && \
wget https://dl.google.com/go/go1.14.linux-amd64.tar.gz && \
tar xzf go1.14.linux-amd64.tar.gz && \
mv go /usr/local/go && \
echo 'export GOROOT=/usr/local/go' >> /root/.bashrc && \
echo 'export PATH=/usr/local/go/bin:$PATH' >> /root/.bashrc && \
rm go1.14.linux-amd64.tar.gz && \
yum -y install python-devel sqlite-devel && \
wget https://www.python.org/ftp/python/2.7.5/Python-2.7.5.tgz && \
tar -zxf Python-2.7.5.tgz && \
cd Python-2.7.5 && \
./configure --prefix=/usr/local/python2.7 --enable-shared && \
make all && make install && \
make clean && \
echo 'export PATH=/usr/local/python2.7/bin:$PATH' >> /root/.bashrc && \
echo 'export LD_LIBRARY_PATH=/usr/local/python2.7/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
cd .. && rm -rf Python-2.7.5* && \
source /root/.bashrc && \
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
python get-pip.py && \
rm get-pip.py && \
pip install google protobuf setuptools wheel flask numpy==1.16.4 && \
wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz && \
tar -zxf Python-3.6.8.tgz && \
cd Python-3.6.8 && \
./configure --prefix=/usr/local/python3.6 --enable-shared && \
make all && make install && \
make clean && \
echo 'export PATH=/usr/local/python3.6/bin:$PATH' >> /root/.bashrc && \
echo 'export LD_LIBRARY_PATH=/usr/local/python3.6/lib:$LD_LIBRARY_PATH' >> /root/.bashrc && \
source /root/.bashrc && \
cd .. && rm -rf Python-3.6.8* && \
pip3 install google protobuf setuptools wheel flask numpy==1.16.4 && \
yum -y install epel-release && yum -y install patchelf && \
yum clean all
(This diff has been collapsed.)