Commit 7a3a98f2 authored by MRXLT

fix conflict

@@ -70,11 +70,13 @@ from paddle_serving.serving_server import Server
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
general_infer_op = op_maker.create('general_infer')
infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(general_infer_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)
server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
......
@@ -26,5 +26,5 @@ endif()
if (NOT CLIENT_ONLY)
add_subdirectory(predictor)
add_subdirectory(general-server)
add_subdirectory(util)
endif()
add_subdirectory(util)
if(CLIENT_ONLY)
add_subdirectory(pybind11)
pybind11_add_module(serving_client src/general_model.cpp src/pybind_general_model.cpp)
target_link_libraries(serving_client PRIVATE -Wl,--whole-archive sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz)
target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz)
endif()
@@ -31,6 +31,9 @@
using baidu::paddle_serving::sdk_cpp::Predictor;
using baidu::paddle_serving::sdk_cpp::PredictorApi;
DECLARE_bool(profile_client);
DECLARE_bool(profile_server);
// given some input data, pack into pb, and send request
namespace baidu {
namespace paddle_serving {
@@ -45,6 +48,8 @@ class PredictorClient {
PredictorClient() {}
~PredictorClient() {}
void init_gflags(std::vector<std::string> argv);
int init(const std::string& client_conf);
void set_predictor_conf(const std::string& conf_path,
@@ -87,6 +92,7 @@ class PredictorClient {
std::map<std::string, std::string> _fetch_name_to_var_name;
std::vector<std::vector<int>> _shape;
std::vector<int> _type;
std::vector<int64_t> _last_request_ts;
};
} // namespace general_model
......
@@ -17,23 +17,46 @@
#include "core/sdk-cpp/builtin_format.pb.h"
#include "core/sdk-cpp/include/common.h"
#include "core/sdk-cpp/include/predictor_sdk.h"
#include "core/util/include/timer.h"
DEFINE_bool(profile_client, false, "");
DEFINE_bool(profile_server, false, "");
using baidu::paddle_serving::Timer;
using baidu::paddle_serving::predictor::general_model::Request;
using baidu::paddle_serving::predictor::general_model::Response;
using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::FeedInst;
using baidu::paddle_serving::predictor::general_model::FetchInst;
std::once_flag gflags_init_flag;
namespace baidu {
namespace paddle_serving {
namespace general_model {
using configure::GeneralModelConfig;
void PredictorClient::init_gflags(std::vector<std::string> argv) {
std::call_once(gflags_init_flag, [&]() {
FLAGS_logtostderr = true;
argv.insert(argv.begin(), "dummy");
int argc = argv.size();
char **arr = new char *[argv.size()];
std::string line;
for (size_t i = 0; i < argv.size(); i++) {
arr[i] = &argv[i][0];
line += argv[i];
line += ' ';
}
google::ParseCommandLineFlags(&argc, &arr, true);
VLOG(2) << "Init commandline: " << line;
});
}
int PredictorClient::init(const std::string &conf_file) {
try {
GeneralModelConfig model_config;
if (configure::read_proto_conf(conf_file.c_str(),
&model_config) != 0) {
if (configure::read_proto_conf(conf_file.c_str(), &model_config) != 0) {
LOG(ERROR) << "Failed to load general model config"
<< ", file path: " << conf_file;
return -1;
@@ -51,26 +74,27 @@ int PredictorClient::init(const std::string &conf_file) {
VLOG(2) << "feed alias name: " << model_config.feed_var(i).alias_name()
<< " index: " << i;
std::vector<int> tmp_feed_shape;
VLOG(2) << "feed" << "[" << i << "] shape:";
VLOG(2) << "feed"
<< "[" << i << "] shape:";
for (int j = 0; j < model_config.feed_var(i).shape_size(); ++j) {
tmp_feed_shape.push_back(model_config.feed_var(i).shape(j));
VLOG(2) << "shape[" << j << "]: "
<< model_config.feed_var(i).shape(j);
VLOG(2) << "shape[" << j << "]: " << model_config.feed_var(i).shape(j);
}
_type.push_back(model_config.feed_var(i).feed_type());
VLOG(2) << "feed" << "[" << i << "] feed type: "
<< model_config.feed_var(i).feed_type();
VLOG(2) << "feed"
<< "[" << i
<< "] feed type: " << model_config.feed_var(i).feed_type();
_shape.push_back(tmp_feed_shape);
}
for (int i = 0; i < fetch_var_num; ++i) {
_fetch_name_to_idx[model_config.fetch_var(i).alias_name()] = i;
VLOG(2) << "fetch [" << i << "]" << " alias name: "
<< model_config.fetch_var(i).alias_name();
VLOG(2) << "fetch [" << i << "]"
<< " alias name: " << model_config.fetch_var(i).alias_name();
_fetch_name_to_var_name[model_config.fetch_var(i).alias_name()] =
model_config.fetch_var(i).name();
}
} catch (std::exception& e) {
} catch (std::exception &e) {
LOG(ERROR) << "Failed load general model config" << e.what();
return -1;
}
@@ -88,7 +112,7 @@ int PredictorClient::destroy_predictor() {
_api.destroy();
}
int PredictorClient::create_predictor_by_desc(const std::string & sdk_desc) {
int PredictorClient::create_predictor_by_desc(const std::string &sdk_desc) {
if (_api.create(sdk_desc) != 0) {
LOG(ERROR) << "Predictor Creation Failed";
return -1;
@@ -117,17 +141,22 @@ std::vector<std::vector<float>> PredictorClient::predict(
return fetch_result;
}
Timer timeline;
int64_t preprocess_start = timeline.TimeStampUS();
// we save infer_us at fetch_result[fetch_name.size()]
fetch_result.resize(fetch_name.size() + 1);
_api.thrd_clear();
_predictor = _api.fetch_predictor("general_model");
VLOG(2) << "fetch general model predictor done.";
VLOG(2) << "float feed name size: " << float_feed_name.size();
VLOG(2) << "int feed name size: " << int_feed_name.size();
VLOG(2) << "fetch name size: " << fetch_name.size();
Request req;
for (auto & name : fetch_name) {
for (auto &name : fetch_name) {
req.add_fetch_var_names(name);
}
std::vector<Tensor *> tensor_vec;
@@ -175,16 +204,28 @@ std::vector<std::vector<float>> PredictorClient::predict(
vec_idx++;
}
VLOG(2) << "feed int feed var done.";
int64_t preprocess_end = timeline.TimeStampUS();
// std::map<std::string, std::vector<float> > result;
int64_t client_infer_start = timeline.TimeStampUS();
Response res;
int64_t client_infer_end = 0;
int64_t postprocess_start = 0;
int64_t postprocess_end = 0;
if (FLAGS_profile_client) {
if (FLAGS_profile_server) {
req.set_profile_server(true);
}
}
res.Clear();
if (_predictor->inference(&req, &res) != 0) {
LOG(ERROR) << "failed call predictor with req: " << req.ShortDebugString();
exit(-1);
} else {
client_infer_end = timeline.TimeStampUS();
postprocess_start = client_infer_end;
for (auto &name : fetch_name) {
int idx = _fetch_name_to_idx[name];
int len = res.insts(0).tensor_array(idx).data_size();
@@ -196,8 +237,29 @@ std::vector<std::vector<float>> PredictorClient::predict(
*(const float *)res.insts(0).tensor_array(idx).data(i).c_str();
}
}
fetch_result[fetch_name.size()].resize(1);
fetch_result[fetch_name.size()][0] = res.mean_infer_us();
postprocess_end = timeline.TimeStampUS();
}
if (FLAGS_profile_client) {
std::ostringstream oss;
oss << "PROFILE\t"
<< "prepro_0:" << preprocess_start << " "
<< "prepro_1:" << preprocess_end << " "
<< "client_infer_0:" << client_infer_start << " "
<< "client_infer_1:" << client_infer_end << " ";
if (FLAGS_profile_server) {
int op_num = res.profile_time_size() / 2;
for (int i = 0; i < op_num; ++i) {
oss << "op" << i << "_0:" << res.profile_time(i * 2) << " ";
oss << "op" << i << "_1:" << res.profile_time(i * 2 + 1) << " ";
}
}
oss << "postpro_0:" << postprocess_start << " ";
oss << "postpro_1:" << postprocess_end;
fprintf(stderr, "%s\n", oss.str().c_str());
}
return fetch_result;
@@ -226,7 +288,7 @@ std::vector<std::vector<std::vector<float>>> PredictorClient::batch_predict(
VLOG(2) << "float feed name size: " << float_feed_name.size();
VLOG(2) << "int feed name size: " << int_feed_name.size();
Request req;
for (auto & name : fetch_name) {
for (auto &name : fetch_name) {
req.add_fetch_var_names(name);
}
//
@@ -262,7 +324,8 @@ std::vector<std::vector<std::vector<float>>> PredictorClient::batch_predict(
vec_idx++;
}
VLOG(2) << "batch [" << bi << "] " << "float feed value prepared";
VLOG(2) << "batch [" << bi << "] "
<< "float feed value prepared";
vec_idx = 0;
for (auto &name : int_feed_name) {
@@ -282,7 +345,8 @@ std::vector<std::vector<std::vector<float>>> PredictorClient::batch_predict(
vec_idx++;
}
VLOG(2) << "batch [" << bi << "] " << "itn feed value prepared";
VLOG(2) << "batch [" << bi << "] "
<< "itn feed value prepared";
}
Response res;
@@ -308,10 +372,6 @@ std::vector<std::vector<std::vector<float>>> PredictorClient::batch_predict(
}
}
}
//last index for infer time
fetch_result_batch[batch_size].resize(1);
fetch_result_batch[batch_size][0].resize(1);
fetch_result_batch[batch_size][0][0] = res.mean_infer_us();
}
return fetch_result_batch;
......
@@ -31,6 +31,10 @@ PYBIND11_MODULE(serving_client, m) {
)pddoc";
py::class_<PredictorClient>(m, "PredictorClient", py::buffer_protocol())
.def(py::init())
.def("init_gflags",
[](PredictorClient &self, std::vector<std::string> argv) {
self.init_gflags(argv);
})
.def("init",
[](PredictorClient &self, const std::string &conf) {
return self.init(conf);
......
@@ -14,6 +14,7 @@
#pragma once
#include <string.h>
#include <vector>
#ifdef BCLOUD
#ifdef WITH_GPU
@@ -34,8 +35,8 @@ static const char* GENERAL_MODEL_NAME = "general_model";
struct GeneralBlob {
std::vector<paddle::PaddleTensor> tensor_vector;
double infer_time;
std::vector<std::string> fetch_name_vector;
int64_t time_stamp[20];
int p_size = 0;
int _batch_size;
@@ -50,22 +51,20 @@ struct GeneralBlob {
int SetBatchSize(int batch_size) { _batch_size = batch_size; }
int GetBatchSize() const { return _batch_size; }
/*
int GetBatchSize() const {
if (tensor_vector.size() > 0) {
if (tensor_vector[0].lod.size() == 1) {
return tensor_vector[0].lod[0].size() - 1;
} else {
return tensor_vector[0].shape[0];
}
} else {
return -1;
}
}
*/
std::string ShortDebugString() const { return "Not implemented!"; }
};
static void AddBlobInfo(GeneralBlob* blob, int64_t init_value) {
blob->time_stamp[blob->p_size] = init_value;
blob->p_size++;
}
static void CopyBlobInfo(const GeneralBlob* src, GeneralBlob* tgt) {
memcpy(&(tgt->time_stamp[0]),
&(src->time_stamp[0]),
src->p_size * sizeof(int64_t));
}
} // namespace serving
} // namespace paddle_serving
} // namespace baidu
@@ -51,16 +51,20 @@ int GeneralInferOp::inference() {
output_blob->SetBatchSize(batch_size);
VLOG(2) << "infer batch size: " << batch_size;
// infer
// Timer timeline;
// double infer_time = 0.0;
// timeline.Start();
Timer timeline;
int64_t start = timeline.TimeStampUS();
timeline.Start();
if (InferManager::instance().infer(GENERAL_MODEL_NAME, in, out, batch_size)) {
LOG(ERROR) << "Failed do infer in fluid model: " << GENERAL_MODEL_NAME;
return -1;
}
// timeline.Pause();
// infer_time = timeline.ElapsedUS();
int64_t end = timeline.TimeStampUS();
CopyBlobInfo(input_blob, output_blob);
AddBlobInfo(output_blob, start);
AddBlobInfo(output_blob, end);
return 0;
}
DEFINE_OP(GeneralInferOp);
......
@@ -20,11 +20,13 @@
#include "core/general-server/op/general_infer_helper.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
#include "core/util/include/timer.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::Timer;
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Request;
@@ -86,9 +88,10 @@ int GeneralReaderOp::inference() {
LOG(ERROR) << "Failed get op tls reader object output";
}
Timer timeline;
int64_t start = timeline.TimeStampUS();
int var_num = req->insts(0).tensor_array_size();
VLOG(2) << "var num: " << var_num;
// read config
VLOG(2) << "start to call load general model_conf op";
baidu::paddle_serving::predictor::Resource &resource =
@@ -197,6 +200,12 @@ int GeneralReaderOp::inference() {
}
}
timeline.Pause();
int64_t end = timeline.TimeStampUS();
res->p_size = 0;
AddBlobInfo(res, start);
AddBlobInfo(res, end);
VLOG(2) << "read data from client success";
return 0;
}
......
@@ -51,6 +51,11 @@ int GeneralResponseOp::inference() {
const Request *req = dynamic_cast<const Request *>(get_request_message());
Timer timeline;
// double response_time = 0.0;
// timeline.Start();
int64_t start = timeline.TimeStampUS();
VLOG(2) << "start to call load general model_conf op";
baidu::paddle_serving::predictor::Resource &resource =
baidu::paddle_serving::predictor::Resource::instance();
@@ -69,8 +74,6 @@ int GeneralResponseOp::inference() {
// response inst with only fetch_var_names
Response *res = mutable_data<Response>();
// res->set_mean_infer_us(infer_time);
for (int i = 0; i < batch_size; ++i) {
FetchInst *fetch_inst = res->add_insts();
for (auto &idx : fetch_index) {
@@ -123,6 +126,18 @@ int GeneralResponseOp::inference() {
}
var_idx++;
}
if (req->profile_server()) {
int64_t end = timeline.TimeStampUS();
VLOG(2) << "p size for input blob: " << input_blob->p_size;
for (int i = 0; i < input_blob->p_size; ++i) {
res->add_profile_time(input_blob->time_stamp[i]);
}
// TODO(guru4elephant): find more elegant way to do this
res->add_profile_time(start);
res->add_profile_time(end);
}
return 0;
}
......
@@ -19,11 +19,13 @@
#include "core/general-server/op/general_text_reader_op.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
#include "core/util/include/timer.h"
namespace baidu {
namespace paddle_serving {
namespace serving {
using baidu::paddle_serving::Timer;
using baidu::paddle_serving::predictor::MempoolWrapper;
using baidu::paddle_serving::predictor::general_model::Tensor;
using baidu::paddle_serving::predictor::general_model::Request;
@@ -54,9 +56,11 @@ int GeneralTextReaderOp::inference() {
return -1;
}
Timer timeline;
int64_t start = timeline.TimeStampUS();
int var_num = req->insts(0).tensor_array_size();
VLOG(2) << "var num: " << var_num;
// read config
VLOG(2) << "start to call load general model_conf op";
baidu::paddle_serving::predictor::Resource &resource =
@@ -157,6 +161,10 @@ int GeneralTextReaderOp::inference() {
}
}
int64_t end = timeline.TimeStampUS();
AddBlobInfo(res, start);
AddBlobInfo(res, end);
VLOG(2) << "read data from client success";
return 0;
}
......
@@ -12,11 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include "core/general-server/op/general_text_response_op.h"
#include <algorithm>
#include <iostream>
#include <memory>
#include <sstream>
#include "core/general-server/op/general_text_response_op.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
#include "core/predictor/framework/resource.h"
@@ -36,12 +36,10 @@ using baidu::paddle_serving::predictor::InferManager;
using baidu::paddle_serving::predictor::PaddleGeneralModelConfig;
int GeneralTextResponseOp::inference() {
const GeneralBlob *input_blob =
get_depend_argument<GeneralBlob>(pre_name());
const GeneralBlob *input_blob = get_depend_argument<GeneralBlob>(pre_name());
if (!input_blob) {
LOG(ERROR) << "Failed mutable depended argument, op: "
<< pre_name();
LOG(ERROR) << "Failed mutable depended argument, op: " << pre_name();
return -1;
}
@@ -49,10 +47,11 @@ int GeneralTextResponseOp::inference() {
int batch_size = input_blob->GetBatchSize();
VLOG(2) << "infer batch size: " << batch_size;
// infer
const Request *req = dynamic_cast<const Request *>(get_request_message());
Timer timeline;
int64_t start = timeline.TimeStampUS();
VLOG(2) << "start to call load general model_conf op";
baidu::paddle_serving::predictor::Resource &resource =
baidu::paddle_serving::predictor::Resource::instance();
@@ -67,15 +66,13 @@ int GeneralTextResponseOp::inference() {
fetch_index[i] =
model_config->_fetch_alias_name_to_index[req->fetch_var_names(i)];
}
// response inst with only fetch_var_names
Response *res = mutable_data<Response>();
// res->set_mean_infer_us(infer_time);
for (int i = 0; i < batch_size; ++i) {
FetchInst *fetch_inst = res->add_insts();
for (auto & idx : fetch_index) {
for (auto &idx : fetch_index) {
Tensor *tensor = fetch_inst->add_tensor_array();
// currently only response float tensor or lod_tensor
tensor->set_elem_type(1);
@@ -85,8 +82,7 @@
} else {
VLOG(2) << "out[" << idx << "] is tensor";
for (int k = 1; k < in->at(idx).shape.size(); ++k) {
VLOG(2) << "shape[" << k - 1 << "]: "
<< in->at(idx).shape[k];
VLOG(2) << "shape[" << k - 1 << "]: " << in->at(idx).shape[k];
tensor->add_shape(in->at(idx).shape[k]);
}
}
@@ -94,7 +90,7 @@ }
}
int var_idx = 0;
for (auto & idx : fetch_index) {
for (auto &idx : fetch_index) {
float *data_ptr = static_cast<float *>(in->at(idx).data.data());
int cap = 1;
for (int j = 1; j < in->at(idx).shape.size(); ++j) {
@@ -102,8 +98,8 @@ }
}
if (model_config->_is_lod_fetch[idx]) {
for (int j = 0; j < batch_size; ++j) {
for (int k = in->at(idx).lod[0][j];
k < in->at(idx).lod[0][j + 1]; k++) {
for (int k = in->at(idx).lod[0][j]; k < in->at(idx).lod[0][j + 1];
k++) {
res->mutable_insts(j)->mutable_tensor_array(var_idx)->add_float_data(
data_ptr[k]);
}
@@ -118,6 +114,18 @@ }
}
var_idx++;
}
if (req->profile_server()) {
int64_t end = timeline.TimeStampUS();
for (int i = 0; i < input_blob->p_size; ++i) {
res->add_profile_time(input_blob->time_stamp[i]);
}
// TODO(guru4elephant): find more elegant way to do this
res->add_profile_time(start);
res->add_profile_time(end);
}
return 0;
}
DEFINE_OP(GeneralTextResponseOp);
......
@@ -38,11 +38,12 @@ message FetchInst {
message Request {
repeated FeedInst insts = 1;
repeated string fetch_var_names = 2;
optional bool profile_server = 3 [ default = false ];
};
message Response {
repeated FetchInst insts = 1;
optional float mean_infer_us = 2;
repeated int64 profile_time = 2;
};
service GeneralModelService {
......
@@ -147,6 +147,7 @@ int InferService::inference(const google::protobuf::Message* request,
TRACEPRINTF("finish to thread clear");
if (_enable_map_request_to_workflow) {
LOG(INFO) << "enable map request == True";
std::vector<Workflow*>* workflows = _map_request_to_workflow(request);
if (!workflows || workflows->size() == 0) {
LOG(ERROR) << "Failed to map request to workflow";
@@ -169,6 +170,7 @@ }
}
}
} else {
LOG(INFO) << "enable map request == False";
TRACEPRINTF("start to execute one workflow");
size_t fsize = _flows.size();
for (size_t fi = 0; fi < fsize; ++fi) {
@@ -233,6 +235,7 @@ int InferService::_execute_workflow(Workflow* workflow,
TRACEPRINTF("finish to copy from");
workflow_time.stop();
LOG(INFO) << "workflow total time: " << workflow_time.u_elapsed();
PredictorMetric::GetInstance()->update_latency_metric(
WORKFLOW_METRIC_PREFIX + dv->full_name(), workflow_time.u_elapsed());
......
@@ -38,11 +38,12 @@ message FetchInst {
message Request {
repeated FeedInst insts = 1;
repeated string fetch_var_names = 2;
optional bool profile_server = 3 [ default = false ];
};
message Response {
repeated FetchInst insts = 1;
optional float mean_infer_us = 2;
repeated int64 profile_time = 2;
};
service GeneralModelService {
......
include(src/CMakeLists.txt)
add_library(utils ${util_srcs})
@@ -38,6 +38,7 @@ class Timer {
double ElapsedMS();
// return elapsed time in sec
double ElapsedSec();
int64_t TimeStampUS();
private:
struct timeval _start;
......
FILE(GLOB srcs ${CMAKE_CURRENT_LIST_DIR}/*.cc)
LIST(APPEND util_srcs ${srcs})
@@ -54,6 +54,11 @@ double Timer::ElapsedMS() { return _elapsed / 1000.0; }
double Timer::ElapsedSec() { return _elapsed / 1000000.0; }
int64_t Timer::TimeStampUS() {
gettimeofday(&_now, NULL);
return _now.tv_usec;
}
int64_t Timer::Tickus() {
gettimeofday(&_now, NULL);
return (_now.tv_sec - _start.tv_sec) * 1000 * 1000L +
......
# Contribution Guideline

## How to contribute

### Contribute Code

You are welcome to contribute to the Paddle Serving project. We sincerely appreciate your contribution; this document explains our workflow and work style.
If you have improvements for Paddle Serving, please send us pull requests! GitHub provides a [howto](https://help.github.com/articles/using-pull-requests/) on submitting pull requests.
## Workflow
Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/). The following steps guide usual contributions.
1. Fork
Our development community has been growing fast; it doesn't make sense for everyone to write into the official repo. So, please file pull requests from your fork. To make a fork, just head over to the GitHub page and click the ["Fork" button](https://help.github.com/articles/fork-a-repo/).
1. Clone
To make a copy of your fork on your local computer, please run:
```bash
git clone https://github.com/your-github-account/Serving
cd Serving
```
1. Create the local feature branch
For daily work like adding a new feature or fixing a bug, please open a feature branch before coding:
```bash
git checkout -b my-cool-stuff
```
1. Commit
Before issuing your first `git commit` command, please install [`pre-commit`](http://pre-commit.com/) by running the following commands:
```bash
pip install pre-commit
pre-commit install
```
Our pre-commit configuration requires clang-format 3.8 for auto-formatting C/C++ code and yapf for Python.
Once installed, `pre-commit` checks the style of code and documentation in every commit. We will see something like the following when you run `git commit`:
```shell
$ git commit
CRLF end-lines remover...............................(no files to check)Skipped
yapf.................................................(no files to check)Skipped
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Detect Private Key...................................(no files to check)Skipped
Fix End of Files.....................................(no files to check)Skipped
clang-formater.......................................(no files to check)Skipped
[my-cool-stuff c703c041] add test file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 233
```
NOTE: The `yapf` installed by `pip install pre-commit` differs slightly from the one installed by `conda install -c conda-forge pre-commit`. Paddle developers use `pip install pre-commit`.
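You can also trigger the hooks manually at any time; this is plain `pre-commit` usage, with an illustrative file path:
```bash
pre-commit run --files core/general-client/src/general_model.cpp  # check specific files
pre-commit run -a                                                 # check the whole repo
```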
1. Build and test
Users can build Paddle Serving natively on Linux; see the [BUILD steps](doc/INSTALL.md).
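A rough sketch of the usual out-of-source CMake flow is below; the flags are illustrative (`CLIENT_ONLY` appears in this commit's CMakeLists changes), and doc/INSTALL.md is the authoritative reference:
```bash
mkdir -p build && cd build
cmake -DCLIENT_ONLY=OFF ..   # OFF builds the server side as well
make -j$(nproc)
```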
1. Keep pulling
An experienced Git user pulls from the official repo often, daily or even hourly, so they notice conflicts with others' work early, and smaller conflicts are easier to resolve.
```bash
git remote add upstream https://github.com/PaddlePaddle/Serving
git pull upstream develop
```
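If the pull does surface a conflict, a typical resolution flow looks like the sketch below (the file name is illustrative):
```bash
git pull upstream develop      # reports a conflict in, say, foo.cpp
# edit foo.cpp and resolve the <<<<<<< / ======= / >>>>>>> markers, then:
git add foo.cpp
git commit                     # concludes the merge
```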
1. Push and file a pull request
You can "push" your local work into your forked repo:
```bash
git push origin my-cool-stuff
```
The push allows you to create a pull request, requesting owners of this [official repo](https://github.com/PaddlePaddle/Serving) to pull your change into the official one.
To create a pull request, please follow [these steps](https://help.github.com/articles/creating-a-pull-request/).
If your change fixes an issue, please write ["Fixes <issue-URL>"](https://help.github.com/articles/closing-issues-using-keywords/) in the description section of your pull request. GitHub will close the issue when the owners merge your pull request.
Please remember to specify some reviewers for your pull request. If you don't know who the right ones are, please follow GitHub's recommendation.
1. Delete local and remote branches
To keep your local workspace and your fork clean, you might want to remove merged branches:
```bash
git push origin :my-cool-stuff
git checkout develop
git pull upstream develop
git branch -d my-cool-stuff
```
### Code Review
- Please feel free to ping your reviewers by sending them the URL of your pull request via IM or email. Please do this after your pull request passes the CI.
- Please answer every comment from your reviewers. If you are going to follow a suggestion, please write "Done"; otherwise, please give a reason.
- If you don't want your reviewers to get overwhelmed by email notifications, you can reply to their comments [in a batch](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/).
- Avoid unnecessary commits. Some developers commit often; it is recommended to squash a sequence of small changes into one commit by running `git commit --amend` instead of `git commit`, as sketched below.
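A minimal sketch of that amend-then-update flow, reusing the `my-cool-stuff` branch from the steps above (`--force-with-lease` is a suggested safety flag, not a project requirement):
```bash
git add path/to/file                               # stage the follow-up change
git commit --amend --no-edit                       # fold it into the previous commit
git push --force-with-lease origin my-cool-stuff   # update the pull request
```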
## Coding Standard
### Code Style
Our C/C++ code follows the [Google style guide](http://google.github.io/styleguide/cppguide.html).
Our Python code follows the [PEP8 style guide](https://www.python.org/dev/peps/pep-0008/).
Please install pre-commit, which automatically reformats the changes to C/C++ and Python code whenever we run `git commit`. To check the whole codebase, we can run `pre-commit run -a`, which is what our Travis CI configuration invokes.
### Unit Tests
Please remember to add related unit tests.
- For C/C++ code, please follow the [`google-test` Primer](https://github.com/google/googletest/blob/master/googletest/docs/primer.md).
- For Python code, please use [Python's standard `unittest` package](http://pythontesting.net/framework/unittest/unittest-introduction/).
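For instance, a minimal `google-test` case might look like the following sketch; the `Add` function and test names are hypothetical, not part of Paddle Serving:
```c++
#include <gtest/gtest.h>

// Hypothetical function under test.
int Add(int a, int b) { return a + b; }

TEST(AddTest, HandlesPositiveInput) { EXPECT_EQ(Add(2, 3), 5); }

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}
```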
### Writing Logs
We use [glog](https://github.com/google/glog) for logging in our C/C++ code.
We use `LOG()` for general logging:
```c++
LOG(INFO) << "Operator FC is taking " << num_inputs << " inputs.";
```
When we run a Paddle Serving application or test, we can specify a logging level. For example:
```bash
GLOG_minloglevel=1 bin/serving
```
- 0 - INFO
- 1 - WARNING
- 2 - ERROR
- 3 - FATAL (be careful, as a FATAL log will generate a core dump)
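As a quick illustration of how this threshold interacts with glog's severities, here is a sketch (the messages are made up; the binary is assumed to initialize glog):
```c++
#include <glog/logging.h>

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);
  // With GLOG_minloglevel=1 in the environment, the INFO line is
  // suppressed while WARNING and above are still written.
  LOG(INFO) << "loaded general model config";  // severity 0
  LOG(WARNING) << "feed shape mismatch";       // severity 1
  LOG(ERROR) << "failed to fetch predictor";   // severity 2
  return 0;
}
```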
@@ -17,6 +17,7 @@ from .proto import sdk_configure_pb2 as sdk
from .proto import general_model_config_pb2 as m_config
import google.protobuf.text_format
import time
import sys
int_type = 0
float_type = 1
@@ -87,6 +88,9 @@ class Client(object):
# map feed names to index
self.client_handle_ = PredictorClient()
self.client_handle_.init(path)
read_env_flags = ["profile_client", "profile_server"]
self.client_handle_.init_gflags([sys.argv[0]] +
["--tryfromenv=" + ",".join(read_env_flags)])
self.feed_names_ = [var.alias_name for var in model_conf.feed_var]
self.fetch_names_ = [var.alias_name for var in model_conf.fetch_var]
self.feed_shapes_ = [var.shape for var in model_conf.feed_var]
@@ -143,9 +147,6 @@ class Client(object):
for i, name in enumerate(fetch_names):
result_map[name] = result[i]
if profile:
result_map["infer_time"] = result[-1][0]
return result_map
def batch_predict(self, feed_batch=[], fetch=[], profile=False):
......