Commit 32aceef1 authored by Tensor Tang

Merge branch 'develop' into 'develop'

add missing files

See merge request paddlepaddle/paddlelitearmbackend!12
---
name: Feature request
about: You could use this template for proposing a suggestion or feature.
---
Welcome to propose suggestions for PaddlePaddle, and thank you for contributing!
When leaving your suggestion, please also provide the following information:
- Version and environment information
1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1
2) CPU/GPU: whether you use a GPU for training; if so, please provide your CUDA and cuDNN version numbers
3) System environment: please describe the system type and version, e.g. Mac OS 10.14
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Suggestion description: please describe in detail the feature you think should be improved
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
Please make sure that this is a feature request.
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
**To Reproduce**
Steps to reproduce the behavior
**Describe the feature and the current behavior/state.**
**Any other info.**
---
name: Inference Issue
about: You could use this template for reporting an inference issue (errors, usage questions, etc.).
---
To get your question resolved quickly, before opening an issue please search for similar ones via: [search issue keywords] [filter by labels] [official docs]
If you do not find a similar issue, please provide the following details when opening one:
- Title: describe your problem concisely and precisely, e.g. "Where is the API documentation for the latest inference library?"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or CommitID
   2) CPU: if inference runs on CPU, please provide the CPU model and the math library in use (MKL/OpenBlas/MKLDNN, etc.)
   3) GPU: if inference runs on GPU, please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please describe the system type and version (e.g. Mac OS 10.14) and the Python version
- Inference information
   1) C++ inference: please provide the version of the inference library package, including its version.txt file
   2) The full CMake command, including include paths
   3) API information (if APIs are called, please provide them)
   4) Source of the inference library: official download / special environment (e.g. built with BCLOUD)
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is an inference issue including the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- CMake commands
- version.txt of the C++ inference library
- API information
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Installation Issue
about: You could use this template for reporting an installation or build issue.
---
To get your question resolved quickly, before opening an issue please search for similar ones via: [search issue keywords] [filter by labels] [official docs]
When opening an issue, please provide the following information according to your situation:
- Title: include the keyword "installation error" / "build error", e.g. "Build error on Mac"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or CommitID
   2) CPU: please provide the CPU model and the math library in use (MKL/OpenBlas/MKLDNN, etc.)
   3) GPU: please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please state the system type and version (e.g. Mac OS 10.14) and the Python version
- Installation method:
1) pip install / docker install
2) Local build: please provide the cmake command and the build command
3) Docker build: please provide the docker image and the build command
Please note any special environment, e.g. offline installation
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is an installation issue including the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- Install method: pip install / install with docker / build from source (without docker) / build within docker
- Other special cases that you think may be related to this problem, e.g. offline install, special internet condition
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Model Issue
about: You could use this template for reporting a model/algorithm/dataset issue.
---
To get your question resolved quickly, before opening an issue please search for similar ones via: [search issue keywords] [filter by labels] [official docs]
When opening an issue, please provide the following information according to your situation:
- Title: describe your problem concisely and precisely, e.g. "Error in the LSTM preceding the SSD model"
- Version and environment information:
   1) PaddlePaddle version: please provide the PaddlePaddle version number, e.g. 1.1, or CommitID
   2) CPU: please provide the CPU model and the math library in use (MKL/OpenBlas/MKLDNN, etc.)
   3) GPU: please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please state the system type and version (e.g. Mac OS 10.14) and the Python version
- Model information
   1) model name 2) dataset name 3) algorithm name 4) model link
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is an issue about models including the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- Name of models & dataset / details of operator
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Others
about: You could use this template for reporting issues not covered by the categories above.
---
To get your question resolved quickly, before opening an issue please search for similar ones via: [search issue keywords] [filter by labels] [official docs]
If you do not find a similar issue, please provide the following details when opening one:
- Title: summarize your problem concisely and precisely
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1, or CommitID
   2) CPU/GPU: if you train on GPU, please provide the GPU driver version and the CUDA and cuDNN version numbers
   3) System environment: please describe the system type and version, e.g. Mac OS 10.14
   4) Python version
   5) GPU memory information
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
If there is no solution, please provide us with the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform and distribution (e.g. Mac OS 10.14)
- Python version
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Training Issue
about: You could use this template for reporting a training issue (errors, usage questions, core dumps, etc.).
---
To get your question resolved quickly, before opening an issue please search for similar ones via: [search issue keywords] [filter by labels] [official docs]
If you do not find a similar issue, please provide the following details when opening one:
- Title: summarize your problem concisely and precisely, e.g. "Insufficient Memory xxx"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1, or CommitID
   2) CPU: if training runs on CPU, please provide the CPU model and the math library in use (MKL/OpenBlas/MKLDNN, etc.)
   3) GPU: if training runs on GPU, please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please describe the system type and version, e.g. Mac OS 10.14, and the Python version
- Training information
   1) single/multiple machines, single/multiple GPUs
   2) GPU memory information
   3) operator information
- Reproduction information: for an error, please give the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message, logs, and a reproducible code snippet
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search the GitHub issues in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is a training issue including the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBlas/MKLDNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Other information: distributed training / operator information / GPU memory
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
@@ -10,6 +10,7 @@ paddle/fluid/operators/distributed/send_recv.proto
 *.vs
 build/
 build_doc/
+build.*
 *.user
 *.sh
 *.bkp
......
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/core/mir/pattern_matcher_high_api.h"
#include <glog/logging.h>
namespace paddle {
namespace lite {
namespace mir {
void FuseBase::PerformPatternMatcher(SSAGraph *graph) {
LOG(INFO) << "\n" << matcher_.pattern().DotString();
// Get subgraphs and record the mir::Node pointers for each PMNode.
auto handler = [&](const PatternMatcher::subgraph_t &subgraph, SSAGraph *g) {
// Get all the registered nodes.
key2nodes_.emplace_back();
for (auto &item : nodes_) {
key2nodes_.back()[item.first] = subgraph.at(item.second);
}
};
matcher_(graph, handler);
}
void FuseBase::DeleteInterNodes(SSAGraph *graph) {
std::set<std::string> keys;
for (auto &node : nodes_) {
if (node.second->IsIntermediate()) {
keys.insert(node.first);
}
}
LOG(INFO) << "keys.size " << keys.size();
std::unordered_set<const Node *> nodes2rm;
for (auto &matched : key2nodes_) {
LOG(INFO) << "get matched " << matched.size();
for (const auto &key : keys) {
nodes2rm.insert(matched.at(key));
}
}
LOG(INFO) << "clean nodes " << nodes2rm.size();
GraphSafeRemoveNodes(graph, nodes2rm);
}
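// Memoize pattern nodes by key, so repeated OpNode()/VarNode() calls with the
// same key always refer to the same underlying PMNode.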
PMNode *FuseBase::GetOrCreateNode(const std::string &key) {
auto it = nodes_.find(key);
if (it != nodes_.end()) {
return it->second;
}
nodes_.emplace(key,
matcher_.mutable_pattern()->NewNode(patterns::UniqueKey(key)));
it = nodes_.find(key);
return it->second;
}
PMNode *FuseBase::OpNode(const std::string &key, const std::string &op_type) {
GetOrCreateNode(key)->set_op_type(op_type);
GetOrCreateNode(key)->AsOp(op_type);
return GetOrCreateNode(key);
}
PMNode *FuseBase::VarNode(const std::string &key) {
GetOrCreateNode(key)->AsVar();
return GetOrCreateNode(key);
}
} // namespace mir
} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include "paddle/fluid/lite/core/mir/node.h"
#include "paddle/fluid/lite/core/mir/pattern_matcher.h"
#include "paddle/fluid/lite/core/mir/ssa_graph.h"
namespace paddle {
namespace lite {
namespace mir {
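// FuseBase drives a whole fusion pass: a subclass describes the sub-graph to
// match in BuildPattern() and builds the replacement op in InsertNewNode();
// the base class then removes the nodes marked as intermediate.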
class FuseBase {
public:
using key2nodes_t = std::map<std::string, Node*>;
virtual ~FuseBase() = default;
void operator()(SSAGraph* graph) {
BuildPattern();
PerformPatternMatcher(graph);
for (const auto& matched : key2nodes_) {
InsertNewNode(graph, matched);
}
DeleteInterNodes(graph);
}
// Build a PMPattern using PMNode.
virtual void BuildPattern() = 0;
// Generate an operator desc with a matched subgraph.
virtual cpp::OpDesc GenOpDesc(const key2nodes_t& matched) = 0;
PMNode* OpNode(const std::string& key, const std::string& op_type);
PMNode* VarNode(const std::string& key);
protected:
virtual void InsertNewNode(SSAGraph* graph, const key2nodes_t& matched) = 0;
private:
void PerformPatternMatcher(SSAGraph* graph);
// Delete nodes that are marked as Intermediate
void DeleteInterNodes(SSAGraph* graph);
private:
PMNode* GetOrCreateNode(const std::string& key);
protected:
PatternMatcher matcher_;
std::map<std::string, PMNode*> nodes_;
std::vector<key2nodes_t> key2nodes_;
};
} // namespace mir
} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/core/mir/pattern_matcher_high_api.h"
#include <gtest/gtest.h>
#include <memory>
#include "paddle/fluid/framework/program_desc.h"
#include "paddle/fluid/lite/core/compatible_tensor.h"
#include "paddle/fluid/lite/core/mir/graph_visualize_pass.h"
#include "paddle/fluid/lite/core/program.h"
namespace paddle {
namespace lite {
namespace mir {
// A demo fuser: fold mul + elementwise_add into a single fc op.
class FcFuser : public FuseBase {
public:
void BuildPattern() override {
// create nodes.
auto* x = VarNode("x");
auto* W = VarNode("W");
auto* b = VarNode("b");
auto* mul = OpNode("mul", "mul");
auto* mul_out = VarNode("mul_out");
auto* add = OpNode("add", "elementwise_add");
auto* Out = VarNode("Out");
// create topology.
// std::vector<PMNode*>({W, x}) >> *mul >> *mul_out;
// std::vector<PMNode*>({mul_out, b}) >> *add >> *Out;
*W >> *mul;
*x >> *mul >> *mul_out;
*b >> *add;
*mul_out >> *add >> *Out;
// Mark the nodes that will be removed after fusion as intermediate.
mul_out->AsIntermediate();
mul->AsIntermediate();
add->AsIntermediate();
}
void InsertNewNode(SSAGraph* graph, const key2nodes_t& matched) override {
auto op_desc = GenOpDesc(matched);
auto fc_op = LiteOpRegistry::Global().Create("fc");
auto mul = matched.at("mul")->stmt()->op;
auto* scope = mul->scope();
auto& valid_places = mul->valid_places();
fc_op->Attach(op_desc, scope);
auto* new_op_node = graph->GraphCreateInstructNode(fc_op, valid_places);
IR_NODE_LINK_TO(matched.at("W"), new_op_node);
IR_NODE_LINK_TO(matched.at("x"), new_op_node);
IR_NODE_LINK_TO(matched.at("b"), new_op_node);
IR_NODE_LINK_TO(new_op_node, matched.at("Out"));
}
private:
cpp::OpDesc GenOpDesc(const key2nodes_t& matched) override {
cpp::OpDesc op_desc;
op_desc.SetType("fc");
op_desc.SetInput("Input", {matched.at("x")->arg()->name});
op_desc.SetInput("W", {matched.at("W")->arg()->name});
op_desc.SetInput("Bias", {matched.at("b")->arg()->name});
op_desc.SetOutput("Out", {matched.at("Out")->arg()->name});
op_desc.SetAttr("in_num_col_dims", 1);
return op_desc;
}
};
std::unique_ptr<SSAGraph> BuildGraph(framework::ProgramDesc* program_desc,
const std::shared_ptr<Scope>& scope,
const std::vector<Place>& valid_places) {
auto* main_block = program_desc->MutableBlock(0);
auto* mul = main_block->AppendOp();
auto* add = main_block->AppendOp();
main_block->Var("x");
main_block->Var("b");
main_block->Var("mul_out");
main_block->Var("w");
main_block->Var("out");
main_block->Var("out1");
scope->Var("w")->GetMutable<lite::Tensor>();
scope->Var("b")->GetMutable<lite::Tensor>();
scope->Var("mul_out")->GetMutable<lite::Tensor>();
scope->Var("w")->GetMutable<lite::Tensor>();
scope->Var("out")->GetMutable<lite::Tensor>();
scope->Var("out1")->GetMutable<lite::Tensor>();
mul->SetInput("X", {"x"});
mul->SetInput("Y", {"w"});
mul->SetOutput("Out", {"mul_out"});
mul->SetType("mul");
mul->SetAttr("x_num_col_dims", 1);
mul->SetAttr("y_num_col_dims", 1);
add->SetInput("X", {"mul_out"});
add->SetInput("Y", {"b"});
add->SetOutput("Out", {"out"});
add->SetType("elementwise_add");
add->SetAttr("axis", 1);
program_desc->Flush();
lite::Program program(*program_desc->Proto(), scope, valid_places);
auto graph = std::unique_ptr<SSAGraph>(new SSAGraph());
graph->Build(program, valid_places);
return graph;
}
TEST(pattern_matcher2, graph_test) {
framework::ProgramDesc program_desc;
std::vector<Place> places{{TARGET(kHost), PRECISION(kFloat)}};
auto scope = std::make_shared<Scope>();
auto graph = BuildGraph(&program_desc, scope, places);
ASSERT_EQ(graph->nodes().size(),
8UL /*real nodes*/ + 2UL /*feed op + fetch op*/);
Visualize(graph.get());
}
TEST(pattern_matcher2, test) {
framework::ProgramDesc program_desc;
std::vector<Place> places{{TARGET(kHost), PRECISION(kFloat)}};
auto scope = std::make_shared<Scope>();
auto graph = BuildGraph(&program_desc, scope, places);
const int num_nodes = graph->nodes().size();
FcFuser fuser;
fuser(graph.get());
ASSERT_EQ(graph->nodes().size(),
num_nodes - 3UL /*nodes removed */ + 1UL /* fused fc node*/);
}
} // namespace mir
} // namespace lite
} // namespace paddle
USE_LITE_OP(fc);
USE_LITE_OP(mul);
USE_LITE_OP(elementwise_add);
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/core/mir/pattern_matcher.h"
#include <gtest/gtest.h>
namespace paddle {
namespace lite {
namespace mir {
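// Build a small test graph of five ops and four vars:
//   o1 -> v1 -> o2 -> v2 -> {o3, o4}
//               o2 -> v3 -> o5
//                     o3 -> v4 -> o5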
void BuildGraph(SSAGraph* g) {
g->mutable_nodes().emplace_back();
Node& o1 = g->mutable_nodes().back();
o1.AsStmt().op_type = "op1";
g->mutable_nodes().emplace_back();
Node& o2 = g->mutable_nodes().back();
o2.AsStmt().op_type = "op2";
g->mutable_nodes().emplace_back();
Node& o3 = g->mutable_nodes().back();
o3.AsStmt().op_type = "op3";
g->mutable_nodes().emplace_back();
Node& o4 = g->mutable_nodes().back();
o4.AsStmt().op_type = "op4";
g->mutable_nodes().emplace_back();
Node& o5 = g->mutable_nodes().back();
o5.AsStmt().op_type = "op5";
g->mutable_nodes().emplace_back();
Node& v1 = g->mutable_nodes().back();
v1.AsArg("var1");
g->mutable_nodes().emplace_back();
Node& v2 = g->mutable_nodes().back();
v2.AsArg("var2");
g->mutable_nodes().emplace_back();
Node& v3 = g->mutable_nodes().back();
v3.AsArg("var3");
g->mutable_nodes().emplace_back();
Node& v4 = g->mutable_nodes().back();
v4.AsArg("var4");
// o1->v1->o2
o1.outlinks.push_back(&v1);
o2.inlinks.push_back(&v1);
v1.inlinks.push_back(&o1);
v1.outlinks.push_back(&o2);
// o2->v2->o3
// o2->v2->o4
o2.outlinks.push_back(&v2);
o3.inlinks.push_back(&v2);
o4.inlinks.push_back(&v2);
v2.inlinks.push_back(&o2);
v2.outlinks.push_back(&o3);
v2.outlinks.push_back(&o4);
// o2->v3->o5
o2.outlinks.push_back(&v3);
o5.inlinks.push_back(&v3);
v3.inlinks.push_back(&o2);
v3.outlinks.push_back(&o5);
// o3-v4->o5
o3.outlinks.push_back(&v4);
o5.inlinks.push_back(&v4);
v4.inlinks.push_back(&o3);
v4.outlinks.push_back(&o5);
}
TEST(PMPattern, NewNode) {
PMPattern x;
auto* n = x.NewNode([](const Node* x) { return true; });
ASSERT_TRUE(n);
ASSERT_EQ(x.nodes_.size(), 1UL);
}
TEST(PMPattern, AddEdge) {
PMPattern x;
auto* a = x.NewNode([](const Node* x) { return true; });
auto* b = x.NewNode([](const Node* x) { return true; });
ASSERT_TRUE(a);
ASSERT_TRUE(b);
x.AddEdge(a, b);
ASSERT_EQ(x.nodes_.size(), 2UL);
ASSERT_EQ(x.edges_.size(), 1UL);
ASSERT_EQ(x.edges_.front().first, a);
ASSERT_EQ(x.edges_.front().second, b);
ASSERT_EQ(x.nodes().size(), 2UL);
ASSERT_EQ(x.edges().size(), 1UL);
ASSERT_EQ(x.edges().front().first, a);
ASSERT_EQ(x.edges().front().second, b);
}
TEST(PatternMatcher, MarkPMNodesInGraph) {
PatternMatcher x;
// mark o2, o3, v2
// The pattern is a graph:
// o2(a node named o2) -> v2(a node named v2)
// v2 -> o3(a node named o3)
auto* o2 = x.pattern_.NewNode([](const Node* node) {
// The teller can be any condition, such as op type, or variable's shape.
return node && node->IsStmt() && node->stmt()->op_type == "op2";
});
auto* o3 = x.pattern_.NewNode([](const Node* node) {
// The teller can be any condition, such as op type, or variable's shape.
return node && node->IsStmt() && node->stmt()->op_type == "op3";
});
auto* v2 = x.pattern_.NewNode([](const Node* node) {
// The teller can be any condition, such as op type, or variable's shape.
return node && node->IsArg() && node->arg()->name == "var2";
});
ASSERT_FALSE(o2->Tell(nullptr));
ASSERT_FALSE(o3->Tell(nullptr));
ASSERT_FALSE(v2->Tell(nullptr));
x.pattern_.AddEdge(o2, v2);
x.pattern_.AddEdge(v2, o3);
ASSERT_EQ(x.pattern_.edges().size(), 2UL);
ASSERT_EQ(x.pattern_.edges()[0].first, o2);
ASSERT_EQ(x.pattern_.edges()[0].second, v2);
ASSERT_EQ(x.pattern_.edges()[1].first, v2);
ASSERT_EQ(x.pattern_.edges()[1].second, o3);
SSAGraph graph;
BuildGraph(&graph);
x.MarkPMNodesInGraph(&graph);
ASSERT_EQ(x.pmnodes2nodes_.size(), 3UL);
auto subgraphs = x.DetectPatterns();
ASSERT_EQ(subgraphs.size(), 1UL);
}
TEST(PatternMatcher, MultiSubgraph) {
SSAGraph graph;
BuildGraph(&graph);
PatternMatcher x;
// The pattern is a graph:
// op -> var
auto* any_op = x.mutable_pattern()->NewNode(
[](const Node* node) {
return node->IsStmt() && (node->stmt()->op_type == "op2" ||
node->stmt()->op_type == "op3");
},
"OP0");
auto* any_var =
x.mutable_pattern()
->NewNode([](const Node* node) { return node->IsArg(); }, "VAR")
->AsIntermediate();
auto* any_op1 = x.mutable_pattern()->NewNode(
[](const Node* node) { return node->IsStmt(); }, "OP1");
x.mutable_pattern()->AddEdge(any_op, any_var);
x.mutable_pattern()->AddEdge(any_var, any_op1);
int count = 0;
PatternMatcher::handle_t handle = [&](const PatternMatcher::subgraph_t& s,
SSAGraph* g) {
LOG(INFO) << "Detect " << s.at(any_op)->stmt()->op_type << " -> "
<< s.at(any_var)->arg()->name << " -> "
<< s.at(any_op1)->stmt()->op_type;
count++;
};
x(&graph, handle);
// 1. Detect op3 -> var4 -> op5
// 2. Detect op2 -> var2 -> op3
// 3. Detect op2 -> var2 -> op4
// 4. Detect op2 -> var3 -> op5
// But 2, 3 and 4 overlap, so only one of them is kept; the final
// detections are 1 and 2.
ASSERT_GE(count, 1);
ASSERT_LE(count, 2);
}
TEST(PatternMatcher, IntermediateCheck) {
SSAGraph graph;
BuildGraph(&graph);
// o2->v2->o3
// o2->v2->o4
// check o2+o3 fuse, should fail because v2 also link to o4.
PatternMatcher matcher;
auto* op2 = matcher.mutable_pattern()->NewNode(
[](const Node* x) {
return x && x->IsStmt() && x->stmt()->op_type == "op2";
},
"op2");
auto* op3 = matcher.mutable_pattern()->NewNode(
[](const Node* x) {
return x && x->IsStmt() && x->stmt()->op_type == "op3";
},
"op3");
auto* v2 = matcher.mutable_pattern()
->NewNode(
[](const Node* x) {
return x && x->IsArg() && x->arg()->name == "var2";
},
"var2")
->AsIntermediate();
v2->LinksFrom({op2}).LinksTo({op3});
int count = 0;
matcher(&graph, [&](const PatternMatcher::subgraph_t& g, SSAGraph* graph) {
++count;
});
EXPECT_EQ(count, 0);
count = 0;
v2->AsInput();
matcher(&graph, [&](const PatternMatcher::subgraph_t& g, SSAGraph* graph) {
++count;
});
ASSERT_EQ(count, 1);
}
} // namespace mir
} // namespace lite
} // namespace paddle
lite_cc_library(gen_code_lite SRCS gen_code.cc
DEPS program_lite op_lite scope_lite
cpp_op_desc_lite
HVY_DEPS operator)
lite_cc_library(paddle_infer_gencode SRCS paddle_infer.cc DEPS program_lite utils_lite)
if (NOT LITE_WITH_LIGHT_WEIGHT_FRAMEWORK)
lite_cc_test(test_gen_code_lite SRCS gen_code_test.cc DEPS gen_code_lite ${tensor_lite}
mul_op_lite
compatible_pb_lite
model_parser_lite
X86_DEPS mul_compute_x86
ARM_DEPS mul_compute_arm
ARGS --optimized_model=${LITE_MODEL_DIR}/lite_naive_model_opt SERIAL)
lite_cc_library(__generated_code__
SRCS ${CMAKE_BINARY_DIR}/paddle/fluid/lite/gen_code/__generated_code__.cc
DEPS scope_lite op_lite kernel_lite paddle_infer_gencode
)
lite_cc_test(test_generated_code SRCS generated_code_test.cc DEPS __generated_code__
${ops_lite} ${host_kernels}
X86_DEPS ${x86_kernels}
)
add_dependencies(__generated_code__ test_gen_code_lite)
endif()
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/gen_code/gen_code.h"
#include <algorithm>
#include <string>
#include <vector>
namespace paddle {
namespace lite {
namespace gencode {
void Module::AddWeight(const std::string &name, const TensorRepr &tensor) {
auto w_name = WeightUniqueName();
Line(string_format("// Create weight: %s", name.c_str()));
// auto* w0 = scope.Var("w0")->GetMutable<lite::Tensor>();
Line(string_format("auto* %s = scope->Var(%s)->GetMutable<lite::Tensor>();",
w_name.c_str(), Repr(name).c_str()));
// lite::DDim w_ddim({1, 2})
Line(string_format("lite::DDim %s_ddim(std::vector<int64_t>(%s));",
w_name.c_str(), tensor.ddim.repr().c_str()));
// std::vector<float> w_data({});
auto w_data_repr = DataRepr(
std::string(static_cast<const char *>(tensor.raw_data), tensor.num_bytes),
tensor.dtype);
Line(string_format("std::vector<%s> %s_data({%s});",
PrecisionToStr(tensor.dtype).c_str(), w_name.c_str(),
w_data_repr.c_str()));
// w0->Assign<float, lite::DDim, TARGET(kX86)>(w0_data.data(), w0_ddim);
Line(string_format(
"%s->Assign<%s, lite::DDim, TARGET(kX86)>(%s_data.data(), %s_ddim);",
w_name.c_str(), PrecisionToStr(tensor.dtype).c_str(), w_name.c_str(),
w_name.c_str()));
Line("");
}
void Module::AddHeaderIncludeGenCode() {
Line("");
Line("#include <string>");
Line("#include <vector>");
Line("#include \"paddle/fluid/lite/core/compatible_tensor.h\"");
Line("#include \"paddle/fluid/lite/core/context.h\"");
Line("#include \"paddle/fluid/lite/gen_code/paddle_infer.h\"");
Line("#include \"paddle/fluid/lite/core/op_registry.h\"");
Line("#include \"paddle/fluid/lite/core/scope.h\"");
Line("#include \"paddle/fluid/lite/model_parser/cpp/op_desc.h\"");
Line("");
Line("");
}
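// Serialize a tensor's raw bytes as a comma-separated C++ initializer list;
// only float payloads are supported for now.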
std::string Module::DataRepr(const std::string &raw_data, PrecisionType dtype) {
std::stringstream ss;
switch (dtype) {
case PRECISION(kFloat): {
const float *raw = reinterpret_cast<const float *>(raw_data.c_str());
int num_elems = raw_data.size() / sizeof(float);
if (num_elems) {
for (int i = 0; i < num_elems - 1; i++) {
ss << raw[i] << ",";
}
ss << raw[num_elems - 1];
}
} break;
default:
LOG(FATAL) << "Unsupported type " << PrecisionToStr(dtype);
}
return ss.str();
}
void Module::AddOpDescHelper(const std::string &op_id,
const cpp::OpDesc &desc) {
std::string desc_var = op_id + "_desc";
Line(string_format("lite::cpp::OpDesc %s;", desc_var.c_str()));
auto vec_str_repr = [](const std::vector<std::string> &vec) {
return Repr(vec);
};
for (auto &item : desc.inputs()) {
Line(string_format("%s.SetInput(%s, %s);", desc_var.c_str(),
Repr(item.first).c_str(),
vec_str_repr(item.second).c_str()));
}
for (auto &item : desc.outputs()) {
Line(string_format("%s.SetOutput(%s, %s);", desc_var.c_str(),
Repr(item.first).c_str(),
vec_str_repr(item.second).c_str()));
}
auto attr_repr = [&](const std::string &name) -> std::string {
using AttrType = OpDescAPI::AttrType;
auto type = desc.GetAttrType(name);
switch (type) {
case AttrType::INT:
return std::to_string(desc.GetAttr<int>(name));
case AttrType::FLOAT:
return std::to_string(desc.GetAttr<float>(name));
case AttrType::BOOLEAN:
return std::to_string(desc.GetAttr<bool>(name));
case AttrType::STRING:
return "\"" + desc.GetAttr<std::string>(name) + "\"";
case AttrType::STRINGS: {
std::vector<std::string> tmp;
auto vals = desc.GetAttr<std::vector<std::string>>(name);
std::transform(vals.begin(), vals.end(), std::back_inserter(tmp),
[](const std::string &x) { return Repr(x); });
return "{" + Join(tmp, ",") + "}";
}
default:
LOG(FATAL) << "Unsupported attribute type: " << static_cast<int>(type);
}
return "";
};
auto attr_type_repr = [&](const std::string &name) -> std::string {
using AttrType = OpDescAPI::AttrType;
auto type = desc.GetAttrType(name);
switch (type) {
case AttrType::INT:
return "int";
case AttrType::FLOAT:
return "float";
case AttrType::BOOLEAN:
return "bool";
case AttrType::STRING:
return "std::string";
case AttrType::STRINGS:
return "std::vector<std::string>";
default:
LOG(FATAL) << "Unsupported attribute type: " << static_cast<int>(type);
}
return "unk_t";
};
for (auto &item : desc.AttrNames()) {
// Drop the python information.
if (item == "op_callstack") continue;
auto attr_type = attr_type_repr(item);
auto attr_val = attr_repr(item);
Line(string_format("%s.SetAttr<%s>(%s, %s);", //
desc_var.c_str(), attr_type.c_str(), Repr(item).c_str(),
attr_val.c_str()));
}
}
void Module::AddOp(const cpp::OpDesc &op) {
auto op_name = OpUniqueName();
AddOpDescHelper(op_name, op);
Line(string_format("// Create Op: %s", op.Type().c_str()));
Line(string_format("auto %s = lite::LiteOpRegistry::Global().Create(\"%s\");",
op_name.c_str(), op.Type().c_str()));
CHECK(op.HasAttr(kKernelTypeAttr))
<< "the kernel type should be specified before generating code.";
auto kernel_type = op.GetAttr<std::string>(kKernelTypeAttr);
Line(string_format("%s->Attach(%s, exec_scope);", op_name.c_str(),
(op_name + "_desc").c_str()));
// Create kernel
auto kernel_name = KernelUniqueName();
Line(string_format(
"auto %s = std::move(%s->CreateKernels(valid_places, \"%s\").front());",
kernel_name.c_str(), op_name.c_str(), kernel_type.c_str()));
// Set Context for kernel
// clang-format off
Line(string_format("%s->SetContext(lite::ContextScheduler::Global().NewContext(%s->target()));", kernel_name.c_str(), kernel_name.c_str())); // NOLINT
// clang-format on
Line(string_format("ops.push_back(%s);", op_name.c_str()));
Line(string_format("kernels.push_back(std::move(%s));", kernel_name.c_str()));
op_kinds_.insert(op.Type());
kernel_kinds_.insert(kernel_type);
}
} // namespace gencode
} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <set>
#include <string>
#include <vector>
#include "paddle/fluid/lite/core/compatible_tensor.h"
#include "paddle/fluid/lite/core/framework.pb.h"
#include "paddle/fluid/lite/core/program.h"
#include "paddle/fluid/lite/core/target_wrapper.h"
#include "paddle/fluid/lite/model_parser/cpp/op_desc.h"
#include "paddle/fluid/lite/model_parser/desc_apis.h"
#include "paddle/fluid/lite/utils/string.h"
namespace paddle {
namespace lite {
namespace gencode {
struct TensorRepr {
TensorRepr() = default;
TensorRepr(PrecisionType dtype, const std::vector<int64_t> &ddim,
void *raw_data, size_t num_bytes)
: dtype(dtype), ddim(ddim), raw_data(raw_data), num_bytes(num_bytes) {}
PrecisionType dtype;
lite::DDim ddim;
const void *raw_data;
size_t num_bytes{};
};
class Module {
std::vector<cpp::OpDesc> ops;
std::vector<TensorRepr> weights;
std::vector<std::string> tmp_vars_;
std::stringstream stream_;
std::set<std::string> kernel_kinds_;
std::set<std::string> op_kinds_;
int line_indent_{};
const int indent_unit_{2};
public:
void NewOp(const cpp::OpDesc &desc) { ops.push_back(desc); }
void NewWeight(const TensorRepr &x) { weights.push_back(x); }
void NewTmpVar(const std::string &x) { tmp_vars_.push_back(x); }
std::stringstream &stream() { return stream_; }
void AddHeaderIncludeGenCode();
void AddNamespaceBegin() {
Line("namespace paddle {");
Line("namespace gencode{");
Line("");
}
void AddNamespaceEnd() {
Line("");
Line("} // namespace gencode");
Line("} // namespace paddle");
}
void AddInitFuncBegin() {
Line("void PaddlePredictor::Init() {");
Line("");
IncIndent();
}
void AddInitFuncEnd() {
DecIndent();
Line("");
Line("}");
}
void AddScopeDecl() {
Line("lite::Scope* scope = static_cast<lite::Scope*>(raw_scope_);");
// clang-format off
Line("lite::Scope* exec_scope = static_cast<lite::Scope*>(raw_exe_scope_);"); // NOLINT
// clang-format on
// Create feed and fetch in exec_scope.
Line(string_format("exec_scope->Var(%s);", Repr("feed").c_str()));
Line(string_format("exec_scope->Var(%s);", Repr("fetch").c_str()));
}
void AddValidPlaceDecl() {
// clang-format off
Line("std::vector<lite::Place> valid_places({lite::Place({TARGET(kX86), PRECISION(kFloat), DATALAYOUT(kNCHW)}), lite::Place({TARGET(kHost), PRECISION(kAny), DATALAYOUT(kAny)})});"); // NOLINT
// clang-format on
}
void AddMemberCast() {
Line("// Cast the raw members");
// clang-format off
Line(string_format("auto& ops = *static_cast<std::vector<std::shared_ptr<lite::OpLite>>*>(raw_ops_);")); // NOLINT
Line(string_format("auto& kernels = *static_cast<std::vector<std::unique_ptr<lite::KernelBase>>*>(raw_kernels_);")); // NOLINT
// clang-format on
Line("");
}
void AddWeight(const std::string &name, const TensorRepr &tensor);
void AddTmpVar(const std::string &x) {
Line(string_format("// Create temporary variable: %s", x.c_str()));
Line(string_format("exec_scope->Var(%s);", Repr(x).c_str()));
Line("");
}
void AddOp(const cpp::OpDesc &op);
void AddOpDescHelper(const std::string &op_id, const cpp::OpDesc &desc);
void AddOpCompileDeps() {
Line("");
Line("// Add Operator compile deps");
for (auto &op_type : op_kinds_) {
Line(string_format("USE_LITE_OP(%s)", op_type.c_str()));
}
Line("");
}
void AddKernelCompileDeps() {
Line("// Add Kernel compile deps");
std::string op_type, alias;
Place place;
for (auto &kernel_type : kernel_kinds_) {
KernelBase::ParseKernelType(kernel_type, &op_type, &alias, &place);
Line(string_format("USE_LITE_KERNEL(%s, %s, %s, %s, %s)", //
op_type.c_str(), //
TargetRepr(place.target).c_str(),
PrecisionRepr(place.precision).c_str(),
DataLayoutRepr(place.layout).c_str(), alias.c_str()));
}
}
private:
std::string WeightUniqueName() const {
return "w_" + std::to_string(weight_counter_++);
}
std::string TmpVarUniqueName() const {
return "tmp_" + std::to_string(tmp_var_counter_++);
}
std::string OpUniqueName() const {
return "op_" + std::to_string(op_counter_++);
}
std::string KernelUniqueName() const {
return "kernel_" + std::to_string(kernel_counter_++);
}
std::string DataRepr(const std::string &raw_data, PrecisionType dtype);
void IncIndent() { line_indent_++; }
void DecIndent() { line_indent_--; }
void Line(const std::string &x) {
std::string indent_str(line_indent_ * indent_unit_, ' ');
stream() << indent_str << x << "\n";
}
private:
mutable int weight_counter_{};
mutable int tmp_var_counter_{};
mutable int op_counter_{};
mutable int kernel_counter_{};
};
class ProgramCodeGenerator {
public:
ProgramCodeGenerator(const framework::proto::ProgramDesc &program,
const lite::Scope &exec_scope)
: program_(program), exec_scope_(exec_scope) {
LOG(INFO) << program.DebugString();
}
std::string GenCode() {
Module m;
m.AddHeaderIncludeGenCode();
m.AddNamespaceBegin();
m.AddInitFuncBegin();
m.AddMemberCast();
m.AddScopeDecl();
m.AddValidPlaceDecl();
AddWeights(&m);
AddTmpVars(&m);
AddOps(&m);
m.AddInitFuncEnd();
m.AddNamespaceEnd();
m.AddOpCompileDeps();
m.AddKernelCompileDeps();
return m.stream().str();
}
void AddWeights(Module *m) {
for (auto &var : program_.blocks(0).vars()) {
if (var.persistable()) {
auto name = var.name();
if (name == "feed" || name == "fetch") continue;
const auto &tensor = exec_scope_.FindVar(name)->Get<lite::Tensor>();
TensorRepr repr;
TensorToRepr(tensor, &repr);
m->AddWeight(name, repr);
}
}
}
void AddTmpVars(Module *m) {
for (auto &var : program_.blocks(0).vars()) {
if (!var.persistable()) {
m->AddTmpVar(var.name());
}
}
}
void AddOps(Module *m) {
for (auto &op : program_.blocks(0).ops()) {
pb::OpDesc pb_desc(op);
cpp::OpDesc cpp_desc;
TransformOpDescPbToCpp(pb_desc, &cpp_desc);
m->AddOp(cpp_desc);
}
}
private:
void TensorToRepr(const lite::Tensor &tensor, TensorRepr *repr) {
repr->ddim = tensor.dims();
// TODO(Superjomn) support other types.
repr->dtype = PRECISION(kFloat);
repr->raw_data = tensor.data<float>();
repr->num_bytes = repr->ddim.production() * sizeof(float);
}
private:
const framework::proto::ProgramDesc &program_;
const lite::Scope &exec_scope_;
};
} // namespace gencode
} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/gen_code/gen_code.h"
#include <gflags/gflags.h>
#include <gtest/gtest.h>
#include <fstream>
#include <string>
#include <utility>
#include <vector>
#include "paddle/fluid/lite/core/compatible_tensor.h"
#include "paddle/fluid/lite/core/context.h"
#include "paddle/fluid/lite/core/op_registry.h"
#include "paddle/fluid/lite/core/scope.h"
#include "paddle/fluid/lite/model_parser/cpp/op_desc.h"
#include "paddle/fluid/lite/model_parser/model_parser.h"
DEFINE_string(optimized_model, "", "");
DEFINE_string(generated_code_file, "__generated_code__.cc", "");
namespace paddle {
namespace lite {
namespace gencode {
// Manually construct a program.
TEST(gen_code, manual) {
// For holding the weights.
lite::Scope scope;
// For holding the temporary variables.
auto &tmp_scope = scope.NewScope();
// Create weight variables.
auto *w0 = scope.Var("w0")->GetMutable<lite::Tensor>();
// Create temporary variables.
auto *a = tmp_scope.Var("x")->GetMutable<lite::Tensor>();
tmp_scope.Var("out")->GetMutable<lite::Tensor>();
// Set weights.
std::vector<float> w0_data({0, 1, 2, 3});
w0->Assign<float, lite::DDim, TARGET(kX86)>(
w0_data.data(), lite::DDim{std::vector<int64_t>({2, 2})});
std::vector<float> a_data({0, 1, 2, 3});
a->Assign<float, lite::DDim, TARGET(kX86)>(
a_data.data(), lite::DDim{std::vector<int64_t>({2, 2})});
std::vector<Place> valid_places({
Place{TARGET(kX86), PRECISION(kFloat)},
Place{TARGET(kHost), PRECISION(kFloat)},
Place{TARGET(kHost), PRECISION(kAny)},
});
auto mul_op = LiteOpRegistry::Global().Create("mul");
cpp::OpDesc mul_op_desc;
mul_op_desc.SetType("mul");
mul_op_desc.SetInput("X", {"x"});
mul_op_desc.SetInput("Y", {"w0"});
mul_op_desc.SetAttr("x_num_col_dims", 1);
mul_op_desc.SetAttr("y_num_col_dims", 1);
mul_op_desc.SetOutput("Out", {"out"});
mul_op->Attach(mul_op_desc, &tmp_scope);
auto mul_kernel = std::move(mul_op->CreateKernels(valid_places).front());
auto fc_ctx = ContextScheduler::Global().NewContext(TARGET(kX86));
mul_op->CheckShape();
mul_op->InferShape();
mul_kernel->SetContext(std::move(fc_ctx));
mul_kernel->Launch();
}
TEST(gen_code, auto_gen) {
std::vector<float> w0_data({0, 1, 2, 3});
TensorRepr w0(PRECISION(kFloat), std::vector<int64_t>({2, 2}), w0_data.data(),
w0_data.size() * sizeof(float));
std::vector<float> w1_data({0.01, 1.2, 2.3, 3.4, 1.1, 2.2});
TensorRepr w1(PRECISION(kFloat), std::vector<int64_t>({3, 2}), w1_data.data(),
w1_data.size() * sizeof(float));
cpp::OpDesc op0;
op0.SetType("mul");
op0.SetInput("X", {"a", "b"});
op0.SetOutput("Out", {"out0"});
op0.SetAttr<std::string>("desc", "this is a desc");
op0.SetAttr<int>("x_col", 1);
op0.SetAttr<int>("y_col", 2);
op0.SetAttr<std::string>(kKernelTypeAttr, "x86");
gencode::Module module;
module.AddHeaderIncludeGenCode();
module.AddNamespaceBegin();
module.AddInitFuncBegin();
module.AddMemberCast();
module.AddWeight("w0", w0);
module.AddWeight("w1", w1);
module.AddTmpVar("a");
module.AddTmpVar("b");
module.AddOp(op0);
module.AddInitFuncEnd();
module.AddNamespaceEnd();
LOG(INFO) << module.stream().str();
}
TEST(gen_code, optimized_program) {
lite::Scope scope;
framework::proto::ProgramDesc desc;
LoadModel(FLAGS_optimized_model, &scope, &desc);
ProgramCodeGenerator codegen(desc, scope);
std::ofstream file(FLAGS_generated_code_file);
file << codegen.GenCode();
file.close();
}
} // namespace gencode
} // namespace lite
} // namespace paddle
USE_LITE_OP(mul);
#ifdef LITE_WITH_X86
USE_LITE_KERNEL(mul, kX86, kFloat, kNCHW, def);
#endif
#ifdef LITE_WITH_ARM
USE_LITE_KERNEL(mul, kARM, kFloat, kNCHW, def);
#endif
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <glog/logging.h>
#include <gtest/gtest.h>
#include "paddle/fluid/lite/gen_code/paddle_infer.h"
namespace paddle {
namespace lite {
TEST(PaddlePredictor, Init) {
gencode::PaddlePredictor predictor;
predictor.Init();
}
TEST(PaddlePredictor, Run) {
gencode::PaddlePredictor predictor;
predictor.Init();
LOG(INFO) << "run the generated code";
auto input_tensor = predictor.GetInput(0);
input_tensor->Resize(std::vector<int64_t>({100, 100}));
auto* data = input_tensor->mutable_data<float>();
for (int i = 0; i < 100 * 100; i++) {
data[i] = i;
}
predictor.Run();
auto output_tensor = predictor.GetOutput(0);
LOG(INFO) << "output: " << output_tensor->data<float>()[0];
}
} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/gen_code/paddle_infer.h"
#include "paddle/fluid/lite/core/compatible_tensor.h"
#include "paddle/fluid/lite/core/op_lite.h"
namespace paddle {
namespace gencode {
void Tensor::Resize(const Tensor::ddim_t &shape) {
CHECK(raw_mutable_tensor_);
auto *tensor = static_cast<lite::Tensor *>(raw_mutable_tensor_);
tensor->Resize(shape);
}
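// X-macro: FOR_EACH_TYPE(HANDLE) expands HANDLE(T) for every supported
// element type, stamping out the explicit specializations below.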
#define FOR_EACH_TYPE(HANDLE) \
HANDLE(int); \
HANDLE(float); \
HANDLE(int8_t); \
HANDLE(int64_t);
#define IMPL_DATA(T) \
template <> \
const T *Tensor::data<T>() const { \
CHECK(raw_tensor_); \
const auto *tensor = static_cast<const lite::Tensor *>(raw_tensor_); \
return tensor->data<T>(); \
}
FOR_EACH_TYPE(IMPL_DATA);
#undef IMPL_DATA
#define IMPL_MUTABLE_DATA(T) \
template <> \
T *Tensor::mutable_data<T>() { \
CHECK(raw_mutable_tensor_); \
auto *tensor = static_cast<lite::Tensor *>(raw_mutable_tensor_); \
return tensor->mutable_data<T>(); \
}
FOR_EACH_TYPE(IMPL_MUTABLE_DATA);
#undef IMPL_MUTABLE_DATA
PaddlePredictor::PaddlePredictor() {
raw_ops_ = new std::vector<std::shared_ptr<lite::OpLite>>;
raw_kernels_ = new std::vector<std::unique_ptr<lite::KernelBase>>;
raw_scope_ = new lite::Scope;
raw_exe_scope_ = &(static_cast<lite::Scope *>(raw_scope_)->NewScope());
}
std::unique_ptr<Tensor> PaddlePredictor::GetTensor(
const std::string &id) const {
auto *exe_scope = static_cast<lite::Scope *>(raw_exe_scope_);
const auto *var = exe_scope->FindVar(id);
const auto &tensor = var->Get<lite::Tensor>();
return std::unique_ptr<Tensor>(new Tensor(&tensor, nullptr));
}
std::unique_ptr<Tensor> PaddlePredictor::GetMutableTensor(
const std::string &id) {
auto *exe_scope = static_cast<lite::Scope *>(raw_exe_scope_);
auto *var = exe_scope->FindVar(id);
auto *tensor = var->GetMutable<lite::Tensor>();
return std::unique_ptr<Tensor>(new Tensor(nullptr, tensor));
}
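// Helpers that cast the type-erased void* members of PaddlePredictor back to
// their concrete types; used by the destructor and Run() below.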
#define CAST_OPS \
auto *ops = \
static_cast<std::vector<std::shared_ptr<lite::OpLite>> *>(raw_ops_);
#define CAST_KERNELS \
auto *kernels = \
static_cast<std::vector<std::unique_ptr<lite::KernelBase>> *>( \
raw_kernels_);
#define CAST_SCOPE auto *scope = static_cast<lite::Scope *>(raw_scope_);
PaddlePredictor::~PaddlePredictor() {
CAST_OPS
CAST_KERNELS
CAST_SCOPE
if (ops) {
delete ops;
}
if (kernels) {
delete kernels;
}
if (scope) {
delete scope;
}
}
void PaddlePredictor::Run() {
CAST_OPS
CAST_KERNELS
CHECK(ops);
CHECK(kernels);
CHECK_EQ(ops->size(), kernels->size());
for (size_t i = 0; i < ops->size(); i++) {
LOG(INFO) << "Running the " << i << "-th operator";
ops->at(i)->InferShape();
kernels->at(i)->Launch();
}
}
std::unique_ptr<Tensor> PaddlePredictor::GetInput(size_t offset) {
auto *exec_scope = static_cast<lite::Scope *>(raw_exe_scope_);
auto *_feed_list = exec_scope->FindVar("feed");
CHECK(_feed_list) << "no feed variable in exec_scope";
auto *feed_list = _feed_list->GetMutable<std::vector<lite::Tensor>>();
if (offset >= feed_list->size()) {
feed_list->resize(offset + 1);
}
return std::unique_ptr<Tensor>(new Tensor(nullptr, &feed_list->at(offset)));
}
std::unique_ptr<Tensor> PaddlePredictor::GetOutput(size_t offset) {
auto *exec_scope = static_cast<lite::Scope *>(raw_exe_scope_);
auto *_fetch_list = exec_scope->FindVar("fetch");
CHECK(_fetch_list) << "no fetch variable in exec_scope";
auto &fetch_list = *_fetch_list->GetMutable<std::vector<lite::Tensor>>();
CHECK_LT(offset, fetch_list.size()) << "offset " << offset << " overflow";
return std::unique_ptr<Tensor>(new Tensor(&fetch_list.at(offset), nullptr));
}
} // namespace gencode
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <vector>
namespace paddle {
namespace gencode {
/// Zero Copy Tensor.
class Tensor {
public:
using ddim_t = std::vector<int64_t>;
Tensor(const void *raw_tensor, void *raw_mutable_tensor)
: raw_tensor_(raw_tensor), raw_mutable_tensor_(raw_mutable_tensor) {}
void Resize(const ddim_t &shape);
template <typename T>
const T *data() const;
template <typename T>
T *mutable_data();
private:
const void *raw_tensor_;
void *raw_mutable_tensor_{};
};
/*
* Predictor for the generated code.
*/
class PaddlePredictor {
public:
void Init();
std::unique_ptr<Tensor> GetTensor(const std::string &id) const;
std::unique_ptr<Tensor> GetMutableTensor(const std::string &id);
// Get offset-th col of feed.
std::unique_ptr<Tensor> GetInput(size_t offset);
std::unique_ptr<Tensor> GetOutput(size_t offset);
void Run();
PaddlePredictor();
~PaddlePredictor();
private:
void *raw_ops_;
void *raw_kernels_;
void *raw_scope_{};
void *raw_exe_scope_{}; // raw_exe_scope is not owned.
};
} // namespace gencode
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <Eigen/Core>
#include <string>
#include <vector>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/lite/core/kernel.h"
#include "paddle/fluid/lite/core/op_registry.h"
#include "paddle/fluid/lite/core/types.h"
#include "paddle/fluid/lite/operators/conv_op.h"
#include "paddle/fluid/operators/math/blas.h"
#include "paddle/fluid/operators/math/depthwise_conv.h"
#include "paddle/fluid/operators/math/im2col.h"
#include "paddle/fluid/operators/math/vol2col.h"
namespace paddle {
namespace lite {
namespace kernels {
namespace x86 {
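// Whether im2col/vol2col expansion is needed: a convolution with 1x1 filters,
// stride 1, zero padding and dilation 1 can read the input directly, so no
// expansion buffer is required.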
inline bool IsExpand(const std::vector<int64_t>& filter_dim,
const std::vector<int>& strides,
const std::vector<int>& paddings,
const std::vector<int>& dilations) {
bool filter_1 = true, strides_1 = true, padding_0 = true, dilation_1 = true;
for (size_t j = 0; j < strides.size(); ++j) {
filter_1 = filter_1 && (static_cast<int>(filter_dim[j + 2]) == 1);
strides_1 = strides_1 && (strides[j] == 1);
padding_0 = padding_0 && (paddings[j] == 0);
dilation_1 = dilation_1 && (dilations[j] == 1);
}
return !(filter_1 && strides_1 && padding_0 && dilation_1);
}
template <typename T>
class Conv2dCompute : public KernelLite<TARGET(kX86), PRECISION(kFloat)> {
public:
using param_t = operators::ConvParam;
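// Convolution implemented as (optional) im2col/vol2col expansion followed by
// a grouped GEMM, processing one batch item and one group at a time.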
void Run() override {
auto& param = *param_.get_mutable<operators::ConvParam>();
lite::Tensor filter = *param.filter;
param.output->template mutable_data<T>();
const int batch_size = static_cast<int>(param.x->dims()[0]);
std::vector<int64_t> filter_shape_vec(filter.dims().Vectorize());
std::vector<int64_t> output_shape_vec(param.output->dims().Vectorize());
size_t data_dim = filter_shape_vec.size() - 2;
std::vector<int64_t> col_shape_vec(1 + 2 * data_dim);
col_shape_vec[0] = param.x->dims()[1] / param.groups;
for (size_t j = 0; j < data_dim; ++j) {
col_shape_vec[j + 1] = filter_shape_vec[j + 2];
col_shape_vec[j + 1 + data_dim] = output_shape_vec[j + 2];
}
lite::DDim col_shape(col_shape_vec);
lite::DDim col_matrix_shape = col_shape.Flattern2D(data_dim + 1);
bool is_expand = IsExpand(filter_shape_vec, param.strides, param.paddings,
param.dilations);
lite::Tensor col;
lite::Tensor col_matrix;
if (is_expand) {
col.Resize(col_shape);
col_matrix.ShareDataWith(col);
col_matrix.Resize(col_matrix_shape);
}
lite::DDim input_shape = param.x->dims().Slice(1, param.x->dims().size());
lite::DDim filter_matrix_shape(std::vector<int64_t>{
filter.dims()[0], filter.dims().production() / filter.dims()[0]});
filter.Resize(filter_matrix_shape);
lite::DDim output_matrix_shape(std::vector<int64_t>{
param.output->dims()[1],
param.output->dims().production() /
(param.output->dims()[0] * param.output->dims()[1])});
int in_step = static_cast<int>(param.x->dims()[1]) / param.groups;
int out_step = static_cast<int>(param.output->dims()[1]) / param.groups;
paddle::operators::math::Vol2ColFunctor<platform::CPUDeviceContext, T>
vol2col;
paddle::operators::math::Im2ColFunctor<
paddle::operators::math::ColFormat::kCFO, platform::CPUDeviceContext, T>
im2col;
auto blas = paddle::operators::math::GetBlas<platform::CPUDeviceContext, T>(
platform::CPUDeviceContext());
for (int i = 0; i < batch_size; i++) {
lite::Tensor in_batch;
in_batch.ShareDataWith(
param.x->raw_tensor().Slice(i, i + 1).Resize(input_shape.data()));
lite::Tensor out_batch;
// Reshape each output batch item to a 2-D matrix for the grouped GEMM below.
out_batch.ShareDataWith(param.output->raw_tensor().Slice(i, i + 1).Resize(
output_matrix_shape.data()));
for (int g = 0; g < param.groups; g++) {
lite::Tensor in_slice;
in_slice.ShareDataWith(
in_batch.raw_tensor().Slice(g * in_step, (g + 1) * in_step));
if (!is_expand) {
col.ShareDataWith(in_slice);
col_matrix.ShareDataWith(col);
col_matrix.Resize(col_matrix_shape);
} else if (data_dim == 2U) {
// im2col
im2col(platform::CPUDeviceContext(), in_slice.raw_tensor(),
param.dilations, param.strides,
std::vector<int>{param.paddings[0], param.paddings[1],
param.paddings[0], param.paddings[1]},
&(col.raw_tensor()));
} else if (data_dim == 3U) {
// vol2col
vol2col(platform::CPUDeviceContext(), in_slice.raw_tensor(),
param.dilations, param.strides, param.paddings,
&(col.raw_tensor()));
}
// gemm
lite::Tensor out_slice;
out_slice.ShareDataWith(
out_batch.raw_tensor().Slice(g * out_step, (g + 1) * out_step));
lite::Tensor filter_slice;
filter_slice.ShareDataWith(
filter.raw_tensor().Slice(g * out_step, (g + 1) * out_step));
blas.MatMul(filter_slice.raw_tensor(), false, col_matrix.raw_tensor(),
false, T(1.0), &(out_slice.raw_tensor()), T(0.0));
}
}
}
virtual ~Conv2dCompute() = default;
};
} // namespace x86
} // namespace kernels
} // namespace lite
} // namespace paddle
REGISTER_LITE_KERNEL(conv2d, kX86, kFloat, kNCHW,
paddle::lite::kernels::x86::Conv2dCompute<float>, def)
.BindInput("Input", {LiteType::GetTensorTy(TARGET(kX86))})
.BindInput("Filter", {LiteType::GetTensorTy(TARGET(kX86))})
.BindInput("Bias", {LiteType::GetTensorTy(TARGET(kX86))})
.BindOutput("Output", {LiteType::GetTensorTy(TARGET(kX86))})
.Finalize();
REGISTER_LITE_KERNEL(depthwise_conv2d, kX86, kFloat, kNCHW,
paddle::lite::kernels::x86::Conv2dCompute<float>, def)
.BindInput("Input", {LiteType::GetTensorTy(TARGET(kX86))})
.BindInput("Filter", {LiteType::GetTensorTy(TARGET(kX86))})
.BindInput("Bias", {LiteType::GetTensorTy(TARGET(kX86))})
.BindOutput("Output", {LiteType::GetTensorTy(TARGET(kX86))})
.Finalize();
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <Eigen/Core>
#include "paddle/fluid/framework/eigen.h"
#include "paddle/fluid/lite/core/kernel.h"
#include "paddle/fluid/lite/core/op_registry.h"
#include "paddle/fluid/lite/core/types.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/math/pooling.h"
namespace paddle {
namespace lite {
namespace kernels {
namespace x86 {
template <typename T>
class PoolCompute : public KernelLite<TARGET(kX86), PRECISION(kFloat)> {
public:
using param_t = operators::PoolParam;
void Run() override {
auto& param = *param_.get_mutable<param_t>();
if (param.global_pooling) {
for (size_t i = 0; i < param.ksize.size(); ++i) {
param.paddings[i] = 0;
param.ksize[i] = static_cast<int>(param.x->dims()[i + 2]);
}
}
switch (param.ksize.size()) {
case 2: {
if (param.pooling_type == "max") {
paddle::operators::math::Pool2dFunctor<
platform::CPUDeviceContext, paddle::operators::math::MaxPool<T>,
T>
pool2d_forward;
paddle::operators::math::MaxPool<T> pool_process;
pool2d_forward(platform::CPUDeviceContext(), param.x->raw_tensor(),
param.ksize, param.strides, param.paddings,
pool_process, true, false,
&(param.output->raw_tensor()));
} else if (param.pooling_type == "avg") {
paddle::operators::math::Pool2dFunctor<
platform::CPUDeviceContext, paddle::operators::math::AvgPool<T>,
T>
pool2d_forward;
paddle::operators::math::AvgPool<T> pool_process;
pool2d_forward(platform::CPUDeviceContext(), param.x->raw_tensor(),
param.ksize, param.strides, param.paddings,
pool_process, param.exclusive, param.adaptive,
&(param.output->raw_tensor()));
}
} break;
case 3: {
} break;
}
}
virtual ~PoolCompute() = default;
};
} // namespace x86
} // namespace kernels
} // namespace lite
} // namespace paddle
REGISTER_LITE_KERNEL(pool2d, kX86, kFloat, kNCHW,
paddle::lite::kernels::x86::PoolCompute<float>, def)
.BindInput("X", {LiteType::GetTensorTy(TARGET(kX86))})
.BindOutput("Out", {LiteType::GetTensorTy(TARGET(kX86))})
.Finalize();
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "paddle/fluid/lite/utils/string.h"
namespace paddle {
namespace lite {} // namespace lite
} // namespace paddle
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <stdarg.h> // For va_start, etc.
#include <algorithm>
#include <cstring>
#include <memory> // For std::unique_ptr
#include <sstream>
#include <string>
#include <vector>
namespace paddle {
namespace lite {
static std::string string_format(const std::string fmt_str, ...) {
/* Reserve two times as much as the length of the fmt_str */
int final_n, n = (static_cast<int>(fmt_str.size())) * 2;
std::unique_ptr<char[]> formatted;
va_list ap;
while (1) {
formatted.reset(
new char[n]); /* Wrap the plain char array into the unique_ptr */
std::strcpy(&formatted[0], fmt_str.c_str()); // NOLINT
va_start(ap, fmt_str);
final_n = vsnprintf(&formatted[0], n, fmt_str.c_str(), ap);
va_end(ap);
if (final_n < 0 || final_n >= n)
n += abs(final_n - n + 1);
else
break;
}
return std::string(formatted.get());
}
template <typename T>
static std::string to_string_with_precision(const T& v, const int n = 6) {
std::stringstream ss;
ss.precision(n);
ss << std::fixed << v;
return ss.str();
}
static std::string Join(const std::vector<std::string>& vec,
const std::string& delim) {
if (vec.empty()) return "";
std::stringstream ss;
for (size_t i = 0; i < vec.size() - 1; i++) ss << vec[i] << delim;
if (!vec.empty()) {
ss << vec.back();
}
return ss.str();
}
static std::string Repr(const std::string& x) { return "\"" + x + "\""; }
static std::string Repr(const std::vector<std::string>& v) {
std::vector<std::string> tmp;
std::transform(v.begin(), v.end(), std::back_inserter(tmp),
[](const std::string& x) { return Repr(x); });
return "{" + Join(tmp, ",") + "}";
}
} // namespace lite
} // namespace paddle
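// A minimal usage sketch of the helpers above (hypothetical values, not taken
// from the repository):
//   using namespace paddle::lite;
//   string_format("op_%d", 3);                 // -> "op_3"
//   Join({"a", "b", "c"}, ", ");               // -> "a, b, c"
//   Repr("x");                                 // -> "\"x\""
//   Repr(std::vector<std::string>{"x", "y"});  // -> "{\"x\",\"y\"}"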