## 0. FastDeploy

FastDeploy is an easy-to-use, high-performance AI model deployment toolkit for cloud, mobile, and edge. It offers a unified, out-of-the-box experience and end-to-end optimization for more than 150 text, vision, speech, and cross-modal AI models. FastDeploy supports AI model deployment on
**X86 CPU, NVIDIA GPU, ARM CPU, XPU, NPU, IPU**, and other hardware. You can switch between inference backends and hardware with a single line of code.

Deploying an AI model with FastDeploy takes three steps: (1) install the FastDeploy SDK; (2) use FastDeploy's API to implement the deployment code; (3) deploy.

**Notes**: This document uses the FastDeploy examples to walk through high-performance deployment. Only inference on X86 CPUs and NVIDIA GPUs is shown, and the GPU environment is assumed to be ready (e.g. CUDA >= 11.2). To deploy AI models on other hardware or to learn about FastDeploy's full capabilities, please refer to [FastDeploy GitHub](https://github.com/PaddlePaddle/FastDeploy).

## 1. Install FastDeploy SDK
```bash
pip install fastdeploy-gpu-python==0.0.0 -f https://www.paddlepaddle.org.cn/whl/fastdeploy_nightly_build.html
```
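
Before running the examples, it can be useful to confirm that the SDK is visible to the current Python environment. The following is a minimal sketch rather than part of FastDeploy: `is_installed` is a hypothetical helper name, and the check assumes the package is imported under the module name `fastdeploy`. It only locates the module; it does not verify that CUDA or the GPU runtime works.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the SDK installed above is importable as "fastdeploy".
    if is_installed("fastdeploy"):
        print("fastdeploy is available in this environment")
    else:
        print("fastdeploy was not found; re-run the pip install step")
```
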
## 2. Run Deployment Example
```bash
# Download the deployment example
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/text/ernie-3.0/python

# Download the ERNIE 3.0 model fine-tuned on the AFQMC dataset
wget https://bj.bcebos.com/fastdeploy/models/ernie-3.0/ernie-3.0-medium-zh-afqmc.tgz
tar xvfz ernie-3.0-medium-zh-afqmc.tgz

# CPU deployment
python seq_cls_infer.py --device cpu --model_dir ernie-3.0-medium-zh-afqmc

# GPU deployment
python seq_cls_infer.py --device gpu --model_dir ernie-3.0-medium-zh-afqmc
```
After the run completes, the output looks like this:

```bash
[INFO] fastdeploy/runtime.cc(469)::Init	Runtime initialized with Backend::ORT in Device::CPU.
Batch id:0, example id:0, sentence1:花呗收款额度限制, sentence2:收钱码，对花呗支付的金额有限制吗, label:1, similarity:0.5819
Batch id:1, example id:0, sentence1:花呗支持高铁票支付吗, sentence2:为什么友付宝不支持花呗付款, label:0, similarity:0.9979
```
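
If the predictions need to be consumed by another program, the result lines can be parsed into structured records. The sketch below assumes the line format is exactly the one shown in the sample output above; `Prediction`, `LINE_RE`, and `parse_line` are hypothetical names introduced for illustration and are not part of FastDeploy or `seq_cls_infer.py`.

```python
import re
from typing import NamedTuple, Optional

# Assumption: result lines follow the layout shown in the sample output.
# sentence2 is matched greedily so that commas inside the sentence are kept.
LINE_RE = re.compile(
    r"Batch id:(?P<batch_id>\d+), example id:(?P<example_id>\d+), "
    r"sentence1:(?P<sentence1>.*?), sentence2:(?P<sentence2>.*), "
    r"label:(?P<label>\d+), similarity:(?P<similarity>[0-9.]+)$"
)


class Prediction(NamedTuple):
    batch_id: int
    example_id: int
    sentence1: str
    sentence2: str
    label: int
    similarity: float


def parse_line(line: str) -> Optional[Prediction]:
    """Parse one result line; return None for log lines and other text."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Prediction(
        batch_id=int(match["batch_id"]),
        example_id=int(match["example_id"]),
        sentence1=match["sentence1"],
        sentence2=match["sentence2"],
        label=int(match["label"]),
        similarity=float(match["similarity"]),
    )
```

Lines that do not match, such as the `[INFO]` runtime message, return `None` and can be skipped by the caller.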

### Parameter Description

In addition to the command line parameters used in the example above, `seq_cls_infer.py` supports several others. Each parameter is described below.

| Parameter | Description |
|----------|--------------|
| --model_dir | Directory containing the model to deploy |
| --batch_size | Maximum batch size, default 1 |
| --max_length | Maximum sequence length, default 128 |
| --device | Device to run on. Options: ['cpu', 'gpu'], default 'cpu' |
| --backend | Inference backend. Options: ['onnx_runtime', 'paddle', 'openvino', 'tensorrt', 'paddle_tensorrt'], default 'onnx_runtime' |
| --use_fp16 | Whether to use FP16 mode for inference. Can be enabled when the backend is tensorrt or paddle_tensorrt, default False |
| --use_fast | Whether to use FastTokenizer to speed up tokenization, default True |
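
To make the table concrete, here is a minimal `argparse` sketch that mirrors the parameters and defaults listed above. It is not the source of `seq_cls_infer.py`. Several details are assumptions made for illustration: `--model_dir` is treated as required, boolean flags are parsed from strings by a hypothetical `str2bool` helper, and `--use_fp16` is rejected for backends other than `tensorrt` and `paddle_tensorrt`.

```python
import argparse

BACKENDS = ["onnx_runtime", "paddle", "openvino", "tensorrt", "paddle_tensorrt"]
FP16_BACKENDS = {"tensorrt", "paddle_tensorrt"}


def str2bool(value: str) -> bool:
    """Hypothetical helper: convert 'true'/'false'-style strings to bool."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sequence classification inference")
    # Assumption: the model directory has no default and must be supplied.
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--max_length", type=int, default=128)
    parser.add_argument("--device", choices=["cpu", "gpu"], default="cpu")
    parser.add_argument("--backend", choices=BACKENDS, default="onnx_runtime")
    parser.add_argument("--use_fp16", type=str2bool, default=False)
    parser.add_argument("--use_fast", type=str2bool, default=True)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: treat FP16 with an unsupported backend as a usage error.
    if args.use_fp16 and args.backend not in FP16_BACKENDS:
        parser.error("--use_fp16 requires --backend tensorrt or paddle_tensorrt")
    return args
```

For example, `parse_args(["--device", "gpu", "--model_dir", "ernie-3.0-medium-zh-afqmc"])` corresponds to the GPU command in section 2 and leaves every other parameter at its default.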