# Performance Optimization

([简体中文](./PERFORMANCE_OPTIM_CN.md)|English)

Because model structures differ, prediction services consume different amounts of computing resources when performing predictions. For online prediction services, a model that needs few computing resources spends a larger share of its time on communication; such a service is called communication-intensive. A model that needs more computing resources spends a larger share of its time on inference computation; such a service is called computation-intensive.

For a prediction service, the easiest way to determine its type is to look at the share of time spent in each stage. Paddle Serving provides the [Timeline tool](../python/examples/util/README_CN.md), which visualizes the time spent in each stage of the prediction service.
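
As a rough illustration of the time-ratio idea, the sketch below classifies a service from two measured durations. The function name `classify_service` and the 0.5 threshold are assumptions made for this example; they are not part of Paddle Serving, and the durations would have to come from your own measurements (for example, read off the Timeline output).

```python
def classify_service(comm_ms: float, infer_ms: float, threshold: float = 0.5) -> str:
    """Label a service by the share of time spent on communication.

    comm_ms   -- measured communication time per request, in milliseconds
    infer_ms  -- measured inference time per request, in milliseconds
    threshold -- hypothetical cut-off for the communication share (assumption)
    """
    total = comm_ms + infer_ms
    if total <= 0:
        raise ValueError("total time must be positive")
    comm_ratio = comm_ms / total
    if comm_ratio > threshold:
        return "communication-intensive"
    return "computation-intensive"


if __name__ == "__main__":
    # Example with made-up timings: 8 ms communication, 2 ms inference.
    print(classify_service(8.0, 2.0))
```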

For communication-intensive prediction services, requests can be aggregated: as long as the added delay stays within a tolerable limit, multiple prediction requests can be combined into one batch for prediction.
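
The aggregation described above can be sketched with a queue and a deadline. This is a minimal, generic example using only the Python standard library, not Paddle Serving's own batching mechanism; `collect_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for illustration.

```python
import queue
import time


def collect_batch(requests: queue.Queue, max_batch_size: int, max_wait_s: float) -> list:
    """Gather up to max_batch_size requests, waiting at most max_wait_s
    after the first request arrives."""
    # Block until at least one request is available.
    batch = [requests.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the tolerable delay has been used up
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(5):
        q.put(i)
    # Each returned batch would then be sent to the service as one request.
    print(collect_batch(q, max_batch_size=3, max_wait_s=0.01))
    print(collect_batch(q, max_batch_size=3, max_wait_s=0.01))
```

A larger `max_wait_s` tends to produce fuller batches at the cost of added latency for the first request in each batch, so the value should be chosen from the delay the service can tolerate.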

For computation-intensive prediction services, you can use a GPU prediction service instead of a CPU prediction service, or increase the number of GPUs used by the GPU prediction service.

Under the same conditions, the HTTP prediction service provided by Paddle Serving spends more time on communication than the RPC prediction service, so for communication-intensive services, prefer RPC communication.

Parameters for performance optimization:

| Parameters | Type | Default | Description                                                  |
| ---------- | ---- | ------- | ------------------------------------------------------------ |
| mem_optim  | - | - | Enable memory / GPU memory optimization                      |
| ir_optim   | - | - | Enable analysis and optimization of the computation graph, including OP fusion, etc. |