FACE_DETECTION_en.md

English | [简体中文](FACE_DETECTION.md)
# FaceDetection

## Table of Contents
- [Introduction](#Introduction)
- [Benchmark and Model Zoo](#Benchmark-and-Model-Zoo)
- [Quick Start](#Quick-Start)
    - [Data Pipline](#Data-Pipline)
    - [Training and Inference](#Training-and-Inference)
    - [Evaluation](#Evaluation)
- [Algorithm Description](#Algorithm-Description)
- [Contributing](#Contributing)

## Introduction
The goal of FaceDetection is to provide efficient and high-speed face detection solutions,
including cutting-edge and classic models.

![](../images/12_Group_Group_12_Group_Group_12_935.jpg)

## Benchmark and Model Zoo
PaddleDetection Supported architectures is shown in the below table, please refer to
[Algorithm Description](#Algorithm-Description) for details of the algorithm.

|                          | Original | Lite <sup>[1](#lite)</sup> | NAS <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace)  | ✓        |                          ✓ | ✓                        |
| [FaceBoxes](#FaceBoxes)  | ✓        |                          ✓ | x                        |

<a name="lite">[1]</a> `Lite` edition means reduces the number of network layers and channels.  
<a name="nas">[2]</a> `NAS` edition means use `Neural Architecture Search` algorithm to
optimized network structure.


### Model Zoo

#### mAP in WIDER FACE

| Architecture | Type     | Size | Img/gpu | Lr schd | Easy Set  | Medium Set | Hard Set  | Download |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace    | Original | 640  |    8    | 32w     | **0.915** | **0.892**  | **0.797** | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace    | Lite     | 640  |    8    | 32w     | 0.909     | 0.885      | 0.781     | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace    | NAS      | 640  |    8    | 32w     | 0.837     | 0.807      | 0.658     | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes    | Original | 640  |    8    | 32w     | 0.878     | 0.851      | 0.576     | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes    | Lite     | 640  |    8    | 32w     | 0.901     | 0.875      | 0.760     | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |

**NOTES:**  
- Get mAP in `Easy/Medium/Hard Set` by multi-scale evaluation in `tools/face_eval.py`.
For details can refer to [Evaluation](#Evaluate-on-the-WIDER-FACE).
- BlazeFace-Lite Training and Testing ues [blazeface.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/configs/face_detection/blazeface.yml)
configs file and set `lite_edition: true`.

#### mAP in FDDB

| Architecture | Type     | Size | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace    | Original | 640  | **0.992**   | **0.762**   |
| BlazeFace    | Lite     | 640  | 0.990   | 0.756   |
| BlazeFace    | NAS      | 640  | 0.981   | 0.741   |
| FaceBoxes    | Original | 640  | 0.987   | 0.736   |
| FaceBoxes    | Lite     | 640  | 0.988   | 0.751   |

**NOTES:**  
- Get mAP by multi-scale evaluation on the FDDB dataset.
For details can refer to [Evaluation](#Evaluate-on-the-FDDB).

#### Infer Time and Model Size comparison  

| Architecture | Type     | Size | P4(trt32) (ms) | CPU (ms) | Qualcomm SnapDragon 855(armv8) (ms)   | Model size (MB) |
|:------------:|:--------:|:----:|:--------------:|:--------:|:-------------------------------------:|:---------------:|
| BlazeFace    | Original | 128  | 1.387          | 23.461   |  6.036                                | 0.777           |
| BlazeFace    | Lite     | 128  | 1.323          | 12.802   |  6.193                                | 0.68            |
| BlazeFace    | NAS      | 128  | 1.03           | 6.714    |  2.7152                               | 0.234           |
| FaceBoxes    | Original | 128  | 3.144          | 14.972   |  19.2196                              | 3.6             |
| FaceBoxes    | Lite     | 128  | 2.295          | 11.276   |  8.5278                               | 2               |
| BlazeFace    | Original | 320  | 3.01           | 132.408  |  70.6916                              | 0.777           |
| BlazeFace    | Lite     | 320  | 2.535          | 69.964   |  69.9438                              | 0.68            |
| BlazeFace    | NAS      | 320  | 2.392          | 36.962   |  39.8086                              | 0.234           |
| FaceBoxes    | Original | 320  | 7.556          | 84.531   |  52.1022                              | 3.6             |
| FaceBoxes    | Lite     | 320  | 18.605         | 78.862   |  59.8996                              | 2               |
| BlazeFace    | Original | 640  | 8.885          | 519.364  |  149.896                              | 0.777           |
| BlazeFace    | Lite     | 640  | 6.988          | 284.13   |  149.902                              | 0.68            |
| BlazeFace    | NAS      | 640  | 7.448          | 142.91   |  69.8266                              | 0.234           |
| FaceBoxes    | Original | 640  | 78.201         | 394.043  |  169.877                              | 3.6             |
| FaceBoxes    | Lite     | 640  | 59.47          | 313.683  |  139.918                              | 2               |


**NOTES:**  
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz.
- P4(trt32) and CPU tests based on PaddlePaddle, PaddlePaddle version is 1.6.1.
- ARM test environment:
    - Qualcomm SnapDragon 855(armv8);
    - Single thread;
    - Paddle-Lite version 2.0.0.

## Quick Start

### Data Pipline
We use the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) to carry out the training
and testing of the model, the official website gives detailed data introduction.
- WIDER Face data source:  
Loads `wider_face` type dataset with directory structures like this:

  ```
  dataset/wider_face/
  ├── wider_face_split
  │   ├── wider_face_train_bbx_gt.txt
  │   ├── wider_face_val_bbx_gt.txt
  ├── WIDER_train
  │   ├── images
  │   │   ├── 0--Parade
  │   │   │   ├── 0_Parade_marchingband_1_100.jpg
  │   │   │   ├── 0_Parade_marchingband_1_381.jpg
  │   │   │   │   ...
  │   │   ├── 10--People_Marching
  │   │   │   ...
  ├── WIDER_val
  │   ├── images
  │   │   ├── 0--Parade
  │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
  │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
  │   │   │   │   ...
  │   │   ├── 10--People_Marching
  │   │   │   ...
  ```

- Download dataset manually:  
To download the WIDER FACE dataset, run the following commands:
```
cd dataset/wider_face && ./download.sh
```

- Download dataset automatically:
If a training session is started but the dataset is not setup properly
(e.g, not found in dataset/wider_face), PaddleDetection can automatically
download them from [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/),
the decompressed datasets will be cached in ~/.cache/paddle/dataset/ and can be discovered
automatically subsequently.

#### Data Augmentation

- **Data-anchor-sampling:** Randomly transform the scale of the image to a certain range of scales,
greatly enhancing the scale change of the face. The specific operation is to obtain $v=\sqrt{width * height}$
according to the randomly selected face height and width, and judge the value of `v` in which interval of
 `[16,32,64,128]`. Assuming `v=45` && `32<v<64`, and any value of `[16,32,64]` is selected with a probability
 of uniform distribution. If `64` is selected, the face's interval is selected in `[64 / 2, min(v * 2, 64 * 2)]`.

- **Other methods:** Including `RandomDistort`,`ExpandImage`,`RandomInterpImage`,`RandomFlipImage` etc.
Please refer to [READER.md](../advanced_tutorials/READER.md) for details.

### Training and Inference
`Training` and `Inference` please refer to [GETTING_STARTED.md](../tutorials/GETTING_STARTED.md)  
**NOTES:**
- `BlazeFace` and `FaceBoxes` is trained in 4 GPU with `batch_size=8` per gpu (total batch size as 32)
and trained 320000 iters.(If your GPU count is not 4, please refer to the rule of training parameters
in the table of [calculation rules](../tutorials/GETTING_STARTED.html#faq)).
- Currently we do not support evaluation in training.

### Evaluation
Currently we support evaluation on the `WIDER FACE` dataset and the` FDDB` dataset. First run `tools / face_eval.py`
to generate the evaluation result file, and then use matlab(WIDER FACE)
or OpenCV(FDDB) calculates specific evaluation indicators.  
Among them, the optional arguments list for running `tools / face_eval.py` is as follows:
- `-f` or `--output_eval`: Evaluation file directory, default is `output/pred`.
- `-e` or `--eval_mode`: Evaluation mode, include `widerface` and `fddb`, default is `widerface`.
- `--multi_scale`: If you add this action button in the command, it will select `multi_scale` evaluation.
Default is `False`, it will select `single-scale` evaluation.

#### Evaluate on the WIDER FACE
- Evaluate and generate results files:
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
       -o weights=output/blazeface/model_final/ \
       --eval_mode=widerface
```
After the evaluation is completed, the test result in txt format will be generated in `output/pred`.

- Download the official evaluation script to evaluate the AP metrics:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
- Modify the result path and the name of the curve to be drawn in `eval_tools/wider_eval.m`:
```
# Modify the folder name where the result is stored.
pred_dir = './pred';  
# Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` is the main execution program of the evaluation module. The run command is as follows:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```

#### Evaluate on the FDDB
We provide a FDDB data set evaluation process (currently only supports Linux systems),
please refer to [FDDB official website](http://vis-www.cs.umass.edu/fddb/) for other specific details.

- 1)Download and install OpenCV：  
Download OpenCV: go to [OpenCV library](https://opencv.org/releases/) to Manual download  
Install OpenCV：Please refer to [Official OpenCV Installation Tutorial](https://docs.opencv.org/master/d7/d9f/tutorial_linux_install.html)
to install through source code.

- 2)Download datasets, evaluation code, and formatted data:  
```
./dataset/fddb/download.sh
```

- 3)Compile FDDB evaluation code:
Go to the `dataset/fddb/evaluation` directory and modify the contents of the MakeFile file as follows:
```
evaluate: $(OBJS)
    $(CC) $(OBJS) -o $@ $(LIBS)
```
Modify the content in `common.hpp` to the following form:
```
#define __IMAGE_FORMAT__ ".jpg"
//#define __IMAGE_FORMAT__ ".ppm"
#define __CVLOADIMAGE_WORKING__
```
According to the `grep -r "CV_RGB"` command, find the code segment containing `CV_RGB`, change `CV_RGB` to `Scalar`,
and add `using namespace cv;` in cpp, then compile:
```
make clean && make
```

- 4)Start evaluation:  
Modify the contents of the `dataset_dir` and` annotation` fields in the config file:
```
EvalReader:
  ...
  dataset:
    dataset_dir: dataset/fddb
    anno_path: FDDB-folds/fddb_annotFile.txt
    ...
```
Evaluate and generate results files:
```
python -u tools/face_eval.py -c configs/face_detection/blazeface.yml \
       -o weights=output/blazeface/model_final/ \
       --eval_mode=fddb
```
After the evaluation is completed, the test result in txt format will be generated in `output/pred/pred_fddb_res.txt`.  
Generate ContROC and DiscROC data:  
```
cd dataset/fddb/evaluation
./evaluate -a ./FDDB-folds/fddb_annotFile.txt \
           -f 0 -i ./ -l ./FDDB-folds/filePath.txt -z .jpg \
           -d {RESULT_FILE} \
           -r {OUTPUT_DIR}
```
**NOTES:**  
(1)`RESULT_FILE` is the FDDB prediction result file output by `tools/face_eval.py`;  
(2)`OUTPUT_DIR` is the prefix of the FDDB evaluation output file,
which will generate two files `{OUTPUT_DIR}ContROC.txt`、`{OUTPUT_DIR}DiscROC.txt`;  
(3)The interpretation of the argument can be performed by `./evaluate --help`.

## Algorithm Description

### BlazeFace
**Introduction:**  
[BlazeFace](https://arxiv.org/abs/1907.05047) is Google Research published face detection model.
It's lightweight but good performance, and tailored for mobile GPU inference. It runs at a speed
of 200-1000+ FPS on flagship devices.

**Particularity:**  
- Anchor scheme stops at 8×8(input 128x128), 6 anchors per pixel at that resolution.
- 5 single, and 6 double BlazeBlocks: 5×5 depthwise convs, same accuracy with fewer layers.
- Replace the non-maximum suppression algorithm with a blending strategy that estimates the
regression parameters of a bounding box as a weighted mean between the overlapping predictions.

**Edition information:**
- Original: Reference original paper reproduction.
- Lite: Replace 5x5 conv with 3x3 conv, fewer network layers and conv channels.
- NAS: use `Neural Architecture Search` algorithm to optimized network structure,
less network layer and conv channel number than `Lite`.

### FaceBoxes
**Introduction:**  
[FaceBoxes](https://arxiv.org/abs/1708.05234) which named A CPU Real-time Face Detector
with High Accuracy is face detector proposed by Shifeng Zhang, with high performance on
both speed and accuracy. This paper is published by IJCB(2017).

**Particularity:**
- Anchor scheme stops at 20x20, 10x10, 5x5, which network input size is 640x640,
including 3, 1, 1 anchors per pixel at each resolution. The corresponding densities
are 1, 2, 4(20x20), 4(10x10) and 4(5x5).
- 2 convs with CReLU, 2 poolings, 3 inceptions and 2 convs with ReLU.
- Use density prior box to improve detection accuracy.

**Edition information:**
- Original: Reference original paper reproduction.
- Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inceptions and 2 convs with ReLU.
Anchor scheme stops at 80x80 and 40x40, including 3, 1 anchors per pixel at each resolution.
The corresponding densities are 1, 2, 4(80x80) and 4(40x40), using less conv channel number than lite.


## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!