English | [简体中文](README_cn.md)

# FaceDetection
The goal of FaceDetection is to provide efficient and high-speed face detection solutions,
including cutting-edge and classic models.


<div align="center">
  <img src="../../demo/output/12_Group_Group_12_Group_Group_12_935.jpg" />
</div>

## Data Pipeline
We use the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) to train and test
the models. The official website provides a detailed introduction to the data.
- WIDER Face data source:  
Loads `wider_face` type dataset with directory structures like this:

  ```
  dataset/wider_face/
  ├── wider_face_split
  │   ├── wider_face_train_bbx_gt.txt
  │   ├── wider_face_val_bbx_gt.txt
  ├── WIDER_train
  │   ├── images
  │   │   ├── 0--Parade
  │   │   │   ├── 0_Parade_marchingband_1_100.jpg
  │   │   │   ├── 0_Parade_marchingband_1_381.jpg
  │   │   │   │   ...
  │   │   ├── 10--People_Marching
  │   │   │   ...
  ├── WIDER_val
  │   ├── images
  │   │   ├── 0--Parade
  │   │   │   ├── 0_Parade_marchingband_1_1004.jpg
  │   │   │   ├── 0_Parade_marchingband_1_1045.jpg
  │   │   │   │   ...
  │   │   ├── 10--People_Marching
  │   │   │   ...
  ```

- Download dataset manually:  
To download the WIDER FACE dataset, run the following commands:
```
cd dataset/wider_face && ./download.sh
```

- Download dataset automatically:
If a training session is started but the dataset is not set up properly
(e.g., not found in `dataset/wider_face`), PaddleDetection can automatically
download it from the [WIDER FACE dataset](http://shuoyang1213.me/WIDERFACE/) site.
The decompressed dataset is cached in `~/.cache/paddle/dataset/` and is discovered
automatically in subsequent runs.

### Data Augmentation

- **Data-anchor-sampling:** Randomly rescale the image to a scale within a certain range,
which greatly increases the scale variation of the faces. Specifically, compute $v=\sqrt{width * height}$
from the width and height of a randomly selected face, and determine which interval of
`[16,32,64,128]` the value `v` falls in. For example, if `v=45`, then `32<v<64`, and one value of `[16,32,64]`
is selected with uniform probability. If `64` is selected, the target face size is sampled from the
interval `[64 / 2, min(v * 2, 64 * 2)]`.

- **Other methods:** Including `RandomDistort`,`ExpandImage`,`RandomInterpImage`,`RandomFlipImage` etc.
Please refer to [DATA.md](../../docs/DATA.md#APIs) for details.
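
The Data-anchor-sampling step described above can be sketched in a few lines of Python. This is a minimal illustration of the scale selection only, not the PaddleDetection implementation: the function name `sample_target_scale` and its parameters are hypothetical, and resizing and cropping the image are left out.

```python
import math
import random

# Anchor scales taken from the description above.
ANCHOR_SCALES = [16, 32, 64, 128]


def sample_target_scale(face_w, face_h, scales=ANCHOR_SCALES, rng=random):
    """Pick an anchor scale and a target face size for one selected face.

    Hypothetical helper: it only mirrors the interval logic in the text.
    """
    v = math.sqrt(face_w * face_h)
    # Index of the upper bound of the interval that contains v
    # (clamped to the largest scale when v exceeds all of them).
    upper = next((i for i, s in enumerate(scales) if v < s), len(scales) - 1)
    # Choose uniformly among all scales up to and including that bound.
    chosen = scales[rng.randint(0, upper)]
    # Sample the target face size from [chosen / 2, min(v * 2, chosen * 2)].
    low = chosen / 2
    high = min(v * 2, chosen * 2)
    return chosen, rng.uniform(low, high)
```

The image would then be rescaled by `target / v` so that the selected face ends up at the sampled size. For very small faces `low` can exceed `high`; a real implementation would need to decide how to handle that case.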


##  Benchmark and Model Zoo
The supported architectures are shown in the table below. Please refer to
[Algorithm Description](#Algorithm-Description) for details of the algorithms.

|                          | Original | Lite <sup>[1](#lite)</sup> | NAS <sup>[2](#nas)</sup> |
|:------------------------:|:--------:|:--------------------------:|:------------------------:|
| [BlazeFace](#BlazeFace)  | ✓        |                          ✓ | ✓                        |
| [FaceBoxes](#FaceBoxes)  | ✓        |                          ✓ | x                        |

<a name="lite">[1]</a> The `Lite` edition reduces the number of network layers and channels.  
<a name="nas">[2]</a> The `NAS` edition uses a `Neural Architecture Search` algorithm to
optimize the network structure.

**Todo List:**
- [ ] HamBox
- [ ] Pyramidbox

### Model Zoo

#### mAP in WIDER FACE

| Architecture | Type     | Size | Img/gpu | Lr schd | Easy Set  | Medium Set | Hard Set  | Download |
|:------------:|:--------:|:----:|:-------:|:-------:|:---------:|:----------:|:---------:|:--------:|
| BlazeFace    | Original | 640  |    8    | 320k    | **0.915** | **0.892**  | **0.797** | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_original.tar) |
| BlazeFace    | Lite     | 640  |    8    | 320k    | 0.909     | 0.885      | 0.781     | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace    | NAS      | 640  |    8    | 320k    | 0.837     | 0.807      | 0.658     | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| FaceBoxes    | Original | 640  |    8    | 320k    | 0.875     | 0.848      | 0.568     | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_original.tar) |
| FaceBoxes    | Lite     | 640  |    8    | 320k    | 0.898     | 0.872      | 0.752     | [model](https://paddlemodels.bj.bcebos.com/object_detection/faceboxes_lite.tar) |

**NOTES:**  
- The mAP on the `Easy/Medium/Hard Set` is obtained by multi-scale evaluation with `tools/face_eval.py`.
For details, refer to [Evaluation](#Evaluate-on-the-WIDER-FACE).
- BlazeFace-Lite training and testing use the [blazeface.yml](../../configs/face_detection/blazeface.yml)
config file with `lite_edition: true` set.

#### mAP in FDDB

| Architecture | Type     | Size | DistROC | ContROC |
|:------------:|:--------:|:----:|:-------:|:-------:|
| BlazeFace    | Original | 640  | **0.992**   | **0.762**   |
| BlazeFace    | Lite     | 640  | 0.990   | 0.756   |
| BlazeFace    | NAS      | 640  | 0.981   | 0.741   |
| FaceBoxes    | Original | 640  | 0.985   | 0.731   |
| FaceBoxes    | Lite     | 640  | 0.987   | 0.741   |

**NOTES:**  
- The mAP is obtained by multi-scale evaluation on the FDDB dataset.
For details, refer to [Evaluation](#Evaluate-on-the-FDDB).

#### Infer Time and Model Size comparison  

| Architecture | Type     | Size | P4(trt32) (ms) | CPU (ms) | Qualcomm SnapDragon 855(armv8) (ms)   | Model size (MB) |
|:------------:|:--------:|:----:|:--------------:|:--------:|:-------------------------------------:|:---------------:|
| BlazeFace    | Original | 128  | 1.387          | 23.461   |  6.036                                | 0.777           |
| BlazeFace    | Lite     | 128  | 1.323          | 12.802   |  6.193                                | 0.68            |
| BlazeFace    | NAS      | 128  | 1.03           | 6.714    |  2.7152                               | 0.234           |
| FaceBoxes    | Original | 128  | 3.144          | 14.972   |  19.2196                              | 3.6             |
| FaceBoxes    | Lite     | 128  | 2.295          | 11.276   |  8.5278                               | 2               |
| BlazeFace    | Original | 320  | 3.01           | 132.408  |  70.6916                              | 0.777           |
| BlazeFace    | Lite     | 320  | 2.535          | 69.964   |  69.9438                              | 0.68            |
| BlazeFace    | NAS      | 320  | 2.392          | 36.962   |  39.8086                              | 0.234           |
| FaceBoxes    | Original | 320  | 7.556          | 84.531   |  52.1022                              | 3.6             |
| FaceBoxes    | Lite     | 320  | 18.605         | 78.862   |  59.8996                              | 2               |
| BlazeFace    | Original | 640  | 8.885          | 519.364  |  149.896                              | 0.777           |
| BlazeFace    | Lite     | 640  | 6.988          | 284.13   |  149.902                              | 0.68            |
| BlazeFace    | NAS      | 640  | 7.448          | 142.91   |  69.8266                              | 0.234           |
| FaceBoxes    | Original | 640  | 78.201         | 394.043  |  169.877                              | 3.6             |
| FaceBoxes    | Lite     | 640  | 59.47          | 313.683  |  139.918                              | 2               |


**NOTES:**  
- CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
- The P4(trt32) and CPU tests are based on PaddlePaddle 1.6.1.
- ARM test environment:
    - Qualcomm SnapDragon 855(armv8)
    - Single thread
    - Paddle-Lite version 2.0.0


## Get Started
For `Training` and `Inference`, please refer to [GETTING_STARTED.md](../../docs/GETTING_STARTED.md).

**NOTES:**  
- `BlazeFace` and `FaceBoxes` are trained on 4 GPUs with `batch_size=8` per GPU (total batch size of 32)
for 320000 iterations. If your GPU count is not 4, please refer to the training-parameter rules
in the table of [calculation rules](../../docs/GETTING_STARTED.md#faq).
- Evaluation during training is currently not supported.
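
As a rough illustration of how the training parameters might change with the GPU count, the sketch below applies the common linear scaling heuristic: the learning rate scales with the total batch size and the iteration count scales inversely. This heuristic is an assumption, not a quote of the calculation rules linked above, so check that table before relying on it. The function name and the `ref_lr` value are hypothetical placeholders.

```python
def scale_schedule(num_gpus, per_gpu_batch=8, ref_gpus=4,
                   ref_iters=320000, ref_lr=0.001):
    """Scale a 4-GPU reference schedule to another GPU count.

    Assumption: linear scaling heuristic. ref_lr is a placeholder value,
    not necessarily the learning rate used in the config files.
    """
    ref_total = ref_gpus * per_gpu_batch   # 4 * 8 = 32 in the reference setup
    total = num_gpus * per_gpu_batch
    factor = total / ref_total
    return {
        "total_batch_size": total,
        "learning_rate": ref_lr * factor,
        "max_iters": round(ref_iters / factor),
    }
```

For example, `scale_schedule(2)` gives a total batch size of 16, half the reference learning rate and twice as many iterations.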

### Evaluation
```
export CUDA_VISIBLE_DEVICES=0
export PYTHONPATH=$PYTHONPATH:.
python tools/face_eval.py -c configs/face_detection/blazeface.yml
```
- Optional arguments
  - `-d` or `--dataset_dir`: Dataset path, same as `dataset_dir` in the config file, e.g. `-d dataset/wider_face`.
  - `-f` or `--output_eval`: Directory for the evaluation files. Default is `output/pred`.
  - `-e` or `--eval_mode`: Evaluation mode, either `widerface` or `fddb`. Default is `widerface`.
  - `--multi_scale`: Add this flag to the command to select `multi_scale` evaluation.
    Default is `False`, which selects `single-scale` evaluation.

After the evaluation is completed, the test results are written in txt format to `output/pred`,
and the mAP is then calculated according to the dataset. If you set `--eval_mode=widerface`,
follow [Evaluate on the WIDER FACE](#Evaluate-on-the-WIDER-FACE). If you set `--eval_mode=fddb`,
follow [Evaluate on the FDDB](#Evaluate-on-the-FDDB).
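
When the evaluation is launched from a script, the optional arguments listed above can be assembled programmatically. The helper below is a hypothetical convenience wrapper (it is not part of PaddleDetection); it only uses the flags documented in this section and returns the argument list without running anything.

```python
def build_eval_command(config, dataset_dir=None, output_eval=None,
                       eval_mode="widerface", multi_scale=False):
    """Build the argv list for tools/face_eval.py (hypothetical helper)."""
    if eval_mode not in ("widerface", "fddb"):
        raise ValueError("eval_mode must be 'widerface' or 'fddb'")
    cmd = ["python", "tools/face_eval.py", "-c", config]
    if dataset_dir:
        cmd += ["-d", dataset_dir]
    if output_eval:
        cmd += ["-f", output_eval]
    cmd += ["-e", eval_mode]
    if multi_scale:
        # Flag-style option: present means multi-scale evaluation.
        cmd.append("--multi_scale")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `CUDA_VISIBLE_DEVICES` and `PYTHONPATH` set as in the command above.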

#### Evaluate on the WIDER FACE
- Download the official evaluation script to evaluate the AP metrics:
```
wget http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/eval_script/eval_tools.zip
unzip eval_tools.zip && rm -f eval_tools.zip
```
- Modify the result path and the name of the curve to be drawn in `eval_tools/wider_eval.m`:
```
% Modify the folder name where the result is stored.
pred_dir = './pred';
% Modify the name of the curve to be drawn
legend_name = 'Fluid-BlazeFace';
```
- `wider_eval.m` is the main execution program of the evaluation module. The run command is as follows:
```
matlab -nodesktop -nosplash -nojvm -r "run wider_eval.m;quit;"
```

#### Evaluate on the FDDB
For details of the [FDDB dataset](http://vis-www.cs.umass.edu/fddb/), refer to FDDB's official website.  
- Download the official dataset and evaluation script to evaluate the ROC metrics:
```
#external link to the Faces in the Wild data set
wget http://tamaraberg.com/faceDataset/originalPics.tar.gz
#The annotations are split into ten folds. See README for details.
wget http://vis-www.cs.umass.edu/fddb/FDDB-folds.tgz
#information on directory structure and file formats
wget http://vis-www.cs.umass.edu/fddb/README.txt
```
- Install OpenCV: the evaluation code requires the [OpenCV library](http://sourceforge.net/projects/opencvlibrary/).  
If the `pkg-config` utility is not available on your operating system,
edit the Makefile to specify the OpenCV flags manually as follows:
```
INCS = -I/usr/local/include/opencv
LIBS = -L/usr/local/lib -lcxcore -lcv -lhighgui -lcvaux -lml
```

- Compile FDDB evaluation code: execute `make` in evaluation folder.

- Generate the full image path list and the ground-truth file in FDDB-folds. Run the following commands:
```
cat `ls | grep -v "ellipse"` > filePath.txt
cat *ellipse* > fddb_annotFile.txt
```
- Evaluation: the final evaluation command is:
```
./evaluate -a ./FDDB/FDDB-folds/fddb_annotFile.txt \
           -d DETECTION_RESULT.txt -f 0 \
           -i ./FDDB -l ./FDDB/FDDB-folds/filePath.txt \
           -r ./OUTPUT_DIR -z .jpg
```
**NOTES:** Run `./evaluate --help` for an explanation of the arguments.
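
The file-list step above (concatenating the fold files in FDDB-folds) can also be done in Python, which may be convenient on systems without a POSIX shell. This is a sketch under the assumption that every file whose name contains `ellipse` is an annotation file and every other file is an image-path list; the function name `build_fddb_lists` is hypothetical.

```python
from pathlib import Path


def build_fddb_lists(folds_dir):
    """Write filePath.txt and fddb_annotFile.txt inside folds_dir.

    Returns the number of path-list files and annotation files merged.
    """
    folds = Path(folds_dir)
    outputs = {"filePath.txt", "fddb_annotFile.txt"}
    # Skip earlier outputs so that re-running does not merge them again.
    files = sorted(p for p in folds.iterdir()
                   if p.is_file() and p.name not in outputs)
    path_files = [p for p in files if "ellipse" not in p.name]
    annot_files = [p for p in files if "ellipse" in p.name]
    (folds / "filePath.txt").write_text(
        "".join(p.read_text() for p in path_files))
    (folds / "fddb_annotFile.txt").write_text(
        "".join(p.read_text() for p in annot_files))
    return len(path_files), len(annot_files)
```

Unlike the shell one-liner, this version skips previously generated output files, so it can be run repeatedly without duplicating entries.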

## Algorithm Description

### BlazeFace
**Introduction:**  
[BlazeFace](https://arxiv.org/abs/1907.05047) is a face detection model published by Google Research.
It is lightweight yet performs well, and it is tailored for mobile GPU inference. It runs at a speed
of 200-1000+ FPS on flagship devices.

**Particularity:**  
- The anchor scheme stops at 8×8 (for a 128x128 input), with 6 anchors per pixel at that resolution.
- 5 single and 6 double BlazeBlocks with 5×5 depthwise convs, giving the same accuracy with fewer layers.
- Replace the non-maximum suppression algorithm with a blending strategy that estimates the
regression parameters of a bounding box as a weighted mean between the overlapping predictions.
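
The blending strategy in the last item can be illustrated with a small, dependency-free sketch. It is one plausible reading of "weighted mean between the overlapping predictions" and not the code used in this repository: the function names, the IoU threshold of `0.3`, and the use of confidence scores as weights are all assumptions made for illustration. Scores are assumed to be positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def blend_boxes(boxes, scores, iou_thresh=0.3):
    """Merge overlapping boxes into score-weighted means instead of
    discarding them as plain non-maximum suppression would."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while remaining:
        top = remaining[0]
        # The top-scoring box plus every remaining box that overlaps it.
        group = [top] + [i for i in remaining[1:]
                         if iou(boxes[top], boxes[i]) > iou_thresh]
        total = sum(scores[i] for i in group)
        blended = tuple(
            sum(scores[i] * boxes[i][k] for i in group) / total
            for k in range(4))
        results.append((blended, scores[top]))
        remaining = [i for i in remaining if i not in group]
    return results
```

Compared with plain NMS, the overlapping predictions contribute to the final coordinates instead of being dropped, which tends to reduce jitter between consecutive frames.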

**Edition information:**
- Original: A reproduction of the original paper.
- Lite: Replaces the 5x5 convs with 3x3 convs and uses fewer network layers and conv channels.
- NAS: Uses a `Neural Architecture Search` algorithm to optimize the network structure,
with fewer network layers and conv channels than `Lite`.

### FaceBoxes
**Introduction:**  
[FaceBoxes](https://arxiv.org/abs/1708.05234), from the paper "FaceBoxes: A CPU Real-time Face Detector
with High Accuracy", is a face detector proposed by Shifeng Zhang et al. that performs well in
both speed and accuracy. The paper was published at IJCB 2017.

**Particularity:**
- With a 640x640 network input, the anchor scheme stops at 20x20, 10x10 and 5x5,
with 3, 1 and 1 anchors per pixel at each resolution respectively. The corresponding densities
are 1, 2, 4 (20x20), 4 (10x10) and 4 (5x5).
- 2 convs with CReLU, 2 poolings, 3 inceptions and 2 convs with ReLU.
- Use density prior box to improve detection accuracy.
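
To make the density idea concrete, the sketch below tiles `density * density` anchor centers evenly inside one feature-map cell, which is the general densification scheme described in the FaceBoxes paper. It is an illustrative sketch, not this repository's prior-box operator: the function name `densified_centers` is hypothetical, and the stride of 32 in the example is simply 640 / 20 for the 20x20 feature map.

```python
def densified_centers(cx, cy, stride, density):
    """Return density**2 anchor centers spread evenly over one cell.

    (cx, cy) is the cell center in input-image pixels and stride is the
    cell size, e.g. 640 / 20 = 32 for the 20x20 feature map.
    """
    step = stride / density
    start = -stride / 2 + step / 2
    return [(cx + start + j * step, cy + start + i * step)
            for i in range(density)
            for j in range(density)]
```

With `density=1` the single anchor sits at the cell center. With `density=4` and `stride=32`, the 16 centers are offset by -12, -4, 4 and 12 pixels along each axis, so small anchors are tiled more densely than large ones.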

**Edition information:**
- Original: A reproduction of the original paper.
- Lite: 2 convs with CReLU, 1 pooling, 2 convs with ReLU, 3 inceptions and 2 convs with ReLU.
The anchor scheme stops at 80x80 and 40x40, with 3 and 1 anchors per pixel at each resolution respectively.
The corresponding densities are 1, 2, 4 (80x80) and 4 (40x40). It uses fewer conv channels than `Original`.


## Contributing
Contributions are highly welcomed and we would really appreciate your feedback!!