# First Order Motion

The [First Order Motion model](https://arxiv.org/abs/2003.00196) addresses the image animation task: generating a video sequence in which the object in a source image is animated according to the motion of a driving video. The image below shows examples of source images containing objects and driving videos containing a series of motions.

<div align="center">
  <img src="../../imgs/fom_demo.png" width="500"/>
</div>
Taking the face in the upper-left corner as an example: given a source face and a driving video, a new video can be generated in which the source subject wears expressions derived from the driving video. Usually this requires annotating the keypoints of the source subject and training a model dedicated to facial expression transfer.

The following GIF illustrates the principle:

[![img](https://user-images.githubusercontent.com/48054808/127443878-b9369c1a-909c-4af6-8c84-a62821262910.gif)](https://user-images.githubusercontent.com/48054808/127443878-b9369c1a-909c-4af6-8c84-a62821262910.gif)

The method is not limited to facial expression transfer; it also supports other objects when trained on suitable datasets. For example, you can transfer the motion of playing Tai Chi by training on Tai Chi video datasets, or achieve facial expression transfer by training on the VoxCeleb dataset. With the corresponding pre-trained model, you can then perform real-time image animation.

## Features

- #### Multi-Face Swapping

  - **Unique face detection algorithm that supports automatic multi-face detection and expression swapping.**

    - We adopt PaddleGAN's face detection model [S3FD](https://github.com/PaddlePaddle/PaddleGAN/tree/develop/ppgan/faceutils/face_detection/detection) to detect all the faces in an image and transfer the driving expressions to each of them for multi-face swapping.
      The specific technical steps are shown below (a minimal code sketch of this pipeline follows the Features list):

       a. Use the S3FD model to detect all the faces in an image.

       b. Use the First Order Motion model to transfer the facial expression of each detected face.

       c. Crop the newly generated faces and paste them back into the original photo.

    At the same time, PaddleGAN also provides a ["faceutils" tool](https://github.com/PaddlePaddle/PaddleGAN/tree/develop/ppgan/faceutils) for face-related work, including face detection, face segmentation, keypoint detection, etc.

- #### Face Enhancement

  - **This option significantly improves the definition of the generated video.**

- #### Abundant applications for online experience

  - 🐜**Ant Ah Hey**🐜: https://aistudio.baidu.com/aistudio/projectdetail/1603391
  - 💙**Special For Love Confession on May 20th (pronounced as I love you)**💙: https://aistudio.baidu.com/aistudio/projectdetail/1956943
  - **Smile of the Deceased(▰˘◡˘▰)**: https://aistudio.baidu.com/aistudio/projectdetail/1660701
  - 👨**Special For Father's Day**: https://aistudio.baidu.com/aistudio/projectdetail/2068655
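
The multi-face swapping steps listed above can be sketched roughly as follows. This is a minimal illustration only: `detect_faces`, `animate_face`, and `paste_back` are hypothetical placeholder helpers, while the real logic lives in PaddleGAN's ["faceutils" tool](https://github.com/PaddlePaddle/PaddleGAN/tree/develop/ppgan/faceutils) and `tools/first-order-demo.py`.

```
def swap_all_faces(source_image, driving_frames,
                   detect_faces, animate_face, paste_back):
    """Sketch of the detect -> animate -> paste-back pipeline (hypothetical helpers).

    detect_faces(image)           -> list of (x1, y1, x2, y2) face boxes (e.g. S3FD)
    animate_face(face, frames)    -> list of animated face crops (First Order Motion)
    paste_back(canvas, face, box) -> canvas with the face pasted into the box region
    """
    boxes = detect_faces(source_image)                        # step a: find every face
    animated = [animate_face(source_image[y1:y2, x1:x2], driving_frames)
                for (x1, y1, x2, y2) in boxes]                # step b: animate each crop
    result_frames = []
    for t in range(len(driving_frames)):                      # step c: rebuild full frames
        canvas = source_image.copy()
        for box, faces in zip(boxes, animated):
            canvas = paste_back(canvas, faces[t], box)
        result_frames.append(canvas)
    return result_frames
```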

## How to use
### 1. Quick Start: Face Detection and Effect Enhancement

Users can provide a source image with one or more faces and a driving video, substitute their paths for the `source_image` and `driving_video` parameters respectively, and run the following command. It will generate a video file named `result.mp4` in the `output` folder, which is the animated video.

Note: For photos with multiple faces, the farther apart the faces are, the better the resulting quality; you can also tune the effect by adjusting the `ratio` parameter.

The source image and driving video here are provided for demonstration purposes; the running command is as follows:

#### Running Command:

```
cd applications/
python -u tools/first-order-demo.py  \
     --driving_video ../docs/imgs/fom_dv.mp4 \
     --source_image ../docs/imgs/fom_source_image.png \
     --ratio 0.4 \
     --relative \
     --adapt_scale \
     --image_size 512 \
     --face_enhancement \
     --multi_person
```

#### Parameters:

| Parameters       | Instructions                                                 |
| ---------------- | ------------------------------------------------------------ |
| driving_video    | The driving video, from which the motion and expressions are transferred. |
| source_image     | The source image, supporting both single-face and multi-face images; it will be animated according to the motion and expressions of the driving video. |
| relative         | Whether relative or absolute keypoint coordinates are used. Relative coordinates are recommended; otherwise the subject may be distorted after animation. |
| adapt_scale      | Adapt the movement scale based on the convex hull of the keypoints. |
| ratio            | The proportion of the generated face pasted back into the image; adjust it for multi-face images in which adjacent faces are close. The default value is 0.4 and the range is [0.4, 0.5]. |
| image_size       | The image size of the face. 256 by default; 512 is also supported. |
| face_enhancement | Enhance the faces; disabled by default (enabled when the flag is added). |
| multi_person     | Process multiple faces in the image; by default only one face is assumed. |
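
The demo can also be driven from Python. Below is a minimal sketch using `ppgan.apps.FirstOrderPredictor`; the keyword arguments are assumed to mirror the CLI flags in the table above, so check the predictor's signature in your installed PaddleGAN version.

```
from ppgan.apps import FirstOrderPredictor

# Keyword arguments are assumed to mirror the CLI flags listed above.
predictor = FirstOrderPredictor(output="output",        # result.mp4 is written here
                                filename="result.mp4",
                                relative=True,
                                adapt_scale=True,
                                ratio=0.4,
                                image_size=512,
                                face_enhancement=True,
                                multi_person=True)
predictor.run("../docs/imgs/fom_source_image.png",       # source image
              "../docs/imgs/fom_dv.mp4")                 # driving video
```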

#### 📣Result of Face Enhancement

|                            Before                            |                            After                             |
| :----------------------------------------------------------: | :----------------------------------------------------------: |
| [![img](https://user-images.githubusercontent.com/17897185/126444836-b68593e3-ae43-4450-b18f-1a549230bf07.gif)](https://user-images.githubusercontent.com/17897185/126444836-b68593e3-ae43-4450-b18f-1a549230bf07.gif) | [![img](https://user-images.githubusercontent.com/17897185/126444194-436cc885-259d-4636-ad4c-c3dcc52fe175.gif)](https://user-images.githubusercontent.com/17897185/126444194-436cc885-259d-4636-ad4c-c3dcc52fe175.gif) |

### 2. Training

#### **Datasets:**

- Fashion: see [here](https://vision.cs.ubc.ca/datasets/fashion/)
- VoxCeleb: see [here](https://github.com/AliaksandrSiarohin/video-preprocessing). You can preprocess the data to the sizes you need; we handle two resolutions, 256 and 512, and the results are compared below: [![img](https://github.com/PaddlePaddle/PaddleGAN/raw/develop/docs/imgs/fom_512_vs_256.png)](https://github.com/PaddlePaddle/PaddleGAN/blob/develop/docs/imgs/fom_512_vs_256.png)

**Parameters:**

- dataset_name.yaml: configure your own YAML file and its parameters

- Training using single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python tools/main.py --config-file configs/dataset_name.yaml
```
- Training using multiple GPUs: change *nn.BatchNorm* in /ppgan/modules/first_order.py to *nn.SyncBatchNorm* (a sketch of this change follows the command below)
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
    tools/main.py \
    --config-file configs/dataset_name.yaml

```
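
As mentioned above, multi-GPU training requires swapping the normalization layer used by the blocks in `ppgan/modules/first_order.py`. The sketch below is only illustrative (the real block definitions differ in detail); the point is the one-line change from *nn.BatchNorm* to *nn.SyncBatchNorm*.

```
import paddle.nn as nn
import paddle.nn.functional as F

class DownBlock2d(nn.Layer):
    """Illustrative block: use SyncBatchNorm so statistics sync across GPUs."""
    def __init__(self, in_features, out_features, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2D(in_features, out_features,
                              kernel_size=kernel_size, padding=padding)
        # self.norm = nn.BatchNorm(out_features)      # single-GPU training
        self.norm = nn.SyncBatchNorm(out_features)     # multi-GPU training
        self.pool = nn.AvgPool2D(kernel_size=2)

    def forward(self, x):
        return self.pool(F.relu(self.norm(self.conv(x))))
```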

**Example:**
- Training using single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python tools/main.py --config-file configs/firstorder_fashion.yaml
```
- Training using multiple GPUs:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
    tools/main.py \
    --config-file configs/firstorder_fashion.yaml
```


## Animation results

  [![img](https://user-images.githubusercontent.com/48054808/119469551-0a377b00-bd7a-11eb-9117-e4871c8fb9c0.gif)](https://user-images.githubusercontent.com/48054808/119469551-0a377b00-bd7a-11eb-9117-e4871c8fb9c0.gif)



### 3. Model Compression

**Prediction:**

```
cd applications/
python -u tools/first-order-demo.py  \
     --driving_video ../docs/imgs/mayiyahei.MP4 \
     --source_image ../docs/imgs/father_23.jpg \
     --config ../configs/firstorder_vox_mobile_256.yaml \
     --ratio 0.4 \
     --relative \
     --adapt_scale \
     --mobile_net
```

Currently, we use MobileNet combined with pruning to compress the model; see the comparison below:

|            | Size (MB) | Reconstruction loss |
| ---------- | --------- | ------------------- |
| Original   | 229     | 0.041781392         |
| Compressed | 10.1    | 0.047878753         |

**Training:** the compressed model is trained in three stages by editing `mode` in configs/firstorder_vox_mobile_256.yaml:

1. Set `mode` to `kp_detector` to train the compressed kp_detector model while freezing the original generator.
2. Set `mode` to `generator` to train the compressed generator model while freezing the original kp_detector.
3. Set `mode` to `both`, point `kp_weight_path` and `gen_weight_path` in the config to the models trained in the previous two stages, and train them jointly.

```
export CUDA_VISIBLE_DEVICES=0
python tools/main.py --config-file configs/firstorder_vox_mobile_256.yaml
```



### 4. Deployment

#### 4.1 Export

Use the `tools/fom_export.py` script to export the model together with the configuration file used for deployment; the config file is `firstorder_vox_mobile_256.yml`. The export command is as follows.

```
# Export FOM Model

python tools/export_model.py \
    --config-file configs/firstorder_vox_mobile_256.yaml \
    --load /root/.cache/ppgan/vox_mobile.pdparams \
    --inputs_size "1,3,256,256;1,3,256,256;1,10,2;1,10,2,2" \
    --export_model output_inference/
```

The prediction models will be exported to the `output_inference/fom_dy2st/` directory as `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
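
As a quick check of the exported files, the following sketch loads them with the Paddle Inference API. The four input shapes are taken from the `--inputs_size` string above; their exact semantics (source image, driving frame, keypoints, Jacobians) and input names should be verified against the exported program.

```
import numpy as np
from paddle.inference import Config, create_predictor

# Paths follow the export command above; adjust them if your output directory differs.
config = Config("output_inference/fom_dy2st/model.pdmodel",
                "output_inference/fom_dy2st/model.pdiparams")
predictor = create_predictor(config)

# Dummy inputs matching --inputs_size "1,3,256,256;1,3,256,256;1,10,2;1,10,2,2".
shapes = [[1, 3, 256, 256], [1, 3, 256, 256], [1, 10, 2], [1, 10, 2, 2]]
for name, shape in zip(predictor.get_input_names(), shapes):
    handle = predictor.get_input_handle(name)
    handle.reshape(shape)
    handle.copy_from_cpu(np.zeros(shape, dtype="float32"))

predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(output.shape)   # e.g. a generated frame of shape (1, 3, 256, 256)
```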

- [Pre-trained Model](https://paddlegan.bj.bcebos.com/applications/first_order_model/paddle_lite/inference/lite.zip)

#### 4.2 Deployment with Paddle Lite

- [Deployment of FOM model with Paddle Lite](https://github.com/PaddlePaddle/PaddleGAN/tree/develop/deploy/lite)

- [FOM-Lite-Demo](https://paddlegan.bj.bcebos.com/applications/first_order_model/paddle_lite/apk/fom_demo.zip). For more details, please refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite).

  Known issues: (a) Paddle Lite currently performs slightly worse than Paddle Inference; optimization is in progress. (b) The generator runs in a single thread, so when the number of frames is large it may be scheduled on a small core rather than a large core.

## References

```
@InProceedings{Siarohin_2019_NeurIPS,
  author={Siarohin, Aliaksandr and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  title={First Order Motion Model for Image Animation},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  month = {December},
  year = {2019}
}
```