# First Order Motion model

## First Order Motion model introduction

[First order motion model](https://arxiv.org/abs/2003.00196) tackles the image animation task: generating a video sequence in which the object in a source image is animated according to the motion of a driving video. The first order motion framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), the method can be applied to any object of that class. To achieve this, it decouples appearance and motion information using a self-supervised formulation. In addition, to support complex motions, it uses a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models the occlusions that arise during target motion and combines the appearance extracted from the source image with the motion derived from the driving video.
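
The "first order" in the name refers to the first-order Taylor expansion used to model motion near each keypoint. As a summary of the paper's formulation (not code from this repo), the mapping from a driving frame D back to the source frame S near keypoint p_k is approximated as

```
T_{S \leftarrow D}(z) \approx T_{S \leftarrow R}(p_k) + J_k \, (z - T_{D \leftarrow R}(p_k)),
\qquad
J_k = \Big(\frac{d}{dp} T_{S \leftarrow R}(p)\Big|_{p=p_k}\Big) \Big(\frac{d}{dp} T_{D \leftarrow R}(p)\Big|_{p=p_k}\Big)^{-1}
```

where R is an abstract reference frame (so source and driving frames are never compared directly) and the Jacobians J_k are the local affine transformations mentioned above.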

<div align="center">
  <img src="../../imgs/fom_demo.png" width="500"/>
</div>
## Multi-face swapping

For photos with multiple faces, we first detect all of the faces, then perform facial expression transfer for each face, and finally paste the animated faces back into the original photo to generate a complete new video.

The specific technical steps are as follows (a minimal pipeline sketch follows the list):

1. Use the S3FD model to detect all faces in a photo
2. Use the First Order Motion model to transfer the facial expression to each detected face
3. Paste the generated faces back into the original photo
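
Below is a minimal sketch of this three-step pipeline, assuming OpenCV. `detect_faces` and `animate_face` are hypothetical stand-ins for the S3FD detector and the First Order Motion model; the real entry point is `applications/tools/first-order-demo.py` with `--multi_person`.

```
# Minimal sketch of the three-step pipeline above. `detect_faces` and
# `animate_face` are hypothetical stand-ins for the S3FD detector and
# the First Order Motion model.
import cv2


def swap_all_faces(photo, driving_frames, detect_faces, animate_face):
    """photo: HxWx3 image; driving_frames: list of driving-video frames.

    detect_faces(photo)        -> list of (x, y, w, h) face boxes
    animate_face(crop, frames) -> one generated face crop per frame
    """
    boxes = detect_faces(photo)                         # step 1: find faces
    outputs = [photo.copy() for _ in driving_frames]
    for x, y, w, h in boxes:
        crop = photo[y:y + h, x:x + w]
        generated = animate_face(crop, driving_frames)  # step 2: per-face FOM
        for out, face in zip(outputs, generated):
            # step 3: paste the generated face back at its original position
            out[y:y + h, x:x + w] = cv2.resize(face, (w, h))
    return outputs                                      # one output frame per driving frame
```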

At the same time, specifically for face-related work, PaddleGAN provides a ["faceutils" tool](https://github.com/PaddlePaddle/PaddleGAN/tree/develop/ppgan/faceutils), which includes face detection and face segmentation models, among others.

## How to use
### 1 Face testing
Upload the prepared source image and driving video, then substitute their paths for the `source_image` and `driving_video` parameters in the following command. A video file named `result.mp4` will be generated in the `output` folder; this is the animated video.

Note: for photos with multiple faces, the greater the distance between the faces, the better the resulting quality.

- Single face:
```
cd applications/
python -u tools/first-order-demo.py  \
     --driving_video ../docs/imgs/fom_dv.mp4 \
     --source_image ../docs/imgs/fom_source_image.png \
     --ratio 0.4 \
     --relative --adapt_scale \
     --image_size 512 \
     --face_enhancement
```

- Multiple faces:
```
cd applications/
python -u tools/first-order-demo.py  \
     --driving_video ../docs/imgs/fom_dv.mp4 \
     --source_image ../docs/imgs/fom_source_image_multi_person.png \
     --ratio 0.4 \
     --relative --adapt_scale \
     --image_size 512 \
     --multi_person
```

**params:**
- driving_video: the driving video whose motion will be migrated to the source image.
- source_image: the source image; single-person and multi-person images are both supported. The image will be animated according to the motion of the driving video.
- relative: whether to use relative or absolute coordinates for the keypoints in the video. Relative coordinates are recommended; with absolute coordinates, the characters tend to be distorted after animation (see the sketch after this parameter list).
- adapt_scale: adapt movement scale based on convex hull of keypoints.
- ratio: the proportion of the generated image taken up by the pasted face. Adjust this parameter for multi-person images in which adjacent faces are close together. The default value is 0.4 and the valid range is [0.4, 0.5].
- image_size: the image size of the face. The default is 256.
- multi_person: set this flag when the image contains multiple faces. By default, a single face is assumed.
- face_enhancement: enhance the face. The default is False.
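
To make `relative` and `adapt_scale` concrete, here is a simplified numpy sketch of the keypoint normalization commonly used in first-order-motion demos; the names and details are illustrative, not the exact PaddleGAN API:

```
# Illustrative sketch of relative keypoint transfer (names are not the
# exact PaddleGAN API). Each kp_* argument is a (K, 2) array of keypoints.
import numpy as np
from scipy.spatial import ConvexHull


def normalize_kp(kp_source, kp_driving, kp_driving_initial,
                 relative=True, adapt_scale=True):
    scale = 1.0
    if adapt_scale:
        # Scale the driving motion by the ratio of face sizes, measured as
        # the convex-hull areas of the keypoints (ConvexHull.volume is the
        # area for 2-D points).
        scale = np.sqrt(ConvexHull(kp_source).volume /
                        ConvexHull(kp_driving_initial).volume)
    if relative:
        # Move the source keypoints by the driving video's *motion* rather
        # than to its absolute positions -- this is what avoids distortion.
        return kp_source + scale * (kp_driving - kp_driving_initial)
    return kp_driving  # absolute coordinates
```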
Result of `face_enhancement`:
<div align='center'>
  <img src='https://user-images.githubusercontent.com/17897185/126444836-b68593e3-ae43-4450-b18f-1a549230bf07.gif' width='700'/>
</div>
<div align='center'>
  <img src='https://user-images.githubusercontent.com/17897185/126444194-436cc885-259d-4636-ad4c-c3dcc52fe175.gif' width='700'/>
</div>


### 2 Training
**Datasets:**
- Fashion: see [here](https://vision.cs.ubc.ca/datasets/fashion/)
- VoxCeleb: see [here](https://github.com/AliaksandrSiarohin/video-preprocessing). You can process the data at whatever size you require; we use two sizes, 256 and 512, and compare the results below (a frame-resizing sketch follows the image)
![](../../imgs/fom_512_vs_256.png)
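
If you preprocess videos yourself, the resizing step can look like the following OpenCV sketch; the linked video-preprocessing repository is the authoritative pipeline, and this only illustrates producing 256 or 512 frames:

```
# Minimal sketch: read a video and resize every frame to size x size
# (256 or 512) for training. The linked video-preprocessing repo is the
# reference pipeline; this shows only the resizing step.
import cv2


def extract_and_resize(video_path, size=256):
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    return frames
```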

**params:**
- dataset_name.yaml: create a config file for your own dataset

- For single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python tools/main.py --config-file configs/dataset_name.yaml
```
- For multiple GPUs:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
    tools/main.py \
    --config-file configs/dataset_name.yaml

```

**Example:**
- For single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python tools/main.py --config-file configs/firstorder_fashion.yaml
```
- For multiple GPUs:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
    tools/main.py \
    --config-file configs/firstorder_fashion.yaml
```


**Online tutorials running on AI Studio:**

* **Multi-face swapping: https://aistudio.baidu.com/aistudio/projectdetail/1603391**
* **Single face swapping: https://aistudio.baidu.com/aistudio/projectdetail/1586056**

## Animation results

![](../../imgs/first_order.gif)


## Reference

```
@InProceedings{Siarohin_2019_NeurIPS,
  author={Siarohin, Aliaksandr and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  title={First Order Motion Model for Image Animation},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  month = {December},
  year = {2019}
}
```