# StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

## Introduction

The task of StyleGAN V2 is image generation, while the CLIP-guided editing module uses attribute manipulation vectors obtained from the CLIP (Contrastive Language-Image Pre-training) model to map text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation.


This module uses a pretrained StyleGAN V2 generator and the Pixel2Style2Pixel model for image encoding. At present, only the portrait editing model (trained on the FFHQ dataset) is available.
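
The core idea can be summarized in a few lines. Below is a minimal, illustrative sketch (not the module's actual implementation): the array shapes and variable names are assumptions, and random data stands in for the precomputed channel-direction file and the CLIP text embeddings.

```
import numpy as np

# Hypothetical stand-ins: one CLIP-space direction per style-space channel
# (roughly what the direction file stores), plus CLIP embeddings of the two prompts.
channel_directions = np.random.randn(6048, 512)   # assumed shape: style channels x CLIP dim
neutral_emb = np.random.randn(512)                 # e.g. CLIP("face")
target_emb = np.random.randn(512)                  # e.g. CLIP("happy face")

def text_to_style_offset(neutral_emb, target_emb, channel_directions,
                         beta_threshold=0.12, direction_offset=5.0):
    """Turn a text edit into a per-channel offset for the style code."""
    delta_t = target_emb - neutral_emb
    delta_t /= np.linalg.norm(delta_t)                  # normalized text direction in CLIP space
    relevance = channel_directions @ delta_t            # how much each channel matches the edit
    relevance[np.abs(relevance) < beta_threshold] = 0   # keep only strongly related channels
    return direction_offset * relevance                 # scale by the edit strength

offset = text_to_style_offset(neutral_emb, target_emb, channel_directions)
# `offset` is added to the image's style-space code before it is fed to the generator.
```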

The Paddle-CLIP and dlib packages are required for this module.

```
pip install -e .
pip install paddleclip
pip install dlib-bin
```

## How to use

### Editing

```
cd applications/
python -u tools/styleganv2clip.py \
       --latent <PATH TO STYLE VECTOR> \
       --output_path <DIRECTORY TO STORE OUTPUT IMAGE> \
       --weight_path <YOUR PRETRAINED MODEL PATH> \
       --model_type ffhq-config-f \
       --size 1024 \
       --style_dim 512 \
       --n_mlp 8 \
       --channel_multiplier 2 \
       --direction_path <PATH TO STORE ATTRIBUTE DIRECTIONS> \
       --neutral <DESCRIPTION OF THE SOURCE IMAGE> \
       --target <DESCRIPTION OF THE TARGET IMAGE> \
       --beta_threshold 0.12 \
       --direction_offset 5 \
       --cpu
```

**params:**
- latent: the path of the style vector that represents an image. It comes from the `dst.npy` generated by Pixel2Style2Pixel or the `dst.fitting.npy` generated by the StyleGANv2 Fitting module (see the loading sketch after these parameter lists)
- output_path: the directory where the generated images are stored
- weight_path: pretrained StyleGANv2 model path
- model_type: inner model type, currently only `ffhq-config-f` is available.
- direction_path: the path of the CLIP attribute direction (mapping vector) file
- stat_path: the path of the latent statistics file
- neutral: description of the source image, for example: face
- target: description of the target image, for example: young face
- beta_threshold: editing threshold of the attribute channels
- direction_offset: offset strength of the attribute edit
- cpu: use CPU for inference; remove this flag to run on GPU

> inherited params for the pretrained StyleGAN model
- size: model parameter, output image resolution
- style_dim: model parameter, dimension of the style vector z
- n_mlp: model parameter, the number of multi-layer perceptron (MLP) layers for the style vector z
- channel_multiplier: model parameter, channel multiplier that affects the model size and the quality of generated images
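
As a quick sanity check before editing, you can load the saved style vector and inspect its shape. This is a minimal sketch with an illustrative file name; the exact shape depends on how the latent was exported.

```
import numpy as np

# Illustrative path; use the dst.npy / dst.fitting.npy produced by the encoder or fitting module.
latent = np.load("dst.npy")

# For ffhq-config-f (size=1024, style_dim=512) a W+ latent typically holds one
# 512-dimensional style vector per generator layer, e.g. shape (18, 512).
print(latent.shape)
```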

### Results

Input portrait:
<div align="center">
    <img src="../../imgs/stylegan2fitting-sample.png" width="300"/>
</div>

with
> direction_offset = [ -1, 0, 1, 2, 3, 4, 5]
> beta_threshold = 0.1

edit from 'face' to 'boy face':

![stylegan2clip-sample-boy](https://user-images.githubusercontent.com/29187613/187344690-6709fba5-6e21-4bc0-83d1-5996947c99a4.png)


edit from 'face' to 'happy face':

![stylegan2clip-sample-happy](https://user-images.githubusercontent.com/29187613/187344681-6509f01b-0d9e-4dea-8a97-ee9ca75d152e.png)


edit from 'face' to 'angry face':

![stylegan2clip-sample-angry](https://user-images.githubusercontent.com/29187613/187344686-ff5047ab-5499-420d-ad02-e0908ac71bf7.png)

edit from 'face' to 'face with long hair':

![stylegan2clip-sample-long-hair](https://user-images.githubusercontent.com/29187613/187344684-4e452631-52b0-47cf-966e-3216c0392815.png)



edit from 'face' to 'face with curly hair':

![stylegan2clip-sample-curl-hair](https://user-images.githubusercontent.com/29187613/187344677-c9a3aa9f-1f3c-41b3-a1f0-fcd48a9c627b.png)


edit from 'head with black hair' to 'head with gold hair':

![stylegan2clip-sample-gold-hair](https://user-images.githubusercontent.com/29187613/187344678-5220e8b2-b1c9-4f2f-8655-621b6272c457.png)

## Make Attribute Direction Vector

For details, please refer to [Puzer/stylegan-encoder](https://github.com/Puzer/stylegan-encoder/blob/master/Learn_direction_in_latent_space.ipynb)

Currently, pretrained weights for `stylegan2` with the `ffhq-config-f` configuration (FFHQ dataset) are provided:

direction: https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-styleclip-global-directions.pdparams

stats: https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-styleclip-stats.pdparams
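
As a rough sketch of how these files can be fetched and inspected (a generic download, not the module's own loading code; filenames are taken from the URLs above):

```
import urllib.request
import paddle

# Download the pretrained direction / stats files.
base = "https://paddlegan.bj.bcebos.com/models/"
for name in ("stylegan2-ffhq-config-f-styleclip-global-directions.pdparams",
             "stylegan2-ffhq-config-f-styleclip-stats.pdparams"):
    urllib.request.urlretrieve(base + name, name)

# Inspect what the files contain before passing their paths via
# --direction_path / --stat_path.
directions = paddle.load("stylegan2-ffhq-config-f-styleclip-global-directions.pdparams")
stats = paddle.load("stylegan2-ffhq-config-f-styleclip-stats.pdparams")
print(type(directions), type(stats))
```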

## Training

1. Extract statistics of the style latent vectors (see the sketch after these steps)
```
python styleclip_getf.py
```
2. Calculate the mapping vectors using the CLIP model

```
python ppgan/apps/styleganv2clip_predictor.py extract
```
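
Conceptually, the statistics file produced in step 1 records per-channel statistics of style-space codes sampled from the generator, which step 2 then uses when building the CLIP mapping vectors. A minimal, hypothetical illustration with random data in place of real style codes (the channel count and dictionary keys are assumptions):

```
import numpy as np

# Hypothetical stand-in for step 1: sample many style-space codes from the
# generator (random data here) and record each channel's mean and std.
s_codes = np.random.randn(1000, 6048)   # assumed: N samples x style channels
stats = {"mean": s_codes.mean(axis=0), "std": s_codes.std(axis=0)}

# Step 2 then embeds prompt pairs with CLIP and, normalizing channel
# perturbations with these statistics, builds the per-channel direction file
# consumed through --direction_path.
```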

## Reference

- 1. [StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery](https://arxiv.org/abs/2103.17249)

  ```
  @article{Patashnik2021StyleCLIPTM,
    title={StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery},
    author={Or Patashnik and Zongze Wu and Eli Shechtman and Daniel Cohen-Or and D. Lischinski},
    journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2021},
    pages={2065-2074}
  }
  ```
- 2. [Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation](https://arxiv.org/abs/2008.00951)

  ```
  @article{richardson2020encoding,
    title={Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},
    author={Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},
    journal={arXiv preprint arXiv:2008.00951},
    year={2020}
  }
  ```