Unverified commit ffdb45d4 authored by ZhengLeyizly, committed by GitHub

Update apps.md (#439)

* Update data_prepare.md

* Update config_doc.md

* Update apps.md

* Update README.md
Parent 6d626fa7
README.md

@@ -58,7 +58,7 @@ (unchanged context)

<div align='center'>
  <img src='https://ai-studio-static-online.cdn.bcebos.com/da1c51844ac048aa8d4fa3151be95215eee75d8bb488409d92ec17285b227c2c' width='250'/>
</div>

- **💞Add Face Morphing function💞: you can perfectly merge any two faces and make the new face get any facial expressions!**
  - Tutorials: https://aistudio.baidu.com/aistudio/projectdetail/2254031

@@ -80,21 +80,22 @@

<div align='center'>
  <img src='https://user-images.githubusercontent.com/48054808/129904830-8b87e310-ea51-4aff-b29b-88920ee82447.png' width='700'/>
</div>

Before:

## Quick Start

* Please refer to the [installation document](./docs/en_US/install.md) to make sure you have installed PaddlePaddle and PaddleGAN correctly.
* Get started through ppgan.app interface:

  ```python
  from ppgan.apps import RealSRPredictor
  sr = RealSRPredictor()
  sr.run("docs/imgs/monarch.png")
  ```
* More applications, please refer to [ppgan.apps apis](./docs/en_US/apis/apps.md)
* More tutorials:
  - [Data preparation](./docs/en_US/data_prepare.md)
  - [Training/Evaluating/Testing basic usage](./docs/en_US/get_started.md)

After:

## Document Tutorial

#### **Installation**

* Environment dependence:
  - PaddlePaddle >= 2.1.0
  - Python >= 3.6
  - CUDA >= 10.1
* [Full installation tutorial](https://github.com/PaddlePaddle/PaddleGAN/blob/develop/docs/zh_CN/install.md)

#### **Starter Tutorial**

- [Quick start](./docs/en_US/get_started.md)
- [Data Preparation](./docs/en_US/data_prepare.md)
- [Instruction of APIs](./docs/en_US/apis/apps.md)
- [Instruction of Config Files](./docs/en_US/config_doc.md)

## Model Tutorial
# Introduction of Prediction Interface
PaddleGAN (ppgan.apps) provides prediction APIs covering multiple applications, including super resolution, video frame interpolation, colorization, makeup transfer, image animation, face parsing, etc. Built-in pre-trained, high-performance models let users run inference flexibly and efficiently.
* Colorization:
* [DeOldify](#ppgan.apps.DeOldifyPredictor)
* [DeepRemaster](#ppgan.apps.DeepRemasterPredictor)
* Super Resolution:
* [RealSR](#ppgan.apps.RealSRPredictor)
* [EDVR](#ppgan.apps.EDVRPredictor)
* Video Frame Interpolation:
* [DAIN](#ppgan.apps.DAINPredictor)
* Motion Driving:
* [FirstOrder](#ppgan.apps.FirstOrderPredictor)
* Face:
  * [FaceParse](#ppgan.apps.FaceParsePredictor)
* Photo Animation:
* [AnimeGAN](#ppgan.apps.AnimeGANPredictor)
* Lip-syncing:
* [Wav2Lip](#ppgan.apps.Wav2LipPredictor)
## Public Usage
### Switch of CPU and GPU
By default, inference runs on the GPU when a GPU device is available and the [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/windows-pip.html) GPU package is installed; if the CPU package is installed, inference runs on the CPU.
To switch between CPU and GPU manually, use the following:
```python
import paddle
paddle.set_device('cpu')   # set device to CPU
# paddle.set_device('gpu') # set device to GPU
```
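For example, a minimal sketch of forcing CPU inference before building one of the predictors described below (using the DeOldify example image from this document):

```python
import paddle
from ppgan.apps import DeOldifyPredictor

paddle.set_device('cpu')                 # force CPU inference; switch back with 'gpu'
deoldify = DeOldifyPredictor()           # predictors run on the device selected above
deoldify.run("docs/imgs/test_old.jpeg")
```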
## ppgan.apps.DeOldifyPredictor
```python
ppgan.apps.DeOldifyPredictor(output='output', weight_path=None, render_factor=32)
```
> Build the instance of DeOldify. DeOldify is a GAN-based colorization model. The interface supports colorization of both images and videos; the recommended video format is mp4.
>
> **Example**
>
> ```python
> from ppgan.apps import DeOldifyPredictor
> deoldify = DeOldifyPredictor()
> deoldify.run("docs/imgs/test_old.jpeg")
> ```
> **Parameters**
>
> > - output (str): path of the output directory, default: output. Note that results are saved under output/DeOldify.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
> > - artistic (bool): whether to use the "artistic" model, which may produce interesting colors but also more glitches.
> > - render_factor (int): the resize factor used when rendering and colorizing. The image is resized to a square with side length 16 x render_factor before colorization; for example, with the default value of 32 the input is resized to 512x512 (16x32=512). In general, a smaller render_factor means faster computation and more vivid colors, so old, low-quality images usually benefit from lowering it; a higher value gives better image quality, but the colors may fade slightly.
### run
```python
run(input)
```
> The execution interface after building the instance.
> **Parameters**
>
> > - input (str|np.ndarray|Image.Image): the input image or video file. For images, it can be a file path, np.ndarray, or PIL.Image; for videos, only a file path is accepted.
>
>**Return Value**
>
>> - tuple(pred_img(np.array), out_path(str)): for image input, returns the predicted image (PIL.Image) and the path where it is saved.
> > - tuple(frame_path(str), out_path(str)): for video input, frame_path is the directory where the colorized frames are saved, and out_path is the path of the colorized video.
### run_image
```python
run_image(img)
```
> The interface of image colorization.
> **Parameters**
>
> > - img (str|np.ndarray|Image.Image): the input image; it can be a file path, np.ndarray, or PIL.Image.
>
>**Return Value**
>
>> - pred_img(PIL.Image): return the predicted image, PIL.Image type.
### run_video
```python
run_video(video)
```
> The interface of video colorization.
> **Parameters**
>
> > - video (str): path of the input video file.
>
> **Return Value**
>
> > - tuple(frame_path(str), out_path(str)): frame_path is the save path of the images after colorizing each frame of the video, and out_path is the save path of the colorized video.
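>
> A minimal usage sketch of the two interfaces above; the video filename below is illustrative:
>
> ```python
> from ppgan.apps import DeOldifyPredictor
>
> deoldify = DeOldifyPredictor()
>
> # colorize a single image; run_image returns a PIL.Image
> pred_img = deoldify.run_image("docs/imgs/test_old.jpeg")
> pred_img.save("colorized.png")
>
> # colorize a video; run_video returns the frame directory and the output video path
> frame_path, out_path = deoldify.run_video("old_movie.mp4")
> print(out_path)
> ```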
## ppgan.apps.DeepRemasterPredictor
```python
ppgan.apps.DeepRemasterPredictor(output='output', weight_path=None, colorization=False, reference_dir=None, mindim=360)
```
> Build the instance of DeepRemasterPredictor. DeepRemaster is a GAN-based video colorization and restoration model that can take reference frames as additional input. Only video input is currently supported; the recommended format is mp4.
>
> **Example**
>
> ```
> from ppgan.apps import DeepRemasterPredictor
> deep_remaster = DeepRemasterPredictor()
> deep_remaster.run("docs/imgs/test_old.jpeg")
> ```
>
>
> **Parameters**
>
> > - output (str): path of the output directory, default: output. Note that results are saved under output/DeepRemaster.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
> > - colorization (bool): whether to enable colorization, default: False, in which case only restoration is performed.
> > - reference_dir (str|None): directory of reference frames used when colorization is enabled; may be left unset.
> > - mindim (int): minimum side length the image is resized to before prediction.
### run
```python
run(video_path)
```
> The execution interface after building the instance.
> **Parameters**
>
> > - video_path (str): path of the video file.
>
> **Return Value**
>
> > - tuple(str, str): the former is the directory where the colorized frames are saved, the latter is the path of the colorized video.
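>
> A sketch of enabling the colorization branch with a directory of reference frames; the paths below are illustrative:
>
> ```python
> from ppgan.apps import DeepRemasterPredictor
>
> # restore and colorize, guided by reference frames placed in ./reference_frames
> deep_remaster = DeepRemasterPredictor(colorization=True,
>                                       reference_dir="reference_frames")
> frame_path, out_path = deep_remaster.run("old_movie.mp4")
> print(out_path)
> ```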
## ppgan.apps.RealSRPredictor
```python
ppgan.apps.RealSRPredictor(output='output', weight_path=None)
```
> Build the instance of RealSR. RealSR (Real-World Super-Resolution via Kernel Estimation and Noise Injection, CVPR 2020 Workshops) is a super-resolution model trained on real-world images. The interface applies 4x super resolution to the input image or video. The recommended video format is mp4.
>
> *Note: the size of the input image should be less than 1000x1000 pixels.
>
> **Example**
>
> ```
> from ppgan.apps import RealSRPredictor
> sr = RealSRPredictor()
> sr.run("docs/imgs/test_sr.jpeg")
> ```
> **Parameters**
>
> > - output (str): path of the output directory, default: output. Note that results are saved under output/RealSR.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
### run
```python
run(video_path)
```
> The execution interface after building the instance.
> **Parameters**
>
> > - video_path (str): path of the video file.
>
>**Return Value**
>
>> - tuple(pred_img(np.array), out_path(str)): for image input, returns the predicted image (PIL.Image) and the path where it is saved.
> > - tuple(frame_path(str), out_path(str)): for video input, frame_path is the directory where the super-resolved frames are saved, and out_path is the path of the super-resolved video.
### run_image
```python
run_image(img)
```
> The interface of image super resolution.
> **Parameter**
>
> > - img (str|np.ndarray|Image.Image): input image, it could be the path of the image, np.ndarray, or PIL.Image type.
>
> **Return Value**
>
> > - pred_img(PIL.Image): return the predicted image, PIL.Image type.
### run_video
```python
run_video(video)
```
> The interface of video super resolution.
> **Parameter**
>
> > - video (str): path of the video file.
>
> **Return Value**
>
> > - tuple(frame_path(str), out_path(str)): frame_path is the save path of each frame of the video after super resolution, and out_path is the save path of the video after super resolution.
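>
> A minimal usage sketch of the two interfaces above; the video filename below is illustrative:
>
> ```python
> from ppgan.apps import RealSRPredictor
>
> sr = RealSRPredictor()
>
> # 4x super resolution on a single image; run_image returns a PIL.Image
> pred_img = sr.run_image("docs/imgs/test_sr.jpeg")
> pred_img.save("test_sr_4x.png")
>
> # 4x super resolution on a video; run_video returns the frame directory and the output video path
> frame_path, out_path = sr.run_video("input.mp4")
> ```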
## ppgan.apps.EDVRPredictor
```python
ppgan.apps.EDVRPredictor(output='output', weight_path=None)
```
> Build the instance of EDVR. EDVR is a model designed for video super resolution; for more details, see the paper EDVR: Video Restoration with Enhanced Deformable Convolutional Networks (https://arxiv.org/abs/1905.02716). The interface applies 2x super resolution to the input video. The recommended video format is mp4.
>
> *Note: the interface is currently only available in static-graph mode; add the following code to switch to static graph before using it:
>
> ```
> import paddle
> paddle.enable_static() #enable static graph
> paddle.disable_static() #disable static graph
> ```
>
> **Example**
>
> ```
> from ppgan.apps import EDVRPredictor
> sr = EDVRPredictor()
> # test a video file
> sr.run("docs/imgs/test.mp4")
> ```
> **Parameters**
>
> > - output (str): path of the output directory, default: output. Note that results are saved under output/EDVR.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
### run
```python
run(video_path)
```
> The execution interface after building the instance.
> **Parameter**
>
> > - video_path (str): path of the video files.
>
> **Return Value**
>
> > - tuple(str, str): the former is the directory where the super-resolved frames are saved, the latter is the path of the super-resolved video.
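>
> Putting the static-graph note and the example above together, a usage sketch might look like this:
>
> ```python
> import paddle
> from ppgan.apps import EDVRPredictor
>
> paddle.enable_static()       # EDVR currently requires static-graph mode
> sr = EDVRPredictor()
> frame_path, out_path = sr.run("docs/imgs/test.mp4")  # 2x video super resolution
> paddle.disable_static()      # restore dynamic-graph mode afterwards
> ```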
## ppgan.apps.DAINPredictor
```python
ppgan.apps.DAINPredictor(output='output', weight_path=None, time_step=None, use_gpu=True, key_frame_thread=0, remove_duplicates=False)
```
> Build the instance of the DAIN model. DAIN performs video frame interpolation, producing videos with a higher frame rate; for more details, see the paper DAIN: Depth-Aware Video Frame Interpolation (https://arxiv.org/abs/1904.00830).
>
> *Note: the interface is currently only available in static-graph mode; add the following code to switch to static graph before using it:
>
> ```
> import paddle
> paddle.enable_static() #enable static graph
> paddle.disable_static() #disable static graph
> ```
>
> **Example**
>
> ```
> from ppgan.apps import DAINPredictor
> dain = DAINPredictor(time_step=0.5)  # time_step has no default value and must be specified manually
> # test a video file
> dain.run("docs/imgs/test.mp4")
> ```
> **Parameters**
>
> > - output_path (str): path of the prediction output directory, default: output. Note that results are saved under output/DAIN.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
> > - time_step (float): the frame rate is multiplied by 1./time_step, e.g. 2x frames if time_step is 0.5 and 4x frames if it is 0.25.
> > - use_gpu (bool): whether to run prediction on GPU, default: True.
> > - remove_duplicates (bool): whether to remove duplicate frames, default: False.
### run
```python
run(video_path)
```
> The execution interface after building the instance.
> **Parameters**
>
> > - video_path (str): path of the video file.
>
> **Return Value**
>
> > - tuple(str, str): frame_path is the directory where the interpolated frames are saved, and out_path is the path of the interpolated video.
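>
> Likewise for DAIN, a sketch that wraps the call in static-graph mode and unpacks the return value:
>
> ```python
> import paddle
> from ppgan.apps import DAINPredictor
>
> paddle.enable_static()                   # DAIN currently requires static-graph mode
> dain = DAINPredictor(time_step=0.5)      # 2x frame rate; time_step must be set manually
> frame_path, out_path = dain.run("docs/imgs/test.mp4")
> paddle.disable_static()
> ```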
## ppgan.apps.FirstOrderPredictor
```python
ppgan.apps.FirstOrderPredictor(output='output', weight_path=None, config=None, relative=False, adapt_scale=False, find_best_frame=False, best_frame=None)
```
> Build the instance of FirstOrder model. The model is dedicated to Image Animation, i.e., generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
>
> For more details, see paper, First Order Motion Model for Image Animation (https://arxiv.org/abs/2003.00196) .
>
> **Example**
>
> ```
> from ppgan.apps import FirstOrderPredictor
> animate = FirstOrderPredictor()
> # test a video file
> animate.run("source.png","driving.mp4")
> ```
> **Parameters**
>
> > - output_path (str): path of the prediction output directory, default: output. Note that the result is saved as output/result.mp4.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
> > - config (dict|str|None): model configuration; it can be a dict or the path of a YML file, default: None. When weight_path is left as None, config must also be None; otherwise the configuration must match the corresponding weights.
> > - relative (bool): whether to use relative (rather than absolute) keypoint coordinates from the driving video, default: False.
> > - adapt_scale (bool): whether to adapt the movement scale based on the convex hull of the keypoints, default: False.
> > - find_best_frame (bool): whether to start generating from the frame that best matches the source image; applies only to face applications and requires a face-alignment library.
> > - best_frame (int): index of the starting frame, default: None, i.e. start from the first frame (counting from 1).
### run
```python
run(source_image, driving_video)
```
> The execution interface after building the instance; the predicted video is saved to output/result.mp4.
> **Parameters**
>
> > - source_image (str): path of the source image.
> > - driving_video (str): path of the driving video; mp4 format is recommended.
>
> **Return Value**
>
> > None.
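>
> A sketch of passing some of the optional arguments documented above; the input files are illustrative:
>
> ```python
> from ppgan.apps import FirstOrderPredictor
>
> # use relative keypoint coordinates and adapt the movement scale to the keypoint convex hull
> animate = FirstOrderPredictor(relative=True, adapt_scale=True)
> animate.run("source.png", "driving.mp4")  # the result is written to output/result.mp4
> ```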
## ppgan.apps.FaceParsePredictor
```python
ppgan.apps.FaceParsePredictor(output_path='output')
```
> Build the instance of the face parsing model. Given an input face image, face parsing assigns a pixel-wise label to each semantic component (e.g. hair, lips, nose, ears). The task is performed with BiSeNet.
>
> For more details, see the paper, BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation (https://arxiv.org/abs/1808.00897v1).
>
> *Note: the dlib package is required for this interface; install it with the following command:
>
> ```
> pip install dlib
> ```
> Installing this package may take a while on Windows; please be patient.
>
> **Parameters:**
>
> > - input_image: path of the input image to be parsed
> > - output_path: path of the output to be saved
> **Example:**
>
> ```
> from ppgan.apps import FaceParsePredictor
> parser = FaceParsePredictor()
> parser.run('docs/imgs/face.png')
> ```
> **Return Value:**
>
> > - mask(numpy.ndarray): return the mask matrix of the parsed facial components, data type: numpy.ndarray.
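>
> A sketch of saving the returned mask for inspection, assuming its label values fit into uint8:
>
> ```python
> import numpy as np
> from PIL import Image
> from ppgan.apps import FaceParsePredictor
>
> parser = FaceParsePredictor()
> mask = parser.run('docs/imgs/face.png')   # pixel-wise label map as numpy.ndarray
>
> # save the label map as a grayscale image for a quick visual check
> Image.fromarray(mask.astype(np.uint8)).save('face_parsing_mask.png')
> ```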
## ppgan.apps.AnimeGANPredictor
```python
ppgan.apps.AnimeGANPredictor(output_path='output_dir',weight_path=None,use_adjust_brightness=True)
```
> Use AnimeGAN v2 to turn scenery photos into anime-style images.
>
> For more details, see the paper, AnimeGAN: A Novel Lightweight GAN for Photo Animation (https://link.springer.com/chapter/10.1007/978-981-15-5577-0_18).
> **Parameters:**
>
> > - input_image: path of the input image to be stylized.
> **Example:**
>
> ```
> from ppgan.apps import AnimeGANPredictor
> predictor = AnimeGANPredictor()
> predictor.run('docs/imgs/animeganv2_test.jpg')
> ```
> **Return Value:**
>
> > - anime_image(numpy.ndarray): return the stylized scenery image.
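>
> A sketch of saving the stylized result, assuming run returns the image array as documented above:
>
> ```python
> import numpy as np
> from PIL import Image
> from ppgan.apps import AnimeGANPredictor
>
> predictor = AnimeGANPredictor()
> anime_image = predictor.run('docs/imgs/animeganv2_test.jpg')
>
> # save the stylized scenery image
> Image.fromarray(np.asarray(anime_image).astype(np.uint8)).save('animeganv2_result.png')
> ```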
## ppgan.apps.MiDaSPredictor
```python
ppgan.apps.MiDaSPredictor(output=None, weight_path=None)
```
> MiDaSv2 is a monocular depth estimation model (see https://github.com/intel-isl/MiDaS). Monocular depth estimation computes depth from a single RGB image.
>
> For more details, see the paper Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer (https://arxiv.org/abs/1907.01341v3).
> **Example**
>
> ```python
> from ppgan.apps import MiDaSPredictor
> # if output is set, the depth .pfm and .png files are written to output/MiDaS
> model = MiDaSPredictor()
> prediction = model.run('test.jpg')  # the input is the path of an RGB image (illustrative filename)
> ```
>
> Color display of the depth image:
>
> ```python
> import numpy as np
> import PIL.Image as Image
> import matplotlib as mpl
> import matplotlib.cm as cm
>
> vmax = np.percentile(prediction, 95)
> normalizer = mpl.colors.Normalize(vmin=prediction.min(), vmax=vmax)
> mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
> colormapped_im = (mapper.to_rgba(prediction)[:, :, :3] * 255).astype(np.uint8)
> im = Image.fromarray(colormapped_im)
> im.save('test_disp.jpeg')
> ```
>
> **Parameters:**
>
> > - output (str): output directory; if None, the .pfm and .png depth maps are not saved.
> > - weight_path (str): path of the model weights, default: None, in which case the built-in pre-trained model is downloaded automatically.
> **Return Value:**
>
> > - prediction (numpy.ndarray): return the prediction.
> > - pfm_f (str): return the save path of pfm files if the output path is set.
> > - png_f (str): return the save path of png files if the output path is set.
## ppgan.apps.Wav2LipPredictor
```python
ppgan.apps.Wav2LipPredictor(face=None, audio_seq=None, outfile=None)
```
> Build the instance of the Wav2Lip model, which is used for lip-syncing, i.e. synchronizing the lip movements in a talking-face video with the voice from an input audio track.
>
> For more details, see the paper, A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild (http://arxiv.org/abs/2008.10010).
>
> **Example**
>
> ```
> from ppgan.apps import Wav2LipPredictor
> import ppgan
> predictor = Wav2LipPredictor()
> predictor.run('/home/aistudio/先烈.jpeg', '/home/aistudio/pp_guangquan_zhenzhu46s.mp4','wav2lip')
> ```
> **Parameters:**
> - face (str): path of an image or video containing a human face.
> - audio_seq (str): path of the input audio, any processable format in ffmpeg is supported, including `.wav`, `.mp3`, `.m4a` etc.
> - outfile (str): path of the output video file.
>**Return Value**
>
>> None
# Instruction of Config Files
## Introduction of Parameters
Take `lapstyle_rev_first.yaml` as an example.
### Global
| Field | Usage | Default |
| ------------------------- | :------------------------- | --------------- |
| total_iters | total training steps | 30000 |
| min_max | numeric range of the tensor (used when saving images) | (0., 1.) |
| output_dir | path of the output | ./output_dir |
| snapshot_config: interval | interval for saving model parameters | 5000 |
### Model
| Field | Usage | Default |
| :---------------------- | -------- | ------ |
| name | name of the model | LapStyleRevFirstModel |
| revnet_generator | set the revnet generator | RevisionNet |
| revnet_discriminator | set the revnet discriminator | LapStyleDiscriminator |
| draftnet_encode | set the draftnet encoder | Encoder |
| draftnet_decode | set the draftnet decoder | DecoderNet |
| calc_style_emd_loss | set the style loss 1 | CalcStyleEmdLoss |
| calc_content_relt_loss | set the content loss 1 | CalcContentReltLoss |
| calc_content_loss | set the content loss 2 | CalcContentLoss |
| calc_style_loss | set the style loss 2 | CalcStyleLoss |
| gan_criterion: name | set the GAN loss | GANLoss |
| gan_criterion: gan_mode | set the modal parameter of GAN loss | vanilla |
| content_layers | set the network layer that calculates content loss 2 |['r11', 'r21', 'r31', 'r41', 'r51']|
| style_layers | set the network layer that calculates style loss 2 | ['r11', 'r21', 'r31', 'r41', 'r51'] |
| content_weight | set the weight of total content loss | 1.0 |
| style_weight | set the weight of total style loss | 3.0 |
### Dataset (train & test)
| Field | Usage | Default |
| :----------- | -------------------- | -------------------- |
| name | name of the dataset | LapStyleDataset |
| content_root | path of the dataset | data/coco/train2017/ |
| style_root | path of the target style image | data/starrynew.png |
| load_size | image size after resizing the input image | 280 |
| crop_size | image size after random cropping | 256 |
| num_workers | number of worker processes | 16 |
| batch_size | number of samples in one training batch | 5 |
### Lr_scheduler
| Field | Usage | Default |
| :------------ | ---------------- | -------------- |
| name | name of the learning rate scheduler | NonLinearDecay |
| learning_rate | initial learning rate | 1e-4 |
| lr_decay | decay rate of the learning rate | 5e-5 |
### Optimizer
| Field | Usage | Default |
| :-------- | ---------- | ------- |
| name | class name of the optimizer | Adam |
| net_names | the network under the optimizer | net_rev |
| beta1 | set beta1, parameter of the optimizer | 0.9 |
| beta2 | set beta2, parameter of the optimizer | 0.999 |
### Validate
| Field | Usage | Default |
| :------- | ---- | ------ |
| interval | validation interval | 500 |
| save_img | whether to save image while validating | false |
### Log_config
| Field | Usage | Default |
| :--------------- | ---- | ------ |
| interval | log printing interval | 10 |
| visiual_interval | interval for saving the generated images during training | 500 |
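The config is a plain YAML file, so it can be inspected or tweaked with a standard YAML parser before training. A minimal sketch, assuming the file lives under a `configs/` directory and that the Global fields above sit at the top level of the file:

```python
import yaml

# load the example config referenced above
with open('configs/lapstyle_rev_first.yaml') as f:
    cfg = yaml.safe_load(f)

print(cfg['total_iters'])   # total training steps, 30000 by default
print(cfg['output_dir'])    # where results are written, ./output_dir by default

# lower the total number of iterations for a quick dry run and save a copy
cfg['total_iters'] = 1000
with open('configs/lapstyle_rev_first_debug.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)
```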