# disco_diffusion_clip_rn101
|Module Name|disco_diffusion_clip_rn101|
| :--- | :---: |
|Category|text to image|
|Network|dd+clip ResNet101|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|2.9GB|
|Latest update date|2022-08-02|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184836003-96d82c9e-b43e-4eb5-a237-bb22972d1848.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184836085-beef3562-f89d-405e-ae04-2d3cd8499df8.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
disco_diffusion_clip_rn101 is a text-to-image generation model that produces images matching the semantics of your prompt. The model consists of two parts. One is a diffusion model, a generative model that reconstructs the original image from noisy input. The other is a multimodal pre-trained model (CLIP) that represents text and images in the same feature space, where semantically similar text and images lie close to each other. During text-to-image generation, the diffusion model generates the target image starting from initial noise or a specified initial image, while CLIP guides the generated image toward the semantics of the input text. Under CLIP's guidance, the diffusion model iteratively produces new images, eventually generating an image of what the text describes. The CLIP model used in this module is ResNet101.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
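The interaction between the two parts can be pictured with the following toy sketch (an illustration only, not the module's actual implementation): at every denoising step, the gradient of the CLIP text-image similarity nudges the partially denoised image toward the prompt's semantics. `clip_score_grad` below is a hypothetical stand-in for that gradient.
```python
# Toy sketch of CLIP-guided diffusion; NOT the module's real code.
import numpy as np

rng = np.random.default_rng(0)
text_embed = rng.normal(size=64)          # stand-in for CLIP text features

def clip_score_grad(image):
    # Hypothetical stand-in: gradient of the text-image similarity w.r.t. the
    # image, pretending the "image embedding" is the image itself.
    return text_embed - image

image = rng.normal(size=64)               # start from pure noise
for t in range(1000, 0, -1):
    denoised = 0.99 * image                        # toy denoising step
    guidance = 0.05 * clip_score_grad(denoised)    # CLIP pulls toward the text
    image = denoised + guidance + 0.001 * (t / 1000) * rng.normal(size=64)
```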
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install disco_diffusion_clip_rn101
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run disco_diffusion_clip_rn101 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_rn101_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_clip_rn101")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in disco_diffusion_clip_rn101_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_rn101_out/')
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_rn101_out-result.png')
# Show all intermediate results
da[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_rn101_out-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [1280, 768],
seed: Optional[int] = None,
output_dir: Optional[str] = 'disco_diffusion_clip_rn101_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_rn101_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
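- A minimal usage sketch combining the optional parameters (the prompt, style, artist, and seed values below are illustrative, not prescribed by the module):
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_clip_rn101")
# All keyword arguments besides text_prompts are optional.
da = module.generate_image(
    text_prompts=["A beautiful painting of a lighthouse at dawn."],
    style="watercolor",           # illustrative style
    artist="thomas kinkade",      # illustrative artist
    width_height=[640, 640],      # preferably multiples of 64
    seed=42,                      # fix the seed for reproducible output
    output_dir="./disco_diffusion_clip_rn101_out/")
```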
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m disco_diffusion_clip_rn101
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_clip_rn101"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_rn101_out-result.png')
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_rn101_out-result.gif')
```
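- Alternatively, assuming the same JSON payload as above, the request can be sent from the command line with curl (the prompt here is illustrative):
- ```shell
$ curl -X POST -H "Content-Type: application/json" \
      -d '{"text_prompts": "A beautiful painting of a singular lighthouse."}' \
      http://127.0.0.1:8866/predict/disco_diffusion_clip_rn101
```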
## V.Release Note
* 1.0.0
First release
```shell
$ hub install disco_diffusion_clip_rn101==1.0.0
```
# disco_diffusion_clip_rn50
|Module Name|disco_diffusion_clip_rn50|
| :--- | :---: |
|Category|text to image|
|Network|dd+clip ResNet50|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|2.8GB|
|Latest update date|2022-08-02|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184826628-7a716163-3439-489b-b5f5-0104b6a107de.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184826692-7959337f-8144-46d5-affb-362ad023420c.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
disco_diffusion_clip_rn50 is a text-to-image generation model that produces images matching the semantics of your prompt. The model consists of two parts. One is a diffusion model, a generative model that reconstructs the original image from noisy input. The other is a multimodal pre-trained model (CLIP) that represents text and images in the same feature space, where semantically similar text and images lie close to each other. During text-to-image generation, the diffusion model generates the target image starting from initial noise or a specified initial image, while CLIP guides the generated image toward the semantics of the input text. Under CLIP's guidance, the diffusion model iteratively produces new images, eventually generating an image of what the text describes. The CLIP model used in this module is ResNet50.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install disco_diffusion_clip_rn50
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run disco_diffusion_clip_rn50 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_rn50_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_clip_rn50")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in disco_diffusion_clip_rn50_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_rn50_out/')
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_rn50_out-result.png')
# Show all intermediate results
da[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_rn50_out-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [1280, 768],
seed: Optional[int] = None,
output_dir: Optional[str] = 'disco_diffusion_clip_rn50_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_rn50_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
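- A short sketch of working with the returned DocumentArray (assumes `da` was obtained from `generate_image` as in the prediction example above; file names are illustrative):
- ```python
# Save the final image of every returned Document and count its recorded steps.
for i, doc in enumerate(da):
    doc.save_uri_to_file(f'rn50-result-{i}.png')
    print(i, len(doc.chunks))  # number of intermediate results kept for this image
```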
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m disco_diffusion_clip_rn50
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_clip_rn50"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_rn50_out-result.png')
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_rn50_out-result.gif')
```
## V.Release Note
* 1.0.0
First release
```shell
$ hub install disco_diffusion_clip_rn50==1.0.0
```
# disco_diffusion_clip_vitb32
|Module Name|disco_diffusion_clip_vitb32|
| :--- | :---: |
|Category|text to image|
|Network|dd+clip ViTB32|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|3.1GB|
|Latest update date|2022-08-02|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/182298446-7feb530b-62cc-4e3f-a693-249ec8383daa.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/182298453-9a8a8336-66e6-4adb-a46f-7a0fa211b467.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
disco_diffusion_clip_vitb32 is a text-to-image generation model that produces images matching the semantics of your prompt. The model consists of two parts. One is a diffusion model, a generative model that reconstructs the original image from noisy input. The other is a multimodal pre-trained model (CLIP) that represents text and images in the same feature space, where semantically similar text and images lie close to each other. During text-to-image generation, the diffusion model generates the target image starting from initial noise or a specified initial image, while CLIP guides the generated image toward the semantics of the input text. Under CLIP's guidance, the diffusion model iteratively produces new images, eventually generating an image of what the text describes. The CLIP model used in this module is ViTB32.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install disco_diffusion_clip_vitb32
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run disco_diffusion_clip_vitb32 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_vitb32_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_clip_vitb32")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in disco_diffusion_clip_vitb32_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_vitb32_out/')
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_vitb32_out-result.png')
# Show all intermediate results
da[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_vitb32_out-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [1280, 768],
seed: Optional[int] = None,
output_dir: Optional[str] = 'disco_diffusion_clip_vitb32_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_vitb32_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
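- A short sketch of requesting a custom resolution with a fixed seed (illustrative values; `module` and `text_prompts` as in the prediction example above):
- ```python
# width_height should preferably be multiples of 64; a fixed seed makes the
# output reproducible across runs.
da = module.generate_image(
    text_prompts=text_prompts,
    width_height=[768, 512],
    seed=2022,
    output_dir='./disco_diffusion_clip_vitb32_out/')
```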
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m disco_diffusion_clip_vitb32
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_clip_vitb32"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_clip_vitb32_out-result.png')
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_clip_vitb32_out-result.gif')
```
## V.Release Note
* 1.0.0
First release
```shell
$ hub install disco_diffusion_clip_vitb32==1.0.0
```
# disco_diffusion_cnclip_vitb16
|Module Name|disco_diffusion_cnclip_vitb16|
| :--- | :---: |
|Category|text to image|
|Network|dd+cnclip ViTB16|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|2.9GB|
|Latest update date|2022-08-02|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "在宁静的风景中画一幅美丽的建筑画,由Arthur Adams在artstation上所作"
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184838000-566f0548-f0f5-4df6-a4af-16bb70220137.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/184837520-f30f21e1-5310-4925-a8d2-946337795010.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
disco_diffusion_cnclip_vitb16 is a text-to-image generation model that produces images matching the semantics of your prompt. The model consists of two parts. One is a diffusion model, a generative model that reconstructs the original image from noisy input. The other is a multimodal pre-trained model (CLIP) that represents text and images in the same feature space, where semantically similar text and images lie close to each other. During text-to-image generation, the diffusion model generates the target image starting from initial noise or a specified initial image, while CLIP guides the generated image toward the semantics of the input text. Under CLIP's guidance, the diffusion model iteratively produces new images, eventually generating an image of what the text describes. The CLIP model used in this module is the Chinese CLIP ViTB16.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install disco_diffusion_cnclip_vitb16
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run disco_diffusion_cnclip_vitb16 --text_prompts "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作。" --output_dir disco_diffusion_cnclip_vitb16_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_cnclip_vitb16")
text_prompts = ["孤舟蓑笠翁,独钓寒江雪。"]
# Output images will be saved in disco_diffusion_cnclip_vitb16_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
da = module.generate_image(text_prompts=text_prompts, artist='齐白石', output_dir='./disco_diffusion_cnclip_vitb16_out/')
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_cnclip_vitb16_out-result.png')
# Show all intermediate results
da[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_cnclip_vitb16_out-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [1280, 768],
seed: Optional[int] = None,
output_dir: Optional[str] = 'disco_diffusion_cnclip_vitb16_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as 齐白石 or Greg Rutkowski; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_cnclip_vitb16_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
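- A short sketch of combining a Chinese prompt with the optional style and artist parameters (the style string is illustrative, not a value prescribed by the module):
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_cnclip_vitb16")
# The Chinese CLIP backbone expects Chinese prompts.
da = module.generate_image(
    text_prompts=["孤舟蓑笠翁,独钓寒江雪。"],
    style="水墨画",       # illustrative style
    artist="齐白石",
    output_dir='./disco_diffusion_cnclip_vitb16_out/')
```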
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m disco_diffusion_cnclip_vitb16
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': '孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_cnclip_vitb16"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_cnclip_vitb16_out-result.png')
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_cnclip_vitb16_out-result.gif')
```
## V.Release Note
* 1.0.0
First release
```shell
$ hub install disco_diffusion_cnclip_vitb16==1.0.0
```
# disco_diffusion_ernievil_base
|Module Name|disco_diffusion_ernievil_base|
| :--- | :---: |
|Category|text to image|
|Network|dd+ERNIE-ViL|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|2.9GB|
|Latest update date|2022-08-02|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "小桥流水人家"
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/183010362-ec76fa49-5170-462f-8fc8-4353fd648924.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/183010368-834e6388-411b-4e73-9bc4-a0ac97cb58d5.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
disco_diffusion_ernievil_base is a text-to-image generation model that produces images matching the semantics of your prompt. The model consists of two parts. One is a diffusion model, a generative model that reconstructs the original image from noisy input. The other is a multimodal pre-trained model (ERNIE-ViL) that represents text and images in the same feature space, where semantically similar text and images lie close to each other. During text-to-image generation, the diffusion model generates the target image starting from initial noise or a specified initial image, while ERNIE-ViL guides the generated image toward the semantics of the input text. Under ERNIE-ViL's guidance, the diffusion model iteratively produces new images, eventually generating an image of what the text describes. The multimodal model used in this module is ERNIE-ViL, which consists of ERNIE 3.0 and ViT.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233)
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install disco_diffusion_ernievil_base
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run disco_diffusion_ernievil_base --text_prompts "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作。" --output_dir disco_diffusion_ernievil_base_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="disco_diffusion_ernievil_base")
text_prompts = ["孤舟蓑笠翁,独钓寒江雪。"]
# Output images will be saved in disco_diffusion_ernievil_base_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
da = module.generate_image(text_prompts=text_prompts, artist='齐白石', output_dir='./disco_diffusion_ernievil_base_out/')
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_ernievil_base_out-result.png')
# Show all intermediate results
da[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_ernievil_base_out-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [1280, 768],
seed: Optional[int] = None,
output_dir: Optional[str] = 'disco_diffusion_ernievil_base_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as 齐白石 or Greg Rutkowski; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_ernievil_base_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
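- A short sketch of dumping every recorded intermediate step of the first result to numbered files (assumes `da` from the prediction example above; file names are illustrative):
- ```python
# Each chunk of a Document is itself a Document holding one intermediate image.
for step, chunk in enumerate(da[0].chunks):
    chunk.save_uri_to_file(f'ernievil_step_{step:03d}.png')
```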
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m disco_diffusion_ernievil_base
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': '孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_ernievil_base"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save the final result image to a file
da[0].save_uri_to_file('disco_diffusion_ernievil_base_out-result.png')
# Save the generating process as a gif
da[0].chunks.save_gif('disco_diffusion_ernievil_base_out-result.gif')
```
## V.Release Note
* 1.0.0
First release
```shell
$ hub install disco_diffusion_ernievil_base==1.0.0
```
# stable_diffusion
|Module Name|stable_diffusion|
| :--- | :---: |
|Category|text to image|
|Network|CLIP Text Encoder+UNet+VAE|
|Dataset|-|
|Fine-tuning supported or not|No|
|Module Size|4.0GB|
|Latest update date|2022-08-26|
|Data indicators|-|
## I.Basic Information
### Application Effect Display
- Prompt "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation."
- Output image
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/186873437-2e426acd-7656-4d37-9ee4-8cafa48f097f.png" width = "80%" hspace='10'/>
<br />
- Generating process
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/186873216-d2a9761a-78b0-4f6a-97ec-919768f324f5.gif" width = "80%" hspace='10'/>
<br />
### Module Introduction
Stable Diffusion is a latent diffusion model, a type of generative model that produces images by iteratively denoising and sampling step by step, and it currently achieves impressive results. Compared with Disco Diffusion, Stable Diffusion iterates in a lower-dimensional latent space instead of the original pixel space, which greatly reduces memory and computation requirements. You can render the desired image within a minute on a V100; feel free to try it on [aistudio](https://aistudio.baidu.com/aistudio/projectdetail/4512600).
For more details, please refer to [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)
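The cost saving comes from where the iteration happens. The following toy sketch (an illustration only, not the module's code) contrasts the sizes involved: denoising runs on a small latent grid, and pixels are produced only once, by the decoder, at the end.
```python
# Toy illustration of latent diffusion: iterate in a small latent space,
# decode to pixels once at the end. NOT the module's real code.
import numpy as np

rng = np.random.default_rng(0)
H, W, scale = 512, 512, 8                  # pixel size vs. an 8x smaller latent grid
latent = rng.normal(size=(H // scale, W // scale, 4))   # ~48x fewer values than 512x512x3
for _ in range(50):                        # every denoising step stays in latent space
    latent = 0.98 * latent + 0.02 * rng.normal(size=latent.shape)
pixels = np.repeat(np.repeat(latent[..., :3], scale, 0), scale, 1)  # stand-in "VAE decode"
print(pixels.shape)                        # (512, 512, 3)
```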
## II.Installation
- ### 1.Environment Dependencies
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
- ```shell
$ hub install stable_diffusion
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
- ```shell
$ hub run stable_diffusion --text_prompts "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation." --output_dir stable_diffusion_out
```
- ### 2.Prediction Code Example
- ```python
import paddlehub as hub
module = hub.Module(name="stable_diffusion")
text_prompts = ["in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation."]
# Output images will be saved in stable_diffusion_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results
# You can manipulate the DocumentArray object to do post-processing and save images
# You can set the batch_size parameter to generate batch_size images in one inference step.
da = module.generate_image(text_prompts=text_prompts, batch_size=3, output_dir='./stable_diffusion_out/')
# Show all intermediate results
da[0].chunks[-1].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
# Save the generating process as a gif
da[0].chunks[-1].chunks.save_gif('stable_diffusion_out-merged-result.gif')
da[0].chunks[0].chunks.plot_image_sprites(skip_empty=True, show_index=True, keep_aspect_ratio=True)
da[0].chunks[0].chunks.save_gif('stable_diffusion_out-image-0-result.gif')
```
- ### 3.API
- ```python
def generate_image(
text_prompts,
style: Optional[str] = None,
artist: Optional[str] = None,
width_height: Optional[List[int]] = [512, 512],
seed: Optional[int] = None,
batch_size: Optional[int] = 1,
output_dir: Optional[str] = 'stable_diffusion_out'):
```
- Image generation API, which produces an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt used to describe your image content. You can construct a prompt that follows the format "content" + "artist/style", such as "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.". For more details, you can refer to [this guide](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" or "Chinese painting". If not provided, the style is determined entirely by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the output follows the style of the chosen artist's works. If not provided, the style is determined entirely by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): Width and height of the output images, preferably multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed; different seeds produce different output images.
- batch_size(Optional[int]): Number of images generated in one inference step.
- output_dir(Optional[str]): Output directory, default is "stable_diffusion_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `batch_size` Documents; each Document keeps all intermediate results produced during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
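- A short sketch of saving the generation gif of every image in the batch (assumes `batch_size=3` and the chunk layout shown in the prediction example above):
- ```python
# da[0].chunks[i] holds the i-th generated image; its nested chunks record
# that image's intermediate steps.
for i in range(3):
    da[0].chunks[i].chunks.save_gif(f'stable_diffusion_out-image-{i}-result.gif')
```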
## IV.Server Deployment
- PaddleHub Serving can deploy an online text-to-image service.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m stable_diffusion
```
- The serving API is now deployed, with the default port 8866.
- **NOTE:** If you use GPU for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
- ```python
import requests
import json
from docarray import DocumentArray
# Send an HTTP request
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/stable_diffusion"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Get results
da = DocumentArray.from_base64(r.json()["results"])
# Save final result image to a file
da[0].save_uri_to_file('stable_diffusion_out.png')
# Save the generating process as a gif
da[0].chunks[0].chunks.save_gif('stable_diffusion_out.gif')
```
## V.Release Note
* 1.0.0
First release
```shell
$ hub install stable_diffusion==1.0.0
```