- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
disco_diffusion_clip_rn101 is a text-to-image generation model that generates images matching the semantics of the sentence you prompt. The model consists of two parts: one is the diffusion model, a generative model that reconstructs the original image from noisy input; the other is the multimodal pre-training model (CLIP), which represents text and images in the same feature space, where text and images with similar semantics lie close together. In this text-to-image pipeline, the diffusion model is responsible for generating the target image from the initial noise or a specified initial image, while CLIP is responsible for guiding the generated image to be as close as possible to the semantics of the input text. Under the guidance of CLIP, the diffusion model iteratively generates new images, eventually producing an image of what the text describes. The CLIP model used in this module is ResNet101.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020).
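To make the guidance loop concrete, here is a minimal illustrative sketch of CLIP-guided sampling. This is pseudocode under assumed interfaces (`diffusion.denoise`, `clip.encode_text`, `clip.encode_image`, and `guidance_scale` are hypothetical names), not this module's actual implementation:
```python
import paddle

def clip_guided_sample(diffusion, clip, prompt, steps=250, guidance_scale=5000.0):
    """Sketch of CLIP-guided diffusion; `diffusion` and `clip` are hypothetical objects."""
    text_feat = clip.encode_text(prompt)        # semantic target in CLIP feature space
    x = paddle.randn([1, 3, 512, 512])          # start from pure noise
    for t in reversed(range(steps)):
        x.stop_gradient = False
        x0_pred = diffusion.denoise(x, t)       # one reverse-diffusion (denoising) step
        img_feat = clip.encode_image(x0_pred)   # embed the current image with CLIP
        sim = paddle.nn.functional.cosine_similarity(img_feat, text_feat).sum()
        grad = paddle.grad(sim, x)[0]           # direction that increases text-image similarity
        x = (x0_pred + guidance_scale * grad).detach()
    return x
```
CLIP's only role in this sketch is to supply a gradient that pulls each denoising step toward the semantics of the prompt.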
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install disco_diffusion_clip_rn101
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run disco_diffusion_clip_rn101 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_rn101_out
```
- ### 2.Prediction Code Example
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_clip_rn101")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in the disco_diffusion_clip_rn101_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results.
# You can manipulate the DocumentArray object to do post-processing and save images.
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_rn101_out/')
```
- ### 3.API
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_rn101_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
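- For reference, a minimal sketch of a call that exercises the optional parameters above (illustrative values only):
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_clip_rn101")
# Fix the seed for reproducibility; steer the look via the style/artist arguments.
da = module.generate_image(
    text_prompts=["A beautiful painting of a singular lighthouse"],
    style="watercolor",
    artist="thomas kinkade",
    width_height=[1280, 768],   # multiples of 64, as recommended above
    seed=42,
    output_dir="./disco_diffusion_clip_rn101_out/")
```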
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
```shell
$ hub serving start -m disco_diffusion_clip_rn101
```
- The serving API is now deployed, and the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
```python
import requests
import json
from docarray import DocumentArray

# Send an HTTP request to the PaddleHub Serving endpoint (default port 8866).
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_clip_rn101"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode the returned DocumentArray and save the final image.
da = DocumentArray.from_base64(r.json()["results"])
da[0].save_uri_to_file('disco_diffusion_clip_rn101_out.png')
```
- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
disco_diffusion_clip_rn50 is a text-to-image generation model that generates images matching the semantics of the sentence you prompt. The model consists of two parts: one is the diffusion model, a generative model that reconstructs the original image from noisy input; the other is the multimodal pre-training model (CLIP), which represents text and images in the same feature space, where text and images with similar semantics lie close together. In this text-to-image pipeline, the diffusion model is responsible for generating the target image from the initial noise or a specified initial image, while CLIP is responsible for guiding the generated image to be as close as possible to the semantics of the input text. Under the guidance of CLIP, the diffusion model iteratively generates new images, eventually producing an image of what the text describes. The CLIP model used in this module is ResNet50.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020).
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install disco_diffusion_clip_rn50
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run disco_diffusion_clip_rn50 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_rn50_out
```
- ### 2.Prediction Code Example
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_clip_rn50")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in the disco_diffusion_clip_rn50_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results.
# You can manipulate the DocumentArray object to do post-processing and save images.
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_rn50_out/')
```
- ### 3.API
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_rn50_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
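- Since the return value is a DocumentArray, results can be post-processed directly; a small sketch using the docarray API from the tutorial linked above (assuming `da` comes from the prediction example, with intermediate steps stored as chunks):
```python
# Sketch of post-processing the returned DocumentArray (docarray API).
# da[0] is the first generated image; its chunks hold the intermediate steps.
da[0].save_uri_to_file('lighthouse_final.png')    # save the final image
da[0].chunks.save_gif('lighthouse_process.gif')   # save the generation process as a gif
```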
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
```shell
$ hub serving start -m disco_diffusion_clip_rn50
```
- The serving API is now deployed, and the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
```python
import requests
import json
from docarray import DocumentArray

# Send an HTTP request to the PaddleHub Serving endpoint (default port 8866).
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/disco_diffusion_clip_rn50"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode the returned DocumentArray and save the final image.
da = DocumentArray.from_base64(r.json()["results"])
da[0].save_uri_to_file('disco_diffusion_clip_rn50_out.png')
```
- Prompt "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."
disco_diffusion_clip_vitb32 is a text-to-image generation model that generates images matching the semantics of the sentence you prompt. The model consists of two parts: one is the diffusion model, a generative model that reconstructs the original image from noisy input; the other is the multimodal pre-training model (CLIP), which represents text and images in the same feature space, where text and images with similar semantics lie close together. In this text-to-image pipeline, the diffusion model is responsible for generating the target image from the initial noise or a specified initial image, while CLIP is responsible for guiding the generated image to be as close as possible to the semantics of the input text. Under the guidance of CLIP, the diffusion model iteratively generates new images, eventually producing an image of what the text describes. The CLIP model used in this module is ViTB32.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020).
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install disco_diffusion_clip_vitb32
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run disco_diffusion_clip_vitb32 --text_prompts "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation." --output_dir disco_diffusion_clip_vitb32_out
```
- ### 2.Prediction Code Example
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_clip_vitb32")
text_prompts = ["A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation."]
# Output images will be saved in the disco_diffusion_clip_vitb32_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results.
# You can manipulate the DocumentArray object to do post-processing and save images.
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_clip_vitb32_out/')
```
- ### 3.API
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "a beautiful painting of Chinese architecture, by krenz, sunny, super wide angle, artstation.". For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_clip_vitb32_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
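- For example, fixing `seed` makes a run reproducible, while changing only the seed yields a different composition (a minimal sketch):
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_clip_vitb32")
# The same prompt and seed should reproduce the same image; a different seed gives a new composition.
for seed in (1, 2):
    module.generate_image(
        text_prompts=["A beautiful painting of a singular lighthouse"],
        width_height=[768, 768],   # multiples of 64, as recommended above
        seed=seed,
        output_dir=f"./disco_diffusion_clip_vitb32_seed_{seed}/")
```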
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
disco_diffusion_cnclip_vitb16 is a text-to-image generation model that generates images matching the semantics of the sentence you prompt. The model consists of two parts: one is the diffusion model, a generative model that reconstructs the original image from noisy input; the other is the multimodal pre-training model (CLIP), which represents text and images in the same feature space, where text and images with similar semantics lie close together. In this text-to-image pipeline, the diffusion model is responsible for generating the target image from the initial noise or a specified initial image, while CLIP is responsible for guiding the generated image to be as close as possible to the semantics of the input text. Under the guidance of CLIP, the diffusion model iteratively generates new images, eventually producing an image of what the text describes. The CLIP model used in this module is Chinese CLIP with a ViTB16 image encoder.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233) and [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020).
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install disco_diffusion_cnclip_vitb16
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run disco_diffusion_cnclip_vitb16 --text_prompts "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作。" --output_dir disco_diffusion_cnclip_vitb16_out
```
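- A minimal Python counterpart to the command above (a sketch, assuming the module is installed; prompts for this Chinese-CLIP-guided module are written in Chinese):
```python
import paddlehub as hub

module = hub.Module(name="disco_diffusion_cnclip_vitb16")
text_prompts = ["孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作。"]
# Output images will be saved in the disco_diffusion_cnclip_vitb16_out directory.
da = module.generate_image(text_prompts=text_prompts, output_dir='./disco_diffusion_cnclip_vitb16_out/')
```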
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作" ("a lone old man in a straw cloak, fishing alone in the cold river snow, in the style of Qi Baishi"). For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as 齐白石 (Qi Baishi) or Greg Rutkowski; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_cnclip_vitb16_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
disco_diffusion_ernievil_base is a text-to-image generation model that generates images matching the semantics of the sentence you prompt. The model consists of two parts: one is the diffusion model, a generative model that reconstructs the original image from noisy input; the other is the multimodal pre-training model (ERNIE-ViL), which represents text and images in the same feature space, where text and images with similar semantics lie close together. In this text-to-image pipeline, the diffusion model is responsible for generating the target image from the initial noise or a specified initial image, while ERNIE-ViL is responsible for guiding the generated image to be as close as possible to the semantics of the input text. Under the guidance of ERNIE-ViL, the diffusion model iteratively generates new images, eventually producing an image of what the text describes. The model used in this module is ERNIE-ViL, consisting of ERNIE 3.0 + ViT.
For more details, please refer to [Diffusion Models Beat GANs on Image Synthesis](https://arxiv.org/abs/2105.05233).
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.2.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install disco_diffusion_ernievil_base
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run disco_diffusion_ernievil_base --text_prompts "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作。" --output_dir disco_diffusion_ernievil_base_out
```
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "孤舟蓑笠翁,独钓寒江雪。风格如齐白石所作" ("a lone old man in a straw cloak, fishing alone in the cold river snow, in the style of Qi Baishi"). For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as 齐白石 (Qi Baishi) or Greg Rutkowski; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- output_dir(Optional[str]): Output directory, default is "disco_diffusion_ernievil_base_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `n_batches` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
Stable Diffusion is a latent diffusion model, a kind of generative model that produces images by iteratively denoising and sampling step by step, and it currently achieves impressive results. Compared with Disco Diffusion, Stable Diffusion iterates in a lower-dimensional latent space instead of the original pixel space, which greatly reduces memory and computational requirements. You can render the desired image within a minute on a V100; feel free to try it out on [aistudio](https://aistudio.baidu.com/aistudio/projectdetail/4512600).
For more details, please refer to [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752).
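The saving comes from where the denoising loop runs. Schematically (illustrative pseudocode with hypothetical `unet` and `decoder` components, not this module's real internals):
```python
import paddle

def latent_diffusion_sample(unet, decoder, text_feat, steps=50):
    # Disco Diffusion denoises a full-resolution pixel tensor, e.g. [3, 512, 512];
    # Stable Diffusion denoises a much smaller latent, e.g. [4, 64, 64].
    z = paddle.randn([1, 4, 64, 64])
    for t in reversed(range(steps)):
        noise_pred = unet(z, t, text_feat)  # predict the noise in latent space
        z = z - noise_pred                  # simplified denoising update
    return decoder(z)                       # decode back to pixel space only once
```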
## II.Installation
- ### 1.Environmental Dependence
- paddlepaddle >= 2.0.0
- paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_en/get_start/installation.rst)
- ### 2.Installation
```shell
$ hub install stable_diffusion
```
- In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_en/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_en/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_en/get_start/mac_quickstart.md)
## III.Module API Prediction
- ### 1.Command line Prediction
```shell
$ hub run stable_diffusion --text_prompts "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation." --output_dir stable_diffusion_out
```
- ### 2.Prediction Code Example
```python
import paddlehub as hub

module = hub.Module(name="stable_diffusion")
text_prompts = ["in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation."]
# Output images will be saved in the stable_diffusion_out directory.
# The returned da is a DocumentArray object, which contains all intermediate and final results.
# You can manipulate the DocumentArray object to do post-processing and save images.
# You can set the batch_size parameter to generate batch_size images in one inference step.
da = module.generate_image(text_prompts=text_prompts, batch_size=3, output_dir='./stable_diffusion_out/')
```
- ### 3.API
- Image generating API, which generates an image corresponding to your prompt.
- **Parameters**
- text_prompts(str): Prompt, used to describe your image content. You can construct a prompt that conforms to the format "content" + "artist/style", such as "in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.". For more details, you can refer to this [website](https://docs.google.com/document/d/1XUT2G9LmkZataHFzmuOtRXnuWBfhvXDAo8DkS--8tec/edit#).
- style(Optional[str]): Image style, such as "watercolor" and "Chinese painting". If not provided, the style is entirely determined by your prompt.
- artist(Optional[str]): Artist name, such as Greg Rutkowski or krenz; the generated image will imitate the chosen artist's style. If not provided, the style is entirely determined by your [prompt](https://weirdwonderfulai.art/resources/disco-diffusion-70-plus-artist-studies/).
- width_height(Optional[List[int]]): The width and height of the output images, which should preferably be multiples of 64. The larger the size, the longer the computation time.
- seed(Optional[int]): Random seed, different seeds result in different output images.
- batch_size(Optional[int]): Number of images generated for one inference step.
- output_dir(Optional[str]): Output directory, default is "stable_diffusion_out".
- **Return**
- da(DocumentArray): A DocumentArray object containing `batch_size` Documents; each Document keeps all intermediate results during generation. Please refer to the [DocumentArray tutorial](https://docarray.jina.ai/fundamentals/documentarray/index.html) for more details.
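- Because the returned DocumentArray holds `batch_size` Documents, each image in the batch can be handled separately; a small sketch (assuming `da` comes from the prediction example above, docarray API):
```python
# Save every image of the batch to its own file (docarray API sketch).
for i, doc in enumerate(da):
    doc.save_uri_to_file(f'stable_diffusion_out_{i}.png')
```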
## IV.Server Deployment
- PaddleHub Serving can deploy an online service of text-to-image.
- ### Step 1: Start PaddleHub Serving
- Run the startup command:
```shell
$ hub serving start -m stable_diffusion
```
- The serving API is now deployed, and the default port number is 8866.
- **NOTE:** If GPU is used for prediction, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
- ### Step 2: Send a predictive request
- With a configured server, use the following lines of code to send the prediction request and obtain the result.
```python
import requests
import json
from docarray import DocumentArray

# Send an HTTP request to the PaddleHub Serving endpoint (default port 8866).
data = {'text_prompts': 'in the morning light,Overlooking TOKYO city by greg rutkowski and thomas kinkade,Trending on artstation.'}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/stable_diffusion"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Decode the returned DocumentArray and save the final image.
da = DocumentArray.from_base64(r.json()["results"])
da[0].save_uri_to_file('stable_diffusion_out.png')
```