Automatic 1111: Sketch-to-Image Workflow

With Stable Diffusion Web UI Automatic 1111, create custom workflows for converting Sketches into photo-realistic images using diffusion and ControlNet models.

Tharakarama Reddy Yernapalli Sreenivasulu

Pavan Vemuri

Prince Bose

Aug. 21, 24 · Tutorial

Likes (5)

Comment

Save

4.7K Views

In this article, we will be discussing how to convert hand-drawn or digital sketches into photorealistic images using stable diffusion models with the help of ControlNet. We will be extending the Automatic 1111's txt2img feature to develop this custom workflow.

Prerequisites

Before diving in, let's make sure we have the following prerequisites covered:

1. ControlNet Extension Installed

If the ControlNet extension isn't already installed in Automatic 1111 (Stable Diffusion Web UI), you'll need to do that first. If it's already set up, feel free to skip the following instructions.

Installing ControlNet Extension

Click on the Extensions tab on Stable Diffusion Web UI.
Navigate to the Install from URL tab.
Paste the following link in URL for extension's git repository input field and click install.
After the successful installation, restart the application by closing and reopening the run.bat file if you're a PC user; Mac users may need to run ./webui.sh instead.
After restarting the application, the ControlNet dropdown will become visible under the Generation tab in the txt2img screen.

2. Download the Following Models

We need the following Diffusion and ControlNet models to be downloaded and added to Automatic 1111 as prerequisites.

RealVisXL_V4.0_Lightning (huggingface.co): Copy this model to the Stable Diffusion models folder which is under the project root directory:/models/Stable-diffusion.
diffusers_xl_canny_full (huggingface.co): Copy the downloaded model to /extensions/sd-webui-controlnet.

We are using the RealVisXL_V4.0_Lightning model here for faster image generation. As the name of the model says itself, the model is a lightning version, which takes a smaller number of steps and consumes very less time for generations when compared to other Stable Diffusion models. We will talk about the ControlNet model in a bit after understanding its basic features and purposes. Skip this section if you're well versed with ControlNet models.

ControlNet Models

ControlNet constitutes a neural network architecture meticulously crafted to augment pre-trained diffusion models through the integration of supplementary controls. This integration empowers more precise and adaptable content generation. They are originally introduced in the context of text-to-image generation to expand the capabilities of diffusion models by enabling them to respond to additional input conditions, such as edge maps, depth maps, or other structured data. This approach allows for the manipulation of output images in a more controlled, precise, and predictable manner, making it highly valuable for applications where accuracy and specificity are crucial.

Basic Features

Integration with diffusion models: Enhances diffusion models by adding control channels for more targeted outputs
Multi-conditional inputs: Supports various input types like sketches, depth maps, or poses for better content control
Enhanced precision: Improves output accuracy, especially for detailed or specific content placement
Flexibility: Adaptable for tasks beyond image generation, including video and 3D model creation
Compatibility with existing models: Works with pre-trained models, saving time and resources for deployment

Some Real-Life Use Cases

Digital art and design: ControlNet enables artists to generate detailed images from sketches, poses, or styles, streamlining the creative process.
3D model generation: ControlNet creates 3D models from sketches, aiding rapid and precise model development in gaming and animation.
Medical imaging: ControlNet enhances medical imaging by generating accurate scans based on specific anatomical inputs, aiding diagnosis and treatment planning.
Robotics and automation: ControlNet helps generate environment maps or scenarios for autonomous systems, improving navigation in complex settings.
Interactive storytelling: ControlNet enables dynamic scene and character generation based on narrative cues, enriching interactive media experiences.

Sketch-to-Image Workflow

Open the Stable Diffusion web UI, navigate to txt2img tab and start making the following changes.

Key in the positive and negative prompts describing what the generation should look like and what objects to avoid during the generation. Use something like this:
- Prompt: A photorealistic image of a beautiful butterfly in the garden
- Negative Prompt: fake, unreal, low quality, blurry
Sampling Method: DPM++ SDE
Scheduler: Karras
Sampling steps: 6
Expand the ControlNet section and upload the sketch in the Single Image tab. You can use your own sketch or from internet downloads. The one I used in this example is downloaded from the American Museum of Natural History site.
Check the Enable and Pixel Perfect checkboxes.
Control Type: Canny
Processor: canny
Model: diffusers_xl_canny_full
Control Weight: 1.15
Control Mode: Balanced
Resize Mode: Resize and Fill

Those are all the changes we need to make! Click on Generate to see your sketches converted to photorealistic images. The screenshots below are for your reference.

Conclusion

We've explored how to integrate diffusion and ControlNet models into the Stable Diffusion Web UI, and we've also demonstrated how to transform hand-drawn or digital sketches into photorealistic images using the RealVisXL_V4.0_Lightening model, powered by the diffusers_xl_canny_full ControlNet model.

In the upcoming article, we'll dive into creating a custom sketch-to-image workflow using the Stable Diffusion Web UI APIs.

Hope you found something useful in this article. See you soon in our next article. Happy learning!

neural network workflow AI

Opinions expressed by DZone contributors are their own.

Related

Trending