Automatic 1111: Custom Sketch-to-Image API

Develop a custom Sketch-to-Image API for converting hand-drawn/digital sketches into photorealistic images using stable diffusion models powered by ControlNet.

Tharakarama Reddy Yernapalli Sreenivasulu

Pavan Vemuri

Prince Bose

Sep. 10, 24 · Tutorial


In this article, we will develop a custom Sketch-to-Image API that converts hand-drawn or digital sketches into photorealistic images using Stable Diffusion models guided by a ControlNet model. We will build this custom workflow on top of Automatic 1111's txt2img API.

Prerequisites

Stable Diffusion Web UI (Automatic 1111) running on your local machine. Follow the instructions here if you are starting from scratch.
SD APIs Enabled. Follow the instructions on this page (scroll down to the Enabling APIs section) to enable the APIs if you haven't already done so.
ControlNet extension installed:
- Click on the Extensions tab on Stable Diffusion Web UI.
- Navigate to the Install from URL tab.
- Paste the following link in URL for extension's git repository input field and click Install.
- After the successful installation, restart the application by closing and reopening the run.bat file if you're a PC user; Mac users may need to run ./webui.sh instead.
- After restarting the application, the ControlNet dropdown will become visible under the Generation tab in the txt2img screen.
Download and add the following models to Automatic 1111:
- RealVisXL_V4.0_Lightning (HuggingFace: SG161222/RealVisXL_V4.0_Lightning). Copy this model to the Stable Diffusion models folder under the project root directory: /models/Stable-diffusion
- diffusers_xl_canny_full (HuggingFace: lllyasviel/sd_control_collection). Copy the downloaded model to the ControlNet extension's models folder: /extensions/sd-webui-controlnet/models
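
Before building the payload, it can help to confirm that the API is reachable and that both models are visible to the Web UI. The sketch below is a minimal check, assuming the default local address http://localhost:7860, the Web UI's /sdapi/v1/sd-models endpoint, and the ControlNet extension's /controlnet/model_list endpoint. The helper names (fetch_json, missing_models, check_installation) are hypothetical, and the response shapes noted in the comments are assumptions that may differ between versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed default address of a local Web UI


def fetch_json(path):
    """GET a JSON document from the Web UI API."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def missing_models(checkpoint_titles, controlnet_models, wanted_checkpoint, wanted_controlnet):
    """Return the wanted model names that do not appear in the installed lists.

    Matching is by substring because listed names usually carry a short hash suffix.
    """
    missing = []
    if not any(wanted_checkpoint in title for title in checkpoint_titles):
        missing.append(wanted_checkpoint)
    if not any(wanted_controlnet in name for name in controlnet_models):
        missing.append(wanted_controlnet)
    return missing


def check_installation():
    """Query the running Web UI and report which required models are missing."""
    # Assumed response shapes: a list of {"title": ...} objects and {"model_list": [...]}.
    checkpoints = [m["title"] for m in fetch_json("/sdapi/v1/sd-models")]
    controlnets = fetch_json("/controlnet/model_list")["model_list"]
    return missing_models(
        checkpoints, controlnets, "RealVisXL_V4.0_Lightning", "diffusers_xl_canny_full"
    )
```

Calling check_installation() with the Web UI running should return an empty list when both models are installed; a connection error usually means the API has not been enabled.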

Payload

Now that we have all our prerequisites in place, let's build the payload for the /sdapi/v1/txt2img API.

Python

payload = {
    # Select the checkpoint via override_settings; a top-level "sd_model"
    # key is not part of the txt2img request schema.
    "override_settings": {
        "sd_model_checkpoint": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]"
    },
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "steps": 6,
    "batch_size": 3,
    "cfg_scale": 1.5,
    "width": width,    # integers, not strings
    "height": height,
    "seed": -1,
    "sampler_index": "DPM++ SDE",
    "hr_scheduler": "Karras",
    "alwayson_scripts": {
        "controlnet": {
            # "args" is a list with one entry per ControlNet unit.
            "args": [
                {
                    "enabled": True,
                    "input_image": encoded_image,
                    "model": "diffusers_xl_canny_full [2b69fca4]",
                    "module": "canny",
                    "guidance_start": 0.0,
                    "guidance_end": 1.0,
                    "weight": 1.15,
                    "threshold_a": 100,
                    "threshold_b": 200,
                    "resize_mode": "Resize and Fill",
                    "lowvram": False,
                    "guess_mode": False,
                    "pixel_perfect": True,
                    "control_mode": "My prompt is more important",
                    "processor_res": 1024
                }
            ]
        }
    }
}
  

For now, we have set some placeholders for prompt, negative_prompt, width, height, and encoded_image attributes, while others are hardcoded to some default preset values. These values yielded the best results during our experimentation. Feel free to experiment with different values and models of your choice.
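
One way to experiment with different values without editing the payload by hand is to merge a small set of overrides into a copy of a base payload. The sketch below is only an illustration: with_overrides is a hypothetical helper (not part of the Web UI API), BASE_PAYLOAD is a trimmed-down stand-in with just a few keys, and it assumes the ControlNet args are given as a list of units.

```python
import copy

# Trimmed-down stand-in for the full payload (only a few keys, for brevity).
BASE_PAYLOAD = {
    "steps": 6,
    "cfg_scale": 1.5,
    "seed": -1,
    "alwayson_scripts": {"controlnet": {"args": [{"weight": 1.15, "guidance_end": 1.0}]}},
}


def with_overrides(base, top_level=None, controlnet=None):
    """Return a deep copy of base with top-level and ControlNet-unit overrides applied."""
    payload = copy.deepcopy(base)  # never mutate the shared base payload
    payload.update(top_level or {})
    unit = payload["alwayson_scripts"]["controlnet"]["args"][0]
    unit.update(controlnet or {})
    return payload


# Try more steps, a fixed seed for repeatable comparisons, and a weaker control weight.
variant = with_overrides(BASE_PAYLOAD, {"steps": 8, "seed": 1234}, {"weight": 0.9})
print(variant["steps"], variant["alwayson_scripts"]["controlnet"]["args"][0]["weight"])
```

Because the helper deep-copies the base, several variants can be generated in a loop and compared side by side without one run affecting the next.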

The encoded_image is our input sketch converted to a base64 encoded string.
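
As a minimal sketch of that conversion, the snippet below base64-encodes raw image bytes and decodes them back using only the standard library. The function names are hypothetical, and the 8-byte PNG file signature stands in for the contents of a real sketch file.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string suitable for a JSON payload."""
    return base64.b64encode(data).decode("utf-8")


def decode_image_string(encoded: str) -> bytes:
    """Reverse of encode_image_bytes: recover the original bytes."""
    return base64.b64decode(encoded)


# The PNG file signature is used here in place of a real image file.
sample = b"\x89PNG\r\n\x1a\n"
encoded_image = encode_image_bytes(sample)
print(encoded_image)
```

For a real sketch, the bytes would typically come from reading the file in binary mode or from saving a PIL image into an in-memory buffer.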

Let's talk about some of the important attributes of our payload.

Attributes

Prompt (prompt): A textual description that guides image generation, specifying which objects to create and how they should look.
Negative prompt (negative_prompt): Text specifying the objects or qualities that should be excluded from the generated images.
Steps (steps): The number of denoising iterations the model performs to refine the image. More steps generally improve quality at the cost of speed, although Lightning checkpoints such as the one used here are designed to work with only a few steps.
Seed (seed): The number used to initialize the random noise. Using the same seed produces identical images when all other attributes remain unchanged; -1 selects a random seed for each request.
Guidance scale (cfg_scale): Adjusts how closely the generated image follows the input prompt. Higher values ensure closer adherence but may reduce image quality or diversity.
Starting control step (guidance_start): The fraction of the total sampling steps, from 0.0 to 1.0, at which ControlNet begins to apply. A value of 0.0 means the edge map guides generation from the first step.
Ending control step (guidance_end): The fraction of the total sampling steps at which ControlNet stops applying. A value of 1.0 keeps the edge map in effect until the last step.
Control weight (weight): How strongly the ControlNet condition influences generation. Higher values make the output follow the edge map more closely.

Refer to the model documentation for all other attribute details.
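
To make the starting and ending control steps more concrete, the sketch below lists the sampling steps during which ControlNet would be active for a given step count. It assumes guidance is applied while the fraction of completed steps lies between guidance_start and guidance_end; this is a simplified model for illustration rather than the extension's actual code, and active_control_steps is a hypothetical helper.

```python
def active_control_steps(steps, guidance_start, guidance_end):
    """Return the 0-based sampling steps where ControlNet guidance is applied.

    Simplified model: step i is active when guidance_start <= i / steps <= guidance_end.
    """
    return [i for i in range(steps) if guidance_start <= i / steps <= guidance_end]


# With the values from the payload, every one of the six steps is guided.
print(active_control_steps(6, 0.0, 1.0))
# Ending the guidance halfway frees the later steps from the edge map.
print(active_control_steps(6, 0.0, 0.5))
```

Under this model, lowering guidance_end lets the later steps refine details without the edge constraint, while raising guidance_start lets the prompt shape the overall composition first.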

Client

Here's the Python client for converting sketches into photorealistic images.

Python

import base64
import io

import requests
from PIL import Image


def run_sketch_client(pil, prompt, negative_prompt, width, height):
    # Encode the input sketch as a base64 PNG string for the JSON payload.
    buffered = io.BytesIO()
    pil.save(buffered, format="PNG")
    encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")

    payload = {
        # Select the checkpoint via override_settings; a top-level "sd_model"
        # key is not part of the txt2img request schema.
        "override_settings": {
            "sd_model_checkpoint": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]"
        },
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 6,
        "batch_size": 3,
        "cfg_scale": 1.5,
        "width": width,
        "height": height,
        "seed": -1,
        "sampler_index": "DPM++ SDE",
        "hr_scheduler": "Karras",
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "enabled": True,
                        "input_image": encoded_image,
                        "model": "diffusers_xl_canny_full [2b69fca4]",
                        "module": "canny",
                        "guidance_start": 0.0,
                        "guidance_end": 1.0,
                        "weight": 1.15,
                        "threshold_a": 100,
                        "threshold_b": 200,
                        "resize_mode": "Resize and Fill",
                        "lowvram": False,
                        "guess_mode": False,
                        "pixel_perfect": True,
                        "control_mode": "My prompt is more important",
                        "processor_res": 1024
                    }
                ]
            }
        }
    }

    res = requests.post("http://localhost:7860/sdapi/v1/txt2img", json=payload)
    res.raise_for_status()  # fail early on HTTP errors
    r = res.json()

    # Decode every base64 image in the response into a PIL image.
    images = []
    for image in r.get("images", []):
        images.append(Image.open(io.BytesIO(base64.b64decode(image))))

    return images


if __name__ == "__main__":
    pil = Image.open("butterfly.jpg")
    width, height = pil.size
    # Arguments are passed in the same order as the function signature.
    images = run_sketch_client(
        pil,
        "A photorealistic image of a beautiful butterfly",
        "fake, ugly, blurry, low quality",
        width,
        height,
    )
    for i, image in enumerate(images):
        # JPEG has no alpha channel, so normalize to RGB before saving.
        image.convert("RGB").save(f"output_{i}.jpg")

The code uses the butterfly.jpg file as the input image, which is located in the same directory as the client code. The batch_size in our payload is set to 3, so the model generates three variations of the butterfly. The response also includes an edge map (the input sketch converted into white lines on a black background), so four output images are written to the directory.
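
When post-processing the results, it can be handy to separate the generated variations from the edge map. The sketch below assumes the edge map is appended after the batch_size generated images in the response, which is consistent with the four-image output described above but may vary with the extension version and settings. split_outputs is a hypothetical helper, and plain strings stand in for decoded images.

```python
def split_outputs(images, batch_size):
    """Split a response image list into (generated images, extra control maps)."""
    return images[:batch_size], images[batch_size:]


# Strings stand in for decoded PIL images in this illustration.
generated, edge_maps = split_outputs(["img0", "img1", "img2", "edge"], batch_size=3)
print(len(generated), len(edge_maps))
```

With this split, the generated images and the edge map can be saved under different file names instead of a single output_N sequence.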

Let's focus on the edge map. This map is often used in combination with techniques like "ControlNet" to guide image generation. It highlights the subject's contours and edges, which the diffusion model leverages to maintain the structure while generating or modifying images. In our case, the edge map guides the RealVisXL Lightning model to generate the butterfly image, strictly following the canny edges provided by the edge map.

Conclusion

In this post, we've successfully created a comprehensive client that showcases the conversion of sketches into photorealistic images by extending the Stable Diffusion Web UI's txt2img API. Additionally, we've explored how the ControlNet model (diffusers_xl_canny_full) effectively guided the Stable Diffusion model (RealVisXL_V4.0_Lightning) to produce realistic images by adhering to the canny edges outlined in the generated edge map. This demonstrates the powerful synergy between these models in achieving highly detailed and accurate visual outputs from simple sketches.

You can use this API to turn your sketches into digital images, or you can make it a fun tool for your kids to convert their drawings into digital pictures.

Hope you found something useful in this article. See you soon in our next article. Happy learning!
