DZone
Automatic 1111: Custom Sketch-to-Image API

Develop a custom Sketch-to-Image API for converting hand-drawn or digital sketches into photorealistic images using Stable Diffusion models guided by ControlNet.

By Tharakarama Reddy Yernapalli Sreenivasulu, Pavan Vemuri (DZone Core), and Prince Bose · Sep. 10, 24 · Tutorial

In this article, we will develop a custom Sketch-to-Image API that converts hand-drawn or digital sketches into photorealistic images using Stable Diffusion models guided by a ControlNet model. We will extend Automatic 1111's txt2img API to build this custom workflow.

Prerequisites

  1. Stable Diffusion Web UI (Automatic 1111) running on your local machine. Follow the instructions here if you are starting from scratch.
  2. SD APIs enabled. Follow the instructions on this page (scroll down to the Enabling APIs section) to enable the APIs if you haven't already done so; in short, the Web UI must be launched with the --api command-line argument.
  3. ControlNet extension installed:
    • Click on the Extensions tab on Stable Diffusion Web UI.
    • Navigate to the Install from URL tab.
    • Paste the ControlNet extension's Git repository link into the URL for extension's git repository input field and click Install.
    • After the installation succeeds, restart the application: on Windows, close it and rerun the run.bat file; on macOS, you may need to run ./webui.sh instead.
    • After restarting the application, the ControlNet dropdown will become visible under the Generation tab in the txt2img screen.
  4. Download and add the following models to Automatic 1111:
    • RealVisXL_V4.0_Lightning (Hugging Face: SG161222/RealVisXL_V4.0_Lightning): copy this model to the Stable Diffusion models folder under the project root directory, /models/Stable-diffusion.
    • diffusers_xl_canny_full (Hugging Face: lllyasviel/sd_control_collection): copy the downloaded model to the ControlNet extension's model folder under the project root directory, /extensions/sd-webui-controlnet/models.
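
The payload below refers to both models by their exact titles, including the hash suffix in square brackets, and those strings must match what your installation reports. As a minimal sketch, assuming the Web UI is running at its default address http://localhost:7860 with the API enabled, the helpers below query the Web UI's /sdapi/v1/sd-models endpoint and the ControlNet extension's /controlnet/model_list endpoint, then look up the titles by name. BASE_URL and all function names are our own illustrative choices, not part of either API.

```python
import json
import urllib.request

# Assumption: the Web UI listens on its default local address.
BASE_URL = "http://localhost:7860"


def get_json(path: str, base_url: str = BASE_URL):
    """Send a GET request to a Web UI API endpoint and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=30) as resp:
        return json.load(resp)


def find_checkpoint_title(models, name):
    """Return the full title (e.g. 'file.safetensors [hash]') of the first
    checkpoint whose title contains name, or None if there is no match."""
    for model in models:
        title = model.get("title", "")
        if name in title:
            return title
    return None


def find_controlnet_model(model_list, name):
    """Return the first ControlNet model string containing name, or None."""
    return next((m for m in model_list if name in m), None)


def print_model_titles(base_url: str = BASE_URL):
    """Print the exact model strings to copy into the txt2img payload."""
    checkpoints = get_json("/sdapi/v1/sd-models", base_url)
    print(find_checkpoint_title(checkpoints, "RealVisXL_V4.0_Lightning"))
    controlnet_models = get_json("/controlnet/model_list", base_url)["model_list"]
    print(find_controlnet_model(controlnet_models, "diffusers_xl_canny_full"))
```

With the Web UI running, calling print_model_titles() prints the two strings to use in the payload; a None result usually means the model file is not in the folder listed above.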

Payload

Now that we have all our prerequisites in place, let's build the payload for the /sdapi/v1/txt2img API.

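```python
# prompt, negative_prompt, width, height, and encoded_image are placeholders
# that the client in the next section fills in.
payload = {
    # "sd_model" is not a /sdapi/v1/txt2img request field; the checkpoint is
    # selected through override_settings instead.
    "override_settings": {
        "sd_model_checkpoint": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]"
    },
    "prompt": prompt,
    "negative_prompt": negative_prompt,
    "steps": 6,
    "batch_size": 3,
    "cfg_scale": 1.5,
    "width": width,    # integer pixel sizes, not strings
    "height": height,
    "seed": -1,        # -1 picks a random seed
    "sampler_name": "DPM++ SDE",
    # Separate scheduler field (Web UI v1.9+); older versions use the
    # combined sampler name "DPM++ SDE Karras" instead.
    "scheduler": "Karras",
    "alwayson_scripts": {
        "controlnet": {
            # The ControlNet extension expects a list of units, one dict per unit.
            "args": [
                {
                    "enabled": True,
                    "input_image": encoded_image,
                    "model": "diffusers_xl_canny_full [2b69fca4]",
                    "module": "canny",
                    "guidance_start": 0.0,
                    "guidance_end": 1.0,
                    "weight": 1.15,
                    "threshold_a": 100,  # Canny low threshold
                    "threshold_b": 200,  # Canny high threshold
                    "resize_mode": "Resize and Fill",
                    "lowvram": False,
                    "guess_mode": False,
                    "pixel_perfect": True,
                    "control_mode": "My prompt is more important",
                    "processor_res": 1024
                }
            ]
        }
    }
}
```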


For now, the prompt, negative_prompt, width, height, and encoded_image attributes are placeholders, while the other attributes are hardcoded to preset values that yielded the best results during our experimentation. Feel free to experiment with different values and models of your choice.

The encoded_image is our input sketch converted to a base64-encoded string.
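
The client in the next section encodes the sketch with PIL before sending it and decodes the returned images the same way. As a standalone sketch using only Python's standard library, the helpers below show both directions. The function names are illustrative, and stripping an optional data:image/...;base64, prefix is a defensive assumption, not a requirement stated in this article.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes (for example, a PNG file's contents) as base64 text."""
    return base64.b64encode(data).decode("utf-8")


def encode_image_file(path: str) -> str:
    """Read an image file from disk and return its base64 encoding."""
    with open(path, "rb") as f:
        return encode_image_bytes(f.read())


def decode_image_string(encoded: str) -> bytes:
    """Decode base64 image text back to bytes, dropping a data-URI prefix if present."""
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)
```

For example, encode_image_file("butterfly.jpg") would yield a base64 string of the original JPEG bytes; the client instead re-encodes the sketch as PNG with PIL, which sends the same format regardless of the input file.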

Let's talk about some of the important attributes of our payload.

Attributes

  • Prompt (prompt): A textual description that guides the image generation process, specifying which objects to create and how they should look
  • Negative prompt (negative_prompt): Text specifying what should be excluded from the generated images
  • Steps (steps): The number of denoising iterations the sampler performs; more steps generally improve quality up to a point, although few-step models such as RealVisXL Lightning are designed to work with only a handful of steps
  • Seed (seed): The number that initializes the random noise; using the same seed with otherwise unchanged attributes reproduces the same images, and -1 selects a random seed
  • Guidance scale (cfg_scale): Adjusts how closely the generated image follows the prompt; higher values increase adherence but may reduce image quality or diversity
  • Starting control step (guidance_start): The point in the sampling process, expressed as a fraction of the total steps from 0.0 to 1.0, at which ControlNet starts applying its conditioning; 0.0 means from the first step
  • Ending control step (guidance_end): The fraction of the total steps at which ControlNet stops applying its conditioning; 1.0 means through the last step
  • Control weight (weight): How strongly the ControlNet conditioning (here, the canny edge map) influences the generated image
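
To make the starting and ending control steps concrete, the sketch below maps guidance_start and guidance_end onto the sampler's steps. It assumes that a step i (counting from 0) is controlled when its progress i / steps falls inside [guidance_start, guidance_end]; this is a simplified model for illustration, and the ControlNet extension's exact boundary handling may differ. active_control_steps is a hypothetical helper, not part of either API.

```python
from typing import List


def active_control_steps(steps: int, guidance_start: float, guidance_end: float) -> List[int]:
    """Approximate which 0-based sampling steps receive ControlNet conditioning.

    Simplified model (assumption): step i is controlled when
    guidance_start <= i / steps <= guidance_end.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0.0 <= guidance_start <= guidance_end <= 1.0:
        raise ValueError("expected 0.0 <= guidance_start <= guidance_end <= 1.0")
    return [i for i in range(steps) if guidance_start <= i / steps <= guidance_end]


# With the payload's values (6 steps, 0.0 to 1.0), every step is controlled.
print(active_control_steps(6, 0.0, 1.0))  # [0, 1, 2, 3, 4, 5]
# Starting at 0.5 runs the first half of the steps without ControlNet.
print(active_control_steps(6, 0.5, 1.0))  # [3, 4, 5]
```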

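Refer to the Web UI's interactive API documentation (served at /docs, for example http://localhost:7860/docs, when the API is enabled) and the ControlNet extension's documentation for details on the other attributes.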

Client

Here's the Python client for converting sketches into photorealistic images.

```python
import base64
import io

import requests
from PIL import Image

TXT2IMG_URL = "http://localhost:7860/sdapi/v1/txt2img"


def run_sketch_client(pil, prompt, negative_prompt, width, height):
    # Encode the input sketch as a base64 PNG string for the ControlNet unit
    buffered = io.BytesIO()
    pil.save(buffered, format="PNG")
    encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")

    payload = {
        # Select the checkpoint via override_settings ("sd_model" is not a request field)
        "override_settings": {
            "sd_model_checkpoint": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]"
        },
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 6,
        "batch_size": 3,
        "cfg_scale": 1.5,
        "width": int(width),
        "height": int(height),
        "seed": -1,
        "sampler_name": "DPM++ SDE",
        "scheduler": "Karras",  # Web UI v1.9+; older versions: "DPM++ SDE Karras"
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "enabled": True,
                        "input_image": encoded_image,
                        "model": "diffusers_xl_canny_full [2b69fca4]",
                        "module": "canny",
                        "guidance_start": 0.0,
                        "guidance_end": 1.0,
                        "weight": 1.15,
                        "threshold_a": 100,
                        "threshold_b": 200,
                        "resize_mode": "Resize and Fill",
                        "lowvram": False,
                        "guess_mode": False,
                        "pixel_perfect": True,
                        "control_mode": "My prompt is more important",
                        "processor_res": 1024
                    }
                ]
            }
        }
    }

    # Generation can take a while: allow a generous timeout and fail loudly on HTTP errors
    res = requests.post(TXT2IMG_URL, json=payload, timeout=600)
    res.raise_for_status()
    r = res.json()

    # Decode each base64 image in the response (generated images plus the edge map)
    images = []
    for encoded in r.get("images", []):
        images.append(Image.open(io.BytesIO(base64.b64decode(encoded))))

    return images


if __name__ == "__main__":
    pil = Image.open("butterfly.jpg")
    width, height = pil.size
    images = run_sketch_client(
        pil,
        "A photorealistic image of a beautiful butterfly",
        "fake, ugly, blurry, low quality",
        width,
        height,
    )
    for i, image in enumerate(images):
        # JPEG cannot store an alpha channel, so convert to RGB before saving
        image.convert("RGB").save(f"output_{i}.jpg")
```

The code uses the butterfly.jpg file, located in the same directory as the client code, as the input image. The batch_size in our payload is set to 3, so the model generates three variations of the butterfly. The ControlNet extension also returns the edge map it computed from the input sketch (white lines on a black background), so four output images are saved to the directory.
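
Because the response mixes generated images with the edge map, it can help to separate them before saving. A minimal sketch, assuming (as in the run described above) that the batch_size * n_iter generated images come first and any ControlNet preprocessor output is appended after them; split_outputs and output_filenames are hypothetical helpers, and the variation_/edge_map_ file names are our own choice.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_outputs(images: Sequence[T], batch_size: int, n_iter: int = 1) -> Tuple[List[T], List[T]]:
    """Split decoded response images into (generated, extras).

    Assumption: the first batch_size * n_iter entries are generated images and
    anything after them (such as the ControlNet edge map) is extra output.
    """
    generated_count = batch_size * n_iter
    return list(images[:generated_count]), list(images[generated_count:])


def output_filenames(generated_count: int, extra_count: int) -> List[str]:
    """Build descriptive file names: variation_0.jpg, ..., then edge_map_0.jpg, ..."""
    names = [f"variation_{i}.jpg" for i in range(generated_count)]
    names += [f"edge_map_{i}.jpg" for i in range(extra_count)]
    return names
```

In the client's main block, calling split_outputs(images, batch_size=3) and zipping generated + extras with output_filenames(len(generated), len(extras)) would save the three butterflies and the edge map under distinguishable names.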

Let's focus on the edge map. ControlNet's canny preprocessor (the "module": "canny" setting) extracts the subject's contours and edges from the sketch, using threshold_a and threshold_b as the low and high Canny thresholds. The diffusion model then uses this map to preserve the sketch's structure while generating the image. In our case, the edge map guides the RealVisXL Lightning model to generate a butterfly that closely follows the canny edges of the input sketch.

Conclusion

In this post, we built a client that converts sketches into photorealistic images by extending the Stable Diffusion Web UI's txt2img API. We also saw how the ControlNet model (diffusers_xl_canny_full) guides the Stable Diffusion model (RealVisXL_V4.0_Lightning) to produce realistic images that follow the canny edges in the generated edge map. Together, these models turn simple sketches into detailed images that stay faithful to the original drawing's structure.

You can use this API to turn your sketches into digital images, or you can make it a fun tool for your kids to convert their drawings into digital pictures. 

Hope you found something useful in this article. See you soon in our next article. Happy learning!


Opinions expressed by DZone contributors are their own.
