Automatic 1111: Custom Sketch-to-Image API

Develop a custom Sketch-to-Image API for converting hand-drawn/digital sketches into photorealistic images using stable diffusion models powered by ControlNet.

By Tharakarama Reddy Yernapalli Sreenivasulu · Pavan Vemuri · Prince Bose
Sep. 10, 24 · Tutorial

In this article, we will develop a custom Sketch-to-Image API for converting hand-drawn or digital sketches into photorealistic images using Stable Diffusion models powered by a ControlNet model. We will extend Automatic 1111's txt2img API to build this custom workflow.

Prerequisites

  1. Stable Diffusion Web UI (Automatic 1111) running on your local machine. Follow the instructions here if you are starting from scratch.
  2. SD APIs enabled. Follow the instructions on this page (scroll down to the Enabling APIs section) to enable the APIs if you haven't already done so. (A quick verification sketch follows this list.)
  3. ControlNet extension installed:
    • Click on the Extensions tab on Stable Diffusion Web UI.
    • Navigate to the Install from URL tab.
    • Paste the ControlNet extension's Git repository URL (https://github.com/Mikubill/sd-webui-controlnet) in the URL for extension's git repository input field and click Install.
    • After the installation succeeds, restart the application: close and rerun the run.bat file if you're a Windows user; Mac users may need to run ./webui.sh instead.
    • After restarting the application, the ControlNet dropdown will become visible under the Generation tab on the txt2img screen.
  4. Download and add the following models to Automatic 1111:
    • RealVisXL_V4.0_Lightning: HuggingFace: SG161222/RealVisXL_V4.0_Lightning. Copy this model to the Stable Diffusion models folder under the project root directory: /models/Stable-diffusion
    • diffusers_xl_canny_full: HuggingFace: lllyasviel/sd_control_collection. Copy the downloaded model to /extensions/sd-webui-controlnet
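
Before building the payload, it helps to confirm that the API is reachable and that both models were picked up. Below is a minimal sketch, assuming the Web UI runs at the default http://localhost:7860; the /sdapi/v1/sd-models route is part of the standard Automatic 1111 API, and /controlnet/model_list is exposed by the sd-webui-controlnet extension.

Python
 
import requests

BASE_URL = "http://localhost:7860"

# List the checkpoints the Web UI picked up from /models/Stable-diffusion.
models = requests.get(f"{BASE_URL}/sdapi/v1/sd-models").json()
print([m["title"] for m in models])  # should include RealVisXL_V4.0_Lightning

# List the ControlNet models detected by the sd-webui-controlnet extension.
cn_models = requests.get(f"{BASE_URL}/controlnet/model_list").json()
print(cn_models["model_list"])  # should include diffusers_xl_canny_full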

Payload

Now that we have all our prerequisites in place, let's build the payload for the /sdapi/v1/txt2img API.

Python
 
payload = {
    # Model selection goes through override_settings; a bare "sd_model" key
    # is not a recognized txt2img parameter.
    "override_settings": {
        "sd_model_checkpoint": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]"
    },
    "prompt": f"{prompt}",
    "negative_prompt": f"{negative_prompt}",
    "steps": 6,
    "batch_size": 3,
    "cfg_scale": 1.5,
    "width": f"{width}",
    "height": f"{height}",
    "seed": -1,
    "sampler_index": "DPM++ SDE",
    "hr_scheduler": "Karras",
    "alwayson_scripts": {
        "controlnet": {
            # The ControlNet extension expects "args" to be a list of units,
            # one dict per ControlNet unit.
            "args": [
                {
                    "enabled": True,
                    "input_image": f"{encoded_image}",
                    "model": "diffusers_xl_canny_full [2b69fca4]",
                    "module": "canny",
                    "guidance_start": 0.0,
                    "guidance_end": 1.0,
                    "weight": 1.15,
                    "threshold_a": 100,
                    "threshold_b": 200,
                    "resize_mode": "Resize and Fill",
                    "lowvram": False,
                    "guess_mode": False,
                    "pixel_perfect": True,
                    "control_mode": "My prompt is more important",
                    "processor_res": 1024
                }
            ]
        }
    }
}


For now, we have set placeholders for the prompt, negative_prompt, width, height, and encoded_image attributes, while the others are hardcoded to preset values that yielded the best results during our experimentation. Feel free to experiment with different values and models of your choice.

The encoded_image is our input sketch converted to a base64 encoded string.
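
For reference, here is that encoding step in isolation (the full client below does the same thing inside run_sketch_client); butterfly.jpg is the sample sketch used later in this article.

Python
 
import base64
import io
from PIL import Image

pil = Image.open("butterfly.jpg")
buffered = io.BytesIO()
pil.save(buffered, format="PNG")  # re-encode as PNG bytes in memory
encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")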

Let's talk about some of the important attributes of our payload.

Attributes

  • Prompt (prompt): A textual description that guides the image generation process, specifying which objects to create and detailing their intended appearance
  • Negative prompt (negative_prompt): Text input specifying the objects that should be excluded from the generated images
  • Steps (steps): The number of iterations the model performs to refine the generated image, with more steps generally leading to higher-quality results
  • Seed (seed): A numerical value used to initialize generation; the same seed produces identical images when all other attributes remain unchanged
  • Guidance scale (cfg_scale): Adjusts how closely the generated image aligns with the input prompt; higher values ensure closer adherence but may reduce image quality or diversity
  • Starting control step (guidance_start): The point in the denoising process at which ControlNet guidance begins, setting the initial direction and constraints for the output
  • Ending control step (guidance_end): The point at which ControlNet guidance stops, leaving the final refinements to the base model
  • Control weight (weight): Defines the influence of the ControlNet condition in the generation process, directly affecting how closely the model follows the specified control criteria

Refer to the model documentation for all other attribute details.
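
If you want to explore these attributes systematically, one convenience (not part of the original workflow, just a hypothetical helper) is to derive variant payloads from the preset one; pinning the seed makes runs reproducible, so you can compare the effect of a single attribute such as cfg_scale in isolation.

Python
 
import copy

def make_payload(base, **overrides):
    # Hypothetical helper: copy the preset payload dict defined above and
    # override selected top-level attributes for one experiment.
    p = copy.deepcopy(base)
    p.update(overrides)
    return p

# Pin the seed so that only cfg_scale differs between the two runs.
low_guidance = make_payload(payload, seed=1234, cfg_scale=1.5)
high_guidance = make_payload(payload, seed=1234, cfg_scale=4.0)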

Client

Here's the Python client for converting sketches into photorealistic images.

Python
 
import io
import requests
import base64
from PIL import Image


def run_sketch_client(pil, prompt, negative_prompt, height, width):
    # Encode the input sketch as a base64 PNG string for the ControlNet unit.
    buffered = io.BytesIO()
    pil.save(buffered, format="PNG")
    encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")
    
    payload = {
        "sd_model": "RealVisXL_V4.0_Lightning.safetensors [d6a48d3e20]",
        "prompt": f"{prompt}",
        "negative_prompt": f"{negative_prompt}",
        "steps": 6,
        "batch_size": 3,
        "cfg_scale": 1.5,
        "width": f"{width}",
        "height": f"{height}",
        "seed": -1,
        "sampler_index": "DPM++ SDE",
        "hr_scheduler": "Karras",
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "enabled": True,
                        "input_image": f"{encoded_image}",
                        "model": "diffusers_xl_canny_full [2b69fca4]",
                        "module": "canny",
                        "guidance_start": 0.0,
                        "guidance_end": 1.0,
                        "weight": 1.15,
                        "threshold_a": 100,
                        "threshold_b": 200,
                        "resize_mode": "Resize and Fill",
                        "lowvram": False,
                        "guess_mode": False,
                        "pixel_perfect": True,
                        "control_mode": "My prompt is more important",
                        "processor_res": 1024
                    }
                ]
            }
        }
    }

    res = requests.post("http://localhost:7860/sdapi/v1/txt2img", json=payload)
    res.raise_for_status()  # fail fast on HTTP errors

    r = res.json()
    # Decode each base64-encoded result back into a PIL image.
    images = []
    if 'images' in r:
        for image in r['images']:
            image = Image.open(io.BytesIO(base64.b64decode(image)))
            images.append(image)

    return images


if __name__ == "__main__":
    pil = Image.open("butterfly.jpg")
    width, height = pil.size
    images = run_sketch_client(pil, "A photorealistic image of a beautiful butterfly", "fake, ugly, blurry, low quality", width, height)
    for i, image in enumerate(images):
        image.save(f"output_{i}.jpg")


The code uses the butterfly.jpg file as the input image, located in the same directory as the client code. The batch_size in our payload is set to 3, meaning the model will generate three variations of the butterfly along with an edge map (the sketch input converted into white lines on a black background). As a result, four output images will be created in the directory.

Let's focus on the edge map. This map is often used in combination with techniques like ControlNet to guide image generation. It highlights the subject's contours and edges, which the diffusion model leverages to maintain structure while generating or modifying images. In our case, the edge map guides the RealVisXL Lightning model to generate the butterfly image, strictly following the Canny edges provided by the map.
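
To preview roughly what the canny module hands to ControlNet, you can run OpenCV's Canny detector on the sketch with the same thresholds as our payload (threshold_a=100, threshold_b=200). This is a minimal sketch for inspection only, assuming opencv-python is installed; the extension performs its own preprocessing, including the resizing implied by processor_res.

Python
 
import cv2

# Read the sketch as grayscale and apply Canny with the payload's thresholds.
img = cv2.imread("butterfly.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)  # threshold_a=100, threshold_b=200
cv2.imwrite("edge_preview.png", edges)  # white lines on a black background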

Conclusion

In this post, we've built a client that converts sketches into photorealistic images by extending the Stable Diffusion Web UI's txt2img API. We've also seen how the ControlNet model (diffusers_xl_canny_full) guided the Stable Diffusion model (RealVisXL_V4.0_Lightning) to produce realistic images by adhering to the Canny edges outlined in the generated edge map, demonstrating the powerful synergy between these models in turning simple sketches into highly detailed, accurate visuals.

You can use this API to turn your sketches into digital images, or you can make it a fun tool for your kids to convert their drawings into digital pictures. 

Hope you found something useful in this article. See you soon in our next article. Happy learning!


Opinions expressed by DZone contributors are their own.
