
Questioning an Image Database With Local AI/LLM on Ollama and Spring AI

In this article, you will learn how to query an image database with natural language.

By Sven Loesekann · Jul. 15, 24 · Tutorial

The AIDocumentLibraryChat project has been extended to include an image database that can be questioned for images. It uses the LLava model of Ollama, which can analyze images. The image search uses embeddings stored in PostgreSQL with the PGVector extension.

Architecture

The AIDocumentLibraryChat project has this architecture:

AIDocumentLibraryChat architecture

The Angular front-end provides the upload and question features to the user. The Spring AI back-end resizes the images to a size the model supports, uses the database to store the data/vectors, and creates the image descriptions with the LLava model of Ollama.

The flow of image upload/analysis/storage looks like this:

The image is uploaded via the front-end. The back-end resizes it to a format the LLava model can process. The LLava model then generates a description of the image based on the provided prompt. The resized image and its metadata are stored in a relational table in PostgreSQL. The image description is used to create embeddings, which are stored with the description in the PGVector database, together with metadata that points to the corresponding row in the PostgreSQL table. Finally, the image description and the resized image are shown in the front-end.

The flow of image questions looks like this:

question image database

The user inputs the question in the front-end. The back-end converts the question into embeddings and searches the PGVector database for the nearest entries. Each entry carries the row ID of the image table that holds the image and its metadata. The image table data is then queried, combined with the description, and shown to the user.

Backend

The files runPostgresql.sh and runOllama.sh contain the Docker commands to run the PGVector database and the Ollama framework.

The backend needs these entries in application-ollama.properties:

Properties files
 
# image processing
spring.ai.ollama.chat.model=llava:34b-v1.6-q6_K
spring.ai.ollama.chat.options.num-thread=8
spring.ai.ollama.chat.options.keep_alive=1s


The application needs to be built with Ollama support (property: ‘useOllama’) and started with the ‘ollama’ profile, and these properties need to be activated to enable the LLava model and set a useful keep_alive. The num-thread option is only needed if Ollama does not select the right number of threads automatically.

The Controller

The ImageController contains the endpoints:

Java
 
@RestController
@RequestMapping("rest/image")
public class ImageController {
...
  @PostMapping("/query")
  public List<ImageDto> postImageQuery(@RequestParam("query") String query,
      @RequestParam("type") String type) {
    var result = this.imageService.queryImage(query);
    return result;
  }

  @PostMapping("/import")
  public ImageDto postImportImage(@RequestParam("query") String query,
      @RequestParam("type") String type,
      @RequestParam("file") MultipartFile imageQuery) {
    var result = this.imageService.importImage(
        this.imageMapper.map(imageQuery, query),
        this.imageMapper.map(imageQuery));
    return result;
  }
}


The query endpoint contains the ‘postImageQuery(…)’ method that receives a form with the query and the image type and calls the ImageService to handle the request.

The import endpoint contains the ‘postImportImage(…)’ method that receives a form with the query (the prompt), the image type, and the file. The ImageMapper converts the form into the ImageQueryDto and the Image entity, and the ImageService is called to handle the request.
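
The ImageDto returned by both endpoints is not listed in this article. As an illustration only, it could be a compact Java record along these lines; the record and its field names are assumptions inferred from how the service constructs it, not the project's actual source:

Java
 
// Illustrative sketch only: the real ImageDto lives in the AIDocumentLibraryChat
// project. Field names are inferred from the constructor calls in ImageService.
public record ImageDto(
    String description,   // LLava-generated description of the image
    String base64Image,   // resized image content, Base64 encoded
    ImageType imageType) { // image type, e.g., JPEG or PNG
}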

The Service

The ImageService looks like this:

Java
 
@Service
@Transactional
public class ImageService {
...
  public ImageDto importImage(ImageQueryDto imageDto, Image image) {
    var resultData = this.createAIResult(imageDto);
    image.setImageContent(resultData.imageQueryDto().getImageContent());
    var myImage = this.imageRepository.save(image);
    var aiDocument = new Document(resultData.answer());
    aiDocument.getMetadata().put(MetaData.ID, myImage.getId().toString());
    aiDocument.getMetadata().put(MetaData.DATATYPE,
        MetaData.DataType.IMAGE.toString());
    this.documentVsRepository.add(List.of(aiDocument));
    return new ImageDto(resultData.answer(),
        Base64.getEncoder().encodeToString(
            resultData.imageQueryDto().getImageContent()),
        resultData.imageQueryDto().getImageType());
  }

  public List<ImageDto> queryImage(String imageQuery) {
    var aiDocuments = this.documentVsRepository.retrieve(imageQuery,
            MetaData.DataType.IMAGE, this.resultSize.intValue())
        .stream()
        .filter(myDoc -> myDoc.getMetadata()
            .get(MetaData.DATATYPE).equals(DataType.IMAGE.toString()))
        .sorted((myDocA, myDocB) ->
            ((Float) myDocA.getMetadata().get(MetaData.DISTANCE))
                .compareTo((Float) myDocB.getMetadata().get(MetaData.DISTANCE)))
        .toList();
    var imageMap = this.imageRepository.findAllById(
            aiDocuments.stream()
                .map(myDoc -> (String) myDoc.getMetadata().get(MetaData.ID))
                .map(myUuid -> UUID.fromString(myUuid)).toList())
        .stream()
        .collect(Collectors.toMap(myDoc -> myDoc.getId(), myDoc -> myDoc));
    return imageMap.entrySet().stream()
        .map(myEntry -> createImageContainer(aiDocuments, myEntry))
        .sorted((containerA, containerB) ->
            containerA.distance().compareTo(containerB.distance()))
        .map(myContainer -> new ImageDto(myContainer.document().getContent(),
            Base64.getEncoder().encodeToString(
                myContainer.image().getImageContent()),
            myContainer.image().getImageType()))
        .limit(this.resultSize)
        .toList();
  }

  private ImageContainer createImageContainer(List<Document> aiDocuments,
      Entry<UUID, Image> myEntry) {
    return new ImageContainer(
        createIdFilteredStream(aiDocuments, myEntry).findFirst().orElseThrow(),
        myEntry.getValue(),
        createIdFilteredStream(aiDocuments, myEntry)
            .map(myDoc -> (Float) myDoc.getMetadata().get(MetaData.DISTANCE))
            .findFirst().orElseThrow());
  }

  private Stream<Document> createIdFilteredStream(List<Document> aiDocuments,
      Entry<UUID, Image> myEntry) {
    return aiDocuments.stream().filter(myDoc -> myEntry.getKey().toString()
        .equals((String) myDoc.getMetadata().get(MetaData.ID)));
  }

  private ResultData createAIResult(ImageQueryDto imageDto) {
    if (ImageType.JPEG.equals(imageDto.getImageType()) ||
        ImageType.PNG.equals(imageDto.getImageType())) {
      imageDto = this.resizeImage(imageDto);
    }
    var prompt = new Prompt(new UserMessage(imageDto.getQuery(),
        List.of(new Media(MimeType.valueOf(imageDto.getImageType()
            .getMediaType()), imageDto.getImageContent()))));
    var response = this.chatClient.call(prompt);
    var resultData = new ResultData(
        response.getResult().getOutput().getContent(), imageDto);
    return resultData;
  }

  private ImageQueryDto resizeImage(ImageQueryDto imageDto) {
    ...
  }
}


The ‘importImage(…)’ method calls ‘createAIResult(…)’, which checks the image type and calls ‘resizeImage(…)’ to scale the image to a size that the LLava model supports. A Spring AI Prompt is then created from the prompt text and a Media object that carries the media type and the image byte array. The ‘chatClient’ calls the prompt, and the response is returned in the ‘ResultData’ record together with the resized image. The resized image is added to the image entity, and the entity is persisted. Next, the AI document is created with the description and the image entity ID in the metadata, and its embeddings are stored in the vector database. Finally, the ImageDto with the description, the resized image, and the image type is returned.

The ‘queryImage(…)’ method retrieves the Spring AI Documents with the lowest distances and filters them for documents with the image data type in their metadata. The documents are sorted by distance, and the image entities referenced by the metadata IDs of the documents are loaded. That enables the creation of the ImageDtos from the matching documents and image entities. The image is provided as a Base64-encoded string, which, together with the media type, makes it easy to display the image in an IMG tag.
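
The ‘ResultData’ and ‘ImageContainer’ helpers used above are small carrier records whose definitions are not shown in this article. A minimal sketch that matches how the service accesses them could look like this (an assumption, not the project's verbatim source):

Java
 
// Minimal sketch, inferred from the accessor calls in ImageService above;
// the real records are defined in the AIDocumentLibraryChat project.
record ResultData(String answer, ImageQueryDto imageQueryDto) {}

record ImageContainer(Document document, Image image, Float distance) {}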

To display a Base64-encoded PNG image you can use: ‘<img src="data:image/png;base64,iVBORw0KG…" />’

Result

The UI result looks like this:

image query

The application found the large airplane in the vector database using the embeddings. The second image was selected because of a similar sky. The search took only a fraction of a second.

Conclusion

The support of Spring AI and Ollama enables the use of the free LLava model, which makes the implementation of this image database easy. The LLava model generates good descriptions of the images that can be converted into embeddings for fast searching. Spring AI currently lacks support for the generate API endpoint; because of that, the parameter ‘spring.ai.ollama.chat.options.keep_alive=1s’ is needed to avoid having old data in the context window. The LLava model needs GPU acceleration for productive use. Since LLava is only used on import, the creation of the descriptions could be done asynchronously. On a medium-powered laptop, the LLava model runs on the CPU and needs 5-10 minutes per image. Such a solution for image searching is a leap forward compared to previous implementations. With more GPU or CPU support for AI, such image search solutions will become much more popular.
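
One way to act on the asynchronous-import idea is to move the description creation off the request thread with Spring's @Async support. The following is only a rough sketch under that assumption; the class and method names are made up for illustration and are not part of the project:

Java
 
import java.util.concurrent.CompletableFuture;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

// Rough sketch of asynchronous description creation; not part of the project.
// Requires @EnableAsync on a configuration class. ImageService.importImage(...)
// is the existing method shown above.
@Service
public class AsyncImageImportService {

  private final ImageService imageService;

  public AsyncImageImportService(ImageService imageService) {
    this.imageService = imageService;
  }

  @Async
  public CompletableFuture<ImageDto> importImageAsync(ImageQueryDto imageDto, Image image) {
    // Runs on Spring's async executor, so the upload request can return before
    // the LLava model has finished generating the description.
    return CompletableFuture.completedFuture(this.imageService.importImage(imageDto, image));
  }
}

Whether this is worth the extra complexity depends on how many images are imported at once and how the front-end should be notified when a description becomes available.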

