
Using Spring AI With LLMs to Generate Java Tests

Did you ever want to just generate your tests? The goal of this article is to test how well LLMs can help developers create tests.

By Sven Loesekann · Sep. 30, 24 · Tutorial

The AIDocumentLibraryChat project has been extended to generate test code (Java code has been tested). The project can generate test code for publicly available GitHub projects. The URL of the class to test is provided; the class is then loaded, its imports are analyzed, and the dependent classes in the project are loaded as well. That gives the LLM the opportunity to consider the imported source classes while generating mocks for the tests. A testUrl can be provided to give the LLM an example test to base the generated test on. The granite-code, deepseek-coder-v2, and codestral models have been tested with Ollama.

AIDocumentLibraryChat project

The goal is to test how well the LLMs can help developers create tests.

Implementation

Configuration

To select the LLM model, the application-ollama.properties file needs to be updated:

Properties files
 
spring.ai.ollama.base-url=${OLLAMA-BASE-URL:http://localhost:11434}
spring.ai.ollama.embedding.enabled=false
spring.ai.embedding.transformer.enabled=true
document-token-limit=150
embedding-token-limit=500
spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml

...

# generate code
#spring.ai.ollama.chat.model=granite-code:20b
#spring.ai.ollama.chat.options.num-ctx=8192

spring.ai.ollama.chat.options.num-thread=8
spring.ai.ollama.chat.options.keep_alive=1s

#spring.ai.ollama.chat.model=deepseek-coder-v2:16b
#spring.ai.ollama.chat.options.num-ctx=65536

spring.ai.ollama.chat.model=codestral:22b
spring.ai.ollama.chat.options.num-ctx=32768


The spring.ai.ollama.chat.model selects the LLM code model to use.

The spring.ai.ollama.chat.options.num-ctx property sets the number of tokens in the context window. The context window has to hold both the tokens of the request and the tokens of the response.

The spring.ai.ollama.chat.options.num-thread property can be used if Ollama does not choose the right number of cores to use. The spring.ai.ollama.chat.options.keep_alive property sets how long the model is kept loaded in memory after a request.

Controller

The interface to get the sources and to generate the test is the controller:

Java
 
@RestController
@RequestMapping("rest/code-generation")
public class CodeGenerationController {
  private final CodeGenerationService codeGenerationService;

  public CodeGenerationController(CodeGenerationService 
    codeGenerationService) {
    this.codeGenerationService = codeGenerationService;
  }

  @GetMapping("/test")
  public String getGenerateTests(@RequestParam("url") String url,
    @RequestParam(name = "testUrl", required = false) String testUrl) {
    return this.codeGenerationService.generateTest(URLDecoder.decode(url, 
      StandardCharsets.UTF_8),
    Optional.ofNullable(testUrl).map(myValue -> URLDecoder.decode(myValue, 
      StandardCharsets.UTF_8)));
  }

  @GetMapping("/sources")
  public GithubSources getSources(@RequestParam("url") String url, 
    @RequestParam(name="testUrl", required = false) String testUrl) {
    var sources = this.codeGenerationService.createTestSources(
      URLDecoder.decode(url, StandardCharsets.UTF_8), true);
    var test = Optional.ofNullable(testUrl).map(myTestUrl -> 
      this.codeGenerationService.createTestSources(
        URLDecoder.decode(myTestUrl, StandardCharsets.UTF_8), false))
          .orElse(new GithubSource("none", "none", List.of(), List.of()));
    return new GithubSources(sources, test);
  }
}


The CodeGenerationController has the method getSources(...). It receives the url of the class to generate tests for and optionally the testUrl of an example test. It decodes the request parameters and calls the createTestSources(...) method with them. The method returns the GithubSources record with the sources of the class to test, its dependencies in the project, and the test example.

The method getGenerateTests(...) receives the url of the class under test and the optional testUrl, URL-decodes them, and calls the generateTest(...) method of the CodeGenerationService.
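Because the controller URL-decodes its request parameters, a client has to URL-encode the GitHub URLs before passing them. As a usage sketch (the class-under-test URL is taken from the conclusion's examples; localhost:8080 assumes a local run of the application):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the request URI for the /test endpoint by URL-encoding the
// GitHub URL of the class under test.
public class TestRequestUriSketch {
  static String buildTestUri(String githubUrl) {
    return "http://localhost:8080/rest/code-generation/test?url="
        + URLEncoder.encode(githubUrl, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Prints the encoded request URI; ':' becomes %3A and '/' becomes %2F.
    System.out.println(buildTestUri(
        "https://github.com/Angular2Guy/MovieManager/blob/master/backend/"
        + "src/main/java/ch/xxx/moviemanager/adapter/controller/ActorController.java"));
  }
}
```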

Service

The CodeGenerationService collects the classes from GitHub and generates the test code for the class under test.

The Service with the prompts looks like this:

Java
 
@Service
public class CodeGenerationService {
  private static final Logger LOGGER = LoggerFactory
    .getLogger(CodeGenerationService.class);
  private final GithubClient githubClient;
  private final ChatClient chatClient;
  private final String ollamaPrompt = """
    You are an assistant to generate spring tests for the class under test. 
    Analyse the classes provided and generate tests for all methods. Base  
    your tests on the example.
    Generate and implement the test methods. Generate and implement complete  
    tests methods.
    Generate the complete source of the test class.
					 
    Generate tests for this class:
    {classToTest}

    Use these classes as context for the tests:
    {contextClasses}

    {testExample}
  """;	
  private final String ollamaPrompt1 = """
    You are an assistant to generate a spring test class for the source 
    class.
    1. Analyse the source class
    2. Analyse the context classes for the classes used by the source class
    3. Analyse the class in test example to base the code of the generated 
       test class on it.
    4. Generate a test class for the source class and use the context classes 
       as sources for creating the test class. 
    5. Use the code of the test class as test example.
    6. Generate tests for each of the public methods of the source class. 
    Generate the complete source code of the test class implementing the 
    tests.						

    {testExample}

    Use these context classes as extension for the source class:
    {contextClasses}
			
    Generate the complete source code of the test class implementing the  
    tests.
    Generate tests for this source class:
    {classToTest}	
  """;
  @Value("${spring.ai.ollama.chat.options.num-ctx:0}")
  private Long contextWindowSize;

  public CodeGenerationService(GithubClient githubClient, ChatClient 
    chatClient) {
    this.githubClient = githubClient;
    this.chatClient = chatClient;
  }


This is the CodeGenerationService with the GithubClient and the ChatClient. The GithubClient is used to load the sources from a publicly available repository and the ChatClient is the Spring AI interface to access the AI/LLM.

The ollamaPrompt is the prompt for the IBM Granite LLM with a context window of 8k tokens. The {classToTest} is replaced with the source code of the class under test. The {contextClasses} can be replaced with the dependent classes of the class under test and the {testExample} is optional and can be replaced with a test class that can serve as an example for the code generation.

The ollamaPrompt1 is the prompt for the Deepseek Coder V2 and Codestral LLMs. These LLMs can "understand" or work with a chain-of-thought prompt and have context windows of more than 32k tokens. The {...} placeholders work the same as in the ollamaPrompt. The large context window enables the addition of context classes for code generation.

The contextWindowSize property is injected by Spring and is used to check whether the context window of the LLM is big enough to add the {contextClasses} to the prompt.
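In the service, Spring AI's PromptTemplate performs the placeholder substitution. As a rough illustration of the mechanics (a minimal stand-in for this article, not the Spring AI implementation), filling the placeholders amounts to:

```java
import java.util.Map;

// Minimal stand-in for the placeholder substitution: each {name}
// placeholder in the template is replaced by the prepared string
// (which may be empty, e.g. when no testUrl was provided).
public class PromptFillSketch {
  static String fill(String template, Map<String, String> values) {
    String result = template;
    for (Map.Entry<String, String> entry : values.entrySet()) {
      result = result.replace("{" + entry.getKey() + "}", entry.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    String template = "Generate tests for this class:\n{classToTest}\n\n{testExample}";
    String prompt = fill(template, Map.of(
        "classToTest", "public class ActorService { }", // hypothetical source
        "testExample", "")); // empty when no testUrl was provided
    System.out.println(prompt);
  }
}
```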

The method createTestSources(...) collects and returns the sources for the AI/LLM prompts:

Java
 
public GithubSource createTestSources(String url, final boolean 
  referencedSources) {
  final var myUrl = url.replace("https://github.com", 
    GithubClient.GITHUB_BASE_URL).replace("/blob", "");
  var result = this.githubClient.readSourceFile(myUrl);
  final var isComment = new AtomicBoolean(false);
  final var sourceLines = result.lines().stream().map(myLine -> 
      myLine.replaceAll("[\t]", "").trim())
    .filter(myLine -> !myLine.isBlank()).filter(myLine -> 
      filterComments(isComment, myLine)).toList();
  final var basePackage = List.of(result.sourcePackage()
    .split("\\.")).stream().limit(2)
    .collect(Collectors.joining("."));
  final var dependencies = this.createDependencies(referencedSources, myUrl, 
    sourceLines, basePackage);
  return new GithubSource(result.sourceName(), result.sourcePackage(), 
    sourceLines, dependencies);
}

private List<GithubSource> createDependencies(final boolean 
  referencedSources, final String myUrl, final List<String> sourceLines, 
  final String basePackage) {
  return sourceLines.stream().filter(x -> referencedSources)
    .filter(myLine -> myLine.contains("import"))
    .filter(myLine -> myLine.contains(basePackage))
    .map(myLine -> String.format("%s%s%s", 
      myUrl.split(basePackage.replace(".", "/"))[0].trim(),
	myLine.split("import")[1].split(";")[0].replaceAll("\\.", 
          "/").trim(), myUrl.substring(myUrl.lastIndexOf('.'))))
    .map(myLine -> this.createTestSources(myLine, false)).toList();
}

private boolean filterComments(AtomicBoolean isComment, String myLine) {
  var result1 = true;
  if (myLine.contains("/*") || isComment.get()) {
    isComment.set(true);
    result1 = false;
  }
  if (myLine.contains("*/")) {
    isComment.set(false);
    result1 = false;
  }
  result1 = result1 && !myLine.trim().startsWith("//");
  return result1;
}


The method createTestSources(...) returns the GithubSource records with the source code of the GitHub source url and, depending on the value of referencedSources, the sources of the dependent classes in the project.

To do that, the myUrl is created to point at the raw source code of the class. Then the githubClient is used to read the source file as a string. The source string is turned into source lines without formatting, and the comments are removed with the method filterComments(...).
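To illustrate the behavior of filterComments(...), the following self-contained sketch applies the same logic to a few sample lines (the sample class is hypothetical): multi-line block comments and '//' line comments are dropped, code lines are kept.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Demonstrates the comment filtering: the AtomicBoolean tracks whether
// the current line is inside a /* ... */ block comment.
public class CommentFilterDemo {
  static boolean filterComments(AtomicBoolean isComment, String myLine) {
    var result = true;
    if (myLine.contains("/*") || isComment.get()) {
      isComment.set(true);
      result = false;
    }
    if (myLine.contains("*/")) {
      isComment.set(false);
      result = false;
    }
    return result && !myLine.trim().startsWith("//");
  }

  public static void main(String[] args) {
    var isComment = new AtomicBoolean(false);
    var lines = List.of("/** Javadoc", " * line", " */", "// note",
        "public class Foo {", "}");
    var kept = lines.stream()
        .filter(line -> filterComments(isComment, line)).toList();
    System.out.println(kept); // only the two code lines survive
  }
}
```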

To read the dependent classes in the project, the base package is used. For example, in a package ch.xxx.aidoclibchat.usecase.service, the base package is ch.xxx. The method createDependencies(...) creates the GithubSource records for the dependent classes in the base package. The basePackage parameter is used to filter the imports, and then the method createTestSources(...) is called recursively with the parameter referencedSources set to false to stop the recursion. That is how the dependent class GithubSource records are created.
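As a worked example of the URL construction in createDependencies(...): the raw source URL of the class under test is split at the base package path, and an import from the same base package is turned into its own raw source URL. The Actor import and the raw.githubusercontent.com base URL are assumptions for illustration.

```java
// Mirrors the string operations of createDependencies(...): take the URL
// prefix before the base package path, append the import converted to a
// path, and reuse the file extension of the class under test.
public class DependencyUrlSketch {
  static String toDependencyUrl(String myUrl, String importLine, String basePackage) {
    return String.format("%s%s%s",
        myUrl.split(basePackage.replace(".", "/"))[0].trim(),
        importLine.split("import")[1].split(";")[0].replaceAll("\\.", "/").trim(),
        myUrl.substring(myUrl.lastIndexOf('.')));
  }

  public static void main(String[] args) {
    // Assumed raw URL of the class under test and a hypothetical import.
    String myUrl = "https://raw.githubusercontent.com/Angular2Guy/MovieManager/master/"
        + "backend/src/main/java/ch/xxx/moviemanager/adapter/controller/ActorController.java";
    String importLine = "import ch.xxx.moviemanager.domain.model.Actor;";
    System.out.println(toDependencyUrl(myUrl, importLine, "ch.xxx"));
  }
}
```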

The method generateTest(...) is used to create the test sources for the class under test with the AI/LLM:

Java
 
public String generateTest(String url, Optional<String> testUrlOpt) {
  var start = Instant.now();
  var githubSource = this.createTestSources(url, true);
  var githubTestSource = testUrlOpt.map(testUrl -> 
    this.createTestSources(testUrl, false))
      .orElse(new GithubSource(null, null, List.of(), List.of()));
  String contextClasses = githubSource.dependencies().stream()
    .filter(x -> this.contextWindowSize >= 16 * 1024)
    .map(myGithubSource -> myGithubSource.sourceName() + ":"
      + System.getProperty("line.separator")
      + myGithubSource.lines().stream()
        .collect(Collectors.joining(System.getProperty("line.separator"))))
    .collect(Collectors.joining(System.getProperty("line.separator")));
  String testExample = Optional.ofNullable(githubTestSource.sourceName())
    .map(x -> "Use this as test example class:" + 
      System.getProperty("line.separator") +  
      githubTestSource.lines().stream()
        .collect(Collectors.joining(System.getProperty("line.separator"))))
    .orElse("");
  String classToTest = githubSource.lines().stream()
    .collect(Collectors.joining(System.getProperty("line.separator")));
  LOGGER.debug(new PromptTemplate(this.contextWindowSize >= 16 * 1024 ? 
    this.ollamaPrompt1 : this.ollamaPrompt, Map.of("classToTest", 
      classToTest, "contextClasses", contextClasses, "testExample", 
      testExample)).createMessage().getContent());
  LOGGER.info("Generation started with context window: {}",  
    this.contextWindowSize);
  var response = chatClient.call(new PromptTemplate(
    this.contextWindowSize >= 16 * 1024 ? this.ollamaPrompt1 :  
      this.ollamaPrompt, Map.of("classToTest", classToTest, "contextClasses", 
      contextClasses, "testExample", testExample)).create());
  if((Instant.now().getEpochSecond() - start.getEpochSecond()) >= 300) {
    LOGGER.info(response.getResult().getOutput().getContent());
  }
  LOGGER.info("Prompt tokens: " + 
    response.getMetadata().getUsage().getPromptTokens());
  LOGGER.info("Generation tokens: " + 
    response.getMetadata().getUsage().getGenerationTokens());
  LOGGER.info("Total tokens: " + 
    response.getMetadata().getUsage().getTotalTokens());
  LOGGER.info("Time in seconds: {}", (Instant.now().toEpochMilli() - 
    start.toEpochMilli()) / 1000.0);
  return response.getResult().getOutput().getContent();
}


To do that, the createTestSources(...) method is used to create the records with the source lines. Then the string contextClasses is created to replace the {contextClasses} placeholder in the prompt. If the context window is smaller than 16k tokens, the string is left empty to have enough tokens for the class under test and the test example class. Then the optional testExample string is created to replace the {testExample} placeholder in the prompt. If no testUrl is provided, the string is empty. Finally, the classToTest string is created to replace the {classToTest} placeholder in the prompt.

The chatClient is called to send the prompt to the AI/LLM. The prompt is selected based on the size of the context window in the contextWindowSize property. The PromptTemplate replaces the placeholders with the prepared strings. 

The response is used to log the number of prompt tokens, generation tokens, and total tokens to be able to check whether the context window boundary was honored. Then the time needed to generate the test source is logged and the test source is returned. If the generation of the test source took more than 5 minutes, the test source is also logged as protection against browser timeouts.

Conclusion

The models have been tested to generate Spring Controller tests and Spring service tests. The test URLs have been:

http://localhost:8080/rest/code-generation/test?url=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/main/java/ch/xxx/moviemanager/adapter/controller/ActorController.java&testUrl=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/test/java/ch/xxx/moviemanager/adapter/controller/MovieControllerTest.java
http://localhost:8080/rest/code-generation/test?url=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/main/java/ch/xxx/moviemanager/usecase/service/ActorService.java&testUrl=https://github.com/Angular2Guy/MovieManager/blob/master/backend/src/test/java/ch/xxx/moviemanager/usecase/service/MovieServiceTest.java


The granite-code:20b LLM on Ollama has a context window of 8k tokens. That is too small to provide contextClasses and have enough tokens for a response. That means the LLM just had the class under test and the test example to work with. 

The deepseek-coder-v2:16b and codestral:22b LLMs on Ollama have context windows of more than 32k tokens. That enabled the addition of the contextClasses to the prompt, and these models can work with chain-of-thought prompts.

Results

The Granite-Code LLM was able to generate a buggy but useful basis for a Spring service test. No test worked but the missing parts could be explained with the missing context classes. The Spring Controller test was not so good. It missed too much code to be useful as a basis. The test generation took more than 10 minutes on a medium-power laptop CPU.

The Deepseek-Coder-V2 LLM was able to create a Spring service test with the majority of the tests working. That was a good basis to work with and the missing parts were easy to fix. The Spring Controller test had more bugs but was a useful basis to start from. The test generation took less than ten minutes on a medium-power laptop CPU.

The Codestral LLM was able to create a Spring service test with one failing test. That more complicated test needed some fixes. The Spring Controller test also had only one failing test case, but that was because a configuration call was missing, which made the tests succeed without actually testing anything. Both generated tests were a good starting point. The test generation took more than half an hour on a medium-power laptop CPU.

Opinion

The Deepseek-Coder-V2 and Codestral LLMs can help with writing tests for Spring applications. Codestral is the better model but needs significantly more processing power and memory. For productive use, both models need GPU acceleration. The LLMs are not able to create non-trivial code correctly, even with context classes available. The help an LLM can provide is very limited, because LLMs do not understand the code. Code is just characters to an LLM, and without an understanding of language syntax, the results are not impressive. The developer has to be able to fix all the bugs in the tests. That means it just saves some time typing the tests.

The experience with GitHub Copilot is similar to the Granite-Code LLM. As of September 2024, the context window is too small for good code generation, and the code completion suggestions have to be ignored too often.

Is an LLM a help? Yes.

Is an LLM a large time saver? No.


Published at DZone with permission of Sven Loesekann. See the original article here.

Opinions expressed by DZone contributors are their own.
