DZone Spotlight

Sunday, June 21

When Your Documentation Manages Itself: mdship and AI-Assisted Markdown

By Peter Verhas

If you write technical documentation in markdown, you already know the tension: some parts of your document are hand-written prose, while others (a table of contents, an included code snippet, a rendered diagram) are generated from somewhere else. How you handle that boundary says a lot about your workflow.

Most documentation toolchains resolve it the same way preprocessors like PET or Jamal do: separate the source from the output. You maintain a template file, run a build step, and get a rendered document as the result. This is clean, predictable, and easy to reason about, but it adds a build step, and the output file is not the thing you actually edit or share.

mdship takes a different approach. It is a command-line tool and MCP server that edits your markdown in place: it reads the file, updates specific sections, and writes the result back to the same file. Everything else (your prose, your headings, your structure) is untouched. No separate output file, no build pipeline. The document you see is the document you ship.

Think of it less like a preprocessor and more like a very opinionated editor that knows how to regenerate a table of contents, pull in a code snippet from another file, or render a Mermaid diagram, all within the file you are already editing.

One File: The Trade-Off

Working in a single file has real advantages for technical writers. The managed content, such as included snippets and generated TOC entries, is visible inline while you are editing. You can read the full document as your readers will see it, without switching to a preview mode or running a build. There is no output file to track separately, and markdown-aware tools like GitHub or your IDE render it correctly wherever it lives.

The downside is equally real: because managed and hand-written content share the same file, it is easy to accidentally edit a section that is meant to be regenerated. You fix a typo in an included code snippet; on the next run, your fix is gone. You add a note inside a generated TOC block; the next run would overwrite it without warning.

Preprocessor tools sidestep this entirely. The source is one file, the output is another, and you never edit the output directly. The separation of concerns is clean. But you pay for it: every change requires a build step, the output is not portable without that step, and contributors who are not familiar with the toolchain may not know which file to edit.

Neither model is universally better. mdship makes the pragmatic choice that for most documentation workflows, a single file with good guardrails beats a clean architecture that requires a build.

Content Integrity: The Guardrail

The guardrail is a checksum. Every time mdship writes content into a managed section (a TOC block, an INCLUDE block, a MERMAID block), it records a checksum of that content inside the opening placeholder marker, under a key called _content_generated_. On the next run, before overwriting anything, it verifies that the checksum still matches. If it does not, mdship stops and reports an error instead of silently discarding your edits.

    ERROR: Placeholder TOC content was manually edited. Hash mismatch detected. Delete _content_generated_ line to override and accept data loss.

This turns an accidental overwrite, which would otherwise be invisible until you notice the missing content, into an explicit decision. You can delete the _content_generated_ line to tell mdship "I know, proceed anyway," or you can pass --force on the command line to skip the check for a single run. Either way, you are opting in, not being surprised.

AI-Generated Sections: The Same Idea, Extended

The same pattern extends naturally to sections written by an LLM. mdship supports an AI placeholder: an HTML comment embedded in the markdown file that contains a prompt. When you invoke the /ai-placeholder skill in Claude Code, it reads the prompt and writes the generated content between the opening and closing markers, directly into the file, in place, just like any other mdship operation.

The workflow has three steps, enforced by the skill:

1. Check: before writing anything, the skill calls mdship ai-check via MCP to verify that the existing content has not been manually edited since it was last generated. If the checksum does not match, the skill stops and reports the conflict to you rather than overwriting your edits.
2. Generate: if the check passes (or there is no checksum yet, meaning the section is new), the LLM reads the prompt and writes the content.
3. Seal: after writing, the skill calls mdship ai-fix via MCP to record a new checksum for the freshly generated content, protecting it against accidental edits until the next intentional update.

The MCP integration means these calls happen automatically, as part of the skill's defined behavior, not as something the LLM has to remember to do.

The Prompt Is Documentation, Too

There is a subtler benefit to this approach that is easy to overlook. The prompt that instructs the LLM remains embedded in the file as a non-rendered HTML comment, right above the content it produced. It does not live in a commit message, a Jira ticket, or a separate prompt library that may be hard to find six months later. It is part of the document.

This has practical consequences. If you need to regenerate a section (because the underlying API changed, a referenced file was updated, or you simply want a fresh pass), you re-run the same prompt against the same file. The instruction is already there; you do not have to reconstruct it.

The prompt can also reference external files: other documentation pages, source code, configuration files. If those change, rerunning the prompt picks up the changes. The document becomes self-updating in the sense that the machinery to update it is built in.
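The check, generate, seal cycle described above can be sketched in a few lines of Java. This is only an illustration of the idea, not mdship's implementation: the ManagedSection record, the method names, and the choice of SHA-256 are all assumptions made for the example, since the article does not say how mdship computes or stores its checksum. The sketch needs Java 17 or later for HexFormat.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class ChecksumGuardSketch {

    // Hypothetical stand-in for a managed section: its body plus the checksum recorded at the last seal.
    record ManagedSection(String content, String sealedChecksum) {}

    // SHA-256 is an assumption for this sketch; the hash mdship actually uses is not stated.
    static String checksum(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Check: a section without a checksum is new; otherwise the content must still match the seal.
    static boolean isUntouched(ManagedSection section) {
        return section.sealedChecksum() == null
                || section.sealedChecksum().equals(checksum(section.content()));
    }

    // Generate and seal: refuse to overwrite edited content unless the caller explicitly forces it.
    static ManagedSection regenerate(ManagedSection current, String newContent, boolean force) {
        if (!force && !isUntouched(current)) {
            throw new IllegalStateException("Managed content was manually edited; refusing to overwrite");
        }
        return new ManagedSection(newContent, checksum(newContent));
    }

    public static void main(String[] args) {
        ManagedSection sealed = regenerate(new ManagedSection("", null), "- [Intro](#intro)", false);
        ManagedSection edited = new ManagedSection(sealed.content() + " (my note)", sealed.sealedChecksum());
        System.out.println(isUntouched(sealed)); // true
        System.out.println(isUntouched(edited)); // false
    }
}
```

Calling regenerate on the edited section with force set to false throws, which mirrors the "stop and report" behavior; passing true plays the role of an explicit override.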

Conclusion

mdship's in-place editing model and its LLM integration are two expressions of the same design choice: keep everything in one file, protect it with checksums, and let the tooling manage the regeneration cycle rather than the author. For technical writers, this means fewer context switches, no build step, and a document that carries both its content and the instructions for maintaining that content in a single portable file.

The trade-off, shared space for managed and hand-written content, is handled by the checksum guardrail, which turns silent overwrites into explicit decisions. Whether the content is generated by mdship itself or by an LLM following an embedded prompt, the contract is the same: write it, seal it, and trust that the next update will ask before it overwrites.

From printTriangularNumber to Duff’s Device: Mastering Java Switch Statements Old and New

By NaveenKumar Namachivayam

In this blog post, we will see how the humble Java switch statement evolved from a fall-through curiosity into a powerful expression, and how understanding its mechanics unlocks classic techniques like Duff's Device.

Java's switch statement has evolved from a fall-through-prone construct into a modern expression syntax introduced in Java 14. The post traces this evolution using a concrete example, a method that computes triangular numbers by intentionally allowing execution to cascade through cases without break statements. The post also connects this behavior to Duff's Device, a 1983 loop-unrolling technique that uses deliberate fall-through to handle remainder elements before processing full blocks. A comparison of old and new switch syntax outlines trade-offs, and practical guidance is offered on when each form is appropriate.

The Accidental Discovery

I was prepping for the OCP Java 21 exam and stumbled across a tricky question. A method named question2 used a switch statement without any break statements. The output surprised me at first. Once I traced through it, I renamed the method to printTriangularNumber. That one rename told the whole story. This post dives into why.

The Old Switch Statement

The traditional switch statement has been part of Java since day one. The syntax looks like this:

    int day = 3;
    switch (day) {
        case 1:
            System.out.println("Monday");
            break;
        case 2:
            System.out.println("Tuesday");
            break;
        case 3:
            System.out.println("Wednesday");
            break;
        default:
            System.out.println("Unknown");
            break;
    }

As shown above, every case ends with a break. Without it, execution does not stop. It keeps going into the next case. The old switch works on int and the smaller integral types (byte, short, char), their wrapper classes, String (since Java 7), and enum types.

Fall-Through: Feature or Bug?

The most misunderstood behavior in switch is fall-through. When you omit break, execution literally falls into the next case.
    int x = 2;
    switch (x) {
        case 3:
            System.out.println("three");
        case 2:
            System.out.println("two");   // jumps here
        case 1:
            System.out.println("one");   // falls through
        default:
            System.out.println("done");  // falls through
    }

Output:

    two
    one
    done

Most developers treat this as a bug waiting to happen. They are not wrong. Forgetting a break is one of the most common Java mistakes. But intentional fall-through is a different story. It is a deliberate tool. And printTriangularNumber is the perfect example.

printTriangularNumber: Fall-Through in Action

Here is the method I renamed from question2 during my OCP prep:

    private static void printTriangularNumber(int n) {
        int res = 0;
        switch (n) {
            case 5: res += 5;   // each case deliberately falls through to the next
            case 4: res += 4;
            case 3: res += 3;
            case 2: res += 2;
            case 1: res += 1;
            default: break;
        }
        System.out.println(res == 0 ? "Ok, bye." : res);
    }

Let us trace through n = 4:

1. Jumps to case 4, adds 4. res = 4
2. Falls to case 3, adds 3. res = 7
3. Falls to case 2, adds 2. res = 9
4. Falls to case 1, adds 1. res = 10
5. Hits default, breaks

Output: 10

The pattern for each input:

| n | Result | Formula |
|---|--------|---------|
| 1 | 1 | 1 |
| 2 | 3 | 2+1 |
| 3 | 6 | 3+2+1 |
| 4 | 10 | 4+3+2+1 |
| 5 | 15 | 5+4+3+2+1 |

This is n * (n + 1) / 2, the triangular number formula. The fall-through is doing the summation for you. Each case accumulates the remaining values by simply not stopping.

For any n outside 1 to 5 (for example 0 or 6), no case matches, default fires immediately, and res stays 0. The ternary prints "Ok, bye.".

I personally find it a beautiful example of using language semantics intentionally. This is also the kind of question the OCP exam loves to throw at you.

The New Switch Expression (Java 14+)

Java 14 introduced switch expressions as a standard feature. The arrow syntax -> eliminates fall-through entirely. Each arm is independent.

    int day = 3;
    String name = switch (day) {
        case 1 -> "Monday";
        case 2 -> "Tuesday";
        case 3 -> "Wednesday";
        default -> "Unknown";
    };
    System.out.println(name); // Wednesday

A few things to notice here:

- Switch is now an expression. It returns a value.
- The arrow -> replaces : and break together.
- No fall-through. Each arm executes independently.
- Multiple labels on a single arm: case 1, 7 -> "Weekend";

You can also use it inline:

    System.out.println(switch (day) {
        case 1, 7 -> "Weekend";
        default -> "Weekday";
    });

Much cleaner. Much safer.

Switch Expressions With Yield

Sometimes you need more than a single expression in an arm. That is where yield comes in.

    int n = 4;
    int result = switch (n) {
        case 1, 2 -> n * 10;
        case 3, 4 -> {
            int temp = n * n;
            System.out.println("Computing for: " + n);
            yield temp; // value produced by this block arm
        }
        default -> 0;
    };
    System.out.println(result); // 16

Think of yield as the return statement for a switch block arm. You need it whenever an arm of a switch expression uses a block {} and has to produce a value.

A common mistake is using return instead of yield inside a switch expression block. That does not compile: a switch expression must complete with a value (or throw), so the compiler rejects any attempt to return out of it. Always use yield inside switch expression blocks.

Duff's Device: Fall-Through Taken to the Extreme

Now that we understand fall-through well, let us look at the most famous intentional use of it: Duff's Device.

Tom Duff invented this in 1983 to speed up memory copy operations by reducing loop branch overhead. The trick is to unroll the copy loop and use a switch to jump into the middle of it based on the remainder.
In Java, we replicate it in two clean phases since Java does not allow interleaved switch+loop syntax:

    public static void duffCopy(int[] src, int[] dst, int n) {
        int i = 0;
        int rem = n % 4;

        // Phase 1: handle remainder via fall-through
        switch (rem) {
            case 3: dst[i] = src[i]; i++;
            case 2: dst[i] = src[i]; i++;
            case 1: dst[i] = src[i]; i++;
            case 0: break;
        }

        // Phase 2: full blocks of 4
        int fullBlocks = (n - rem) / 4;
        while (fullBlocks-- > 0) {
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
            dst[i] = src[i]; i++;
        }
    }

Let us trace through n = 13:

1. rem = 13 % 4 = 1
2. Switch jumps to case 1, copies 1 element. i = 1
3. fullBlocks = (13 - 1) / 4 = 3
4. Loop runs 3 times, copying 4 elements each time
5. Total: 1 + 12 = 13 elements

The Python equivalent makes the two phases explicit:

    def duff_copy(src, n):
        dst = [None] * n
        rem = n % 4
        for i in range(rem):  # Phase 1: remainder
            dst[i] = src[i]
        i = rem
        while i < n:  # Phase 2: full blocks
            dst[i] = src[i]
            dst[i+1] = src[i+1]
            dst[i+2] = src[i+2]
            dst[i+3] = src[i+3]
            i += 4
        return dst

The connection to printTriangularNumber is direct. Both use fall-through intentionally. In printTriangularNumber, the switch jumps to the right case and accumulates downward. In Duff's Device, the switch jumps to the right case and copies the remainder before the main loop takes over.

Old vs. New Switch at a Glance

| Feature | Old Switch (:) | New Switch (->) |
|---|---|---|
| Fall-through | Yes (default) | No |
| Returns value | No | Yes |
| break needed | Yes | No |
| Multiple labels | Stacked labels (case 1: case 2:); comma lists since Java 14 | Yes (case 1, 2 ->) |
| Block with yield | No | Yes |
| Null safe | Only with a case null label (Java 21+) | Only with a case null label (Java 21+) |
| OCP exam topic | Yes | Yes |

Which One Should You Use?

For new code, prefer the switch expression with ->. It is safer, cleaner, and more expressive. Your reviewers will thank you.

Reserve the old switch with fall-through only when you genuinely need the cascading behavior, like in printTriangularNumber or a hand-tuned loop like Duff's Device. In those cases, add a comment explaining the intent. Otherwise, the next developer (including future you) will assume the break is missing by accident.

My personal observation: the OCP Java 21 exam tests both heavily. Knowing when fall-through is intentional versus accidental is the key distinction examiners probe. Make sure you can trace through any switch block without running it.

Happy testing!

What is your take: is intentional fall-through clever engineering or a maintenance nightmare waiting to happen? Drop your thoughts below!

Trend Report

Cognitive Databases, Intelligent Data

No longer passive storage and query engines, databases are becoming active, intelligent participants in how modern systems interpret, connect, and act on data. As AI moves deeper into production and enterprises adopt generative and agentic architectures, the database layer is being reshaped to support semantic search, contextual retrieval, and real-time decision-making. Vector databases, semantic indexing, and AI-driven optimization are changing how developers work with both structured and unstructured data, while the line between transactional and analytical systems continues to fade under hybrid workload demands.

This report examines these industry shifts in practical terms, exploring how relational, NoSQL, vector, and multi-model systems are coming together to support AI-native applications. Our research, guest thought leadership, and practitioner insights look at how teams are bringing vector search into production, updating architectures for AI workloads, and redesigning data pipelines around semantic and contextual intelligence.

Refcard #403

Shipping Production-Grade AI Agents

By Vidyasagar (Sarath Chandra) Machupalli FBCS

Refcard #388

Threat Modeling Core Practices

By Apostolos Giannakidis

Context Rot: Why Your AI Agent Gets Worse the Longer It Works

AI-powered features often behave perfectly during testing and quietly degrade in production. The model has not changed. The prompts have not changed. Latency looks normal. Error rates are clean. Yet the responses gradually feel off: slightly disconnected, missing nuance, referencing things that are no longer relevant to the task at hand.

This pattern has a name: context rot. It does not throw exceptions. It does not appear in dashboards. It is one of the more subtle failure modes in production AI systems, and understanding it early makes a meaningful difference in the quality of what gets built.

How Attention Works in LLMs

To understand context rot, only a little of the underlying mechanics is needed. Before an LLM generates each new token, it looks at every token in the context and decides how much weight to give each one. This is called attention.

The key insight: attention weights are normalized, and they sum to 1.0 across all tokens. That means attention is a fixed budget. When the context has 500 tokens, each important piece of information might receive 0.15 or 0.20 of the total attention. When the context has 50,000 tokens, that same important piece might receive only 0.002, even if it is equally critical to the task.

    // Simplified illustration, not actual LLM code.
    // computeRelevance, currentState, softmax, and weightedCombination are placeholders.
    public float[] generateNextToken(String[] contextTokens) {
        float[] scores = new float[contextTokens.length];
        for (int i = 0; i < contextTokens.length; i++) {
            // How relevant is each past token to what we are generating?
            scores[i] = computeRelevance(currentState, contextTokens[i]);
        }
        // softmax normalizes the scores so the weights sum to 1.0: a fixed attention budget
        float[] weights = softmax(scores);
        return weightedCombination(contextTokens, weights);
    }

Every token added, relevant or not, slightly dilutes the attention available for everything else. That is the seed of context rot.
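The fixed-budget argument above can be made concrete with a small, runnable sketch. It is a toy model, not how any particular LLM computes attention: it assumes a single attention step, one "important" token with a made-up relevance score of 4.0, and filler tokens that all score 0.0. The class and method names are hypothetical. Under those assumptions, the weight of the important token is e^4 / (e^4 + n - 1), which shrinks as the context length n grows.

```java
public class AttentionDilutionSketch {

    // Numerically stable softmax: subtract the maximum before exponentiating.
    // The returned weights are non-negative and sum to 1.0.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] weights = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            weights[i] = Math.exp(scores[i] - max);
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;
        }
        return weights;
    }

    // Weight given to one important token among contextLength tokens,
    // where every other token is filler with score 0.0 (an assumption of this toy model).
    static double importantTokenWeight(int contextLength, double importantScore) {
        double[] scores = new double[contextLength]; // filler tokens default to 0.0
        scores[0] = importantScore;
        return softmax(scores)[0];
    }

    public static void main(String[] args) {
        for (int n : new int[] {500, 5_000, 50_000}) {
            System.out.printf("context=%d tokens, weight of key token=%.5f%n",
                    n, importantTokenWeight(n, 4.0));
        }
    }
}
```

The token's own score never changes between runs; only the number of competing tokens does, and that alone is enough to drive its share of the budget down by roughly two orders of magnitude between 500 and 50,000 tokens.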

Context Position and Attention

A well-known multi-document question-answering experiment revealed something that should give every engineer building AI systems reason to pause. The correct answer was hidden at different positions across a long context, and retrieval accuracy was measured purely by position:

- Answer at the beginning: ~75% accuracy
- Answer at the end: ~72% accuracy
- Answer in the middle: ~55% accuracy

That is a 20 percentage point drop caused entirely by where the information sat, not by its quality or relevance. The information was present. The model could technically see it. It simply was not attending to it properly.

This is known as the Lost-in-the-Middle effect. It is an emergent architectural property of the transformer training process itself. Models learn to attend strongly to the beginning and end of their inputs, where the most signal-dense content tends to appear in human writing. The middle of a long context becomes an attention dead zone as a natural consequence of how these models are trained, not as an oversight.

Does this still apply to modern models? The honest answer is: yes, with important nuance. Newer models have largely resolved the effect for simple factoid retrieval; finding a specific fact at a specific position in a long context is something recent architectures handle well. The problem persists, and arguably intensifies, on multi-step reasoning tasks where the model must synthesize information across several documents simultaneously. That is precisely the category most production AI systems fall into, so the practical risk remains significant even as benchmark numbers improve.

What Context Rot Looks Like in Practice

Scenario 1: The wandering coding agent. An agent is asked to fix a bug. It reads 15 files, explores 3 wrong leads, and backtracks. Each file, each search result, each dead end accumulates in context. By the time the agent finds the right file, buried in the middle of 20,000 tokens, attention is spread thin. The analysis of the one file that actually matters is noticeably weaker than it would have been with a clean context.

Scenario 2: The RAG pipeline that drifts. A retrieval pipeline fetches 10 document chunks per query, roughly 5,000 tokens. For most queries, this works fine. But longer queries trigger larger system prompts and conversation history. Total context grows to 40,000 tokens, and the documents retrieved third and fourth, sitting in the middle, fall into the attention dead zone. The model answers confidently, drawing on what it can see well. A crucial nuance from chunk 4 gets missed.

The pattern is always the same: no error, no warning, just answers that are subtly less accurate than they should be.

How to Detect It

Step 1: Log context length alongside every LLM call. What cannot be measured cannot be managed.

Step 2: Run a positional accuracy test. Place a key fact at different positions in a realistic context and check whether the model retrieves it correctly.

    // LlmClient is a placeholder for whatever client the system already uses.
    public void positionalAccuracyTest(LlmClient client, String keyFact, String fillerText) {
        double[] positions = {0.1, 0.5, 0.9}; // beginning, middle, end
        for (double pos : positions) {
            // Insert the key fact at the chosen relative position inside the filler text
            int split = (int) (fillerText.length() * pos);
            String context = fillerText.substring(0, split)
                    + "\nKEY: " + keyFact + "\n"
                    + fillerText.substring(split);
            String response = client.complete(context,
                    "Summarise the most important information from the context.");
            boolean found = response.toLowerCase().contains(keyFact.toLowerCase());
            System.out.printf("Position %d%%: %s%n", (int) (pos * 100), found ? "RECALLED" : "MISSED");
        }
    }

If the model passes at 10% and 90% but fails at 50%, context rot is measurable in that system at that context length.

Step 3: Alert on context length thresholds. Set a warning at around 50,000 tokens and a hard alert at 100,000. These are starting points; the positional accuracy test above will help calibrate the right numbers for a specific model and task type.
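Steps 1 and 3 can be combined into one small helper, sketched below. The two thresholds are the starting points given in the text. Everything else is an assumption for illustration: the class and method names are hypothetical, and the token estimate uses a rough rule of thumb of about four characters per token for English text, which should be replaced by the real tokenizer of the model in use.

```java
public class ContextLengthMonitorSketch {

    enum Level { OK, WARNING, ALERT }

    // Starting thresholds from the text; calibrate them per model and task type.
    static final int WARN_TOKENS = 50_000;
    static final int ALERT_TOKENS = 100_000;

    // Rough heuristic (an assumption): about four characters per token, rounded up.
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    static Level classify(int contextTokens) {
        if (contextTokens >= ALERT_TOKENS) {
            return Level.ALERT;
        }
        if (contextTokens >= WARN_TOKENS) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    // Logs the context size next to the call id so it can be correlated with quality later.
    static Level logCall(String callId, String context) {
        int tokens = estimateTokens(context);
        Level level = classify(tokens);
        System.out.printf("call=%s llm.context_tokens=%d level=%s%n", callId, tokens, level);
        return level;
    }

    public static void main(String[] args) {
        logCall("short", "x".repeat(8_000));    // about 2,000 tokens
        logCall("long", "x".repeat(240_000));   // about 60,000 tokens
        logCall("huge", "x".repeat(480_000));   // about 120,000 tokens
    }
}
```

In a real system the printf line would be replaced by whatever metrics or logging library is already in place; the point is only that the context size is recorded on every call and classified against explicit thresholds.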

Context Rot Is Also a Cost Problem

Most conversations about context rot focus on quality, and rightly so. But at any meaningful scale, it is equally a financial problem, and that dimension tends to get overlooked until the infrastructure bill arrives.

LLM providers charge by the token. Every token in the context window is billed on every single call. A context that has grown to 80,000 tokens costs roughly 8x as much per call as one held at 10,000 tokens, for the same task, often with worse output quality. That is not a trade-off; it is strictly worse in both dimensions simultaneously. The exact cost per token varies by provider and model tier, but the ratio holds: longer context means a proportionally larger bill.

The compute reality makes this more pronounced. Transformer attention scales quadratically with context length. Doubling the number of tokens does not double the compute required; it roughly quadruples it. At low volumes, this is invisible. With millions of calls per day, it becomes one of the largest line items in an AI system's operating cost.

The numbers are illustrative, but the ratio is the point. Context rot at scale is not a quality inconvenience. It is a budget problem. Compaction, precise retrieval, and subagent isolation are not just engineering best practices; they are cost controls.

4 Practical Mitigations

1. Compact early; do not wait until quality degrades. Summarize older conversation turns before the context gets large, not after the damage is done.

    // Requires java.util.ArrayList and java.util.List.
    // Message, LlmClient, estimateTokens, and format are assumed to exist elsewhere.
    public List<Message> compactIfNeeded(List<Message> messages, LlmClient client) {
        int limit = 30_000;
        if (estimateTokens(messages) < limit) return messages;

        // Need at least a system prompt + messages to summarise + recent turns
        if (messages.size() < 7) return messages;

        // Everything except system prompt and last 5 turns
        List<Message> older = messages.subList(1, messages.size() - 5);
        String summary = client.complete("Summarise concisely: " + format(older));

        List<Message> compacted = new ArrayList<>();
        compacted.add(messages.get(0)); // system prompt
        compacted.add(new Message("system", summary));
        compacted.addAll(messages.subList(messages.size() - 5, messages.size()));
        return compacted;
    }

2. Use subagents for exploration. When an agent needs to search or explore, do it in a dedicated subagent with its own context window. Only the compact result, not the exploration trace, returns to the parent agent. Noise stays isolated.

3. In RAG, retrieve less and rerank. Three precisely relevant chunks consistently outperform ten loosely relevant ones. Retrieval quantity does not equal retrieval quality. Fetch a wider candidate set, rerank by relevance, and pass only the top results to the model.

4. Position critical content deliberately. Given what is known about the attention curve, the most important context belongs at the beginning or end, not sandwiched in the middle. The system prompt and the current user query naturally occupy those positions. Keep them there, and be intentional about what fills the space between.

What This Means at Each Level

For early-career engineers: when an AI feature works in local testing but feels off in production, check context length first. Adding llm.context_tokens to an observability stack, alongside latency and error rate, is a small change with a meaningful signal.

For tech leads and architects: context is not a free resource. Every design session for an LLM-powered feature should include a clear answer to "what is in this context window and why?" If that question cannot be answered clearly, the design is incomplete.

For engineering managers and leaders: context rot does not appear in standard dashboards. Error rate and latency can look perfectly healthy while response quality silently degrades. Correlating context length with downstream quality metrics, task success rates, and user satisfaction is the monitoring work that production AI systems now require.

Conclusion

Context rot is one of those concepts that feels advanced until it is encountered in production, and then it feels like something that should have been understood from day one.

The core reality is simple: transformer attention is a finite, dilutable resource. Every token added to a context window reduces the focus available for everything else. When contexts grow long, and important information ends up in the middle, quality degrades in ways that are real, measurable, and unfortunately silent.

The good news is that it is manageable. Compact early. Isolate exploration into subagents. Be precise with retrieval. Position critical content deliberately. None of these requires advanced machine learning knowledge; they are engineering disciplines applied to a new kind of resource.

The mental model that tends to help most is treating context the way experienced engineers treat memory: allocate it deliberately, release what is no longer needed, and keep the working set small and focused. The models are already capable of doing remarkable work, if given a clean signal and kept free of noise.

By Vineet Bhatkoti

Top Java Security Vulnerabilities and How to Prevent Them in Modern Java

With the increasing number of security threats, organizations have invested heavily in cybersecurity initiatives to protect their applications, infrastructure, and sensitive data. Security vulnerabilities are rarely introduced intentionally. Most of them creep into applications through shortcuts, overlooked edge cases, outdated libraries, or some bad coding habits. Modern Java has significantly improved its security capabilities, but no framework or JVM version can completely protect an application from insecure coding practices. As developers, we still need to understand where vulnerabilities originate and how to prevent them before they reach production. In this article, I am trying to summarize some of the most common Java security vulnerabilities and practical techniques used to prevent them. These are the same security best practices and lessons learned that I frequently share with new team members joining my team. I am sharing them here in the hope that they can serve as a practical handbook for Java developers looking to build more secure applications. 1. SQL Injection SQL injection remains one of the oldest and most dangerous vulnerabilities. It occurs when user input is directly concatenated into SQL statements. Consider the following example: Java String query = "SELECT * FROM users WHERE username = '" + username + "'"; Statement stmt = connection.createStatement(); ResultSet rs = stmt.executeQuery(query); If an attacker enters, the query can be manipulated to return unintended results. SQL admin' OR '1'='1 Prevention Always use parameterized queries. Java String query = "SELECT * FROM users WHERE username = ?"; PreparedStatement stmt = connection.prepareStatement(query); stmt.setString(1, username); ResultSet rs = stmt.executeQuery(); Prepared statements separate data from executable SQL, eliminating injection opportunities. 2. Hardcoded Secrets One of the most common findings during security reviews is hardcoded credentials. 
Java private static final String API_KEY = "abcd123456789"; This may seem harmless during development, but once committed to source control, secrets often remain exposed indefinitely. Prevention Store secrets externally. SQL String apiKey = System.getenv("PAYMENT_API_KEY"); Better alternatives are to include it in AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, or Kubernetes Secrets. Secrets should never live inside source code repositories. 3. Insecure Deserialization Java serialization has been responsible for numerous security incidents. Example: Java ObjectInputStream input = new ObjectInputStream(request.getInputStream()); Object obj = input.readObject(); The danger is that attackers can craft malicious serialized objects that execute unexpected code during deserialization. Prevention Avoid Java serialization whenever possible. Prefer formats such as JSON, XML (with secure parsing), or Protocol Buffers. Example using Jackson: Java ObjectMapper mapper = new ObjectMapper(); User user = mapper.readValue(json, User.class); Using structured formats reduces attack surfaces significantly. 4. Cross-Site Scripting (XSS) Although often associated with front-end applications, backend services can accidentally enable XSS vulnerabilities when user-generated content is returned without sanitization. Example: Java String comment = request.getParameter("comment"); response.getWriter().write(comment); If the user submits, the browser executes the script. HTML <script>alert('Hacked')</script> Prevention Always encode output. Using Spring: Java String safeComment = HtmlUtils.htmlEscape(comment); Additionally, validate inputs, sanitize rich text, and implement Content Security Policies (CSP). 5. Path Traversal Attacks File download functionality often introduces path traversal vulnerabilities. Example: Java String file = request.getParameter("file"); Path path = Paths.get("/documents/" + file); An attacker could submit and potentially access sensitive files. 
Shell ../../../etc/passwd Prevention Normalize and validate paths. Java Path base = Paths.get("/documents"); Path resolved = base.resolve(file).normalize(); if (!resolved.startsWith(base)) { throw new SecurityException( "Invalid file path"); } Never trust file names coming directly from user input 6. Weak Password Storage Storing passwords improperly remains surprisingly common. Bad practice: Java String passwordHash = DigestUtils.md5Hex(password); MD5 and SHA-1 are no longer considered secure for password storage. Prevention Use adaptive hashing algorithms. Example with BCrypt: Java BCryptPasswordEncoder encoder = new BCryptPasswordEncoder(); String hash = encoder.encode(password); BCrypt automatically includes salting and work-factor adjustments. Other strong alternatives include Argon2, PBKDF2 or SCrypt 7. Dependency Vulnerabilities Modern Java applications often contain more third-party code than custom code. A secure application can still become vulnerable because of outdated dependencies. Prevention Integrate dependency scanning into CI/CD pipelines. Example Maven plugin: XML <plugin> <groupId>org.owasp</groupId> <artifactId>dependency-check-maven</artifactId> </plugin> Additionally, tools such as Snyk can automatically identify known vulnerabilities. We have been using Snyk for the last couple of years, and it is effective. Regular dependency updates should be part of every release cycle. 8. Improper Logging of Sensitive Data Developers often log information for troubleshooting without considering security implications. Example: Java logger.info( "Login request received for user={} password={}", username, password); This exposes credentials inside log files. Prevention Mask or exclude sensitive information. Java logger.info( "Login request received for user={}", username); Never log passwords, access tokens, credit card information, Personal health information (PHI), or PII information. 
This is especially important in regulated industries such as healthcare, the industry we work in.

9. Insufficient Authentication and Authorization

Authentication verifies identity, and authorization determines access. Many applications perform authentication correctly but fail to enforce authorization consistently.

Example:

Java
    @GetMapping("/admin/users")
    public List<User> getUsers() {
        return userService.findAll();
    }

Without authorization checks, any authenticated user might gain access.

Prevention

Use role-based security.

Java
    @PreAuthorize("hasRole('ADMIN')")
    @GetMapping("/admin/users")
    public List<User> getUsers() {
        return userService.findAll();
    }

Security should be enforced at every layer, not just the UI.

10. Lack of Input Validation

Many vulnerabilities originate from accepting unexpected input.

Example:

Java
    String age = request.getParameter("age");
    int userAge = Integer.parseInt(age);

Invalid input can cause exceptions or unexpected behavior.

Prevention

Validate all external input.

Java
    @Min(18)
    @Max(120)
    private Integer age;

Bean Validation provides a simple and consistent approach for validating request payloads. Never assume user input is safe.

Final Thoughts

Security is not a feature that can be added at the end of a project. It needs to be part of the development process from the very beginning.

The vulnerabilities discussed here are not theoretical. They are among the most common findings during security assessments, penetration tests, and production incident investigations.

Fortunately, modern Java provides mature frameworks, libraries, and tools that make secure development significantly easier than it was a decade ago. The key is building security awareness into everyday development practices:

- Use parameterized queries
- Protect secrets properly
- Validate all inputs
- Keep dependencies updated
- Apply strong authentication and authorization
- Log responsibly
- Continuously scan for vulnerabilities

Security is ultimately about reducing risk. Small improvements applied consistently across a codebase can prevent incidents that would otherwise become expensive lessons later.

By Muhammed Harris Kodavath

Amazon CodeWhisperer to Q Developer to Kiro: The Rise of Agentic Coding

The Abrupt End of Amazon Q Developer

In May 2026, AWS dropped a bombshell: Amazon Q Developer IDE plugins and paid subscriptions will reach end-of-support on April 30, 2027, with new signups blocked as of May 15, 2026. The successor is Kiro, AWS's next-generation AI IDE that reframes how engineers build software from scratch.

If you're a backend engineer who has been relying on Q Developer for code completion, inline chat, and security scanning inside VS Code or JetBrains, the clock is ticking. But before you begrudgingly migrate, it's worth understanding why this transition is happening, what Kiro actually offers, and whether the trade-offs are worth it, especially in production backend contexts like microservices, distributed systems, and observability pipelines.

Historical Context: From CodeWhisperer to Q Developer to Kiro

AWS's AI coding journey started with Amazon CodeWhisperer (launched in preview in 2022), which was a single-model code suggestion tool: think GitHub Copilot, but AWS-native. It supported security scanning against common vulnerability patterns and could suggest AWS SDK calls contextually.

In April 2024, AWS folded CodeWhisperer into the broader Amazon Q branding, an umbrella AI assistant that spanned not just code but AWS console assistance, documentation search, and operational queries. Q Developer became the IDE-facing arm of that product.

The problem? Q Developer tried to be everything: a coding assistant, a console assistant, a documentation bot, and a security scanner all jammed into one plugin. Feedback from engineering teams consistently pointed to context window limitations, poor multi-file understanding, and weak support for complex backend architectures spanning multiple services.

Kiro is AWS's response. Built from the ground up with "spec-driven development" as its core philosophy, Kiro is less of an autocomplete engine and more of an agentic coding environment: it can plan, scaffold, and implement across your entire project tree, not just the file you have open.

Architecture Comparison

The architectural difference is significant. Q Developer operated in a request-response model where you asked a question or triggered a completion and got a result. Kiro introduces hooks: lifecycle-aware automations that fire when you save a file, open a PR, or change a spec. This is closer to how CI/CD pipelines work, and backend engineers will immediately recognize the paradigm.

Feature-by-Feature Breakdown

Feature | Amazon Q Developer | Amazon Kiro
Multi-file context | Limited (single file primary) | Full project tree
Agentic task execution | No | Yes (plan → implement → test)
Spec-driven development | No | Yes (SPEC.md driven)
MCP integration | No | Yes (external tool calls)
Security scanning | Yes (CodeWhisperer rules) | Yes (enhanced)
JetBrains support | Yes | Yes
VS Code support | Yes | Yes
AWS Free Tier access | Yes | Yes (via AIdeas Competition)
Paid subscription | $19/mo (deprecated) | Separate Kiro pricing
End of support | April 30, 2027 | Active

Production Code Example 1: Spec-Driven Microservice Scaffolding With Kiro

One of Kiro's most powerful features is its SPEC.md-driven workflow. Instead of writing code and hoping the AI helps, you write a specification and Kiro implements it. Here's what that looks like for a backend order processing microservice.
TypeScript
    // SPEC.md concept implemented as TypeScript types
    // Kiro reads your spec and generates this scaffolding
    import { Logger } from '@aws-lambda-powertools/logger';
    import { Tracer } from '@aws-lambda-powertools/tracer';
    import { DynamoDBClient, PutItemCommand, GetItemCommand } from '@aws-sdk/client-dynamodb';
    import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
    import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';

    const logger = new Logger({ serviceName: 'order-service', logLevel: 'INFO' });
    const tracer = new Tracer({ serviceName: 'order-service' });
    const ddb = tracer.captureAWSv3Client(new DynamoDBClient({}));
    const sqs = tracer.captureAWSv3Client(new SQSClient({}));

    interface Order {
      orderId: string;
      customerId: string;
      items: Array<{ sku: string; qty: number; price: number }>;
      status: 'PENDING' | 'CONFIRMED' | 'SHIPPED' | 'CANCELLED';
      createdAt: string;
    }

    interface CreateOrderResult {
      success: boolean;
      orderId?: string;
      error?: string;
    }

    // Kiro-generated handler with full error handling + structured logging
    export const createOrder = async (
      order: Omit<Order, 'orderId' | 'status' | 'createdAt'>
    ): Promise<CreateOrderResult> => {
      const segment = tracer.getSegment();
      const subsegment = segment?.addNewSubsegment('createOrder');
      try {
        const orderId = `ORD-${Date.now()}-${Math.random().toString(36).slice(2, 7).toUpperCase()}`;
        const newOrder: Order = {
          ...order,
          orderId,
          status: 'PENDING',
          createdAt: new Date().toISOString(),
        };
        logger.info('Creating order', { orderId, customerId: order.customerId, itemCount: order.items.length });

        // Persist to DynamoDB
        await ddb.send(new PutItemCommand({
          TableName: process.env.ORDERS_TABLE!,
          Item: marshall(newOrder),
          ConditionExpression: 'attribute_not_exists(orderId)', // idempotency guard
        }));

        // Publish to downstream processing queue
        await sqs.send(new SendMessageCommand({
          QueueUrl: process.env.ORDER_QUEUE_URL!,
          MessageBody: JSON.stringify(newOrder),
          MessageGroupId: order.customerId, // FIFO ordering per customer
          MessageDeduplicationId: orderId,
        }));

        logger.info('Order created and queued', { orderId });
        return { success: true, orderId };
      } catch (error) {
        const err = error as Error;
        logger.error('Failed to create order', { error: err.message, stack: err.stack });
        subsegment?.addError(err);
        return { success: false, error: err.message };
      } finally {
        subsegment?.close();
      }
    };

What Q Developer would do: Suggest inline completions line-by-line based on your cursor position.

What Kiro does: Reads your SPEC.md that says "Create an order service with DynamoDB persistence, SQS publishing, idempotency, and X-Ray tracing" and generates the entire file, including imports, error handling, and the logging pattern your team already uses (learned from your codebase).

Production Code Example 2: Using Kiro Hooks for Automatic Test Generation

Kiro's hook system is where backend engineers will find the most leverage. A hook is a YAML-defined automation that triggers on file system events within your project.

YAML
    # .kiro/hooks/auto-test.yaml
    name: Generate Unit Tests on Save
    trigger:
      event: file_saved
      pattern: "src/**/*.ts"
      exclude: "**/*.test.ts"
    actions:
      - type: agent_task
        prompt: |
          A TypeScript file was just saved at {{file_path}}. Review the exported functions.
          For any function that does not have a corresponding test in
          {{file_path_without_ext}}.test.ts, generate comprehensive unit tests using Vitest.
          Include:
          - Happy path tests
          - Error boundary tests (network failures, malformed input)
          - Edge cases for empty arrays and null values
          Mock the AWS SDK v3 clients with aws-sdk-client-mock.
        output_file: "{{file_path_without_ext}}.test.ts"
        mode: merge # Don't overwrite existing tests, only append missing ones

TypeScript
    // Auto-generated test from the hook above (Vitest)
    import { describe, it, expect, vi, beforeEach } from 'vitest';
    import { mockClient } from 'aws-sdk-client-mock';
    import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';
    import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
    import { createOrder } from './order-service';

    const ddbMock = mockClient(DynamoDBClient);
    const sqsMock = mockClient(SQSClient);

    describe('createOrder', () => {
      beforeEach(() => {
        ddbMock.reset();
        sqsMock.reset();
        process.env.ORDERS_TABLE = 'test-orders';
        process.env.ORDER_QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123/orders.fifo';
      });

      it('should create order and return orderId on success', async () => {
        ddbMock.on(PutItemCommand).resolves({});
        sqsMock.on(SendMessageCommand).resolves({ MessageId: 'msg-123' });
        const result = await createOrder({
          customerId: 'cust-001',
          items: [{ sku: 'SKU-A', qty: 2, price: 29.99 }],
        });
        expect(result.success).toBe(true);
        expect(result.orderId).toMatch(/^ORD-/);
      });

      it('should return error when DynamoDB PutItem fails', async () => {
        ddbMock.on(PutItemCommand).rejects(new Error('ProvisionedThroughputExceededException'));
        const result = await createOrder({
          customerId: 'cust-001',
          items: [{ sku: 'SKU-A', qty: 1, price: 9.99 }],
        });
        expect(result.success).toBe(false);
        expect(result.error).toContain('ProvisionedThroughputExceededException');
      });

      it('should handle empty items array gracefully', async () => {
        ddbMock.on(PutItemCommand).resolves({});
        sqsMock.on(SendMessageCommand).resolves({});
        const result = await createOrder({ customerId: 'cust-001', items: [] });
        expect(result.success).toBe(true);
      });
    });

This hook fires every time you save a .ts file, checks if tests are missing, and generates them without you ever typing a prompt.
Production Code Example 3: Migrating Q Developer Inline Chat Workflows to Kiro

If you've built muscle memory around Q Developer's /dev inline chat, here's how equivalent workflows map in Kiro.

TypeScript
    // Q Developer workflow (deprecated):
    // You'd type: /dev add retry logic with exponential backoff to this fetch call
    // And it would modify the highlighted code block.

    // Kiro equivalent: Add to your SPEC.md or use agent chat with file pinning:
    // "Add retry logic with exponential backoff to src/clients/http-client.ts"

    // The result Kiro produces, production-grade with observability:
    import { Logger } from '@aws-lambda-powertools/logger';

    const logger = new Logger({ serviceName: 'http-client' });

    interface RetryOptions {
      maxAttempts: number;
      baseDelayMs: number;
      maxDelayMs: number;
      retryableStatusCodes: number[];
    }

    const DEFAULT_RETRY_OPTIONS: RetryOptions = {
      maxAttempts: 3,
      baseDelayMs: 100,
      maxDelayMs: 5000,
      retryableStatusCodes: [429, 500, 502, 503, 504],
    };

    export async function fetchWithRetry<T>(
      url: string,
      options: RequestInit = {},
      retryOpts: Partial<RetryOptions> = {}
    ): Promise<T> {
      const opts = { ...DEFAULT_RETRY_OPTIONS, ...retryOpts };
      let lastError: Error | null = null;

      for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
        try {
          const response = await fetch(url, options);
          if (!response.ok && opts.retryableStatusCodes.includes(response.status)) {
            // Remember the HTTP failure so the final error is meaningful,
            // and skip the backoff sleep once the attempts are used up.
            lastError = new Error(`HTTP ${response.status}: ${response.statusText}`);
            if (attempt === opts.maxAttempts) break;
            const delay = Math.min(opts.baseDelayMs * Math.pow(2, attempt - 1), opts.maxDelayMs);
            logger.warn('Retryable HTTP error, backing off', {
              url,
              status: response.status,
              attempt,
              delayMs: delay,
            });
            await new Promise(res => setTimeout(res, delay));
            continue;
          }
          if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
          }
          return response.json() as Promise<T>;
        } catch (error) {
          lastError = error as Error;
          if (attempt === opts.maxAttempts) break;
          const delay = Math.min(opts.baseDelayMs * Math.pow(2, attempt - 1), opts.maxDelayMs);
          logger.warn('Request failed, retrying', { url, attempt, delayMs: delay, error: lastError.message });
          await new Promise(res => setTimeout(res, delay));
        }
      }

      logger.error('All retry attempts exhausted', { url, maxAttempts: opts.maxAttempts });
      throw lastError ?? new Error('Unknown fetch error after retries');
    }

When to Migrate Now vs. Wait

Migrate now if:

- You're starting a new service or greenfield project. Kiro's spec-driven approach saves the most time at project inception.
- Your team does heavy test generation. The hook system is a net productivity win.
- You're building MCP-integrated tooling or AWS-native agentic workflows.

Wait if:

- You have a heavily customized Q Developer security scanning ruleset. Give the Kiro security scanner time to mature.
- You're on a locked-down enterprise network. Kiro's agentic features require broader outbound connectivity than Q Developer's plugin model.

Performance and Productivity Metrics

Metric | Amazon Q Developer | Amazon Kiro (early data)
Avg. context window (tokens) | ~16K | ~128K+
Multi-file edits per session | 1-2 | 10-20+
Test coverage improvement | ~15% | ~35% (with hooks)
Time to scaffold new service | ~2-3 hrs manual | ~20-40 min spec-driven
Security scan languages | 15 | 20+

Summary

The Q Developer → Kiro transition isn't just a rebranding. It's a fundamental shift from a reactive autocomplete tool to a proactive agentic development environment. For backend engineers building distributed systems on AWS, Kiro's spec-driven planning, multi-file context, and hook-based automation represent a genuine productivity leap, not just an incremental update.

Start your migration now. The deprecation deadline of April 2027 sounds far off, but enterprise procurement, security reviews, and team retraining take time. Get ahead of it.

By Jubin Abhishek Soni

CORE

OpenAPI, ORM, SVG, and Lottie

This is the third follow-up to Friday's release post. Saturday's was about how you iterate; yesterday's was about new platform APIs in the core; today's is about a run of pieces that change how you write the structural parts of an app.

The pieces are an OpenAPI client generator, a SQLite ORM, JSON and XML mappers, a component binder with validation, build-time SVG and Lottie transcoders, and a declarative router with deep links. All ride on a single build-time codegen pipeline: a Maven-plugin pass that reads annotations or declarative source files at build time and emits typed Java that compiles into your binary. No reflection, no service loader, no Class.forName. The "How it works" section at the end of this post covers the codegen plumbing once you have seen what it powers.

OpenAPI Client Generation

This is the headline of the release for any team that talks to a backend. A new cn1:generate-openapi-client Mojo reads an OpenAPI 3.x JSON spec (a URL or a local file) and writes typed Codename One client code that compiles into your app:

- One @Mapped POJO per components.schemas entry.
- One <Tag>Api.java class per OpenAPI tag, with one fluent method per operation.
- Every method routes through Rest.<verb> + Mappers.toJson + fetchAsMapped / fetchAsMappedList, so the generated surface integrates with the rest of the framework instead of dragging in a separate HTTP stack.

Wire it into the project's pom.xml:

XML
    <plugin>
      <groupId>com.codenameone</groupId>
      <artifactId>codenameone-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>petstore-client</id>
          <goals><goal>generate-openapi-client</goal></goals>
          <configuration>
            <specUrl>https://petstore3.swagger.io/api/v3/openapi.json</specUrl>
            <basePackage>com.example.petstore</basePackage>
          </configuration>
        </execution>
      </executions>
    </plugin>

mvn generate-sources picks the spec up, downloads it, and writes one file per schema and one per tag under target/generated-sources/.
The Petstore reference spec exercised end-to-end produces six model classes (Pet, Order, Customer, Tag, Category, User) and three API classes (PetApi, StoreApi, UserApi), and the nine generated .class files compile cleanly against codenameone-core. The details are documented under the OpenAPI codegen Maven goal.

In application code you call the generated Api class the same way you would call any other Java method:

Java
    PetApi pets = new PetApi();

    // Returns AsyncResource<Pet>; resolves with the deserialised object.
    pets.getPetById(42).onResult((pet, err) -> {
        if (err == null) Log.p("Got " + pet.getName());
    });

    // Returns AsyncResource<List<Pet>>.
    pets.findPetsByStatus("available").onResult((list, err) -> {
        if (err == null) {
            for (Pet p : list) Log.p(p.getName());
        }
    });

    // POST with a request body. addPet takes a Pet, returns a Pet.
    Pet candidate = new Pet();
    candidate.setName("Mittens");
    candidate.setStatus("available");
    pets.addPet(candidate).onResult((created, err) -> { /* ... */ });

There is no hand-rolled ConnectionRequest setup, no manual JSON parsing, no string-typed request bodies. The generated client takes a typed Pet, serializes it with Mappers.toJson(...), fires the right HTTP verb, deserializes the response with Mappers.fromJson(...), and surfaces the result through the framework's AsyncResource so your callback fires on the EDT.

For teams who already publish an OpenAPI spec as part of their backend (most modern backend frameworks do this automatically; FastAPI, Spring's springdoc-openapi, NestJS, ASP.NET Core, Go's gnostic), the practical effect is that the mobile client's bindings stay in sync with the backend without anyone hand-writing a single network call. Update the spec, re-run mvn generate-sources, and the new and changed endpoints land in your app as typed Java; the IDE picks them up immediately.
It is the kind of change that is most useful when you do not know you have it: pull a fresh spec, rebuild, and your IDE highlights every place in the codebase that called a renamed endpoint or passed the wrong type to a parameter. SQLite ORM @Entity marks the class; @Id and @Column shape the schema; @DbTransient opts a field out: Java @Entity public class TodoItem { @Id @Column long id; @Column String title; @Column(name = "completed_at") Date completedAt; @DbTransient Object cachedView; } Dao<TodoItem> dao = EntityManager.open("todos.db").dao(TodoItem.class); dao.createTable(); dao.insert(new TodoItem(0, "Read the post", null)); List<TodoItem> open = dao.find("completed_at IS NULL", new Object[] {}); TodoItem byId = dao.findById(42); dao.delete(byId); The generated DAO does the typed work underneath. No reflection in insert; the generated code calls setString(1, e.title) and setLong(2, e.id) directly against the SQLite PreparedStatement. Validation at build time catches missing @Id, fields that look like relationships but are not yet supported, and abstract entity classes; the build fails with a class name and a reason. For JPA/Hibernate developers, the API is intentionally familiar. @Entity, @Id, @Column, and @Transient (here renamed @DbTransient to avoid colliding with java.beans.Transient) carry the same meaning they do under javax.persistence / jakarta.persistence. The EntityManager name is the same. Dao#findById, Dao#findAll, Dao#find(where, params), Dao#insert, Dao#update, Dao#delete line up with the basic JPA repository contract. The query language is plain SQL (there is no JPQL or Criteria DSL), but the annotation surface, the lifecycle, and the runtime methods will feel like a long-lost friend to anyone with server-side Java persistence experience. JSON/XML Mapping @Mapped marks a class as a transferable POJO. @JsonProperty and @XmlElement (plus @XmlRoot, @XmlAttribute, @JsonIgnore, @XmlTransient) shape the wire format. 
The runtime entry points are Mappers.toJson(...), Mappers.fromJson(...), Mappers.toXml(...), Mappers.fromXml(...): Java @Mapped public class User { @JsonProperty("user_id") long id; @JsonProperty String name; @JsonProperty("created_at") Date createdAt; @JsonIgnore String passwordHash; } String json = Mappers.toJson(user); User back = Mappers.fromJson(json, User.class); The same @Mapped POJO is the type the typed Rest helpers accept: Java Rest.get("https://api.example.com/users/42") .fetchAsMapped(User.class) .onResult((user, err) -> { /* ... */ }); Rest.get("https://api.example.com/users") .fetchAsMappedList(User.class) .onResult((users, err) -> { /* ... */ }); Rest.fetchAsJsonList (top-level JSON arrays, no {"root":[...]} envelope trick), JSONWriter (the complement of JSONParser, with fluent builders and streaming variants for Writer and OutputStream), and URLImage.setDefaultBearerToken (auth headers on image fetches) all ship alongside. For JAXB developers, the XML surface (@XmlRoot, @XmlElement, @XmlAttribute, @XmlTransient) is a direct port of the long-established javax.xml.bind.annotation surface. The same model class can be both @XmlRoot-decorated and @JsonProperty-decorated, which gives you a single source of truth for both wire formats. The JSON surface adopts the Jackson convention (@JsonProperty, @JsonIgnore) that nearly every modern JVM JSON binding (Jackson, Moshi, kotlinx-serialization) inherited. Component Binding With Validation The fourth annotation processor on the same pipeline is the component binder. @Bindable marks a model class; @Bind(name = "userField") ties a field to a component on a form by the component's name. 
Field-level validation annotations compose with @Bind on the same field: Java @Bindable public class SignupModel { @Bind(name = "userField") @Required @Length(min = 3) private String user; @Bind(name = "emailField") @Required @Email private String email; @Bind(name = "ageField") @Numeric(min = 13, max = 120) private String age; @Bind(name = "roleField") @ExistIn({ "admin", "editor", "viewer" }) private String role; } The matching form sets a name on each component so the binder can find them: Java TextField user = new TextField(); user.setName("userField"); TextField email = new TextField(); email.setName("emailField"); TextField age = new TextField(); age.setName("ageField"); ComboBox<String> role = new ComboBox<>("admin", "editor", "viewer"); role.setName("roleField"); Button submit = new Button("Sign up"); Form form = new Form("Sign Up", BoxLayout.y()); form.add(user).add(email).add(age).add(role).add(submit); form.show(); SignupModel model = new SignupModel(); Binding binding = Binders.bind(model, form); binding.getValidator().addSubmitButtons(submit); Binding is the handle: refresh() re-reads the model into the components, commit() writes the components back, disconnect() tears the listeners down. Multiple validation annotations on a single field compose via Validator.addConstraint(Component, Constraint...) and GroupConstraint (first failure wins). @Validate(MyClass.class) is the escape hatch for hand-written Constraint implementations. The validation set: @Required, @Length, @Regex, @Email, @Url, @Numeric, @ExistIn, @Validate. The new BindAttr enum lets @Bind target a specific attribute of the component (TEXT, UIID, SELECTED, ...) when the default ("write a String field into the component's text") is not what you want. 
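To make the "first failure wins" composition concrete, here is a minimal standalone sketch in plain Java. It does not use the Codename One Validator or GroupConstraint classes; the Rule interface, the firstFailure helper, and the two sample rules are hypothetical stand-ins that only illustrate the ordering semantics described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a validation constraint; not the Codename One API.
interface Rule {
    // Returns an error message when the value is invalid, or empty when it passes.
    Optional<String> check(String value);
}

final class FirstFailureSketch {
    static Rule required() {
        return v -> (v == null || v.isEmpty())
                ? Optional.of("value is required")
                : Optional.<String>empty();
    }

    static Rule minLength(int min) {
        return v -> (v != null && v.length() < min)
                ? Optional.of("must be at least " + min + " characters")
                : Optional.<String>empty();
    }

    // Evaluates the rules in declaration order and stops at the first failure.
    static Optional<String> firstFailure(String value, List<Rule> rules) {
        for (Rule rule : rules) {
            Optional<String> error = rule.check(value);
            if (error.isPresent()) {
                return error;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Rule> userRules = Arrays.asList(required(), minLength(3));
        System.out.println(firstFailure("", userRules));      // Optional[value is required]
        System.out.println(firstFailure("ab", userRules));    // Optional[must be at least 3 characters]
        System.out.println(firstFailure("alice", userRules)); // Optional.empty
    }
}
```

An empty string fails both rules, yet only the first message is reported, which is why the order of annotations on a field matters under this kind of composition.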
SVG at Build Time Drop an SVG into src/main/css/, alongside theme.css: Shell src/main/css/ theme.css star.svg gradient_circle.svg path_arrow.svg rounded_button.svg wave.svg pro_badge.svg After the next build, every SVG is a regular Codename One Image. An SVG handled by the transcoder is a vector image, but it is still an Image. Everywhere a raster Image works (Label.setIcon, Button.setIcon, BorderLayout.NORTH, the toolbar, a MultiButton's leading icon, a CSS background: url(...) rule), the SVG works too. The difference is that it stays crisp at any size: the same source file is sharp at a 16-point list-row icon, a 64-point hero header, and a 256-point launch screen, on every DPI bucket. A grid of the static SVGs from the hellocodenameone fixture, rendered through the new pipeline: Sizing in Millimeters The SVG transcoder's most useful feature is also the one most easily missed: size every SVG in millimeters from CSS. SVGs in the wild routinely declare odd width / height attributes (a 1024×1024 export of a 24×24 icon, no dimensions at all, design-pixel values from one specific framework). Pinning the rendered size in millimeters sidesteps all of that. CSS HomeIcon { background: url(home.svg); cn1-svg-width: 6mm; cn1-svg-height: 6mm; bg-type: image_scaled_fit; } LogoBanner { background: url(logo.svg); cn1-svg-width: 32mm; cn1-svg-height: 12mm; } A 6 mm icon is 6 mm tall on a 1× desktop, 6 mm on a high-DPI handset, and 6 mm on a 4K tablet. The transcoder routes both values through Display.convertToPixels() at install time, the same way font-size: 3mm already behaves elsewhere in Codename One CSS. No design-pixel guesswork, no DPI bucket to choose, no scaling surprise when the artist re-exports the source SVG at a different resolution. If a project does not use CSS for theming, the two-float constructor on the generated class takes millimeters directly: new com.codename1.generated.svg.Home(6f, 6f). 
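As a rough illustration of why millimeter sizing is independent of screen density, the arithmetic below converts millimeters to pixels from a density given in pixels per inch. It is a standalone sketch, not the implementation of Display.convertToPixels(); the mmToPixels helper and the sample densities are assumptions chosen for illustration.

```java
final class MillimeterSizingSketch {
    private static final double MM_PER_INCH = 25.4;

    // Converts a physical length in millimeters to pixels for a given density.
    static int mmToPixels(double mm, double pixelsPerInch) {
        return (int) Math.round(mm * pixelsPerInch / MM_PER_INCH);
    }

    public static void main(String[] args) {
        double[] densities = {96, 160, 326, 460}; // illustrative ppi values
        for (double ppi : densities) {
            // The physical size stays at 6 mm; only the pixel count changes.
            System.out.println("6mm at " + (int) ppi + " ppi = " + mmToPixels(6, ppi) + " px");
        }
    }
}
```

With these sample densities the same 6 mm icon maps to 23, 38, 77, and 109 pixels, which is the per-device scaling the CSS properties spare you from computing by hand.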
Coverage and What We Still Want Feedback On The transcoder is a maven/svg-transcoder/ module that parses SVG with javax.xml StAX. No Batik, no Flamingo, no external dependencies. Coverage targets what real-world icon SVGs use: rect (rounded corners included), circle, ellipse, line, polyline, polygon, the full path grammar (M / L / H / V / C / S / Q / T / A / Z plus relative-coordinate and smooth-curve reflection), groups with affine transforms (translate, scale, rotate, skew, matrix), linear gradients via LinearGradientPaint, fill, stroke, stroke-width, linecap, linejoin, opacity. SMIL animations are supported in the same pipeline: <animate>, <animateTransform> (translate, scale, rotate), and <set>. Time values interpolate against wall-clock time on every paint, with from / to / values / begin / dur / repeatCount / fill="freeze" honored. Text and clip-path landed in the follow-up PR for the static SVG fixtures, and both are visible in the screenshot above (the "Codename One / build-time SVG" wordmark in the rounded button, the "PRO" badge text, and the clip-path-shaped rounded-corner badge underneath). <text> and <tspan> work with single-style fills and transforms; <clipPath> referenced via clip-path="url(#id)" works against rect, circle, and path clip shapes (nested clip refs are ignored). What is still not supported: SVG filter primitives, <mask> (treated as a clip, so alpha masking falls back to opaque), <radialGradient> (falls back to the first-stop color), and CSS-in-SVG (style rules inside the SVG document; the transcoder reads presentation attributes and the inline style="..." attribute, but a <style> element with selectors is not parsed). If you hit an SVG that does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source file. The fastest way to extend the coverage is for us to run the failing case through the test fixtures and watch the output. 
Every SVG we ship test goldens for started as somebody else's "this doesn't render right" report. Caveat on iOS: The transcoded SVGs use the framework's shape API (fillShape, drawShape, LinearGradientPaint). The full surface is implemented on the Metal renderer. The deprecated GL ES 2 pipeline does not have parity on every operation, so an SVG drawn under ios.metal=false will often render with visible artifacts (missing gradients, clipped fills, distorted paths) rather than the placeholder you might expect. Now that Metal is the default for new iOS builds as of last Friday, this is a non-issue on most apps; if you have explicitly pinned ios.metal=false, expect some visual regressions on SVG content and let us know which. The coverage matrix and troubleshooting are in the SVG Transcoder in the developer guide. Lottie at Build Time The same pipeline carries Lottie. Drop a Bodymovin export into the same src/main/css/: JSON src/main/css/ theme.css pulse.json spinner.json After the next build, both are real Image instances on every platform that exposes the shape API. The same vector-everywhere story as SVG: a Lottie animation renders crisply at any size and slots into any Image slot in the framework. Java Image pulse = Resources.getGlobalResources().getImage("pulse"); Image spinner = Resources.getGlobalResources().getImage("spinner"); Animation runs against wall-clock time on every paint, with no Timer and no allocation in the hot path. A capture of the hellocodenameone Lottie fixture in motion: The Lottie transcoder lives in maven/lottie-transcoder/. It parses Bodymovin JSON with no external dependencies (the framework's built-in JSON parser carries the load) and lowers each file into the same SVGDocument model the SVG path uses. The same JavaCodeGenerator emits the same GeneratedSVGImage subclass, and the same SVGRegistry registers it under the source filename. 
No new Image base class, no new registry, no per-port wiring, since the SVG path's JavaSE reflective load and iOS / Android Stub weaving already cover the new format. Coverage in v1: shape layers (rc / el / sh) with solid fills and strokes; layer transforms (anchor, position, scale, rotation, opacity); animated rotation, position, and scale collapsed to a two-keyframe loop; solid-color layers as filled rects. Most icon-grade Bodymovin exports lower cleanly. Complex character animations from After Effects with image references, masks, and effects do not, and the transcoder logs which layers it dropped so the source of any blank output is obvious. Same ask as for SVG: if a Lottie / Bodymovin file does not transcode the way you expect, please open an issue at github.com/codenameone/CodenameOne/issues and attach the source .json. The transcoder grows one shape family at a time from the cases the community reports. The same iOS caveat applies: the renderer leans on the shape API, so the deprecated GL ES 2 pipeline shows artifacts on the more elaborate Lottie animations. Use the Metal default (now on by default for new iOS builds). Deep Links and Routing Two pieces of plumbing for apps that handle URLs from outside themselves (notification taps, marketing links, share targets, Universal Links from Safari and the equivalent App Links from Chrome on Android). Deep Links Codename One has had deep-link support for a long time through Display.setProperty("AppArg", url). The platform plumbing already writes the incoming URL into that property on cold launch, and an app-resume sets it again on warm launch; reading it back from start() works fine for a small number of patterns. Where the AppArg-only approach gets fragile is consistency. 
The cold and warm paths execute different lifecycle code, the value is a flat string with no parsing, and the trickiest case is the one where a user lands in the middle of the app via a link and then continues to interact: their next navigation needs to compose with the entry point, the back-stack needs to make sense as if they had arrived through the usual flow, and "fall off the edge of the app" on back is a common bug. With a hand-rolled AppArg reader it is easy to miss one of these and ship a half-working flow. This release introduces a typed DeepLink and a single handler that fires for both cold and warm launches: Java Display.getInstance().setDeepLinkHandler(link -> { // link is a normalised DeepLink: scheme, host, path, // segments, query map, fragment. Same shape cold or warm. if ("/users".equals(link.path()) && link.segments().size() == 2) { showUserDetailForm(link.segments().get(1)); return true; } return false; AppArg still works for projects that depend on it, but the new handler is what we recommend going forward. The handler runs on a consistent lifecycle path on both cold and warm starts, and the parsed DeepLink value carries the scheme, host, path segments, query map, and fragment, so app code does not need to roll its own URL parser. Routing For projects that handle more than a handful of URL patterns, the second piece is the declarative router in com.codename1.router. We built it on the same build-time codegen pipeline as the ORM and the mappers (the router was actually the first concrete consumer of the new preprocessor), so the two surfaces compose: a deep-link handler that delegates to the router becomes a one-liner. Each form declares its own path with a @Route annotation: Java @Route("/") public class HomeForm extends Form { /* ... 
*/ } @Route("/users/:id") public class UserDetailForm extends Form { public UserDetailForm(RouteMatch match) { String userId = match.param("id"); // build UI for user `userId` } } @Route("/about") Router.navigate("/users/42") resolves the path, instantiates UserDetailForm, and shows it. The deep-link handler now collapses to: Java Display.getInstance().setDeepLinkHandler(link -> Router.navigate(link.toString())); Each form owns its own routing rule. Adding or moving a screen is a one-class change. The "what screens does this app have, and at what paths?" question is answered by an IDE search for @Route, not by reading every form constructor in the project. For Spring developers, the shape is familiar by design. @Route plays the same role as Spring MVC's @RequestMapping: a class-level declaration that announces "this controller handles URLs of this shape". The :id parameter syntax mirrors Spring's {id} path-variable syntax; RouteMatch.param("id") is the same kind of accessor as Spring's @PathVariable. The mental model carries over from server-side Java with almost no friction. The same recognition is available to anyone with React Router, Vue Router, or Angular Router experience; the :param convention is the cross-framework default. The build-time processor validates that each annotated class extends Form, that the path starts with /, that the constructor is accessible, and that there are no duplicate patterns. Any rule violation fails the build with a class name and a reason, not at runtime with a stack trace. 
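To make the :param convention concrete, here is a minimal sketch of how a pattern such as /users/:id could be matched against a concrete path. It is illustrative only: RoutePatternSketch and its match method are hypothetical names, not part of com.codename1.router, and the real router is generated at build time and may resolve paths differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of com.codename1.router.
public class RoutePatternSketch {

    /**
     * Matches a concrete path against a pattern containing ":name" segments.
     * Returns the captured parameters, or null when the path does not match.
     */
    public static Map<String, String> match(String pattern, String path) {
        String[] patternParts = split(pattern);
        String[] pathParts = split(path);
        if (patternParts.length != pathParts.length) {
            return null;
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < patternParts.length; i++) {
            String expected = patternParts[i];
            if (expected.startsWith(":")) {
                // A ":name" segment captures whatever the path has at this position.
                params.put(expected.substring(1), pathParts[i]);
            } else if (!expected.equals(pathParts[i])) {
                return null;
            }
        }
        return params;
    }

    // Splits "/users/42" into ["users", "42"]; "/" becomes an empty array.
    private static String[] split(String s) {
        String trimmed = s.startsWith("/") ? s.substring(1) : s;
        if (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    public static void main(String[] args) {
        System.out.println(match("/users/:id", "/users/42")); // one captured parameter: id
        System.out.println(match("/users/:id", "/about"));    // no match
    }
}
```

In this sketch a literal segment must match exactly and a :name segment captures a single path segment, which is the same behaviour the RouteMatch.param("id") accessor exposes to the form's constructor in the text above.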
The rest of the router surface covers the kind of thing that has become table stakes in modern client routing:

- Route guards run before navigation completes and can cancel or redirect.
- Per-tab navigation stacks via TabsForm, where each tab keeps its own back stack.
- Location listeners so anything in the app can subscribe to "the route changed".
- Form.setPopGuard(PopGuard) intercepts hardware back, toolbar back, or Router.pop() with a chance to ask "are you sure?".
- Sheet.showForResult() returns an AsyncResource<T> that auto-cancels with null if the user dismisses the sheet.

The API is opt-in. Apps that prefer the existing Form.show() / Form.showBack() flow keep using that; nothing changes.

For the link-publishing side, an AasaBuilder emits the iOS apple-app-site-association JSON and an AssetLinksBuilder emits the Android assetlinks.json. The full setup walk-through (entitlements, the Android intent-filter, the .well-known/ upload on your origin server) is at Routing and Deep Links in the developer guide.

The JavaScript port bridges the router into window.history so navigating the in-app router pushes a real entry into the browser's session history. Back and forward in the browser drive the router; reloading the page lands at the deep-link URL; sharing the URL out of the address bar takes a colleague to the same in-app location.

How It Works: The Build-Time Codegen Pipeline

Everything above sits on a single Maven-plugin pass. The plugin has an AnnotationProcessor SPI and two new Mojos: cn1:generate-annotation-stubs (in generate-sources) and cn1:process-annotations (in process-classes). The orchestrator ASM-scans target/classes, dispatches to every registered processor, validates the annotated classes, and emits a typed runtime artifact next to each one plus a tiny Index class that registers everything with a public runtime registry. Adding a new processor later is a matter of dropping it into META-INF/services with no orchestrator changes.
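The META-INF/services mechanism mentioned above is the standard java.util.ServiceLoader lookup. The sketch below shows the general dispatch shape under stated assumptions: AnnotationProcessorSketch and ProcessorDispatchSketch are hypothetical names, the class list is passed in as plain strings instead of coming from an ASM scan, and the real plugin's SPI will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical shapes for illustration; the real plugin's SPI is not shown in the text.
public class ProcessorDispatchSketch {

    /** Stand-in for a processor SPI: validates one class and reports rule violations. */
    public interface AnnotationProcessorSketch {
        String annotationName();
        List<String> validate(String className);
    }

    /** Runs every processor over every class and collects "class: reason" violations. */
    public static List<String> run(Iterable<AnnotationProcessorSketch> processors,
                                   List<String> classNames) {
        List<String> violations = new ArrayList<>();
        for (AnnotationProcessorSketch processor : processors) {
            for (String className : classNames) {
                for (String reason : processor.validate(className)) {
                    violations.add(className + ": " + reason);
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // ServiceLoader finds implementations listed in a META-INF/services file
        // named after the interface; with no such file on the classpath it finds none.
        ServiceLoader<AnnotationProcessorSketch> discovered =
                ServiceLoader.load(AnnotationProcessorSketch.class);
        List<String> violations = run(discovered, Arrays.asList("com.example.HomeForm"));
        System.out.println(violations);
    }
}
```

Because the orchestrator only iterates over whatever ServiceLoader returns, registering another processor is a classpath change, not a code change, which matches the "no orchestrator changes" property described above.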
The reason this runs against bytecode rather than against source text is that the source-regex prototype was scrapped early. The bytecode pass sees the JVM's view of the project (extends Form is a thing the JVM actually knows, not a pattern we have to hope the user wrote a specific way), rule violations come back with class names and reasons, and the build fails fast before any generated .class lands on disk. The infrastructure shares the ASM passes that the BytecodeComplianceMojo's existing String rewrites already use. A small stub source is emitted under target/generated-sources/cn1-annotations/ during generate-sources so application code that references the generated registry resolves at compile time. The real .class overwrites the stub later in process-classes. Standard "compile against a stub, link against the real thing" pattern; it just works inside a single Maven build instead of needing a multi-module split. cn1-core ships a no-op stub of each generated index (RoutesIndex, MappersIndex, BindersIndex, DaosIndex), so application code compiles even when the project has no annotated classes. The build-time processor shadows each stub with the real implementation before packaging. The SVG and Lottie transcoders sit on a parallel pipeline (declarative graphics files in place of annotations), but they emit the same shape of code and obey the same constraints. The practical effect is that the kind of code that historically required reflection at runtime (with all the obfuscation hazards and surprise allocations that come with that) now happens once at build time and produces direct, dead-code-eliminable, rename-safe symbol references. Wrapping Up That closes this release's post series. We already have some pretty big features lined up for this Friday's release post; the headline pieces are the most substantial things to land in months and are worth checking back for. Back to the weekly index.

By Shai Almog

CORE

Why Infrastructure Efficiency Is Becoming the New Cloud Profitability Metric

Infrastructure efficiency is rapidly becoming one of the most important factors determining profitability for cloud providers, managed service providers, and SaaS companies. For years, infrastructure growth followed a simple formula: add more servers, more storage, and more capacity whenever demand increased. That model worked when hardware prices consistently declined and inefficiencies could be absorbed through growth. Those conditions no longer exist. Today, providers face rising costs for memory, enterprise SSDs, GPUs, power, cooling, and colocation, while customers continue to expect lower pricing, better performance, stronger SLAs, and faster service delivery.

Several industry shifts have fundamentally changed infrastructure economics. Changes in virtualization licensing models have increased costs for many organizations. AI adoption has driven demand for GPUs, high-capacity memory, and high-performance storage. Power and colocation costs continue to rise globally, while sovereign cloud initiatives are creating demand for regional infrastructure that must compete economically with hyperscale cloud providers. The challenge is clear: infrastructure costs are rising faster than revenue.

What Does a Workload Really Cost?

Infrastructure efficiency ultimately comes down to a simple question: what does it cost to deliver a workload? Customers do not buy servers, storage systems, or software licenses. They buy virtual machines, Kubernetes clusters, databases, AI environments, SaaS applications, and business services. The true cost of delivering those workloads includes much more than infrastructure hardware:

- Software licensing
- Power and cooling
- Colocation
- Network connectivity
- Storage
- Capacity buffers
- Staffing and operations
- Support and SLA commitments

The providers that achieve the lowest cost per workload while maintaining performance and service quality gain a significant competitive advantage.
As infrastructure costs continue to increase, "cost per workload delivered" is becoming a useful framework for evaluating efficiency. Unlike traditional metrics focused solely on hardware utilization or licensing costs, this approach considers the complete economics of delivering customer-facing services. Beyond Infrastructure Utilization Infrastructure efficiency is not measured only by CPU, memory, or storage utilization. Operational metrics often have an equally significant impact on the cost of delivering workloads. Examples include administrator-to-server ratio, administrator-to-VM ratio, workload deployment times, incident resolution times, and the number of infrastructure platforms that must be maintained. Cost alone is also a misleading metric. A workload delivered at lower cost may also deliver lower performance, higher contention, or slower support response times. A virtual machine with two vCPUs does not necessarily provide the same amount of usable compute across platforms. CPU oversubscription ratios, noisy-neighbor effects, storage latency, network performance, and support commitments all influence the actual customer experience. The relevant metric is not simply cost per workload, but cost per workload delivered at a defined SLA. Architectural Choices and Efficiency Infrastructure architecture plays a major role in determining workload economics. Traditional infrastructure environments often combine separate virtualization, storage, networking, monitoring, backup, and orchestration platforms. While this approach offers flexibility, it can also increase operational complexity, encourage overprovisioning, and create management overhead. As a result, many organizations are moving toward more integrated infrastructure models, including hyperconverged infrastructure (HCI) and software-defined platforms that consolidate multiple functions into a unified operational framework. The goal is not merely consolidation. 
The real objective is to reduce operational overhead, improve resource utilization, simplify scaling, and lower long-term total cost of ownership. This becomes particularly important for sovereign cloud initiatives. Unlike hyperscalers that benefit from massive global scale, regional cloud providers often need to achieve competitive economics within a specific country or market while maintaining local data residency, compliance, and operational control. In these environments, maximizing infrastructure efficiency is often critical to long-term profitability.

Infrastructure Efficiency Metrics Worth Tracking

Organizations evaluating infrastructure efficiency should look beyond traditional utilization metrics and monitor indicators that directly affect workload economics, including:

- Cost per virtual machine
- Cost per container
- Cost per Kubernetes cluster
- Cost per AI workload
- Storage efficiency ratios
- Power consumption per workload
- Administrator-to-server ratio
- Workload deployment times
- Mean time to resolution (MTTR)
- Resource utilization across compute and storage environments

These metrics provide a more accurate view of infrastructure performance than hardware utilization alone.

Why AI Changes the Equation

The emergence of AI workloads has made infrastructure efficiency even more important. GPU resources are expensive, but GPUs alone do not determine the economics of AI infrastructure. Storage performance, networking efficiency, workload orchestration, and operational processes all directly impact GPU utilization and overall service profitability. In many environments, the challenge is no longer acquiring GPUs. It is ensuring that the surrounding infrastructure can keep them fully utilized. As GPU, storage, and power costs continue to rise, organizations are increasingly focused on maximizing the value extracted from every infrastructure resource.
AI infrastructure economics are becoming less about acquiring the largest amount of hardware and more about achieving the highest utilization and operational efficiency from existing investments.

Measuring Infrastructure Economics

One of the challenges with infrastructure efficiency is that it often remains invisible until it is measured. Many organizations focus on software licensing when evaluating infrastructure costs, but licensing is only one part of the equation. Utilization rates, storage efficiency, operational overhead, power consumption, hardware refresh cycles, staffing requirements, and SLA commitments often have a much greater impact on long-term economics. This is why Total Cost of Ownership (TCO) modeling is becoming increasingly important. Effective infrastructure evaluations should account for:

- Software costs
- Hardware acquisition
- Energy consumption
- Colocation expenses
- Storage efficiency
- Staffing requirements
- Operational complexity
- Support and maintenance costs

Organizations that perform these broader analyses often discover that the greatest opportunities for savings come not from individual licensing decisions but from improving overall workload economics.

Conclusion

The next phase of cloud infrastructure optimization is unlikely to be driven by capacity growth alone. As infrastructure costs continue to rise and customer expectations continue to increase, providers must focus on delivering more workloads with fewer resources while maintaining performance and service quality. In that environment, infrastructure efficiency becomes more than a technical objective. It becomes a business metric. The organizations that can achieve the lowest cost per workload delivered at a defined service level will be best positioned to protect margins, remain competitive, and build sustainable cloud and AI services for the future.

By Tetiana Fydorenchyk

Intelligent Matching and Semantic Search for Marketplace Applications Using OpenAI and .NET

Marketplace platforms are fundamentally matching systems. Whether the platform connects:

- Students and tutors
- Freelancers and clients
- Buyers and sellers
- Consultants and companies

the overall user experience usually depends on how accurately the platform can connect relevant people together. At early stages, traditional search systems are often enough. Basic SQL filtering, category-based navigation, and keyword matching can solve many initial requirements without major issues. The situation changes once the platform grows and users begin writing longer, intent-based queries instead of simple keywords. For example, a user may search for:

“online calculus tutor for engineering preparation”

while a marketplace listing may contain:

“advanced mathematics mentor for university students”

Even though both sides are highly relevant, a traditional keyword-based search engine may completely fail to connect them because the wording is different. This is one of the areas where semantic search architectures become extremely valuable.

Why Traditional Marketplace Search Starts Breaking Down

Most marketplace platforms initially rely on:

- SQL LIKE queries
- full-text search
- BM25 ranking
- tag filtering

These approaches work reasonably well when search queries are short and predictable. However, real users rarely search using the exact same terminology as listing owners. A few common examples:

| User Query | Marketplace Listing |
| math tutor | calculus mentor |
| IELTS coach | English speaking trainer |
| React mentor | frontend architect |
| startup advisor | business consultant |

The issue here is not syntax. It is the semantic meaning. Traditional search engines are effective at matching identical words, but much weaker at understanding contextual similarity between different phrases.

Intent Matters More Than Exact Keywords

One thing that becomes visible fairly quickly in marketplace applications is that users tend to search with intent instead of isolated keywords.
For example:

“senior React mentor for interview preparation”

The user is probably not simply searching for “React.” The actual intent may include:

- Mentorship
- Senior-level expertise
- Interview coaching
- Frontend architecture experience

Traditional keyword search systems struggle to interpret these relationships properly. Even when partially relevant results appear, ranking quality often becomes inconsistent as queries become longer and more contextual.

Semantic Search Approaches the Problem Differently

Semantic search systems do not treat text as isolated keywords. Instead, text is converted into vector representations called embeddings. These embeddings represent contextual meaning rather than exact wording. As a result, the following phrases can become mathematically close to each other even when they do not contain identical words:

- “Math tutor”
- “Calculus mentor”
- “Engineering mathematics coach”

This allows marketplace applications to perform much more flexible matching.

A Typical Semantic Search Architecture

Plain Text
    User Query
        ↓
    Embedding Generation
        ↓
    Vector Search
        ↓
    Similarity Scoring
        ↓
    Hybrid Ranking
        ↓
    Marketplace Results

The important detail here is that the following are converted into embeddings before similarity calculations happen:

- Marketplace listings
- User queries

.NET Technologies Used in the Architecture

A typical .NET-based semantic search stack may include the following components:

| Area | Technology |
| API Layer | ASP.NET Core |
| Background Jobs | Hosted Services / Quartz.NET |
| Queue System | RabbitMQ / Azure Service Bus |
| Database | SQL Server / PostgreSQL |
| Vector Storage | pgvector / Pinecone |
| Cache | Redis |
| Logging | Serilog |
| Monitoring | OpenTelemetry |
| AI Integration | OpenAI API |

One thing that becomes obvious during implementation is that the OpenAI API itself is usually only a small part of the overall system.
The larger engineering effort often involves:

- Indexing
- Ranking
- Caching
- Asynchronous processing
- Operational monitoring
- Retry handling

Marketplace Listing Model

A simplified marketplace listing model may look like this:

C#
    public class MarketplaceListing
    {
        public long Id { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
        public string CategoryName { get; set; }
        public string Location { get; set; }
        public bool IsActive { get; set; }
        public bool IsDeleted { get; set; }
        public DateTime CreatedOn { get; set; }
        public DateTime? LastIndexedOn { get; set; }

        public string SearchText =>
            $"{Title} {Description} {CategoryName} {Location}";
    }

The SearchText property combines multiple searchable fields into a single semantic context before embedding generation.

Generating Embeddings With OpenAI

A simplified embedding service implementation in .NET may look like this:

C#
    public class OpenAIEmbeddingService : IEmbeddingService
    {
        private readonly HttpClient _httpClient;
        private readonly IConfiguration _configuration;

        public OpenAIEmbeddingService(
            HttpClient httpClient,
            IConfiguration configuration)
        {
            _httpClient = httpClient;
            _configuration = configuration;
        }

        public async Task<float[]> GenerateEmbeddingAsync(
            string input,
            CancellationToken cancellationToken = default)
        {
            var apiKey = _configuration["OpenAI:ApiKey"];

            using var request = new HttpRequestMessage(
                HttpMethod.Post,
                "https://api.openai.com/v1/embeddings");

            request.Headers.Authorization =
                new AuthenticationHeaderValue("Bearer", apiKey);

            var body = new
            {
                model = "text-embedding-3-small",
                input = input
            };

            request.Content = new StringContent(
                JsonSerializer.Serialize(body),
                Encoding.UTF8,
                "application/json");

            using var response =
                await _httpClient.SendAsync(request, cancellationToken);

            response.EnsureSuccessStatusCode();

            var json =
                await response.Content.ReadAsStringAsync(cancellationToken);

            using var document = JsonDocument.Parse(json);

            return document
                .RootElement
                .GetProperty("data")[0]
                .GetProperty("embedding")
                .EnumerateArray()
                .Select(x => x.GetSingle())
                .ToArray();
        }
    }

Dependency Injection

C#
    builder.Services.AddHttpClient<
        IEmbeddingService,
        OpenAIEmbeddingService>();

    builder.Services.AddScoped<
        ISemanticSearchService,
        SemanticSearchService>();

Background Indexing

One issue that appears very quickly in production systems is latency. Generating embeddings synchronously during listing creation or updates may slow down the request lifecycle significantly. Because of this, many systems move embedding generation into:

- Background workers
- Queues
- Asynchronous indexing pipelines

A simple hosted worker example:

C#
    public class ListingEmbeddingWorker : BackgroundService
    {
        private readonly IServiceProvider _serviceProvider;
        private readonly ILogger<ListingEmbeddingWorker> _logger;

        public ListingEmbeddingWorker(
            IServiceProvider serviceProvider,
            ILogger<ListingEmbeddingWorker> logger)
        {
            _serviceProvider = serviceProvider;
            _logger = logger;
        }

        protected override async Task ExecuteAsync(
            CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                try
                {
                    using var scope = _serviceProvider.CreateScope();

                    var service = scope.ServiceProvider
                        .GetRequiredService<IListingEmbeddingService>();

                    await service.IndexPendingListingsAsync(stoppingToken);
                }
                catch (Exception ex)
                {
                    _logger.LogError(
                        ex,
                        "Listing embedding worker failed.");
                }

                await Task.Delay(
                    TimeSpan.FromMinutes(5),
                    stoppingToken);
            }
        }
    }

Vector Similarity

Once embeddings are generated, similarity calculations can be performed.
The most common approach is cosine similarity:

\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}

A simplified helper implementation may look like this:

C#
    public static class VectorSimilarityHelper
    {
        public static double CosineSimilarity(
            float[] vectorA,
            float[] vectorB)
        {
            double dotProduct = 0;
            double magnitudeA = 0;
            double magnitudeB = 0;

            for (int i = 0; i < vectorA.Length; i++)
            {
                dotProduct += vectorA[i] * vectorB[i];
                magnitudeA += vectorA[i] * vectorA[i];
                magnitudeB += vectorB[i] * vectorB[i];
            }

            return dotProduct /
                (Math.Sqrt(magnitudeA) * Math.Sqrt(magnitudeB));
        }
    }

Semantic Similarity Alone Is Usually Not Enough

One thing that became obvious during testing was that semantic similarity alone sometimes produced weak ranking behavior. For example, the following could still receive high semantic scores:

- Inactive listings
- Outdated profiles
- Low-quality marketplace entries

Because of this, most production marketplace systems eventually move toward hybrid ranking models. A simplified ranking formula may look like this:

FinalScore = 0.45 × SemanticScore + 0.25 × KeywordScore + 0.15 × PopularityScore + 0.10 × FreshnessScore + 0.05 × ConversionScore

This combines:

- semantic relevance
- keyword relevance
- popularity
- freshness
- conversion metrics

There is usually no perfect ranking formula. In practice, ranking becomes an ongoing optimization problem.
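As a worked instance of the weighted formula above, the sketch below computes the final score for one listing. The arithmetic is language-independent and is shown here in Java; a C# version would be a line-for-line translation. The class name, the input values, and the assumption that every component score is already normalised to the range 0 to 1 are illustrative and do not come from the article.

```java
// Hypothetical helper illustrating the weighted ranking formula from the text.
public class HybridScoreSketch {

    // Weights copied from the formula in the text; they sum to 1.0.
    static final double W_SEMANTIC = 0.45;
    static final double W_KEYWORD = 0.25;
    static final double W_POPULARITY = 0.15;
    static final double W_FRESHNESS = 0.10;
    static final double W_CONVERSION = 0.05;

    /** Combines the component scores (assumed to lie in [0, 1]) into one value. */
    public static double finalScore(double semantic, double keyword,
                                    double popularity, double freshness,
                                    double conversion) {
        return W_SEMANTIC * semantic
                + W_KEYWORD * keyword
                + W_POPULARITY * popularity
                + W_FRESHNESS * freshness
                + W_CONVERSION * conversion;
    }

    public static void main(String[] args) {
        // Made-up component scores for a single listing.
        double score = finalScore(0.8, 0.5, 0.4, 1.0, 0.2);
        System.out.println(score);
    }
}
```

With these made-up inputs the terms are 0.36 + 0.125 + 0.06 + 0.10 + 0.01, which adds up to roughly 0.655. A listing with a strong semantic match but weak popularity and conversion numbers still lands well below 1.0, which is the point of blending the signals.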
Example Semantic Search Service

C#
    public class SemanticSearchService : ISemanticSearchService
    {
        private readonly AppDbContext _dbContext;
        private readonly IEmbeddingService _embeddingService;

        public SemanticSearchService(
            AppDbContext dbContext,
            IEmbeddingService embeddingService)
        {
            _dbContext = dbContext;
            _embeddingService = embeddingService;
        }

        public async Task<List<SearchResultDto>> SearchAsync(
            string query,
            CancellationToken cancellationToken)
        {
            var queryVector =
                await _embeddingService.GenerateEmbeddingAsync(
                    query,
                    cancellationToken);

            var listings = await _dbContext.ListingEmbeddings
                .Include(x => x.Listing)
                .ToListAsync(cancellationToken);

            var results = listings
                .Select(x =>
                {
                    var vector =
                        JsonSerializer.Deserialize<float[]>(x.VectorJson);

                    var similarity =
                        VectorSimilarityHelper.CosineSimilarity(
                            queryVector,
                            vector);

                    return new SearchResultDto
                    {
                        ListingId = x.ListingId,
                        Title = x.Listing.Title,
                        SimilarityScore = similarity
                    };
                })
                .OrderByDescending(x => x.SimilarityScore)
                .Take(50)
                .ToList();

            return results;
        }
    }

Production Challenges

One thing that often gets underestimated in semantic search discussions is operational complexity. The AI layer itself is usually easier than the surrounding production engineering. A few examples include:

- Embedding costs
- Queue management
- Indexing latency
- Retry handling
- Stale embeddings
- Cache invalidation
- Multilingual relevance
- Ranking quality optimization

For example, if every one of the following triggers embedding generation, API costs can grow much faster than expected:

- Listing update
- Profile edit
- Search query

Caching and embedding reuse become important fairly early in the process.

Final Thoughts

Semantic search is not really about replacing traditional search entirely.
In most production systems, the better approach is usually to combine these into a layered ranking architecture:

- Semantic relevance
- Keyword matching
- Behavioral scoring
- Freshness
- Business metrics

OpenAI embeddings and .NET provide a practical foundation for building these types of marketplace systems, especially for platforms where relevance quality directly affects user experience and conversion rates. One interesting observation after introducing semantic matching is that users generally spend less time trying to “guess the correct keywords.” The platform becomes significantly better at understanding what users actually mean instead of simply matching individual words.

By Omer Yilmaz

On-Device Debugging and JUnit 5

This is the first follow-up to Friday's release post, and it covers the two changes from this release that affect how you iterate on a Codename One app rather than what the app itself does. On-device debugging that treats Java as Java on a real iPhone or a real Android device, and standard JUnit 5 against the JavaSE simulator. The first is the one we have been wanting for a long time, and is the one that takes the most explaining, so most of the post is about it.

On-Device Debugging That Treats Java as Java

Codename One has always supported on-device debugging in the strict technical sense. You could attach Xcode to a .ipa, you could attach Android Studio to a running APK, you could read the native call stack, you could step through Objective-C or the C that ParparVM emits. What you could not do was set a breakpoint in MyForm.java, hit it on a real iPhone, and inspect a Java field on a Java object as a Java object. You also could not debug an iOS app without a Mac in the loop somewhere, because the only debugger that understood the binary was Xcode. The translation step between the Java you wrote and the C that ParparVM produces left no way back across the gap on the device.

PR #4999 (iOS) and PR #5012 (Android) close that gap. As of this week, any JDWP-speaking debugger (IntelliJ IDEA, jdb, VS Code's Java Debugger, Eclipse, NetBeans) can attach to a Codename One app and treat the running process as a JVM. Supported targets:

iOS

- The iOS Simulator (requires a Mac, because the iOS Simulator only runs on a Mac)
- A real iPhone reached over Wi-Fi from the developer machine on the same network

You do not need a local Mac to debug on a real iPhone. The Codename One build cloud runs the iOS build for you and produces a signed .ipa; install it on your iPhone the usual way (TestFlight, ad-hoc, or the standard Build Cloud install link), and the JDWP attach over Wi-Fi works from a Linux or Windows IDE just as well as from a Mac.
The Mac is only required for the local Xcode build path and for running the iOS Simulator.

Android

- The Android emulator
- A real Android phone over USB
- A real Android phone over wireless adb

The Android attach uses standard adb, so you need the Android SDK platform tools installed on the developer machine. Those are available on macOS, Linux, and Windows, so any of the three is fine for Android debugging.

What It Looks Like

A breakpoint inside an iOS app, hit on the iOS Simulator next to IntelliJ IDEA: The same Debug tool window you use for any other Java project. The frames panel on the left has the full Java call stack. The Variables panel shows this and the locals as Java values, with the same drill-down you would get on a regular JVM. The simulator on the right is the real iOS app, paused at the breakpoint, waiting for the next step.

How the Pieces Fit Together

On iOS, the IDE never talks to the device directly. The CN1 Debug Proxy is a small Java process you run on your developer machine. It binds two TCP ports: one for the iOS app to dial into using the CN1 wire protocol, and one that speaks standard JDWP for the IDE. The IDE sees a normal remote JVM. The iOS app sees a debug proxy. The proxy translates between the two and walks the ParparVM struct layout so Java fields, method calls, and values round-trip cleanly in both directions.

On Android, the proxy is unnecessary. Dalvik/ART implement JDWP themselves, so IntelliJ attaches directly to the device through adb's built-in JDWP forwarder. The Maven plugin's new cn1:android-on-device-debugging goal does the adb orchestration and the port forwarding for you.

A capability difference between the two platforms worth knowing up front: on Android, a native interface's Impl class is regular Java, so the JDWP attach steps through it the same way it steps through any other class in your project. On iOS the Impl is Objective-C, which JDWP does not speak, so you cannot step through it from the IDE.
You can still step through the Codename One framework code and your own Java up to and through the native-interface call, and you can inspect the value the call returns; the body of the Objective-C method is the only thing that is opaque from the JDWP side. Attach Xcode in parallel if you need to step through the Objective-C as well.

Tutorial: IntelliJ + iOS

The Codename One archetype now generates two run configurations under an On-Device Debug folder in the IntelliJ run-config dropdown: CN1 Debug Proxy and CN1 Attach iOS. The tutorial below assumes a project generated from the Initializr recently enough to have those. If you have an older project, generate a new project with the Initializr and copy over the .idea directory and the Maven pom.xml files.

1. Enable the Build Hints

Open common/codenameone_settings.properties and uncomment the four lines the archetype generated:

Properties files
    ios.onDeviceDebug=true
    ios.onDeviceDebug.proxyHost=127.0.0.1
    ios.onDeviceDebug.proxyPort=55333
    ios.onDeviceDebug.waitForAttach=true

ios.onDeviceDebug=true flips the iOS build into the instrumented variant. The proxyHost and proxyPort hints configure the proxy connection. The fourth hint, ios.onDeviceDebug.waitForAttach=true, is the block-on-load option, and we recommend leaving it on. With it enabled, the iOS app shows a "Waiting for debugger" overlay at launch and does not progress past Display.init until the proxy issues its first resume.

The recommendation is mostly about making the on-device-debug variant visible. Without the overlay it is easy to launch an on-device-debug build expecting the debugger to attach and not realize it is silently waiting for a proxy that is not running, and it is also easy to mistake an on-device-debug build for a regular build and then be surprised when it does not perform as smoothly as the release variant. The overlay rules out both of those.

For a physical iPhone the proxyHost value should be the laptop's LAN IP (run ifconfig | grep "inet " to find it) rather than 127.0.0.1.
The iOS Simulator can always use 127.0.0.1.

2. Build the iOS App

Either path works:

- Local Xcode build (mvn cn1:buildIosXcodeProject) and then run from Xcode.
- Cloud build for a real device (mvn cn1:buildIosOnDeviceDebug) and install the resulting .ipa.

Both produce an iOS binary instrumented for on-device debugging because the build hint is set.

3. Start the Proxy

In IntelliJ, pick CN1 Debug Proxy from the run-config dropdown and click the green Run button (not the bug icon; Debug on this config would attach IntelliJ to the proxy itself, which is not what you want). The Run tool window shows:

Plain Text
    On-device-debug proxy starting:
    symbols : .../cn1-symbols.txt
    device : listening on tcp://0.0.0.0:55333
    jdwp : listening on tcp://0.0.0.0:8000
    [device] listening on port 55333 for ParparVM app to dial in

When the [jdwp] line appears, the proxy is ready.

4. Attach the Debugger

Switch the run-config dropdown to CN1 Attach iOS and click the Debug button. IntelliJ connects to localhost:8000 and opens its standard Debug tool window. You can now set breakpoints anywhere in your Java code or in the framework.

5. Launch the App

Launch the iOS app under the iOS Simulator (from Xcode) or on the tethered device. With waitForAttach=true it pauses at the "Waiting for debugger" overlay until the proxy issues its first resume. Hit Resume on the IntelliJ Debug toolbar; the app proceeds, your breakpoints fire as the app exercises them.

The proxy's Run window is also your device console. Anything the app writes to System.out, Log.p, printf, or NSLog from native code is forwarded to the proxy and printed in the CN1 Debug Proxy Run window with a [device] prefix. This is genuinely useful and is one fewer thing you need Xcode for. The caveat is that the forwarding starts when the proxy connection is established, so output written during the very first millisecond of process launch (before Display.init) is not always captured.
If you need every byte from t=0, attach Xcode's console for that specific run.

Tutorial: IntelliJ + Android

Android is simpler because the proxy is not needed. The archetype generates two run configurations under the same On-Device Debug folder: CN1 Android On-Device Debug (Maven, builds and installs the APK and forwards JDWP) and CN1 Attach Android (Remote JVM Debug at localhost:5005).

1. Enable the Build Hint

In common/codenameone_settings.properties:

Properties files
android.onDeviceDebug=true

This single hint flips the manifest to debuggable="true" and turns R8 / Proguard off for this build. Release builds without the hint are unaffected.

2. Run CN1 Android On-Device Debug

This configuration picks up the hint, builds the APK, installs it on the connected device or emulator, sets the debug-app for wait-for-attach, launches the Activity, forwards JDWP to localhost:5005, and streams logcat --pid=<pid> into the Run window with a [device] prefix. For wireless adb, pass -Dcn1.android.onDeviceDebug.wireless=<ip:port> and the goal will adb connect before installing. Both the Android 11+ adb pair flow and the legacy adb tcpip flow work.

3. Attach the Debugger

Switch to CN1 Attach Android and click Debug. IntelliJ connects to localhost:5005. Set breakpoints anywhere; they fire when exercised.

Source resolution covers both the codenameone-core and codenameone-android sources jars, so breakpoints inside the framework or inside the Android port resolve to the right files. On Android, native interfaces are themselves Java, so a breakpoint inside the Impl class of your own native interface fires just like a breakpoint anywhere else in your code; you can step through the implementation, inspect locals, and evaluate expressions the same way.

The dev guide has the full reference, including the wireless-pairing flows, the VS Code and Eclipse equivalents, and a troubleshooting section: iOS on-device debugging and Android on-device debugging.
When to Use It (and When Not To) For most bugs, the JavaSE simulator is still, by a large margin, the fastest loop. Reach for on-device debugging when the bug is platform-specific: ParparVM-specific threading, an iOS-only layout glitch under the modern native theme, a real-radio Bluetooth interaction, a Touch ID gate, an Android-only manifest interaction, anything that only reproduces under iOS background memory pressure. The kind of bug that previously sent you reaching for Log.p and a rebuild loop. That bug now has a debugger pointed at it. JUnit 5 Against the Simulator The other change in this release is the new JUnit 5 integration in the JavaSE port (PR #5032). To be clear about what this is: it is standard JUnit 5. There is no fork of JUnit in com.codename1.testing.junit. That package holds a small set of annotations and a CodenameOneExtension that plugs into the regular JUnit Jupiter lifecycle. You write @Test methods using org.junit.jupiter.api.Test, you assert with org.junit.jupiter.api.Assertions, and your IDE's native test runner picks them up the way it does on any other Java project. Why a separate integration at all? The legacy com.codename1.testing.AbstractTest framework, driven by the cn1:test Maven goal, still exists and is still the only way to run tests on a real iOS or Android device (JUnit Jupiter is not available on ParparVM). The trade-off is that AbstractTest tests have to compile under the Codename One device subset, with no reflection, no java.net.http, no java.nio.file, no Mockito, no AssertJ, no assertThrows. JUnit-style tests run only on the JavaSE simulator JVM, but the JVM is a regular JVM, so reflection, Mockito, AssertJ, and parameterized tests are all available. Both styles coexist in the same project under common/src/test/java. You pick per test class. The runners discover disjoint sets (cn1:test looks for UnitTest implementers; Surefire looks for @Test methods), so a mvn install runs both passes in the same phase without overlap. 
A Minimal Test Tests live in common/src/test/java. The shape most apps want is one that boots the project's app class through the same init / start sequence the simulator uses, then asserts against the form the app actually opens: Java package com.example.myapp; import com.codename1.testing.junit.CodenameOneTest; import com.codename1.testing.junit.RunOnEdt; import com.codename1.ui.CN; import com.codename1.ui.Display; import com.codename1.ui.Form; import org.junit.jupiter.api.Test; import static org.junit.jupiter.api.Assertions.assertEquals; import static org.junit.jupiter.api.Assertions.assertTrue; @CodenameOneTest class GreetingFormTest { @Test @RunOnEdt void formShowsExpectedTitle() { MyAppName app = new MyAppName(); app.init(null); app.start(); assertEquals("Hi World", Display.getInstance().getCurrent().getTitle()); assertTrue(CN.isEdt(), "@RunOnEdt method runs on the Codename One EDT"); } } That is more useful than constructing a Form directly in the test because it exercises the same startup path the simulator runs. The assertions check the form your app opens, not a form the test wrote. The natural way to run it is from the IntelliJ gutter. Click the green icon next to the class declaration: The results land in the standard Run tool window: Click the green icon next to a specific @Test method to run just that method. The same flow works in VS Code's Test Explorer and in Eclipse's JUnit view. If you prefer the command line: Shell mvn -Ptest test # run the JUnit suite mvn -Ptest test -Dtest=GreetingFormTest # one class mvn -Ptest test -Dtest=GreetingFormTest#formShowsExpectedTitle @CodenameOneTest is the class-level entry point. It wires the simulator extension into the JUnit Jupiter lifecycle, boots Display.init(null) once per JVM (idempotent, so subsequent classes share the same Display), and skips the class with a TestAbortedException if the JVM is genuinely headless (so CI runners that have no display do not poison the rest of the run). 
@RunOnEdt dispatches the test body through CN.callSerially, which is what you want any time the body touches UI state. It rethrows the body's exceptions on the JUnit thread so the stack trace stays clickable in the IDE. Place it on a method to cover one test, or on the class to apply it to every test.

A Couple More Common Cases

A test that exercises a plain validator, with no UI involved at all:

Java
@CodenameOneTest
class EmailValidatorTest {
    @Test
    void rejectsEmptyString() {
        assertFalse(new EmailValidator().isValid(""));
    }

    @Test
    void acceptsCommonAddress() {
        assertTrue(new EmailValidator().isValid("[email protected]"));
    }
}

This is the "pure model code" shape: no @RunOnEdt, no UI, and it runs on the JUnit worker thread, so it is fast.

A test of a form under a specific visual configuration:

Java
@CodenameOneTest
class GreetingFormVisualTest {
    @Test
    @RunOnEdt
    @DarkMode
    @LargerText(scale = 1.6f)
    void titleStillFitsInDarkModeAtAccessibilityScale() {
        new GreetingForm().show();
        Form current = Display.getInstance().getCurrent();
        assertEquals("Hello", current.getTitle());
        assertTrue(current.getPreferredW() <= Display.getInstance().getDisplayWidth());
    }
}

The visual-config annotations (@Theme, @DarkMode, @LargerText, @Orientation, @RTL) apply on the EDT in one batch, followed by a single theme refresh, so the test body sees the simulator in the exact configuration you asked for without flicker.

A test that injects a custom property for the duration of one method:

Java
@Test
@RunOnEdt
@SimulatorProperty(name = "feature.flag", value = "on")
void newCodePathRunsWhenFlagIsOn() {
    // Display.getProperty("feature.flag", "off") returns "on" here
    runFeature();
    assertEquals("expected", Display.getInstance().getCurrent().getTitle());
}

Class-level @SimulatorProperty applies to every method in the class. Method-level overrides class-level. Use the container @SimulatorProperties for more than one (the package source level rules out @Repeatable).
The full reference, including the dependency-block YAML for common/pom.xml and javase/pom.xml and the @Theme / @Orientation / @RTL details, is at Testing with JUnit 5 in the developer guide. Wrapping Up That is the workflow half of this release. Tomorrow's post covers the new platform APIs that moved into the core this week: AI and OAuth/OIDC are the headline pieces, with wifi/connectivity and a few smaller items alongside them. Back to the weekly index.

By Shai Almog

CORE

Grok AI API Tutorial: Chat, Image, Video, Tool Calling, and Web Search

The xAI Grok API provides access to powerful frontier models, including the Grok 4 series, supporting chat completions (text + vision), image generation, tool calling (function calling and built-in tools like web search), and more advanced features.

Quick Intro

Sign up at https://x.ai/api.
Generate an API key from the console.
Install the SDK: pip install xai-sdk.
Set env var: export XAI_API_KEY="your_key_here".
Models list: https://docs.x.ai/developers/models.

I'll share some samples in Python.

Learn how to use Grok AI - xAI

Basic Chat API Call

Let's first prepare our project before making the API call.

1. Install the xai-sdk.

Shell
pip install xai-sdk

2. Set env var: export XAI_API_KEY="your_key_here" or use a .env file.

Now, create a new file and add this basic setup:

Python
import os
from xai_sdk import Client
from xai_sdk.chat import user, system
from dotenv import load_dotenv

load_dotenv()
XAI_API_KEY = os.environ.get("XAI_API_KEY")
client = Client(api_key=XAI_API_KEY)

Make sure XAI_API_KEY is loaded correctly at this stage (for example, by printing it once).

Next, let's call the chat function:

Python
...
model = "grok-4-1-fast-non-reasoning"
chat = client.chat.create(model=model)
chat.append(system("You are Grok, a highly intelligent, helpful AI assistant."))
chat.append(user("How can I be a good developer?"))
response = chat.sample()
print(response.content)

Feel free to switch the model based on your needs or preferences. Here is an example output:

Grok AI API basic call

Image Generation API

Let's see how to generate an image with the Grok API. We'll need to use the "grok-imagine-image" model for this.

Python
...
response = client.image.sample(
    model="grok-imagine-image",
    prompt="detective cat searching on website"
)
print(f"Generated image: {response.url}")

The output is a URL like this:

Image generation API using xAI API

Video Generation API

Generating a video is as easy as generating an image with the Grok API. We'll need to use the "grok-imagine-video" model for this.
Python response = client.video.generate( prompt="A glowing crystal-powered rocket launching from the red dunes of Mars, ancient alien ruins lighting up in the background as it soars into a sky full of unfamiliar constellations", model="grok-imagine-video", duration=10, aspect_ratio="16:9", resolution="720p", ) print(response.url) Grok Video API example You can set the duration, aspect ratio, and resolution. Tools in Grok The xAI Grok API features powerful tool-calling capabilities, allowing Grok to go far beyond simple text generation. It can take real actions such as performing web searches, running code, retrieving information from your own data sources, or invoking any custom functions you've defined. From x.ai - available tools Tool Calling (Function Calling) Let's start by calling a custom function, as it'll help us call any internal or external API or function. Let's say we want to call a function to look for an item's price. First, we need to define the function, such as adding the name, description, and parameters. Python ... import json from xai_sdk.chat import user, tool, tool_result ... # Define tools tools = [ tool( name="get_item_price", description="Get the price of an item from the store", parameters={ "type": "object", "properties": { "item_name": {"type": "string", "description": "Name of the item to get the price for"}, }, "required": ["item_name"] }, ), ] Upon calling the client method, we now need to include the tool we declared above. Python chat = client.chat.create( model="grok-4.20-reasoning", tools=tools, ) chat.append(user("What is the price of a laptop?")) response = chat.sample() print("========= response ===========") print(response) print("==========================") Important: At this stage, Grok doesn't care if we have the actual function to check the price or not. The AI simply wants to know "what tools are available" for them to use. Try to run the code to see the output from the chat call. 
Function calling output sample

As you can see, Grok can detect the tool we need to call. You can see it from outputs > message > tool_calls. It consists of the name of the function and the arguments that are extracted from the user's prompt, so it'll be dynamic.

Function Call Simulation

Next, let's create a fake function to call. In real life, it could be a call to a database or an API.

Python
def get_item_price(item_name):
    prices = {
        "laptop": 999.99,
        "smartphone": 499.99,
        "headphones": 199.99,
    }
    return {"item_name": item_name, "price": prices.get(item_name, "Item not found")}

Following up on the latest code, we can check whether the response has a "tool_calls" object. If so, we'll call the actual function we just declared above.

Python
# Handle tool calls
if response.tool_calls:
    chat.append(response)
    for tc in response.tool_calls:
        args = json.loads(tc.function.arguments)
        result = get_item_price(args["item_name"])
        chat.append(tool_result(json.dumps(result)))
    response = chat.sample()
    print(response.content)

We need to loop through the tool_calls object.
We need to extract the argument to pass to the function.
Call the actual function with the argument value.
Add the information back to our chat.

Now, calling chat.sample() again includes all the information we received from the "fake function" above.

Sample result for function calling

Let's try with a different prompt:

Python
chat.append(user("I need to buy two laptops and a smartphone. Can you tell me how much that will cost?"))

Here is the result:

Function calling result sample

Web Search API

Grok can access real-time information through this feature, so you can get up-to-date content. Unlike the function calling above, we don't need to declare a custom function, as it's a built-in tool.
Here is a simple example: Python import os from xai_sdk import Client from xai_sdk.chat import user from xai_sdk.tools import web_search from dotenv import load_dotenv load_dotenv() XAI_API_KEY = os.environ.get("XAI_API_KEY") client = Client(api_key=XAI_API_KEY) chat = client.chat.create( model="grok-4.20-reasoning", # reasoning model tools=[web_search()], include=["verbose_streaming"], ) chat.append(user("Grok VS OpenAI API")) is_thinking = True for response, chunk in chat.stream(): for tool_call in chunk.tool_calls: print(f"\nCalling tool: {tool_call.function.name} with arguments: {tool_call.function.arguments}") if response.usage.reasoning_tokens and is_thinking: print(f"\rThinking... ({response.usage.reasoning_tokens} tokens)", end="", flush=True) if chunk.content and is_thinking: print("\n\nFinal Response:") is_thinking = False if chunk.content and not is_thinking: print(chunk.content, end="", flush=True) print("\n\nCitations:") print(response.citations) Use tools=[web_search()]To show what's happening in the process, we use include=["verbose_streaming"],is_thinking variable is to check if the process is still running (a boolean variable) Web Search API with Grok AI As you can see, it'll perform several searches on the internal database with different queries. It'll then visit a specific URL after that to get more context. Allowed Domains You can search only in specific domains using allowed_domains. Python tools=[ web_search(allowed_domains=["grokipedia.com"]), ], Exclude Domains Vice versa, you can exclude specific domains: Python chat = client.chat.create( model="grok-4.20-reasoning", tools=[ web_search(excluded_domains=["grokipedia.com"]), ], ) Better Web Search API While you can specifically choose the domain, the keyword Grok uses to find answers on the internet is random. For example, when I'm asking for "Top 3 pizza restaurants from Google Maps in Boston. Share some reviews and ratings for each place." 
This is what I saw from the thinking process: It needs to perform multiple queries before returning the answer. Another sample, when asking simply for three images: It runs across multiple pages, and unfortunately, the links are not valid. Grok may hallucinate at this point. Web Search API Alternative In some cases, AI-generated keywords are fine, but if you're building an app where you want efficiency and full control over the process, the native "Web Search Tool" can be replaced with a simple API call to a specific API your app needs. For example, to find answers online, SerpApi offers 100+ APIs. Need a generic Google answer? We have: Google Search APIGoogle AI OverviewGoogle AI Mode Same with Bing, DuckDuckGo, and other top search engines. Need a restaurant review? We have: Yelp Reviews APIGoogle Maps Reviews API Need an API for traveling apps? We have: Google Hotels APIGoogle Flights APITripAdvisor API and more! See how SerpApi is the Web Search API for your AI apps, LLM, and agents. Using Grok API With SerpApi To get a sense of how SerpApi works, feel free to test the results in our playground. You can play with different parameters and directly see the JSON sample we return. SerpApi Playground Sample Case Let's say we want to find images via Google Image API like this: Sample result search with SerpApi Step 1: Preparation You can register for free at serpapi.com to get your API key. Step 2: Parsing Keyword Let's say we need three images from Google. Since users can type anything, we need to parse the keyword, as SerpApi simply performs a search using a particular keyword. Python USER_QUERY = "Show me 3 cute cat images from the internet" # Step 1: Ask Grok to extract a search keyword from the user's natural language keyword_chat = client.chat.create(model="grok-3-fast") keyword_chat.append(system("Extract the most relevant search keyword or phrase from the user's message. 
Reply with only the keyword, nothing else.")) keyword_chat.append(user(USER_QUERY)) keyword_response = keyword_chat.sample() search_keyword = keyword_response.content.strip() print(f"Extracted keyword: {search_keyword}") Step 3: Search via SerpApi We now have the keyword. Let's run a search on SerpApi. Python # Step 2: Search via SerpAPI using simple requests (Google Images) serpapi_params = { "api_key": SERPAPI_API_KEY, "engine": "google_images", "q": search_keyword, "hl": "en", "gl": "us", } serpapi_url = "https://serpapi.com/search" serpapi_response = requests.get(serpapi_url, params=serpapi_params) results = serpapi_response.json() At this stage, you already have the answers you're looking for. Step 4: Filter Results (Optional) Sometimes, we don't need all the information. It's good to filter it programmatically first, so we don't use too many tokens. For example, I'm only interested in the top five answers: Python image_results = results.get("images_results", [])[:5] formatted_results = "\n".join( f"- {img.get('title', 'No title')}: {img.get('original', img.get('thumbnail', 'No URL'))}" for img in image_results ) print(f"\nSerpAPI results:\n{formatted_results}") We can also format the answer as a bonus. Step 5: Reply in Natural Language (Optional) Depending on your application, you may want to answer the user back in natural language. We just need to pass the answers above back to the AI: Python # Step 3: Feed results back to Grok for a final response final_chat = client.chat.create(model="grok-3-fast") final_chat.append(system("You are a helpful assistant. Use the provided search results to answer the user's question.")) final_chat.append(user(f"User question: {USER_QUERY}\n\nSearch results from SerpAPI:\n{formatted_results}\n\nPlease answer the user's question based on these results.")) final_response = final_chat.sample() print(f"\nFinal Response:\n{final_response.content}") Final result: You can try the other APIs for other use cases. 
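The filtering in Step 4 can be tried offline before spending any API credits. The sketch below wraps the article's slicing and formatting logic in a function and runs it against a hand-written dictionary. The field names (images_results, title, original, thumbnail) come from the code above; the function name format_image_results, the limit parameter, and every sample title and URL are assumptions made up for illustration, not real SerpApi output.

```python
# Hypothetical sample shaped like the "images_results" list read in Step 4.
# The titles and URLs are invented for illustration only.
sample_results = {
    "images_results": [
        {"title": "Cat one", "original": "https://example.com/1.jpg"},
        {"title": "Cat two", "thumbnail": "https://example.com/2-thumb.jpg"},
        {"original": "https://example.com/3.jpg"},
        {},
    ]
}


def format_image_results(results: dict, limit: int = 5) -> str:
    """Keep the first `limit` image hits and render one '- title: url' line each."""
    image_results = results.get("images_results", [])[:limit]
    return "\n".join(
        # Prefer the full-size URL, fall back to the thumbnail, then to a marker.
        f"- {img.get('title', 'No title')}: "
        f"{img.get('original', img.get('thumbnail', 'No URL'))}"
        for img in image_results
    )


if __name__ == "__main__":
    print(format_image_results(sample_results, limit=3))
```

Trimming the list before it goes into the prompt is what keeps the token count down; the limit is a parameter here so the same helper can serve prompts of different sizes.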
Sidenote

It's also possible to call the API with the OpenAI SDK. Sample:

Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("XAI_API_KEY"),
    base_url="https://api.x.ai/v1",
)

Check out the full SerpAPI article collection here.

By Hilman Ramadhan

Optimizing Arm-Based Build Servers With AmpereOne CPUs

What Makes a Good Build Server? In modern cloud-native application development, Continuous Integration, with automated building and testing of software on every commit, has become a standard best practice. This typically involves maintaining a farm of build nodes, which can be physical devices, virtual machines, or containers, that can be provisioned on demand and retired once build tasks are completed. This guide aims to help you configure the ultimate build server for Ampere's Arm-based architecture. We will explore various configuration options (or “knobs and switches”) to optimize a Linux build server’s performance, detailing the performance improvement with each adjustment. This tuning guide will focus on building LLVM-MinGW, a toolchain for creating Windows binaries using the LLVM project. This will be conducted on a Fedora 40 server running on an Ampere® Altra® 128-core server and compared with results from AmpereOne® 192-core servers, highlighting the advancements in that CPU. What Build Server Workloads Look Like Running software builds and testing workloads on a modern build server are inherently dynamic. They typically feature short bursts of CPU-intensive tasks, interspersed with other tasks that aren't as CPU-intensive. Figure 1 illustrates this CPU utilization behavior over time, which we will discuss later. While many build tasks, such as compiling source files, can execute in parallel across numerous CPU cores, other steps are inherently serial, including initial build configuration and linking. Therefore, optimizing complex builds means effectively managing these concurrent processes around unavoidable serial choke points To build the best possible build server, we aim to ensure that our compilation processes remain uninterrupted, that we avoid saturating the system's disk I/O, and that we minimize memory thrashing. This specifically includes minimizing memory page allocation and TLB misses. 
Furthermore, when builds can be parallelized, it is crucial to keep all available cores busy.

Using All Your Cores

Modern build servers derive most of their performance from parallel compilation. If your build system is not explicitly configured to execute tasks concurrently, large multi-core systems will be underutilized and build times will scale very poorly.

For GNU Make, parallelism is explicit, where -j<N> represents the number of parallel jobs to be started, typically set to the number of available CPU cores (for example, up to -j128 on Ampere Altra or up to -j192 on AmpereOne).

Shell
make -j<N>

Most modern build systems provide similar mechanisms:

CMake: Parallelism is controlled by the underlying build tool:

Shell
cmake --build . --parallel <N>

Or by passing -j<N> to make or ninja.

Ninja: Parallel by default; concurrency can be limited with:

Shell
ninja -j<N>

Maven: Supports parallel module and project builds:

Shell
mvn -T <N>     # N threads
mvn -T <N>C    # N threads per CPU core

Gradle: Enables parallel task execution:

Shell
gradle build --parallel

Or via org.gradle.workers.max=<N>.

SCons: Similar to Make and Ninja, uses -j:

Shell
scons -j<N>

Building LLVM-MinGW

We describe building the LLVM-MinGW project to highlight the steps we used to optimize an Ampere-based build server. LLVM-MinGW is a toolchain to build Windows binaries using the LLVM project that supports the i686, x86_64, armv7, and arm64 architectures. The llvm-mingw repository provides detailed documentation about the project. After cloning the git repo, we set up a 50 GB RAM disk and ran the build on it to improve performance. We use the build-all.sh script to run the build for the five architectures specified in the TOOLCHAIN_ARCHS variable.
Shell
git clone https://github.com/mstorsjo/llvm-mingw.git

Shell
# Create RAM disk
mkdir llvm-mingw-tmpfs
sudo mount -t tmpfs -o size=50G,mode=1777 tmpfs ./llvm-mingw-tmpfs
# Confirm the tmpfs is mounted
mount | grep llvm-mingw-tmpfs

Shell
# run build from RAM disk
LOG=build-llvm-mingw.log
cd llvm-mingw-tmpfs && rm -rf * && cp -r ../llvm-mingw/* . && \
TOOLCHAIN_ARCHS="i686 x86_64 armv7 aarch64 arm64ec" \
LLVM_CMAKEFLAGS="-DLLVM_ENABLE_LIBXML2=OFF -DLLDB_ENABLE_PYTHON=OFF -DLLVM_USE_LINKER=lld -DCMAKE_C_COMPILER=/usr/bin/clang -DCMAKE_CXX_COMPILER=/usr/bin/clang++" \
/usr/bin/time -f '%U, %S, %e, %P' ./build-all.sh --disable-lldb-mi $(pwd)/install/llvm-mingw >& ${LOG}

RAM Disk: Using DRAM to Avoid Disk I/O

A RAM disk is a filesystem backed by system memory rather than persistent storage. On Linux, this is typically implemented using tmpfs, which allows a directory to behave like a regular filesystem while storing its contents in RAM. Reads and writes to a RAM disk occur at memory speed and are much faster than reading from disks or SSDs.

On a build server, a RAM disk is commonly used to store temporary build artifacts such as object files, dependency caches, and intermediate outputs. The build system can then be configured to place its build directory, temporary files, or compiler cache under this path (for example, using an out-of-tree build directory). In our example, the entire build is done on the RAM disk.

RAM disks can improve performance because large parallel builds can generate and consume tens of thousands of small files, especially during compilation and linking. Even on fast NVMe SSDs, metadata operations and concurrent I/O can become a bottleneck when hundreds of compiler processes run in parallel. A RAM disk eliminates these I/O constraints by keeping all intermediate data in memory, reducing latency and contention.
On high-core-count servers running large parallel builds, this can significantly improve throughput by ensuring that compiler processes are not stalled waiting for disk access. The benefit is most visible when the build is otherwise CPU-bound and sufficient memory is available to avoid swapping. In this configuration, a RAM disk helps the system sustain full CPU utilization across all cores, shortening end-to-end build times. It is always recommended to measure the performance of any configuration changes to understand their impact and to verify that performance has improved. In this case, we tested building using a RAM disk vs. using the system SSD. The build took 1069.1 s running on the SSD and 1062.7 s using the RAM disk, for a ~0.6% speedup. The mpstat utility, part of the sysstat package, was used to measure CPU utilization during the build, with the results plotted in Figure 1. This data shows that the overall CPU utilization is very low for the build, except for the region from ~100 to ~450 seconds and for brief spikes of high CPU utilization. Overall, the average build total CPU utilization was just 53.4% of the Ampere ® Altra ® 128-core server. This raised the question: Why is utilization so poor? Fig 1: Server CPU utilization vs. time for initial LLVM-MinGW build, illustrating poor overall server utilization except for the region approximately 100 – 450 seconds. Our initial studies focused on running the build using the perf record performance profiler to measure performance. Linux’s perf profiler is a very powerful tool that allows you to see what’s running on the CPU at the application level and drill down to shared libraries, functions, individual source code lines and assembly instructions. However, sometimes a focus on low-level metrics prevents understanding global issues. This proved to be one such example. 
For this project, understanding the true bottleneck required higher-level performance data to grasp why the build was not utilizing the system more efficiently. Instrumenting Build Scripts The LLVM-MinGW build is complex, involving many sub-projects built with different configurations and architectures. To gain detailed insights, we implemented a powerful yet simple instrumentation method using bash’s PS4 variable. This approach inserted timestamps into the build log file before every executed command. By parsing the output log file, we could accurately determine the duration of each command. While analyzing and understanding the various, often nested, phases of the build took time, the instrumentation via bash’s PS4 variable was trivial to implement. Shell # Script to add timestamps to all scripts PS4='DEBUG $0 Line ${LINENO}, time since epoch, $(date "+ %s"): ' SCRIPTS=$(ls *sh) for SCRIPT in ${SCRIPTS}; do cp ${SCRIPT} ${SCRIPT}.orig sed -i '1d' ${SCRIPT} # remove #!/bin/sh line # converts existing scripts to bash scripts # with timestamps using bash’s PS4 cat ../convert_sh_to_bash_debug.sh ${SCRIPT} > bash_${SCRIPT} cp bash_${SCRIPT} ${SCRIPT} done chmod +x *.sh Figure 2 shows an example instrumented output. This output can be parsed to calculate the execution time of individual commands, such as this example of running cmake –G Ninja. Fig 2: Example output after adding timestamps to the build. After analyzing the build output with the timestamps we added, we identified significant time spent in processes that utilized only a single CPU core: running configure and cmake -G Ninja configuration phases, and pulling sources via git. Now that we understood the significant serial phases of the build, we were ready to optimize the server configuration to improve the build performance. Figure 3: Breakdown of build, based on adding instrumentation to measure its phases. 
This shows a large fraction of the build is due to running configuration scripts (cmake -G Ninja and configure), which are serialized and limit scaling.

CPU Performance Governor

The CPU frequency governor in Linux is the kernel mechanism that controls a range of performance and power management-related features for each CPU core. It can be used to tell CPU cores to run at maximum frequency, maximum power efficiency, or to dynamically scale on demand. You can see what the current setting is for all CPU cores with the command:

Shell
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

The possible values for Linux are:

powersave: Prioritizes energy efficiency, but modern CPUs can still ramp frequency under load. A good option for power-limited workloads and laptops
ondemand/conservative: Ramps up frequency under load, and down when idle. conservative ramps up and down more slowly than ondemand
performance: Ensures the CPU cores are always running at maximum frequency
schedutil: A common default for modern Linux distributions, uses scheduler utilization metrics to adjust frequency

If your CPU cores are not running at their maximum frequency, or if they need to ramp up to maximum frequency when you start a build, this can have a huge impact on build times. By explicitly tuning all CPU cores to be in performance mode, we can maximize build throughput for CPU-bound workloads like software compilation.

You can manually set the governor to performance mode using either the cpupower utility or by changing the setting that we viewed earlier on the command line:

Shell
echo performance | sudo tee \
    /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

These changes are runtime only, however. If you want this change to persist across reboots, you can use the tuned service and the tuned-adm utility on Linux to configure how aggressive the power management functions of your operating system behave when your system is idle, and to ensure that the option persists across restarts.
To use tuned to configure your server for high performance mode, run:

Shell
sudo systemctl enable --now tuned
sudo tuned-adm profile performance

This setting should make a huge difference for high-core, CPU-bound build servers. Software compilation is a workload that benefits from cores running at maximum frequency. Dynamic governors (powersave, ondemand, conservative, schedutil) may let cores scale up only after work arrives, which introduces latency before full CPU power is available. For builds that spawn hundreds of parallel compilation jobs, this delay can add up across many cores and many files. Explicitly setting all cores to the performance governor ensures that every core is immediately available at peak frequency, maximizing throughput for CPU-bound workloads.

Performance Governor Impact: 17% Speedup

After configuring the server to use the performance governor, we saw a 17% speedup in build time. The build running with a RAM disk improved from 1,063 to 909 seconds.

Figure 4: Server CPU utilization vs. time for the LLVM-MinGW build after setting the CPU performance governor, where we observed a 17% speedup in build time.

CONFIG_HZ: The Heartbeat of the Linux Kernel

Have you ever tried to work on multiple tasks at the same time, such as composing an email while preparing a presentation and attempting to characterize and fix a software bug? This is inefficient for humans because we need time to stop working on one task, gather enough context for the new task, and regain full efficiency. We call this process context switching. CPU cores experience a similar overhead when changing the task they are executing. They must load the instructions and data for the new task into L1 and L2 caches, evicting the old context in the process, and refill the CPU pipeline with new instructions. This whole process can cost hundreds to a few thousand CPU cycles on modern CPUs.
With cores operating at 3 GHz (three billion CPU cycles per second), each context switch can therefore incur an overhead of up to around one microsecond.

Inside the kernel process scheduler, there is a separate heartbeat: the Linux kernel periodically decides which process gets access to each CPU core, based on its priority, how long it has been waiting for a time slice, and a few other criteria. The kernel compile-time parameter CONFIG_HZ determines how often the scheduler makes these decisions, potentially triggering a context switch as it evicts one process and gives another process a time slice.

By default, on most Linux distributions, the kernel is compiled with an idle-tickless configuration (the kernel build option CONFIG_NO_HZ=y) and a timer frequency of 250 Hz (the compile-time option CONFIG_HZ_250=y, which sets CONFIG_HZ=250). That means that on busy CPU cores, the process scheduler triggers 250 times a second, or once every 4 milliseconds, but on idle CPU cores, the tick is suppressed, reducing unnecessary interrupts. When the scheduler is triggered, the kernel checks whether the current process running on the core should be preempted and replaced with another process that is waiting for CPU time.

On a build server, we typically do not want compilation processes to be preempted. These tasks are relatively short-lived for most source code files, often completing in under a second (and frequently under 100 ms) from C source to object file. As a result, if a compiler process is preempted by the kernel, context switches can represent a significant percentage of the execution time for compiling that source file.

There are four possible values for CONFIG_HZ, with 100 Hz, 250 Hz, and 1000 Hz being most common. While higher CONFIG_HZ values are often chosen for workloads demanding very low latency and high responsiveness, for throughput-focused CPU-bound tasks like compilation, a lower CONFIG_HZ is generally beneficial.
A lower frequency means the kernel's scheduler checks less often, reducing the likelihood of preemption for short-lived compiler processes and decreasing overall scheduler overhead. For our build server optimization, we tested the effect of setting CONFIG_HZ to 100, which sets the length of a jiffy (the time between clock ticks) to 10 ms. Running with a 100 Hz kernel improved the build time from 909.31 to 889.2 seconds, an additional 2% improvement.

Building With AmpereOne

Lastly, we ran the build on the AmpereOne 192-core processor and compared its performance against the Ampere Altra 128-core processor. The build completed in 603.6 s. The graphs (Figure 5) demonstrate that the multicore-intensive build phase was effectively halved, improving from over 300 seconds to approximately 150 seconds. Despite this significant improvement, we measured that for approximately 75% of the build time only one CPU core was in use, with occasional spikes in demand. To further improve the server CPU utilization, one could run multiple builds and/or CI testing in parallel.

Figure 5: Server CPU utilization over time for LLVM-MinGW on Ampere Altra Max (top) vs. AmpereOne A192-32X Processor (bottom), showing a significant speedup

Summary and Recommendations

This guide explored our approach to configuring Linux application build servers for optimal performance on Ampere's Arm-based architecture. We detailed how various build systems enable parallel execution and demonstrated effective methods for measuring server utilization. Using a technique based on the bash shell's PS4 variable, we instrumented the LLVM-MinGW build process, which involves numerous sub-projects across five different architectures.

Our analysis revealed that a significant fraction of the build time was consumed by configuration phases. As these are inherently serial, they severely limit the overall CPU utilization of the server, even with powerful multi-core processors.
Through our targeted optimization efforts (using RAM disks, setting the CPU performance governor, and tuning kernel parameters such as CONFIG_HZ), we significantly improved build throughput for this specific project. Our findings also highlighted that for complex projects with persistent serial bottlenecks, the ultimate performance gains on high-core systems often come from parallelizing multiple independent builds or CI testing workflows.

Recommendations

We encourage you to apply these optimization strategies to your own Ampere-based build servers. To use the resources of a dedicated multi-core build server efficiently, we recommend the following:

- Implement parallel build flags: this enables the critical compute-intensive phase of builds to completely saturate available cores.
- Configure the performance CPU governor: ensure that your build server cores are not reducing performance by aggressively entering power-saving mode.
- Consider CONFIG_HZ adjustments to unleash the full potential of your hardware: context switches can have a significant impact, and slowing the heartbeat of a system minimizes them.
- To improve server throughput, explore offsetting build jobs or running testing jobs simultaneously. We have seen that configuration and linking stages do not scale as efficiently to multiple cores. If you start builds offset by time, or run jobs with different compute demands, you can execute these single-threaded parts of the build process in parallel with the intensive build stages of other jobs. How to optimally schedule build jobs to maximize throughput is left as an exercise to the reader.

For further discussion, share your experiences or seek additional guidance by joining the Ampere developer community forum at https://community.amperecomputing.com.
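To act on the CONFIG_HZ recommendation, it helps to know what the running kernel was built with and what tick interval each value implies. The following is a minimal sketch, not a definitive tool: the helper names tick_period_ms and kernel_hz are hypothetical, and kernel_hz assumes your distribution installs the kernel build configuration as /boot/config-<release>, which not every distribution does.

```shell
#!/usr/bin/env bash
# tick_period_ms is a hypothetical helper: it converts a CONFIG_HZ value
# into the time between scheduler ticks (one jiffy), in milliseconds.
tick_period_ms() {
    local hz="$1"
    awk -v hz="$hz" 'BEGIN { printf "%g\n", 1000 / hz }'
}

# kernel_hz is also hypothetical. It assumes the kernel build configuration
# is available as /boot/config-<release>; pass another path to override.
kernel_hz() {
    local config="${1:-/boot/config-$(uname -r)}"
    # Match the line CONFIG_HZ=<n>, not CONFIG_HZ_250=y and similar options.
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$config"
}

# Show the tick interval for the three most common settings.
for hz in 100 250 1000; do
    echo "CONFIG_HZ=$hz -> one tick every $(tick_period_ms "$hz") ms"
done
```

The loop reproduces the figures used in the article: 10 ms at 100 Hz and 4 ms at 250 Hz. Because CONFIG_HZ is a compile-time parameter, changing it means building or installing a kernel configured with the new value.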

By Dave Neary

The Rise of Microservices Architecture in Scalable Applications

In recent years, the way modern applications are built has changed. In the past, systems were usually developed as a single, large block of code (a monolithic design). That approach worked fairly well for smaller applications, but as applications grew larger and more complex, it became a hindrance to serving more users at higher speed. Companies now need applications that can grow quickly, adapt to change quickly, and support millions of users without any impact on performance, and that is where microservice architecture becomes relevant. Microservice architecture has become a popular way to design scalable applications because an application can be broken into smaller, individual services that work independently of each other. The trend toward microservices in scalable applications indicates a shift in value toward flexibility, speed, and resilience in today's highly competitive digital environment.

What Is Microservices Architecture?

Microservice architecture is a method of designing an application as a set of distinct parts that operate independently and perform specific tasks. Each microservice communicates with the others via APIs. Unlike a traditional monolithic system, where all of the application's components depend on one another, a microservice architecture lets developers modify, update, deploy, or scale a single microservice without impacting the others. In an e-commerce application, for example, user authentication, the product catalog, payment processing, and order processing can each exist as a separate microservice.
Why Microservices Are Gaining Popularity

Microservices are more than just a trend; they are an answer to increased demand for scalable, flexible, and high-performing applications. As digital-first business models grow, traditional architectures struggle to keep up, driving a preference for microservices.

1. Scalability Requirements

Modern applications often deal with unpredictable user traffic, especially during peak times such as high-volume sales, new product launches, or viral surges. In a monolithic architecture, scaling means replicating the entire application, which ties up expensive resources for long periods and is inefficient.

2. Quick Development Cycles

With the rapid pace of change in the marketplace, speed is key to success in competitive industries. A microservices architecture enables development teams to work on different services simultaneously without affecting one another's progress.

3. Technology Flexibility

Technology flexibility is one of the greatest benefits of microservices architecture. Unlike monolithic systems, which typically use only one tech stack, each microservice can be built with the programming language, framework, or database that suits it best. For example, a data-intensive microservice can use a high-performance programming language, while the UI microservice can use a more flexible front-end framework.

4. Enhanced Fault Containment

Failure is a fact of life for big programs, and how you handle it makes a difference. In a monolithic program, a single bug or failure can shut down the entire application. Microservices provide better fault containment by isolating faults to independent services. When an individual service fails, the failure does not automatically affect the rest of the program. This results in higher overall system availability and an improved user experience.

5. Alignment With DevOps

Microservices architecture aligns well with DevOps practices, which focus on automation, collaboration, and continuous delivery. With microservices, teams can build a CI/CD pipeline for each of their services so that they can deploy frequently and reliably. Automated testing, monitoring, and deployment allow them to release updates efficiently with minimal risk.

The Benefits of Microservice Architecture

The rise in popularity of microservices aligns with current trends in the enterprise landscape, and many organizations are beginning to realize significant value from microservice architecture in application development and performance. By breaking large, complex systems into smaller components, organizations can develop highly scalable, resilient, and efficient systems.

1. Services Can Be Deployed Independently

In a microservices-based architecture, one or more services can be deployed independently. In traditional applications, deploying even a small change requires deploying the entire application, which can take a long time and add significant risk.

2. Improved Scalability

Because scalability is a key design feature of microservices, software development companies can scale only the parts of an application that need more resources, rather than scaling the entire application as with a monolith.

3. Greater Agility

Agility is extremely important in today's rapidly changing digital market. By letting multiple cross-functional teams develop their own services independently, microservices increase both development speed and decision-making speed.

4. Easier to Manage Codebase

Large codebases commonly become challenging to manage over time. One of the advantages of a microservices architecture is that it produces smaller codebases that are easier to manage.

5. Increased Reliability

Reliability is one of the most important aspects of any system, especially one with a large number of users. Microservices can help improve reliability by isolating faults between services.

Conclusion

The growing use of microservice architectures in scalable applications reflects a shift in focus toward designing systems that are flexible, durable, and able to grow with an organization's business needs. By breaking large, complex applications into smaller independent services, organizations can gain faster development, increased scalability, and greater reliability. Although implementing microservices brings some challenges, the long-term benefits can justify the upfront investment required to adopt this architectural style in a modern enterprise. Businesses that have a plan, the right tools, and the right people can realize the benefits of microservice architecture while providing high-quality digital experiences to their customers.

By Mitchell Jhonson