BYOLM with Spring AI & MCP: Secure, Swappable AI Everywhere
Spring AI and MCP empower BYOLM by enabling swappable language models with privacy, control, and extensibility. Learn more about this approach below.
Introduction
Artificial intelligence has rapidly moved from research labs into everyday tools. Yet most users remain locked into vendor‑controlled ecosystems, where the choice of large language model (LLM) is dictated by the provider. This creates friction for developers, educators, and organizations who want flexibility, privacy, and control. The Bring Your Own Language Model (BYOLM) paradigm challenges this status quo. With a configurable middleware layer, extensions for Chrome, Word, and other applications can integrate with swappable LLMs. Combined with Spring AI and Model Context Protocol (MCP), this architecture lets users safeguard sensitive data, authenticate access securely, and orchestrate reproducible AI labs. This article is a sequel to an earlier article on DZone, which readers are encouraged to read first.
Motivation
The motivation behind BYOLM is simple yet powerful: freedom of choice. Traditional AI assistants often operate as black boxes, offering little transparency into how data is processed or stored. For developers and mentors, this lack of control is unacceptable. BYOLM allows individuals and organizations to:
- Select models based on performance, cost, or licensing needs.
- Protect sensitive data by routing requests through private IP layers.
- Experiment and innovate without being tied to a single vendor.
- Educate juniors and non‑tech users with reproducible labs that demonstrate real‑world tradeoffs.
This approach democratizes AI, making it accessible not just to enterprises but also to families, students, and small businesses.
Chrome Plugin: A Single Facade for Everything
At the heart of the system lies a Chrome extension designed as a unified facade. Instead of building separate integrations for Word, Notepad, or other tools, the extension provides a single interface. Through middleware, it abstracts away the complexity of model selection and API differences.
- Unified Access: Users interact with one extension, regardless of the underlying model.
- Swappable Logic: Middleware routes requests to the chosen LLM, whether open‑source (LLaMA, Mistral) or proprietary (GPT‑4/5).
- Ease of Maintenance: Developers avoid rewriting client logic when swapping models.
- Scalability: The facade can be extended to new applications with minimal effort.
This design philosophy mirrors the “write once, run anywhere” principle, ensuring reproducibility and consistency across platforms.
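The facade idea can be sketched as a small adapter registry. This is a minimal illustration and not the extension's actual middleware: `createFacade`, the `complete(prompt)` adapter signature, and the model names are all assumptions made for the example.

```javascript
// Hypothetical facade: every adapter hides one provider's API behind
// the same `complete(prompt)` signature (an assumed interface).
function createFacade(adapters, defaultModel) {
  if (!adapters.has(defaultModel)) {
    throw new Error(`Unknown model: ${defaultModel}`);
  }
  let active = defaultModel;
  return {
    // Swap the active model without touching any client code.
    use(modelName) {
      if (!adapters.has(modelName)) {
        throw new Error(`Unknown model: ${modelName}`);
      }
      active = modelName;
    },
    // Clients call chat() and never see provider-specific details.
    async chat(prompt) {
      return adapters.get(active).complete(prompt);
    },
    get activeModel() {
      return active;
    },
  };
}

// Stub adapters standing in for real provider clients.
const stubAdapters = new Map([
  ['local-llama', { complete: async (prompt) => `[llama] ${prompt}` }],
  ['hosted-gpt', { complete: async (prompt) => `[gpt] ${prompt}` }],
]);
```

Client code holds only the facade, so switching from `local-llama` to `hosted-gpt` is a single `use()` call and no caller changes.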
Control Over Sensitive Data
Data privacy is a cornerstone of BYOLM. Instead of sending every query straight to external servers, requests are first routed through a private IP layer, which decides what may leave the trusted environment.
- Local Hosting: Sensitive information remains within the trusted environment.
- Enterprise‑Grade Security: Aligns with compliance standards while remaining lightweight enough for family use.
- Configurable Middleware: Developers can define routing rules, ensuring that only non‑sensitive queries reach external APIs.
- Transparency: Users know exactly where their data flows, eliminating hidden risks.
This architecture empowers users to balance accessibility with security, a critical requirement in mentoring labs and organizational deployments.
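A routing rule of the kind described above might look like the following sketch. The patterns and the `local` / `external` labels are illustrative assumptions, not the rules used by the actual middleware; a real deployment would define its own notion of "sensitive".

```javascript
// Hypothetical rules: any match marks the query as sensitive.
const SENSITIVE_PATTERNS = [
  /\b\d{3}-\d{2}-\d{4}\b/,            // SSN-like number
  /\b(?:\d[ -]?){13,16}\b/,           // card-like run of digits
  /\b(password|diagnosis|salary)\b/i, // example keywords
];

// Return 'local' for sensitive queries (stay on the private layer)
// and 'external' for everything else (may reach a hosted API).
function chooseRoute(query, rules = SENSITIVE_PATTERNS) {
  const sensitive = rules.some((pattern) => pattern.test(query));
  return sensitive ? 'local' : 'external';
}
```

Keeping the rules in a plain list makes the data flow auditable: users can read exactly which queries are allowed to leave the trusted environment.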
MCP and Spring AI
Two technologies make BYOLM practical: Model Context Protocol (MCP) and Spring AI.
- MCP (Model Context Protocol): Provides schema‑driven interoperability. It standardizes how applications expose tools and context to models, so the same tool definitions keep working when the LLM behind them is swapped. MCP ensures that extensions and middleware can remain agnostic to the underlying model.
- Spring AI: Acts as the orchestration layer. It manages tool discovery, endpoint configuration, and modular integration. By leveraging Spring AI, developers can scaffold reproducible labs, integrate third‑party APIs, and maintain clarity in complex workflows.
Together, MCP and Spring AI form the backbone of BYOLM, ensuring that the system is not only flexible but also reproducible and scalable.
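To make "schema‑driven" concrete, here is a sketch of a tool descriptor in the shape MCP uses for tools (a name, a description, and a JSON Schema under `inputSchema`). The tool itself and the `validateArgs` helper are hypothetical, and the check is deliberately simplified rather than a full JSON Schema validator.

```javascript
// Hypothetical tool descriptor: name, description, and a JSON Schema
// describing the arguments the tool accepts.
const sendMessageTool = {
  name: 'send_whatsapp_message',
  description: 'Send a text message to a phone number.',
  inputSchema: {
    type: 'object',
    properties: {
      to: { type: 'string' },
      text: { type: 'string' },
    },
    required: ['to', 'text'],
  },
};

// Simplified check: required keys are present, no unknown keys are
// passed, and primitive types match the schema.
function validateArgs(tool, args) {
  const { properties = {}, required = [] } = tool.inputSchema;
  const errors = [];
  for (const key of required) {
    if (!Object.prototype.hasOwnProperty.call(args, key)) {
      errors.push(`missing: ${key}`);
    }
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = properties[key];
    if (!spec) errors.push(`unexpected: ${key}`);
    else if (typeof value !== spec.type) errors.push(`wrong type: ${key}`);
  }
  return errors;
}
```

Because the contract lives in the schema and not in any one model's prompt format, the middleware can validate a tool call the same way regardless of which LLM produced it.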
Safeguarding via WhatsApp OTP
Authentication is critical when dealing with sensitive AI functions. The system enforces access control through WhatsApp OTP verification.
- Human‑Centric Security: Users receive a one‑time password via WhatsApp, ensuring that only authorized individuals can access sensitive features.
- Ease of Use: Familiar to non‑tech users, reducing friction in adoption.
- Layered Protection: Complements technical safeguards like private IP routing.
- Trust Building: Adds a human verification step, bridging usability with resilience.
This approach ensures that sensitive workflows remain protected without sacrificing accessibility.
For simple, non-commercial use, this project relies on WAHA, an open-source project that exposes an unofficial WhatsApp API.
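The OTP step can be sketched as follows. This is an assumption-laden illustration written for Node.js to match the JavaScript used elsewhere in this article: `createOtpService`, the six-digit format, and the five-minute lifetime are choices made for the example, and delivering the code over WhatsApp is left to the caller.

```javascript
const crypto = require('node:crypto');

// In-memory OTP store keyed by phone number. A sketch only: a real
// deployment would need a shared store and rate limiting.
function createOtpService({ ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const pending = new Map();
  return {
    // Generate a 6-digit code; the caller sends it to the user.
    issue(phone) {
      const code = String(crypto.randomInt(0, 1_000_000)).padStart(6, '0');
      pending.set(phone, { code, expiresAt: now() + ttlMs });
      return code;
    },
    // A code is accepted once, for the right phone, before it expires.
    verify(phone, code) {
      const entry = pending.get(phone);
      if (!entry) return false;
      pending.delete(phone); // single use, even after a wrong guess
      if (now() > entry.expiresAt) return false;
      const expected = Buffer.from(entry.code);
      const received = Buffer.from(String(code));
      return (
        expected.length === received.length &&
        crypto.timingSafeEqual(expected, received)
      );
    },
  };
}
```

Deleting the entry on the first attempt means a wrong guess forces a fresh code, which limits brute-force attempts at the cost of an occasional resend.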
Practical Use Cases
The BYOLM system is not just theoretical. It has practical applications across diverse domains:
- Education: Juniors can experiment with different models, learning tradeoffs in latency, accuracy, and cost.
- Family Chatbots: Sensitive family data remains private, authenticated via WhatsApp OTP.
- Small Businesses: Organizations can integrate BYOLM into productivity tools without vendor lock‑in.
- Mentoring Labs: Developers can scaffold reproducible labs, demonstrating how APIs, GUIs, and plugin SDKs interact.

A key feature of BYOLM is the ability to compare models. Model selection usually follows the common decision matrix described below.
Decision Matrix for Model Selection
Selecting the right model requires balancing multiple dimensions. Accuracy measures how often a model's output is correct, while speed determines responsiveness. Scalability matters for growing datasets, and interpretability helps stakeholders trust results. Cost influences feasibility, and robustness ensures stability under varied conditions. Flexibility allows adaptation to new tasks, while data fit measures alignment with available inputs. Maintenance reflects long‑term sustainability, and risk accounts for ethical or regulatory concerns. By scoring candidate models across these criteria, teams can visualize trade‑offs and make transparent, auditable choices. This structured approach reduces bias, highlights priorities, and supports reliable, explainable deployment in real‑world contexts.
| Criteria | x_model | y_model | z_model |
|---|---|---|---|
| Accuracy | High | Medium | High |
| Speed | Medium | High | Low |
| Scalability | High | Medium | Medium |
| Interpretability | Low | High | Medium |
| Cost | Medium | Low | High |
| Robustness | High | Medium | Medium |
| Flexibility | Medium | High | Low |
| Data Fit | High | Medium | Medium |
| Maintenance | Medium | High | Low |
| Risk | Low | Medium | High |
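Scoring the matrix can be sketched in a few lines. The ratings are copied from the table above; everything else is an assumption made for illustration: the 1 to 3 numeric scale, the optional per-criterion weights, and in particular the choice to treat Cost and Risk as "lower is better" while reading every other row, including Maintenance, as "higher is better".

```javascript
const LEVEL = { Low: 1, Medium: 2, High: 3 };

// Assumption: for these criteria a lower rating is the better one.
const LOWER_IS_BETTER = new Set(['Cost', 'Risk']);

const matrix = {
  Accuracy:         { x_model: 'High',   y_model: 'Medium', z_model: 'High' },
  Speed:            { x_model: 'Medium', y_model: 'High',   z_model: 'Low' },
  Scalability:      { x_model: 'High',   y_model: 'Medium', z_model: 'Medium' },
  Interpretability: { x_model: 'Low',    y_model: 'High',   z_model: 'Medium' },
  Cost:             { x_model: 'Medium', y_model: 'Low',    z_model: 'High' },
  Robustness:       { x_model: 'High',   y_model: 'Medium', z_model: 'Medium' },
  Flexibility:      { x_model: 'Medium', y_model: 'High',   z_model: 'Low' },
  'Data Fit':       { x_model: 'High',   y_model: 'Medium', z_model: 'Medium' },
  Maintenance:      { x_model: 'Medium', y_model: 'High',   z_model: 'Low' },
  Risk:             { x_model: 'Low',    y_model: 'Medium', z_model: 'High' },
};

// Sum weighted scores per model; criteria without a weight count as 1.
function scoreModels(ratingsByCriterion, weights = {}) {
  const totals = {};
  for (const [criterion, ratings] of Object.entries(ratingsByCriterion)) {
    const weight = weights[criterion] ?? 1;
    for (const [model, rating] of Object.entries(ratings)) {
      const raw = LEVEL[rating];
      const value = LOWER_IS_BETTER.has(criterion) ? 4 - raw : raw;
      totals[model] = (totals[model] ?? 0) + weight * value;
    }
  }
  return totals;
}
```

With equal weights, `scoreModels(matrix)` puts y_model narrowly ahead of x_model (25 against 24, with z_model at 16). Tripling the weight on Accuracy, as in `scoreModels(matrix, { Accuracy: 3 })`, reverses that order, which is exactly the kind of trade-off the matrix is meant to expose.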
Future Directions
BYOLM is only the beginning. Future enhancements could include:
- Multi‑model orchestration: Routing queries dynamically based on complexity.
- Federated learning: Training models locally while sharing insights securely.
- Cross‑platform plugins: Extending the facade to mobile and desktop environments.
- Advanced authentication: Integrating biometrics with WhatsApp OTP for layered security.
Demo Chrome Extension
To support an arbitrary streaming chat endpoint (Server-Sent Events), the extension exposes its backend as a configuration setting.

While the screenshot above shows the endpoint URL as the only configurable setting, the next version of the extension adds advanced options such as authentication, custom headers (for backends that require them), and a proxy.
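One practical consequence of supporting custom headers: the browser's native `EventSource` constructor does not accept request headers, so a header-aware client would typically read the stream with `fetch` and parse the Server-Sent Events itself. The sketch below shows one way to do that. It is not the extension's code, `parseSse` and `streamChat` are hypothetical names, and the parser is simplified to handle `\n` line endings only.

```javascript
// Split buffered text into complete SSE events plus an unfinished tail.
function parseSse(buffer) {
  const events = [];
  const blocks = buffer.split('\n\n');
  const rest = blocks.pop(); // the last piece may be an incomplete event
  for (const block of blocks) {
    let event = 'message'; // default event name in the SSE format
    const data = [];
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) {
        event = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        data.push(line.slice(5).replace(/^ /, ''));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join('\n') });
  }
  return { events, rest };
}

// Read an SSE stream with fetch so that custom headers can be sent.
async function streamChat(url, headers, onEvent) {
  const response = await fetch(url, {
    headers: { Accept: 'text/event-stream', ...headers },
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSse(buffer);
    buffer = parsed.rest;
    parsed.events.forEach(onEvent);
  }
}
```

A call such as `streamChat(url, { Authorization: 'Bearer <token>' }, handler)` would then deliver the same chunk and completion events that `EventSource` provides, with the header attached.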

A working Chrome plug-in is available at https://chromewebstore.google.com/detail/my-assistant/anhpbigbobkkmibdpepfbmffagligkch. To use it, plug in your own LLM chat or middleware endpoint through the configuration shown in [Figure-1].
The code of the plugin is reproduced below.
```javascript
// --- Side panel script (loaded by popup.html) ---
const chat = document.getElementById('chat');
const input = document.getElementById('input');
let API = null; // streaming URL prefix, set once the config has loaded

// Load the user-configured backend endpoint.
chrome.storage.local.get("apiEndpoint", (result) => {
  if (result.apiEndpoint) {
    console.log("Using endpoint:", result.apiEndpoint);
    // Drop trailing slashes so the path does not start with "//".
    API = result.apiEndpoint.replace(/\/+$/, "") + "/api/chat/whatsapp-stream?q=";
  } else {
    alert("Cannot find apiEndPoint config. Please set it up first before using!");
  }
});

// Send the message on Enter and stream the reply over SSE.
input.addEventListener('keydown', (e) => {
  if (e.key !== 'Enter' || input.value.trim().length === 0) return;
  if (!API) {
    // The endpoint is missing or has not finished loading yet.
    appendMessage('No endpoint configured yet.', 'bot');
    return;
  }
  const userMsg = input.value.trim();
  appendMessage(userMsg, 'user');
  input.value = '';

  const source = new EventSource(API + encodeURIComponent(userMsg));
  const answerDiv = appendMessage('Reply: ', 'bot');
  // Each default "message" event carries one chunk of the answer.
  source.onmessage = (event) => {
    answerDiv.textContent += ' ' + event.data;
  };
  // Close on error; otherwise EventSource keeps trying to reconnect.
  source.onerror = () => {
    console.log("Stream disconnected.");
    source.close();
  };
  // The backend signals completion with a named "done" event.
  source.addEventListener('done', (event) => {
    console.log('Stream ended:', event.data);
    source.close();
  });
});

// This listener is typically registered in the extension's background
// service worker rather than in the panel page itself.
chrome.runtime.onInstalled.addListener(() => {
  chrome.sidePanel.setOptions({ path: "popup.html", enabled: true });
});

// Append a chat bubble and keep the view scrolled to the newest message.
function appendMessage(text, sender) {
  const msg = document.createElement('div');
  msg.className = `msg ${sender}`;
  msg.textContent = text;
  chat.appendChild(msg);
  chat.scrollTop = chat.scrollHeight;
  return msg;
}
```
[Listing-1: Side panel chat UI code]
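The listing implies a small contract for any backend plugged into the extension: answer chunks arrive as default `message` events, and a named `done` event marks the end of the stream. The sketch below shows what such frames look like on the wire. `formatSseEvent` and `framesFor` are hypothetical helpers, and the `'end'` payload of the final event is an assumption, since the listing only logs whatever data that event carries.

```javascript
// Format one Server-Sent Events frame. Multi-line data becomes several
// `data:` lines, and a blank line terminates the frame.
function formatSseEvent(data, event) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of String(data).split('\n')) {
    lines.push(`data: ${line}`);
  }
  return lines.join('\n') + '\n\n';
}

// Turn model output chunks into the frame sequence the side panel
// listens for: one message frame per chunk, then a closing `done`.
function framesFor(chunks) {
  return [
    ...chunks.map((chunk) => formatSseEvent(chunk)),
    formatSseEvent('end', 'done'),
  ];
}
```

Any middleware that writes these frames to a response with the `text/event-stream` content type should be readable by the side panel, whichever model produced the chunks.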
The accompanying video illustrates a specific set of use cases served by this Chrome plugin.
Conclusion
The BYOLM system with Spring AI and MCP represents a paradigm shift in AI adoption. By combining configurable middleware, Chrome extensions, private IP routing, and WhatsApp OTP authentication, it empowers users to take control of their AI workflows. This architecture democratizes AI, making it accessible, secure, and reproducible for developers, families, and organizations alike.
Importantly, choosing the right LLM depends on one’s specific needs — whether prioritizing speed, accuracy, cost, or privacy. Our middleware is not limited to text generation; it already supports agentic actions such as ordering medicine or food, sending emails, delivering WhatsApp messages, and even controlling IoT devices. These capabilities demonstrate how BYOLM can evolve from a simple language interface into a multi‑faceted agentic platform.
As requirements grow, I continue to add more features into the middleware layer, ensuring that users can extend functionality without sacrificing security or control. In a world where vendor lock‑in and privacy concerns dominate, BYOLM offers a refreshing alternative: freedom, flexibility, and trust — augmented by agentic actions that make AI truly useful in everyday life.