A Backend-First Approach to Production-Scale LLM Applications
A backend-first AI design with Laravel, queues, Redis, SSE, and MySQL powers resilience, scalability, and uninterrupted user experiences.
A few months ago, I launched the first version of my platform, which had no AI functionality. It served its initial purpose well, but I knew it could do more. A few weeks ago, I rolled out version two, this time with large language models (LLMs) as its core component. Its workflow was simple: the frontend sends a request to the backend, the backend applies business logic, and then it calls OpenAI's API to generate a response. Everything worked as expected in controlled testing. Once more people started using the platform, though, new problems appeared. Most were caused by user actions and real-world conditions such as slow connections, accidental browser refreshes, and other interruptions that broke the user experience.
Users will always do unexpected things in production, and not all of it is their fault. I had to accept that and find a way for the platform to handle these hiccups gracefully. The answer was a safety net: I redesigned the platform so that the backend sits at the center of all large language model operations.
This made the platform more resilient. Even if the frontend had issues, AI processes continued without interruption. With the backend as the hub, the platform could run reliably no matter what the user did.
Implementation Walkthrough
The backend is built on Laravel. Queued jobs do the LLM work in the background, Redis serves as both the queue driver and the streaming channel, and MySQL stores the finished results. Each incoming request is dispatched as a job; the job generates the content, streams it to the client as it is produced, and persists the final output to the database. Because the work lives in the job rather than in the HTTP request, it keeps running even if the client disconnects, and no generated output is lost.
Laravel's queue system supports several drivers, including database and Redis. Redis is usually the better fit for LLM workloads: it keeps the queue in memory, so pushing and popping jobs stays fast under high volume, and the same Redis instance also provides the pub/sub channel used for streaming. Selecting it takes one line in .env:
```ini
QUEUE_CONNECTION=redis
```
A worker process then handles queued jobs:
```shell
php artisan queue:work --queue=new_llm_requests
```
Because the HTTP request only enqueues work, simultaneous requests from many users do not block one another: jobs wait in the queue until a worker is free, and throughput can be scaled by running more worker processes.
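To make the queue-and-worker behavior concrete, here is a minimal in-memory model in plain JavaScript. It is not how Laravel implements queues (real workers are separate `queue:work` processes polling Redis); it only illustrates the idea that jobs wait in FIFO order and a fixed number of workers each take the next job as soon as they finish the previous one. `JobQueue` and `runWorkers` are hypothetical names.

```javascript
// Hypothetical model of a named queue drained by a fixed number of workers.
// Laravel's real workers are separate processes; this only illustrates the scheduling idea.
class JobQueue {
  constructor() {
    this.jobs = []; // pending jobs, oldest first
  }

  push(job) {
    this.jobs.push(job);
  }

  pop() {
    // Oldest job first; undefined when the queue is empty
    return this.jobs.shift();
  }
}

// Start `workerCount` workers; each one loops: take the next job, run it, repeat until empty.
async function runWorkers(queue, workerCount, handler) {
  const results = [];
  let active = 0; // jobs currently being processed
  let maxActive = 0; // highest concurrency observed

  async function worker() {
    let job;
    while ((job = queue.pop()) !== undefined) {
      active += 1;
      maxActive = Math.max(maxActive, active);
      try {
        results.push(await handler(job));
      } finally {
        active -= 1;
      }
    }
  }

  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, maxActive };
}
```

Pushing five jobs and calling `runWorkers(queue, 2, handler)` never has more than two jobs in flight; the other three wait their turn. Adding workers raises that ceiling, much like starting more worker processes.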
Defining a Job for LLM Requests
Each LLM request runs inside a Laravel Job, which keeps the work isolated and makes it easy to handle retries if something goes wrong.
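```php
<?php

namespace App\Jobs;

use App\Models\Content; // the article's Eloquent model (namespace assumed)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Redis;
use OpenAI;

class GenerateContent implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Let the worker retry a failed attempt up to 3 times (a retry restarts generation from scratch)
    public int $tries = 3;

    public function __construct(
        public string $prompt,
        public string $requestId
    ) {}

    public function handle(): void
    {
        // Read the key via config(): env() returns null once the config is cached.
        // Assumes config/services.php defines 'openai' => ['key' => env('OPENAI_API_KEY')].
        $client = OpenAI::client(config('services.openai.key'));

        // Call OpenAI in stream mode
        $stream = $client->chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [['role' => 'user', 'content' => $this->prompt]],
        ]);

        $fullOutput = '';
        foreach ($stream as $response) {
            // Some chunks carry no text (e.g. the final one), so skip those
            $token = $response->choices[0]->delta->content ?? null;
            if ($token === null || $token === '') {
                continue;
            }
            $fullOutput .= $token;

            // Push the token into this request's Redis pub/sub channel
            Redis::publish("stream:{$this->requestId}", $token);
        }

        // Store the final output in MySQL
        Content::create([
            'request_id' => $this->requestId,
            'prompt' => $this->prompt,
            'output' => $fullOutput,
        ]);
    }
}
```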
Stream mode lets the job forward each token to Redis as soon as OpenAI generates it, rather than waiting for the complete response.
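The heart of the job is a loop that skips chunks without text, forwards each token as it arrives, and keeps a running copy for persistence. A minimal JavaScript sketch of that loop, assuming chunks shaped like OpenAI's Chat Completions streaming format (`choices[0].delta.content`); `relayStream` is a hypothetical name, `publish` stands in for `Redis::publish`, and the chunks in the usage lines are made up:

```javascript
// Forward each streamed token as it arrives and return the full text at the end.
// `chunks`: any iterable of objects shaped like Chat Completions stream chunks.
// `publish(token)`: hypothetical stand-in for Redis::publish on the request's channel.
function relayStream(chunks, publish) {
  let fullOutput = "";
  for (const chunk of chunks) {
    // Some chunks (e.g. a role-only first chunk or the final chunk) may carry no text
    const token = chunk?.choices?.[0]?.delta?.content;
    if (typeof token !== "string" || token === "") {
      continue;
    }
    fullOutput += token;
    publish(token);
  }
  return fullOutput; // what the job saves to MySQL once the stream ends
}

// Usage with fake chunks:
const published = [];
const output = relayStream(
  [
    { choices: [{ delta: { role: "assistant" } }] },
    { choices: [{ delta: { content: "Hello" } }] },
    { choices: [{ delta: { content: ", world" } }] },
    { choices: [{ delta: {}, finish_reason: "stop" }] },
  ],
  (token) => published.push(token),
);
console.log(output); // Hello, world
console.log(published); // [ 'Hello', ', world' ]
```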
Dispatch Job From Controller
When a request arrives, the controller dispatches the job and immediately returns a unique request ID to the client:
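```php
// Controller method; assumes `use App\Jobs\GenerateContent;`, `use Illuminate\Http\Request;`
// and `use Illuminate\Support\Str;` at the top of the file.
public function generate(Request $request)
{
    // Reject missing or oversized prompts before queueing work (4000 is an example limit)
    $prompt = $request->validate(['prompt' => 'required|string|max:4000'])['prompt'];
    // A random UUID is far harder to guess than uniqid(), which is derived from the clock
    $requestId = 'req_' . Str::uuid();

    GenerateContent::dispatch($prompt, $requestId)->onQueue('new_llm_requests');

    return response()->json(['requestId' => $requestId]);
}
```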
The frontend uses this request ID as the handle for subscribing to the results.
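On the frontend, the two calls chain together: post the prompt, read `requestId` from the JSON body, then build the stream URL from it. A minimal sketch, assuming a hypothetical `/api/generate` route for the controller above (the article only names `/api/stream/{requestId}`) and a bearer token; `startGeneration` is a hypothetical name, and `fetchImpl` is injected so the logic can be exercised without a server:

```javascript
// Start a generation and return the request ID plus the SSE URL for it.
// The /api/generate path is an assumption; /api/stream/{requestId} matches the SSE route.
async function startGeneration(prompt, token, fetchImpl = fetch) {
  const response = await fetchImpl("/api/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok) {
    throw new Error(`generate failed with HTTP ${response.status}`);
  }
  const { requestId } = await response.json();
  if (typeof requestId !== "string" || requestId === "") {
    throw new Error("response did not include a requestId");
  }
  // Encode the ID so it is always safe to place in a URL path segment
  return { requestId, streamUrl: `/api/stream/${encodeURIComponent(requestId)}` };
}
```

The `Accept: application/json` header matters with Laravel: when a request expects JSON, validation failures come back as a 422 JSON response instead of a redirect.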
SSE Endpoint for Streaming
Server-Sent Events (SSE) let Laravel relay the Redis pub/sub messages to the frontend in real time over a single long-lived HTTP response. The browser receives each update as soon as it is sent, with no need for repeated polling requests.
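```php
// Controller method; assumes `use Illuminate\Support\Facades\Redis;` at the top of the file.
public function stream(string $requestId)
{
    return response()->stream(function () use ($requestId) {
        // Blocks, forwarding every message published on this request's channel
        Redis::subscribe(["stream:{$requestId}"], function ($message) {
            // A blank line ends an SSE event, so send each line of the token as its
            // own data: field; the browser joins them back together with "\n"
            foreach (preg_split('/\r\n|\r|\n/', $message) as $line) {
                echo "data: {$line}\n";
            }
            echo "\n";

            // ob_flush() raises a notice when no output buffer is active
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        });
    }, 200, [
        'Content-Type' => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'Connection' => 'keep-alive',
        // Tell nginx not to buffer the response, so tokens reach the browser immediately
        'X-Accel-Buffering' => 'no',
    ]);
}
```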
This keeps the connection open and continuously pushes tokens as they arrive.
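The framing rule behind this endpoint is easy to get wrong: in the SSE format a blank line ends an event, so a token that itself contains a newline must be sent as several `data:` lines, which the browser joins back together with `\n` (per the event-stream rules in the HTML Living Standard). A minimal JavaScript sketch of that encoding; `encodeSseEvent` is a hypothetical helper and handles only the `data` field:

```javascript
// Frame one message as a Server-Sent Event.
// Each line of the payload gets its own "data:" field and a blank line ends the event;
// the browser joins the data lines back together with "\n" before firing onmessage.
function encodeSseEvent(message) {
  const lines = String(message).split(/\r\n|\r|\n/);
  return lines.map((line) => `data: ${line}\n`).join("") + "\n";
}

console.log(JSON.stringify(encodeSseEvent("Hello"))); // "data: Hello\n\n"
console.log(JSON.stringify(encodeSseEvent("line1\nline2"))); // "data: line1\ndata: line2\n\n"
```

By contrast, echoing `line1\nline2` verbatim after a single `data: ` prefix produces a stray `line2` line that the browser treats as an unknown field and ignores, so that part of the text silently disappears.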
Frontend Consumption and Auth
Consuming SSE on the frontend is simple, but authentication is a challenge. The browser's standard EventSource API does not let you set custom request headers, so you cannot authenticate a protected route with Authorization: Bearer <token>.
If your platform uses cookie-based authentication, this is not an issue. For token-based APIs, as used by many SPAs and mobile apps, it is a blocker. One solution is EventSourcePolyfill from the event-source-polyfill package, which works like EventSource but accepts custom headers such as Authorization: Bearer <token>. The SSE connection then does not have to rely on cookies or on tokens in the URL:
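```javascript
import { EventSourcePolyfill } from 'event-source-polyfill';

// requestId comes from the generate endpoint; token is the user's API bearer token
const evtSource = new EventSourcePolyfill(`/api/stream/${requestId}`, {
  headers: { Authorization: `Bearer ${token}` }
});

// Append each streamed token to the output element
evtSource.onmessage = (event) => {
  document.getElementById('output').innerText += event.data;
};

// Close on error instead of letting the connection reconnect automatically
evtSource.onerror = () => {
  evtSource.close();
};
```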
This small change makes authenticated streaming work: requests stay authenticated while tokens stream in real time. Without it, you would have to leave the endpoint unprotected, put the token in the URL (where it can end up in server logs), or fall back to cookie-based sessions, none of which is ideal for token-based distributed apps.
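If you would rather not add a dependency, another common option is to read the stream with `fetch`, which does accept an `Authorization` header, and parse the `data:` lines yourself. A minimal sketch, assuming a runtime with `fetch`, `ReadableStream`, and `TextDecoder` (modern browsers, Node 18+); it handles only the `data` field and, unlike `EventSource`, does not reconnect automatically. `readAuthenticatedStream` and `onMessage` are hypothetical names:

```javascript
// Read an SSE endpoint with fetch so a bearer token can be sent in a header.
// Calls onMessage(data) for each complete event; resolves when the server closes the stream.
async function readAuthenticatedStream(url, token, onMessage, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${token}`, Accept: "text/event-stream" },
  });
  if (!response.ok || !response.body) {
    throw new Error(`stream request failed with HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // text received but not yet split into complete lines
  let data = []; // data: lines of the event currently being assembled

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break; // server closed the stream; an unfinished event is discarded
    buffer += decoder.decode(value, { stream: true });

    // Hold back a trailing "\r" in case the next chunk starts with the "\n" of a "\r\n"
    const holdCR = buffer.endsWith("\r");
    const lines = (holdCR ? buffer.slice(0, -1) : buffer).split(/\r\n|\r|\n/);
    // The last piece is an incomplete line; keep it for the next chunk
    buffer = lines.pop() + (holdCR ? "\r" : "");

    for (const line of lines) {
      if (line === "") {
        // Blank line: dispatch the event if it carried any data
        if (data.length > 0) onMessage(data.join("\n"));
        data = [];
      } else if (line === "data" || line.startsWith("data:")) {
        // Strip the field name and at most one leading space
        let field = line.slice(5);
        if (field.startsWith(" ")) field = field.slice(1);
        data.push(field);
      }
      // Other fields (event, id, retry) and ":" comments are ignored in this sketch
    }
  }
}
```

This trades the polyfill dependency for a little parsing code, and it gives up automatic reconnection, so the caller decides what to do once the promise settles.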
Persisting to Database
Once generation completes, the job saves the result to MySQL, so the output survives even if no client was connected when it finished.
If a user refreshes their browser mid-stream, the system checks MySQL:
- If the response is already finished, it loads instantly.
- If it is still in progress, streaming resumes from Redis until completion.
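Redis acts as a short-term buffer, while MySQL remains the authoritative datastore. Pub/sub on its own keeps no history, though: a subscriber only receives messages published after it subscribes, so replaying the tokens a refreshed client missed requires the job to also write them somewhere that lasts for the duration of the generation, such as a Redis list or stream. With that in place, the two stores together prevent any "lost generations."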
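The refresh handling can be sketched as a single lookup order: finished output first, then whatever has been generated so far. The sketch below is a JavaScript model under stated assumptions, not the author's code: in-memory maps stand in for MySQL and for a per-request Redis list, and it assumes the job appends every token to that list as well as publishing it. `ResumeStore`, `append`, `finish`, and `resume` are hypothetical names.

```javascript
// Hypothetical model: `completed` plays MySQL, `buffers` plays a per-request Redis list.
class ResumeStore {
  constructor() {
    this.completed = new Map(); // requestId -> final output (authoritative)
    this.buffers = new Map(); // requestId -> tokens generated so far (short-lived)
  }

  // Called by the job for every token, alongside the pub/sub publish
  append(requestId, token) {
    if (!this.buffers.has(requestId)) this.buffers.set(requestId, []);
    this.buffers.get(requestId).push(token);
  }

  // Called by the job once generation ends: persist the output, then drop the buffer
  finish(requestId) {
    const tokens = this.buffers.get(requestId) ?? [];
    this.completed.set(requestId, tokens.join(""));
    this.buffers.delete(requestId);
  }

  // Called when the browser reconnects after a refresh
  resume(requestId) {
    if (this.completed.has(requestId)) {
      return { status: "done", text: this.completed.get(requestId) };
    }
    if (this.buffers.has(requestId)) {
      // Replay what was missed; the client then keeps listening for live tokens
      return { status: "streaming", text: this.buffers.get(requestId).join("") };
    }
    // Still queued (no tokens yet) or an unknown ID
    return { status: "pending", text: "" };
  }
}
```

The sketch ignores the gap between replaying the buffer and picking up live tokens again; reading an ordered log such as a Redis stream by entry ID is one way to close that gap.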
Security and Reliability
When using LLMs in production, you have to be ready for both the expected and the unexpected. Laravel makes this easier with features already available out of the box:
- Per-user quotas can be enforced at the middleware level to prevent abuse.
- Rate limiting is straightforward using Laravel’s ThrottleRequests middleware.
- Prompt validation and sanitization should always be applied before sending requests to OpenAI or any LLM API, to protect both your costs and system integrity.
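In Laravel, rate limiting is usually a one-liner such as `->middleware('throttle:10,1')` on the generate route (10 requests per minute; the numbers are placeholders). To show what such a limit does, here is a fixed-window counter per user in plain JavaScript. It is a simplified model, not Laravel's implementation, and `FixedWindowLimiter` and its parameters are hypothetical:

```javascript
// Fixed-window limiter: at most `limit` requests per user in each `windowMs` window.
// Hypothetical illustration; in Laravel, the throttle middleware and RateLimiter handle this.
class FixedWindowLimiter {
  constructor(limit, windowMs, now = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for tests
    this.windows = new Map(); // userId -> { start, count }
  }

  // Returns true and counts the request if the user is under the limit, else false.
  tryAcquire(userId) {
    const t = this.now();
    const w = this.windows.get(userId);
    if (!w || t - w.start >= this.windowMs) {
      // No window yet, or the previous one expired: start a fresh window
      this.windows.set(userId, { start: t, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false; // the caller would respond with HTTP 429 Too Many Requests
  }
}
```

Each user gets an independent counter, so one heavy user exhausting their window does not affect anyone else.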
These guardrails ensure that no single user or client request drains resources or introduces risk.
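What counts as a valid prompt depends on the product, so the checks below are placeholders rather than recommendations: reject non-string and empty input, strip control characters, and cap the length to bound token spend. A minimal sketch in JavaScript; `validatePrompt` and `MAX_PROMPT_CHARS` are hypothetical, and on the Laravel side the same rules would normally live in a form request or a `$request->validate()` call:

```javascript
// Hypothetical limit; pick one that matches your model's context window and budget.
const MAX_PROMPT_CHARS = 4000;

// Returns { ok: true, prompt } with a cleaned prompt, or { ok: false, error }.
function validatePrompt(input) {
  if (typeof input !== "string") {
    return { ok: false, error: "prompt must be a string" };
  }
  // Remove ASCII control characters except tab, newline, and carriage return, then trim
  const cleaned = input.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "").trim();
  if (cleaned === "") {
    return { ok: false, error: "prompt is empty" };
  }
  // Length is measured in UTF-16 code units, a rough proxy for cost
  if (cleaned.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }
  return { ok: true, prompt: cleaned };
}
```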
Conclusion
By combining queues, Redis streaming, authenticated SSE, and MySQL persistence, the backend runs AI tasks independently of the user interface. The design holds up under network instability and unpredictable user behavior: the backend handles scaling, recovery, and reliability in the background, while users see a stable and secure frontend. The result is a consistent, reliable user experience even when production conditions are not.
Opinions expressed by DZone contributors are their own.