A Backend-First Approach to Production-Scale LLM Applications

A backend-first AI design with Laravel, queues, Redis, SSE, and MySQL powers resilience, scalability, and uninterrupted user experiences.

Kolawole Yusuf

Sep. 19, 25 · Analysis

A few months ago, I launched the first version of my platform, which operated without AI functionality. It worked well for its initial purpose, but I knew it could do more. A few weeks ago, I rolled out version two, this time with large language models (LLMs) as its core component. It was designed to operate through a structured workflow in which the frontend sends requests to the backend, where the platform applies business logic before accessing OpenAI's API to generate responses. All operations performed as expected during controlled testing sessions. As more people started using the platform, new problems appeared. These were mostly caused by user actions and factors such as slow internet, accidental browser refreshes, and other interruptions that affected the user experience.

Users will always do unexpected things in production, and not all of it is their fault. I had to accept that and find a way for the platform to handle these hiccups smoothly. The solution was to add safeguards, a safety net to catch problems and keep the system running gracefully. I redesigned the platform, putting the backend at the center of all large language model operations.

This made the platform more efficient. Even if the frontend had issues, AI processes continued without interruption. The backend became the hub, and everything else fell into place, ensuring the platform would run reliably no matter what the user did.

Implementation Walkthrough

The backend is built on Laravel, with Redis and MySQL supporting queued jobs that run in the background. Each incoming request is dispatched as a job; the job generates the content, streams it back, and stores the final output in the database. Because the work runs in a queue worker rather than inside the HTTP request, a dropped connection does not interrupt the generation or lose its result.

Laravel makes queue management straightforward. You can use either the Redis or the database driver, but Redis is usually the better fit for LLM workloads because it handles high job volumes with less overhead than polling a database table. The .env configuration is set up like this:

    Shell

    QUEUE_CONNECTION=redis

A worker process then handles queued jobs:

    Shell

    php artisan queue:work --queue=new_llm_requests

This ensures that the system maintains organized, scalable request handling even when multiple users send requests simultaneously.

Defining a Job for LLM Requests

Each LLM request runs inside a Laravel Job, which keeps the work isolated and makes it easy to handle retries if something goes wrong.

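    PHP

    use App\Models\Content; // assumed location of the Content model
    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;
    use Illuminate\Queue\InteractsWithQueue;
    use Illuminate\Queue\SerializesModels;
    use Illuminate\Support\Facades\Redis;

    class GenerateContent implements ShouldQueue
    {
        use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

        public function __construct(
            public string $prompt,
            public string $requestId
        ) {}

        public function handle(): void
        {
            // Call OpenAI in stream mode. Note: once the configuration is cached,
            // env() may return null outside config files, so production code
            // should read the key from a config file instead.
            $client = OpenAI::client(env('OPENAI_API_KEY'));

            $stream = $client->chat()->createStreamed([
                'model' => 'gpt-4o-mini',
                'messages' => [['role' => 'user', 'content' => $this->prompt]],
            ]);

            $fullOutput = '';

            foreach ($stream as $event) {
                if (isset($event['choices'][0]['delta']['content'])) {
                    $token = $event['choices'][0]['delta']['content'];

                    // Accumulate tokens so the complete output can be stored
                    $fullOutput .= $token;

                    // Push token into Redis pub/sub channel
                    Redis::publish("stream:{$this->requestId}", $token);
                }
            }

            // Store final output in MySQL
            Content::create([
                'request_id' => $this->requestId,
                'prompt'     => $this->prompt,
                'output'     => $fullOutput,
            ]);
        }
    }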

Stream mode lets the job forward generated tokens from OpenAI as soon as they become available, instead of waiting for the complete response.

Dispatch Job From Controller

After receiving a request from the user, the controller dispatches the job and returns a unique request ID:

    PHP

    public function generate(Request $request)
    {
        $prompt = $request->input('prompt');
        $requestId = uniqid('req_', true);

        dispatch((new GenerateContent($prompt, $requestId))->onQueue('new_llm_requests'));

        return response()->json(['requestId' => $requestId]);
    }

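This request ID is the handle the frontend uses to subscribe to the results. Because uniqid() values are unique but predictable, the stream endpoint still has to verify that the caller owns the request.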

SSE Endpoint for Streaming

Laravel then relays the Redis pub/sub messages to the frontend over Server-Sent Events (SSE). The browser receives each token as soon as it is published, without polling, which suits applications that need a continuous live feed.

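    PHP

    public function stream($requestId)
    {
        return response()->stream(function () use ($requestId) {
            Redis::subscribe(["stream:{$requestId}"], function ($message) {
                // A token may contain newlines; SSE needs one "data:" line per
                // line of payload, otherwise the blank line ends the event early.
                $lines = preg_split('/\r\n|\r|\n/', $message);
                echo 'data: ' . implode("\ndata: ", $lines) . "\n\n";

                // ob_flush() raises a notice when no output buffer is active
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            });
        }, 200, [
            'Content-Type' => 'text/event-stream',
            'Cache-Control' => 'no-cache',
            'Connection' => 'keep-alive',
        ]);
    }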

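This keeps the connection open and pushes each token to the browser as it arrives. As written, Redis::subscribe blocks indefinitely, so a production version also needs a completion signal that ends the subscription once the job has finished.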
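On the wire, each SSE event is a group of `data:` lines terminated by a blank line, and the client joins those lines with newlines before handing the result to `onmessage`. The following is a simplified sketch, in JavaScript to match the frontend code, of how one raw event block becomes `event.data`. The function name `parseSseData` is hypothetical, and the sketch deliberately ignores the `event:`, `id:`, and `retry:` fields that a real client also handles.

```javascript
// Simplified sketch of how an SSE client turns one raw event block
// (the text between two blank lines) into the string seen as event.data.
// Not a full parser: event:, id:, retry: and comment lines are ignored.
function parseSseData(block) {
  const dataLines = [];
  for (const line of block.split(/\r\n|\r|\n/)) {
    if (line === 'data') {
      // A bare "data" field carries an empty value.
      dataLines.push('');
    } else if (line.startsWith('data:')) {
      let value = line.slice('data:'.length);
      // Exactly one leading space after the colon is stripped.
      if (value.startsWith(' ')) value = value.slice(1);
      dataLines.push(value);
    }
  }
  // Multiple data lines are joined with a newline.
  return dataLines.join('\n');
}
```

This is why a token that contains a newline has to be sent as several `data:` lines: written out raw, the newline inside the token would be read as the end of the event.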

Frontend Consumption and Auth

Consuming SSE on the frontend is simple, but authentication is not. The browser's standard EventSource does not let you set custom headers, which is a problem when a protected route expects Authorization: Bearer <token>.

If your platform uses cookie-based authentication, this is not an issue. But for token-based APIs, as used by many SPAs and mobile apps, it becomes a blocker. You can solve this with EventSourcePolyfill, from the event-source-polyfill package. It works like EventSource but lets you add custom headers, including Authorization: Bearer <token>. This way, the SSE connection does not have to rely on cookies or URL tokens:

    JavaScript

    import { EventSourcePolyfill } from 'event-source-polyfill';

    const evtSource = new EventSourcePolyfill(`/api/stream/${requestId}`, {
      headers: { Authorization: `Bearer ${token}` }
    });

    evtSource.onmessage = (event) => {
      document.getElementById('output').innerText += event.data;
    };

    evtSource.onerror = () => {
      evtSource.close();
    };

This small change makes authenticated streaming work smoothly. It ensures users stay in a secure session while still receiving real-time token streams. Without it, you would have to expose the endpoint or rely on cookie-based sessions, neither of which is ideal for modern distributed apps.
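The handler above closes the connection on the first error. On an unstable network it can be useful to retry a few times instead. The following is a minimal sketch of that idea, not part of the original implementation: `backoffDelay` and `connectWithRetry` are hypothetical helpers, and the delay values are arbitrary assumptions. The source factory is passed in, so the same logic works with `EventSourcePolyfill` or any object exposing `onmessage`, `onerror`, and `close()`.

```javascript
// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxMs (assumed values).
function backoffDelay(attempt, baseMs = 500, maxMs = 10000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens a stream via createSource() and re-opens it after errors,
// giving up after maxAttempts consecutive failures.
function connectWithRetry(createSource, onToken, options = {}) {
  const { maxAttempts = 5, schedule = setTimeout } = options;
  let attempt = 0;
  let source = null;
  let stopped = false;

  const open = () => {
    source = createSource();
    source.onmessage = (event) => {
      attempt = 0; // a delivered token means the connection is healthy again
      onToken(event.data);
    };
    source.onerror = () => {
      source.close();
      if (stopped || attempt >= maxAttempts) return;
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  };

  open();
  return {
    stop() {
      stopped = true;
      if (source) source.close();
    },
  };
}

// Possible usage with the polyfill from the snippet above:
// connectWithRetry(
//   () => new EventSourcePolyfill(`/api/stream/${requestId}`, {
//     headers: { Authorization: `Bearer ${token}` },
//   }),
//   (text) => { document.getElementById('output').innerText += text; }
// );
```

Keep in mind that Redis pub/sub does not replay messages, so tokens published while the client is disconnected are not delivered after a reconnect; the persisted output remains the source of truth for the complete text.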

Persisting to Database

Once generation is complete, results are saved to MySQL. This persistence layer adds resilience.

If a user refreshes their browser mid-stream, the system checks MySQL:

If the response is already finished, it loads instantly.
If it is still in progress, streaming resumes from Redis until completion.

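Redis carries the live tokens, while MySQL remains the authoritative datastore. Plain Redis pub/sub delivers only the tokens published after a client subscribes, so replaying the part of a response that was missed during a refresh requires keeping the partial output somewhere, for example in a Redis list or in the database row. With that in place, the two stores together prevent "lost generations."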
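The refresh behavior described above comes down to one decision on page load. The sketch below shows that decision as a pure function. It assumes a hypothetical status endpoint that returns either nothing or a record shaped like `{ status, output }` for a request ID; neither the endpoint nor the `status` field appears in the code shown earlier, so treat both as illustrative.

```javascript
// `record` is the hypothetical JSON returned by a status endpoint:
// null when nothing is stored yet, otherwise
// { status: 'finished' | 'in_progress', output: string }.
function decideRecovery(record) {
  if (record && record.status === 'finished') {
    // The generation is complete: render the stored text immediately.
    return { action: 'render', text: record.output };
  }
  // Not finished (or not stored yet): show whatever has been saved so far
  // and re-open the SSE stream to receive the remaining tokens.
  const saved = record && record.output ? record.output : '';
  return { action: 'stream', text: saved };
}
```

On page load, the frontend would fetch the record for the stored request ID, call `decideRecovery`, write `text` into the output element, and open the SSE connection only when `action` is `'stream'`.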

Security and Reliability

When using LLMs in production, you have to be ready for both the expected and the unexpected. Laravel makes this easier with features already available out of the box:

Per-user quotas can be enforced at the middleware level to prevent abuse.
Rate limiting is straightforward using Laravel’s ThrottleRequests middleware.
Prompt validation and sanitization should always be applied before sending requests to OpenAI or any LLM API, to protect both your costs and system integrity.

These guardrails ensure that no single user or client request drains resources or introduces risk.
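As an illustration of the validation step, here is a small sketch of a prompt pre-check, written in JavaScript to match the frontend code. The function name `checkPrompt`, the 4000-character limit, and the reason codes are assumptions made for this example. A client-side check like this only saves a round trip; the same rules still have to be enforced on the backend before the job is dispatched.

```javascript
// Hypothetical pre-check for a prompt before it is sent to the backend.
// The default limit of 4000 characters is an arbitrary assumption.
function checkPrompt(raw, maxChars = 4000) {
  if (typeof raw !== 'string') {
    return { ok: false, reason: 'not_a_string' };
  }
  // Drop ASCII control characters except tab, newline and carriage return,
  // then trim surrounding whitespace.
  const cleaned = raw
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '')
    .trim();
  if (cleaned.length === 0) {
    return { ok: false, reason: 'empty' };
  }
  if (cleaned.length > maxChars) {
    return { ok: false, reason: 'too_long' };
  }
  return { ok: true, prompt: cleaned };
}
```

A caller would send `result.prompt` only when `result.ok` is true and otherwise show a message based on `result.reason`.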

Conclusion

By combining queues, Redis streaming, authenticated SSE, and MySQL persistence, the backend executes AI tasks independently of the user interface. The design tolerates network instability and unpredictable user behavior: the backend handles scaling, recovery, and reliability in the background, while users see a stable and secure frontend. The result is a consistent experience even when production conditions are not.

Opinions expressed by DZone contributors are their own.
