Building an Accessibility-First AI Assistant With IBM Granite and RAG

Learn how to build a disability-aware AI assistant using the IBM Granite LLM and retrieval-augmented generation, with a FastAPI backend and adaptive response generation.

Lohitaksha Patary

Mar. 04, 26 · Analysis

This is a hands-on guide to creating adaptive, disability-aware interfaces using retrieval-augmented generation.

The Problem I Wanted to Solve

Last year, I watched my grandmother struggle at a bank kiosk. The screen was cluttered, the text was small, and she could not hear the audio prompts clearly. An employee eventually helped her, but she looked embarrassed, as if she had done something wrong by needing assistance.

That moment stuck with me.

I started noticing these barriers everywhere: airport check-in machines with tiny fonts, grocery self-checkout screens that assume everyone can see color-coded buttons, hospital registration systems drowning in medical jargon. These interfaces work fine for most people, but they quietly exclude millions of others: people with visual impairments, hearing difficulties, or cognitive disabilities, and those who are simply not comfortable with technology.

I am a computer science student, so I asked myself: what if AI could bridge this gap? Not by replacing human help, but by providing an adaptive layer that meets users where they are.

That question led me to build Granite Accessible Assistant, an open-source accessibility layer powered by IBM Granite LLM and retrieval-augmented generation.

What We Are Building and Why It Matters for Developers

This is not just another chatbot. It is a context-aware AI assistant designed specifically for accessibility scenarios. The system:

Retrieves verified information from a curated knowledge base to limit hallucinations
Adapts responses based on disability profiles (blind, deaf, and cognitive)
Supports 30-plus languages for inclusive global deployment
Runs locally without sending personal data to external servers

The tech stack includes Python FastAPI, FAISS, SentenceTransformers, React, and IBM Granite 4.0 Micro. Everything is open source and runs on modest hardware; no GPU cluster is required.

If you have been looking for a practical RAG implementation that solves a real problem, this is it.

The Architecture: How It Actually Works

Let me walk you through the system flow, then we will dive into the code.
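Before the step-by-step walkthrough, here is a rough sketch of how the four stages could be wired together. It is not taken from the project: `answer_query`, `AskResult`, and the injected callables are hypothetical names, and each stage is passed in as a function so the flow can be read (and tested) without loading a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AskResult:
    answer: str
    sources: List[str]  # retrieved passages, kept so answers stay traceable


def answer_query(
    query: str,
    resolve_profile: Callable[[Optional[str]], str],
    retrieve: Callable[[str], List[str]],
    build_prompt: Callable[..., str],
    generate: Callable[[str], str],
    disability: Optional[str] = None,
    language: Optional[str] = None,
) -> AskResult:
    """Run the four stages in order; every stage is supplied by the caller."""
    instruction = resolve_profile(disability)      # Step 1: profile resolution
    docs = retrieve(query)                         # Step 2: RAG retrieval
    prompt = build_prompt(                         # Step 3: prompt construction
        docs, query, disability=instruction, language=language
    )
    answer = generate(prompt)                      # Step 4: response generation
    return AskResult(answer=answer.strip(), sources=docs)
```

Returning the retrieved passages next to the answer is a design choice made for this sketch; it keeps every response traceable to its source text.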

Here is what happens when a user asks, "How do I check in for my flight?"

Step 1: Profile Resolution

The user selects their accessibility preferences upfront. For a blind user, the system loads this instruction:

    Python

DISABILITY_PROFILES = {
    "blind": {
        "instruction": (
            "Respond with clear, step by step verbal instructions. "
            "Avoid visual references like click here or see above."
        )
    },
    "deaf": {
        "instruction": (
            "Respond using clear, concise text. "
            "Avoid audio dependent explanations."
        )
    },
    "cognitive": {
        "instruction": (
            "Use simple language. Break information into short steps. "
            "Avoid technical jargon."
        )
    }
}
  

This is intentionally simple. Complex disability taxonomies would slow down the interface and confuse users. Three clear options cover the majority of use cases.
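A lookup over this table has to cope with users who pick no profile at all. The sketch below shows one way to do that; `resolve_instruction` is a hypothetical helper, not part of the project, and the profile table is a trimmed copy so the snippet runs on its own. It assumes that `None`, `"none"`, and unknown keys should all mean "no extra instruction".

```python
from typing import Optional

# Trimmed copy of the profile table above, so the sketch is self-contained.
DISABILITY_PROFILES = {
    "blind": {"instruction": "Respond with clear, step by step verbal instructions."},
    "deaf": {"instruction": "Respond using clear, concise text."},
    "cognitive": {"instruction": "Use simple language. Break information into short steps."},
}


def resolve_instruction(profile_key: Optional[str]) -> str:
    """Return the instruction for a profile key, or "" when none applies.

    Assumption: None, "none", and unknown keys fall back to an empty
    instruction instead of raising an error.
    """
    if not profile_key:
        return ""
    profile = DISABILITY_PROFILES.get(profile_key.strip().lower())
    return profile["instruction"] if profile else ""
```

Failing soft keeps a kiosk usable even if the frontend sends a value the backend does not recognize.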

Step 2: RAG Retrieval

The query is converted into a vector embedding using SentenceTransformers, specifically all-MiniLM-L6-v2, which is small, fast, and good enough for semantic search. FAISS then finds the most relevant documents in the knowledge base.

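    Python

from sentence_transformers import SentenceTransformer


class RAGPipeline:
    def __init__(self):
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.documents = []
        self.index = None  # FAISS index; must be built before retrieve() is called

    def retrieve(self, query: str, k: int = 2):
        if self.index is None:
            raise RuntimeError("The FAISS index has not been built yet.")
        # FAISS expects a 2D float32 array of shape (n_queries, dim).
        query_embedding = self.embedder.encode([query]).astype("float32")
        distances, indices = self.index.search(query_embedding, k)
        # FAISS pads the result with -1 when fewer than k documents are indexed.
        return [self.documents[i] for i in indices[0] if i != -1]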
  

Why only retrieve two documents? Because more context does not always mean better answers. Too much context confuses the LLM and dilutes the response quality. For accessibility scenarios, precision beats volume.
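To see what `k` controls without installing FAISS or downloading an embedding model, top-k retrieval can be imitated with plain word counts and cosine similarity. This is a deliberately crude stand-in for the real pipeline, not the project's code: `retrieve_top_k` and its helpers are hypothetical, and word counts are far weaker than sentence embeddings.

```python
import math
import re
from collections import Counter
from typing import List


def _vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query, best match first."""
    q = _vectorize(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]


docs = [
    "If you have not checked in, use the airline counter or the self-check-in kiosk.",
    "Follow signs for Security to reach the screening area.",
    "Keep your ID and boarding pass ready before the checkpoint.",
]
top = retrieve_top_k("Where is the self-check-in kiosk?", docs, k=2)
```

Here `top` holds two passages with the kiosk sentence ranked first; raising `k` would only add passages that share little with the question.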

Step 3: Prompt Construction

This is where the magic happens. The prompt combines:

Grounding rules that force the model to use only the retrieved context
Disability-specific instructions
Language preferences
Formatting requirements (numbered steps, short sentences)

    Python

def build_rag_prompt(context_docs, user_query, disability=None, language=None):
    # Join the retrieved passages into a single context block.
    context = "\n\n".join(context_docs)
    # Render missing instructions as empty lines instead of the string "None".
    disability = disability or ""
    language = language or ""

    grounding = (
        "Grounding rules:\n"
        "- Use ONLY the context below\n"
        "- Do not add outside facts or assumptions\n"
        "- If the context does not contain the answer, reply exactly: I don't know\n"
    )

    formatting = (
        "Format your response for accessibility and clarity:\n"
        "- Start with a short title\n"
        "- Then give 3–6 numbered steps (Step 1, Step 2, …)\n"
        "- Keep sentences short; avoid dense paragraphs\n"
    )

    return f"""
You are an accessibility-focused AI assistant.
{grounding}

Additional Instructions:
{disability}
{language}

{formatting}

Context:
{context}

Question:
{user_query}

Answer:
"""
  

The grounding rules are critical. Without them, the model might hallucinate helpful-sounding but incorrect information. The explicit "I don't know" instruction discourages the model from guessing.
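Because the prompt asks for an exact fallback phrase, the backend can detect it and show something more useful than a bare refusal. The following is a minimal sketch under that assumption; `is_unanswered`, `finalize_reply`, and the fallback message are hypothetical and not part of the project.

```python
FALLBACK_SENTINEL = "I don't know"  # must match the wording in the grounding rules


def is_unanswered(model_reply: str) -> bool:
    """True when the reply is just the refusal phrase.

    Ignores case, surrounding whitespace, trailing "." or "!", and treats a
    curly apostrophe like a straight one.
    """
    normalized = model_reply.replace("\u2019", "'").strip().rstrip(".!").strip()
    return normalized.lower() == FALLBACK_SENTINEL.lower()


def finalize_reply(model_reply: str, fallback_message: str) -> str:
    """Swap the bare refusal for a caller-supplied, more helpful message."""
    if is_unanswered(model_reply):
        return fallback_message
    return model_reply.strip()
```

For example, `finalize_reply("I don't know.", "Please ask a staff member for help.")` returns the fallback message, while a normal answer passes through unchanged.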

Step 4: Granite Response Generation

IBM Granite 4.0 Micro generates the final response. I chose this model because it is lightweight, open source, runs on a CPU, and follows instructions well.

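    Python

def generate_response(prompt: str, max_tokens: int = 200):
    # load_granite_model() is not shown here; it returns a (tokenizer, model) pair.
    tokenizer, model = load_granite_model()
    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
    )

    # generate() returns the prompt tokens followed by the new tokens,
    # so drop the prompt before decoding to avoid echoing it to the user.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)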
  

The Frontend: Accessible by Design

The React frontend supports kiosk-style deployment. Here is what makes it accessibility friendly:

High contrast colors and large touch targets: Users with low vision need clear visual boundaries. The interface uses bold colors and generous padding.
Minimal cognitive load: No nested menus or hidden options. Language and accessibility preferences appear upfront with clear labels.
30-plus language options: The system supports English, Spanish, Mandarin, Hindi, Arabic, French, German, Japanese, Korean, Bengali, Urdu, and many more. Language selection combines with disability profiles, so a Hindi-speaking user with cognitive needs gets simple Hindi responses.

    JavaScript

const DISABILITY_OPTIONS = [
  { value: 'none', label: 'None' },
  { value: 'blind', label: 'Blind / Low-vision' },
  { value: 'deaf', label: 'Deaf / Hard-of-hearing' },
  { value: 'cognitive', label: 'Cognitive-friendly' }
]
const LANGUAGE_OPTIONS = [
  { value: 'english', label: 'English' },
  { value: 'spanish', label: 'Spanish (Español)' },
  { value: 'mandarin', label: 'Mandarin (中文)' },
  { value: 'hindi', label: 'Hindi (हिन्दी)' },
  // ... 26 more languages
]
  

Why RAG Instead of Fine-Tuning?

You might wonder: why not just fine-tune a model on accessibility data?

Three reasons:

Control: With RAG, I know exactly what information the model draws from. If something is wrong, I update the knowledge base; no retraining is required.
Auditability: Every response can be traced back to specific documents. This is important for public-facing systems where accountability matters.
Resource efficiency: Fine-tuning requires GPUs, datasets, and expertise. RAG works with a handful of text files and runs on a laptop.

The knowledge base is intentionally simple: plain text files with verified information. For example:

Airport navigation assistance, general kiosk guidance
After arriving, check the nearest display for your flight number and gate.
If you have not checked in, use the airline counter or the self-check-in kiosk.
Follow signs for Security to reach the screening area.
Keep your ID and boarding pass ready before the checkpoint.

Anyone can read, edit, or add to these files. No machine learning expertise required.
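Loading such files takes only a few lines of standard-library Python. The sketch below is an illustration rather than the project's loader: `load_documents` is a hypothetical name, and treating each non-empty line as one retrievable passage is an assumption about how the text is chunked.

```python
from pathlib import Path
from typing import List


def load_documents(folder: str) -> List[str]:
    """Read every .txt file in `folder` and return its non-empty lines.

    Assumption: one line equals one passage. Files are read in sorted order
    so the resulting list (and any index built from it) is reproducible.
    """
    documents: List[str] = []
    for path in sorted(Path(folder).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                documents.append(line)
    return documents
```

The returned list is what would be embedded and indexed, so editing a text file and reloading is enough to change what the assistant can say.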

If You Want to Run It Yourself

Prerequisites

Python 3.9+
Node.js 18+ for the frontend
8 GB RAM minimum

Backend Setup

    Shell

git clone https://github.com/lohitaksha06/granite-accessible-rag.git
cd granite-accessible-rag
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The first run downloads the Granite model, about 500MB, and builds the FAISS index. Subsequent starts are faster.

Frontend Setup

    Shell

cd web
pnpm install  # or npm install
pnpm dev      # starts on localhost:5173

Testing the API directly:

    Shell

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I check in for my flight?",
    "disability": "blind",
    "language": "english"
  }'
  

Real-World Use Cases

This system could be deployed in several contexts:

Airport kiosks: Help travelers with disabilities navigate check-in, security, and boarding independently. The system provides step-by-step verbal instructions for blind users or simplified text for those with cognitive needs.
Bank branches: Assist customers with account tasks. Instead of waiting for staff, users get personalized guidance adapted to their communication needs.
Hospital registration: Medical environments are stressful. An accessibility-aware assistant reduces friction for patients who struggle with standard interfaces.
Grocery self-checkout: Guide users through payment and bagging with clear, non-visual instructions.
Government services: Public forms and processes explained in simple, multilingual formats.

What Lessons Have We Learned?

Keep disability profiles simple: Early versions had 10-plus disability options. Users got overwhelmed. Three clear categories work better than exhaustive taxonomies.
Grounding rules prevent hallucinations: Without an explicit "use only this context" instruction, the model tends to add plausible-sounding but incorrect details. Be aggressive about grounding.
Test with real accessibility needs: I consulted with users who have visual and cognitive impairments. Their feedback changed everything from font sizes to response length.
Local models matter: Privacy concerns are real, especially for disability-related interactions. Running Granite locally means no data leaves the device.

Extending the System

The architecture supports several extensions:

Sign language avatars: The frontend includes a customizable avatar component. Integrated with sign language animation libraries, it could render responses in ASL or ISL.
Voice input/output: Connect to speech-to-text and text-to-speech APIs for hands-free interaction.
Braille display support: Output formatting already uses structured text that maps well to Braille displays.
Additional knowledge domains: Drop new .txt files into the data/docs folder, restart the server, and the RAG pipeline indexes them automatically.

The Tech Stack at a Glance

Component	Technology	Why This Choice
Backend	FastAPI	Async-ready, automatic OpenAPI docs, easy to deploy.
LLM	IBM Granite 4.0 Micro	Open source, runs on a CPU, good instruction following.
Embeddings	SentenceTransformers all-MiniLM-L6-v2	Small footprint, fast inference, decent quality.
Vector Store	FAISS	Industry-standard, efficient similarity search.
Frontend	React + Vite	Fast development, good accessibility tooling.
Knowledge Base	Plain text files	Human-readable, easy to audit and update.

Total dependencies: 10 Python packages. No external API keys required for core functionality.

What Is Next?

I am working on:

Benchmark dataset: Standardized tests for accessibility-aware AI responses
Multi-modal input: Image-based queries for users who cannot type easily
Deployment guides: Docker containers and Kubernetes configs for production use
Community knowledge base: Crowdsourced accessibility information for more domains

Try It Yourself

The complete code is available on GitHub.

Star the repo if you find it useful. Pull requests are welcome, especially for new language profiles, knowledge documents, or accessibility improvements.

If you are building public-facing interfaces, consider who gets left out by your current design. Sometimes the most impactful code is not the most complex. It is the code that helps one more person use your system independently.
