Pragmatic Paths to On-Device AI on Android with ML Kit

Use ML Kit to add on-device AI (text recognition/OCR, barcode scanning, object and pose detection, translation) with simple Kotlin APIs: fast, offline, private.

By Mohan Sankaran · Jan. 12, 26 · Tutorial

There isn’t a single canonical way to add on-device AI to Android apps. Your ideal path depends on latency, privacy, UX, and maintainability. Google’s ML Kit gives you interchangeable building blocks — text recognition, barcode scanning, object/pose detection, translation, and more — that you can compose to fit your constraints. This guide lays out a pragmatic architecture, drop-in code, and a performance checklist you can ship in a sprint. The theme is intentional minimalism: pick one capability, wrap it behind a tiny interface, wire it to CameraX if needed, and iterate with metrics instead of speculative complexity.

When ML Kit Is the Smart Choice

  • On-device by default: You get low latency, offline reliability, and strong privacy because images and text don’t need to leave the device for common tasks. Keeping data local narrows the legal/compliance surface and removes the network tail latency that can frustrate users during capture flows.
  • Production-hardened models: The bundled models handle rotation, noise, motion blur, and imperfect lighting better than most “roll-your-own” attempts. You benefit from years of tuning without owning a training pipeline.
  • Modular adoption: Add exactly one capability at a time; you don’t need a model server, autoscaling, or a feature-flagged rollout of custom models. That simplicity keeps your blast radius small.
  • Great Android ergonomics: ML Kit works cleanly with CameraX, coroutines, and lifecycle components. That means less boilerplate and fewer foot-guns when you integrate with the camera stack, orientation changes, or backgrounding/foregrounding transitions.

Common wins:

  • Text Recognition for receipts, forms, and serials
  • Barcode Scanning for QR/retail codes, tickets, and boarding passes
  • Object Detection & Tracking for AR-lite highlights and tap-to-focus interactions
  • Pose Detection / Selfie Segmentation for fitness and background effects
  • Language ID + Translation for chat and travel scenarios

Project Setup (minimal friction)

app/build.gradle:

Groovy
 
dependencies {
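    // Choose only what you need. "latest-version" is a placeholder, not a valid
    // Gradle version: replace it with the current release of each artifact.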
    implementation "com.google.mlkit:text-recognition:latest-version"
    implementation "com.google.mlkit:barcode-scanning:latest-version"
    implementation "com.google.mlkit:object-detection:latest-version"

    // CameraX
    def camerax = "1.3.4"
    implementation "androidx.camera:camera-core:$camerax"
    implementation "androidx.camera:camera-camera2:$camerax"
    implementation "androidx.camera:camera-lifecycle:$camerax"
    implementation "androidx.camera:camera-view:$camerax"

    // Coroutines interop with Google Tasks
    implementation "org.jetbrains.kotlinx:kotlinx-coroutines-play-services:1.7.3"
}


Versioning tip: Use Gradle version catalogs and bump dependencies on a release train, not ad hoc.

Pattern 1: Still-Image Text Recognition (clean, testable)

Kotlin
 
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import kotlinx.coroutines.tasks.await

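class TextReader : AutoCloseable {
    // One recognizer per TextReader; reuse it across calls instead of recreating it.
    private val client = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    // Suspends until ML Kit finishes; await() rethrows any failure from the Task.
    suspend fun read(bitmap: Bitmap): String {
        // Rotation 0 assumes the caller passes an upright bitmap.
        val image = InputImage.fromBitmap(bitmap, 0)
        val result = client.process(image).await()
        return result.text.trim()
    }

    // Releases the recognizer's resources.
    override fun close() = client.close()
}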


Why this scales: Keep ML Kit behind a tiny API you can fake in tests. Normalize rotation at the boundary and return domain objects (e.g., ReceiptFields) rather than raw strings.
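
To make the "domain objects rather than raw strings" advice concrete, here is a minimal sketch of a parser that sits behind the reader. The article names ReceiptFields but does not define it, so the fields, the regular expressions, and the parseReceipt function below are all illustrative assumptions; real receipts vary far more than this handles.

```kotlin
// Hypothetical domain object; the shape is assumed for illustration.
data class ReceiptFields(val total: String?, val date: String?)

// Assumed formats: the word "total" followed by an amount with two decimals,
// and an ISO-style date (YYYY-MM-DD).
val totalPattern = Regex("""(?i)\btotal\b\D*(\d+[.,]\d{2})""")
val datePattern = Regex("""\b(\d{4}-\d{2}-\d{2})\b""")

// Pure function: raw OCR text in, parsed fields out. No ML Kit types leak through.
fun parseReceipt(rawText: String): ReceiptFields {
    val total = totalPattern.find(rawText)?.groupValues?.get(1)?.replace(',', '.')
    val date = datePattern.find(rawText)?.groupValues?.get(1)
    return ReceiptFields(total, date)
}

fun main() {
    val ocr = "ACME STORE\n2025-03-14\nSubtotal 10.00\nTOTAL: \$12,34\n"
    println(parseReceipt(ocr)) // ReceiptFields(total=12.34, date=2025-03-14)
}
```

Because the parser is a pure function, golden-input tests can assert on ReceiptFields without touching the camera or the recognizer.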

Pattern 2: Real-Time CameraX -> Analyzer (live capture)

Kotlin
 
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

class LiveTextAnalyzer(
    private val onText: (String) -> Unit
) : ImageAnalysis.Analyzer {

    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    // ImageProxy.image is an opt-in API; without this annotation lint reports an error.
    @androidx.annotation.OptIn(ExperimentalGetImage::class)
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image ?: return imageProxy.close()
        // Pass the frame's rotation so ML Kit sees upright text.
        val rotation = imageProxy.imageInfo.rotationDegrees
        val image = InputImage.fromMediaImage(mediaImage, rotation)

        recognizer.process(image)
            .addOnSuccessListener { onText(it.text) }
            // Runs on success and failure; CameraX delivers no further frames until the proxy is closed.
            .addOnCompleteListener { imageProxy.close() }
    }
}


UX polish that users feel: A framing hint (“Align the code inside the box”), a subtle haptic on success, and throttled overlay updates (e.g., 150–250 ms) to avoid flicker.
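
One way to throttle overlay updates is a small gate that lets a redraw through at most once per interval. The Throttle class below is a hypothetical helper, not part of ML Kit or CameraX; the clock is injected so the behavior can be checked without real time passing.

```kotlin
// Hypothetical helper: accepts at most one update per minIntervalMs.
class Throttle(
    private val minIntervalMs: Long,
    private val nowMs: () -> Long = { System.nanoTime() / 1_000_000 }
) {
    private var lastAcceptedMs: Long? = null

    /** Returns true if the caller should redraw now, false if this update should be dropped. */
    fun tryAcquire(): Boolean {
        val now = nowMs()
        val last = lastAcceptedMs
        if (last != null && now - last < minIntervalMs) return false
        lastAcceptedMs = now
        return true
    }
}

fun main() {
    var fakeClock = 0L
    val throttle = Throttle(minIntervalMs = 200, nowMs = { fakeClock })
    val accepted = listOf(0L, 50L, 199L, 200L, 320L, 400L).map { t ->
        fakeClock = t
        throttle.tryAcquire()
    }
    println(accepted) // [true, false, false, true, false, true]
}
```

In an analyzer callback, the overlay would be redrawn only when tryAcquire() returns true; dropped results are simply superseded by the next frame.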

Pattern 3: Object Detection & Tracking (multi-object, optional labels)

Kotlin
 
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

val detector = ObjectDetection.getClient(
    ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .enableMultipleObjects()
        .enableClassification() // coarse labels like "Food", "Home good"
        .build()
)


Draw rounded rectangles keyed by each object's tracking ID so users can see continuity across frames. Maintain a simple tracker map, from tracking ID to UI state, to manage per-object state such as selection or fade-out.
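
A tracker map can be as small as the sketch below. TrackerMap, OverlayState, and the maxMissedFrames eviction rule are assumptions made for illustration; the only input taken from ML Kit is the integer tracking ID of each detected object, which is nullable in the API, so filter out nulls before calling onFrame.

```kotlin
// Hypothetical per-object UI state; not an ML Kit type.
data class OverlayState(
    val firstSeenFrame: Int,
    var lastSeenFrame: Int,
    var selected: Boolean = false
)

class TrackerMap(private val maxMissedFrames: Int = 5) {
    private val states = mutableMapOf<Int, OverlayState>()
    private var frame = 0

    /** Call once per analyzed frame with the tracking IDs present in that frame. */
    fun onFrame(trackingIds: Collection<Int>): Map<Int, OverlayState> {
        frame++
        for (id in trackingIds) {
            val state = states.getOrPut(id) { OverlayState(frame, frame) }
            state.lastSeenFrame = frame
        }
        // Evict objects that have been missing for more than maxMissedFrames frames.
        states.entries.removeAll { frame - it.value.lastSeenFrame > maxMissedFrames }
        return states.toMap()
    }

    /** Flips the selection flag, e.g. when the user taps an object's box. */
    fun toggleSelected(id: Int) {
        states[id]?.let { it.selected = !it.selected }
    }
}

fun main() {
    val tracker = TrackerMap(maxMissedFrames = 1)
    tracker.onFrame(listOf(1, 2))          // frame 1: both objects appear
    tracker.toggleSelected(1)
    tracker.onFrame(listOf(1))             // frame 2: object 2 missed once, still kept
    val after = tracker.onFrame(listOf(1)) // frame 3: object 2 missed twice, evicted
    println(after.keys)         // [1]
    println(after[1]?.selected) // true
}
```

Tolerating a few missed frames before eviction keeps boxes from blinking when the detector briefly loses an object.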

Security, Privacy, and Accessibility (professional baseline)

  • Privacy UX: Place “Processed on your device; nothing uploaded” near the capture action (not buried in settings).
  • Permission education: Explain why you need camera access before the system dialog.
  • A11y: Announce detections via TalkBack, provide a manual capture button, respect reduced-motion, and avoid focus thrash.
  • Failure design: Time out gracefully, show a retry affordance, and debounce repeated attempts.

Testing & Observability (so it doesn’t regress)

  • Interfaces > implementations: Hide ML Kit behind Repository/UseCase ports and use fakes in unit tests.
  • Golden inputs: Keep a tiny suite of canonical images (good/low light, rotated, blurred). Assert on parsed fields, not raw strings.
  • Cold-start metrics: Track detector init, time-to-first-result, and analyzer throughput (p50/p95).
  • Sampled logs: Log consecutive failures and recovery; keep SLOs honest.
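
For the p50/p95 metrics, a nearest-rank percentile over recorded durations is often enough to start with. LatencyRecorder is a hypothetical helper; in a real app the samples would more likely be forwarded to whatever metrics pipeline you already use.

```kotlin
import kotlin.math.ceil

// Hypothetical in-memory recorder for analyzer or time-to-first-result latencies.
class LatencyRecorder {
    private val samplesMs = mutableListOf<Long>()

    fun record(durationMs: Long) {
        samplesMs += durationMs
    }

    /** Nearest-rank percentile for p in (0, 100]; null when nothing has been recorded. */
    fun percentile(p: Double): Long? {
        require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
        if (samplesMs.isEmpty()) return null
        val sorted = samplesMs.sorted()
        val rank = ceil(p / 100.0 * sorted.size).toInt()
        return sorted[rank - 1]
    }
}

fun main() {
    val recorder = LatencyRecorder()
    listOf(12L, 15L, 11L, 90L, 14L, 13L, 16L, 18L, 17L, 19L).forEach(recorder::record)
    println(recorder.percentile(50.0)) // 15
    println(recorder.percentile(95.0)) // 90
}
```

The example shows why p95 matters: the median looks healthy at 15 ms while a single 90 ms outlier dominates the tail.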

Performance Checklist (drop into your PR)

  • ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST to prevent frame backlogs.
  • Start/stop analyzers with lifecycle; no invisible background work.
  • Reuse recognizers/detectors; close them when screens disappear.
  • Downscale frames when 4K isn’t necessary for the task.
  • Throttle overlay updates; no heavy work on the main thread.
  • Back off on repeated failures (exponential or capped linear).
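
The last item, backing off on repeated failures, can be sketched as a capped exponential delay. The function name and the default base and cap values are assumptions chosen for illustration; production code would often add random jitter as well.

```kotlin
/** Delay before retry number `attempt` (1-based): baseMs * 2^(attempt - 1), capped at maxMs. */
fun backoffDelayMs(attempt: Int, baseMs: Long = 250, maxMs: Long = 8_000): Long {
    require(attempt >= 1) { "attempt is 1-based" }
    // Bound the shift so modest base values cannot overflow a Long.
    val shift = (attempt - 1).coerceAtMost(20)
    return (baseMs shl shift).coerceAtMost(maxMs)
}

fun main() {
    println((1..7).map { backoffDelayMs(it) })
    // [250, 500, 1000, 2000, 4000, 8000, 8000]
}
```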

Hybrid Approaches (when you need domain specificity)

There’s no rule that everything must be on-device. A pragmatic flow:

  1. Use ML Kit to quickly localize candidates on-device.
  2. With explicit consent, send cropped regions to a server model for high-recall verification.
  3. Cache results and translation packs so the user experience degrades gracefully offline.
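
The three steps above can be sketched as a small gate in front of the server call. Every name here (Candidate, HybridVerifier, hasConsent, remoteVerify) is hypothetical, and the remote model is represented by a plain function so the flow can be exercised without a network.

```kotlin
// Hypothetical cropped region produced by the on-device localization step.
class Candidate(val id: String, val croppedBytes: ByteArray)

class HybridVerifier(
    private val hasConsent: () -> Boolean,
    // Stand-in for the server model call; returns null when verification fails.
    private val remoteVerify: (Candidate) -> String?
) {
    private val cache = mutableMapOf<String, String>()

    fun verify(candidate: Candidate): String? {
        cache[candidate.id]?.let { return it }   // cached results keep working offline
        if (!hasConsent()) return null           // never upload without explicit consent
        val verdict = remoteVerify(candidate) ?: return null
        cache[candidate.id] = verdict
        return verdict
    }
}

fun main() {
    var remoteCalls = 0
    var consent = false
    val verifier = HybridVerifier(
        hasConsent = { consent },
        remoteVerify = { remoteCalls++; "verified:${it.id}" }
    )
    val crop = Candidate("serial-1", ByteArray(4))
    println(verifier.verify(crop)) // null
    consent = true
    println(verifier.verify(crop)) // verified:serial-1
    consent = false
    println(verifier.verify(crop)) // verified:serial-1
    println(remoteCalls)           // 1
}
```

Note the design choice in this sketch: a cached verdict is still served after consent is withdrawn, because nothing new leaves the device. If your policy requires it, clear the cache when consent changes.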

Takeaway

There are multiple valid paths to ship intelligent camera and language features on Android. ML Kit’s modular APIs let you choose the composition that fits your latency, privacy, and UX goals, without the drag of model hosting.

Start with one capability (text or barcodes), wrap it behind a clean use-case interface, wire up CameraX, and iterate with the checklist above. You’ll deliver meaningful AI in a single release cycle — safe, measurable, and maintainable.
