Migrating Legacy Microservices to Modern Java and TypeScript
Incremental strangler fig migration — containerize first, route traffic gradually, and validate with shadow mode testing.
"Modernize the legacy stack" is a phrase that strikes dread into every senior engineer's heart, and for good reason. Migration projects fail at a notoriously high rate: they balloon in scope, break running systems, and produce tech debt that rivals what they replaced. I led successful migrations of critical microservices to modern runtimes, containerized deployments, and event-driven architectures, delivered on time, without downtime, and with measurable gains in performance and reliability.
This article distills the frameworks, patterns, and hard lessons from those engagements into a practical guide for teams facing similar challenges.
Why Migrations Fail: The Common Traps
Before discussing what works, it's worth naming what doesn't:
- The Big Bang rewrite: Halting feature development to rebuild from scratch. Systems become outdated before they ship. Teams lose institutional knowledge. This almost always fails.
- The framework upgrade without architecture change: Upgrading Java 8 → Java 17 without rethinking the monolithic service structure just ships a faster monolith. The underlying problems remain.
- Ignoring the database layer: Migrating application services while leaving tightly coupled schemas in place creates a false sense of progress. The database becomes the new bottleneck.
- The missing Strangler Fig: Attempting to migrate everything simultaneously instead of routing traffic incrementally.
The pattern that works: incremental strangler fig migration with continuous deployment verification.
Phase 0: Characterize Before You Modernize
The first step — before writing a single line of new code — is deep characterization of the existing system.
Build a Dependency Map
# For Maven projects: visualize the dependency tree
mvn dependency:tree -Dverbose | grep -E "(INFO|WARN)" > dep-tree.txt
# For Node.js microservices: check for outdated dependencies
npm outdated
This analysis revealed 23 transitive dependencies that were unmaintained, 4 services using Spring Boot 1.5 (EOL), and 3 services sharing a database schema — a classic anti-pattern in microservice architectures.
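On the Node.js side, npm outdated --json prints an object keyed by package name, with current, wanted, and latest versions for each entry. A minimal sketch, assuming that output shape and plain x.y.z version strings, that ranks packages by how many major versions they trail; majorGap and OutdatedEntry are illustrative names, not part of npm:

```typescript
// Shape of one entry in `npm outdated --json` output (only the fields used here).
interface OutdatedEntry {
  current?: string; // absent when the package is declared but not installed
  wanted: string;
  latest: string;
}

// Extract the major component of a plain "x.y.z" version string (0 if unparseable).
function major(version: string): number {
  const n = Number.parseInt(version.split('.')[0], 10);
  return Number.isNaN(n) ? 0 : n;
}

// Hypothetical helper: return installed packages that trail `latest` by at least
// `minGap` major versions, largest gap first.
function majorGap(
  report: Record<string, OutdatedEntry>,
  minGap = 1,
): Array<{ name: string; current: string; latest: string; gap: number }> {
  return Object.entries(report)
    .filter(([, e]) => e.current !== undefined)
    .map(([name, e]) => ({
      name,
      current: e.current as string,
      latest: e.latest,
      gap: major(e.latest) - major(e.current as string),
    }))
    .filter((r) => r.gap >= minGap)
    .sort((a, b) => b.gap - a.gap);
}
```

Feeding it the parsed output of each service's npm outdated --json gives a per-service upgrade backlog; major-version gaps are usually where the breaking changes, and therefore the migration risk, live.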
Profile the Current System Under Load
You need a baseline to measure progress against. We captured:
- P50/P95/P99 response times per service endpoint
- Memory and CPU utilization under typical load
- Database query execution plans for the top 20 slowest queries
- Error rates and types by service
This data becomes your migration contract: the new system must at minimum match these metrics, and ideally exceed them.
// Capture response time metrics using a Node.js middleware
import { Request, Response, NextFunction } from 'express';
import { Histogram } from 'prom-client';
const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});
export function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.route?.path ?? req.path, status_code: res.statusCode });
  });
  next();
}
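The histogram above stores bucketed counts, so in Prometheus the P50/P95/P99 values come from histogram_quantile() over those buckets. When you only have raw latency samples (for example from a load-test run), the migration contract can be checked offline. A minimal sketch using the nearest-rank percentile definition; the tolerance parameter, which absorbs measurement noise, is an assumption rather than part of the original process:

```typescript
// Latency baseline for one endpoint, in seconds.
interface LatencyBaseline {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples: number[]): LatencyBaseline {
  return { p50: percentile(samples, 50), p95: percentile(samples, 95), p99: percentile(samples, 99) };
}

// The "migration contract": the candidate must not be slower than the baseline at any
// tracked percentile. Returns a human-readable list of violations (empty = contract met).
function violations(baseline: LatencyBaseline, candidate: LatencyBaseline, tolerance = 0): string[] {
  return (['p50', 'p95', 'p99'] as const)
    .filter((k) => candidate[k] > baseline[k] * (1 + tolerance))
    .map((k) => `${k}: ${candidate[k]}s exceeds baseline ${baseline[k]}s`);
}
```

Checking all three percentiles separately matters because a migration can improve the median while making the tail worse, which an average would hide.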
Phase 1: Containerize Without Changing Logic
The safest first migration step is containerizing existing services without changing their code. This gives you several advantages:
- Establishes Docker/Kubernetes as the deployment standard
- Removes environment-specific configuration from the application
- Exposes hidden environment dependencies (hardcoded paths, implicit file system assumptions)
- Lets the team practice the deployment pipeline before the high-risk code changes
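Removing environment-specific configuration usually means the service reads every setting from environment variables injected by the container platform, and refuses to start if one is missing. A minimal sketch for a Node.js service; the variable names (PORT, DATABASE_URL, STORAGE_DIR) and the ServiceConfig shape are hypothetical:

```typescript
// Typed configuration for a hypothetical service; every value comes from the environment.
interface ServiceConfig {
  port: number;
  databaseUrl: string;
  storageDir: string;
}

// Read settings from the environment and fail fast, reporting every missing or
// malformed variable at once instead of crashing later on the first bad lookup.
function loadConfig(env: Record<string, string | undefined>): ServiceConfig {
  const errors: string[] = [];
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
      errors.push(`${name} is not set`);
      return '';
    }
    return value;
  };

  const portRaw = env.PORT ?? '8080'; // default matches the port the container exposes
  const port = Number.parseInt(portRaw, 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    errors.push(`PORT is invalid: ${portRaw}`);
  }

  const config: ServiceConfig = {
    port,
    databaseUrl: required('DATABASE_URL'),
    storageDir: required('STORAGE_DIR'), // replaces a formerly hardcoded filesystem path
  };
  if (errors.length > 0) throw new Error(`Invalid configuration: ${errors.join('; ')}`);
  return config;
}
```

At startup a service would call loadConfig(process.env). Collecting all errors before throwing means a misconfigured Deployment fails with one readable message on its first rollout, and a former hardcoded path shows up as a missing variable rather than a runtime file-not-found.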
Multi-Stage Dockerfile for Spring Boot
# Stage 1: Build
FROM maven:3.9.4-eclipse-temurin-17 AS build
WORKDIR /app
COPY pom.xml .
# Cache dependencies separately from source code
RUN mvn dependency:go-offline -q
COPY src ./src
RUN mvn clean package -DskipTests
# Stage 2: Runtime
FROM eclipse-temurin:17-jre-jammy AS runtime
WORKDIR /app
# Run as non-root user
RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser
USER appuser
COPY --from=build /app/target/*.jar app.jar
# JVM tuning for containerized environments
ENV JAVA_OPTS="-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-XX:+UseG1GC \
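-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp"
EXPOSE 8080
# Run through sh -c so $JAVA_OPTS is expanded; exec makes java PID 1 so it receives SIGTERM
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]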
Critical flags explained:
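-XX:+UseContainerSupport: tells the JVM to size itself from the container's cgroup limits rather than the host's total RAM. It has been on by default since JDK 10 (and JDK 8u191), so here it documents intent. Before container awareness existed, a JVM in a 2GB container on a 64GB host would set its default max heap to 25% of the host's RAM (16GB) and be OOM-killed long before reaching it.
-XX:MaxRAMPercentage=75.0: caps the heap at 75% of the container's memory limit, leaving the rest for metaspace, thread stacks, and other native memory.
-XX:+UseG1GC: selects G1 explicitly. Without it, the JVM falls back to the Serial collector when it sees fewer than 2 CPUs or less than about 1792MB of memory, which is common under small container limits.
-XX:+HeapDumpOnOutOfMemoryError: writes a heap dump file on OOM for post-mortem analysis.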
Kubernetes Deployment With Resource Limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: document-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: document-service
  template:
    metadata:
      labels:
        app: document-service # must match spec.selector.matchLabels
    spec:
      containers:
        - name: document-service
          image: registry.internal/document-service:1.0.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 20
          env:
            - name: SPRING_DATASOURCE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url # hypothetical key name inside the db-credentials Secret
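With -XX:MaxRAMPercentage=75.0 from the Dockerfile and the 1Gi memory limit above, the heap ceiling works out to roughly 768 MiB, leaving about 256 MiB for metaspace, thread stacks, and other native memory. A small sketch that makes this arithmetic explicit; it parses only a subset of the Kubernetes quantity syntax and ignores the JVM's own alignment rounding, so treat the result as an estimate:

```typescript
// Binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes used in Kubernetes memory quantities.
const UNITS: Record<string, number> = {
  Ki: 1024, Mi: 1024 ** 2, Gi: 1024 ** 3, Ti: 1024 ** 4,
  k: 1e3, M: 1e6, G: 1e9, T: 1e12, '': 1,
};

// Parse a memory quantity such as "512Mi" or "1Gi" into bytes (subset of the full grammar).
function parseQuantity(q: string): number {
  const m = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$/.exec(q.trim());
  if (!m) throw new Error(`Unsupported quantity: ${q}`);
  return Number(m[1]) * UNITS[m[2] ?? ''];
}

// Approximate max heap derived from -XX:MaxRAMPercentage and the container memory limit,
// plus what remains for non-heap memory.
function heapBudget(limit: string, maxRamPercentage: number): { heapMiB: number; nonHeapMiB: number } {
  const limitBytes = parseQuantity(limit);
  const heap = Math.floor(limitBytes * (maxRamPercentage / 100));
  const MiB = 1024 ** 2;
  return { heapMiB: heap / MiB, nonHeapMiB: (limitBytes - heap) / MiB };
}
```

If the non-heap remainder looks too small for the service's thread count or direct buffers, lowering the percentage or raising the memory limit are the two levers.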
Phase 2: The Strangler Fig Pattern in Practice
The Strangler Fig pattern, named after the fig tree that grows around a host tree and gradually replaces it, is a proven approach for low-risk, large-scale migration: the old system keeps serving traffic while the new one takes over one route at a time.
The Routing Proxy
We deployed an NGINX proxy in front of all legacy services. New endpoints are progressively routed to the new service; legacy endpoints remain on the old system until they are fully replaced and validated.
upstream legacy_document_service {
    server legacy-docs:8080;
}
upstream new_document_service {
    server new-docs:8080;
}
# Deterministic ~5% split keyed on the request ID (split_clients lives in the http context)
split_clients "${request_id}" $generate_upstream {
    5%    new_document_service;
    *     legacy_document_service;
}
server {
    # Canary routing for validation: ~5% of generate requests go to the new service.
    # An exact-match location takes precedence over prefix locations, so it is checked first.
    location = /api/v1/documents/generate {
        proxy_pass http://$generate_upstream;
    }
    location /api/v1/documents/ {
        # Legacy routes still served by old service
        proxy_pass http://legacy_document_service;
    }
    location /api/v2/documents/ {
        # New endpoints served by migrated service
        proxy_pass http://new_document_service;
    }
}
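NGINX's split_clients hashes its key with MurmurHash2 and maps the result onto percentage buckets. The same idea can live in application code, for example in a Node.js gateway: hash a stable routing key into one of 100 buckets and compare the bucket with the rollout percentage. A minimal sketch; FNV-1a is a stand-in chosen for brevity, so its bucket assignments will not match NGINX's:

```typescript
// 32-bit FNV-1a hash over UTF-16 code units (adequate for ASCII IDs).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

type Upstream = 'legacy' | 'new';

// Map the routing key to a bucket in [0, 100); buckets below `percent` go to the new service.
function chooseUpstream(routingKey: string, percent: number): Upstream {
  return fnv1a(routingKey) % 100 < percent ? 'new' : 'legacy';
}
```

Keying on a user or account ID instead of a per-request ID keeps each user on one implementation across requests. Because buckets are fixed, raising the percentage from 5 to 10 only adds keys to the new service; nothing already routed there moves back.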
Shadow Mode Testing
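Before cutting over a migrated endpoint, we ran it in shadow mode: each request was sent to both the old and the new service, but only the old service's response was returned to the client. The call to the new service ran in the background, off the response path, and we logged and compared both responses. Because every shadowed request executes twice, this is only safe for read-only endpoints or when the new service writes to an isolated store.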
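// Shadow mode middleware for validation
import axios, { AxiosResponse } from 'axios';
import { Request, Response, NextFunction } from 'express';
// Assumed helpers defined elsewhere: buildLegacyRequest, buildNewServiceRequest,
// isEquivalentResponse, metrics, logger
// Accept every HTTP status so 4xx/5xx responses are passed through and compared, not thrown
const acceptAnyStatus = () => true;
export async function shadowTest(req: Request, res: Response, next: NextFunction) {
  let legacyResponse: AxiosResponse;
  try {
    // Send request to legacy system; its response is what the client receives
    legacyResponse = await axios({ ...buildLegacyRequest(req), validateStatus: acceptAnyStatus });
  } catch (err) {
    return next(err); // network-level failure of the legacy call
  }
  // Asynchronously compare with new service (fire-and-forget)
  shadowCompare(req, legacyResponse).catch((err) =>
    logger.warn('Shadow test failed', { path: req.path, error: err.message })
  );
  // Return legacy response to client
  res.status(legacyResponse.status).json(legacyResponse.data);
}
async function shadowCompare(req: Request, legacyResponse: AxiosResponse) {
  const newResponse = await axios({ ...buildNewServiceRequest(req), validateStatus: acceptAnyStatus });
  const match =
    legacyResponse.status === newResponse.status &&
    isEquivalentResponse(legacyResponse.data, newResponse.data);
  await metrics.record({
    endpoint: req.path,
    match,
    legacyDuration: legacyResponse.headers['x-response-time'],
    newDuration: newResponse.headers['x-response-time'],
  });
}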
This approach let us identify 14 behavioral discrepancies in the new service before any real traffic hit it — issues that would have been production incidents under a hard cutover.
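The shadow middleware delegates the comparison to isEquivalentResponse, which is not shown. Byte-for-byte equality is usually too strict, since generated IDs and timestamps legitimately differ between two executions. A minimal sketch that compares JSON structurally while skipping a set of volatile fields; the field names in VOLATILE_FIELDS are hypothetical:

```typescript
// Fields expected to differ between two executions (hypothetical examples).
const VOLATILE_FIELDS = new Set(['requestId', 'generatedAt', 'timestamp', 'traceId']);

// Structural comparison of two JSON values, ignoring volatile fields at any depth.
// Object key order is irrelevant; array order is significant.
function isEquivalentResponse(a: unknown, b: unknown, ignore: Set<string> = VOLATILE_FIELDS): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => isEquivalentResponse(item, b[i], ignore));
  }
  const objA = a as Record<string, unknown>;
  const objB = b as Record<string, unknown>;
  const keysA = Object.keys(objA).filter((k) => !ignore.has(k)).sort();
  const keysB = Object.keys(objB).filter((k) => !ignore.has(k)).sort();
  if (keysA.length !== keysB.length || keysA.some((k, i) => k !== keysB[i])) return false;
  return keysA.every((k) => isEquivalentResponse(objA[k], objB[k], ignore));
}
```

Each field added to the ignore set hides a potential discrepancy, so that list is worth keeping short and reviewed.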
Phase 3: Database Decoupling
The trickiest part of microservice migration is the database. Three services shared a single PostgreSQL schema. Decoupling them required the following sequence:
1. Introduce an Anti-Corruption Layer (ACL)
Before splitting the schema, each service accesses the shared database through a dedicated adapter module. This creates a seam for future extraction.
// Before: Direct shared DB access
const user = await db.query('SELECT * FROM shared.users WHERE id = $1', [userId]);
// After: Routed through ACL
import { UserRepository } from '@domain/users/repository';
const user = await userRepository.findById(userId);
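The repository behind userRepository is where the anti-corruption layer lives: it is the only code that knows which schema and column names hold user data. A minimal sketch, assuming a query(sql, params) interface shaped like node-postgres's and hypothetical column names:

```typescript
// Minimal query interface, shaped like node-postgres' query(text, values) (assumption).
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>;
}

// Domain type exposed to callers; column names never leak past the repository.
interface User {
  id: string;
  email: string;
  displayName: string;
}

// Row shape in the shared table (hypothetical columns).
interface UserRow {
  id: string;
  email: string;
  display_name: string;
}

class UserRepository {
  private readonly db: Queryable;
  private readonly table: string; // trusted configuration, never request input

  constructor(db: Queryable, table = 'shared.users') {
    this.db = db;
    this.table = table;
  }

  async findById(id: string): Promise<User | null> {
    const { rows } = await this.db.query(
      `SELECT id, email, display_name FROM ${this.table} WHERE id = $1`,
      [id],
    );
    return rows.length === 0 ? null : this.toDomain(rows[0] as UserRow);
  }

  // Translate the storage representation into the domain model.
  private toDomain(row: UserRow): User {
    return { id: row.id, email: row.email, displayName: row.display_name };
  }
}
```

When users later move to a schema owned by one service, the change is confined to this class (or its table argument); callers keep working with the User domain type.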
2. Schema Versioning With Flyway
Every schema change goes through Flyway migrations, versioned and reviewed as code:
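-- V2.1.3__extract_document_metadata_to_service_schema.sql
-- Create the new schema owned by document-service
CREATE SCHEMA IF NOT EXISTS document_service;
-- Create an empty table with the same columns (rows are backfilled below)
CREATE TABLE document_service.metadata AS
SELECT id, document_id, created_at, created_by, file_size
FROM shared.document_metadata
WITH NO DATA;
-- Add constraints to new table
-- (assumes document_service.documents was created by an earlier migration)
ALTER TABLE document_service.metadata
ADD CONSTRAINT pk_metadata PRIMARY KEY (id),
ADD CONSTRAINT fk_document FOREIGN KEY (document_id)
REFERENCES document_service.documents(id);
-- Dual-write trigger during migration window (dropped after cutover)
CREATE OR REPLACE FUNCTION sync_metadata_to_new_schema()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO document_service.metadata (id, document_id, created_at, created_by, file_size)
  VALUES (NEW.id, NEW.document_id, NEW.created_at, NEW.created_by, NEW.file_size)
  ON CONFLICT (id) DO UPDATE
  SET document_id = EXCLUDED.document_id,
      created_at = EXCLUDED.created_at,
      created_by = EXCLUDED.created_by,
      file_size = EXCLUDED.file_size;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Install the trigger before backfilling so no write falls between the copy and the trigger
CREATE TRIGGER sync_metadata
AFTER INSERT OR UPDATE ON shared.document_metadata
FOR EACH ROW EXECUTE FUNCTION sync_metadata_to_new_schema(); -- PostgreSQL 11+
-- Backfill existing rows (non-destructive: the shared table is untouched)
INSERT INTO document_service.metadata (id, document_id, created_at, created_by, file_size)
SELECT id, document_id, created_at, created_by, file_size
FROM shared.document_metadata
ON CONFLICT (id) DO NOTHING;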
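The dual-write trigger keeps the new table in sync with inserts and updates on the shared table during the migration window. Because the shared table remains the source of truth until cutover, rolling back is a matter of pointing document-service back at it. Deletes are not propagated by an INSERT OR UPDATE trigger, so tables that see deletes need an additional AFTER DELETE handler.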
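Before dropping the trigger at cutover, it is worth confirming that both tables actually agree. A minimal sketch of a parity check over rows read from shared.document_metadata and document_service.metadata; the MetadataRow subset and numeric IDs are assumptions (node-postgres, for example, returns BIGINT columns as strings):

```typescript
// Columns that must match after migration (hypothetical subset).
interface MetadataRow {
  id: number;
  document_id: number;
  file_size: number;
}

interface ParityReport {
  missingInNew: number[]; // in the shared table but not copied
  extraInNew: number[];   // in the new table only (e.g. deletes that were not propagated)
  mismatched: number[];   // present in both with differing values
}

function compareSchemas(oldRows: MetadataRow[], newRows: MetadataRow[]): ParityReport {
  const newById = new Map(newRows.map((r) => [r.id, r] as const));
  const oldIds = new Set(oldRows.map((r) => r.id));
  const report: ParityReport = { missingInNew: [], extraInNew: [], mismatched: [] };
  for (const row of oldRows) {
    const copy = newById.get(row.id);
    if (!copy) {
      report.missingInNew.push(row.id);
    } else if (copy.document_id !== row.document_id || copy.file_size !== row.file_size) {
      report.mismatched.push(row.id);
    }
  }
  for (const row of newRows) {
    if (!oldIds.has(row.id)) report.extraInNew.push(row.id);
  }
  return report;
}
```

For large tables the same comparison is usually run per ID range, or on per-range checksums computed in SQL, rather than by loading every row into memory.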
Phase 4: Event-Driven Decoupling With Kafka
Synchronous service-to-service HTTP calls were causing cascading failures. When the quote-pricing service had elevated latency, the entire quote journey degraded. The solution: replace synchronous calls with asynchronous events via Kafka.
Before: Synchronous Chain
Quote Request → QuoteService → [HTTP] → PricingService → [HTTP] → EligibilityService
A 2-second latency spike in the EligibilityService propagated the full 2 seconds to the user.
After: Event-Driven Quote Journey
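// Assumed: kafkaProducer, pricingConsumer and eligibilityConsumer are connected kafkajs clients
// (each consumer runs in its own service and consumer group); helpers are defined elsewhere.
// QuoteService publishes an event and returns immediately
async function initiateQuote(request: QuoteRequest): Promise<QuoteAcknowledgement> {
  const quoteId = generateQuoteId();
  await kafkaProducer.send({
    topic: 'quote.initiated',
    messages: [{
      key: quoteId,
      value: JSON.stringify({ quoteId, ...request, timestamp: Date.now() }),
    }],
  });
  // Return immediately: processing is async
  return { quoteId, status: 'processing', estimatedCompletion: Date.now() + 3000 };
}
// PricingService consumes quote.initiated and publishes quote.priced
await pricingConsumer.subscribe({ topics: ['quote.initiated'] });
await pricingConsumer.run({
  eachMessage: async ({ message }) => {
    if (!message.value) return; // skip empty payloads
    const event = JSON.parse(message.value.toString());
    const price = await calculatePrice(event);
    await kafkaProducer.send({
      topic: 'quote.priced',
      messages: [{ key: event.quoteId, value: JSON.stringify({ ...event, price }) }],
    });
  },
});
// EligibilityService consumes quote.priced and publishes quote.ready
await eligibilityConsumer.subscribe({ topics: ['quote.priced'] });
await eligibilityConsumer.run({
  eachMessage: async ({ message }) => {
    if (!message.value) return;
    const event = JSON.parse(message.value.toString());
    const eligibility = await checkEligibility(event);
    await kafkaProducer.send({
      topic: 'quote.ready',
      messages: [{ key: event.quoteId, value: JSON.stringify({ ...event, eligibility }) }],
    });
  },
});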
The client receives a quote ID immediately and polls for completion (or receives a WebSocket push when quote.ready fires). EligibilityService latency no longer affects the response time of the initial request, although it still determines when the finished quote becomes available.
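On the client side, polling needs a deadline so that a stalled pipeline surfaces as an error rather than an endless spinner. A minimal sketch; fetchStatus stands in for whatever call reads the quote's state (a hypothetical GET on the quote resource, say), and the 'ready' and 'failed' status values are assumptions alongside the 'processing' status returned by initiateQuote:

```typescript
type QuoteStatus = {
  quoteId: string;
  status: 'processing' | 'ready' | 'failed';
  [key: string]: unknown;
};

// Poll until the quote leaves "processing", or throw once the deadline would be exceeded.
// fetchStatus is injected so the loop does not depend on a particular HTTP client.
async function pollUntilReady(
  fetchStatus: () => Promise<QuoteStatus>,
  { intervalMs = 500, timeoutMs = 10_000 }: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<QuoteStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await fetchStatus();
    if (current.status !== 'processing') return current; // ready or failed: stop polling
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Quote ${current.quoteId} still processing after ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The estimatedCompletion value returned by initiateQuote is a natural input for the first delay or for the timeout.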
Kafka Consumer Error Handling With Dead Letter Queue
await consumer.run({
  eachMessage: async ({ topic, message }) => {
    try {
      await processMessage(message);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      const retryCount = parseInt(message.headers?.['retry-count']?.toString() ?? '0', 10);
      // Failures on a ".retry" topic go back to the same ".retry" topic, not ".retry.retry"
      const baseTopic = topic.replace(/\.retry$/, '');
      if (retryCount < MAX_RETRIES) {
        // Publish to retry topic with exponential backoff metadata (1s, 2s, 4s, ...)
        await producer.send({
          topic: `${baseTopic}.retry`,
          messages: [{
            key: message.key,
            value: message.value,
            headers: {
              ...message.headers,
              'retry-count': String(retryCount + 1),
              'retry-after': String(Date.now() + 2 ** retryCount * 1000),
            },
          }],
        });
      } else {
        // Exhausted retries: send to DLQ for manual investigation
        await producer.send({
          topic: `${baseTopic}.dlq`,
          messages: [{ key: message.key, value: message.value, headers: { ...message.headers, error: reason } }],
        });
        logger.error('Message sent to DLQ', { topic, error: reason });
      }
    }
  },
});
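Messages on a .retry topic carry a retry-after timestamp, so whatever consumes that topic has to wait until then before reprocessing. A minimal sketch of the timing logic only; reading headers through toString() covers both Buffer and string values, and the function names are illustrative:

```typescript
// Header values as a Kafka client may deliver them: Buffer-like objects or strings.
type HeaderValue = { toString(): string } | undefined;

// Read a numeric header, falling back when it is missing or not a finite number.
function headerNumber(headers: Record<string, HeaderValue>, name: string, fallback: number): number {
  const raw = headers[name]?.toString();
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// Milliseconds a retry consumer should still wait before reprocessing, based on the
// 'retry-after' epoch-millis header written by the error handler above.
function remainingDelayMs(headers: Record<string, HeaderValue>, now: number = Date.now()): number {
  const retryAfter = headerNumber(headers, 'retry-after', now);
  return Math.max(0, retryAfter - now);
}

// Backoff schedule implied by 2 ** retryCount * 1000 in the handler.
function backoffScheduleMs(maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, retryCount) => 2 ** retryCount * 1000);
}
```

With MAX_RETRIES = 3, a message is retried after roughly 1, 2, and 4 seconds before landing in the DLQ.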
Results Across Projects
Cloud migration:
- Document generation latency: P95 reduced from 4.2s → 0.9s
- Service deployment time: reduced from 45 minutes → 6 minutes (containerized CI/CD)
- Zero production incidents during migration due to shadow testing and the strangler fig approach
Health quote journey:
- Quote journey error rate: reduced from 1.8% → 0.12%
- P99 quote initiation latency: reduced from 8.1s → 320ms (async decoupling)
- Infrastructure cost: reduced by 31% through right-sized containers vs. over-provisioned VMs
Migration Playbook: Summary
| Phase | Goal | Key Techniques |
|---|---|---|
| 0: Characterize | Establish baseline | Dependency mapping, performance profiling |
| 1: Containerize | Remove environment coupling | Multi-stage Docker, Kubernetes with resource limits |
| 2: Strangle | Low-risk incremental migration | Routing proxy, shadow mode testing |
| 3: Decouple DB | Eliminate shared schema anti-pattern | ACL, Flyway versioning, dual-write triggers |
| 4: Go async | Eliminate cascade failures | Kafka event streams, DLQ for error resilience |
Conclusion
Microservice modernization is not a technology problem — it's a sequencing problem. The technologies (containers, Kafka, modern JVM runtimes) are mature and well-documented. The challenge is doing it without breaking production systems, maintaining team velocity, and building confidence incrementally. The strangler fig pattern, shadow mode testing, and phased database decoupling are the tools that make the difference between a successful modernization and a multi-year failed rewrite.
Opinions expressed by DZone contributors are their own.