Testing, Tools, and Frameworks

The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.

Latest Premium Content

Trend Report
Software Supply Chain Security

Refcard #376
Cloud-Based Automated Testing Essentials

Refcard #363
JavaScript Test Automation Frameworks

DZone's Featured Testing, Tools, and Frameworks Resources

Real-Object Detection at the Edge: AWS IoT Greengrass and YOLOv5

By Anil Jonnalagadda
Edge computing has transformed how we process and respond to data. By taking compute capability to the point of data, such as cameras, sensors, and machines, businesses can make decisions faster, reduce latency, save on bandwidth, and enhance privacy. AWS powers this shift with a set of edge-capable services, most notably AWS IoT Greengrass. In this article, we'll walk through how to run a machine learning model (YOLOv5) on an edge device via AWS IoT Greengrass v2 to identify objects in real time in a retail setting. The result is a fault-tolerant, scalable solution suited to environments that are only intermittently connected to the cloud.

Use Case: Retail Store Video Analytics at the Edge

Consider a chain of retail outlets wanting to:

- Detect shoplifters in real time
- Count customer traffic and interaction
- Run analytics offline during internet outages

Rather than streaming video continuously to the cloud and bearing significant cost and latency, the solution is to run an ML model at the edge for real-time insights.

Architecture Overview

To support real-time object detection at the edge, we need a well-structured system that balances its workload between local infrastructure and cloud services such as AWS. Besides enabling ultra-low-latency inference, such a system keeps operating in environments without consistent connectivity, such as retail stores, factories, or transportation stations. The architecture consists of:

- Edge device (such as an NVIDIA Jetson Xavier, Raspberry Pi, or AWS Snowcone)
- IP camera feed (IP cameras installed within the store)
- ML model (YOLOv5 for object detection)
- AWS IoT Greengrass v2 for inference and control of Lambda functions
- AWS IoT Core to process events on the cloud side

Let's decompose the architecture into its layers.

1. Edge Device Layer

The main component of the edge architecture is the edge device: a small yet versatile compute node placed close to data sources such as a security camera. Examples of supported devices include:

- NVIDIA Jetson Nano/Xavier AGX: ideal for machine learning acceleration
- Raspberry Pi 4: ideal for light applications and prototyping
- AWS Snowcone: a managed, rugged edge device for challenging environments

This layer ingests video frames from the mounted cameras and runs inference with a pre-deployed machine learning model, in our case YOLOv5. It also manages local decision logic and distributes actionable insights. By processing video on the device, we remove the need to stream raw video to the cloud, dramatically reducing bandwidth consumption and latency.

2. Edge Runtime Using AWS IoT Greengrass v2

AWS IoT Greengrass v2 is deployed on the edge device: a lightweight, feature-rich edge runtime that serves as the glue between cloud and local applications. Its core capabilities include:

- Secure Lambda execution at the edge: runs a Python (or Java/Node.js) function on an event trigger such as a sensor change or camera input.
- Component management: the object detection code is packaged and deployed as a Greengrass component. Components can be updated, rolled back, and monitored directly from the AWS Management Console or the CLI.
- Offline mode: even when the device loses its internet connection, Greengrass keeps running inference and queues messages to be sent later.
This makes it extremely robust and suitable for retail spaces with poor Wi-Fi or cellular signal.

3. Machine Learning Inference Pipeline

On the edge device we deploy a pre-trained PyTorch YOLOv5 object detection model for the relevant image data (e.g., person detection, customer behavior, and interaction with products). Such a model is:

- Either trained from scratch or fine-tuned with Amazon SageMaker, and optimized for edge deployment via Amazon SageMaker Neo (converted to ONNX or TorchScript).
- Embedded within the inference script, which runs either as a Lambda function or a system process.

It processes each video frame and produces a list of detected objects, along with bounding boxes, class labels, and confidence scores. Predictions are parsed and filtered locally. For instance, detections with confidence scores below 80% may be filtered out to eliminate noise.

4. Publishing Events to AWS IoT Core

Actionable data is then made available to the cloud via MQTT, a lightweight publish/subscribe messaging protocol well suited to edge devices. A message might look like this:

JSON
{
  "label": "person",
  "confidence": 0.91,
  "timestamp": "2025-01-01T20:04:12Z",
  "location": "Store#99 / location code 0x34"
}

This message is sent to a topic, such as edge/camera/events, to which AWS IoT Core subscribes for downstream routing and analytics. Optional services such as Amazon Timestream and Amazon QuickSight are worth considering for time-series analytics, dashboards, and richer visualization.

Let's proceed with the steps.

Step 1: Set Up and Install the Edge Device

Install AWS IoT Greengrass Core v2 on your edge device:

Shell
sudo apt update
sudo apt install openjdk-11-jdk python3-pip -y
wget https://d2s8p88vqu9w66.cloudfront.net/greengrass/v2/install.zip
unzip install.zip -d greeng
cd greeng
sudo ./greengrass-cli installer \
  --aws-region us-west-2 \
  --thing-name EdgeCamera001 \
  --thing-group-name EdgeCameras \
  --component-default-user ggc_user:ggc_group

This registers the device in AWS IoT Core, installs the required components, and prepares it for deployment.

Step 2: Write the Edge Logic Script

Let's use YOLOv5 and boto3 and write our edge inference logic like this:

Python
# inference.py
import torch, cv2, json
from datetime import datetime
import boto3

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')      # Load YOLOv5 model
client = boto3.client('iot-data', region_name='us-west-2')   # AWS IoT Core

def detect():
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        results = model(frame)
        for obj in results.xyxy[0]:
            label = model.names[int(obj[5])]
            confidence = float(obj[4])
            if confidence > 0.8:
                payload = {
                    "label": label,
                    "confidence": confidence,
                    "timestamp": datetime.utcnow().isoformat()
                }
                client.publish(
                    topic='edge/camera/events',
                    qos=1,
                    payload=json.dumps(payload)
                )

This script:

- Captures live frames from a USB or RTSP camera.
- Runs object detection locally.
- Sends results to the edge/camera/events topic in AWS IoT Core.
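On the cloud side, the events arriving on edge/camera/events can be fanned out with an AWS IoT topic rule. The following is a minimal sketch using boto3; the rule name, SNS topic ARN, and IAM role ARN are placeholders and would need to exist in your account:

Python
# route_events.py - illustrative cloud-side routing (names and ARNs are assumptions)
import boto3

iot = boto3.client('iot', region_name='us-west-2')

# Forward high-confidence detections from the MQTT topic to an SNS topic.
iot.create_topic_rule(
    ruleName='EdgeCameraHighConfidence',
    topicRulePayload={
        'sql': "SELECT * FROM 'edge/camera/events' WHERE confidence > 0.9",
        'awsIotSqlVersion': '2016-03-23',
        'ruleDisabled': False,
        'actions': [
            {
                'sns': {
                    'targetArn': 'arn:aws:sns:us-west-2:123456789012:edge-alerts',
                    'roleArn': 'arn:aws:iam::123456789012:role/iot-sns-publish',
                    'messageFormat': 'JSON'
                }
            }
        ]
    }
)

A rule like this is one way to drive the alerting and analytics integrations described later in the article.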
Step 3: Package and Deploy

Zip your Python script:

Shell
zip object-detection.zip inference.py

Upload it to S3:

Shell
aws s3 cp object-detection.zip s3://your-bucket-name/greengrass/

Create the component recipe:

JSON
{
  "RecipeFormatVersion": "2020-01-25",
  "ComponentName": "com.example.objectdetection",
  "ComponentVersion": "1.0.0",
  "ComponentDescription": "YOLOv5 object detection at the edge",
  "Manifests": [
    {
      "Platform": { "os": "linux" },
      "Lifecycle": { "Run": "python3 inference.py" },
      "Artifacts": [
        { "URI": "s3://your-bucket-name/greengrass/object-detection.zip" }
      ]
    }
  ]
}

Deploy the component:

Shell
aws greengrassv2 create-deployment \
  --target-arn arn:aws:iot:us-west-2:123456789012:thing/EdgeCamera001 \
  --components '{"com.example.objectdetection": {"componentVersion": "1.0.0"}}' \
  --deployment-name "EdgeObjectDetection"

You can subscribe to the edge/camera/events topic in AWS IoT Core to review the incoming detections. Example payload:

JSON
{
  "label": "person",
  "confidence": 0.93,
  "timestamp": "2025-06-08T14:21:35.123Z"
}

Use this data to:

- Create alerts using Amazon SNS.
- Store event streams in Amazon Timestream.
- Build dashboards in Amazon QuickSight.

Conclusion

In this article, we examined a sample use case and how it can be implemented with AWS to deploy end-to-end, robust edge AI applications using well-known services such as IoT Greengrass and SageMaker. With just a few hundred lines of code, you can deploy a production-ready solution that runs real-time object recognition on-site, responds immediately to events, and plugs into AWS for analytics and visualization. This approach is not specific to retail; it also applies to factories, smart cities, logistics, and healthcare settings where data must be processed close to where it is created.
How to Test Multi-Threaded and Concurrent Java

By Thomas Krieger
Testing multi-threaded, concurrent Java code is difficult because each test run only captures one possible thread interleaving, and those interleavings are non-deterministic. To address this, I created the open-source tool VMLens. VMLens allows you to test concurrent Java code in a deterministic and reproducible way by executing all possible thread interleavings. In this guide, I will show you how to use VMLens with a simple example. We will build a concurrent Address class that holds a street and a city. The class should support parallel reading and writing from multiple threads. You can download all examples from this Git repository.

The First Problem: A Data Race

Here is our first implementation of the Address:

Java
public class Address {

    private String street;
    private String city;

    public Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    public void update(String street, String city) {
        this.street = street;
        this.city = city;
    }

    public String getStreetAndCity() {
        return street + ", " + city;
    }
}

And the test:

Java
@Test
public void readWrite() throws InterruptedException {
    try (AllInterleavings allInterleavings =
             new AllInterleavings("howto.address.regularFieldReadWrite")) {
        while (allInterleavings.hasNext()) {
            // Given
            Address address = new Address("First Street", "First City");
            // When
            Thread first = new Thread() {
                @Override
                public void run() {
                    address.update("Second Street", "Second City");
                }
            };
            first.start();
            String streetAndCity = address.getStreetAndCity();
            first.join();
            // Then
            assertThat(streetAndCity, anyOf(is("First Street, First City"),
                    is("Second Street, Second City")));
        }
    }
}

The test updates the Address in a newly started thread while simultaneously reading the Address in the original thread. We expect to read either the original address, if the read happens before the update, or the updated address, if the read happens after it. We test this using the assertion:

Java
assertThat(streetAndCity, anyOf(is("First Street, First City"), is("Second Street, Second City")));

To cover all possible thread interleavings, we wrap the test in a while loop that runs through every possible execution order:

Java
try (AllInterleavings allInterleavings = new AllInterleavings("howto.address.regularFieldReadWrite")) {
    while (allInterleavings.hasNext()) {

Running the test with VMLens leads to the following error: a data race. A data race occurs when two threads access the same field at the same time without proper synchronization. Synchronization actions — such as reading or writing a volatile field or using a lock — ensure visibility and ordering between threads. Without synchronization, there is no guarantee that a thread will see the most recently written value. This is because the compiler may reorder instructions, and CPU cores can cache field values independently. Both synchronization actions and data races are formally defined in the Java Memory Model. As the trace shows, there is no synchronization action between the read and write of our street and city variables, so VMLens reports a data race. To fix the error, we add a volatile modifier to both fields.
The Second Problem: A Read-Modify-Write Race Condition

Here is the new Address class with the two volatile fields:

Java
public class Address {

    private volatile String street;
    private volatile String city;

    public Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    public void update(String street, String city) {
        this.street = street;
        this.city = city;
    }

    public String getStreetAndCity() {
        return street + ", " + city;
    }
}

When we run our test now, the assertion fails:

Plain Text
Expected: (is "First Street, First City" or is "Second Street, Second City")
     but: was "First Street, Second City"

We read a partially updated Address. The VMLens report reveals the specific thread interleaving that caused this error: the main thread first reads the street variable before it has been updated. Meanwhile, another thread updates both the street and city variables. When the main thread later reads the city variable, it ends up seeing a partially updated Address. To solve this, we use a ReentrantLock.

The Solution: A Lock

For our new test, we add a ReentrantLock to the update and getStreetAndCity methods:

Java
public class Address {

    private final Lock lock = new ReentrantLock();

    private String street;
    private String city;

    public Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    public void update(String street, String city) {
        lock.lock();
        try {
            this.street = street;
            this.city = city;
        } finally {
            lock.unlock();
        }
    }

    public String getStreetAndCity() {
        lock.lock();
        try {
            return street + ", " + city;
        } finally {
            lock.unlock();
        }
    }
}

Now our test succeeds.

What to Test?

When we write a concurrent class, we want the methods of the class to be atomic. This means that we either see the state before or after the method call. We already tested this for the parallel execution of updating and reading our Address class. What is still missing is a test for the parallel update of our class from two threads. This second test is shown below:

Java
@Test
public void writeWrite() throws InterruptedException {
    try (AllInterleavings allInterleavings =
             new AllInterleavings("howto.address.lockWriteWrite")) {
        while (allInterleavings.hasNext()) {
            // Given
            Address address = new Address("First Street", "First City");
            // When
            Thread first = new Thread() {
                @Override
                public void run() {
                    address.update("Second Street", "Second City");
                }
            };
            first.start();
            address.update("Third Street", "Third City");
            first.join();
            // Then
            String streetAndCity = address.getStreetAndCity();
            assertThat(streetAndCity, anyOf(is("Second Street, Second City"),
                    is("Third Street, Third City")));
        }
    }
}

This test also succeeds for the class with the ReentrantLock.

Tests Are Missing

The number of cores per CPU is continuously increasing. In 2020, the processor with the highest core count was the AMD EPYC 7H12 with 64 cores and 128 hardware threads. Today, in June 2025, the processor with the highest core count is the Intel Xeon 6 6900E with 288 efficiency cores, while AMD has increased its count to 128 cores and 256 hardware threads with the EPYC 9754. Java, with volatile fields, synchronized blocks, and the powerful concurrency utilities in java.util.concurrent, allows us to use all those cores efficiently. Project Loom, with its virtual threads and structured concurrency, will further improve this. What is still missing is a way to test that we are using all those techniques correctly. I hope VMLens can fill this gap. Get started with testing multi-threaded, concurrent Java here.
How to Achieve SOC 2 Compliance in AWS Cloud Environments
By Chase Bolt
Enterprise-Grade Distributed JMeter Load Testing on Kubernetes: A Scalable, CI/CD-Driven DevOps Approach
By Prabhu Chinnasamy
How to Use Testcontainers With ScyllaDB
By Eduard Knezovic
Turbocharge Load Testing: Yandex.Tank + ghz Combo for Lightning-Fast Code Checks

Hi there! Occasionally, there arises a need for swift load testing, whether in a local environment or on a testing platform. Typically, such tasks are tackled with specialized tools that demand thorough prior study. However, within enterprises and startups where rapid time-to-market and prompt hypothesis validation are paramount, excessive tool familiarization becomes a luxury. This article spotlights developer-centric solutions that make basic load testing possible without delving into pages of documentation.

Local Setup

You should install:

- Docker – required to run all the services and tools.
- Java 19+ – for the Kotlin service. You can also try Java 8; it should work, but you will have to change the Gradle settings.
- Golang
- Python 3+ – for Yandex.Tank.

Tech Requirements

Before embarking on our journey, it is advisable to create a couple of services that can serve as illustrative examples for testing. Stack: Kotlin + WebFlux, R2DBC + Postgres.

Our service has:

- get all stocks (limit 10): GET /api/v1/stocks
- get stock by name: GET /api/v1/stock?name=apple
- save stock: POST /api/v1/stock

It is deliberately a simple service, because we want to focus on the load testing.

Kotlin and the HTTP Service

Let's start by creating a small service with some basic logic inside. We'll prepare a model for this purpose:

Kotlin
@Table("stocks")
data class Stock(
    @field:Id
    val id: Long?,
    val name: String,
    val price: BigDecimal,
    val description: String
)

Simple router:

Kotlin
@Configuration
@EnableConfigurationProperties(ServerProperties::class)
class StockRouter(
    private val properties: ServerProperties,
    private val stockHandler: StockHandler
) {

    @Bean
    fun router() = coRouter {
        with(properties) {
            main.nest {
                contentType(APPLICATION_JSON).nest {
                    POST(save, stockHandler::save)
                }
                GET(find, stockHandler::find)
                GET(findAll, stockHandler::findAll)
            }
        }
    }
}

Handler:

Kotlin
@Service
class StockHandlerImpl(
    private val stockService: StockService
) : StockHandler {

    private val logger = KotlinLogging.logger {}

    private companion object {
        const val DEFAULT_SIZE = 10
        const val NAME_PARAM = "name"
    }

    override suspend fun findAll(req: ServerRequest): ServerResponse {
        logger.debug { "Processing find all request: $req" }
        val stocks = stockService.getAll(DEFAULT_SIZE)
        return ServerResponse.ok()
            .contentType(MediaType.APPLICATION_JSON)
            .body(stocks, StockDto::class.java)
            .awaitSingle()
    }

    override suspend fun find(req: ServerRequest): ServerResponse {
        logger.debug { "Processing find request: $req" }
        val name = req.queryParam(NAME_PARAM)
        return if (name.isEmpty) {
            ServerResponse.badRequest().buildAndAwait()
        } else {
            val stocks = stockService.find(name.get())
            ServerResponse.ok()
                .contentType(MediaType.APPLICATION_JSON)
                .body(stocks, StockDto::class.java)
                .awaitSingle()
        }
    }

    override suspend fun save(req: ServerRequest): ServerResponse {
        logger.debug { "Processing save request: $req" }
        val stockDto = req.awaitBodyOrNull(StockDto::class)
        return stockDto?.let { dto ->
            stockService.save(dto)
            ServerResponse
                .ok()
                .contentType(MediaType.APPLICATION_JSON)
                .bodyValue(dto)
                .awaitSingle()
        } ?: ServerResponse.badRequest().buildAndAwait()
    }
}

Full code here: GitHub

Create a Dockerfile:

Dockerfile
FROM openjdk:20-jdk-slim
VOLUME /tmp
COPY build/libs/*.jar app.jar
ENTRYPOINT ["java", "-Dspring.profiles.active=stg", "-jar", "/app.jar"]

Then build the Docker image and run it:

Shell
docker build -t ere/stock-service .
docker run -p 8085:8085 ere/stock-service

For now, though, it is better to stick with the idea of running everything through Docker containers and migrate our service into a Docker Compose setup:

YAML
version: '3.1'

services:

  db:
    image: postgres
    container_name: postgres-stocks
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: postgres

  adminer:
    image: adminer
    ports:
      - "8080:8080"

  stock-service:
    image: ere/stock-service
    container_name: stock-service
    ports:
      - "8085:8085"
    depends_on:
      - db

Moving Forward

How can we proceed with testing? Specifically, how can we run a modest load test against our recently developed service? It is imperative that the testing solution is both straightforward to install and user-friendly. Given our time constraints, delving into extensive documentation and articles isn't a viable option. Fortunately, there's a viable alternative—enter Yandex.Tank. The tank is a powerful instrument for testing and has important integrations with JMeter, but in this article we will use it as a simple tool.

source: https://github.com/yandex/yandex-tank
docs: https://yandextank.readthedocs.org/en/latest/

Let's kick off by creating a folder for our tests. Once we've placed the configs and other essential files—fortunately, just a couple of them—we'll be all set. For our service, we need to test the "get all" and "save" methods. The first config is for the find method:

YAML
phantom:
  address: localhost
  port: "8085"
  load_profile:
    load_type: rps
    schedule: line(100, 250, 30s)
  writelog: all
  ssl: false
  connection_test: true
  uris:
    - /api/v1/stocks
overload:
  enabled: false
telegraf:
  enabled: false
autostop:
  autostop:
    - time(1s,10s)      # if request average > 1s
    - http(5xx,100%,1s) # if 500 errors > 1s
    - http(4xx,25%,10s) # if 400 > 25%
    - net(xx,25,10)     # if amount of non-zero net-codes in every second of last 10s period is more than 25

Key settings for the configuration:

- Address and port: same as our application.
- Load test profile (load_profile): we'll use the line type, ramping from 100 to 250 requests per second over 30 seconds.
- URIs: a list of URLs to be tested.
- Autostop pattern: no need to keep stress-testing if our service has already gone down!

Copy and paste the bash script (tank.sh):

Shell
docker run \
    -v $(pwd):/var/loadtest \
    --net="host" \
    -it yandex/yandex-tank

And run! What will we see as a result? Yandex.Tank will log everything it deems worthy during the test. We can observe metrics such as the 99th percentile and requests per second (rps).

So, are we stuck with the terminal now? I want a GUI! Don't worry, Yandex.Tank has a solution for that too. We can utilize one of the overload plugins. Here's an example of how to add it:

YAML
overload:
  enabled: true
  package: yandextank.plugins.DataUploader
  job_name: "save docs"
  token_file: "env/token.txt"

We should add our token; just go to https://overload.yandex.net and log in via GitHub.

Okay, dealing with a GET request is straightforward, but what about POST? How do we structure the request? The thing is, you can't just throw the request into the tank; you need to create patterns for it! What are these patterns? It's simple—you need to write a small script, which you can again fetch from the documentation and tweak a bit to suit our needs.
And we should add our own body and headers:

Python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import json

# http request with entity body template
req_template_w_entity_body = (
    "%s %s HTTP/1.1\r\n"
    "%s\r\n"
    "Content-Length: %d\r\n"
    "\r\n"
    "%s\r\n"
)

# phantom ammo template
ammo_template = (
    "%d %s\n"
    "%s"
)

method = "POST"
case = ""
headers = "Host: test.com\r\n" + \
          "User-Agent: tank\r\n" + \
          "Accept: */*\r\n" + \
          "Connection: Close\r\n"


def make_ammo(method, url, headers, case, body):
    """ makes phantom ammo """
    req = req_template_w_entity_body % (method, url, headers, len(body), body)
    return ammo_template % (len(req), case, req)


def generate_json():
    body = {
        "name": "content",
        "price": 1,
        "description": "description"
    }
    url = "/api/v1/stock"
    h = headers + "Content-type: application/json"
    s1 = json.dumps(body)
    ammo = make_ammo(method, url, h, case, s1)
    sys.stdout.write(ammo)
    f2 = open("ammo/ammo-json.txt", "w")
    f2.write(ammo)


if __name__ == "__main__":
    generate_json()

Result:

Plain Text
212
POST /api/v1/stock HTTP/1.1
Host: test.com
User-Agent: tank
Accept: */*
Connection: Close
Content-type: application/json
Content-Length: 61

{"name": "content", "price": 1, "description": "description"}

That's it! Just run the script, and we will have ammo-json.txt. Now set the new parameters in the config and remove the URIs:

YAML
phantom:
  address: localhost:9001
  ammo_type: phantom
  ammofile: ammo-json.txt

And run it one more time!

It's Time to Test gRPC!

Having acquainted ourselves with load testing HTTP methods, it is natural to consider the scenario for gRPC. Are we fortunate enough to have an equally accessible tool for gRPC, akin to the simplicity of the tank? The answer is affirmative. Allow me to introduce you to ghz. Just take a look: https://ghz.sh/

But before we do that, we should create a small service with Go and gRPC as a good test service. Prepare a small proto file:

ProtoBuf
syntax = "proto3";

option go_package = "stock-grpc-service/stocks";

package stocks;

service StocksService {
  rpc Save(SaveRequest) returns (SaveResponse) {}
  rpc Find(FindRequest) returns (FindResponse) {}
}

message SaveRequest {
  Stock stock = 1;
}

message SaveResponse {
  string code = 1;
}

message Stock {
  string name = 1;
  float price = 2;
  string description = 3;
}

message FindRequest {
  enum Type {
    INVALID = 0;
    BY_NAME = 1;
  }
  message ByName {
    string name = 1;
  }
  Type type = 1;
  oneof body {
    ByName by_name = 2;
  }
}

message FindResponse {
  Stock stock = 1;
}

And generate it (we also need protoc installed):

Shell
protoc --go_out=. --go_opt=paths=source_relative \
    --go-grpc_out=. --go-grpc_opt=paths=source_relative \
    stocks.proto

Coding Time!

Next steps: create the service as fast as we can.

Create the DTO (stock entity for the DB layer):

Go
package models

// Stock – base dto
type Stock struct {
    ID          *int64  `json:"Id"`
    Price       float32 `json:"Price"`
    Name        string  `json:"Name"`
    Description string  `json:"Description"`
}

Implement the server:

Go
// Server is used to implement stocks.UnimplementedStocksServiceServer.
type Server struct {
    pb.UnimplementedStocksServiceServer
    stockUC stock.UseCase
}

// NewStockGRPCService stock gRPC service constructor
func NewStockGRPCService(emailUC stock.UseCase) *Server {
    return &Server{stockUC: emailUC}
}

func (e *Server) Save(ctx context.Context, request *stocks.SaveRequest) (*stocks.SaveResponse, error) {
    model := request.Stock
    stockDto := &models.Stock{
        ID:          nil,
        Price:       model.Price,
        Name:        model.Name,
        Description: model.Description,
    }
    err := e.stockUC.Create(ctx, stockDto)
    if err != nil {
        return nil, err
    }
    return &stocks.SaveResponse{Code: "ok"}, nil
}

func (e *Server) Find(ctx context.Context, request *stocks.FindRequest) (*stocks.FindResponse, error) {
    code := request.GetByName().GetName()
    model, err := e.stockUC.GetByID(ctx, code)
    if err != nil {
        return nil, err
    }
    response := &stocks.FindResponse{Stock: &stocks.Stock{
        Name:        model.Name,
        Price:       model.Price,
        Description: model.Description,
    }}
    return response, nil
}

Full code here.

Test It!

Install ghz with brew (as usual) and check a simple example from its documentation. Now, we should change it a little bit:

- Move to the folder with the proto files.
- Add the method: stocks.StocksService.Save.
- Add a simple body: {"stock": { "name":"APPL", "price": "1.3", "description": "apple stocks"} }
- 10 connections will be shared among 20 goroutine workers; each pair of 2 goroutines will share a single connection.
- Set the service's port.

The result:

Shell
cd .. && cd stock-grpc-service/proto
ghz --insecure \
    --proto ./stocks.proto \
    --call stocks.StocksService.Save \
    -d '{"stock": { "name":"APPL", "price": "1.3", "description": "apple stocks"} }' \
    -n 2000 \
    -c 20 \
    --connections=10 \
    0.0.0.0:5007

Run it!

Plain Text
Summary:
  Count:        2000
  Total:        995.93 ms
  Slowest:      30.27 ms
  Fastest:      3.11 ms
  Average:      9.19 ms
  Requests/sec: 2008.16

Response time histogram:
  3.111  [1]   |
  5.827  [229] |∎∎∎∎∎∎∎∎∎∎∎
  8.542  [840] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  11.258 [548] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  13.973 [190] |∎∎∎∎∎∎∎∎∎
  16.689 [93]  |∎∎∎∎
  19.405 [33]  |∎∎
  22.120 [29]  |∎
  24.836 [26]  |∎
  27.551 [6]   |
  30.267 [5]   |

Latency distribution:
  10 % in 5.68 ms
  25 % in 6.67 ms
  50 % in 8.27 ms
  75 % in 10.49 ms
  90 % in 13.88 ms
  95 % in 16.64 ms
  99 % in 24.54 ms

Status code distribution:
  [OK] 2000 responses

And what, stare at everything in the terminal again? No, with ghz you can also generate a report, but unlike Yandex, it will be generated locally and can be opened in the browser. Just set it:

Shell
ghz --insecure -O html -o reports_find.html

- -O html → output format
- -o → report file name

Conclusion

In summary, when you need a swift assessment of your service's ability to handle a load of 100+ requests per second, or you want to identify potential weaknesses, there is no need to initiate intricate processes involving teams, seek assistance from AQA, or rely on the infrastructure team. More often than not, developers have capable laptops and computers that can execute a small load test. So, go ahead and give it a shot—save yourself some time! I trust you found this brief article beneficial.

Valuable documentation I recommend reading, just in case you need more:

- Yandex.Tank docs
- Yandex.Tank GitHub
- Yandex.Tank settings
- ghz official page
- ghz config

May the Force be with you! Thanks once again, and best of luck!

By Ilia Ivankin
Software Specs 2.0: An Elaborate Example

This article is a follow-up to the article that lays the theoretical foundation for software requirement qualities. Here, I provide an example of how to craft requirements for a User Authentication Login Endpoint: a practical illustration of how essential software requirement qualities can be interwoven when designing specifications for AI-generated code. I demonstrate the crucial interplay between explicitness (to achieve completeness), unambiguity (for machine-first understandability), constraint definition (to guide implementation and ensure viability), and testability (through explicit acceptance criteria). We'll explore how these qualities can practically be achieved through structured documentation. Our goal is to give our AI assistant a clear, actionable blueprint for generating a secure and functional login service.

For explanatory purposes, I will provide a detailed requirements document: a blueprint that is by no means exhaustive, but that can serve as a basis for understanding and expansion. Documentation can be lightweight in practice, but this article focuses on the details to avoid confusion. The document starts by stating the requirement ID and title. A feature's description follows, along with its functional and non-functional requirements. Data definitions, implementation constraints, acceptance criteria, and error handling fundamentals are also documented.

Requirement Document: User Authentication - Login Endpoint

1. Requirement ID and Title

Unique IDs are crucial for traceability, allowing you to link this specific requirement to design documents, generated code blocks, and test cases. This helps in maintenance and debugging.

- ID: REQ-AUTH-001
- Title: User Login Endpoint

2. Feature Description

The feature description provides a high-level overview and context. For AI, this helps establish the overall goal before diving into specifics. It answers the "what" at a broad level. This feature provides an API endpoint for registered users to authenticate themselves using their email address and password. Successful authentication grants access by providing a session token.

3. Functional Requirements (FR)

Functional requirements are broken down into atomic, specific statements. Keywords like MUST and SHOULD (though only MUST is used here for strictness) can follow RFC 2119 style, which AI assistants can be trained to recognize. "Case-insensitive search," "structurally valid email format," and specific counter actions (increment, reset) leave little room for AI misinterpretation. This promotes unambiguity and precision. Details like checking whether an account is disabled and the account lockout mechanism (FR11) cover crucial edge cases and security aspects, aiming for explicitness and completeness.
- FR1: The system MUST expose an HTTPS POST endpoint at /api/v1/auth/login.
- FR2: The endpoint MUST accept a JSON payload containing email (string) and password (string).
- FR3: The system MUST validate the provided email:
  - FR3.1: It MUST be a non-empty string.
  - FR3.2: It MUST be a structurally valid email format (e.g., [email protected]).
- FR4: The system MUST validate the provided password:
  - FR4.1: It MUST comply with a strong password policy.
- FR5: If input validation (FR3, FR4) fails, the system MUST return an error (see Error Handling EH1).
- FR6: The system MUST retrieve the user record from the Users database table based on the provided email.
- FR7: If no user record is found for the email, or if the user account is marked as disabled, the system MUST return an authentication failure error (see Error Handling EH2).
- FR8: If a user record is found and the account is active, the system MUST verify the provided password against the stored hashed password for the user using the defined password hashing algorithm (see IC3: Security).
- FR9: If password verification fails, the system MUST increment a failed_login_attempts counter for the user and return an authentication failure error (see Error Handling EH2).
- FR10: If password verification is successful:
  - FR10.1: The system MUST reset the failed_login_attempts counter for the user to 0.
  - FR10.2: The system MUST generate a JSON Web Token (JWT) (see IC3: Security for JWT specifications).
  - FR10.3: The system MUST return a success response containing the JWT (see Data Definitions - Output).
- FR11: Account lockout: If failed_login_attempts for a user reaches 5, their account MUST be temporarily locked for 15 minutes. Attempts to log in to a locked account MUST return an account locked error (see Error Handling EH3), even with correct credentials.

4. Data Definitions

Clearly defining data definitions (schemas) for inputs and outputs is critical for AI to generate correct data validation, serialization, and deserialization logic. Using terms like "string, required, format: email" helps the AI map to data types and validation rules (e.g., when using Pydantic models). This contributes to structured input.

Input Payload (JSON):
- email (string, required, format: email)
- password (string, required, minLength: 1)

Success Output (JSON, HTTP 200):
- access_token (string, JWT format)
- token_type (string, fixed value: "Bearer")
- expires_in (integer, seconds, representing token validity duration)

Error Output (JSON, specific HTTP status codes - see Error Handling):
- error_code (string, e.g., "INVALID_INPUT", "AUTHENTICATION_FAILED", "ACCOUNT_LOCKED")
- message (string, human-readable error description)

5. Non-Functional Requirements (NFRs)

NFRs reduce ambiguity, guide code generation toward aligned behaviors, and make the resulting software easier to verify against clearly defined benchmarks. They make qualities like performance and security testable and unambiguous. Specific millisecond targets and load conditions are set. Also, as an example, specific actions (no password logging, input sanitization) and references to further constraints (IC3) are provided.

- NFR1 (Performance): The average response time for the login endpoint MUST be less than 300 ms under a load of 100 concurrent requests. P99 response time MUST be less than 800 ms.
- NFR2 (Security): All password handling MUST adhere to the security constraints specified in IC3. No sensitive information (passwords) should be logged.
Input sanitization MUST be performed to prevent common injection attacks.
- NFR3 (Auditability): Successful and failed login attempts MUST be logged to the audit trail with timestamp, user email (for failed attempts, if identifiable), source IP address, and success/failure status. Failed attempts should include the specific failure reason (e.g., "user_not_found," "incorrect_password," "account_locked").

6. Implementation Constraints and Guidance (IC)

This section guides the AI's choices (Python/FastAPI, SQLAlchemy, Pydantic, bcrypt, JWT structure) without dictating the exact low-level code. For the purposes of this article, these specific choices are arbitrary and are not considered optimal in any sense. You are free to choose your own tech stack, architectural patterns, and so on. Implementation constraints can guide toward viability within the project's ecosystem and help meet specific security and architectural requirements. It should also be mentioned that the constraints shown are indicative and by no means exhaustive. Which constraints are most appropriate currently depends on the specific AI assistant and the project under development. Will there be AI assistants that develop code perfectly without constraints and guidance from humans? It remains to be seen.

- IC1 (Technology Stack):
  - Backend language/framework: Python 3.11+ / FastAPI.
  - Data validation: Pydantic models derived from the Data Definitions.
  - Database interaction: Use SQLAlchemy ORM with the existing project database session configuration. Target table: Users.
- IC2 (Architectural Pattern): Logic should be primarily contained within a dedicated AuthenticationService class. The API endpoint controller should delegate to this service.
- IC3 (Security - Password and Token):
  - Password hashing: Stored passwords MUST be hashed using bcrypt with a work factor of 12.
  - JWT specifications:
    - Algorithm: HS256.
    - Secret key: Retrieved from the environment variable JWT_SECRET_KEY.
    - Payload claims: MUST include sub (user_id), email, exp (expiration time), iat (issued at).
    - Expiration: Tokens MUST expire 1 hour after issuance.
- IC4 (Environment): The service will be deployed as a Docker container. Configuration values (like JWT_SECRET_KEY and the database connection string) MUST be configurable via environment variables.
- IC5 (Coding Standards):
  - Adhere to the PEP 8 style guide.
  - All functions and methods MUST include type hints.
  - All public functions/methods MUST have docstrings explaining purpose, arguments, and return values.

7. Acceptance Criteria (AC - Gherkin Format)

Acceptance criteria make the requirements testable. Gherkin is an example format that is human-readable and structured: a behavior-driven development format that AI assistants can also use to derive specific test cases. It can cover happy paths and key error/edge cases, providing concrete examples of expected behavior. This gives clear verification targets for the AI-generated code.
Plain Text
Feature: User Login API Endpoint

  Background:
    Given a user "[email protected]" exists with a bcrypt hashed password for "ValidPassword123"
    And the user account "[email protected]" is not disabled
    And the user "[email protected]" has 0 failed_login_attempts
    And the JWT_SECRET_KEY environment variable is set

  Scenario: Successful Login with Valid Credentials
    When a POST request is made to "/api/v1/auth/login" with JSON body:
      """
      {
        "email": "[email protected]",
        "password": "ValidPassword123"
      }
      """
    Then the response status code should be 200
    And the response JSON should contain an "access_token" (string)
    And the response JSON should contain "token_type" with value "Bearer"
    And the response JSON should contain "expires_in" with value 3600
    And the "access_token" should be a valid JWT signed with HS256 containing "sub", "email", "exp", "iat" claims
    And the failed_login_attempts for "[email protected]" should remain 0

  Scenario: Login with Invalid Password
    When a POST request is made to "/api/v1/auth/login" with JSON body:
      """
      {
        "email": "[email protected]",
        "password": "InvalidPassword"
      }
      """
    Then the response status code should be 401
    And the response JSON should contain "error_code" with value "AUTHENTICATION_FAILED"
    And the response JSON should contain "message" with value "Invalid email or password."
    And the failed_login_attempts for "[email protected]" should be 1

  Scenario: Login with Non-Existent Email
    When a POST request is made to "/api/v1/auth/login" with JSON body:
      """
      {
        "email": "[email protected]",
        "password": "AnyPassword"
      }
      """
    Then the response status code should be 401
    And the response JSON should contain "error_code" with value "AUTHENTICATION_FAILED"
    And the response JSON should contain "message" with value "Invalid email or password."

  Scenario: Account Lockout after 5 Failed Attempts
    Given the user "[email protected]" has 4 failed_login_attempts
    When a POST request is made to "/api/v1/auth/login" with JSON body: # This is the 5th failed attempt
      """
      {
        "email": "[email protected]",
        "password": "InvalidPasswordAgain"
      }
      """
    Then the response status code should be 403
    And the response JSON should contain "error_code" with value "ACCOUNT_LOCKED"
    And the response JSON should contain "message" with value "Account is temporarily locked due to too many failed login attempts."
    And the failed_login_attempts for "[email protected]" should be 5

8. Error Handling (EH)

This dedicated error handling section ensures completeness by explicitly defining how different failure scenarios are communicated. To improve completeness, we need to extensively cover edge cases and error handling: specify exactly how different errors (validation errors, system errors, network errors) should be caught, logged, and communicated to the user (specific error messages and codes).

- EH1 (Invalid Input):
  - Trigger: FR3 or FR4 fails.
  - HTTP Status: 400 Bad Request.
  - Response Body: { "error_code": "INVALID_INPUT", "message": "Invalid input. Email must be valid and password must not be empty." } (example message; it could be more specific about which field failed).
- EH2 (Authentication Failure):
  - Trigger: FR7 or FR9 occurs.
  - HTTP Status: 401 Unauthorized.
  - Response Body: { "error_code": "AUTHENTICATION_FAILED", "message": "Invalid email or password." } (a generic message to prevent user enumeration).
- EH3 (Account Locked):
  - Trigger: Attempt to log in to an account that is locked per FR11.
  - HTTP Status: 403 Forbidden.
  - Response Body: { "error_code": "ACCOUNT_LOCKED", "message": "Account is temporarily locked due to too many failed login attempts." }

Final Remarks

- The dual purpose. The example User Authentication Login Endpoint requirement is carefully chosen so that it can be used for two purposes. The first is to explain the basic qualities of software requirements, irrespective of who writes the code (a human or AI). The second is to focus on AI-assisted code and how to use requirements to our advantage.
- The examples used are not exhaustive. All data and examples presented in the eight sections, from requirement ID and title to error handling, are indicative. Many more functional and non-functional requirements can be crafted, as well as data definitions. The acceptance criteria and error handling cases are a minimal sample of what is usually needed in practice. Negative constraints (don't use Z, avoid pattern A), for example, are not provided here but can be very beneficial as well. And of course, you may find that there are other sections, beyond the scope of this article, that are tailored to your documentation needs.
- Documentation is not static. For clarity and completeness in this article, the documentation for the User Authentication Login Endpoint seems to be static: everything is specified upfront and then fed to the AI assistant that does the job for us. Although a detailed document can be a good starting point, factors like implementation constraints and guidance can be fully interactive. For AI assistants with sophisticated chat interfaces, for example, a "dialogue" with the AI can be an important part of the process. While initial implementation constraints can be vital, some constraints might be refined or even discovered through interaction with the AI.

Wrapping Up

I provided a requirements document for a User Authentication Login Endpoint requirement. This example document attempts to be explicit, precise, and constrained. Software requirements must be viable whilst eminently testable. The document is structured to provide an AI code generator with sufficient detail to minimize guesswork and the chances of the AI producing undesirable output. While AI code assistants will probably become more capable and context-aware, the fundamental need for human-defined guidance appears to remain. Guiding an AI assistant for software development could be embedded in project templates, via custom AI assistant configurations (if available), or even as part of a "system prompt" that always precedes specific task prompts. A dynamic set of principles that inform an ongoing interaction with AI can be based on the following:

- Initial scaffolding: We provide the critical initial direction, ensuring the AI starts on the right path, aligned with project standards, architecture, and non-negotiable requirements (especially security).
- Basis for interaction: Our documentation becomes the foundation for interactive refinement. When the AI produces output, it can be evaluated against our documented requirements.
- Evolving knowledge base: As the project progresses, parts of our documentation can be updated, or new ones added, reflecting new decisions or learnings.
- Guardrails for AI autonomy: As AIs gain more autonomy in suggesting larger code blocks or even architectural components, such documents can act as essential guardrails, ensuring their "creativity" stays within acceptable project boundaries.
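As a closing illustration, here is a minimal sketch (not part of the original requirement document) of how the data definitions in section 4 might surface as Pydantic models under IC1's stack; the class and field names are assumptions:

Python
# models_sketch.py - illustrative only; assumes pydantic and email-validator are installed
from pydantic import BaseModel, EmailStr, Field


class LoginRequest(BaseModel):
    # Mirrors the input payload: email (required, email format), password (required, minLength 1)
    email: EmailStr
    password: str = Field(..., min_length=1)


class LoginSuccessResponse(BaseModel):
    # Mirrors the success output: JWT, fixed token type, validity in seconds
    access_token: str
    token_type: str = "Bearer"
    expires_in: int = 3600


class ErrorResponse(BaseModel):
    # Mirrors the error output used by EH1-EH3
    error_code: str
    message: str

A fragment like this is the kind of scaffolding an AI assistant could be expected to derive directly from sections 4 and 6 of the document.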

By Stelios Manioudakis, PhD
The Rise of Vibe Coding: Innovation at the Cost of Security

Software development teams and professionals are increasingly adopting vibe coding as their preferred approach. Vibe coding involves creating software through instinctual coding methods and minimal planning in order to prototype quickly or make solutions work immediately. While vibe coding can spark creativity and speed up early development, it usually comes at the cost of security, maintainability, and reliability. This article analyzes the security vulnerabilities of vibe coding and provides essential guidance for developers and organizations to minimize these risks while preserving innovative processes.

What Is "Vibe Coding," Exactly?

Vibe coding lacks formal status as a methodology but serves as cultural shorthand for:

- Coding without specifications or architectural planning.
- Bypassing code reviews and testing procedures to ship faster.
- Depending on Stack Overflow, GitHub Copilot, or ChatGPT excessively while lacking comprehension of the result.
- Choosing to release operational code instead of ensuring it meets security and scalability standards or providing documentation.

Vibe coding relies heavily on AI tools to generate, refine, and debug code, enabling rapid application iteration and deployment with minimal manual effort. For non-coders dreaming of creating their own apps, AI offers an enticing gateway to turn ideas into reality, even profitable ones. However, without professional developer review, AI-generated code can introduce dangerous security vulnerabilities, performance bottlenecks, and critical errors that undermine your entire project. It's fun, fast, and chaotic, but it's also a minefield of security vulnerabilities.

The Hidden Security Risks of Vibe Coding and Mitigation Strategies

Here's the catch-22 with AI: it won't alert you to security vulnerabilities you don't know exist. Think about it — how can you secure systems you don't fully grasp? And if AI built it, chances are you don't truly understand its inner workings.

1. Hardcoded Secrets and Credentials

For convenience, vibe coders habitually insert API keys, database passwords, or tokens as plain text within their source code. GitGuardian's alarming report reveals a critical security crisis: 24 million secrets leaked on GitHub last year alone. More troubling still, repositories using AI coding tools, which many developers now rely on, experienced a 40% higher exposure rate. This dangerous trend demands immediate attention from development teams everywhere.

Risks:
- Exposure in public repos or error logs.
- Easy targets for attackers scanning GitHub.

Mitigation:
- Use environment variables or a secure secrets management system (e.g., AWS Secrets Manager, Vault).
- Implement secret scanning tools like GitGuardian or truffleHog.

2. Lack of Input Validation and Sanitization

When developers use improvised coding methods, they tend to ignore basic hygiene practices such as user input validation, which results in SQL injection and other serious vulnerabilities like XSS and command injection.

Risks:
- Code or commands can be executed from user-supplied input.
- Data leaks, defacement, or remote access.

Mitigation:
- Always validate and sanitize inputs using frameworks (e.g., Joi for Node.js, Marshmallow for Python); a minimal sketch appears at the end of this article.
- Use ORM libraries to prevent SQL injection.

3. Insecure Use of Third-Party Libraries

Developers often quickly implement solutions by installing an NPM package or Python module from a blog post without verifying its security credentials.
Risks:
- Supply chain attacks.
- Malware hidden in typosquatting libraries (e.g., requests vs requestr).

Mitigation:
- Use tools like OWASP Dependency-Check or npm audit.
- Lock versions using package-lock.json, poetry.lock, or pip-tools.

4. Improper Authentication and Authorization

Developers frequently rush authentication logic, leading to weak token usage and missed session expiration and role verification.

Risks:
- Privilege escalation.
- Account takeover or horizontal access control issues.

Mitigation:
- Use industry-tested authentication libraries (e.g., OAuth 2.0 via Auth0, Firebase Auth).
- Implement RBAC or ABAC strategies and avoid custom auth logic.

5. Missing or Insecure Logging

The urgency to "just make it work" leads developers to either neglect logging or log sensitive data.

Risks:
- Logs may leak PII, passwords, or tokens.
- Lack of traceability during incident response.

Mitigation:
- Use centralized log systems (e.g., ELK stack, Datadog).
- Mask sensitive data and ensure proper log rotation.

6. No Security Testing or Code Reviews

Code written on vibes is rarely peer-reviewed or subjected to security testing, leaving glaring vulnerabilities undetected.

Risks:
- Vulnerabilities stay hidden until exploited.
- One developer's mistake could compromise the whole application.

Mitigation:
- Automate security testing (SAST, DAST).
- Enforce code reviews and integrate Git hooks or CI pipelines with tools like SonarQube or GitHub Actions.

How to Keep the Vibe and Still Ship Secure Code

Best practice and description:

- Security champions: Appoint team members to advocate for secure practices even during fast-paced dev.
- Secure defaults: Use templates and boilerplates that include basic security setup.
- Automated linting and testing: Add ESLint, Bandit, or security linters to your CI/CD.
- Threat modeling lite: Do a 10-minute risk brainstorm before coding a feature.
- Security as culture: Teach developers how security adds value, not just overhead.

Conclusion

Vibe coding delivers a fast-paced and enjoyable development experience that unlocks creative freedom. Excessive vibe coding, however, generates critical security vulnerabilities that can damage your team's reputation and lead to lost customer trust. By implementing lightweight security measures and automated checks within your development process, you keep the creative advantages of vibe coding and safeguard both your application and its users. Even when coding with good vibes, developers must maintain responsible coding practices.

Tools You Should Bookmark

- OWASP Top 10 – https://owasp.org/www-project-top-ten/
- Semgrep – https://semgrep.dev/
- GitGuardian – https://www.gitguardian.com/
- Snyk – https://snyk.io/
- OWASP Dependency-Check – https://owasp.org/www-project-dependency-check/

Got questions or want to share your worst vibe-coding disaster? Drop a comment below.
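As promised in the input-validation section, here is a minimal sketch of schema-based validation with Marshmallow; the schema and field names are illustrative assumptions, not taken from any specific project:

Python
# validation_sketch.py - illustrative Marshmallow validation for untrusted input
from marshmallow import Schema, fields, validate, ValidationError


class SignupSchema(Schema):
    # Reject anything that is not a well-formed email address
    email = fields.Email(required=True)
    # Enforce a minimum length instead of trusting the client
    password = fields.Str(required=True, validate=validate.Length(min=12))
    # Constrain free-text fields to a sane size to limit abuse
    display_name = fields.Str(required=True, validate=validate.Length(min=1, max=64))


def parse_signup(payload: dict) -> dict:
    """Validate and normalize an untrusted request payload."""
    try:
        return SignupSchema().load(payload)
    except ValidationError as err:
        # err.messages maps field names to error lists; return or log as appropriate
        raise ValueError(f"Invalid input: {err.messages}") from err

Validation like this does not replace parameterized queries or an ORM, but it stops obviously malformed input at the boundary before it reaches application logic.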

By Sonia Mishra
The Truth About AI and Job Loss

I keep finding myself in conversations with family and friends asking, "Is AI coming for our jobs?" Which roles are getting Thanos-snapped first? Will there still be space for junior individual contributors in organizations? And many more questions like these. With so many conflicting opinions, I felt overwhelmed and anxious, so I decided to take action instead of staying stuck in uncertainty. I began collecting historical data and relevant facts to gain a clearer understanding of the direction and impact of the current AI surge.

So, Here's What We Know

- Microsoft reports that over 30% of the code on GitHub Copilot is now AI-generated, highlighting a shift in how software is being developed.
- Major tech companies — including Google, Meta, Amazon, and Microsoft — have implemented widespread layoffs over the past 18–24 months.
- Current generative AI models, like GPT-4 and CodeWhisperer, can reliably write functional code, particularly for standard, well-defined tasks.
- Productivity gains: Occupations in which many tasks can be performed by AI are experiencing nearly five times higher growth in productivity than the sectors with the least AI adoption.
- AI systems still require a human "prompt" or input to initiate the thinking process. They do not ideate independently or possess genuine creativity — they follow patterns and statistical reasoning based on training data.
- Despite rapid progress, today's AI is still far from achieving human-level general intelligence (AGI). It lacks contextual awareness, emotional understanding, and the ability to reason abstractly across domains without guidance or structured input.
- Job displacement and creation: The World Economic Forum's Future of Jobs Report 2025 reveals that 40% of employers expect to reduce their workforce where AI can automate tasks.
- And many more.

There's a lot of conflicting information out there, making it difficult to form a clear picture. With so many differing opinions, it's important to ground the discussion in facts. So, let's break it down from a data engineer's point of view — by examining the available data, identifying patterns, and drawing insights that can help us make sense of it all.

Navigating the Noise

Let's start with the topic that's on everyone's mind: layoffs. It's the most talked-about and often the most concerning aspect of the current tech landscape. Below is a trend analysis based on layoff data collected across the tech industry.

Figure 1: Layoffs (in thousands) over time in tech industries

Although the first AI research boom began in the 1980s, the current AI surge started in the late 2010s and gained significant momentum in late 2022 with the public release of OpenAI's ChatGPT. The COVID-19 pandemic further complicated the technological landscape. Initially, there was a hiring surge to meet the demands of a rapidly digitizing world. However, by 2023, the tech industry experienced significant layoffs, with over 200,000 jobs eliminated in the first quarter alone. This shift was attributed to factors such as economic downturns, reduced consumer demand, and the integration of AI technologies. Since then, as shown in Figure 1, layoffs have continued intermittently, driven by various factors including performance evaluations, budget constraints, and strategic restructuring. For instance, in 2025, companies like Microsoft announced plans to lay off up to 6,800 employees, accounting for less than 3% of its global workforce, as part of an initiative to streamline operations and reduce managerial layers.
Between 2024 and early 2025, the tech industry experienced significant workforce reductions. In 2024 alone, approximately 150,000 tech employees were laid off across more than 525 companies, according to data from the US Bureau of Labor Statistics. The trend has continued into 2025, with over 22,000 layoffs reported so far this year, including a striking 16,084 job cuts in February alone, highlighting the ongoing volatility in the sector. It really makes me think — have all these layoffs contributed to the rise in the US unemployment rate? And has the number of job openings dropped too? I think it’s worth taking a closer look at these trends. Figure 2: Employment and unemployment counts in the US from JOLTS DB Figure 2 illustrates employment and unemployment trends across all industries in the United States. Interestingly, the data appear relatively stable over the past few years, which raises some important questions. If layoffs are increasing, where are those workers going? And what about recent graduates who are still struggling to land their first jobs? We’ve talked about the layoffs — now let’s explore where those affected are actually going. While this may not reflect every individual experience, here’s what the available online data reveals. After the Cuts Well, I wondered if the tech job openings have decreased as well? Figure 3: Job openings over the years in the US Even with all the news about layoffs, the tech job market isn’t exactly drying up. As of May 2025, there are still around 238,000 open tech positions across startups, unicorns, and big-name public companies. Just back in December 2024, more than 165,000 new tech roles were posted, bringing the total to over 434,000 active listings that month alone. And if we look at the bigger picture, the US Bureau of Labor Statistics expects an average of about 356,700 tech job openings each year from now through 2033. A lot of that is due to growth in the industry and the need to replace people leaving the workforce. So yes — while things are shifting, there’s still a strong demand for tech talent, especially for those keeping up with evolving skills. With so many open positions still out there, what’s causing the disconnect when it comes to actually finding a job? New Wardrobe for Tech Companies If those jobs are still out there, then it’s worth digging into the specific skills companies are actually hiring for. Recent data from LinkedIn reveals that job skill requirements have shifted by approximately 25% since 2015, and this pace of change is accelerating, with that number expected to double by 2027. In other words, companies are now looking for a broader and more updated set of skills than what may have worked for us over the past decade. Figure 4: Skill bucket The graph indicates that technical skills remain a top priority, with 59% of job postings emphasizing their importance. In contrast, soft skills appear to be a lower priority, mentioned in only 46% of listings, suggesting that companies are still placing greater value on technical expertise in their hiring criteria. Figure 5: AI skill requirement in the US Focusing specifically on the comparison between all tech jobs and those requiring AI skills, a clear trend emerges. As of 2025, around 19% to 25% of tech job postings now explicitly call for AI-related expertise — a noticeable jump from just a few years ago. This sharp rise reflects how deeply AI is becoming embedded across industries. 
In fact, nearly one in four new tech roles now list AI skills as a core requirement, more than doubling since 2022. Figure 6: Skill distribution in open jobs Python remains the most sought-after programming language in AI job postings, maintaining its top position from previous years. Additionally, skills in computer science, data analysis, and cloud platforms like Amazon Web Services have seen significant increases in demand. For instance, mentions of Amazon Web Services in job postings have surged by over 1,778% compared to data from 2012 to 2014 While the overall percentage of AI-specific job postings is still a small fraction of the total, the upward trend underscores the growing importance of AI proficiency in the modern workforce. Final Thought I recognize that this analysis is largely centered on the tech industry, and the impact of AI can look very different across other sectors. That said, I’d like to leave you with one final thought: technology will always evolve, and the real challenge is how quickly we can evolve with it before it starts to leave us behind. We’ve seen this play out before. In the early 2000s, when data volumes were manageable, we relied on database developers. But with the rise of IoT, the scale and complexity of data exploded, and we shifted toward data warehouse developers, skilled in tools like Hadoop and Spark. Fast forward to the 2010s and beyond, we’ve entered the era of AI and data engineers — those who can manage the scale, variety, and velocity of data that modern systems demand. We’ve adapted before — and we’ve done it well. But what makes this AI wave different is the pace. This time, we need to adapt faster than we ever have in the past.

By Niruta Talwekar
When Airflow Tasks Get Stuck in Queued: A Real-World Debugging Story

Recently, my team encountered a critical production issue in which Apache Airflow tasks were getting stuck in the "queued" state indefinitely. As someone who has worked extensively with the Airflow scheduler, I've handled my share of DAG failures, retries, and scheduler quirks, but this particular incident stood out both for its technical complexity and the organizational coordination it demanded.

The Symptom: Tasks Stuck in Queued

It began when one of our business-critical Directed Acyclic Graphs (DAGs) failed to complete. Upon investigation, we discovered several tasks were stuck in the "queued" state: not running, failing, or retrying, just permanently queued.

First Steps: Isolating the Problem

A teammate and I immediately began our investigation with the fundamental checks:

- Examined Airflow UI logs: Nothing unusual beyond standard task submission entries
- Reviewed scheduler and worker logs: The scheduler was detecting the DAGs, but nothing was reaching the workers
- Confirmed worker health: All Celery workers showed as active and running
- Restarted both scheduler and workers: Despite this intervention, tasks remained stubbornly queued

Deep Dive: Uncovering a Scheduler Bottleneck

We soon suspected a scheduler issue. We observed that the scheduler was queuing tasks but not dispatching them. This led us to investigate:

- Slot availability across workers
- Message queue health (RabbitMQ in our environment)
- Heartbeat communication logs

We initially hypothesized that the scheduler machine might be overloaded by its dual responsibility of scheduling tasks and parsing DAGs, so we increased min_file_process_interval to 2 minutes. While this reduced CPU utilization by limiting how frequently the scheduler parsed DAG files, it didn't resolve our core issue: tasks remained stuck in the queued state. After further research, we discovered that our Airflow version (2.2.2) contained a known issue causing tasks to become trapped in the queued state under specific scheduler conditions. This bug was fixed in Airflow 2.6.0, with the solution documented in PR #30375. However, upgrading wasn't feasible in the short term. The migration from 2.2.2 to 2.6.0 would require extensive testing, custom plugin adjustments, and deployment pipeline modifications, none of which could be implemented quickly without disrupting other priorities.

Interim Mitigations and Configuration Optimizations

While working on the backported fix, we implemented several tactical measures to stabilize the system:

- Increased parsing_processes to 8 to parallelize DAG parsing and improve parsing time
- Increased scheduler_heartbeat_sec to 30s and increased min_file_process_interval to 120s (up from the default setting of 30s) to reduce scheduler load
- Implemented continuous monitoring to ensure tasks were being processed appropriately

We also deployed a temporary workaround using a script referenced in this GitHub comment. This script forcibly transitions tasks from queued to running state. We scheduled it via a cron job with an additional filter targeting only task instances that had been queued for more than 10 minutes. This approach provided temporary relief while we finalized our long-term solution. However, we soon discovered limitations with the cron job. While effective for standard tasks that could eventually reach completion once moved from queued to running, it was less reliable for sensor-related tasks. After being pushed to running state, sensor tasks would often transition to up_for_reschedule and then back to queued, becoming stuck again. This required the cron job to repeatedly advance these tasks, essentially functioning as an auxiliary scheduler. We suspect this behavior stems from inconsistencies between the scheduler's in-memory state and the actual task states in the database. This unintentionally made our cron job responsible for orchestrating part of the sensor lifecycle, which is clearly not a sustainable solution.
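For context, the cleanup job looked roughly like the sketch below. This is not the exact script from the GitHub comment: it is a minimal illustration for Airflow 2.2.x with the Celery executor, attribute names such as TaskInstance.queued_dttm are assumptions based on that version's models, and it hands stuck tasks back to the scheduler as scheduled rather than forcing them straight to running.

Python
# requeue_stuck_tasks.py: illustrative only, run from a cron entry on the scheduler host.
# Assumes Airflow 2.2.x with the Celery executor; not a copy of the referenced script.
from datetime import timedelta

from airflow import settings
from airflow.models import TaskInstance
from airflow.utils import timezone
from airflow.utils.state import State

STUCK_AFTER = timedelta(minutes=10)  # only touch tasks queued longer than 10 minutes


def requeue_stuck_tasks():
    session = settings.Session()
    cutoff = timezone.utcnow() - STUCK_AFTER
    stuck = (
        session.query(TaskInstance)
        .filter(TaskInstance.state == State.QUEUED)
        .filter(TaskInstance.queued_dttm < cutoff)  # assumed column for "queued since"
        .all()
    )
    for ti in stuck:
        print(f"Resetting {ti.dag_id}.{ti.task_id} (queued since {ti.queued_dttm})")
        # Hand the task back to the scheduler instead of forcing it to RUNNING,
        # so normal executor bookkeeping still applies.
        ti.state = State.SCHEDULED
        session.merge(ti)
    session.commit()
    session.close()


if __name__ == "__main__":
    requeue_stuck_tasks()

A crontab entry such as */5 * * * * python /opt/airflow/scripts/requeue_stuck_tasks.py (the path is hypothetical) runs it every five minutes.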
The Fix: Strategic Backporting

After evaluating our options, we decided to backport the specific fix from Airflow 2.6.0 to our existing 2.2.2 environment. This approach allowed us to implement the necessary correction without undertaking a full upgrade cycle. We created a targeted patch by cherry-picking the fix from the upstream PR and applying it to our forked version of Airflow. The patch can be viewed here: GitHub Patch.

How to Apply the Patch

Important disclaimer: The patch referenced in this article is specifically designed for Airflow deployments using the Celery executor. If you're using a different executor (such as Kubernetes, Local, or Sequential), you'll need to backport the appropriate changes for your specific executor from the original PR (#30375). The file paths and specific code changes may differ based on your executor configuration.

If you're facing similar issues, here's how to apply this patch to your Airflow 2.2.2 installation:

Download the Patch File

First, download the patch from the GitHub link provided above. You can use wget or directly download the patch file:

Shell
wget -O airflow-queued-fix.patch https://github.com/gurmeetsaran/airflow/pull/1.patch

Navigate to Your Airflow Installation Directory

This is typically where your Airflow Python package is installed.

Shell
cd /path/to/your/airflow/installation

Apply the Patch Using git

Use the git apply command to apply the patch:

Shell
git apply --check airflow-queued-fix.patch # Test if the patch can be applied cleanly
git apply airflow-queued-fix.patch # Actually apply the patch

Restart your Airflow scheduler to apply the changes. Monitor task states to verify that newly queued tasks are being properly processed by the scheduler. Note that this approach should be considered a temporary solution until you can properly upgrade to a newer Airflow version that contains the official fix.

Organizational Lessons

Resolving the technical challenge was only part of the equation. Equally important was our approach to cross-team communication and coordination:

- We engaged our platform engineering team early to validate our understanding of Airflow's architecture.
- We maintained transparent communication with stakeholders so they could manage downstream impacts.
- We meticulously documented our findings and remediation steps to facilitate future troubleshooting.
- We learned the value of designating a dedicated communicator: someone not involved in the core debugging but responsible for tracking progress, taking notes, and providing regular updates to leadership, preventing interruptions to the engineering team.

We also recognized the importance of assembling the right team: collaborative problem-solvers focused on solutions rather than just identifying issues. Establishing a safe, solution-oriented environment significantly accelerated our progress. I was grateful to have the support of a thoughtful and effective manager who helped create the space for our team to stay focused on diagnosing and resolving the issue, minimizing external distractions.

Key Takeaways

This experience reinforced several valuable lessons:

- Airflow is powerful but sensitive to scale and configuration parameters
- Comprehensive monitoring and detailed logging are indispensable diagnostic tools
- Sometimes the issue isn't a failing task but a bottleneck in the orchestration layer
- Version-specific bugs can have widespread impact; staying current helps, even when upgrades require planning
- Backporting targeted patches can be a pragmatic intermediate solution when complete upgrades aren't immediately feasible
- Effective cross-team collaboration can dramatically influence incident response outcomes

This incident reminded me that while technical expertise is fundamental, the ability to coordinate and communicate effectively across teams is equally crucial. I hope this proves helpful to others who find themselves confronting a mysteriously stuck Airflow task and wondering, "Now what?"

By Gurmeet Saran
Secure IaC With a Shift-Left Approach

Imagine you're building a skyscraper, not just quickly, but with precision. You rely on blueprints to make sure every beam and every bolt is exactly where it should be. That's what Infrastructure as Code (IaC) is for today's cloud-native organizations: a blueprint for the cloud. As businesses race to innovate faster, IaC helps them automate and standardize how cloud resources are built. But here's the catch: speed without security is like skipping the safety checks on that skyscraper. One misconfigured setting, an exposed secret, or a non-compliant resource can bring the whole thing down, or at least cause serious trouble in production. That's why the shift-left approach to secure IaC matters more than ever.

What Does "Shift-Left" Mean in IaC?

Shifting left refers to moving security and compliance checks earlier in the development process. Rather than waiting until deployment or runtime to detect issues, teams validate security policies, compliance rules, and access controls as code is written, enabling faster feedback, reduced rework, and stronger cloud governance. For IaC, this means:

- Scanning Terraform templates and other configuration files for vulnerabilities and misconfigurations before they are deployed.
- Validating against cloud-specific best practices.
- Integrating policy-as-code and security tools into CI/CD pipelines.

Why Secure IaC Matters

IaC has completely changed the game when it comes to managing cloud environments. It's like having a fast-forward button for provisioning, making it quicker, more consistent, and easier to repeat across teams and projects. But while IaC helps solve a lot of the troubles around manual operations, it's not without its own set of risks. The truth is, one small mistake, even a single misconfigured line in a Terraform script, can have massive consequences. It could unintentionally expose sensitive data, leave the door open for unauthorized access, or cause your setup to drift away from compliance standards. And because everything's automated, those risks scale just as fast as your infrastructure. In cloud environments like IBM Cloud, where IaC tools like Terraform and Schematics automate the creation of virtual servers, networks, storage, and IAM policies, a security oversight can result in:

- Publicly exposed resources (e.g., Cloud Object Storage buckets or VPC subnets).
- Over-permissive IAM roles granting broader access than intended.
- Missing encryption for data at rest or in transit.
- Hard-coded secrets and keys within configuration files.
- Non-compliance with regulatory standards like GDPR, HIPAA, or ISO 27001.

These risks can lead to data breaches, service disruptions, and audit failures, especially if they go unnoticed until after deployment. Secure IaC ensures that security and compliance are not afterthoughts but are baked into the development process. It enables:

- Early detection of misconfigurations and policy violations.
- Automated remediation before deployment.
- Audit-ready infrastructure, with traceable and versioned security policies.
- Shift-left security, empowering developers to code safely without slowing down innovation.

When done right, Secure IaC acts as a first line of defense, helping teams deploy confidently while reducing the cost and impact of security fixes later in the lifecycle.

Components of Secure IaC Framework

The Secure IaC Framework is structured into layered components that guide organizations in embedding security throughout the IaC lifecycle.
Building Blocks of IaC (core foundation for all other layers): These are the fundamental practices required to enable any Infrastructure as Code approach.

- Use declarative configuration (e.g., Terraform, YAML, JSON).
- Embrace version control (e.g., Git) for all infrastructure code.
- Define idempotent and modular code for reusable infrastructure.
- Enable automation pipelines (CI/CD) for repeatable deployments.
- Follow consistent naming conventions, tagging policies, and code linting.

Build Secure Infrastructure: Focuses on embedding secure design and architectural patterns into the infrastructure baseline.

- Use secure-by-default modules (e.g., encryption, private subnets).
- Establish network segmentation, IAM boundaries, and resource isolation.
- Configure monitoring, logging, and default denial policies.
- Choose secure providers and verified module sources.

Automate Controls: Empowers shift-left security by embedding controls into the development and delivery pipelines.

- Run static code analysis (e.g., Trivy, Checkov) pre-commit and in CI.
- Enforce policy-as-code using OPA or Sentinel for approvals and denials.
- Integrate configuration management and IaC test frameworks (e.g., Terratest).

Detect & Respond: Supports runtime security through visibility, alerting, and remediation.

- Enable drift detection tools to track deviations from IaC definitions.
- Use runtime compliance monitoring (e.g., IBM Cloud SCC).
- Integrate with SOAR platforms or incident playbooks.
- Generate security alerts for real-time remediation and Root Cause Analysis (RCA).

Design Governance: Establishes repeatable, scalable security practices across the enterprise.

- Promote immutable infrastructure for consistent and tamper-proof environments.
- Use golden modules or signed templates with organizational guardrails.
- Implement change management via GitOps, PR workflows, and approval gates.
- Align with compliance standards (e.g., CIS, NIST, ISO 27001) and produce audit reports.

Anatomy of Secure IaC

Creating a secure IaC environment involves incorporating several best practices and tools to ensure that the infrastructure is resilient, compliant, and protected against potential threats. These practices are implemented and tracked at various phases of the IaC environment lifecycle.

- The design phase covers not just the IaC script design and tooling decisions, but also how organizational policies will be incorporated into the IaC scripts.
- The development phase covers coding best practices, implementation of the IaC scripts and the policies involved, and the pre-commit checks a developer can run before committing. These checks keep check-ins clean and surface code smells up front.
- The build phase covers all the code security checks and policy verification. This is a quality gate in the pipeline that stops the deployment on any failure.
- The deployment phase supports deployment to various environments along with their respective configurations.
- The maintenance phase is also crucial, as threat detection, vulnerability detection, and monitoring play a key role.

Key Pillars of Secure IaC

Below is a list of key pillars of Secure IaC, incorporating all the essential tools and services.
These pillars align with cloud-native capabilities to enforce a secure-by-design, shift-left approach for Infrastructure as Code:

- Reference templates, like Deployable Architectures or AWS Terraform Modules: Reusable, templatized infrastructure blueprints designed for security, compliance, and scalability. They promote consistency across environments (dev/test/prod) and often include pre-approved Terraform templates.
- Managed IaC platforms, like IBM Cloud Schematics or AWS CloudFormation: Enable secure execution of Terraform code in isolated workspaces, with support for Role-Based Access Control (RBAC), encrypted variables, approval workflows (via GitOps or manual), and versioned infrastructure plans.
- Lifecycle resource management, using IBM Cloud Projects or Azure Blueprints: Logical grouping of cloud resources tied to governance and compliance requirements. Simplifies multi-environment deployments (e.g., dev, QA, prod) and integrates with IaC deployment and CI/CD for isolated, secure automation pipelines.
- Secrets Management: A centralized secrets vault to manage API keys, certificates, and IAM credentials. Provides dynamic secrets, automatic rotation, access logging, and fine-grained access policies.
- Key Management Solutions (KMS/HSM): Protect sensitive data at rest or in transit and manage encryption keys with full customer control and auditability. KMS-backed encryption is critical for storage, databases, and secrets.
- Compliance Posture Management: Provides posture management and continuous compliance monitoring, enabling policy-as-code checks on IaC deployments, custom rules enforcement, and compliance posture dashboards (CIS, NIST, GDPR). Introduce Continuous Compliance (CC) pipelines as part of the CI/CD pipelines for shift-left enforcement.
- CI/CD Pipelines (DevSecOps): Integrate security scans and controls into delivery pipelines using GitHub Actions, Tekton, Jenkins, or IBM Cloud Continuous Delivery. Pipeline stages include Terraform linting, static analysis (Checkov, tfsec), secrets scanning, compliance policy validation, and change approval gates before Schematics apply.
- Policy-as-Code: Use tools like OPA (Open Policy Agent) policies to block insecure resource configurations, require tagging, encryption, and access policies, and automate compliance enforcement during plan and apply.
- IAM & Resource Access Governance: Apply least-privilege IAM roles for projects and API keys, use resource groups to scope access boundaries, and enforce fine-grained access to Secrets Manager, KMS, and Logs.
- Audit and Logging: Integrate with Cloud Logs to monitor infrastructure changes, audit access to secrets, projects, and deployments, and detect anomalies in provisioning behavior.
- Monitoring and Drift Detection: Use monitoring tools like IBM Instana, drift detection, or custom Terraform state validation to continuously monitor deployed infrastructure, compare live state to the defined IaC, and remediate unauthorized changes.

Checklist: Secure IaC

1. Code Validation and Static Analysis: Integrate static analysis tools (e.g., Checkov, TFSec) into your development workflow. Scan Terraform templates for misconfigurations and security vulnerabilities. Ensure compliance with best practices and CIS benchmarks. (A minimal example of wiring such a scan into a pipeline gate follows this checklist.)
2. Policy-as-Code Enforcement: Define security policies using Open Policy Agent (OPA) or other equivalent tools. Enforce policies during the CI/CD pipeline to prevent non-compliant deployments. Regularly update and audit policies to adapt to evolving security requirements.
3. Secrets and Credential Management: Store sensitive information in Secrets Manager. Avoid hardcoding secrets in IaC templates. Implement automated secret rotation and access controls.
4. Immutable Infrastructure and Version Control: Maintain all IaC templates in a version-controlled repository (e.g., Git). Implement pull request workflows with mandatory code reviews. Tag and document releases for traceability and rollback capabilities.
5. CI/CD Integration with Security Gates: Incorporate security scans and compliance checks into the CI/CD pipeline. Set up approval gates to halt deployments on policy violations. Automate testing and validation of IaC changes before deployment.
6. Secure Execution Environment: Utilize IBM Cloud Schematics, AWS CloudFormation, or any equivalent tool for executing Terraform templates in isolated environments. Restrict access to execution environments using IAM roles and policies. Monitor and log all execution activities for auditing purposes.
7. Drift Detection and Continuous Monitoring: Implement tools to detect configuration drift between deployed resources and IaC templates. Regularly scan deployed resources for compliance. Set up alerts for unauthorized changes or policy violations.
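To make items 1 and 5 concrete, here is a minimal pipeline-gate sketch. It assumes the Checkov CLI is installed on the build agent and that a non-zero exit code means failed checks (Checkov's default unless --soft-fail is used); the directory and report file names are placeholders rather than part of any particular product.

Python
#!/usr/bin/env python3
"""CI gate sketch: fail the build when IaC static analysis finds issues.

Illustrative only. Assumes `pip install checkov` has run on the agent; adjust
paths and flags to your own pipeline.
"""
import subprocess
import sys

TERRAFORM_DIR = "infra/"             # hypothetical path to the Terraform code
REPORT_FILE = "checkov-report.json"  # hypothetical pipeline artifact name


def main() -> int:
    # Run Checkov over the whole directory and capture a machine-readable report.
    result = subprocess.run(
        ["checkov", "-d", TERRAFORM_DIR, "-o", "json"],
        capture_output=True,
        text=True,
    )

    # Keep the JSON output as a build artifact for auditing.
    with open(REPORT_FILE, "w") as fh:
        fh.write(result.stdout)

    if result.returncode != 0:
        print("Checkov reported failed checks; blocking the deployment stage.")
        return 1

    print("IaC static analysis passed; proceeding to the next stage.")
    return 0


if __name__ == "__main__":
    sys.exit(main())

The same pattern extends to policy-as-code: swap the Checkov call for an OPA or conftest evaluation of the Terraform plan and keep the same fail-the-stage-on-non-zero-exit contract.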
Benefits of Shift-Left Secure IaC

Here are the key benefits of adopting shift-left Secure IaC, tailored for cloud-native teams focused on automation, compliance, and developer enablement:

- Early Risk Detection and Remediation
- Faster, More Secure Deployments
- Automated Compliance Enforcement
- Reduced Human Error and Configuration Drift
- Improved Developer Experience
- Enhanced Auditability and Traceability
- Reduced Cost of Security Fixes
- Stronger Governance with IAM and RBAC
- Continuous Posture Assurance

Conclusion

Adopting a shift-left approach to secure IaC in cloud platforms isn't just about preventing misconfigurations; it's about building smarter from the start. When security is treated as a core part of the development process rather than an afterthought, teams can move faster with fewer surprises down the line. With cloud services like Schematics, Projects, Secrets Manager, Key Management, CloudFormation, and Azure Blueprints, organizations have all the tools they need to catch issues early, stay compliant, and automate guardrails. However, the true benefit extends beyond security: it establishes the foundation for platform engineering. By baking secure, reusable infrastructure patterns into internal developer platforms, teams create a frictionless, self-service experience that helps developers ship faster without compromising governance.

By Josephine Eskaline Joyce DZone Core CORE
Managing Encrypted Aurora DAS Over Kinesis With AWS SDK

When it comes to auditing and monitoring database activity, Amazon Aurora's Database Activity Stream (DAS) provides a secure and near real-time stream of database activity. By default, DAS encrypts all data in transit using AWS Key Management Service (KMS) with a customer-managed key (CMK) and streams this encrypted data into a Serverless Streaming Data Service - Amazon Kinesis. While this is great for compliance and security, reading and interpreting the encrypted data stream requires additional effort — particularly if you're building custom analytics, alerting, or logging solutions. This article walks you through how to read the encrypted Aurora DAS records from Kinesis using the AWS Encryption SDK. Security and compliance are top priorities when working with sensitive data in the cloud — especially in regulated industries such as finance, healthcare, and government. Amazon Aurora's DAS is designed to help customers monitor database activity in real time, providing deep visibility into queries, connections, and data access patterns. However, this stream of data is encrypted in transit by default using a customer-managed AWS KMS (Key Management Service) key and routed through Amazon Kinesis Data Streams for consumption. While this encryption model enhances data security, it introduces a technical challenge: how do you access and process the encrypted DAS data? The payload cannot be directly interpreted, as it's wrapped in envelope encryption and protected by your KMS CMK. Understanding the Challenge Before discussing the solution, it's important to understand how Aurora DAS encryption works: Envelope Encryption Model: Aurora DAS uses envelope encryption, where the data is encrypted with a data key, and that data key is itself encrypted using your KMS key. Two Encrypted Components: Each record in the Kinesis stream contains: The database activity events encrypted with a data key The data key encrypted with your KMS CMK Kinesis Data Stream Format: The records follow this structure: JSON { "type": "DatabaseActivityMonitoringRecords", "version": "1.1", "databaseActivityEvents": "[encrypted audit records]", "key": "[encrypted data key]" } Solution Overview: AWS Encryption SDK Approach Aurora DAS encrypts data in multiple layers, and the AWS Encryption SDK helps you easily unwrap all that encryption so you can see what’s going on. Here's why this specific approach is required: Handles Envelope Encryption: The SDK is designed to work with the envelope encryption pattern used by Aurora DAS. Integrates with KMS: It seamlessly integrates with your KMS keys for the initial decryption of the data key. Manages Cryptographic Operations: The SDK handles the complex cryptographic operations required for secure decryption. The decryption process follows these key steps: First, decrypt the encrypted data key using your KMS CMK. Then, use that decrypted key to decrypt the database activity events.Finally, decompress the decrypted data to get the readable JSON output Implementation Step 1: Set Up Aurora With Database Activity Streams Before implementing the decryption solution, ensure you have: An Aurora PostgreSQL or MySQL cluster with sufficient permissions A customer-managed KMS key for encryption Database Activity Streams enabled on your Aurora cluster When you turn on DAS, AWS sets up a Kinesis stream called aws-rds-das-[cluster-resource-id] that receives the encrypted data. 
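As a reference point, enabling the stream can also be scripted. The sketch below uses boto3's start_activity_stream call with placeholder ARNs, key ID, and region; verify the parameters and response fields against the boto3 documentation for your SDK version before relying on it.

Python
# Illustrative sketch: enable DAS on an Aurora cluster and locate its Kinesis stream.
# The ARNs, key ID, and region below are placeholders.
import boto3

REGION = "us-east-1"
CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"
KMS_KEY_ID = "arn:aws:kms:us-east-1:123456789012:key/your-cmk-id"

rds = boto3.client("rds", region_name=REGION)

# 'async' mode favours database performance; 'sync' favours guaranteed event delivery.
response = rds.start_activity_stream(
    ResourceArn=CLUSTER_ARN,
    Mode="async",
    KmsKeyId=KMS_KEY_ID,
    ApplyImmediately=True,
)

# The response names the Kinesis stream Aurora writes encrypted records to, which
# should match the aws-rds-das-[cluster-resource-id] naming mentioned above.
print(response["KinesisStreamName"], response["Status"])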
Step 2: Prepare the AWS Encryption SDK Environment For decrypting DAS events, your processing application (typically a Lambda function) needs the AWS Encryption SDK. This SDK is not included in standard AWS runtimes and must be added separately. Why this matters: The AWS Encryption SDK provides specialized cryptographic algorithms and protocols designed specifically for envelope encryption patterns used by AWS services like DAS. The most efficient approach is to create a Lambda Layer containing: aws_encryption_sdk: Required for the envelope decryption process boto3: Needed for AWS service interactions, particularly with KMS Step 3: Implement the Decryption Logic Here’s a Lambda function example that handles decrypting DAS events. Each part of the decryption process is thoroughly documented with comments in the code: Python import base64 import json import zlib import boto3 import aws_encryption_sdk from aws_encryption_sdk import CommitmentPolicy from aws_encryption_sdk.internal.crypto import WrappingKey from aws_encryption_sdk.key_providers.raw import RawMasterKeyProvider from aws_encryption_sdk.identifiers import WrappingAlgorithm, EncryptionKeyType # Configuration - update these values REGION_NAME = 'your-region' # Change to your region RESOURCE_ID = 'your cluster resource ID' # Change to your RDS resource ID # Initialize encryption client with appropriate commitment policy # This is required for proper operation with the AWS Encryption SDK enc_client = aws_encryption_sdk.EncryptionSDKClient(commitment_policy=CommitmentPolicy.FORBID_ENCRYPT_ALLOW_DECRYPT) # Custom key provider class for decryption # This class is necessary to use the raw data key from KMS with the Encryption SDK class MyRawMasterKeyProvider(RawMasterKeyProvider): provider_id = "BC" def __new__(cls, *args, **kwargs): obj = super(RawMasterKeyProvider, cls).__new__(cls) return obj def __init__(self, plain_key): RawMasterKeyProvider.__init__(self) # Configure the wrapping key with proper algorithm for DAS decryption self.wrapping_key = WrappingKey( wrapping_algorithm=WrappingAlgorithm.AES_256_GCM_IV12_TAG16_NO_PADDING, wrapping_key=plain_key, wrapping_key_type=EncryptionKeyType.SYMMETRIC ) def _get_raw_key(self, key_id): # Return the wrapping key when the Encryption SDK requests it return self.wrapping_key # First decryption step: use the data key to decrypt the payload def decrypt_payload(payload, data_key): # Create a key provider using our decrypted data key my_key_provider = MyRawMasterKeyProvider(data_key) my_key_provider.add_master_key("DataKey") # Decrypt the payload using the AWS Encryption SDK decrypted_plaintext, header = enc_client.decrypt( source=payload, materials_manager=aws_encryption_sdk.materials_managers.default.DefaultCryptoMaterialsManager( master_key_provider=my_key_provider) ) return decrypted_plaintext # Second step: decompress the decrypted data # DAS events are compressed before encryption to save bandwidth def decrypt_decompress(payload, key): decrypted = decrypt_payload(payload, key) # Use zlib with specific window bits for proper decompression return zlib.decompress(decrypted, zlib.MAX_WBITS + 16) # Main Lambda handler function that processes events from Kinesis def lambda_handler(event, context): session = boto3.session.Session() kms = session.client('kms', region_name=REGION_NAME) for record in event['Records']: # Step 1: Get the base64-encoded data from Kinesis payload = base64.b64decode(record['kinesis']['data']) record_data = json.loads(payload) # Step 2: Extract the two encrypted components 
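        # 'databaseActivityEvents' holds the audit records encrypted with the data key,
        # while 'key' holds that data key encrypted under the KMS CMK (see the record
        # format shown earlier); both fields arrive as base64-encoded strings.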
payload_decoded = base64.b64decode(record_data['databaseActivityEvents']) data_key_decoded = base64.b64decode(record_data['key']) # Step 3: Decrypt the data key using KMS # This is the first level of decryption in the envelope model data_key_decrypt_result = kms.decrypt( CiphertextBlob=data_key_decoded, EncryptionContext={'aws:rds:dbc-id': RESOURCE_ID} ) decrypted_data_key = data_key_decrypt_result['Plaintext'] # Step 4: Use the decrypted data key to decrypt and decompress the events # This is the second level of decryption in the envelope model decrypted_event = decrypt_decompress(payload_decoded, decrypted_data_key) # Step 5: Process the decrypted event # At this point, decrypted_event contains the plaintext JSON of database activity print(decrypted_event) # Additional processing logic would go here # For example, you might: # - Parse the JSON and extract specific fields # - Store events in a database for analysis # - Trigger alerts based on suspicious activities return { 'statusCode': 200, 'body': json.dumps('Processing Complete') } Step 4: Error Handling and Performance Considerations As you implement this solution in production, keep these key factors in mind: Error Handling: KMS permissions: Ensure your Lambda function has the necessary KMS permissions so it can decrypt the data successfully.Encryption context: The context must match exactly (aws:rds:dbc-id) Resource ID: Make sure you're using the correct Aurora cluster resource ID—if it's off, the KMS decryption step will fail. Performance Considerations: Batch size: Configure appropriate Kinesis batch sizes for your Lambda Timeout settings: Decryption operations may require longer timeouts Memory allocation: Processing encrypted streams requires more memory Conclusion Aurora's Database Activity Streams provide powerful auditing capabilities, but the default encryption presents a technical challenge for utilizing this data. By leveraging the AWS Encryption SDK and understanding the envelope encryption model, you can successfully decrypt and process these encrypted streams. The key takeaways from this article are: Aurora DAS uses a two-layer envelope encryption model that requires specialized decryption The AWS Encryption SDK is essential for properly handling this encryption pattern The decryption process involves first decrypting the data key with KMS, then using that key to decrypt the actual events Proper implementation enables you to unlock valuable database activity data for security monitoring and compliance By following this approach, you can build robust solutions that leverage the security benefits of encrypted Database Activity Streams while still gaining access to the valuable insights they contain.

By Shubham Kaushik
Scaling Azure Microservices for Holiday Peak Traffic Using Automated CI/CD Pipelines and Cost Optimization

Scaling microservices for holiday peak traffic is crucial to prevent downtime and ensure a seamless user experience. This guide explores Azure DevOps automation, CI/CD pipelines, and cost-optimization strategies to handle high-demand traffic seamlessly. Manual scaling quickly becomes a bottleneck as organizations deploy dozens, sometimes hundreds, of microservices powered by distinct backend services like Cosmos DB, Event Hubs, App Configuration, and Traffic Manager. Multiple teams juggling these components risk costly delays and errors at the worst possible moments. This is where automation comes in: a game-changing solution that transforms complex, error-prone processes into streamlined, efficient operations. In this article, you'll explore how automated pipelines can not only safeguard your systems during peak traffic but also optimize costs and boost overall performance in a microservices world.

The Challenge in a Microservices World

Imagine a project with over 100 microservices, each maintained by different engineering teams. Every service may have its own backend components, for example:

- Cosmos DB: Used for storing data with low-latency access and high throughput.
- Event Hubs: Ingests telemetry and log data from distributed services.
- App Configuration: Centrally manages application settings and feature flags.
- Traffic Manager: Routes user traffic to healthy endpoints during failures.

Manual Scaling Is Inefficient

Coordinating these tasks manually is cumbersome, especially when production issues arise. With multiple teams interacting and collaborating on each microservice's scaling and configuration, the overhead can be overwhelming. This is where CI/CD pipelines and Infrastructure-as-Code (IaC) automation become crucial. Automation not only reduces human error but also provides a unified approach for rapid, reliable scaling and updates.

Figure 1: A system overview showing how the Web App (Presentation Layer) interacts with microservices (Business Logic Layer), which use Cosmos DB, Event Hubs, and App Configuration (Data Layer). The Integration & Traffic Management layer, including Traffic Manager and Azure DevOps CI/CD, handles traffic routing, deployments, and Slack notifications.

Understanding Each Component

AKS (Azure Kubernetes Service)

AKS is a managed Kubernetes service that simplifies deploying, scaling, and managing containerized applications. In a microservices environment, each service can be deployed as a container within AKS, with independent scaling rules and resource allocation. This flexibility enables you to adjust the number of pods based on real-time demand, ensuring that each service has the computing resources it needs.

Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model NoSQL database service that delivers low latency and high throughput. In a microservices architecture, each service may have its own Cosmos DB instance to handle specific data workloads. Automation scripts can dynamically adjust throughput to meet changing demand, ensuring your service remains responsive even during peak loads.

Event Hubs

Azure Event Hubs is a high-throughput data streaming service designed to ingest millions of events per second. It's particularly useful in microservices for collecting logs, telemetry, and real-time analytics data. By automating the scaling of Event Hubs, you ensure that your data ingestion pipeline never becomes a bottleneck, even when the number of events spikes during high-traffic periods.
App Configuration

Azure App Configuration is a centralized service that stores configuration settings and feature flags for your applications. In a microservices ecosystem, different services often need unique settings or dynamic feature toggles. Instead of hard-coding these values or updating configurations manually, App Configuration provides a single source of truth that can be updated on the fly. During peak traffic, a microservice can instantly disable resource-heavy features without redeployment.

Traffic Manager

Azure Traffic Manager is a DNS-based load-balancing solution that directs user traffic based on endpoint health and performance. For microservices, it ensures that requests are automatically rerouted from failing or overloaded endpoints to healthy ones, minimizing downtime and ensuring a seamless user experience, especially during high-stress scenarios like holiday peak traffic. Traffic Manager also supports disaster recovery by rerouting traffic from a failed region (e.g., East US) to a healthy backup (e.g., West US) in under 30 seconds, thereby minimizing downtime.

Figure 2: High-level view of user traffic flowing through Azure Traffic Manager to an AKS cluster with containerized microservices, which interact with Cosmos DB, Event Hubs, and App Configuration for data, logging, and real-time updates.

Automating the Process With CI/CD Pipelines

Azure DevOps CI/CD pipelines are the backbone of this automation. Here's how each part fits into the overall process:

- Continuous integration (CI): Every code commit triggers a CI pipeline that builds and tests your application. This immediate feedback loop ensures that only validated changes move forward.
- Continuous delivery (CD): Once the CI pipeline produces an artifact, the release pipeline deploys it to production. This deployment stage automatically scales resources (like Cosmos DB and Event Hubs), updates configurations, and manages traffic routing. Dynamic variables, secure service connections, and agent configurations are all set up to interact seamlessly with AKS, Cosmos DB, and other services.
- Service connections and Slack notifications: Secure service connections (using a service account or App Registration) enable your pipeline to interact with AKS and other resources. Integration with Slack provides real-time notifications on pipeline runs, scaling updates, and configuration changes, keeping your teams informed.

Figure 3: Component Diagram. A high-level architectural overview showing Azure DevOps, AKS, Cosmos DB, Event Hubs, App Configuration, Traffic Manager, and Slack interconnected.

Core Automation Commands and Validation

Below are the essential commands or code for each component, along with validation commands that confirm each update was successful.

1. Kubernetes Pod Autoscaling (HPA)

Core Commands

Shell
# Update HPA settings:
kubectl patch hpa <deploymentName> -n <namespace> --patch '{"spec": {"minReplicas": <min>, "maxReplicas": <max>}}'
# Validate update:
kubectl get hpa <deploymentName> -n <namespace> -o=jsonpath='{.spec.minReplicas}{"-"}{.spec.maxReplicas}{"\n"}'
# Expected Output: 3-10

Bash Script for AKS Autoscaling

Here's a shell script for the CI/CD pipeline. This is an example that can be adapted for other automation tasks using technologies such as Terraform, Python, Java, and others.
Shell
#!/bin/bash
# File: scaling-pipeline-details.sh
# Input file format: namespace:deploymentname:min:max

echo "Logging all application HPA pod count before update"
kubectl get hpa --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}{":"}{.metadata.name}{":"}{.spec.minReplicas}{":"}{.spec.maxReplicas}{"\n"}{end}'

cd $(System.DefaultWorkingDirectory)$(working_dir)
INPUT=$(inputfile)
OLDIFS=$IFS
IFS=':'
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
while read namespace deploymentname min max
do
  echo "Namespace: $namespace - Deployment: $deploymentname - min: $min - max: $max"
  cp $(template) "patch-template-hpa-sample-temp.json"
  sed -i "s/<<min>>/$min/g" "patch-template-hpa-sample-temp.json"
  sed -i "s/<<max>>/$max/g" "patch-template-hpa-sample-temp.json"
  echo "kubectl patch hpa $deploymentname --patch $(cat patch-template-hpa-sample-temp.json) -n $namespace"
  kubectl get hpa $deploymentname -n $namespace -o=jsonpath='{.metadata.namespace}{":"}{.metadata.name}{":"}{.spec.minReplicas}{":"}{.spec.maxReplicas}{"%0D%0A"}' >> /app/pipeline/log/hpa_before_update_$(datetime).properties
  # Main command to patch the scaling configuration
  kubectl patch hpa $deploymentname --patch "$(cat patch-template-hpa-sample-temp.json)" -n $namespace
  # Main command to validate the scaling configuration
  kubectl get hpa $deploymentname -n $namespace -o=jsonpath='{.metadata.namespace}{":"}{.metadata.name}{":"}{.spec.minReplicas}{":"}{.spec.maxReplicas}{"%0D%0A"}' >> /app/pipeline/log/hpa_after_update_$(datetime).properties
  rm -f "patch-template-hpa-sample-temp.json" "patch-template-hpa-sample-temp.json".bak
done < $INPUT
IFS=$OLDIFS

tempVar=$(cat /app/pipeline/log/hpa_before_update_$(datetime).properties)
curl -k --location --request GET "https://slack.com/api/chat.postMessage?token=$(slack_token)&channel=$(slack_channel)&text=------HPA+POD+Count+Before+update%3A------%0D%0ANamespace%3AHPA-Name%3AMinReplicas%3AMaxReplicas%0D%0A${tempVar}&username=<username>&icon_emoji=<emoji>"
tempVar=$(cat /app/pipeline/log/hpa_after_update_$(datetime).properties)
# below line is optional for slack notification.
curl -k --location --request GET "https://slack.com/api/chat.postMessage?token=$(slack_token)&channel=$(slack_channel)&text=------HPA+POD+Count+After+update%3A------%0D%0ANamespace%3AHPA-Name%3AMinReplicas%3AMaxReplicas%0D%0A${tempVar}&username=<username>&icon_emoji=<emoji>"

Create file: patch-template-hpa-sample.json

JSON
{"spec": {"maxReplicas": <<max>>, "minReplicas": <<min>>}}

2. Cosmos DB Scaling

Core Commands

This can be enhanced further in the CI/CD pipeline with different technologies like shell, Python, Java, etc.

Shell
# For SQL Database:
az cosmosdb sql database throughput update -g <resourceGroup> -a <accountName> -n <databaseName> --max-throughput <newValue>
# Validate update:
az cosmosdb sql database throughput show -g <resourceGroup> -a <accountName> -n <databaseName> --query resource.autoscaleSettings.maxThroughput -o tsv
# Expected Output: 4000
# Input file format: resourceGroup:accountName:databaseName:maxThroughput:dbType:containerName

Terraform Code for Cosmos DB Scaling

Shell
# Terraform configuration for Cosmos DB account with autoscale settings.
resource "azurerm_cosmosdb_account" "example" { name = "example-cosmosdb-account" location = azurerm_resource_group.example.location resource_group_name = azurerm_resource_group.example.name offer_type = "Standard" kind = "GlobalDocumentDB" enable_automatic_failover = true consistency_policy { consistency_level = "Session" } } resource "azurerm_cosmosdb_sql_database" "example" { name = "example-database" resource_group_name = azurerm_resource_group.example.name account_name = azurerm_cosmosdb_account.example.name } resource "azurerm_cosmosdb_sql_container" "example" { name = "example-container" resource_group_name = azurerm_resource_group.example.name account_name = azurerm_cosmosdb_account.example.name database_name = azurerm_cosmosdb_sql_database.example.name partition_key_path = "/partitionKey" autoscale_settings { max_throughput = 4000 } } 3. Event Hubs Scaling Core Commands This can be enhanced further in the CI/CD pipeline with different technologies like a shell, Python, Java, etc. Shell # Update capacity: az eventhubs namespace update -g <resourceGroup> -n <namespace> --capacity <newCapacity> --query sku.capacity -o tsv # Validate update: az eventhubs namespace show -g <resourceGroup> -n <namespace> --query sku.capacity -o tsv #Expected Output: 6 4. Dynamic App Configuration Updates Core Commands This can be enhanced further in the CI/CD pipeline with different technologies like a shell, Python, Java, etc. Shell # Export current configuration: az appconfig kv export -n <appconfig_name> --label <label> -d file --path backup.properties --format properties -y # Import new configuration: az appconfig kv import -n <appconfig_name> --label <label> -s file --path <input_file> --format properties -y # Validate update: az appconfig kv export -n <appconfig_name> --label <label> -d file --path afterupdate.properties --format properties -y #Input file format: Key-value pairs in standard properties format (e.g., key=value). 5. Traffic Management and Disaster Recovery (Traffic Switch) Core Commands This can be enhanced further in the CI/CD pipeline with different technologies like a shell, Python, Java, etc. 
Shell # Update endpoint status: az network traffic-manager endpoint update --endpoint-status <newStatus> --name <endpointName> --profile-name <profileName> --resource-group <resourceGroup> --type <type> --query endpointStatus -o tsv # Validate update: az network traffic-manager endpoint show --name <endpointName> --profile-name <profileName> --resource-group <resourceGroup> --type <type> --query endpointStatus -o tsv #Expected Output: Enabled #Input file format: profileName:resourceGroup:type:status:endPointName Terraform Code for Traffic Manager (Traffic Switch) JSON resource "azurerm_traffic_manager_profile" "example" { name = "example-tm-profile" resource_group_name = azurerm_resource_group.example.name location = azurerm_resource_group.example.location profile_status = "Enabled" traffic_routing_method = "Priority" dns_config { relative_name = "exampletm" ttl = 30 } monitor_config { protocol = "HTTP" port = 80 path = "/" } } resource "azurerm_traffic_manager_endpoint" "primary" { name = "primaryEndpoint" profile_name = azurerm_traffic_manager_profile.example.name resource_group_name = azurerm_resource_group.example.name type = "externalEndpoints" target = "primary.example.com" priority = 1 } resource "azurerm_traffic_manager_endpoint" "secondary" { name = "secondaryEndpoint" profile_name = azurerm_traffic_manager_profile.example.name resource_group_name = azurerm_resource_group.example.name type = "externalEndpoints" target = "secondary.example.com" priority = 2 } Explanation: These Terraform configurations enable autoscaling and efficient resource allocation for Cosmos DB and Traffic Manager. By leveraging IaC, you ensure consistency and optimize costs by provisioning resources dynamically based on demand. How to Reduce Azure Costs With Auto-Scaling Automation improves operational efficiency and plays a key role in cost optimization. In a microservices ecosystem with hundreds of services, even a small reduction in over-provisioned resources can lead to substantial savings over time. By dynamically scaling resources based on demand, you pay only for what you need. By dynamically adjusting resource usage, businesses can significantly reduce cloud costs. Here are concrete examples: Cosmos DB Autoscaling: For instance, if running 4000 RU/s costs $1,000 per month, reducing it to 1000 RU/s during off-peak hours could lower the bill to $400 monthly, leading to $7,200 in annual savings.AKS Autoscaler: Automatically removing unused nodes ensures you only pay for active compute resources, cutting infrastructure costs by 30%. Visualizing the Process: Sequence Diagram To further clarify the workflow, consider including a Sequence Diagram. This diagram outlines the step-by-step process, from code commit to scaling, configuration updates, and notifications, illustrating how automation interconnects these components. For example, the diagram shows: Developer: Commits code, triggering the CI pipeline.CI pipeline: Builds, tests, and publishes the artifact.CD pipeline: Deploys the artifact to AKS, adjusts Cosmos DB throughput, scales Event Hubs, updates App Configuration, and manages Traffic Manager endpoints.Slack: Sends real-time notifications on each step. Such a diagram visually reinforces the process and helps teams quickly understand the overall workflow. Figure 4: Sequence Diagram — A step-by-step flow illustrating the process from code commit through CI/CD pipelines to resource scaling and Slack notifications. 
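Before moving to the conclusion, one note on the repeated point that these CLI steps can be wrapped in other languages inside the pipeline. The sketch below is a minimal Python wrapper around the same Cosmos DB commands shown in the Core Commands sections above; the resource names are placeholders, and it assumes the build agent is already authenticated with az (for example through a service connection).

Python
#!/usr/bin/env python3
"""Pipeline helper sketch: scale Cosmos DB autoscale max throughput and verify it.

Wraps the az CLI calls from the Cosmos DB Scaling section; names are placeholders.
"""
import subprocess

RESOURCE_GROUP = "rg-retail-prod"        # placeholder
ACCOUNT_NAME = "cosmos-retail-prod"      # placeholder
DATABASE_NAME = "orders"                 # placeholder
TARGET_MAX_THROUGHPUT = 4000             # RU/s ceiling for the holiday peak


def az(*args: str) -> str:
    """Run an az CLI command and return its trimmed stdout, raising on failure."""
    result = subprocess.run(["az", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def scale_cosmos() -> None:
    # Apply the new autoscale ceiling.
    az("cosmosdb", "sql", "database", "throughput", "update",
       "-g", RESOURCE_GROUP, "-a", ACCOUNT_NAME, "-n", DATABASE_NAME,
       "--max-throughput", str(TARGET_MAX_THROUGHPUT))

    # Read the value back, mirroring the validation command used above.
    current = az("cosmosdb", "sql", "database", "throughput", "show",
                 "-g", RESOURCE_GROUP, "-a", ACCOUNT_NAME, "-n", DATABASE_NAME,
                 "--query", "resource.autoscaleSettings.maxThroughput", "-o", "tsv")

    if current != str(TARGET_MAX_THROUGHPUT):
        raise RuntimeError(f"Expected {TARGET_MAX_THROUGHPUT} RU/s, found {current}")
    print(f"Cosmos DB autoscale max throughput confirmed at {current} RU/s")


if __name__ == "__main__":
    scale_cosmos()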
Conclusion Automation is no longer a luxury — it’s the cornerstone of resilient and scalable cloud architectures. In this article, I demonstrated how Azure resources such as Cosmos DB, Event Hubs, App Configuration, Traffic Manager, and AKS can be orchestrated with automation using bash shell scripts, Terraform configurations, Azure CLI commands, and Azure DevOps CI/CD pipelines. These examples illustrate one powerful approach to automating microservices operations during peak traffic. While I showcased the Azure ecosystem, the underlying principles of automation are universal. Similar techniques can be applied to other cloud platforms. Whether you’re using AWS with CloudFormation and CodePipeline or Google Cloud with Deployment Manager and Cloud Build, you can design CI/CD workflows that meet your unique needs. Embrace automation to unlock your infrastructure’s full potential, ensuring your applications not only survive high-demand periods but also thrive under pressure. If you found this guide helpful, subscribe to my Medium blog for more insights on cloud automation. Comment below on your experience with scaling applications or share this with colleagues who might benefit! Your feedback is invaluable and helps shape future content, so let’s keep the conversation going. Happy scaling, and may your holiday traffic be ever in your favor! Further Reading and References Azure Kubernetes Service (AKS) Documentation: Guidance on deploying, managing, and scaling containerized applications using Kubernetes.Azure Cosmos DB Documentation: Dive deep into configuring and scaling your Cosmos DB instances.Azure Event Hubs Documentation: Explore high-throughput data streaming, event ingestion, and telemetry.Azure App Configuration Documentation: Best practices for managing application settings and feature flags in a centralized service.Azure Traffic Manager Documentation: Techniques for DNS-based load balancing and proactive endpoint monitoring.Terraform for Azure: Learn how to leverage Infrastructure as Code (IaC) with Terraform to automate resource provisioning and scaling.Azure DevOps Documentation: Understand CI/CD pipelines, automated deployments, and integrations with Azure services.

By Prabhu Chinnasamy
Distributed Consensus: Paxos vs. Raft and Modern Implementations

Distributed consensus is a fundamental concept in distributed computing that refers to the process by which multiple nodes (servers or computers) in a distributed system agree on a single data value or a sequence of actions, ensuring consistency despite the presence of failures or network partitions. In simpler terms, it's the mechanism that allows independent computers to reach agreement on critical data or operations even when some nodes fail or communication is unreliable. The importance of distributed consensus in today's technology landscape cannot be overstated. It serves as the foundation for:

- Reliability and fault tolerance: By requiring agreement among nodes, a consensus algorithm allows the system to keep working correctly even if some servers crash or become unreachable. This ensures there's no single point of failure and the system can survive node outages.
- Consistency: Consensus guarantees that all non-faulty nodes have the same view of data or the same sequence of events. This is vital for correctness; for example, in a distributed database, every replica should agree on committed transactions.
- Coordination: Many coordination tasks in a cluster (such as electing a primary leader or agreeing on a config change) are essentially consensus problems. A robust consensus protocol prevents "split-brain" scenarios by ensuring only one leader is chosen and all nodes agree on who it is. This avoids conflicting decisions and keeps the cluster synchronized.

Distributed consensus has found applications across numerous domains:

- Leader election in fault-tolerant environments
- Blockchain technology for decentralized agreement without central authorities
- Distributed databases to maintain consistency across replicas
- Load balancing to efficiently distribute workloads across multiple nodes
- State machine replication for building reliable distributed services

Paxos vs Raft: The Battle for Consensus Dominance

When it comes to implementing distributed consensus, two algorithms dominate production systems: Paxos and Raft. Let's examine these algorithms and how they compare.

Paxos: The Traditional Consensus Algorithm

Paxos, developed by Leslie Lamport in 1998, is foundational to distributed systems research and implementation. It enables a group of computers to reach consensus despite unreliable networks, failure-prone computers, and inaccurate clocks. Paxos has become synonymous with distributed consensus but has been criticized for its complexity and difficulty to understand. In Paxos, the consensus process involves several roles:

- Proposers: Suggest values to be chosen
- Acceptors: Vote on proposed values
- Learners: Learn about chosen values

The algorithm operates in two main phases: a prepare phase and an accept phase, ensuring safety even when multiple leaders attempt to establish consensus simultaneously.

Raft: The Understandable Alternative

Raft, introduced by Diego Ongaro and John Ousterhout in 2014, was explicitly designed to solve the same problems as Paxos but with a focus on understandability. The creators titled their paper "In Search of an Understandable Consensus Algorithm," highlighting their primary goal.
Raft simplifies the consensus process by:

• Dividing the problem into leader election, log replication, and safety
• Using a more straightforward approach to leader election
• Employing a strong leader model where all changes flow through the leader

Key Differences Between Paxos and Raft

Despite serving the same purpose, Paxos and Raft differ in several important ways:

• Leader Election: Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to become leader, provided it then updates its log to ensure it is up to date.
• Voting Behavior: Paxos followers will vote for any candidate, while Raft followers will only vote for a candidate whose log is at least as up to date as their own.
• Log Replication: If a leader has uncommitted log entries from a previous term, Paxos will replicate them in the current term, whereas Raft replicates them in their original term.
• Complexity vs. Efficiency: While Raft is generally considered more understandable, Paxos can be more efficient in certain scenarios. However, Raft's leader election is surprisingly lightweight compared to Paxos, since it does not require log entries to be exchanged during the election process.

Interestingly, research suggests that much of Raft's purported understandability comes from its clear presentation rather than from fundamental differences in the underlying algorithm.

Recent Distributed Consensus Protocols: Kafka Raft (KRaft)

One of the most significant recent developments in distributed consensus is Apache Kafka Raft (KRaft), which represents a fundamental evolution in Apache Kafka's architecture.

What is Kafka Raft?

KRaft is a consensus protocol introduced in KIP-500 to remove Apache Kafka's dependency on ZooKeeper for metadata management. This change significantly simplifies Kafka's architecture by consolidating responsibility for metadata within Kafka itself, rather than splitting it between two different systems (ZooKeeper and Kafka).

How KRaft Works

KRaft operates through a new quorum controller service that replaces the previous controller and utilizes an event-based variant of the Raft consensus protocol. Key aspects of KRaft include:

• Event-Sourced Storage Model: The quorum controller stores its state using an event-sourced approach, ensuring that internal state machines can always be accurately recreated.
• Metadata Topic: The event log used to store state (also known as the metadata topic) is periodically condensed into snapshots to prevent unlimited growth.
• Quick Recovery: If a node pauses due to a network partition, it can quickly catch up by reading the log when it rejoins, significantly decreasing downtime and improving recovery time.
• Efficient Leadership Changes: Unlike the ZooKeeper-based controller, the quorum controller does not need to load state before becoming active. When leadership changes, the new controller already has all committed metadata records in memory.
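For a hands-on view of the quorum controller, the Kafka Admin client can report the current metadata quorum. The snippet below is a sketch under stated assumptions: it presumes a Kafka clients dependency recent enough to expose the describeMetadataQuorum admin call (added around Kafka 3.3 as part of the KRaft work), and the bootstrap address is a placeholder.

Java
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public final class KraftQuorumInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; point this at any broker of a KRaft-mode cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Assumes describeMetadataQuorum() is available (Kafka clients ~3.3 and later).
            var quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("Quorum leader (active controller) id: " + quorum.leaderId());
            System.out.println("Voters: " + quorum.voters());
        }
    }
}

Recent Kafka distributions also ship a kafka-metadata-quorum.sh command-line tool that reports the same quorum information without writing any code.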
Benefits of KRaft over Traditional Approaches

The adoption of KRaft offers several advantages:

• Simplified Architecture: By eliminating the need for ZooKeeper, KRaft reduces the complexity of Kafka deployments.
• Improved Scalability: The new architecture enhances Kafka's ability to scale by removing bottlenecks associated with ZooKeeper.
• Better Maintainability: With fewer components to manage, Kafka clusters become easier to maintain and operate.
• Enhanced Performance: The event-driven nature of the KRaft protocol improves metadata management performance compared to the previous RPC-based approach.
• Faster Recovery: The event-sourced model allows for quicker recovery from failures, improving overall system reliability.

Conclusion: The Future of Distributed Consensus

As distributed systems continue to evolve and scale, distributed consensus remains a critical foundation for building reliable, fault-tolerant applications. The journey from complex algorithms like Paxos to more understandable alternatives like Raft demonstrates the field's maturation and the industry's focus on practical implementations.

The development of specialized consensus protocols like KRaft shows how consensus algorithms are being tailored to specific use cases, optimizing for particular requirements rather than applying one-size-fits-all solutions. This trend is likely to continue as more systems adopt consensus-based approaches for reliability.

Looking ahead, several developments are shaping the future of distributed consensus:

• Simplified Implementations: Following Raft's lead, there is a growing emphasis on making consensus algorithms more accessible and easier to implement correctly.
• Specialized Variants: Domain-specific consensus protocols optimized for particular use cases, like KRaft for Kafka.
• Integration into Application Frameworks: Consensus mechanisms are increasingly being built directly into application frameworks rather than requiring separate coordination services.
• Scalability Improvements: Research continues on making consensus algorithms more efficient at scale, potentially reducing the trade-off between consistency and performance.

As distributed systems become more prevalent in our computing infrastructure, understanding and implementing distributed consensus effectively will remain a crucial skill for system designers and developers. Whether through classic algorithms like Paxos, more approachable alternatives like Raft, or specialized implementations like KRaft, distributed consensus will continue to serve as the backbone of reliable distributed systems.

By narendra reddy sanikommu
Enforcing Architecture With ArchUnit in Java
Enforcing Architecture With ArchUnit in Java

You create a well-defined architecture, but how do you enforce it in your code? Code reviews can be used, but wouldn't it be better to verify the architecture automatically? With ArchUnit, you can define rules for your architecture by means of unit tests.

Introduction

The architecture of an application is described in its documentation. This can be a Word document, a PlantUML diagram, a DrawIO diagram, or whatever you like to use. The developers should follow this architecture when building the application. But we know that many do not like to read documentation, so the architecture might not be known to everyone on the team. With the help of ArchUnit, you can define rules for your architecture within a unit test. This is very convenient, because the test will fail when an architecture rule is violated. The official documentation and examples of ArchUnit are a good starting point for using ArchUnit. Besides ArchUnit, Taikai will be discussed, which contains some predefined rules for ArchUnit. The sources used in this blog can be found on GitHub.

Prerequisites

Prerequisites for reading this blog are:

• Basic knowledge of architecture styles (layered architecture, hexagonal architecture, and so on)
• Basic knowledge of Maven
• Basic knowledge of Java
• Basic knowledge of JUnit
• Basic knowledge of Spring Boot

Basic Spring Boot App

A basic Spring Boot application is used to verify the architecture rules. It is the starting point for every example and is present in the base package. The package structure is as follows and contains specific packages for the controller, the service, the repository, and the model.

Plain Text
├── controller
│   └── CustomersController.java
├── model
│   └── Customer.java
├── repository
│   └── CustomerRepository.java
└── service
    ├── CustomerServiceImpl.java
    └── CustomerService.java

Package Dependency Checks

Before getting started with writing the test, the archunit-junit5 dependency needs to be added to the pom.

XML
<dependency>
    <groupId>com.tngtech.archunit</groupId>
    <artifactId>archunit-junit5</artifactId>
    <version>1.4.0</version>
    <scope>test</scope>
</dependency>

The architecture rule to be added checks that classes residing in the service package are only accessed by classes residing in the controller or service packages. By means of the @AnalyzeClasses annotation, you determine which packages should be analyzed. The rule itself is annotated with @ArchTest, and the rule is written in a very readable way.

Java
@AnalyzeClasses(packages = "com.mydeveloperplanet.myarchunitplanet.example1")
public class MyArchitectureTest {

    @ArchTest
    public static final ArchRule myRule = classes()
            .that().resideInAPackage("..service..")
            .should().onlyBeAccessed().byAnyPackage("..controller..", "..service..");
}

The easiest way is to run this test from within your IDE. You can also run it by means of Maven.

Shell
mvn -Dtest=com.mydeveloperplanet.myarchunitplanet.example1.MyArchitectureTest test

The test is successful. Now add a Util class in the example1.util package that makes use of the CustomerService class. This violates the architecture rule you just defined.

Java
public class Util {

    @Autowired
    CustomerService customerService;

    public void doSomething() {
        // use the CustomerService
        customerService.deleteCustomer(1L);
    }
}

Run the test again, and now it fails with a clear description of what is wrong.
Java
java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'classes that reside in a package '..service..' should only be accessed by any package ['..controller..', '..service..']' was violated (1 times):
Method <com.mydeveloperplanet.myarchunitplanet.example1.util.Util.doSomething()> calls method <com.mydeveloperplanet.myarchunitplanet.example1.service.CustomerService.deleteCustomer(java.lang.Long)> in (Util.java:14)
	at com.tngtech.archunit.lang.ArchRule$Assertions.assertNoViolation(ArchRule.java:94)
	at com.tngtech.archunit.lang.ArchRule$Assertions.check(ArchRule.java:86)
	at com.tngtech.archunit.lang.ArchRule$Factory$SimpleArchRule.check(ArchRule.java:165)
	at com.tngtech.archunit.lang.syntax.ObjectsShouldInternal.check(ObjectsShouldInternal.java:81)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:168)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:151)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

Exclude Test Classes

In the example2 package, a CustomerServiceImplTest is added. This test makes use of classes that reside in the service package, but the test itself is located in the example2 package. The same ArchUnit test is used as before. Run the ArchUnit test, and it fails because CustomerServiceImplTest accesses the service package but does not reside in the controller or service package.

Java
java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'classes that reside in a package '..service..' should only be accessed by any package ['..controller..', '..service..']' was violated (5 times):
Method <com.mydeveloperplanet.myarchunitplanet.example2.CustomerServiceImplTest.testCreateCustomer()> calls method <com.mydeveloperplanet.myarchunitplanet.example2.service.CustomerServiceImpl.createCustomer(com.mydeveloperplanet.myarchunitplanet.example2.model.Customer)> in (CustomerServiceImplTest.java:64)
Method <com.mydeveloperplanet.myarchunitplanet.example2.CustomerServiceImplTest.testDeleteCustomer()> calls method <com.mydeveloperplanet.myarchunitplanet.example2.service.CustomerServiceImpl.deleteCustomer(java.lang.Long)> in (CustomerServiceImplTest.java:88)
Method <com.mydeveloperplanet.myarchunitplanet.example2.CustomerServiceImplTest.testGetAllCustomers()> calls method <com.mydeveloperplanet.myarchunitplanet.example2.service.CustomerServiceImpl.getAllCustomers()> in (CustomerServiceImplTest.java:42)
Method <com.mydeveloperplanet.myarchunitplanet.example2.CustomerServiceImplTest.testGetCustomerById()> calls method <com.mydeveloperplanet.myarchunitplanet.example2.service.CustomerServiceImpl.getCustomerById(java.lang.Long)> in (CustomerServiceImplTest.java:53)
Method <com.mydeveloperplanet.myarchunitplanet.example2.CustomerServiceImplTest.testUpdateCustomer()> calls method <com.mydeveloperplanet.myarchunitplanet.example2.service.CustomerServiceImpl.updateCustomer(java.lang.Long, com.mydeveloperplanet.myarchunitplanet.example2.model.Customer)> in (CustomerServiceImplTest.java:79)
	at com.tngtech.archunit.lang.ArchRule$Assertions.assertNoViolation(ArchRule.java:94)
	at com.tngtech.archunit.lang.ArchRule$Assertions.check(ArchRule.java:86)
	at com.tngtech.archunit.lang.ArchRule$Factory$SimpleArchRule.check(ArchRule.java:165)
	at com.tngtech.archunit.lang.syntax.ObjectsShouldInternal.check(ObjectsShouldInternal.java:81)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:168)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:151)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

You might want to exclude test classes from the architecture rule checks. This can be done by adding importOptions to the @AnalyzeClasses annotation as follows.

Java
@AnalyzeClasses(packages = "com.mydeveloperplanet.myarchunitplanet.example2", importOptions = ImportOption.DoNotIncludeTests.class)

Run the test again, and now it is successful.

Layer Checks

ArchUnit provides some built-in checks for different architecture styles, like a layered architecture or an onion (hexagonal) architecture. These are present in the Library API. The example3 package is based on the base package code, but in the CustomerRepository, the CustomerService is injected and used in the updateCustomer method. This violates the layered architecture principles.

Java
@Repository
public class CustomerRepository {

    @Autowired
    private DSLContext dslContext;

    @Autowired
    private CustomerServiceImpl customerService;

    ...

    public Customer updateCustomer(Long id, Customer customerDetails) {
        boolean exists = dslContext.fetchExists(dslContext.selectFrom(Customers.CUSTOMERS));
        if (exists) {
            customerService.deleteCustomer(id);
            dslContext.update(Customers.CUSTOMERS)
                    .set(Customers.CUSTOMERS.FIRST_NAME, customerDetails.getFirstName())
                    .set(Customers.CUSTOMERS.LAST_NAME, customerDetails.getLastName())
                    .where(Customers.CUSTOMERS.ID.eq(id))
                    .returning()
                    .fetchOne();
            return customerDetails;
        } else {
            throw new RuntimeException("Customer not found");
        }
    }
}

In order to verify any violations, the ArchUnit test makes use of layeredArchitecture. You define the layers first, and then you add the constraints for each layer.

Java
@AnalyzeClasses(packages = "com.mydeveloperplanet.myarchunitplanet.example3")
public class MyArchitectureTest {

    @ArchTest
    public static final ArchRule myRule = layeredArchitecture()
            .consideringAllDependencies()
            .layer("Controller").definedBy("..controller..")
            .layer("Service").definedBy("..service..")
            .layer("Persistence").definedBy("..repository..")

            .whereLayer("Controller").mayNotBeAccessedByAnyLayer()
            .whereLayer("Service").mayOnlyBeAccessedByLayers("Controller")
            .whereLayer("Persistence").mayOnlyBeAccessedByLayers("Service");
}

The test fails because the Persistence layer accesses the Service layer, which the rules do not allow.
Java
java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'Layered architecture considering all dependencies, consisting of
layer 'Controller' ('..controller..')
layer 'Service' ('..service..')
layer 'Persistence' ('..repository..')
where layer 'Controller' may not be accessed by any layer
where layer 'Service' may only be accessed by layers ['Controller']
where layer 'Persistence' may only be accessed by layers ['Service']' was violated (2 times):
Field <com.mydeveloperplanet.myarchunitplanet.example3.repository.CustomerRepository.customerService> has type <com.mydeveloperplanet.myarchunitplanet.example3.service.CustomerServiceImpl> in (CustomerRepository.java:0)
Method <com.mydeveloperplanet.myarchunitplanet.example3.repository.CustomerRepository.updateCustomer(java.lang.Long, com.mydeveloperplanet.myarchunitplanet.example3.model.Customer)> calls method <com.mydeveloperplanet.myarchunitplanet.example3.service.CustomerServiceImpl.deleteCustomer(java.lang.Long)> in (CustomerRepository.java:52)
	at com.tngtech.archunit.lang.ArchRule$Assertions.assertNoViolation(ArchRule.java:94)
	at com.tngtech.archunit.lang.ArchRule$Assertions.check(ArchRule.java:86)
	at com.tngtech.archunit.library.Architectures$LayeredArchitecture.check(Architectures.java:347)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:168)
	at com.tngtech.archunit.junit.internal.ArchUnitTestDescriptor$ArchUnitRuleDescriptor.execute(ArchUnitTestDescriptor.java:151)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

Taikai

The Taikai library provides some predefined rules for various technologies and extends the ArchUnit library. Let's see how this works. First, add the dependency to the pom.

XML
<dependency>
    <groupId>com.enofex</groupId>
    <artifactId>taikai</artifactId>
    <version>1.8.0</version>
    <scope>test</scope>
</dependency>

In the example4 package, you add the following test. As you can see, this test is quite comprehensive.
Java
class MyArchitectureTest {

    @Test
    void shouldFulfillConstraints() {
        Taikai.builder()
                .namespace("com.mydeveloperplanet.myarchunitplanet.example4")
                .java(java -> java
                        .noUsageOfDeprecatedAPIs()
                        .methodsShouldNotDeclareGenericExceptions()
                        .utilityClassesShouldBeFinalAndHavePrivateConstructor()
                        .imports(imports -> imports
                                .shouldHaveNoCycles()
                                .shouldNotImport("..shaded..")
                                .shouldNotImport("org.junit.."))
                        .naming(naming -> naming
                                .classesShouldNotMatch(".*Impl")
                                .methodsShouldNotMatch("^(foo$|bar$).*")
                                .fieldsShouldNotMatch(".*(List|Set|Map)$")
                                .fieldsShouldMatch("com.enofex.taikai.Matcher", "matcher")
                                .constantsShouldFollowConventions()
                                .interfacesShouldNotHavePrefixI()))
                .logging(logging -> logging
                        .loggersShouldFollowConventions(Logger.class, "logger", List.of(PRIVATE, FINAL)))
                .test(test -> test
                        .junit5(junit5 -> junit5
                                .classesShouldNotBeAnnotatedWithDisabled()
                                .methodsShouldNotBeAnnotatedWithDisabled()))
                .spring(spring -> spring
                        .noAutowiredFields()
                        .boot(boot -> boot
                                .springBootApplicationShouldBeIn("com.enofex.taikai"))
                        .configurations(configuration -> configuration
                                .namesShouldEndWithConfiguration())
                        .controllers(controllers -> controllers
                                .shouldBeAnnotatedWithRestController()
                                .namesShouldEndWithController()
                                .shouldNotDependOnOtherControllers()
                                .shouldBePackagePrivate())
                        .services(services -> services
                                .shouldBeAnnotatedWithService()
                                .shouldNotDependOnControllers()
                                .namesShouldEndWithService())
                        .repositories(repositories -> repositories
                                .shouldBeAnnotatedWithRepository()
                                .shouldNotDependOnServices()
                                .namesShouldEndWithRepository()))
                .build()
                .check();
    }
}

Run the test. It fails because classes with names ending in Impl are not allowed. The error output is similar to that of ArchUnit.

Java
java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'Classes should not have names matching .*Impl' was violated (1 times):
Class <com.mydeveloperplanet.myarchunitplanet.example4.service.CustomerServiceImpl> has name matching '.*Impl' in (CustomerServiceImpl.java:0)
	at com.tngtech.archunit.lang.ArchRule$Assertions.assertNoViolation(ArchRule.java:94)
	at com.tngtech.archunit.lang.ArchRule$Assertions.check(ArchRule.java:86)
	at com.tngtech.archunit.lang.ArchRule$Factory$SimpleArchRule.check(ArchRule.java:165)
	at com.enofex.taikai.TaikaiRule.check(TaikaiRule.java:66)
	at com.enofex.taikai.Taikai.lambda$check$1(Taikai.java:70)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at com.enofex.taikai.Taikai.check(Taikai.java:70)
	at com.mydeveloperplanet.myarchunitplanet.example4.MyArchitectureTest.shouldFulfillConstraints(MyArchitectureTest.java:60)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

However, unlike ArchUnit, this test stops at the first violated rule. You need to fix that violation first, run the test again to see the next one, and so on. I created an improvement issue for this, which was fixed and released (v1.9.0) immediately: a new checkAll method was added that checks all rules at once.

Java
@Test
void shouldFulfillConstraintsCheckAll() {
    Taikai.builder()
            .namespace("com.mydeveloperplanet.myarchunitplanet.example4")
            ...
            .build()
            .checkAll();
}

Run this test, and all violations are reported, so you can fix them all at once.
Java
java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'All Taikai rules' was violated (7 times):
Class <com.mydeveloperplanet.myarchunitplanet.example4.controller.CustomersController> has modifier PUBLIC in (CustomersController.java:0)
Class <com.mydeveloperplanet.myarchunitplanet.example4.service.CustomerService> is not annotated with org.springframework.stereotype.Service in (CustomerService.java:0)
Class <com.mydeveloperplanet.myarchunitplanet.example4.service.CustomerServiceImpl> does not have name matching '.+Service' in (CustomerServiceImpl.java:0)
Class <com.mydeveloperplanet.myarchunitplanet.example4.service.CustomerServiceImpl> has name matching '.*Impl' in (CustomerServiceImpl.java:0)
Field <com.mydeveloperplanet.myarchunitplanet.example4.controller.CustomersController.customerService> is annotated with org.springframework.beans.factory.annotation.Autowired in (CustomersController.java:0)
Field <com.mydeveloperplanet.myarchunitplanet.example4.repository.CustomerRepository.dslContext> is annotated with org.springframework.beans.factory.annotation.Autowired in (CustomerRepository.java:0)
Field <com.mydeveloperplanet.myarchunitplanet.example4.service.CustomerServiceImpl.customerRepository> is annotated with org.springframework.beans.factory.annotation.Autowired in (CustomerServiceImpl.java:0)
	at com.enofex.taikai.Taikai.checkAll(Taikai.java:102)
	at com.mydeveloperplanet.myarchunitplanet.example4.MyArchitectureTest.shouldFulfillConstraintsCheckAll(MyArchitectureTest.java:108)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

Taikai: Issues Fixed

In the example5 package, all issues reported by the Taikai test are fixed. At first, some checks did not seem to work correctly. An issue was registered for this as well, and again the maintainer replied quickly: it turned out to be a misunderstanding of how the rules work. Reading the documentation more carefully, the failOnEmpty option makes the check fail when a rule does not match anything at all, which usually indicates that the rule is misconfigured. This was the case here for fieldsShouldMatch and springBootApplicationShouldBeIn. A new test is added to show this functionality.

Java
@Test
void shouldFulfillConstraintsFailOnEmpty() {
    Taikai.builder()
            .namespace("com.mydeveloperplanet.myarchunitplanet.example5")
            .failOnEmpty(true)
            ...
}

The springBootApplicationShouldBeIn rule should be configured with the package where the main Spring Boot application is located.

Java
.spring(spring -> spring
        .noAutowiredFields()
        .boot(boot -> boot
                .springBootApplicationShouldBeIn("com.mydeveloperplanet.myarchunitplanet.example5"))

Conclusion

ArchUnit is an easy-to-use library for enforcing architectural rules. A developer is notified of an architectural violation as soon as the ArchUnit test fails, which keeps the architecture rules clear to everyone. The Taikai library provides easy-to-use predefined rules that can be applied immediately without much configuration.

By Gunter Rotsaert DZone Core CORE

Top Testing, Tools, and Frameworks Experts

expert thumbnail

Stelios Manioudakis, PhD

Lead Engineer,
Technical University of Crete

Worked at Siemens and Atos as a software engineer, and in the RPA domain with Softomotive through its acquisition by Microsoft. Currently working at the Technical University of Crete. Holds a PhD in Electrical, Electronic and Computer Engineering from the University of Newcastle upon Tyne (UK).
expert thumbnail

Faisal Khatri

Senior Testing Specialist,
Kafaat Business Solutions شركة كفاءات حلول الأعمال

QA with 16+ years of experience in Automation as well as Manual Testing. Passionate about learning new technologies. Open Source Contributor, Mentor, and Trainer.

The Latest Testing, Tools, and Frameworks Topics

article thumbnail
Deploy Serverless Lambdas Confidently Using Canary
Improve AWS Lambda reliability with canary deployments, gradually release updates, minimize risk, catch bugs early, and deploy faster with confidence.
July 7, 2025
by Prajwal Nayak
· 707 Views
article thumbnail
DevOps Remediation Architecture for Azure CDN From Edgio
The article explains how organizations can implement the migration from the retiring Azure CDN from Edgio to Azure Front Door.
June 30, 2025
by Karthik Bojja
· 1,292 Views · 1 Like
article thumbnail
Transform Settlement Process Using AWS Data Pipeline
Modern AWS data pipelines automate ETL for settlement files using S3, Glue, Lambda, and Step Functions, transforming data from raw to curated with full orchestration.
June 30, 2025
by Prabhakar Mishra
· 1,157 Views · 2 Likes
article thumbnail
The Untold Costs of Automation: Are We Sacrificing Security for Speed?
Automation boosts efficiency but can create security risks. Breaches like MOVEit show why oversight and audits are essential to prevent costly failures.
June 27, 2025
by Rasheed Afolabi
· 1,323 Views
article thumbnail
A Beginner’s Guide to Playwright: End-to-End Testing Made Easy
Learn Playwright for reliable, cross-browser E2E testing. Modern, fast, and developer-friendly with TypeScript support, smart selectors, and parallel runs.
June 27, 2025
by Rama Mallika Kadali
· 1,655 Views · 2 Likes
article thumbnail
Mock the File System
Using the real file system in tests might seem convenient at first, but it leads to hidden state, slow execution, and an unmaintainable setup.
June 27, 2025
by Volodya Lombrozo
· 1,026 Views
article thumbnail
How to Banish Anxiety, Lower MTTR, and Stay on Budget During Incident Response
Cutting log ingestion seems thrifty — until an outage happens and suddenly you really need those signals! See how zero-cost ingestion can get rid of MTTR anxiety.
June 26, 2025
by John Vester DZone Core CORE
· 1,351 Views · 1 Like
article thumbnail
Automating E2E Tests With MFA: Streamline Your Testing Workflow
Automating tests with MFA is challenging: manual code retrieval complicates automation and slows development. Use this tool to retrieve codes programmatically.
June 26, 2025
by Jonathan Bernales
· 1,826 Views · 2 Likes
article thumbnail
IBM App Connect Enterprise 13 Installation on Azure Kubernetes Service (AKS)
This article provides a step-by-step guide showing how to install IBM App Connect Enterprise 13 in an Azure Kubernetes Service cluster.
June 25, 2025
by JEAN PAUL TABJA
· 1,468 Views · 1 Like
article thumbnail
How to Test Multi-Threaded and Concurrent Java
This tutorial teaches how to test multi-threaded, concurrent Java using VMLens. An open-source tool to test concurrent Java code in a deterministic and reproducible way.
June 24, 2025
by Thomas Krieger
· 1,783 Views · 6 Likes
article thumbnail
Real-Object Detection at the Edge: AWS IoT Greengrass and YOLOv5
Real-time object detection at the edge using YOLOv5 and AWS IoT Greengrass enables fast, offline, and scalable processing in bandwidth-limited or remote environments.
June 23, 2025
by Anil Jonnalagadda
· 1,647 Views · 13 Likes
article thumbnail
Lessons Learned in Test-Driven Development
Understand when to use TDD, traditional, or hybrid testing methods to improve software quality, streamline development, and align with your project needs.
June 20, 2025
by Arun Vishwanathan
· 1,956 Views
article thumbnail
Exploring Cloud-Based Testing With the Elastic Execution Grid
E2G gives your team cloud testing agents, allowing you to spin up the test agents, run your tests in parallel, and watch real-time logs — all without idle hardware.
June 20, 2025
by John Vester DZone Core CORE
· 1,500 Views · 1 Like
article thumbnail
Integrating Selenium With Amazon S3 for Test Artifact Management
Integrate Selenium with Amazon S3 for scalable, secure, and centralized test artifact storage — boosting efficiency, collaboration, and automation workflows.
June 19, 2025
by Sidharth Shukla
· 1,381 Views · 2 Likes
article thumbnail
How to Achieve SOC 2 Compliance in AWS Cloud Environments
Achieving SOC 2 compliance in AWS requires planning, rigorous implementation, and ongoing commitment to security best practices.
June 17, 2025
by Chase Bolt
· 1,114 Views · 1 Like
article thumbnail
How to Use Testcontainers With ScyllaDB
Learn how to use Testcontainers to create lightweight, throwaway instances of ScyllaDB for testing with hands-on example.
June 16, 2025
by Eduard Knezovic
· 1,068 Views · 1 Like
article thumbnail
Enterprise-Grade Distributed JMeter Load Testing on Kubernetes: A Scalable, CI/CD-Driven DevOps Approach
Run distributed JMeter load tests on Kubernetes environments with CI/CD integration, auto-scaling, and real-time monitoring using InfluxDB and Grafana.
June 11, 2025
by Prabhu Chinnasamy
· 2,429 Views · 43 Likes
article thumbnail
Building Generative AI Services: An Introductory and Practical Guide
Amazon Bedrock simplifies AI app development with serverless APIs, offering Q&A, summarization, and image generation using top models like Claude and Stability AI.
June 11, 2025
by Srinivas Chippagiri DZone Core CORE
· 1,990 Views · 7 Likes
article thumbnail
Software Specs 2.0: An Elaborate Example
Learn how to define precise, secure, and testable requirements for an AI-generated User Authentication Login Endpoint with structured documentation
June 11, 2025
by Stelios Manioudakis, PhD DZone Core CORE
· 6,209 Views · 1 Like
article thumbnail
Integrating Cursor and LLM for BDD Testing With Playwright MCP (Model Context Protocol)
Learn in this article how BDD, AI-powered Cursor, and Playwright MCP simplify test automation, enabling faster, smarter, and more collaborative workflows.
June 11, 2025
by Kailash Pathak DZone Core CORE
· 1,083 Views

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]
