If you're eager to learn or understand decision trees, I invite you to explore this article. Alternatively, if decision trees aren't your current focus, you may opt to scroll through social media.

About Decision Trees

Figure 1: Simple decision tree

The image above shows an example of a simple decision tree. Decision trees are tree-shaped diagrams used for making decisions based on a series of logical conditions. In a decision tree, each node represents a decision statement, and the tree proceeds to make a decision based on whether the given statement is true or false.

There are two main types of decision trees: classification trees and regression trees. A classification tree categorizes problems by classifying the output of the decision statement into categories using if-else logical conditions, whereas a regression tree classifies the output into numeric values.

In Figure 2, the topmost node of a decision tree is called the root node, while the nodes following the root node are referred to as internal nodes or branches. These branches are characterized by arrows pointing both toward and away from them. At the bottom of the tree are the leaf nodes, which carry the final classification or decision of the tree. Leaf nodes are identifiable by arrows pointing to them, but not away from them.

Figure 2: Nodes of a decision tree

Primary Objective of Decision Trees

The primary objective of a decision tree is to partition the given data into subsets in a manner that maximizes the purity of the outcomes.

Advantages of Decision Trees

- Simplicity: Decision trees are straightforward to understand, interpret, and visualize.
- Minimal data preparation: They require minimal effort for data preparation compared to other algorithms.
- Handling of data types: Decision trees can handle both numeric and categorical data efficiently.
- Robustness to non-linear parameters: Non-linear parameters have minimal impact on the performance of decision trees.

Disadvantages of Decision Trees

- Overfitting: Decision trees may overfit the training data, capturing noise and leading to poor generalization on unseen data.
- High variance: The model may become unstable with small variations in the training data, resulting in high variance.
- Low bias, high complexity: Highly complex decision trees have low bias, making them prone to difficulties in generalizing to new data.

Important Terms in Decision Trees

Below are important terms that are also used for measuring impurity in decision trees:

1. Entropy

Entropy is a measure of randomness or unpredictability in a dataset. It quantifies the impurity of the dataset. A dataset with high entropy contains a mix of different classes or categories, making predictions more uncertain.

Example: Consider a dataset containing data from various animals, as in Figure 3. If the dataset includes a diverse range of animals with no clear patterns or distinctions, it has high entropy.

Figure 3: Animal datasets

2. Information Gain

Information gain is the measure of the decrease in entropy after splitting the dataset based on a particular attribute or condition. It quantifies the effectiveness of a split in reducing uncertainty.

Example: When we split the data into subgroups based on specific conditions (e.g., features of the animals) as in Figure 3, we calculate information gain by subtracting the size-weighted average entropy of the subgroups from the entropy before the split. Higher information gain indicates a more effective split that results in greater homogeneity within subgroups.
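To make these two measures concrete, here is a minimal sketch (not part of the original walkthrough) that computes entropy and the information gain of a split with plain NumPy; the label arrays are invented for illustration:

```python
import numpy as np

def entropy(labels):
    # Entropy = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Information gain = parent entropy minus the size-weighted
    # average entropy of the child subsets produced by a split
    n = len(parent)
    weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child

# Hypothetical binary labels (1 = lung cancer, 0 = no lung cancer)
parent = np.array([1, 1, 1, 0, 0, 0, 1, 0])
left, right = parent[:4], parent[4:]   # one candidate split
print(f"Parent entropy: {entropy(parent):.3f}")              # 1.000 (perfectly mixed)
print(f"Information gain: {information_gain(parent, left, right):.3f}")
```

A decision tree learner evaluates many candidate splits this way and keeps the one with the highest gain.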
3. Gini Impurity

Gini impurity is another measure of impurity or randomness in a dataset. It calculates the probability of misclassifying a randomly chosen element if it were randomly labeled according to the distribution of labels in the dataset. In decision trees, Gini impurity is often used as an alternative to entropy for evaluating splits.

Example: Suppose we have a dataset with multiple classes or categories. The Gini impurity is high when the classes are evenly distributed or when there is no clear separation between classes. A low Gini impurity indicates that the dataset is relatively pure, with most elements belonging to the same class.

Implementation in Python

The following walkthrough builds classifiers to predict lung cancer in patients from survey data.

1. Importing the necessary libraries for data analysis and visualization in Python:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Ensure plots are displayed inline in the notebook
%matplotlib inline

# Set Seaborn style for plots
sns.set_style("whitegrid")

# Set default Matplotlib style
plt.style.use("fivethirtyeight")
```

2. Uploading and loading the CSV file containing the data:

```python
# Load the data from the CSV file
df = pd.read_csv('survey_lung_cancer.csv')

df.head()  # Display the first five rows of the dataframe
```

EDA (exploratory data analysis):

```python
# Count plot using Seaborn to visualize the distribution
# of values in the "LUNG_CANCER" column
sns.countplot(x='LUNG_CANCER', data=df)
```

```python
# Histogram of the AGE column
df['AGE'].plot(kind='hist', bins=20, title='AGE')
plt.gca().spines[['top', 'right']].set_visible(False)
```

3. Iterating through the columns, identifying categorical columns, and encoding the target:

```python
categorical_col = []
for column in df.columns:
    if df[column].dtype == object and len(df[column].unique()) <= 50:
        categorical_col.append(column)

df['LUNG_CANCER'] = df.LUNG_CANCER.astype("category").cat.codes
```

4. Removing the column "LUNG_CANCER" from the list of categorical features:

```python
categorical_col.remove('LUNG_CANCER')
```

5. Encoding categorical variables using LabelEncoder:

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder transforms categorical values into numerical labels
label = LabelEncoder()
for column in categorical_col:
    df[column] = label.fit_transform(df[column])
```

6. Splitting the dataset for machine learning with train_test_split:

```python
from sklearn.model_selection import train_test_split

# X contains the features (all columns except 'LUNG_CANCER')
# y contains the target variable ('LUNG_CANCER')
X = df.drop('LUNG_CANCER', axis=1)
y = df.LUNG_CANCER

# Perform the split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

7. Defining a function for model evaluation and reporting. The function below serves as a convenient tool for assessing the performance of classification models and generating detailed reports, facilitating model evaluation and interpretation.
```python
# Import functions from scikit-learn for model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# clf: the classifier model to be evaluated
# X_train, y_train: the features and target variable of the training set
# X_test, y_test: the features and target variable of the testing set
def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("\nTest Result:\n_________________________")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_________________________________________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
```

Training and evaluation of the decision tree classifier: The code below provides a comprehensive evaluation of the decision tree classifier's performance on both the training and testing sets, including the accuracy score, classification report, and confusion matrix for each set. During training, the decision tree algorithm uses an impurity measure such as entropy, together with information gain, to recursively split nodes, building a tree that maximizes information gain at each step.

```python
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)

print_score(tree_clf, X_train, y_train, X_test, y_test, train=True)
print_score(tree_clf, X_train, y_train, X_test, y_test, train=False)
```

The results above indicate that the decision tree classifier achieved high accuracy on the training set, with some overfitting evident from the gap between training and testing performance. While the classifier performed well on the testing set, there is room for improvement, particularly in reducing false positives and false negatives. Further tuning of hyperparameters or exploring other algorithms may help improve generalization performance, as sketched below.

8. Visualization of the decision tree classifier:

```python
# Image is used to display images in the IPython environment
# StringIO is used to create a file-like object in memory
# export_graphviz is used to export the decision tree in Graphviz DOT format
# pydot is used to interface with the Graphviz library
from IPython.display import Image
from six import StringIO
from sklearn.tree import export_graphviz
import pydot

features = list(df.columns)
features.remove("LUNG_CANCER")

dot_data = StringIO()
export_graphviz(tree_clf, out_file=dot_data, feature_names=features, filled=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
```
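As one hedged illustration of the hyperparameter tuning suggested above (not part of the original walkthrough), a cross-validated grid search over a few common decision tree hyperparameters might look like this; the parameter grid is an arbitrary starting point, not a recommendation:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate hyperparameters: an illustrative grid, not a tuned one
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

# 5-fold cross-validated search on the training set only
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)

# Evaluate the tuned tree with the same helper used above
print_score(grid.best_estimator_, X_train, y_train, X_test, y_test, train=False)
```

Constraining max_depth and min_samples_leaf is the usual first lever against the overfitting observed above. 9.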
Training and evaluation of a Random Forest classifier:

```python
from sklearn.ensemble import RandomForestClassifier

# n_estimators=100 specifies the number of decision trees in the forest
rf_clf = RandomForestClassifier(n_estimators=100)
rf_clf.fit(X_train, y_train)

print_score(rf_clf, X_train, y_train, X_test, y_test, train=True)
print_score(rf_clf, X_train, y_train, X_test, y_test, train=False)
```

The code below generates heatmaps for both the training and testing confusion matrices. The heatmaps use different shades to represent the counts in the confusion matrix: the diagonal elements (true positives and true negatives) have higher values and appear lighter, while off-diagonal elements (false positives and false negatives) have lower values and appear darker.

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Compute the confusion matrices for the Random Forest predictions
cm_train = confusion_matrix(y_train, rf_clf.predict(X_train))
cm_test = confusion_matrix(y_test, rf_clf.predict(X_test))

# Create heatmap for the training set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_train, annot=True, fmt='d', cmap='viridis', annot_kws={"size": 16})
plt.title('Confusion Matrix for Training Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()

# Create heatmap for the testing set
plt.figure(figsize=(8, 6))
sns.heatmap(cm_test, annot=True, fmt='d', cmap='plasma', annot_kws={"size": 16})
plt.title('Confusion Matrix for Testing Set')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()
```

XGBoost for Classification

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Instantiate XGBClassifier
xgb_clf = XGBClassifier()

# Train the classifier
xgb_clf.fit(X_train, y_train)

# Predict on the testing set
y_pred = xgb_clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

The accuracy above indicates that the model's predictions align closely with the actual class labels, demonstrating its effectiveness in distinguishing between the classes. The code below generates a bar plot showing the relative importance of the top features in the XGBoost model. The importance is typically calculated based on metrics such as gain, cover, or frequency of feature usage across all trees in the ensemble.

```python
from xgboost import plot_importance
import matplotlib.pyplot as plt

# Plot feature importance for the top 10 features
fig, ax = plt.subplots(figsize=(10, 6))
plot_importance(xgb_clf, max_num_features=10, ax=ax)
plt.show()
```

10. Plotting the first tree in the XGBoost model:

```python
from xgboost import plot_tree

# Plot the first tree, laid out top to bottom
fig, ax = plt.subplots(figsize=(10, 20))
plot_tree(xgb_clf, num_trees=0, rankdir='TB', ax=ax)
plt.show()
```

Conclusion

In conclusion, decision trees and their advanced variants like Random Forest and XGBoost offer powerful tools for classification and regression tasks in machine learning. Through this journey, we've explored the fundamental concepts of decision trees, including entropy, information gain, and Gini impurity, which form the basis of their decision-making process. As we continue to delve deeper into the realm of machine learning, the versatility and effectiveness of decision trees and their variants underscore their significance in solving real-world problems across diverse domains.
Whether it's classifying medical conditions, predicting customer behavior, or optimizing business processes, decision trees remain a cornerstone in the arsenal of machine learning techniques, driving innovation and progress in the field.
Reactive programming has become increasingly popular in modern software development, especially for building scalable and resilient applications. Kotlin, with its expressive syntax and powerful features, has gained traction among developers for building reactive systems. In this article, we'll delve into reactive programming using Kotlin Coroutines with Spring Boot, comparing it with WebFlux, another choice for reactive programming in the Spring ecosystem, yet a more complex one.

Understanding Reactive Programming

Reactive programming is a programming paradigm that deals with asynchronous data streams and the propagation of changes. It focuses on processing streams of data and reacting to changes as they occur. Reactive systems are inherently responsive, resilient, and scalable, making them well suited for building modern applications that need to handle high concurrency and real-time data.

Kotlin Coroutines

Kotlin Coroutines provide a way to write asynchronous, non-blocking code in a sequential manner, making asynchronous programming easier to understand and maintain. Coroutines allow developers to write asynchronous code in a more imperative style, resembling synchronous code, which can lead to cleaner and more readable programs.

Kotlin Coroutines vs. WebFlux

Spring Boot is a popular framework for building Java- and Kotlin-based applications. It provides a powerful and flexible programming model for developing reactive applications. Spring Boot's support for reactive programming comes in the form of Spring WebFlux, which is built on top of Project Reactor, a reactive library for the JVM. Both Kotlin Coroutines and WebFlux offer solutions for building reactive applications, but they differ in their programming models and APIs.

1. Programming Model

Kotlin Coroutines: Kotlin Coroutines use suspend functions and coroutine builders like launch and async to define asynchronous code. Coroutines provide a sequential, imperative style of writing asynchronous code, making it easier to understand and reason about.

WebFlux: WebFlux uses a reactive programming model based on the Reactive Streams specification. It provides a set of APIs for working with asynchronous data streams, including Flux and Mono, which represent streams of multiple and single values, respectively.

2. Error Handling

Kotlin Coroutines: Error handling in Kotlin Coroutines is done using standard try-catch blocks, making it similar to handling exceptions in synchronous code.

WebFlux: WebFlux provides built-in support for error handling through operators like onErrorResume and onErrorReturn, allowing developers to handle errors in a reactive manner.

3. Integration With Spring Boot

Kotlin Coroutines: Kotlin Coroutines can be seamlessly integrated with Spring Boot applications using the spring-boot-starter-web dependency and the kotlinx-coroutines-reactor library.

WebFlux: Spring Boot provides built-in support for WebFlux, allowing developers to easily create reactive RESTful APIs and integrate with other Spring components.

Show Me the Code

The Power of the Reactive Approach Over the Imperative Approach

The code snippets below implement the same straightforward scenario in both the imperative and reactive paradigms. The scenario involves two stages, each taking one second to complete. In the imperative approach, the service responds in two seconds because it executes both stages sequentially. Conversely, in the reactive approach, the service responds in one second because it executes the stages in parallel.
However, even in this simple scenario, the reactive solution exhibits some complexity, which can escalate significantly in real-world business scenarios. Here's the Kotlin code for the base service:

```kotlin
@Service
class HelloService {

    fun getGreetWord(): Mono<String> =
        Mono.fromCallable {
            Thread.sleep(1000)
            "Hello"
        }

    fun formatName(name: String): Mono<String> =
        Mono.fromCallable {
            Thread.sleep(1000)
            name.replaceFirstChar { it.uppercase() }
        }
}
```

Imperative Solution

```kotlin
fun greet(name: String): String {
    val greet = helloService.getGreetWord().block()
    val formattedName = helloService.formatName(name).block()
    return "$greet $formattedName"
}
```

Reactive Solution

```kotlin
fun greet(name: String): Mono<String> {
    val greet = helloService.getGreetWord()
        .subscribeOn(Schedulers.boundedElastic())
    val formattedName = helloService.formatName(name)
        .subscribeOn(Schedulers.boundedElastic())
    return greet
        .zipWith(formattedName)
        .map { "${it.t1} ${it.t2}" }
}
```

In the imperative solution, the greet function awaits the completion of the getGreetWord and formatName methods sequentially before returning the concatenated result. In the reactive solution, the greet function uses reactive programming constructs to execute the tasks concurrently, using the zipWith operator to combine the results once both stages are complete.

Simplifying Reactivity With Kotlin Coroutines

To simplify the complexity inherent in reactive programming, Kotlin's coroutines provide an elegant solution. Below is a coroutine-based implementation of the same scenario discussed earlier:

```kotlin
@Service
class CoroutineHelloService {

    suspend fun getGreetWord(): String {
        delay(1000)
        return "Hello"
    }

    suspend fun formatName(name: String): String {
        delay(1000)
        return name.replaceFirstChar { it.uppercase() }
    }

    fun greet(name: String) = runBlocking {
        val greet = async { getGreetWord() }
        val formattedName = async { formatName(name) }
        "${greet.await()} ${formattedName.await()}"
    }
}
```

In the provided code snippet, we leverage Kotlin coroutines to simplify reactive programming complexities. The CoroutineHelloService class defines the suspend functions getGreetWord and formatName, which simulate asynchronous operations using delay. The greet function demonstrates an imperative-looking concurrent solution: within a runBlocking coroutine builder, it launches both suspend functions concurrently with async and then combines their results into a single greeting string.

Conclusion

In this exploration, we compared reactive programming in Kotlin Coroutines with Spring Boot to WebFlux. Kotlin Coroutines offer a simpler, more sequential approach, while WebFlux, based on Reactive Streams, provides a comprehensive set of APIs with a steeper learning curve. The code examples demonstrated how reactive solutions outperform imperative ones by leveraging parallel execution, and how Kotlin Coroutines achieve the same concurrency with more readable, sequential-looking code. In summary, Kotlin Coroutines excel in simplicity and integration, making them a compelling choice for developers aiming to streamline reactive programming in Spring Boot applications.
Origin of Cell-Based Architecture

In the rapidly evolving domain of digital services, the need for scalable and resilient architectures (resilience being the ability of a system to recover from failure quickly) has peaked. The introduction of cell-based architecture marks a pivotal shift tailored to meet the surging demands of hyper-scaling: an architecture's ability to scale rapidly in response to fluctuating demand. This methodology has become a foundation for digital success. It's a strategy that empowers tech behemoths like Amazon and Facebook, along with service platforms such as DoorDash, to skillfully navigate the tidal waves of digital traffic during peak moments and ensure service to millions of users worldwide without a hitch.

Consider the surge Amazon faces on Prime Day or the global traffic spike Facebook navigates during significant events. Similarly, DoorDash's quest to flawlessly handle a flood of orders showcases a recurring theme: the critical need for an architecture that scales both vertically and horizontally, expanding capacity without sacrificing system integrity or the user experience.

In the current landscape, where startups frequently encounter unprecedented growth rates, the dream of scaling quickly can become a nightmare of scalability issues. Hypergrowth, a rapid expansion that surpasses expectations, presents a formidable challenge, risking a company's collapse if it fails to scale efficiently. This challenge birthed the concept of hyperscaling, emphasizing an architecture's nimbleness in adapting and growing to meet dynamic demands. Essential to this strategy are extensive parallelization and rigorous fault isolation, ensuring companies can scale without succumbing to the pitfalls of rapid growth.

Cell-based architecture emerges as a beacon for applications and services where downtime is not an option. In scenarios where every second of inactivity spells significant reputational or financial loss, this architectural paradigm proves invaluable. It is especially crucial for:

- Applications requiring uninterrupted operation to ensure customer satisfaction and maintain business continuity
- Financial services vital for maintaining economic stability
- Ultra-scale systems where failure is an unthinkable option
- Multi-tenant services requiring segregated resources for specific clients

This architectural innovation was developed in direct response to the increasing needs of modern, rapidly expanding digital services. It provides a scalable, resilient framework supporting continuous service delivery and operational superiority.

Understanding Cell-Based Architecture

What Exactly Is Cell-Based Architecture?

Cell-based architecture is a modern approach to creating digital services that are both scalable and resilient, taking cues from the principles of distributed systems and microservices design patterns. This architecture breaks down an extensive system into smaller, independent units called cells. Each cell is self-sufficient, containing a specific segment of the system's functionality, data storage, compute, application logic, and dependencies. This modular setup allows each cell to be scaled, deployed, and managed independently, enhancing the system's ability to grow and recover from failures without widespread impact.
Drawing an analogy to urban planning, consider cell-based architecture akin to a well-designed metropolis where each neighborhood operates autonomously, equipped with its own services and amenities, yet contributes to the city's overall prosperity. In times of disruption, such as a power outage or a water main break, only the affected neighborhood experiences downtime while the rest of the city thrives. In the same way, a cell encountering an issue in this architectural framework does not trigger a system-wide failure, so the digital service remains robust and reliable, maintaining high uptime and resilience.

Fig. 1: Cell-based architecture

Key Components

Cell: Akin to neighborhoods, cells are the foundational building blocks of this architecture. Each cell is an autonomous microservice cluster with resources capable of handling a subset of service responsibilities; in effect, a stand-alone version of the application with its own computing power, load balancer, and databases. This setup allows each cell to be deployed, monitored, and maintained separately. That independence means that if one cell runs into problems, it doesn't affect the others, which helps the system scale effectively and stay robust.

Cell Router: Cell routers play a critical role similar to a city's traffic management system. They dynamically route requests to the most appropriate cell based on factors such as load, geographic location, or specific service requirements. By efficiently balancing the load across cells, cell routers ensure that each request is processed by the cell best suited to handle it, optimizing system performance and the user experience, much like how traffic lights and signs direct the flow of vehicles to ensure smooth transit within a city.

Inter-Cell Communication Layer: Despite the autonomy of individual cells, cooperation between them is essential for handling tasks that span the system. The inter-cell communication layer facilitates secure and efficient message exchange between cells. This layer acts as the public transportation system of our city analogy, connecting different neighborhoods (cells) to ensure seamless collaboration and unified service delivery across the entire architecture. Even as cells operate independently, they can still work together effectively, mirroring how different parts of a city are connected yet function cohesively.

Control Plane: The control plane is a critical component of cell-based architecture, acting as the central hub for administrative operations. It oversees tasks such as setting up new cells (provisioning), shutting down existing cells (de-provisioning), and moving customers between cells (migration). This ensures that the infrastructure remains responsive to the needs of the system and its users, allowing for dynamic resource allocation and seamless service continuity. A minimal sketch of these responsibilities appears below.
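To make the division of responsibilities concrete, here is a minimal, hypothetical Python sketch of a control plane managing cells; the `Cell` and `ControlPlane` names and their fields are invented for illustration, not drawn from any real platform:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Cell:
    """A self-contained deployment: its own compute, data store, and tenants."""
    cell_id: str
    capacity: int                      # max tenants this cell should hold
    tenants: set = field(default_factory=set)
    healthy: bool = True

class ControlPlane:
    """Administrative hub: provisions, de-provisions, and migrates between cells."""

    def __init__(self):
        self.cells: dict[str, Cell] = {}

    def provision_cell(self, capacity: int) -> Cell:
        cell = Cell(cell_id=f"cell-{uuid.uuid4().hex[:8]}", capacity=capacity)
        self.cells[cell.cell_id] = cell
        return cell

    def deprovision_cell(self, cell_id: str) -> None:
        cell = self.cells.pop(cell_id)
        assert not cell.tenants, "migrate tenants away before tearing a cell down"

    def migrate_tenant(self, tenant: str, src_id: str, dst_id: str) -> None:
        # Move a customer between cells, e.g., to drain an unhealthy cell
        self.cells[src_id].tenants.discard(tenant)
        self.cells[dst_id].tenants.add(tenant)

# Usage: stand up two cells, then drain and retire the first
plane = ControlPlane()
a, b = plane.provision_cell(capacity=100), plane.provision_cell(capacity=100)
a.tenants.add("customer-42")
plane.migrate_tenant("customer-42", a.cell_id, b.cell_id)
plane.deprovision_cell(a.cell_id)
```

A production control plane would add orchestration, data migration, and safety checks, but the three verbs (provision, de-provision, migrate) are the core of the role described above.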
Why and When to Use Cell-Based Architecture?

Why Use It?

Cell-based architecture offers a robust framework for efficiently scaling digital services while guaranteeing their resilience and adaptability during expansion. Below is a breakdown of its advantages:

Higher Scalability: By defining and managing the capacity of each cell, you can add more cells to scale out (handle growth by adding more system components, such as databases and servers, and spreading the workload evenly). This avoids hitting the resource limits that come with scaling up (accommodating growth by increasing the size of a system's component, such as a database, server, or subsystem). As demand grows, you add more cells, each a contained unit with known capacity, making the system inherently scalable.

Safer Deployments: Deployments and rollbacks are smoother with cells. You can deploy changes to one cell at a time, minimizing the impact of any issues. Canary cells can be used to test new deployments under actual conditions with minimal risk, providing a safety net for broader rollout.

Easy Testability: Testing large, spread-out systems is challenging, especially as they grow. With cell-based architecture, each cell is kept to a manageable size, making it much simpler to test how a cell behaves at its largest capacity. Testing a whole large service can be too expensive and complex, but testing a single cell is doable: you can simulate the largest workload the cell is expected to handle, similar to the biggest job a single customer might give your application. This makes it practical and cost-effective to ensure each cell runs smoothly.

Lower Blast Radius: Cell-based architecture limits the spread of failures by isolating issues within individual cells, much like neighborhoods in a city. This division ensures that a problem in one cell doesn't affect the entire system, maintaining overall functionality. Each cell operates independently, minimizing any single incident's impact area, or "blast radius," akin to the regional isolation seen in large-scale services. This setup enhances system resilience by keeping disruptions contained and preventing widespread outages.

Fig. 2: Cell-based services exhibit enhanced resilience to failures and feature a reduced blast radius compared to traditional services

Improved Reliability and Recovery

Higher Mean Time Between Failures (MTBF): Cell-based architecture increases the system's reliability by reducing how often problems occur. The design keeps each cell small and manageable, allowing for regular checks and maintenance, smoothing operations and making them more predictable. With customers distributed across different cells, any issue affects only a limited set of requests and users. Changes are tested on just a few cells at a time, making them easy to revert without widespread impact. For example, if you have customers divided across ten cells, a problem in one cell affects only 10% of your customers. This controlled approach to managing changes and addressing issues means the system experiences fewer disruptions, leading to a more stable and reliable service.

Lower Mean Time to Recovery (MTTR): Recovery is quicker and more straightforward with cells, since you deal with a smaller, contained issue rather than a system-wide problem.

Higher Availability: Cell-based architecture can lead to fewer and shorter failures, improving the overall uptime of your service.
Even though there might be more potential points of failure (each cell could theoretically fail), the impact of each failure is significantly reduced, and failures are easier to fix.

When to Use It?

Here's a brief guide to when this architectural strategy is advantageous:

High-Stakes Applications: If downtime could severely impact your customers, tarnish your reputation, or result in substantial financial loss, a cell-based approach can safeguard against widespread disruptions.

Critical Economic Infrastructure: Cell-based architecture ensures continuous operation for the financial services industry (FSI), where workloads are pivotal to economic stability.

Ultra-Scale Systems: Systems too large or critical to fail, those that must keep operating under almost any circumstance, are prime candidates for cell-based design.

Stringent Recovery Objectives: Cell-based architecture offers quick recovery capabilities for workloads requiring a Recovery Point Objective (RPO) of less than 5 seconds and a Recovery Time Objective (RTO) of less than 30 seconds.

Multi-Tenant Services With Dedicated Needs: For services where tenants demand fully dedicated resources, assigning each such tenant its own cell ensures isolation and dedicated performance.

Although cell-based architecture brings considerable benefits for critical workloads, it also comes with its own hurdles, such as heightened complexity, elevated costs, the necessity for specialized tools and practices, and the need to invest in a routing layer. For a more in-depth analysis of these challenges, see the "Weighing the Scales: Benefits and Challenges" section below.

Implementing Cell-Based Architecture

This section highlights critical design factors that come into play when designing and implementing a cell-based architecture.

Designing a Cell

Cell design is a foundational aspect of cell-based architecture, where a system is divided into smaller, self-contained units known as cells. Each cell operates independently with its own resources, making the entire system more scalable and resilient. To embark on cell design, identify distinct functionalities within your system that can be isolated into individual cells; this might involve grouping services by their operational needs or user base. Once you've defined these boundaries, equip each cell with the necessary resources, such as databases and application logic, to ensure it can function autonomously. This setup facilitates targeted scaling and recovery and minimizes the impact of failures, as issues in one cell won't spill over to others. Implementing effective communication channels between cells and establishing comprehensive monitoring are crucial steps to maintain system cohesion and oversee cell performance. By systematically organizing your architecture into cells, you create a robust framework that enhances the manageability and adaptability of your system.

Here are a few ideas on cell design that can be leveraged to bolster system resilience:

Distribute Cells Across Availability Zones: By positioning cells across different availability zones (AZs), you can protect your system against the failure of a single data center or geographic location. This geographical distribution ensures that even if one AZ encounters issues, cells in other AZs can continue to operate, maintaining overall system availability and reducing the risk of complete service downtime.
Implement Redundant Cell Configurations: Creating redundant copies of cells within and across AZs can further enhance resilience. If one cell fails, its responsibilities can be taken over immediately by a duplicate cell, minimizing service disruption. This approach requires careful synchronization between cells to ensure data consistency, but it significantly improves fault tolerance.

Design Cells for Autonomous Operation: Ensuring that each cell can operate independently, with its own set of resources, databases, and application logic, is crucial. This independence isolates cells from failures elsewhere in the system: even if one cell experiences a problem, it won't spread to others, localizing the impact and making issues easier to identify and rectify.

Use Load Balancers and Cell Routers Strategically: Integrating load balancers and cell routers that are aware of cell locations and health statuses helps efficiently redirect traffic away from troubled cells or AZs. This dynamic routing capability allows real-time adjustments to traffic flow, directing users to the healthiest available cells and balancing the load to prevent overburdening any single cell or AZ.

Facilitate Easy Cell Replication and Deployment: Design cells with replication and redeployment in mind. In case of a cell or AZ failure, having mechanisms for quickly spinning up new cells in alternative locations can be invaluable. Automation tools and templates for cell deployment can expedite this process, reducing recovery times and enhancing overall system resilience.

Regularly Test Failover Processes: Regular testing of cell failover processes, including simulated failures and recovery drills, ensures that your system responds as expected during actual outages. These tests can reveal potential weaknesses in your cell design and failover strategies, allowing continuous improvement of system resilience.

By incorporating these ideas into your cell design, you can create a more resilient system capable of withstanding various failure scenarios while minimizing the impact on service availability and performance.

Cell Partitioning

Cell partitioning is a crucial technique in cell-based architecture. It focuses on dividing a system's workload among distinct cells to optimize performance, scalability, and resilience. It involves categorizing and directing user requests or data to specific cells based on predefined criteria. This process ensures no cell becomes overwhelmed, enhancing system reliability and efficiency.

How cell partitioning can be done:

Identify Partition Criteria: Determine the basis for distributing workloads among cells. Typical criteria include geographic location, user ID, request type, or date range. This step is pivotal in defining how the system categorizes and routes requests to the appropriate cells.

Implement Routing Logic: Develop a routing mechanism within the cell router or API gateway that uses the identified criteria to direct incoming requests to the correct cell. This might involve dynamic decision-making algorithms that consider current cell load and availability.

Continuous Monitoring and Adjustment: Regularly monitor the performance and load distribution across cells. Use this data to adjust partitioning criteria and routing logic to maintain optimal system performance and scalability.
Partitioning Algorithms: Several algorithms can be utilized for effective cell partitioning, each with its own strengths and suited to different types of workloads and system requirements:

Consistent Hashing: Requests are distributed based on the hash values of the partition key (e.g., user ID), ensuring even workload distribution and minimal reorganization when cells are added or removed.

Range-Based Partitioning: Divides data into ranges (e.g., alphabetical or numerical) and assigns each range to a specific cell. This is ideal for ordered data, allowing efficient query operations.

Round Robin: Distributes requests evenly across all available cells in a cyclic manner. It is straightforward and helpful in achieving a basic level of load balancing.

Sharding: Similar to range-based partitioning but more complex, sharding involves splitting large databases into smaller, faster, more easily managed parts, or "shards," each handled by a separate cell.

Dynamic Partitioning: Adjusts partitioning in real time based on workload characteristics or system performance metrics. This approach requires advanced algorithms capable of analyzing system state and making immediate adjustments.

By thoughtfully implementing cell partitioning and choosing the appropriate algorithm, you can significantly enhance your cell-based architecture's performance, scalability, and resilience. Regular review and adjustment of your partitioning strategy ensures it continues to meet your system's evolving needs. The sketch below illustrates the first of these algorithms.
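As a hedged illustration of consistent hashing for cell assignment, here is a minimal Python sketch; the ring-with-virtual-nodes construction is standard, but the cell names and replica count are invented for this example:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps partition keys (e.g., user IDs) to cells with minimal
    reshuffling when cells are added or removed."""

    def __init__(self, cells, vnodes=100):
        self.vnodes = vnodes          # virtual nodes smooth the distribution
        self.ring = []                # sorted list of (hash, cell) points
        for cell in cells:
            self.add_cell(cell)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_cell(self, cell: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{cell}#{i}"), cell))

    def remove_cell(self, cell: str) -> None:
        self.ring = [(h, c) for h, c in self.ring if c != cell]

    def cell_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cell-1", "cell-2", "cell-3"])
print(ring.cell_for("user-12345"))   # the same user always routes to the same cell
ring.add_cell("cell-4")              # only ~1/4 of keys move to the new cell
print(ring.cell_for("user-12345"))
```

Range-based partitioning or sharding would replace the hash lookup with a range table, but the routing interface stays the same.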
Implementing a Cell Router

In cell-based architecture, the cell router is crucial for steering traffic to the correct cells, ensuring efficient workload management and scalability. An effective cell router hinges on two key elements, traffic routing logic and failover strategies, which together maintain system reliability and optimize performance.

Implementing Traffic Routing Logic: Start by defining the criteria for how requests are directed to various cells, including the users' geographic location, the type of request, and the specific services needed. The aim is to reduce latency and evenly distribute the load. Employ dynamic routing that adapts to cell availability and workload changes in real time, possibly through integration with a service discovery tool that monitors each cell's status and location.

Establishing Failover Strategies: Solid failover processes are essential for the cell router to ensure the system's dependability. Should any cell become unreachable, the router must automatically reroute traffic to the next available cell with minimal manual intervention. This is achieved by implementing health checks across cells to swiftly identify and respond to failures, keeping the user experience smooth and the service highly available even during cell outages.

Fig. 3: The cell router ensures a smooth user experience by redirecting traffic to healthy cells during outages, maintaining uninterrupted service availability

For the practical implementation of a cell router, you can take one of the following approaches:

Load Balancers: Use cloud-based load balancers that dynamically direct traffic based on specific request attributes, such as URL paths or headers, according to set rules.

API Gateways: An API gateway can serve as the primary entry point for all incoming requests and route them to the appropriate cell based on configured logic.

Service Mesh: A service mesh offers a network layer that facilitates efficient service-to-service communication and routes requests based on policies, service discovery, and health status.

Custom Router Service: Developing a custom service allows routing decisions based on detailed request content, current cell load, or bespoke business logic, offering tailored control over traffic management.

Choosing the right implementation strategy for a cell router depends on specific needs, such as the granularity of routing decisions, integration capabilities with existing systems, and management simplicity. Each method provides varying degrees of control, complexity, and adaptability to cater to distinct architectural requirements. A sketch of the custom-router approach follows.
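To make the custom-router option and its failover behavior concrete, here is a minimal, hypothetical sketch that combines the consistent-hash ring from the previous example with health checks; a real router's health probing and failover placement would of course be richer:

```python
class CellRouter:
    """Routes a request to its home cell, failing over to another healthy
    cell when health checks mark the home cell as down (simplified)."""

    def __init__(self, ring: ConsistentHashRing):
        self.ring = ring
        self.unhealthy: set[str] = set()

    def mark_unhealthy(self, cell: str) -> None:
        # In practice this would be driven by periodic health probes
        self.unhealthy.add(cell)

    def mark_healthy(self, cell: str) -> None:
        self.unhealthy.discard(cell)

    def route(self, partition_key: str) -> str:
        cell = self.ring.cell_for(partition_key)
        if cell not in self.unhealthy:
            return cell
        # Failover: walk the ring until a healthy cell is found
        for _, candidate in self.ring.ring:
            if candidate not in self.unhealthy:
                return candidate
        raise RuntimeError("no healthy cells available")

router = CellRouter(ring)
print(router.route("user-12345"))                 # home cell
router.mark_unhealthy(router.route("user-12345"))
print(router.route("user-12345"))                 # traffic fails over to a healthy cell
```

The same routing interface could be fronted by a load balancer or API gateway; the custom service simply makes the decision logic explicit and programmable.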
Cell Sizing

Cell sizing in a cell-based architecture refers to determining each cell's optimal size and capacity to ensure it can handle its designated workload effectively without being overburdened. Proper cell sizing is crucial for several reasons:

Balanced Load Distribution: Correctly sized cells help achieve a balanced distribution of workloads across the system, preventing any single cell from becoming a bottleneck.

Scalability: Well-sized cells can scale more efficiently. As demand increases, the system can add more cells or adjust resources within existing cells to accommodate growth.

Resilience and Recovery: Smaller, well-defined cells can isolate failures more effectively, limiting the impact of any single point of failure. This makes the system more resilient and simplifies recovery processes.

Cost Efficiency: Optimizing cell size helps utilize resources more efficiently, avoiding unnecessary expenditure on underutilized capacity.

How is cell sizing done? It involves a careful analysis of several factors:

Workload Analysis: Understand the nature and volume of each cell's workload, including peak demand times, data throughput, and processing requirements.

Resource Requirements: Based on the workload analysis, estimate the resources (CPU, memory, storage) each cell needs to operate effectively under various conditions.

Performance Metrics: Consider the key performance indicators (KPIs) that define successful cell operation, such as response times, error rates, and throughput.

Scalability Goals: Define how the system should scale in response to increased demand. This will influence whether cells should be designed to scale up (increase resources in a cell) or scale out (add more cells).

Testing and Adjustment: Validate cell size assumptions by testing under simulated workload conditions. Monitoring real-world performance and adjusting as needed is a continuous part of cell sizing.

Effective cell sizing often combines theoretical analysis and empirical testing. Starting with a best-guess estimate based on workload characteristics and adjusting based on observed performance ensures that cells remain efficient, responsive, and cost-effective as the system evolves.

Cell Deployment

Cell deployment in a cell-based architecture is the process of distributing and managing your application's workload across multiple self-contained units called cells. This strategy ensures scalability, resilience, and efficient resource use. Here's a concise guide on how it's typically done and the technology choices available for effective implementation.

How is cell deployment done?

Automated Deployment Pipelines: Start by setting up automated deployment pipelines. These pipelines handle your application's packaging, testing, and deployment to the various cells. Automation ensures consistency, reduces errors, and enables rapid deployment across cells.

Blue/Green Deployments: Use blue/green deployment strategies to minimize downtime and reduce risk. By deploying the new version of your application to a separate environment (green) while keeping the current version (blue) running, you can switch traffic to the new version once it's fully ready and tested.

Canary Releases: Gradually roll out updates to a small subset of cells or users before making them available system-wide. This allows you to monitor the impact of changes and roll them back if necessary without affecting all users.

Technology choices for cell deployment:

Container Orchestration Tools: Tools such as Kubernetes, AWS ECS, and Docker Swarm are crucial for orchestrating cell deployments, enabling the encapsulation of applications into containers for streamlined deployment, scaling, and management across cells.

CI/CD Tools: Continuous integration and continuous deployment (CI/CD) tools such as Jenkins, GitLab CI, CircleCI, and AWS CodePipeline automate testing and deployment processes, ensuring that new code changes can be rolled out efficiently.

Infrastructure as Code (IaC): Tools like Terraform and AWS CloudFormation allow you to define your infrastructure in code, making it easier to replicate and deploy cells across different environments or cloud providers.

Service Meshes: Service meshes like Istio or Linkerd provide advanced traffic management capabilities, including canary deployments and service discovery, which are crucial for managing communication and cell updates.

By leveraging these deployment strategies and technologies, you can achieve a high degree of automation and control in your cell deployments, ensuring your application remains scalable, reliable, and easy to manage.

Cell Observability

Cell observability is crucial in a cell-based architecture to ensure you have comprehensive visibility into each cell's health, performance, and operational metrics. It allows you to monitor, troubleshoot, and optimize the system effectively, enhancing overall reliability and user experience.

Implementing cell observability: To achieve thorough cell observability, focus on three key areas: logging, monitoring, and tracing. Logging captures detailed events and operations within each cell. Monitoring tracks key performance indicators and health metrics in real time. Tracing follows requests as they move through the cells, identifying bottlenecks or failures in the workflow.

Technology choices for cell observability:

Logging Tools: Solutions like Elasticsearch, Logstash, and Kibana (the ELK Stack) or Splunk provide powerful logging capabilities, allowing you to aggregate and analyze logs from all cells centrally.

Monitoring Solutions: Prometheus, coupled with Grafana for visualization, offers robust monitoring with support for custom metrics. Cloud-native services like Amazon CloudWatch or Google Cloud Operations (formerly Stackdriver) provide integrated monitoring solutions tailored to applications deployed on their respective platforms.

Distributed Tracing Systems: Tools like Jaeger, Zipkin, and AWS X-Ray enable distributed tracing, helping you understand the flow of requests across cells and identify latency issues or failures in microservice interactions.

Service Meshes: Service meshes such as Istio or Linkerd inherently offer observability features, including monitoring, logging, and tracing of requests between cells, without requiring changes to your application code.

By leveraging these tools and focusing on comprehensive observability, you can ensure that your cell-based architecture remains performant, resilient, and capable of supporting your application's dynamic needs.
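As one small, hedged example of per-cell monitoring, the snippet below uses the Python prometheus_client library to label metrics with a cell ID so dashboards and alerts can be sliced per cell; the metric names and the "cell-1" label are invented for illustration:

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Label every metric with the cell it came from, so dashboards and
# alerts can be filtered per cell (hypothetical metric names)
REQUESTS = Counter("cell_requests_total", "Requests handled", ["cell", "status"])
LATENCY = Histogram("cell_request_seconds", "Request latency", ["cell"])

CELL_ID = "cell-1"   # in practice, injected via environment or config

def handle_request():
    with LATENCY.labels(cell=CELL_ID).time():
        time.sleep(random.uniform(0.01, 0.05))        # stand-in for real work
        status = "ok" if random.random() > 0.05 else "error"
    REQUESTS.labels(cell=CELL_ID, status=status).inc()

if __name__ == "__main__":
    start_http_server(8000)   # exposes /metrics for Prometheus to scrape
    while True:
        handle_request()
```

With a cell label on every series, a spike in errors or latency immediately identifies which cell is misbehaving, which is exactly the blast-radius question observability needs to answer in this architecture.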
Weighing the Scales: Benefits and Challenges

Adopting cell-based architecture transforms the structural and operational dynamics of digital services. Breaking down a service into independently scalable and resilient units (cells) offers a robust framework for managing complexity and ensuring system availability. However, this architectural paradigm also introduces new challenges and complexities. Here's a deeper dive into the technical advantages and considerations.

Benefits

Horizontal Scalability: Unlike traditional scale-up approaches, cell-based architecture enables horizontal scaling by adding more cells. This method alleviates common bottlenecks associated with centralized databases or shared resources, allowing for linear scalability as user demand increases.

Fault Isolation and Resilience: The architecture's compartmentalized design ensures that failures are contained within individual cells, significantly reducing the system's overall blast radius. This isolation enhances the system's resilience, as issues in one cell can be mitigated or repaired without impacting the entire service.

Deployment Agility: Leveraging cells allows for incremental deployments and feature rollouts, akin to implementing rolling updates across microservices. This granularity in deployment strategy minimizes downtime and enables a more flexible response to market or user demands.

Simplified Operational Complexity: While the initial setup is complex, the ongoing operation and management of cells can be more straightforward than in monolithic architectures. Each cell's autonomy simplifies monitoring, troubleshooting, and scaling efforts, as operational tasks can be executed in parallel across cells.

Challenges (Considerations)

Architectural Complexity: Transitioning to or implementing cell-based architecture demands a meticulous design phase, focusing on defining cell boundaries, data partitioning strategies, and inter-cell communication protocols. This complexity requires a deep understanding of distributed systems principles and may necessitate a shift in development and operational practices.

Resource and Infrastructure Overhead (Higher Cost): Each cell operates with its own set of resources and infrastructure, potentially leading to increased overhead compared to shared-resource models. Optimizing resource utilization and cost efficiency becomes paramount, especially as the number of cells grows.

Inter-Cell Communication Management: Ensuring coherent and efficient communication between cells without introducing tight coupling or significant latency is a critical challenge. Developers must design a communication layer that supports the necessary interactions while maintaining cell independence and avoiding performance bottlenecks.

Data Consistency and Synchronization: Maintaining data consistency across cells, especially in scenarios requiring global state or real-time data synchronization, adds another layer of complexity. Implementing strategies like event sourcing, CQRS (Command Query Responsibility Segregation), or distributed sagas may be necessary to address these challenges.
Specialized Tools and Practices: Operating a cell-based architecture requires specialized operational tools and practices to effectively manage many instances of the same workload.

Routing Layer Investment: A robust cell routing layer is essential for directing traffic appropriately across cells, necessitating additional investment in technology and expertise.

Navigating the Trade-Offs

Opting for cell-based architecture involves navigating these trade-offs and evaluating whether the benefits in scalability, resilience, and operational agility outweigh the complexities of implementation and management. It is most suitable for services requiring high availability, those undergoing rapid expansion, or systems where modular scaling and failure isolation are critical.

Best Practices and Pitfalls

Best Practices

Adopting a cell-based architecture can significantly enhance the scalability and resilience of your applications. Here are streamlined best practices for implementing this approach effectively:

Begin With a Solid Foundation

Treat Your Current Setup as Cell Zero: View your existing system as the initial cell, gradually introducing traffic routing and distribution across new cells.

Launch With Multiple Cells: Implement more than one cell from the beginning to quickly learn and adapt to the operational dynamics of a cell-based environment.

Plan for Flexibility and Growth

Implement a Cell Migration Mechanism Early: Prepare for the need to move customers between cells, ensuring you can scale and adjust without disruption.

Focus on Reliability

Conduct a Failure Mode Analysis: Identify and assess potential failures within each cell and their impact, developing strategies to ensure robustness and minimize cross-cell effects.

Ensure Independence and Security

Maintain Cell Autonomy: Design cells to be self-sufficient, with dedicated resources and clear ownership, possibly by a single team.

Secure Communication: Use versioned, well-defined APIs for cell interactions and enforce security policies at the API gateway level.

Minimize Dependencies: Keep inter-cell dependencies low to preserve the architecture's benefits, such as fault isolation.

Optimize Deployment and Operations

Avoid Shared Resources: Each cell should have its own data storage to eliminate global-state dependencies.

Deploy in Waves: Introduce updates and deployments in phases across cells for better change management and quick rollback capabilities.

By following these practices, you can leverage cell-based architecture to create systems that are scalable and resilient, but also manageable and secure, ready to meet the challenges of modern digital demands.

Common Pitfalls

While cell-based architecture offers significant advantages for scalability and resilience, it also introduces specific challenges and pitfalls that organizations need to be aware of when adopting this approach:

Complexity in Management and Operations

Increased Operational Overhead: Managing multiple cells can introduce complexity in deployment, monitoring, and operations, requiring robust automation and orchestration tools to maintain efficiency.

Consistency Management: Ensuring data consistency across cells, especially in stateful applications, can be challenging and might require sophisticated synchronization mechanisms.

Initial Setup and Migration Challenges

Complex Migration Process: Transitioning to a cell-based architecture from a traditional setup can be complex, requiring careful planning to avoid service disruption and data loss.
Steep Learning Curve: Teams may face a learning curve in understanding cell-based concepts and best practices, necessitating training and potentially slowing initial progress.

Design and Architectural Considerations

Cell Isolation: Properly isolating cells to prevent failure propagation requires meticulous design; without it, the system may not fully realize the benefits of fault isolation.

Optimal Cell Size: Determining the optimal size for cells can be tricky, as overly small cells may lead to inefficiencies, while overly large cells might compromise scalability and resilience.

Resource Utilization and Cost Implications

Potential for Increased Costs: If not carefully managed, the duplication of resources across cells can lead to increased operational costs.

Underutilization of Resources: Balancing resource allocation to prevent underutilization while avoiding over-provisioning requires continuous monitoring and adjustment.

Networking and Communication Overhead

Network Complexity: Cell-based architecture may introduce additional network complexity, including the need for sophisticated routing and load-balancing strategies.

Inter-Cell Communication: Ensuring efficient and secure communication between cells, especially in geographically distributed setups, can introduce latency and requires reliable networking solutions.

Security and Compliance

Security Configuration: Each cell's need for individual security configuration can complicate enforcing consistent security policies across the architecture.

Compliance Verification: Verifying that each cell complies with regulatory requirements can be more challenging in a distributed architecture, requiring robust auditing mechanisms.

Scalability vs. Cohesion Trade-Off

Dependency Management: While minimizing dependencies between cells enhances fault tolerance, it can also make maintaining application cohesion and consistency harder.

Data Duplication: Avoiding shared resources may result in data duplication and synchronization challenges, impacting system performance and consistency.

To mitigate these pitfalls, organizations should invest in robust planning, adopt comprehensive automation and monitoring tools, and ensure ongoing team training. Understanding these challenges upfront helps in designing a more resilient, scalable, and efficient cell-based architecture.

Cell-Based Wins in the Real World

Cell-based architecture has become essential for managing scalability and ensuring system resilience, from high-growth startups to tech giants like Amazon and Facebook. This architectural model has been adopted across various industries, reflecting its effectiveness in handling large-scale, critical workloads. Here's a brief look at how DoorDash, Slack, and Roblox have implemented cell-based architecture to address their unique challenges.

DoorDash's Transition to Cell-Based Architecture

Faced with the demands of hypergrowth, DoorDash migrated from a monolithic system to a cell-based architecture, marking a pivotal shift in its operational strategy. This transition, known as Project SuperCell, was driven by the need to efficiently manage fluctuating demand and maintain consistent service reliability across diverse markets. By leveraging AWS's cloud infrastructure, DoorDash was able to isolate failures within individual cells, preventing widespread system disruptions.
It significantly enhanced their ability to scale resources and maintain service reliability, even during peak times, demonstrating the transformative potential of adopting a cell-based approach. Slack's Migration to Cell-Based Architecture Slack underwent a major shift to a cell-based architecture to lessen the impact of gray failures and boost service redundancy. The move was prompted by a post-incident review of a network outage, which revealed the risks of depending solely on a single availability zone. The new cellular structure aims to confine failures more effectively and minimize the extent of potential site outages. With the adoption of isolated services in each availability zone, Slack has enabled its internal services to function independently within each zone, curtailing the fallout from outages and speeding up the recovery process. This significant redesign has markedly improved Slack's system resilience, underscoring cell-based architecture's role in ensuring high service availability and quality. Roblox's Strategic Shift to Cellular Infrastructure Roblox's shift to a cell-based architecture showcases its response to rapid growth and the need to support over 70 million daily active users with reliable, low-latency experiences. Roblox created isolated clusters within their data centers by adopting a cellular infrastructure, enhancing system resilience through service replication across cells. This setup allowed for the deactivation of non-functional cells without disrupting service, effectively containing failures. The move to cellular infrastructure has significantly boosted Roblox's system reliability, enabling the platform to offer always-on, immersive experiences worldwide. This strategy highlights the effectiveness of cell-based architecture in managing large-scale, dynamic workloads and maintaining high service quality as platforms expand. These examples from DoorDash, Slack, and Roblox illustrate the strategic value of cell-based architecture in addressing the challenges of scale and reliability. By isolating workloads into independent cells, these companies have achieved greater scalability, fault tolerance, and operational efficiency, showcasing the effectiveness of this approach in supporting dynamic, high-demand services. Key Takeaways Cell-based architecture represents a transformative approach for organizations aiming to achieve hyper-scalability and resilience in the digital era. Companies like Amazon, Facebook, DoorDash, and Slack have demonstrated its efficacy in managing hypergrowth and ensuring uninterrupted service by segmenting systems into independent, self-sufficient cells. This architectural strategy facilitates dynamic scaling and robust fault isolation, but it demands careful consideration of increased complexity, resource allocation, and the need for specialized operational tools. As businesses continue to navigate the demands of digital growth, the adoption of cell-based architecture emerges as a strategic solution for sustaining operational integrity and delivering consistent user experiences amidst the ever-evolving digital landscape. Acknowledgments This article draws upon the collective knowledge and experiences of industry leaders and practitioners, including insights from technical blogs, case studies from companies like Amazon, Slack, and DoorDash, and contributions from the wider tech community.
References
https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/reducing-scope-of-impact-with-cell-based-architecture.html
https://github.com/wso2/reference-architecture/blob/master/reference-architecture-cell-based.md
https://newsletter.systemdesign.one/p/cell-based-architecture
https://highscalability.com/cell-architectures/
https://www.youtube.com/watch?v=ReRrhU-yRjg
https://slack.engineering/slacks-migration-to-a-cellular-architecture/
https://blog.roblox.com/2023/12/making-robloxs-infrastructure-efficient-resilient/
In the contemporary data landscape, characterized by vast volumes of diverse data sources, the necessity of anomaly detection intensifies. As organizations aggregate substantial datasets from disparate origins, the identification of anomalies assumes a pivotal role in reinforcing security protocols, streamlining operational workflows, and upholding stringent quality standards. Through the application of sophisticated methodologies encompassing statistical analysis, machine learning, and data visualization, anomaly detection emerges as a potent instrument for uncovering latent insights, mitigating risks, and facilitating real-time decision-making processes. This article centers on a focused application scenario: the detection of anomalies within a video/audio streaming platform to gauge real-time content delivery quality. Our objective is clear: to assess the quality of streaming video/audio content, ultimately enhancing the customer experience. Central to this discussion is the utilization of Quality of Service (QoS) metrics, complemented by GEO-IP services, to enrich data capture and facilitate proactive monitoring, detection, and intervention. What Is Quality of Service? Quality of service (QoS) refers to the measurement of the precision and reliability of the services provided to a platform, assessed through various metrics. It's a commonly employed concept in networking circles to ensure the optimal performance of a platform. This article focuses on establishing QoS metrics tailored specifically for video or audio content. We achieve this by extracting necessary metrics at the client edge (customer devices) and enhancing their attributes to provide deeper insights for business purposes. Why Quality of Service? The importance of "quality of service" lies in its ability to fulfill the specific needs of consumers. For instance, when customers are enjoying a live sports event through OTT streaming platforms like YouTube, it becomes paramount for the streaming company to assess the video quality across various regions. This necessity extends beyond video streaming to other sectors such as podcasting, audiobooks, and even award streaming services. How QoS Metrics Can Help in Anomaly Detection Integral to anomaly detection, QoS metrics furnish essential data and insights to pinpoint abnormal behavior and potential security risks across applications, systems, and networks. Continuous monitoring of metrics such as buffering ratio, bandwidth, and throughput enables the detection of anomalies through deviations from established thresholds or behavioral patterns, triggering alerts for swift intervention. Furthermore, QoS metrics facilitate root cause analysis by pinpointing underlying causes of anomalies, guiding the formulation of effective corrective actions. We need to design a solution that identifies anomalies in three states (New York, New Jersey, and Tamil Nadu) for a streaming platform and ensures smooth streaming quality. We will leverage AWS components to complement this solution. How Can We Solve This Problem Using Streaming Architecture? To comprehensively analyze the situation, we require additional attributes beyond just geographical location. For instance, in cases of streaming quality issues, organizations must ascertain whether the problem stems from the Internet Service Provider or if it is linked to recent code releases, potentially affecting specific operating systems on devices.
Overall, there's a need for a Quality of Service (QoS) API service capable of collecting pertinent data from the client devices and relaying it to an API, which in turn disseminates these attributes to downstream components. With the initial details provided by the client, the downstream components can enhance the dataset. The JSON object below illustrates the basic information transmitted by the client device for a single event. Sample JSON event from client device:

JSON
{
  "video_start_time": "2023-09-10 10:30:30",
  "video_end_time": "2023-09-10 10:30:33",
  "total_play_time_mins": "60",
  "uip": "10.122.9.22",
  "video_id": "xxxxxxxxxxxxxxx",
  "device_type": "ios",
  "device_model": "iphone11"
}

Architecture Option 1 The application code on the device can call the API Gateway, linked to a Kinesis proxy, which connects to a Kinesis Stream. This setup facilitates near real-time analysis of client data at this layer. Subsequently, data transformation can occur using a Lambda function, followed by storage in S3 for further analysis. This architecture addresses two primary use cases: firstly, the capability to analyze incoming QoS data in near real-time through Kinesis Stream, leveraging AWS tools like Kinesis Analytics for ad-hoc analytics with reduced latency. Secondly, the ability to write data to S3 using simple Lambda code allows for batch analytics to be conducted. This approach effectively addresses scalability concerns in a streaming solution by leveraging various AWS components. In our specific use case, enriching incoming data with geo-IP locations is essential, since we need information like country, state, and ISP. To achieve this, we can utilize a geo API, such as MaxMind, to incorporate geo-location, IP address, and other relevant dimensions. Alternatively, let's explore an architecture that assumes analytics are performed every minute, eliminating the need for a streaming layer and focusing solely on a delivery layer. Architecture Option 2 In this scenario, we'll illustrate the process of enriching data with geo and ISP-specific attributes to facilitate anomaly detection. Clients initiate the process by calling the API Gateway and passing along the relevant attributes. These values are then transmitted to Kinesis Firehose via the Kinesis proxy. A transformation Lambda function within Kinesis Firehose executes a straightforward Python script to retrieve geo-IP details from the MaxMind service. Subsequently, Kinesis Firehose batches the data and transfers it to S3. S3 serves as the central repository of truth for anomaly detection, housing all the necessary data for analysis. Below is a sample code snippet for calling the service to retrieve geo-IP details. As depicted, the code primarily centers on retrieving information from the MaxMind .mmdb file supplied by the provider. Various methods exist for obtaining geo-IP data; in this instance, I've chosen to have the .mmdb file accessible via an S3 path. Alternatively, you can opt to retrieve it through API calls. The enriched data is then returned to Kinesis Firehose, where it undergoes batching, compression, and subsequent delivery to S3.
Python
import base64
import json
import urllib.request

import geoip2.database

s3_city_url = "<maxmind_s3_url_path_for_city_details_mmdb_file>"
s3_isp_url = "<maxmind_s3_url_path_for_isp_details_mmdb_file>"

# Download the MaxMind databases once at cold start and open readers on local copies.
city_db_path = "/tmp/city.mmdb"
isp_db_path = "/tmp/isp.mmdb"
with open(city_db_path, "wb") as f:
    f.write(urllib.request.urlopen(s3_city_url).read())
with open(isp_db_path, "wb") as f:
    f.write(urllib.request.urlopen(s3_isp_url).read())
city_reader = geoip2.database.Reader(city_db_path)
isp_reader = geoip2.database.Reader(isp_db_path)


def qos_handler(event, context):
    def enrich_record(record):
        try:
            decoded_data = base64.b64decode(record['data'])
            streaming_event_object = json.loads(decoded_data.decode("utf-8"))
            # Look up city-level and ISP-level attributes for the client IP.
            city_response = city_reader.city(streaming_event_object['uip'])
            isp_response = isp_reader.isp(streaming_event_object['uip'])
            streaming_event_object['cityname'] = city_response.city.name
            streaming_event_object['postalcode'] = city_response.postal.code
            streaming_event_object['metrocode'] = city_response.location.metro_code
            streaming_event_object['timezone'] = city_response.location.time_zone
            streaming_event_object['countryname'] = city_response.country.name
            streaming_event_object['countryisocode'] = city_response.country.iso_code
            streaming_event_object['origip'] = streaming_event_object['uip']
            streaming_event_object['ispname'] = isp_response.isp
            json_data = json.dumps(streaming_event_object)
            encoded_streaming_data = base64.b64encode(json_data.encode("utf-8"))
            return {
                'recordId': record['recordId'],
                'result': "Ok",
                'data': encoded_streaming_data.decode("utf-8")
            }
        except Exception as e:
            print("exception while enriching record:", e)
            # Mark the record as failed so Kinesis Firehose can retry or dead-letter it.
            return {
                'recordId': record['recordId'],
                'result': "ProcessingFailed",
                'data': record['data']
            }

    output = list(map(enrich_record, event['records']))
    return {'records': output}

Analytics on Streamed Data After the data reaches S3, we can conduct ad-hoc analytics on it. Various options are available for analyzing the data once it resides in S3. It can be loaded into a data warehousing platform such as Redshift or Snowflake. Alternatively, if a data lake or data mesh serves as the source of truth, the data can be replicated there. During the analysis in S3, we primarily calculate the buffering ratio. The ratio is obtained by dividing the buffering time by the total play time. In our example:

Plain Text
"video_start_time":"2023-09-10 10:30:30",
"video_end_time":"2023-09-10 10:30:33",
"total_play_time_mins" : "60"

Buffering_ratio = diff(video_end_time, video_start_time) / total_play_time_mins
Buffering_ratio = 3 seconds / 3,600 seconds ≈ 0.00083 (about 0.083%)

Detecting Anomalies To continue further, the following attributes will be available as rows in tabular format during the ETL operation at the Data Warehousing (DWH) stage. These values will be stored for each video/audio ID. By establishing a materialized view over the records stored during a certain period, we can compute averages and percentiles of the buffering ratio metric mentioned earlier. Sample JSON event with buffering ratio:

JSON
{
  "video_start_time": "2023-09-10 10:30:30",
  "video_end_time": "2023-09-10 10:30:33",
  "total_play_time_mins": "60",
  "uip": "10.122.9.22",
  "video_id": "xxxxxxxxxxxxxxx",
  "device_type": "ios",
  "device_model": "iphone11",
  "buffering_ratio": "0.00083",
  "isp": "isp1",
  "country": "USA",
  "state": "NJ"
}

For simplicity, let's focus on one metric — buffering ratio — to gauge the streaming quality of sports matches or podcasts for customers.
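To make the detection step concrete, below is a minimal sketch of this logic in Python. It assumes the enriched events have already been read from S3 into memory, uses the field names from the sample event above, and flags a state as anomalous when its average buffering ratio deviates from the mean across states by more than a chosen number of standard deviations. The sample values and the one-standard-deviation threshold in the usage example are illustrative assumptions, not recommendations.

Python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def buffering_ratio(event):
    # Approximate buffering time as the gap between the start and end timestamps.
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(event["video_start_time"], fmt)
    end = datetime.strptime(event["video_end_time"], fmt)
    buffering_seconds = (end - start).total_seconds()
    total_play_seconds = float(event["total_play_time_mins"]) * 60
    return buffering_seconds / total_play_seconds


def flag_anomalous_states(events, threshold_stddevs=2.0):
    # Average the per-event ratios by state, then flag states whose average
    # deviates from the cross-state mean by more than the chosen threshold.
    by_state = defaultdict(list)
    for e in events:
        by_state[e["state"]].append(buffering_ratio(e))
    state_avgs = {s: mean(ratios) for s, ratios in by_state.items()}
    overall_mean = mean(state_avgs.values())
    overall_std = stdev(state_avgs.values()) if len(state_avgs) > 1 else 0.0
    return [s for s, avg in state_avgs.items()
            if overall_std > 0 and abs(avg - overall_mean) > threshold_stddevs * overall_std]


# Hypothetical enriched events for the three states of interest:
events = [
    {"state": "NY", "video_start_time": "2023-09-10 10:30:30",
     "video_end_time": "2023-09-10 10:30:42", "total_play_time_mins": "60"},
    {"state": "NJ", "video_start_time": "2023-09-10 10:30:30",
     "video_end_time": "2023-09-10 10:30:33", "total_play_time_mins": "60"},
    {"state": "TN", "video_start_time": "2023-09-10 10:30:30",
     "video_end_time": "2023-09-10 10:30:33", "total_play_time_mins": "60"},
]
print(flag_anomalous_states(events, threshold_stddevs=1.0))  # ['NY'] in this toy sample

In practice, this aggregation would typically run in the warehouse itself, for example as a materialized view in Redshift or Snowflake, with the same grouping and deviation logic expressed in SQL.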
After capturing the real-time events and visualizing the tabular data, it becomes obvious that NY exhibits a higher buffering ratio (out of the three states the organization is interested in), indicating that viewers may experience sluggish content delivery. This observation prompts further investigation into potential issues related to ISPs or networking by delving into other dimensions gathered from GEO-IP or device attributes. As a first step, content providers delve deeper into geographical dimensions at the city level and identify that Manhattan has the highest buffering ratio among the top three cities in NY. Following this, content providers delve into the metrics associated with internet service provider (ISP) details specifically for Manhattan to identify potential causes. This examination uncovers that ISP1 exhibited a higher buffering ratio, and upon further investigation, it appears that ISP1 encountered internet speed issues only in Manhattan. These proactive analyses empower content providers to detect anomalies and evaluate their repercussions on consumers in particular regions, enabling them to proactively reach out to affected consumers. Comparable analyses can be expanded to other factors such as device types and models. These steps demonstrate how anomaly detection can be carried out with robust data engineering, streaming solutions, and business intelligence in place. This data, in turn, can also feed machine learning algorithms for enhanced detection. Conclusion This article delved into leveraging QoS metrics for anomaly detection during content streaming in video or audio applications. A particular emphasis was placed on enriching data with GEO-IP details using the MaxMind service, facilitating issue triage along specific dimensions such as country, state, city, or ISP. Architectural options were also presented for implementing streaming solutions, accommodating both ad-hoc near real-time and batch analytics to pinpoint anomalies. I trust this article serves as a helpful starting point for exploring anomaly detection approaches within your organization. Notably, the discussed solution extends beyond OTT platforms, being applicable to diverse domains such as the financial sector, where near real-time anomaly detection is essential.
In an era where the pace of software development and deployment is accelerating, the significance of having a robust and integrated DevOps environment cannot be overstated. Azure DevOps, Microsoft's suite of cloud-based DevOps services, is designed to support teams in planning work, collaborating on code development, and building and deploying applications with greater efficiency and reduced lead times. The objective of this blog post is twofold: first, to introduce Azure DevOps, shedding light on its components and how they converge to form a powerful DevOps ecosystem, and second, to provide a balanced perspective by delving into the advantages and potential drawbacks of adopting Azure DevOps. Whether you're contemplating the integration of Azure DevOps into your workflow or seeking to optimize your current DevOps practices, this post aims to equip you with a thorough understanding of what Azure DevOps has to offer, helping you make an informed decision tailored to your organization's unique requirements. What Is Azure DevOps? Azure DevOps represents the evolution of Visual Studio Team Services, capturing over 20 years of investment and learning in providing tools to support software development teams. As a cornerstone in the realm of DevOps solutions, Azure DevOps offers a suite of tools catering to the diverse needs of software development teams. Microsoft provides this product in the Cloud with Azure DevOps Services or on-premises with Azure DevOps Server. It offers integrated features accessible through a web browser or IDE client. At its core, Azure DevOps comprises five key components, each designed to address specific aspects of the development process. These components are not only powerful in isolation but also offer enhanced benefits when used together, creating a seamless and integrated experience for users. Azure Boards It offers teams a comprehensive solution for project management, including agile planning, work item tracking, and visualization tools. It enables teams to plan sprints, track work with Kanban boards, and use dashboards to gain insights into their projects. This component fosters enhanced collaboration and transparency, allowing teams to stay aligned on goals and progress. Azure Repos It is a set of version control tools designed to manage code efficiently. It provides Git (distributed version control) or Team Foundation Version Control (centralized version control) for source code management. Developers can collaborate on code, manage branches, and track version history with complete traceability. This component ensures streamlined and accessible code management, allowing teams to focus on building rather than merely managing their codebase. Azure Pipelines Azure Pipelines automates the stages of the application's lifecycle, from continuous integration and continuous delivery to continuous testing, build, and deployment. It supports any language, platform, and cloud, offering a flexible solution for deploying code to multiple targets such as virtual machines, various environments, containers, on-premises, or PaaS services. With Azure Pipelines, teams can ensure that code changes are automatically built, tested, and deployed, facilitating faster and more reliable software releases. Azure Test Plans Azure Test Plans provide a suite of tools for test management, enabling teams to plan and execute manual, exploratory, and automated testing within their CI/CD pipelines. 
Furthermore, Azure Test Plans ensure end-to-end traceability by linking test cases and suites to user stories, features, or requirements. They facilitate comprehensive reporting and analysis through configurable tracking charts, test-specific widgets, and built-in reports, empowering teams with actionable insights for continuous improvement. They thus provide a framework for rigorous testing, ensuring that applications meet the highest standards before release. Azure Artifacts It allows teams to manage and share software packages and dependencies across the development lifecycle, offering a streamlined approach to package management. This feature supports various package formats, including npm, NuGet, Python, Cargo, Maven, and Universal Packages, fostering efficient development processes. This service not only accelerates development cycles but also enhances reliability and reproducibility by providing a reliable source for package distribution and version control, ultimately empowering teams to deliver high-quality software products with confidence. Below is an example of an architecture leveraging various Azure DevOps services (image captured from Microsoft). Benefits of Leveraging Azure DevOps Azure DevOps presents a compelling array of benefits that cater to the multifaceted demands of modern software development teams. Its comprehensive suite of tools is designed to streamline and optimize various stages of the development lifecycle, fostering efficiency, collaboration, and quality. Here are some of the key advantages: Seamless Integration One of Azure DevOps' standout features is its ability to seamlessly integrate with a plethora of tools and platforms, whether they are from Microsoft or other vendors. This interoperability is crucial for anyone who uses a diverse set of tools in their development processes. Scalability and Flexibility Azure DevOps is engineered to scale alongside your business. Whether you're working on small projects or large enterprise-level solutions, Azure DevOps can handle the load, providing the same level of performance and reliability. This scalability is a vital attribute for enterprises that foresee growth or experience fluctuating demands. Enhanced Collaboration and Visibility Collaboration is at the heart of Azure DevOps. With features like Azure Boards, teams can have a centralized view of their projects, track progress, and coordinate efforts efficiently. This visibility is essential for aligning cross-functional teams, managing dependencies, and ensuring that everyone is on the same page. Continuous Integration and Deployment (CI/CD) Azure Pipelines provides robust CI/CD capabilities, enabling teams to automate the building, testing, and deployment of their applications. This automation is crucial to accelerate their time-to-market and improve the quality of their software. By automating these processes, teams can detect and address issues early, reduce manual errors, and ensure that the software is always in a deployable state, thereby enhancing operational efficiency and software reliability. Drawbacks of Azure DevOps While Azure DevOps offers a host of benefits, it's essential to acknowledge and understand its potential drawbacks. Like any tool or platform, it may not be the perfect fit for every organization or scenario.
Here are some of the disadvantages that one might encounter: Vendor Lock-In By adopting Azure DevOps services for project management, version control, continuous integration, and deployment, organizations may find themselves tightly integrated into the Microsoft ecosystem. This dependency could limit flexibility and increase reliance on Microsoft's tools and services, making it challenging to transition to alternative platforms or technologies in the future. Integration Challenges Although Azure DevOps boasts impressive integration capabilities, there can be challenges when interfacing with certain non-Microsoft or legacy systems. Some integrations may require additional customization or the use of third-party tools, potentially leading to increased complexity and maintenance overhead. For organizations heavily reliant on non-Microsoft products, this could pose integration and workflow continuity challenges. Cost Considerations Azure DevOps operates on a subscription-based pricing model, which, while flexible, can become significant at scale, especially for larger teams or enterprises with extensive requirements. The cost can escalate based on the number of users, the level of access needed, and the use of additional features and services. For smaller teams or startups, the pricing may be a considerable factor when deciding whether Azure DevOps is the right solution for their needs. Potential for Over-Complexity With its myriad of features and tools, there's a risk of over-complicating workflows and processes within Azure DevOps. Teams may find themselves navigating through a plethora of options and configurations, which, if not properly managed, can lead to inefficiency rather than improved productivity. Organizations must strike a balance between leveraging Azure DevOps' capabilities and maintaining simplicity and clarity in their processes. While these disadvantages are noteworthy, they do not necessarily diminish the overall value that Azure DevOps can provide to an organization. It's crucial for enterprises and organizations to carefully assess their specific needs, resources, and constraints when considering Azure DevOps as their solution. By acknowledging these potential drawbacks, organizations can plan effectively, ensuring that their adoption of Azure DevOps is strategic, well-informed, and aligned with their operational goals and challenges. Conclusion In the landscape of modern software development, Azure DevOps stands out as a robust and comprehensive platform, offering a suite of tools designed to enhance and streamline the DevOps process. Its integration capabilities, scalability, and extensive features make it an attractive choice for any organization or enterprise. However, like any sophisticated platform, Azure DevOps comes with its own set of challenges and considerations. The vendor lock-in, integration complexities, cost factors, and potential for over-complexity are aspects that organizations need to weigh carefully. It's crucial for enterprises to undertake a thorough analysis of their specific needs, resources, and constraints when evaluating Azure DevOps as a solution. The decision to adopt Azure DevOps should be guided by a strategic assessment of how well its advantages align with the organization's goals and how its disadvantages might impact operations. 
For many enterprises, the benefits of streamlined workflows, enhanced collaboration, and improved efficiency will outweigh the drawbacks, particularly when the adoption is well-planned and aligned with the organization's objectives.
The ExecutorService in Java provides a flexible and efficient framework for asynchronous task execution. It abstracts away the complexities of managing threads manually and allows developers to focus on the logic of their tasks. Overview The ExecutorService interface is part of the java.util.concurrent package and represents an asynchronous task execution service. It extends the Executor interface, which defines a single method execute(Runnable command) for executing tasks. Executors Executors is a utility class in Java that provides factory methods for creating and managing different types of ExecutorService instances. It simplifies the process of instantiating thread pools and allows developers to easily create and manage executor instances with various configurations. The Executors class provides several static factory methods for creating different types of executor services: FixedThreadPool: Creates an ExecutorService with a fixed number of threads. Tasks submitted to this executor are executed concurrently by the specified number of threads. If a thread is idle and no tasks are available, it remains alive but dormant until needed. Java ExecutorService executor = Executors.newFixedThreadPool(5); CachedThreadPool: Creates an ExecutorService with an unbounded thread pool that automatically adjusts its size based on the workload. Threads are created as needed and reused for subsequent tasks. If a thread remains idle for a certain period, it may be terminated to reduce resource consumption. In a cached thread pool, submitted tasks are not queued but immediately handed off to a thread for execution. If no threads are available, a new one is created. If a server is so heavily loaded that all of its CPUs are fully utilized, and more tasks arrive, more threads will be created, which will only make matters worse. The idle timeout defaults to 60 seconds, after which a thread with no work is terminated. Therefore, in a heavily loaded production server, you are much better off using Executors.newFixedThreadPool, which gives you a pool with a fixed number of threads, or using the ThreadPoolExecutor class directly, for maximum control. Java ExecutorService executor = Executors.newCachedThreadPool(); SingleThreadExecutor: Creates an ExecutorService with a single worker thread. Tasks are executed sequentially by this thread in the order they are submitted. This executor is useful for tasks that require serialization or have dependencies on each other. Java ExecutorService executor = Executors.newSingleThreadExecutor(); ScheduledThreadPool: Creates an ExecutorService that can schedule tasks to run after a specified delay or at regular intervals. It provides methods for scheduling tasks with fixed delay or fixed rate, allowing for periodic execution of tasks. newWorkStealingPool: Creates a work-stealing thread pool with the target parallelism level. This executor is based on the ForkJoinPool and is capable of dynamically adjusting its thread pool size to utilize all available processor cores efficiently. Overall, the Executors class simplifies the creation and management of executor instances. ExecutorService Tasks can be submitted to an ExecutorService for execution. These tasks are typically instances of Runnable or Callable, representing units of work that need to be executed asynchronously. Below are the methods in ExecutorService. 1. execute(Runnable command): Executes the given task asynchronously.
Java
ExecutorService executor = Executors.newFixedThreadPool(5);
executor.execute(() -> {
    System.out.println("Task executed asynchronously");
});

2. submit(Callable<T> task): Submits a task for execution and returns a Future representing the pending result of the task.

Java
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<Integer> future = executor.submit(() -> {
    // Task logic
    return 42;
});

3. shutdown(): Initiates an orderly shutdown of the ExecutorService, allowing previously submitted tasks to execute before terminating. 4. shutdownNow(): Attempts to stop all actively executing tasks, halts the processing of waiting tasks, and returns a list of the tasks that were awaiting execution.

Java
List<Runnable> pendingTasks = executor.shutdownNow();

5. awaitTermination(long timeout, TimeUnit unit): Blocks until all tasks have completed execution after a shutdown request, or the timeout occurs, or the current thread is interrupted, whichever happens first.

Java
boolean terminated = executor.awaitTermination(10, TimeUnit.SECONDS);
if (terminated) {
    System.out.println("All tasks have completed execution");
} else {
    System.out.println("Timeout occurred before all tasks completed");
}

6. invokeAny(Collection<? extends Callable<T>> tasks): Executes the given tasks, returning the result of one that successfully completes. This method is useful when we have multiple tasks to run but we only care about the result of whichever one completes first. All other tasks are cancelled.

Java
ExecutorService executor = Executors.newCachedThreadPool();
Set<Callable<String>> callables = new HashSet<>();
callables.add(() -> "Task 1");
callables.add(() -> "Task 2");
String result = executor.invokeAny(callables);
System.out.println("Result: " + result);

7. invokeAll(Collection<? extends Callable<T>> tasks): Executes the given tasks, returning a list of Future objects representing their pending results.

Java
List<Callable<Integer>> tasks = Arrays.asList(() -> 1, () -> 2, () -> 3);
List<Future<Integer>> futures = executor.invokeAll(tasks);
for (Future<Integer> future : futures) {
    System.out.println("Result: " + future.get());
}

Implementations The ExecutorService interface is typically implemented by various classes provided by the Java concurrency framework, such as ThreadPoolExecutor, ScheduledThreadPoolExecutor, and ForkJoinPool. Considerations Careful configuration of thread pool size to avoid underutilization or excessive resource consumption. Consider factors such as task submission rate, task priority, resource constraints, and the desired behavior in case of queue overflow. Choose the queue type that best meets your application's requirements for scalability, performance, and resource utilization. Proper handling of exceptions and task cancellation to ensure robustness and reliability. Understanding the concurrency semantics and potential thread safety issues in concurrent code. To create an instance of ExecutorService, we can pass a ThreadFactory and the task queue to be used while creating the pool. A ThreadFactory is an interface used to create new threads. It provides a way to encapsulate the logic for creating threads, allowing for customization of thread creation behavior. The primary purpose of a ThreadFactory is to decouple the thread creation process from the rest of the application logic, making it easier to manage and customize thread creation. It is preferable to pass a custom ThreadFactory, as it helps in setting a thread name prefix and priority if required.
Java
static final String prefix = "app.name.task";
ExecutorService executorService = Executors.newFixedThreadPool(5, r -> {
    Thread t = new Thread(r);
    t.setName(prefix + "-" + t.getId()); // Customize the thread name if needed
    return t;
});

TaskQueues When tasks are submitted to an ExecutorService and none of the threads in the pool are available to process them, they are stored in a queue. Below are the different queue options to choose from (a bounded-queue configuration sketch appears at the end of this article). Unbounded Queue: An unbounded queue, such as LinkedBlockingQueue, has no fixed capacity and can grow dynamically to accommodate an unlimited number of tasks. It is suitable for scenarios where the task submission rate is unpredictable or where tasks need to be queued indefinitely without the risk of rejection due to queue overflow. However, keep in mind that unbounded queues can potentially lead to memory exhaustion if tasks are submitted at a faster rate than they can be processed. Bounded Queue: A bounded queue, such as ArrayBlockingQueue with a specified capacity, has a fixed size limit and can only hold a finite number of tasks. It is suitable for scenarios where resource constraints or backpressure mechanisms need to be enforced to prevent excessive memory usage or system overload. Tasks may be rejected or handled according to a specified rejection policy when the queue reaches its capacity. Priority Queue: A priority queue, such as PriorityBlockingQueue, orders tasks based on their priority or a specified comparator. It is suitable for scenarios where tasks have different levels of importance or urgency, and higher-priority tasks need to be processed before lower-priority ones. Priority queues ensure that tasks are executed in the order of their priority, regardless of their submission order. Synchronous Queue: A synchronous queue, such as SynchronousQueue, is a special type of queue that enables one-to-one task handoff between producer and consumer threads. It has a capacity of zero and requires both a producer and a consumer to be available simultaneously for task exchange to occur. Synchronous queues are suitable for scenarios where strict synchronization and coordination between threads are required, such as handoff between thread pools or bounded resource access. ScheduledThreadPool The ScheduledThreadPoolExecutor inherits thread pool management capabilities from ThreadPoolExecutor and provides functionalities for scheduling tasks to run after a given delay or periodically at defined intervals. Here's a detailed explanation: Runnable and Callable Tasks: You define tasks you want to schedule using these interfaces, similar to a regular ExecutorService. ScheduledFuture: This interface represents the result of a scheduled task submission. It allows checking the task's completion status, canceling the task before execution, and (for Callable tasks) retrieving the result upon completion. Scheduling Capabilities schedule(Runnable task, long delay, TimeUnit unit): Schedules a Runnable task to be executed after a specified delay in the given time unit (e.g., seconds, milliseconds). scheduleAtFixedRate(Runnable command, long initialDelay, long period, TimeUnit unit): Schedules a fixed-rate execution of a Runnable task. The task is first executed after the initialDelay, and subsequent executions occur with a constant period between them. scheduleWithFixedDelay(Runnable command, long initialDelay, long delay, TimeUnit unit): Schedules a fixed-delay execution of a Runnable task.
Similar to scheduleAtFixedRate, but the delay is measured between the completion of the previous execution and the start of the next. Key Considerations Thread Pool Management: ScheduledThreadPoolExecutor maintains a fixed-sized thread pool by default. You can configure the pool size during object creation. Delayed Execution: Scheduled tasks are not guaranteed to execute precisely at the specified time. The actual execution time might be slightly different due to factors like thread availability and workload. Missed Executions: With fixed-rate scheduling, if a task's execution time exceeds the period, subsequent executions start late and run one after another; they never execute concurrently, so the schedule effectively falls behind rather than firing in parallel. Cancellation: You can cancel a scheduled task using the cancel method of the returned ScheduledFuture object. However, cancellation success depends on the task's state (not yet started, running, etc.).

Java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledThreadPoolExample {
    public static void main(String[] args) throws InterruptedException {
        // Create a ScheduledThreadPoolExecutor with 2 threads
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        // Schedule a task with a 2-second delay
        Runnable task1 = () -> System.out.println("Executing task 1 after a delay");
        scheduler.schedule(task1, 2, TimeUnit.SECONDS);

        // Schedule a task to run every 5 seconds at a fixed rate
        Runnable task2 = () -> System.out.println("Executing task 2 at fixed rate");
        scheduler.scheduleAtFixedRate(task2, 1, 5, TimeUnit.SECONDS);

        // Schedule a task to run every 3 seconds with a fixed delay
        Runnable task3 = () -> System.out.println("Executing task 3 with fixed delay");
        scheduler.scheduleWithFixedDelay(task3, 0, 3, TimeUnit.SECONDS);

        // Wait for some time to allow tasks to be executed
        Thread.sleep(15000);

        // Shutdown the scheduler
        scheduler.shutdown();
    }
}

Shut Down ExecutorService Gracefully To efficiently shut down an ExecutorService, you can follow these steps: Call the shutdown() method to initiate the shutdown process. This method allows previously submitted tasks to execute before terminating but prevents the submission of new tasks. Call the shutdownNow() method if you want to force the ExecutorService to terminate immediately. This method attempts to stop all actively executing tasks, halts the processing of waiting tasks, and returns a list of the tasks that were awaiting execution but were never started. Await termination by calling the awaitTermination() method. This method blocks until all tasks have completed execution after a shutdown request, or the timeout occurs, or the current thread is interrupted, whichever happens first.
Here's an example:

Java
ExecutorService executor = Executors.newFixedThreadPool(10);

// Execute tasks using the executor

// Shutdown the executor
executor.shutdown();
try {
    // Wait for all tasks to complete or timeout after a certain period
    if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
        // If the timeout occurs, force shutdown
        executor.shutdownNow();
        // Optionally, wait for the tasks to be forcefully terminated
        if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
            // Log a message indicating that some tasks failed to terminate
        }
    }
} catch (InterruptedException ex) {
    // Log interruption exception
    executor.shutdownNow();
    // Preserve interrupt status
    Thread.currentThread().interrupt();
}

In summary, ExecutorService is a versatile framework that helps developers write efficient, scalable, and maintainable concurrent code.
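Here is the bounded-queue sketch referenced in the TaskQueues section above. It is a minimal, illustrative configuration rather than a recommendation: the pool sizes, keep-alive time, and queue capacity are assumptions you would tune for your workload. It constructs a ThreadPoolExecutor directly, combining a bounded ArrayBlockingQueue with a CallerRunsPolicy rejection handler, the kind of explicit control suggested earlier for heavily loaded servers.

Java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueuePoolExample {
    public static void main(String[] args) {
        // Core/max sizes, keep-alive, and capacity below are illustrative assumptions.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4,                                          // core pool size
                8,                                          // maximum pool size
                60, TimeUnit.SECONDS,                       // keep-alive for threads above core size
                new ArrayBlockingQueue<>(100),              // bounded queue enforces backpressure
                new ThreadPoolExecutor.CallerRunsPolicy()); // on overflow, run on the submitting thread

        for (int i = 0; i < 500; i++) {
            final int taskId = i;
            executor.execute(() -> System.out.println(
                    "Task " + taskId + " on " + Thread.currentThread().getName()));
        }
        executor.shutdown();
    }
}

With CallerRunsPolicy, submissions that overflow the queue execute on the submitting thread itself, which naturally throttles producers instead of dropping work; whether that trade-off fits depends on your application's tolerance for latency versus task rejection.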
Hello! My name is Roman Burdiuzha. I am a Cloud Architect, Co-Founder, and CTO at Gart Solutions. I have been working in the IT industry for 15 years, a significant part of which has been in management positions. Today I will tell you how I find specialists for my DevSecOps and AppSec teams, what I pay attention to, and how I communicate with job seekers who try to embellish their own achievements during interviews. Starting Point I may surprise some of you, but first of all, I look for employees not on job boards, but in communities, in general chats for IT specialists, and through acquaintances. This way you can find a person with already existing recommendations and make a basic assessment of how suitable he is for you. Not by his resume, but by his real reputation. And you may already know him because you move in the same circles. Building the Ideal DevSecOps and AppSec Team: My Hiring Criteria There are general chats in my city (and not only) for IT specialists, where you can simply write: "Guys, hello, I'm doing this and I'm looking for cool specialists to work with me." Then I send the requirements that are currently relevant to me. If all this is not possible, I use the classic options with job boards. Before inviting for an interview, I first pay attention to the following points from the resume and recommendations. Programming Experience I am sure that any security professional in DevSecOps and AppSec must know code. Ideally, all security professionals should grow out of programmers. You may disagree with me, but DevSecOps and AppSec specialists should work with code to one degree or another, be it some YAML manifests, JSON, various scripts, or just a classic application written in Java, Go, and so on. It is very wrong when a security professional does not know the language in which he is looking for vulnerabilities. You can't look at one line that the scanner highlighted and say: "Yes, indeed, this line is exploitable in this case, or it's a false positive." You need to know the whole project and its structure. If you are not a programmer, you simply will not understand this code. Taking Initiative I want my future employees to be proactive — I mean people who work hard enough, do big tasks, have ambitions, want to achieve, and spend a lot of time on specific tasks. I support people's desire to develop in their field, to advance in the community, and to look for interesting tasks and projects for themselves, including outside of work. And if the resume indicates the corresponding points, I will definitely highlight it as a plus. Work-Life Balance I also pay a lot of attention to this point and I always talk about it during the interview. The presence of hobbies and interests in a person indicates his ability to switch from work to something else, his versatility and not being fixated on one job. It doesn't have to be about active sports, hiking, walking, etc. The main thing is that a person's life contains not only work but also life itself. This means that he will not burn out in a couple of years of non-stop work. The ability to rest and be distracted acts as a guarantee of long-term employment relationships. In my experience, there have only been a couple of cases when employees had only work in their lives and nothing more. But I consider them to be unique people. They have been working in this rhythm for a long time, do not burn out, and do not fall into depression. You need to have a certain stamina and character for this.
But in 99% of cases, overwork and an inability to rest all but guarantee an employee's burnout and departure within 2-3 years. In the moment, such a person can do a lot, but I don't want to swap people out like gloves every couple of years. Education I graduated from postgraduate studies myself, and I think this is more a plus than a minus. You should verify the certificates and diplomas of education listed in the resume. Confirmation of qualifications through certificates can indicate the veracity of the declared competencies. It is not easy to study for five years, but at the same time, when you study, you are forced to think in the right direction, analyze complex situations, and develop something that has scientific novelty at present and can be used in the future with benefit for people. And here, in principle, it is the same: you combine ideas with colleagues and create, for example, progressive DevOps practices that go on to help people, in particular in the security of the banking sector. References and Recommendations I ask the applicant to provide contacts of previous employers or colleagues who can give recommendations on his work. If a person worked in the field of information security, then there are usually mutual acquaintances with whom I also communicate and who can confirm his qualifications. What I Look for in an Interview Unfortunately, not all aspects can be clarified at the stage of reading the resume. The applicant may hide some things in order to present themselves in a more favorable light, but more often it is simply impossible to take into account all the points needed by the employer when compiling a resume. Through leading questions in a conversation with the applicant and his stories from previous jobs, I find out if the potential employee has the qualities listed below. Ability To Read It sounds funny, but in fact, it is not such a common quality. A person who can read and analyze can solve almost any problem. I am absolutely convinced of this because I have gone through it myself more than once. Now I try to look for information from many sources, and I actively use ChatGPT and other similar services just to speed up the work. That is, the more information I push through myself, the more tasks I will solve, and, accordingly, I will be more successful. Sometimes I ask the candidate to find a solution to a complex problem online and provide him with material for analysis, and I look at how quickly he can read and conduct a qualitative analysis of the provided article. Analytical Mind There are two processes: decomposition and composition. Programmers usually use the second part. They conduct compositional analysis, that is, they assemble some artifact from the code that is needed for further work. An information security analyst or security specialist uses decomposition. That is, on the contrary, he disassembles the artifact into its components and looks for vulnerabilities. If a programmer creates, then a security specialist disassembles. An analytical mind is needed in the part that is related to how someone else's code works. In the 90s, for example, we talked about disassembling if the code was written in assembler. That is, you have a binary file, and you need to understand how it works. And if you do not analyze all entry and exit points, all processes, and functions that the programmer has developed in this code, then you cannot be sure that the program works as intended.
There can be many pitfalls and logical subtleties related to the correct or incorrect operation of the program. For example, there is a function that can be passed a certain amount of data. The programmer may assume the function receives only numeric input, or that the data is limited to a certain format or length. For example, we enter a card number. A card number seems to have a fixed length, but any analyst should understand that instead of digits there may be letters or special characters, and the length may differ from what the programmer assumed. This also needs to be checked, and all hypotheses need to be analyzed, looking at everything much more broadly than the business logic and the assumptions of the programmer who wrote it. How do you understand that the candidate has an analytical mind? All this is easily clarified at the stage of "talking" with the candidate. You can simply ask questions like: "There is a data sample for process X, which consists of 1000 parameters. You need to determine the most important 30. The analysis task will be solved by 3 groups of analysts. How will you divide these parameters to obtain high efficiency and reliability of the analysis?" Experience Working in a Critical Situation It is desirable that the applicant has experience working in a crunch: for example, being on duty for servers under a large, critical load. Usually, these are night shifts, evening shifts, or weekends, when you have to urgently bring something back up and restore it. Such people are very valuable. They really know how to work and have personally gone through different "pains." They are ready to put out fires with you and, most importantly, are highly likely to be more careful than others. I worked for a company that had a lot of students without experience. They very often broke a lot of things, and after that, it was necessary to restore all of it. This is, of course, partly a consequence of mentoring. You have to help, develop, and turn students into specialists, but this does not negate the "pain" of correcting mistakes. And until you go through all this with them, they do not become cool. If a person participated in these processes and had the strength and ability to restore and correct, this is very cool. You need to select and take such people for yourself because they clearly know how to work. How To Avoid Being Fooled by Job Seekers Job seekers may overstate their achievements, but this is fairly easy to verify. If a person has the necessary experience, you need to ask them practical questions that are difficult to answer without real experience. For example, I ask about the implementation of a particular practice from DevSecOps, that is, what orchestrator he worked in. In a few words, the applicant should describe, for example, a job in which it was all performed, and what tool he used. You can even suggest some keys from this vulnerability scanner and ask what keys, and in what aspect, you would use to make everything work. Only a specialist who has worked with this can answer these questions. In my opinion, this is the best way to check a person. That is, you need to give small practical tasks that can be solved quickly. It happens that not all applicants have worked and are working with the same things as me, and they may have more experience and knowledge. Then it makes sense to find some common questions and points of contact with which we worked together.
For example, just list 20 things from the field of information security, ask what the applicant is familiar with, find common points of interest, and then go through them in detail. When an applicant brags in an interview about things he has built, it is also better to ask specific questions. If a person describes without hesitation what he has implemented, you can additionally ask about small details of each item and direction. For example, how did you implement SAST verification, and with what tools? If he answers in detail, possibly with additional nuances about the settings of a particular scanner, and it all fits into the general picture, then the person has lived this and used what he is talking about. Wrapping Up These are all the points that I pay attention to when looking for new people. I hope this information will be useful both for my Team Lead colleagues and for job seekers who will know what qualities they need to develop to successfully pass the interview.
Choosing the right backend technology for fintech development involves a detailed look at Java and Scala. Both languages bring distinct advantages to the table, and for professionals working in the fintech industry, understanding these nuances is crucial. There is no arguing Java is a true cornerstone in software development — stable, boasting comprehensive libraries and a vast ecosystem. Many of us — me included! — relied on it for years, and today Java is the backbone of countless financial systems. Scala, in many respects a more modern language, offers an interesting blend of object-oriented and functional programming, with a syntax that reduces boilerplate code and boosts developer productivity. For teams seeking to introduce functional programming concepts without stepping away from the JVM ecosystem, Scala is an intriguing option. Our discussion will cover the essential aspects that matter most in fintech backend development: ecosystem and libraries, concurrency, real-time processing, maintainability, and JVM interoperability. Let's analyze, side by side, how Java and Scala perform in the fast-paced, demanding world of fintech backend development, focusing on the concrete benefits and limitations each language presents. Ecosystem and Libraries for Fintech When deciding between Java and Scala for your fintech backend, your major concern will be the richness of their ecosystems and the availability of domain-specific libraries. Java has accumulated an impressive array of libraries and frameworks that have become go-to resources for fintech projects. One example is Spring Boot – a real workhorse for setting up microservices, packed with features covering everything from securing transactions to managing data. There’s also Apache Kafka, pretty much the gold standard for managing event streams effectively. But what stands out about Java's ecosystem isn't just the sheer volume of tools but also the community backing them. A vast network of experienced Java developers means you’re never far from finding a solution or best practice advice, honed through years of real-world application. This kind of support network is simply invaluable. Scala, while newer on the scene, brings forward-thinking libraries and tools that are particularly well-suited to the challenges of modern fintech development. Akka, with its toolkit for crafting highly concurrent and resilient message-driven apps, fits perfectly with the needs of high-load financial systems. Alpakka, part of the Reactive Streams ecosystem, further extends Scala's capabilities, facilitating integration with a wide range of messaging systems and data stores. The language’s functional programming capabilities, combined with its interoperability with Java, allow teams to gradually adopt new paradigms without a complete overhaul. On the other hand, one significant challenge that fintech companies might face when adopting Scala is the relative scarcity of experienced Scala developers compared to Java developers. The smaller community size can make it difficult to find developers with deep experience in Scala, especially those who are adept at leveraging its advanced features in a fintech context. This scarcity can lead to higher recruitment costs and potentially longer project timelines, one of the factors to consider when deciding between Java and Scala. While Scala presents compelling advantages to fintech companies interested in building scalable, distributed systems, Java is still a strong contender.
The choice between these languages will require you to carefully assess your project's needs, weighing specific pros and cons of the two paradigms. With this in mind, let’s compare some fundamental aspects of these two remarkable languages. Concurrency and Real-Time Processing In fintech, where handling multiple transactions swiftly and safely is the daily bread, a language’s concurrency models are of particular interest. Let’s see what Java and Scala offer us in this regard. Java and Concurrency in Fintech Initially, Java offered threads and locks – a straightforward but sometimes cumbersome way to manage concurrency. However, Java 8 introduced CompletableFuture, which marked a dramatic leap to straightforward asynchronous programming. CompletableFuture provides developers with a promise-like mechanism that can be completed at a later stage, making it ideal for fintech applications that require high throughput and low latency. Let’s consider a scenario where you need to fetch exchange rates from different services concurrently and then combine them to execute a transaction: Java CompletableFuture<Double> fetchUSDExchangeRate = CompletableFuture.supplyAsync(() -> { return exchangeService.getRate("USD"); }); CompletableFuture<Double> fetchEURExchangeRate = CompletableFuture.supplyAsync(() -> { return exchangeService.getRate("EUR"); }); fetchUSDExchangeRate .thenCombine(fetchEURExchangeRate, (usd, eur) -> { return processTransaction(usd, eur); }) .thenAccept(result -> System.out.println("Transaction Result: " + result)) .exceptionally(e -> { System.out.println("Error processing transaction: " + e.getMessage()); return null; }); In this snippet, supplyAsync initiates asynchronous tasks to fetch exchange rates. thenCombine waits for both rates before executing a transaction, ensuring that operations dependent on multiple external services can proceed smoothly. The exceptionally method provides a way to handle any errors that occur during execution, a crucial feature for maintaining robustness in financial operations. Scala and Concurrency With Akka Transitioning from Java to Scala’s actor model via Akka provides a stark contrast in handling concurrency. Akka actors, elegant yet efficient, are especially well-suited for the demands of fintech applications; they were designed to be lightweight and can be instantiated in the millions. They also bring fault tolerance through supervision strategies, ensuring the system remains responsive even when parts of it fail. Consider the previous example of fetching exchange rates and processing a transaction. Here’s how you can apply the actor model in Scala: Scala import akka.actor.Actor import akka.actor.ActorSystem import akka.actor.Props import akka.pattern.ask import akka.util.Timeout import scala.concurrent.duration._ import scala.concurrent.Future case class FetchRate(currency: String) case class RateResponse(rate: Double) case class ProcessTransaction(rate1: Double, rate2: Double) class ExchangeServiceActor extends Actor { def receive = { case FetchRate(currency) => sender() ! 
RateResponse(exchangeService.getRate(currency)) } } class TransactionActor extends Actor { implicit val timeout: Timeout = Timeout(5 seconds) def receive = { case ProcessTransaction(rate1, rate2) => val result = processTransaction(rate1, rate2) println(s"Transaction Result: $result") } } val system = ActorSystem("FintechSystem") val exchangeServiceActor = system.actorOf(Props[ExchangeServiceActor], "exchangeService") val transactionActor = system.actorOf(Props[TransactionActor], "transactionProcessor") implicit val timeout: Timeout = Timeout(5 seconds) import system.dispatcher // for the implicit ExecutionContext val usdRateFuture = (exchangeServiceActor ? FetchRate("USD")).mapTo[RateResponse] val eurRateFuture = (exchangeServiceActor ? FetchRate("EUR")).mapTo[RateResponse] val transactionResult = for { usdRate <- usdRateFuture eurRate <- eurRateFuture } yield transactionActor ! ProcessTransaction(usdRate.rate, eurRate.rate) Here, ExchangeServiceActor fetches currency rates asynchronously, while TransactionActor processes the transaction. The use of the ask pattern (?) allows us to send messages and receive futures in response, which we can then compose or combine as needed. This pattern elegantly handles the concurrency and asynchronicity inherent in fetching rates and processing transactions, without the direct management of threads. The actor model, by design, encapsulates state and behavior, making the codebase cleaner and easier to maintain. Fintech applications, with their demand for fault tolerance and quick scalability, are one of the major beneficiaries of Scala’s Akka framework. Code Readability and Maintainability in Fintech Java's syntax is known for its verbosity, which, applied to fintech, translates to clarity. Each line of code, while longer, is self-explanatory, making it easier for new team members to understand the business logic and the flow of the application. This characteristic is beneficial in environments where maintaining and auditing code is as crucial as writing it, given the regulatory scrutiny fintech applications often face. On the other hand, while Scala's more concise syntax reduces boilerplate and can lead to a tighter, more elegant codebase, it also introduces a significant challenge. The flexibility and variety of Scala can often result in different developers solving the same problem in multiple ways, creating what can be described as a "Babylon" within the project. This variability, while showcasing Scala's expressive power, can make it more difficult to maintain consistent coding standards and ensure code quality and understandability, especially in the highly regulated environment of fintech. This steepens the learning curve, especially for developers not familiar with functional programming paradigms. Consider a simple operation in a fintech application, such as validating a transaction against a set of rules. In Java, this might involve several explicit steps, each clearly laid out: Java public boolean validateTransaction(Transaction transaction) { if (transaction.getAmount() <= 0) { return false; } if (!knownCurrencies.contains(transaction.getCurrency())) { return false; } // Additional validation rules here return true; } The challenger, Scala, boasts a more concise syntax by virtue of its functional programming capabilities. This conciseness helps dramatically reduce the boilerplate code, making the codebase tighter and easier to maintain. 
Despite the challenge of maintaining a uniform standard across a team, mentioned above, the brevity of Scala code can be a significant asset, though it comes with a steeper learning curve, especially for developers not familiar with functional programming paradigms. The same transaction validation in Scala might look significantly shorter, leveraging pattern matching with guards:

Scala

def validateTransaction(transaction: Transaction): Boolean = transaction match {
  case Transaction(amount, currency, _) if amount > 0 && knownCurrencies.contains(currency) => true
  case _ => false
}

JVM Interoperability and Legacy Integration

A critical factor in choosing a backend technology for fintech applications is how well it integrates with existing systems. Many financial institutions rely on extensive legacy systems that are critical to their operations. Java's and Scala's paths to interoperability and integration within the JVM ecosystem each have unique advantages here. Java's long history and widespread use in the financial industry mean that most legacy systems in fintech are built with Java or are compatible with it. This compatibility facilitates seamless integration of new developments with existing systems. Java's stability and backward compatibility are key assets when updating or extending legacy systems, minimizing disruptions and ensuring continuous operation. For instance, integrating a new Java-based service into an existing system can be as straightforward as:

Java

// Java service to be integrated with a legacy system
public class NewJavaService {
    public String processData(String input) {
        // Process data
        return "Processed: " + input;
    }
}

This simplicity of integration is a significant advantage for Java, reducing the time and effort required to enhance or expand legacy systems with new functionalities. Scala's interoperability with Java is one of its standout features, allowing Scala to use Java libraries directly and vice versa. This interoperability means that financial institutions can adopt Scala for new projects or modules without abandoning their existing Java codebase. Scala can act as a bridge to more modern, functional programming paradigms while maintaining compatibility with the JVM ecosystem. For example, calling a Scala object from Java might look like this:

Scala

// Scala object
object ScalaService {
  def processData(input: String): String = {
    // Process data
    s"Processed: $input"
  }
}

Java

// Java class calling the Scala object
public class JavaCaller {
    public static void main(String[] args) {
        String result = ScalaService.processData("Sample input");
        System.out.println(result);
    }
}

This cross-language interoperability is particularly beneficial in fintech, where leveraging existing investments while adopting new technologies is often a strategic priority. Scala offers a path to modernize applications with functional programming concepts without a complete system overhaul.

Conclusion

It certainly is no revelation that the two languages have their strengths and difficulties. Java stands out for its robust ecosystem and libraries, offering a tried-and-tested path for developing fintech applications. Its traditional concurrency models and frameworks provide a solid foundation for building reliable and scalable systems. Moreover, Java's verbose syntax promotes clarity and maintainability, essential in the highly regulated fintech sector.
Finally, Java's widespread adoption makes integration with existing systems and legacy code seamless. Scala, on the other hand, will be your weapon of choice if you want to streamline your development process with a more expressive syntax and a robust concurrency model. It's particularly appealing for projects aiming for high scalability and resilience without stepping completely away from the Java universe. This makes Scala a strategic choice for evolving your tech stack, introducing functional programming benefits while keeping the door open to Java's realm. So, no: there is no definitive, final answer to this question, and there probably never will be. You will always have to balance the immediate needs of your project with long-term tech strategy. Do you build on the solid, familiar ground that Java offers, or do you step into Scala's territory, with its promise of modernized approaches and efficiency gains? In fintech, where innovation must meet reliability head-on, understanding the nuances of Java and Scala will equip you to make an informed decision that aligns with both your immediate project needs and your strategic goals for the future.
In the landscape of software development, efficiently processing large datasets has become paramount, especially with the advent of multicore processors. The Java Stream interface provided a leap forward by enabling sequential and parallel operations on collections. However, fully exploiting modern processors' capabilities while retaining the Stream API's simplicity posed a challenge. Responding to this, I created an open-source library aimed at experimenting with a new method of parallelizing stream operations. This library diverges from traditional batching methods by processing each stream element in its own virtual thread, offering a more refined level of parallelism. In this article, I will talk about the library and its design; it goes into more detail than you need simply to use the library. The library is available on GitHub and also as a dependency on Maven Central:

XML

<dependency>
    <groupId>com.github.verhas</groupId>
    <artifactId>vtstream</artifactId>
    <version>1.0.1</version>
</dependency>

Check the actual version number on the Maven Central site or on GitHub. This article is based on version 1.0.1 of the library.

Parallel Computing

Parallel computing is not a new thing; it has been around for decades. The first computers executed tasks in batches, hence in a serial way, but soon the idea of time-sharing came into the picture. The first time-sharing computer system was installed in 1961 at the Massachusetts Institute of Technology (MIT). This system, known as the Compatible Time-Sharing System (CTSS), allowed multiple users to log into a mainframe computer simultaneously, each working in what appeared to be a private session. CTSS was a groundbreaking development in computer science, laying the foundation for modern operating systems and computing environments that support multitasking and multi-user operations. It was not a parallel computing system per se: CTSS ran on a single mainframe computer, the IBM 7094, at MIT. It had one CPU, so the code executed serially.

Today we have multicore processors and multiple processors in a single computer. I am editing this article on a computer that has 10 processor cores. To execute tasks concurrently, there are two plus-one approaches:

- Define the algorithm in a concurrent way; for example, reactive programming.
- Define the algorithm the good old sequential way and let some program decide on the concurrency.
- Mix the two.

When we program a reactive algorithm, or define streams as in Java 8, we help the application execute the tasks concurrently. We define small parts and their interdependence so that the environment can decide which parts can be executed concurrently. The actual execution is done by the framework, using virtual threads, threads, or perhaps processes. The difference is in the scheduler: who decides which processor should execute which task at the next moment. In the case of threads or processes, the scheduler is the operating system. The difference between thread and process execution is that threads belonging to the same process share the same memory space, while processes have their own. Similarly, virtual threads belonging to the same operating system thread share the same stack. Transitioning from processes to threads to virtual threads, we encounter a reduction in per-task resources and, consequently, in overhead. This makes virtual threads significantly less costly than traditional threads.
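Since the library builds on exactly this property, here is a quick sketch, not from the original article, of how cheaply virtual threads can be created using the standard JDK 21 API the library relies on:

Java

import java.util.ArrayList;
import java.util.List;

public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        // Starting a million platform threads would exhaust most operating systems;
        // with virtual threads this is unremarkable.
        for (int i = 0; i < 1_000_000; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                // Simulate a blocking call; the carrier thread is released while parked.
                try {
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("All virtual threads finished");
    }
}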
While a machine might support thousands of threads and processes, it can accommodate millions of virtual threads. In defining a task with streams, you are essentially outlining a series of operations to be performed on multiple elements. The decision to execute these operations concurrently rests with the framework, which may or may not choose to do so. However, Stream in Java is a high-level interface, offering us the flexibility to implement a version that facilitates concurrent execution of tasks.

Implementing Streams in Threads

The library contains two primary classes, located in the main directory:

- ThreadedStream
- Command

ThreadedStream is the class responsible for implementing the Stream interface:

Java

public class ThreadedStream<T> implements Stream<T> {

The Command class encompasses nested classes that implement the functionality of the stream operations:

Java

public static class Filter<T> extends Command<T, T> {
public static class AnyMatch<T> extends Command<T, T> {
public static class FindFirst<T> extends Command<T, T> {
public static class FindAny<T> extends Command<T, T> {
public static class NoOp<T> extends Command<T, T> {
public static class Distinct<T> extends Command<T, T> {
public static class Skip<T> extends Command<T, T> {
public static class Peek<T> extends Command<T, T> {
public static class Map<T, R> extends Command<T, R> {

All the operations mentioned are intermediate. The terminal operations are implemented within the ThreadedStream class, which converts the threaded stream into a regular stream before invoking the terminal operation on it. An example of this approach is the implementation of the collect method:

Java

@Override
public <R> R collect(Supplier<R> supplier,
                     BiConsumer<R, ? super T> accumulator,
                     BiConsumer<R, R> combiner) {
    return toStream().collect(supplier, accumulator, combiner);
}

The source of the elements is also a stream, which means that the threading functionality is layered atop the existing stream implementation. This setup allows the use of streams both as data sources and as destinations for processed data. Threading occurs in the interim, facilitating the parallel execution of the intermediate commands. Therefore, the core of the implementation, and its most intriguing aspect, lies in the construction of the structure and its subsequent execution. We will first examine the structure of the stream data and then explore how the class executes operations utilizing virtual threads.

Stream Data Structure

The ThreadedStream class maintains its data through the following member variables:

Java

private final Command<Object, T> command;
private final ThreadedStream<?> downstream;
private final Stream<?> source;
private long limit = -1;
private boolean chained = false;

- command is the Command object to be executed on the data. It may be a no-operation (NoOp) command, or null if there is no specific command to execute.
- downstream points to the preceding ThreadedStream in the processing chain. A ThreadedStream retrieves data either from the immediate downstream stream, if available, or directly from the source if it is the first in the chain.
- source is the initial data stream. It remains defined even when a downstream is specified, in which case the source of both streams is identical.
- limit specifies the maximum number of elements this stream is configured to process. Implementing a limit requires a workaround, as stream element processing starts immediately rather than being "pulled" by the terminal operation.
Consequently, infinite streams cannot feed a ThreadedStream.
- chained is a boolean flag indicating whether the stream is part of a processing chain. When true, it signifies that a subsequent stream depends on this one's output, preventing execution in cases of processing forks. This mechanism mirrors the approach found in the JVM's standard stream implementation.

Stream Build

The stream data structure is constructed dynamically as intermediate operations are chained together. The process begins with the creation of a starting element by invoking the static method threaded on the ThreadedStream class. A line from the unit tests illustrates this initiation:

Java

final var k = ThreadedStream.threaded(Stream.of(1, 2, 3));

This line creates a ThreadedStream instance named k, initialized with a source stream consisting of the elements 1, 2, and 3. The threaded method serves as the entry point for transforming a regular stream into a ThreadedStream, setting the stage for further operations that can leverage virtual threads for concurrent execution. When an intermediate operation is appended, a new ThreadedStream instance is created, with the preceding ThreadedStream designated as its downstream. Moreover, the source stream of this newly formed ThreadedStream remains identical to the source stream of its predecessor. This design ensures a seamless flow of data through the chain of operations, facilitating efficient processing in a concurrent environment. For example, when we call:

Java

final var t = k.map(x -> x * 2);

the map method is invoked, which is:

Java

public <R> ThreadedStream<R> map(Function<? super T, ? extends R> mapper) {
    return new ThreadedStream<>(new Command.Map<>(mapper), this);
}

It generates a new ThreadedStream object in which the preceding ThreadedStream acts as the downstream, and the command field is populated with a new instance of the Command.Map class, configured with the specified mapper function. This process effectively constructs a linked list of ThreadedStream objects. This linked structure comes into play during the execution phase, triggered by invoking one of the terminal operations on the stream, and ensures that each ThreadedStream in the sequence can process data in a manner that supports concurrent execution, leveraging the capabilities of virtual threads for efficient data processing.

It is crucial to understand that the ThreadedStream class refrains from performing any operations on the data until a terminal operation is called. Once execution commences, it proceeds concurrently. To facilitate independent execution of these operations, ThreadedStream instances are designed to be immutable. They are instantiated during the setup phase and undergo a single mutation when they are linked together. During execution, these instances serve as a read-only data structure, guiding the flow of operation execution. This immutability ensures thread safety and consistency throughout concurrent processing, allowing for efficient and reliable stream handling.

Stream Execution

Stream execution is triggered by invoking a terminal operation. Terminal operations are executed by first transforming the threaded stream back into a conventional stream, upon which the terminal operation is then performed. The collect method serves as a prime example of this process, as previously mentioned.
This method is emblematic of how terminal operations are integrated within the ThreadedStream framework, bridging the gap between concurrent execution facilitated by virtual threads and the conventional stream processing model of Java. By converting the ThreadedStream into a standard Stream, it leverages the rich ecosystem of terminal operations already available in Java, ensuring compatibility and extending functionality with minimal overhead.

Java

@Override
public <R> R collect(Supplier<R> supplier,
                     BiConsumer<R, ? super T> accumulator,
                     BiConsumer<R, R> combiner) {
    return toStream().collect(supplier, accumulator, combiner);
}

The toStream() method represents the core functionality of the library, marking the commencement of stream execution by initiating a new virtual thread for each element of the source stream. This method differentiates between ordered and unordered execution through two distinct implementations:

- toUnorderedStream()
- toOrderedStream()

The choice between these methods is determined by the isParallel() status of the source stream. It is worth noting that executing an ordered stream in parallel can still be advantageous: although the results may be produced out of order, parallel processing accelerates the operation. Ultimately, care must be taken to collect the results in a sequential manner, whereas unordered processing can yield higher efficiency by passing elements to the resulting stream as soon as they become available, eliminating the need to wait for the preceding elements. The implementation of toStream() is designed to minimize unnecessary collection of elements: elements are forwarded to the resulting stream immediately upon readiness in the case of unordered streams, and in sequence, upon their own readiness and the forwarding of the preceding element, in ordered streams. In the following sections, we delve into the specifics of these two execution methodologies.

Unordered Stream Execution

Unordered execution promptly forwards results as they become ready. This approach employs a concurrent list for result storage, allowing threads to deposit results and the target stream to retrieve them simultaneously, which prevents excessive list growth. The iteration over the source stream creates a new virtual thread for each element. When a limit is imposed, it is applied directly to the source stream, diverging from traditional stream implementations where limit acts as a genuine intermediate operation. The implementation of the unordered stream execution is as follows:

Java

private Stream<T> toUnorderedStream() {
    final var result = Collections.synchronizedList(new LinkedList<Command.Result<T>>());
    final AtomicInteger n = new AtomicInteger(0);
    final Stream<?> limitedSource = limit >= 0 ? source.limit(limit) : source;
    limitedSource.forEach(
            t -> {
                Thread.startVirtualThread(() -> result.add(calculate(t)));
                n.incrementAndGet();
            });
    return IntStream.range(0, n.get())
            .mapToObj(i -> {
                while (result.isEmpty()) {
                    Thread.yield();
                }
                return result.removeFirst();
            })
            .filter(f -> !f.isDeleted())
            .peek(r -> {
                if (r.exception() != null) {
                    throw new ThreadExecutionException(r.exception());
                }
            })
            .map(Command.Result::result);
}

The counter n tallies the number of threads started. The resulting stream is constructed using this counter, mapping the numbers 0 to n-1 to the elements of the concurrent list as they become ready. If the list lacks elements at any point, the process pauses, awaiting the availability of the next element.
This waiting mechanism is implemented within a loop that incorporates a yield call to limit unnecessary CPU consumption while the loop waits for the next element to become ready. This efficient use of resources keeps the system responsive and minimizes the potential for performance degradation during the execution of parallel tasks.

Ordered Stream Execution

Ordered stream execution takes a more nuanced approach than its unordered counterpart. It introduces a local class named Task, designed specifically to await the readiness of a particular thread. As in the unordered execution, a concurrent list is used, but with a key distinction: the elements of this list are the tasks themselves rather than the results. This list is populated by the code responsible for thread creation, not by the threads themselves. Because the list is fully populated up front, there is no need for a separate counter to track thread initiation. The process then waits on each thread sequentially, as dictated by their order in the list, ensuring that each thread's output is relayed to the target stream in order. This method maintains the ordered integrity of the stream's elements, despite the concurrent nature of their processing, by aligning the execution flow with the sequence of the original stream.

Java

private Stream<T> toOrderedStream() {
    class Task {
        Thread workerThread;
        volatile Command.Result<T> result;

        /**
         * Wait for the thread calculating the result of the task to be finished.
         * This method is blocking.
         *
         * @param task the task to wait for
         */
        static void waitForResult(Task task) {
            try {
                task.workerThread.join();
            } catch (InterruptedException e) {
                task.result = deleted();
            }
        }
    }

    final var tasks = Collections.synchronizedList(new LinkedList<Task>());
    final Stream<?> limitedSource = limit >= 0 ? source.limit(limit) : source;
    limitedSource.forEach(
            sourceItem -> {
                Task task = new Task();
                tasks.add(task);
                task.workerThread = Thread.startVirtualThread(() -> task.result = calculate(sourceItem));
            }
    );
    return tasks.stream()
            .peek(Task::waitForResult)
            .map(f -> f.result)
            .peek(r -> {
                if (r.exception() != null) {
                    throw new ThreadExecutionException(r.exception());
                }
            })
            .filter(r -> !r.isDeleted())
            .map(Command.Result::result);
}

Summary and Takeaway

Having explored an implementation that facilitates the parallel execution of stream operations, it is worth noting that this library is open source, offering you the flexibility either to use it as is or to reference its design and implementation to craft your own version. The detailed exposition provided here aims to shed light on both the conceptual underpinnings and the practical aspects of the library's construction. However, it is important to acknowledge that the library has not undergone extensive testing. It was reviewed by Istvan Kovacs, who has considerable expertise in concurrent programming; still, his review is no absolute assurance of the library's reliability and absence of bugs. Consequently, should you decide to integrate this library into your projects, you are advised to proceed with caution and conduct thorough testing to ensure it meets your requirements and standards. The library is provided "as is," with the understanding that users adopt it at their own risk, underpinning the importance of due diligence in its deployment.
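As a closing illustration, here is a minimal usage sketch assembled from the snippets shown above. It is not from the original article, and the package name in the import is an assumption, so verify it against the library's documentation on GitHub:

Java

import java.util.stream.Collectors;
import java.util.stream.Stream;

import javax0.vtstream.ThreadedStream; // package name is an assumption; check the library

public class VtStreamUsage {
    public static void main(String[] args) {
        // Wrap a regular stream, chain an intermediate operation, then collect;
        // each element is mapped in its own virtual thread.
        var doubled = ThreadedStream.threaded(Stream.of(1, 2, 3))
                .map(x -> x * 2)
                .collect(Collectors.toList());
        // For a sequential source, ordered execution applies, so this prints [2, 4, 6].
        System.out.println(doubled);
    }
}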
TL;DR: Scrum Master Interview Questions on Creating Value With Scrum

If you are looking to fill a Scrum Master (or agile coach) position in your organization, you may find the following 12th set of Scrum Master interview questions useful for identifying the right candidate. They are derived from my eighteen years of practical experience with XP and Scrum, serving both as Product Owner and Scrum Master and interviewing dozens of Scrum Master candidates on behalf of my clients. So far, this Scrum Master interview guide has been downloaded more than 27,000 times.

Scrum Master Interview Questions: How We Organized Questions and Answers

Scrum has proven time and again to be the most popular framework for software development. Given that software is eating the world, a seasoned Scrum Master remains in high demand even in the frosty economic climate of Spring 2024. That demand draws new professionals into the market from other branches of project management, some probably believing that reading one or two Scrum books will be sufficient, which makes any Scrum Master interview a challenging task. The Scrum Master Interview Questions ebook provides both the questions and guidance on the range of suitable answers. These should allow an interviewer to dive deep into a candidate's understanding of Scrum and their agile mindset. However, please note:

- The answers reflect the personal experience of the authors and may not be valid for every organization: what works for organization A may not work in organization B.
- There are no suitable multiple-choice questions for identifying a candidate's agile mindset, given the complexity of applying "Agile" to any organization.
- The authors share a holistic view of agile practices: agility covers the whole arc from product vision (our grand idea of how to improve mankind's fate) to product discovery (what to build) to product delivery (how to build it).

Creating Value as a Scrum Master

The following questions and responses are designed to draw out a nuanced understanding of a candidate's experience and skills in applying agile product development principles to improve customer value and the economics of delivery, and to enhance predictability in various organizational contexts in response to the current economic climate:

Question 74: Resistant Industries

How have you tailored Scrum practices to elevate customer value, particularly in industries resistant to Agile practices?

Background: This question probes the candidate's ability to adapt Scrum principles to sectors where Agile is not the norm, emphasizing customer-centric product development. It seeks insights into the candidate's innovative application of Scrum to foster customer engagement and satisfaction, even in challenging environments. It is also an opportunity for the candidate to build confidence in the interview process and rapport with the interviewers.

Acceptable Answer: An excellent response would detail a scenario where the candidate navigated resistance by demonstrating Agile's benefits through small-scale pilot projects or workshops. They would probably also describe specific adjustments to Scrum events or artifacts to align with industry-specific constraints, culminating in enhanced customer feedback loops and ultimately leading to product features that directly addressed customer pain points.
Question 75: Reducing Product Costs

Please describe a scenario in which you significantly reduced production costs through strategic Scrum application without compromising the product's quality.

Background: This delves into the candidate's proficiency in supporting the optimization of a team's capacity allocation and streamlining workflows within the Scrum framework to cut costs. It is about balancing the maintenance of high-quality standards with cost-effectiveness through Agile practices.

Acceptable Answer: Look for a narrative where the candidate identifies wasteful practices or bottlenecks in the development process and implements targeted Scrum practices to address them. Examples include refining the Product Backlog to focus on high-impact features, improving cross-functional collaboration to reduce dependencies, or leveraging automated testing to shorten lead time while preserving quality standards. The answer should highlight the candidate's analytical problem-solving approach and their ability to help the team adopt a cost-conscious, entrepreneurial stance toward solving customer problems without sacrificing quality.

Question 76: Improving Predictability in a Volatile Market

Please share an experience where you used Scrum to improve the predictability of product delivery in a highly volatile market.

Background: This question explores the candidate's capability to use Scrum to enhance delivery predictability amidst market fluctuations. It is about leveraging Agile's flexibility to adapt to changing priorities while maintaining a steady pace of delivery.

Acceptable Answer: The candidate should recount an instance where they utilized Scrum artifacts and events to better forecast delivery timelines in a shifting landscape. This example might involve adjusting Sprint lengths, prioritizing Product Backlog items more dynamically, or closer stakeholder engagement to reassess priorities during Sprint Reviews or other alignment-creating opportunities, for example, User Story Mapping sessions. The story should underscore their strategic thinking in balancing flexibility with predictability and their communication skills in setting realistic expectations with stakeholders.

Question 77: Successfully Promoting Scrum Despite Skepticism

How have you promoted the value of Scrum in organizations where leadership and middle management met Agile practices with skepticism?

Background: This question examines the candidate's ability to champion Scrum in environments resistant to change. Such an environment requires a deep understanding of Agile principles as well as strong advocacy and education skills.

Acceptable Answer: Successful candidates will describe a multifaceted strategy that includes educating leadership on Agile benefits, organizing interactive workshops to demystify Scrum practices, and securing quick wins to demonstrate value. They might also discuss establishing a community of practice to sustain Agile learning and sharing success stories to build momentum. The answer should reflect their perseverance, persuasive communication, and role as a change agent. (Learn more about successful stakeholder communication tactics during transformations here.)

Question 78: Effective Change

Please describe your approach to conducting effective Sprint Retrospectives that drive continuous improvement.

Background: The question probes the candidate's techniques for facilitating Retrospectives that genuinely contribute to team growth and product enhancement.
It seeks to understand how they ensure these events are productive, inclusive, and actionable.

Acceptable Answer: A comprehensive response would outline a structured approach to Retrospectives, including preparation, facilitation, follow-up practices, and valuable enhancements to the framework, for example, embracing the idea of a directly responsible individual to drive the change the team considers beneficial. The candidate might mention using a variety of formats to keep the sessions engaging, techniques to ensure all team members contribute, and strategies for prioritizing action items. They should emphasize their method for tracking improvements over time to ensure accountability and to demonstrate the Retrospective's impact on the team's performance and morale. Again, this question allows candidates to distinguish themselves in the core competence of any Scrum Master.

Question 79: Balancing Demands With Principles

Please explain how you have balanced stakeholder demands with Agile principles to help the Scrum team prioritize work effectively.

Background: This question seeks insights into the candidate's ability to support the Scrum team in general, and the Product Owner in particular, in navigating competing demands, aligning stakeholder expectations with Agile principles to focus the team's efforts on the most impactful work from the customers' perception and the organization's perspective.

Acceptable Answer: The candidate should provide an example of supporting the Product Owner by employing prioritization techniques, such as User Story Mapping, in collaboration with stakeholders to align on the priorities that offer the most value, leading to the creation of valuable Product Goals and roadmaps in the process. They should highlight their negotiation skills, ability to facilitate consensus, and adeptness at transparent communication to manage expectations and maintain a sustainable pace for the team.

Question 80: Boring Projects and Motivation

How do you sustain team motivation and engagement in long-term projects with high levels of task repetition?

Background: This question explores the candidate's strategies for keeping the team engaged and motivated through the monotony of prolonged projects or repetitive tasks. While we would all like to work on cutting-edge technology all the time, everyday operations often comprise work that we consider less glamorous yet grudgingly accept as valuable, too. The question gauges a candidate's ability to uphold enthusiasm and maintain high performance in a potentially less motivating environment.

Acceptable Answer: Expect the candidate to discuss innovative approaches like introducing gamification elements to mundane tasks, rotating roles within the team to provide fresh challenges, and setting up regular skill-enhancement workshops. They might also mention the importance of celebrating small wins, giving recognition, for example, with Kudo cards, and ensuring that the team's work aligns with individual growth goals. The response should underline their commitment to maintaining a positive and stimulating work environment, even under challenging circumstances.

Question 81: Onboarding New Team Members

Please describe your experience integrating a new team member into an established Scrum team, ensuring a seamless transition and maintaining team productivity.

Background: This question assesses the candidate's approach to onboarding new team members in a way that minimizes disruption and maximizes integration speed.
This approach is critical for maintaining an existing team's cohesive and productive dynamics, acknowledging that Scrum teams will regularly change composition.

Acceptable Answer: Look for answers detailing a structured and inclusive onboarding plan that includes, for example:

- Mentorship programs
- A buddy system
- Clear documentation of team norms and expectations, such as a working agreement and a Definition of Done
- Team activities
- Gradual immersion into the Scrum team's projects through pair programming or shadowing

The candidate should highlight the importance of fostering an inclusive team culture that welcomes questions and supports new members in their learning journey, ensuring they feel valued and part of the team from day one.

Question 82: Conflict Resolution

How do you approach conflict resolution within a Scrum team, or between the team and stakeholders, to ensure continued progress and collaboration?

Background: Conflicts are inevitable in any team dynamic. This question probes the candidate's skill in navigating and resolving disagreements in a way that strengthens the team and stakeholder relationships rather than undermining them.

Acceptable Answer: The candidate should describe their ability to act as a neutral mediator, actively listen to understand all perspectives, and facilitate problem-solving sessions focusing on interests rather than positions. They might also discuss creating forums for open dialogue, such as conflict-themed Retrospectives, and the importance of fostering a culture of trust and psychological safety where conflicts can be aired constructively. The response should convey their adeptness at turning conflicts into opportunities for growth and deeper understanding. However, the candidate should also make clear that not all disputes among team members may be solvable and that, once all team-based options have been exhausted, the Scrum Master needs to ask for management support to bring the conflict to a conclusion.

Question 83: Scaling Scrum?

Please reflect on a time when scaling Scrum across multiple teams presented significant challenges. How did you address these challenges to ensure the organization's success with its Agile transformation?

Background: Scaling Agile practices is a complex endeavor that can expose organizational impediments and resistance. This question delves into the candidate's experience in successfully scaling Scrum, ensuring alignment and cohesion among multiple teams, and helping everyone see the value in a transformation.

Acceptable Answer: This open question allows candidates to address their familiarity with frameworks like LeSS or Nexus, or to share their opinion on whether SAFe is useful. Moreover, at a philosophical level, it opens the discussion of whether "Agile" is scalable at all, given that most scaling frameworks apply more process to the issue. The opposing opinion points to the need to descale the organization instead, by empowering those closest to the problems to decide within the given constraints and governance rules. The candidate should emphasize the importance of maintaining a shared vision and goals, creating communities of practice to share knowledge and best practices, and addressing cultural barriers to change. They should also reflect on the importance of executive sponsorship, the strategic engagement of key stakeholders to champion and support the scaling effort, and the necessity of a failure culture.
How To Use the Scrum Master Interview Questions

Scrum has always been a hands-on business, and to be successful at it, a candidate needs a passion for getting their hands dirty. While the basic rules are trivial, getting a group of individuals with different backgrounds, levels of engagement, and personal agendas to form and perform as a team is a complex task. (As always, you might say, when humans and communication are involved.) Moreover, the larger the organization and the more management levels there are, the more likely failure is lurking around the corner. The questions are not necessarily suited to turning an inexperienced interviewer into an agile expert. But in the hands of a seasoned practitioner, they can help determine which candidates have worked in the agile trenches in the past.