Welcome to the Data Engineering category of DZone, where you will find all the information you need for AI/ML, big data, data, databases, and IoT. As you determine the first steps for new systems or reevaluate existing ones, you're going to require tools and resources to gather, store, and analyze data. The Zones within our Data Engineering category contain resources that will help you expertly navigate through the SDLC Analysis stage.
Artificial intelligence (AI) and machine learning (ML) are two fields that work together to create computer systems capable of perception, recognition, decision-making, and translation. Separately, AI is the ability for a computer system to mimic human intelligence through math and logic, and ML builds off AI by developing methods that "learn" through experience and do not require instruction. In the AI/ML Zone, you'll find resources ranging from tutorials to use cases that will help you navigate this rapidly growing field.
Big data comprises datasets that are massive, varied, complex, and can't be handled traditionally. Big data can include both structured and unstructured data, and it is often stored in data lakes or data warehouses. As organizations grow, big data becomes increasingly more crucial for gathering business insights and analytics. The Big Data Zone contains the resources you need for understanding data storage, data modeling, ELT, ETL, and more.
Data is at the core of software development. Think of it as information stored in anything from text documents and images to entire software programs, and these bits of information need to be processed, read, analyzed, stored, and transported throughout systems. In this Zone, you'll find resources covering the tools and strategies you need to handle data properly.
A database is a collection of structured data that is stored in a computer system, and it can be hosted on-premises or in the cloud. As databases are designed to enable easy access to data, our resources are compiled here for smooth browsing of everything you need to know from database management systems to database languages.
IoT, or the Internet of Things, is a technological field that makes it possible for users to connect devices and systems and exchange data over the internet. Through DZone's IoT resources, you'll learn about smart devices, sensors, networks, edge computing, and many other technologies — including those that are now part of the average person's daily life.
Enterprise AI
In recent years, artificial intelligence has become less of a buzzword and more of an adopted process across the enterprise. With that, there is a growing need to increase operational efficiency as customer demands arise. AI platforms have become increasingly sophisticated, creating the need to establish guidelines and ownership. In DZone's 2022 Enterprise AI Trend Report, we explore MLOps, explainability, and how to select the best AI platform for your business. We also share a tutorial on how to create a machine learning service using Spring Boot, and how to deploy AI with an event-driven platform. The goal of this Trend Report is to better inform the developer audience on practical tools and design paradigms, new technologies, and the overall operational impact of AI within the business. This is a technology space that's constantly shifting and evolving. As part of our December 2022 re-launch, we've added new articles pertaining to knowledge graphs, a solutions directory for popular AI tools, and more.
The AIDocumentLibraryChat project has been extended to support questions that search relational databases. The user inputs a question, an embedding search finds the relevant database tables and columns, and the AI/LLM then receives the schemas of those tables and generates a SQL query that answers the question with a result table.

Dataset and Metadata
The open-source dataset used here has 6 tables with relations to each other. It contains data about museums and works of art. To generate useful queries from the questions, the dataset has to be supplied with metadata, and that metadata has to be turned into embeddings. To enable the AI/LLM to find the needed tables and columns, it needs to know their names and descriptions. For all data tables, like the museum table, metadata is stored in the column_metadata and table_metadata tables. Their data can be found in the files column_metadata.csv and table_metadata.csv. They contain a unique ID, the name, the description, etc. of the table or column. That description is used to create the embeddings the question embeddings are compared with. The quality of the description makes a big difference in the results because the embedding is more precise with a better description. Providing synonyms is one option to improve the quality. The table metadata also contains the schema of the table so that only the relevant table schemas are added to the AI/LLM prompt.

Embeddings
To store the embeddings in PostgreSQL, the vector extension is used. The embeddings can be created with the OpenAI endpoint or with the ONNX library that is provided by Spring AI. Three types of embeddings are created:

Tabledescription embeddings
Columndescription embeddings
Rowcolumn embeddings

The Tabledescription embeddings have a vector based on the table description; the embedding carries the table name, the datatype = table, and the metadata ID in its metadata. The Columndescription embeddings have a vector based on the column description; the embedding carries the table name, the dataname with the column name, the datatype = column, and the metadata ID in its metadata. The Rowcolumn embeddings have a vector based on the content of a row column value. That is used for the style or subject of an artwork so these values can be used in the question. The metadata has the datatype = row, the column name as dataname, the table name, and the metadata ID.

Implement the Search
The search has 3 steps:

Retrieve the embeddings
Create the prompt
Execute the query and return the result

Retrieve the Embeddings
To read the embeddings from the PostgreSQL database with the vector extension, Spring AI uses the VectorStore class in the DocumentVSRepositoryBean:

Java
@Override
public List<Document> retrieve(String query, DataType dataType) {
    return this.vectorStore.similaritySearch(
        SearchRequest.query(query).withFilterExpression(
            new Filter.Expression(ExpressionType.EQ,
                new Key(MetaData.DATATYPE),
                new Value(dataType.toString()))));
}

The VectorStore provides a similarity search for the user's query. The query is turned into an embedding, and with the FilterExpression for the datatype in the header values, the matching results are returned.
The TableService class uses the repository in the retrieveEmbeddings method: Java private EmbeddingContainer retrieveEmbeddings(SearchDto searchDto) { var tableDocuments = this.documentVsRepository.retrieve( searchDto.getSearchString(), MetaData.DataType.TABLE, searchDto.getResultAmount()); var columnDocuments = this.documentVsRepository.retrieve( searchDto.getSearchString(), MetaData.DataType.COLUMN, searchDto.getResultAmount()); List<String> rowSearchStrs = new ArrayList<>(); if(searchDto.getSearchString().split("[ -.;,]").length > 5) { var tokens = List.of(searchDto.getSearchString() .split("[ -.;,]")); for(int i = 0;i<tokens.size();i = i+3) { rowSearchStrs.add(tokens.size() <= i + 3 ? "" : tokens.subList(i, tokens.size() >= i +6 ? i+6 : tokens.size()).stream().collect(Collectors.joining(" "))); } } var rowDocuments = rowSearchStrs.stream().filter(myStr -> !myStr.isBlank()) .flatMap(myStr -> this.documentVsRepository.retrieve(myStr, MetaData.DataType.ROW, searchDto.getResultAmount()).stream()) .toList(); return new EmbeddingContainer(tableDocuments, columnDocuments, rowDocuments); } First, documentVsRepository is used to retrieve the document with the embeddings for the tables/columns based on the search string of the user. Then, the search string is split into chunks of 6 words to search for the documents with the row embeddings. The row embeddings are just one word, and to get a low distance, the query string has to be short; otherwise, the distance grows due to all the other words in the query. Then the chunks are used to retrieve the row documents with the embeddings. Create the Prompt The prompt is created in the TableService class with the createPrompt method: Java private Prompt createPrompt(SearchDto searchDto, EmbeddingContainer documentContainer) { final Float minRowDistance = documentContainer.rowDocuments().stream() .map(myDoc -> (Float) myDoc.getMetadata().getOrDefault(MetaData.DISTANCE, 1.0f)).sorted().findFirst().orElse(1.0f); LOGGER.info("MinRowDistance: {}", minRowDistance); var sortedRowDocs = documentContainer.rowDocuments().stream() .sorted(this.compareDistance()).toList(); var tableColumnNames = this.createTableColumnNames(documentContainer); List<TableNameSchema> tableRecords = this.tableMetadataRepository .findByTableNameIn(tableColumnNames.tableNames()).stream() .map(tableMetaData -> new TableNameSchema(tableMetaData.getTableName(), tableMetaData.getTableDdl())).collect(Collectors.toList()); final AtomicReference<String> joinColumn = new AtomicReference<String>(""); final AtomicReference<String> joinTable = new AtomicReference<String>(""); final AtomicReference<String> columnValue = new AtomicReference<String>(""); sortedRowDocs.stream().filter(myDoc -> minRowDistance <= MAX_ROW_DISTANCE) .filter(myRowDoc -> tableRecords.stream().filter(myRecord -> myRecord.name().equals(myRowDoc.getMetadata() .get(MetaData.TABLE_NAME))).findFirst().isEmpty()) .findFirst().ifPresent(myRowDoc -> { joinTable.set(((String) myRowDoc.getMetadata() .get(MetaData.TABLE_NAME))); joinColumn.set(((String) myRowDoc.getMetadata() .get(MetaData.DATANAME))); tableColumnNames.columnNames().add(((String) myRowDoc.getMetadata() .get(MetaData.DATANAME))); columnValue.set(myRowDoc.getContent()); this.tableMetadataRepository.findByTableNameIn( List.of(((String) myRowDoc.getMetadata().get(MetaData.TABLE_NAME)))) .stream().map(myTableMetadata -> new TableNameSchema( myTableMetadata.getTableName(), myTableMetadata.getTableDdl())).findFirst() .ifPresent(myRecord -> tableRecords.add(myRecord)); }); var 
messages = createMessages(searchDto, minRowDistance, tableColumnNames, tableRecords, joinColumn, joinTable, columnValue); Prompt prompt = new Prompt(messages); return prompt; } First, the min distance of the rowDocuments is filtered out. Then a list row of documents sorted by distance is created. The method createTableColumnNames(...) creates the tableColumnNames record that contains a set of column names and a list of table names. The tableColumnNames record is created by first filtering for the 3 tables with the lowest distances. Then the columns of these tables with the lowest distances are filtered out. Then the tableRecords are created by mapping the table names to the schema DDL strings with the TableMetadataRepository. Then the sorted row documents are filtered for MAX_ROW_DISTANCE and the values joinColumn, joinTable, and columnValue are set. Then the TableMetadataRepository is used to create a TableNameSchema and add it to the tableRecords. Now the placeholders in systemPrompt and the optional columnMatch can be set: Java private final String systemPrompt = """ ... Include these columns in the query: {columns} \n Only use the following tables: {schemas};\n %s \n """; private final String columnMatch = """ Join this column: {joinColumn} of this table: {joinTable} where the column has this value: {columnValue}\n """; The method createMessages(...) gets the set of columns to replace the {columns} placeholder. It gets tableRecords to replace the {schemas} placeholder with the DDLs of the tables. If the row distance was beneath the threshold, the property columnMatch is added at the string placeholder %s. Then the placeholders {joinColumn}, {joinTable}, and {columnValue} are replaced. With the information about the required columns the schemas of the tables with the columns and the information of the optional join for row matches, the AI/LLM is able to create a sensible SQL query. Execute Query and Return Result The query is executed in the createQuery(...) method: Java public SqlRowSet searchTables(SearchDto searchDto) { EmbeddingContainer documentContainer = this.retrieveEmbeddings(searchDto); Prompt prompt = createPrompt(searchDto, documentContainer); String sqlQuery = createQuery(prompt); LOGGER.info("Sql query: {}", sqlQuery); SqlRowSet rowSet = this.jdbcTemplate.queryForRowSet(sqlQuery); return rowSet; } First, the methods to prepare the data and create the SQL query are called and then queryForRowSet(...) is used to execute the query on the database. The SqlRowSet is returned. The TableMapper class uses the map(...) method to turn the result into the TableSearchDto class: Java public TableSearchDto map(SqlRowSet rowSet, String question) { List<Map<String, String>> result = new ArrayList<>(); while (rowSet.next()) { final AtomicInteger atomicIndex = new AtomicInteger(1); Map<String, String> myRow = List.of(rowSet .getMetaData().getColumnNames()).stream() .map(myCol -> Map.entry( this.createPropertyName(myCol, rowSet, atomicIndex), Optional.ofNullable(rowSet.getObject( atomicIndex.get())) .map(myOb -> myOb.toString()).orElse(""))) .peek(x -> atomicIndex.set(atomicIndex.get() + 1)) .collect(Collectors.toMap(myEntry -> myEntry.getKey(), myEntry -> myEntry.getValue())); result.add(myRow); } return new TableSearchDto(question, result, 100); } First, the result list for the result maps is created. Then, rowSet is iterated for each row to create a map of the column names as keys and the column values as values. This enables returning a flexible amount of columns with their results. 
createPropertyName(...) adds the index integer to the map key to support duplicate key names. Summary Backend Spring AI supports creating prompts with a flexible amount of placeholders very well. Creating the embeddings and querying the vector table is also very well supported. Getting reasonable query results needs the metadata that has to be provided for the columns and tables. Creating good metadata is an effort that scales linearly with the amount of columns and tables. Implementing the embeddings for columns that need them is an additional effort. The result is that an AI/LLM like OpenAI or Ollama with the "sqlcoder:70b-alpha-q6_K" model can answer questions like: "Show the artwork name and the name of the museum that has the style Realism and the subject of Portraits." The AI/LLM can within boundaries answer natural language questions that have some fit with the metadata. The amount of embeddings needed is too big for a free OpenAI account and the "sqlcoder:70b-alpha-q6_K" is the smallest model with reasonable results. AI/LLM offers a new way to interact with relational databases. Before starting a project to provide a natural language interface for a database, the effort and the expected results have to be considered. The AI/LLM can help with questions of small to middle complexity and the user should have some knowledge about the database. Frontend The returned result of the backend is a list of maps with keys as column names and values column values. The amount of returned map entries is unknown, because of that the table to display the result has to support a flexible amount of columns. An example JSON result looks like this: JSON {"question":"...","resultList":[{"1_name":"Portrait of Margaret in Skating Costume","2_name":"Philadelphia Museum of Art"},{"1_name":"Portrait of Mary Adeline Williams","2_name":"Philadelphia Museum of Art"},{"1_name":"Portrait of a Little Girl","2_name":"Philadelphia Museum of Art"}],"resultAmount":100} The resultList property contains a JavaScript array of objects with property keys and values. To be able to display the column names and values in an Angular Material Table component, these properties are used: TypeScript protected columnData: Map<string, string>[] = []; protected columnNames = new Set<string>(); The method getColumnNames(...) of the table-search.component.ts is used to turn the JSON result in the properties: TypeScript private getColumnNames(tableSearch: TableSearch): Set<string> { const result = new Set<string>(); this.columnData = []; const myList = !tableSearch?.resultList ? [] : tableSearch.resultList; myList.forEach((value) => { const myMap = new Map<string, string>(); Object.entries(value).forEach((entry) => { result.add(entry[0]); myMap.set(entry[0], entry[1]); }); this.columnData.push(myMap); }); return result; } First, the result set is created and the columnData property is set to an empty array. Then, myList is created and iterated with forEach(...). For each of the objects in the resultList, a new Map is created. For each property of the object, a new entry is created with the property name as the key and the property value as the value. The entry is set on the columnData map and the property name is added to the result set. The completed map is pushed on the columnData array and the result is returned and set to the columnNames property. Then a set of column names is available in the columnNames set and a map with column name to column value is available in the columnData. 
The template table-search.component.html contains the Material table:

HTML
@if(searchResult && searchResult.resultList?.length) {
<table mat-table [dataSource]="columnData">
  <ng-container *ngFor="let disCol of columnNames" matColumnDef="{{ disCol }}">
    <th mat-header-cell *matHeaderCellDef>{{ disCol }}</th>
    <td mat-cell *matCellDef="let element">{{ element.get(disCol) }}</td>
  </ng-container>
  <tr mat-header-row *matHeaderRowDef="columnNames"></tr>
  <tr mat-row *matRowDef="let row; columns: columnNames"></tr>
</table>
}

First, the searchResult is checked for existence and for objects in the resultList. Then, the table is created with the columnData map array as its datasource. The table header row is set with <tr mat-header-row *matHeaderRowDef="columnNames"></tr> to contain the columnNames. The table rows and columns are defined with <tr mat-row *matRowDef="let row; columns: columnNames"></tr>. The cells are created by iterating the columnNames like this: <ng-container *ngFor="let disCol of columnNames" matColumnDef="{{ disCol }}">. The header cells are created like this: <th mat-header-cell *matHeaderCellDef>{{ disCol }}</th>. The table cells are created like this: <td mat-cell *matCellDef="let element">{{ element.get(disCol) }}</td>. element is the map of the columnData array element, and the map value is retrieved with element.get(disCol).

Summary Frontend
The new Angular syntax makes the templates more readable. The Angular Material table component is more flexible than expected and supports unknown numbers of columns very well.

Conclusion
Questioning a database with the help of an AI/LLM takes some effort for the metadata, and the users need a rough idea of what the database contains. AI/LLMs are not a natural fit for query creation because SQL queries require correctness. A pretty large model was needed to get the required query correctness, and GPU acceleration is required for productive use. A well-designed UI where the user can drag and drop the columns of the tables into the result table might be a good alternative for these requirements. Angular Material components support drag and drop very well. Before starting such a project, the customer should make an informed decision on which alternative fits the requirements best.
Mission-critical applications require high availability. The goal of high availability is to provide users with consistent access to services or resources, minimizing the chances of interruption. Automatic failover is a specific mechanism used to achieve high availability. It involves automatically detecting the failure of a system component (like a server, network, or database) and immediately switching operations to a standby component without human intervention. This increases resiliency. MariaDB MaxScale is a database proxy that includes features for high availability. In this article, I’ll show you how you can try it out with an online store simulator application implemented in Java and Svelte. Architecture The following diagram shows the architecture of the demo application: A web application developed with JavaScript and the Svelte framework makes HTTP requests to a Java backend. The backend answers with server-sent events that the frontend uses to update the user interface on the browser. The backend is implemented with Spring Boot and connects to a MariaDB database cluster using R2DBC (reactive). The backend logic is, in short, a simulation of reads and writes to an online store database. The simulation is parameterized, and the user can adjust: Product visits per minute: How many reads to the database per minute. Orders per minute: How many writes to the database per minute. Products per order: Write amplification. Timeout in milliseconds: How many seconds until a request to the database is considered failed. The database cluster is front-ended by a database proxy called MaxScale. This proxy makes the cluster look like a single logical database to the Java backend. MaxScale also performs read/write splitting (sending writes to the primary MariaDB server and reads to replicas), as well as load-balancing of reads among replica servers using a configurable algorithm. Data is automatically replicated from the primary to the replica database servers. Building the Docker Images From Source I have prepared custom Docker images for every component in the simulator. You can either build the images from the source (optional) or use the already built and published images from Docker Hub. If you decide to build the images yourself, you can find the source code on GitHub: MariaDB deployments: Custom images for easy deployment of replicated MariaDB topologies with MaxScale. DO NOT USE THESE IN PRODUCTION! These images are suitable only for demo applications. Use the official MariaDB Docker images for production deployments. Backend application: The backend application that connects to the database cluster. Frontend application: The frontend application that makes simulation configuration requests to the backend and receives events to show the simulation result. Each repository has Dockerfiles that you can use to build your own Docker images. For example, to build the backend application image, run: Shell docker build --tag alejandrodu/online-store-simulator-java-backend . Running the Simulation All the services can be started using the following Docker Compose file (docker-compose.yml): YAML version: "3.9" services: server-1: container_name: server-1 image: alejandrodu/mariadb ports: - "3306:3306" environment: - MARIADB_CREATE_DATABASE=demo - MARIADB_CREATE_USER=user:Password123! - MARIADB_CREATE_REPLICATION_USER=replication_user:ReplicationPassword123! - MARIADB_CREATE_MAXSCALE_USER=maxscale_user:MaxScalePassword123! 
server-2: container_name: server-2 image: alejandrodu/mariadb ports: - "3307:3306" environment: - MARIADB_REPLICATE_FROM=replication_user:ReplicationPassword123!@server-1:3306 server-3: container_name: server-3 image: alejandrodu/mariadb ports: - "3308:3306" environment: - MARIADB_REPLICATE_FROM=replication_user:ReplicationPassword123!@server-1:3306 maxscale: container_name: maxscale image: alejandrodu/mariadb-maxscale command: --admin_host 0.0.0.0 --admin_secure_gui false ports: - "4000:4000" - "8989:8989" - "27017:27017" environment: - MAXSCALE_USER=maxscale_user:MaxScalePassword123! - MARIADB_HOST_1=server-1 3306 - MARIADB_HOST_2=server-2 3306 - MARIADB_HOST_3=server-3 3306 healthcheck: test: ["CMD", "maxctrl", "list", "servers"] interval: 5s timeout: 10s retries: 5 java-backend: container_name: java-backend image: alejandrodu/online-store-simulator-java-backend ports: - "8080:8080" environment: - spring.r2dbc.url=r2dbc:mariadb://maxscale:4000/demo - spring.r2dbc.username=user - spring.r2dbc.password=Password123! - spring.liquibase.url=jdbc:mariadb://maxscale:4000/demo - spring.liquibase.user=user - spring.liquibase.password=Password123! depends_on: maxscale: condition: service_healthy svelte-frontend: container_name: svelte-fronted image: alejandrodu/online-store-simulator-svelte-frontend ports: - "5173:80" environment: - BACKEND_URL=http://java-backend:8080 Move to the directory in which the Docker Compose file is, and start the services in detached mode as follows: Shell docker compose up -d Configuring MaxScale Before you start the simulation, configure MaxScale for transaction replay. Also, adjust timeouts to make the simulation more interesting. Navigate to http://localhost:8989/ and log into the UI using: Username:admin Password:mariadb You’ll see a dashboard with the MariaDB cluster state. There’s a primary server (server-1), and two replicas (server-2 and server-3). Replication is already configured from server-1 (primary) to server-2 and server-3 (replicas). All servers should be up and running. Click on mdb_monitor and then on the pencil icon to enable parameter editing. Set the following parameters: auto_failover (true): This enables automatic failover. When a MariaDB server is down, MaxScale selects a replica server and reconfigures it as the new primary so that writes can continue to happen. auto_rejoin (true): This enables automatic rejoin of recovered servers. When a failed server is up again, MaxScale detects it and configures it as an available replica server. failcount (1): Sets the number of monitor (a component in MaxScale that checks server status) iterations required for a server to be down in order to activate the failover process. We set a value of 1 to make sure the failover starts immediately after failure. backend_connect_timeout (1000): Connection timeout for monitor connections. We set a low value (one second) to quickly activate failover for this demo. backend_read_timeout (1000): Read timeout for monitor connections. backend_write_timeout (1000): Write timeout for monitor connections. master_failure_timeout (1000): Primary failure timeout. monitor_interval (1000): How often the servers are monitored. WARNING: These values are appropriate for this demo but very likely not the best for production environments! Once the parameters are set, click on Done Editing and Confirm. You also need to enable transaction replay which automatically re-execute failed in-flight transactions on servers that went down just after a SQL statement was routed. 
This is a useful feature for software developers since it removes the need to code failure cases and transaction retries. On the main menu, click on Dashboard and then on any of the query_router_service links in the list of servers. Edit the parameters as follows:

transaction_replay (true): Activates automatic retry of failed transactions.
transaction_replay_retry_on_deadlock (true): Same as the previous, when a deadlock occurs.
transaction_replay_retry_on_mismatch (true): Same as the previous, when a checksum mismatch occurs.

Once the parameters are set, click on Done Editing and Confirm.

Starting the Simulation
With everything configured, you can start the simulation. Navigate to http://localhost:5173/ and configure the following parameters (names are, I hope, self-explanatory):

Product visits per minute: 6000
Orders per minute: 60
Timeout in milliseconds: 8000

But before you start the simulation, you need to create the products for the online store. Click on Data | Create products…. Leave the default values and click on Create. You should see the UI updating as products are created in the database. Now you can finally click on Start and see the simulation in action.

Simulating a Server Failure
At this point, the primary server is handling writes (orders). What happens if you stop that server? In the command line, run:

Shell
docker stop server-1

Depending on multiple factors, you might get some “disappointed visitors” or even a few “missed opportunities” in the simulator. Or maybe you don’t get any at all! Product visits (reads) and orders (writes) continue to happen thanks to MaxScale. Without automatic failover, you would have to reconfigure everything manually, which means more downtime and many more disappointed visitors and missed opportunities! Start the failed server:

Shell
docker start server-1

Go to the MaxScale Dashboard (http://localhost:8989/) and check that server-1 is now a functioning replica. You can perform a manual switchover to make server-1 the primary server again. Click on mdb_monitor and then hover over the MASTER section. Click on the pencil icon and select server-1. Click Swap and check again in the Dashboard that the new primary server is server-1.

Conclusion
Automatic failover is only one of the components in highly available systems. You can use a database proxy like MaxScale to set up automatic failover, but also other features such as load balancing, query routing, transaction retry, topology isolation, and more. Check out the documentation here.
The Data Story
At the core of every software application, from the simplest to the most complex, operating at scale to serve millions of users with low-latency requests, lies a foundational element: data. For over three decades, relational database management systems (RDBMS) have been at the forefront of this domain. These systems, which began by simply storing data in a table format of rows for records and columns for attributes, have undergone significant advancements and innovations that revolutionized structured and semi-structured data storage. Relational database models have established themselves as the foundation of structured data handling, are renowned for their reliability, and have battle-tested their efficacy in supporting massive data scales for enterprise applications. However, as we evolve deeper into the era of big data and artificial intelligence (AI), the limitations of traditional RDBMS in handling unstructured data, such as images, videos, audio, and natural language, have become increasingly apparent. Enter the vector database, a cutting-edge innovation tailored for the age of AI that is significantly changing recommendation systems. Unlike RDBMS, which excels in managing structured data, vector databases are designed to handle and query high-dimensional vector embeddings, a form of unstructured data representation that is central to modern machine learning algorithms.

Introduction: Vector DB
Vector embeddings allow complex data like text, images, and sounds to be transformed into numerical vectors, capturing the essence of the data in a way that machines can process. This transformation is crucial for tasks such as similarity search, recommendation systems, and natural language processing, where understanding the nuanced relationships between data points is key. Vector databases leverage specialized indexing and search algorithms to efficiently query these embeddings, enabling applications that were previously challenging or impossible with traditional RDBMS.

Fundamental Difference of RDBMS and Vectors
An application interacts with an RDBMS by executing various transactions and actions, which are stored in the form of rows and columns. With a vector database, the flow looks a bit different: many types of files (text, images, audio, and more) are read and processed by AI models to create vector embeddings.

Example in Action
Consider the process of transforming a comprehensive movie database, such as IMDB, into a format where each movie is represented by vector embeddings and stored in a vector database. This transformation allows the database to leverage the power of vector embeddings to significantly enhance the user search experience. Because these vectors are organized within a common, high-dimensional vector space, search engineers can more efficiently perform queries across the movie database. This spatial organization not only streamlines the retrieval process but also enables the implementation of sophisticated search functionalities, such as finding movies with similar themes or genres, thereby creating a more intuitive and responsive search experience for users. Now, we will demonstrate in Python how to convert textual movie data, similar to the tables mentioned above, into vector representations using BERT (Bidirectional Encoder Representations from Transformers), a pre-trained deep learning model developed by Google.
This process entails several crucial steps for transforming the text into a format that the model can process, followed by the extraction of meaningful embeddings. Let's break down each step. Step 1 Python #Import Libraries import sqlite3 from transformers import BertTokenizer, BertModel import torch sqlite3: This imports the SQLite3 library, which allows Python to interact with SQLite databases. It's used here to access a database containing IMDB movie information. from transformers import BertTokenizer, BertModel: These imports from the Hugging Face transformers library bring in the necessary tools to tokenize text data (BertTokenizer) and to load the pre-trained BERT model (BertModel) for generating vector embeddings. import torch: This imports PyTorch, a deep learning framework that BERT and many other models in the transformers library are built on. It's used for managing tensors, which are multi-dimensional arrays that serve as the basic building blocks of data for neural networks. Step 2 Python #Initialize Tokenizer and Model tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertModel.from_pretrained('bert-base-uncased') tokenizer: This initializes the BERT tokenizer, configuring it to split input text into tokens that the BERT model can understand. The from_pretrained('bert-base-uncased') method loads a tokenizer trained in lowercase English text. model: This initializes the BERT model itself, also using the from_pretrained method to load a version trained in lowercase English. This model is what will generate the embeddings from the tokenized text. Step 3 Python # Connect to Database and Fetch Movie Data conn = sqlite3.connect('path/to/your/movie_database.db') cursor = conn.cursor() cursor.execute("SELECT name, genre, release_date, length FROM movies") movies = cursor.fetchall() conn = sqlite3.connect('path/to/your/movie_database.db'): Opens a connection to an SQLite database file that contains your movie data cursor = conn.cursor(): Creates a cursor object which is used to execute SQL commands through the connection cursor.execute(...): Executes an SQL command to select specific columns (name, genre, release date, length) from the movies table movies = cursor.fetchall(): Retrieves all the rows returned by the SQL query and stores them in the variable movies Step 4 Python #Convert Movie Data to Vector Embeddings movie_vectors = [] for movie in movies: movie_data = ', '.join(str(field) for field in movie) inputs = tokenizer(movie_data, return_tensors="pt", padding=True, truncation=True, max_length=512) with torch.no_grad(): outputs = model(**inputs) movie_vector = outputs.last_hidden_state[:, 0, :].numpy() movie_vectors.append(movie_vector) movie_vectors = []: Initializes an empty list to store the vector embeddings for each movie For loop: Iterates over each movie retrieved from the database movie_data = ', '.join(...): Concatenates the movie's details into a single string inputs = tokenizer(...): Uses the BERT tokenizer to prepare the concatenated string for the model, converting it into a tensor with torch.no_grad():: Temporarily disables gradient computation, which is unnecessary during inference (model.predict) outputs = model(**inputs): Feeds the tokenized input to the BERT model to get the embeddings movie_vector = ...: Extracts the embedding of the [CLS] token, which represents the entire input sequence movie_vectors.append(movie_vector): Adds the movie's vector embedding to the list Output movie_vectors: At the end of this script, you have a list of 
vector embeddings, one for each movie in your database. These vectors encapsulate the semantic information of the movies' names, genres, release dates, and durations in a form that machine learning models can work with. Conclusion In our example of vector database, movies such as "Inception" and "The Matrix" known for their action-packed, thought-provoking narratives, or "La La Land" and "Eternal Sunshine of the Spotless Mind," which explore complex romantic themes are transformed into high-dimensional vectors using BERT, a deep learning model. These vectors capture not just the overt categories like genre or release year, but also subtler thematic and emotional nuances encoded in their descriptions. Once stored in a vector database, these embeddings can be queried efficiently to perform similarity searches. When a user searches for a film with a particular vibe or thematic element, the streaming service can quickly identify and suggest films that are "near" the user's interests in the vector space, even if the user's search terms don't directly match the movie's title, genre, or other metadata. For instance, a search for "dream manipulation movies" might not only return "Inception" but also suggest "The Matrix," given their thematic similarities represented in the vector space. This method of storage and retrieval significantly enriches the user experience on streaming platforms, facilitating a discovery process that aligns content with both the user's interests and current mood. It’s designed to lead to "aha moments," where users uncover hidden gems, especially valuable when navigating the vast catalogs and offerings of streaming services. By detailing the creation and application of vector embeddings from textual movie data, we demonstrate the significant use of machine learning and vector databases in revolutionizing search capabilities and elevating the user experience in digital content ecosystems, particularly within streaming video services.
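As a rough illustration of the similarity search described in the conclusion above, the following Python sketch ranks the stored movie vectors by cosine similarity to a query embedding. It is our own illustration, not code from the article: it assumes the tokenizer, model, torch, and movie_vectors objects from the earlier script are in scope, and movie_titles is a hypothetical list of movie names in the same order; a real vector database would use an index rather than this brute-force scan.

Python
import numpy as np
# Assumes tokenizer, model, torch, and movie_vectors from the script above.
# movie_titles is a hypothetical list of movie names aligned with movie_vectors.

def embed(text):
    # Reuse the BERT tokenizer and model to embed the query text the same way as the movies
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].numpy()

def most_similar(query, top_k=3):
    query_vec = embed(query).flatten()
    scores = []
    for title, vec in zip(movie_titles, movie_vectors):
        v = np.asarray(vec).flatten()
        # Cosine similarity between the query embedding and each stored movie embedding
        cosine = np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v))
        scores.append((title, float(cosine)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

print(most_similar("dream manipulation movies"))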
Multi-Touch Attribution (MTA) is an advanced approach in digital marketing analytics that assigns credit to each touchpoint a consumer interacts with during their journey towards a conversion. Unlike traditional models that attribute conversion success to a single touchpoint, MTA recognizes the complexity of consumer behavior by analyzing how different channels and interactions contribute to the final outcome. This method is increasingly crucial in a multi-channel marketing landscape, as it provides more accurate insights into the effectiveness of various marketing strategies and campaigns. In the technical realm, MTA employs algorithms and statistical methods to distribute credit for conversion across multiple customer interactions, ranging from first exposure to the final conversion action. In this article, we are going to explore some of the traditional and advanced multi-touch attribution models and the algorithms behind them.

Figure 1: Attribution model assigns weights to each channel

Traditional Models
Assume we have a series of n touchpoints leading to a conversion, and let C represent the total conversion value. The contribution value assigned to each touchpoint i will be denoted as Vi. Below are the various traditional attribution models:

Last-click attribution assigns full credit for a conversion to the final touchpoint before a purchase (Vn = C, all other Vi = 0). While straightforward, its major flaw is the disregard for all preceding customer interactions, potentially undervaluing the importance of early engagement and awareness initiatives in the marketing funnel.

First-click attribution credits the initial interaction in the customer's journey with the entire conversion (V1 = C). This approach overlooks the contribution of subsequent touchpoints, often resulting in a skewed understanding of mid-funnel and closing strategies' effectiveness.

Linear attribution evenly distributes credit across all touchpoints (Vi = C/n for every touchpoint). However, this model's critical limitation is its failure to acknowledge the varying influence of different interactions, potentially oversimplifying the impact of each marketing effort.

Position-based (U-shaped) attribution emphasizes the first and last interactions (usually 40% credit each), with the remaining 20% distributed evenly among the middle touchpoints. This model might not accurately capture the significance of mid-funnel activities and can oversimplify complex customer journeys.

W-shaped attribution: An extension of the U-shaped model, it also gives additional weight to the mid-funnel touchpoint (typically a lead conversion), along with the first and last touchpoints.

These traditional attribution models, while providing basic frameworks for understanding marketing impact, often fall short of accurately reflecting the intricate, multi-faceted nature of modern consumer journeys. They tend to either oversimplify the process or bias certain touchpoints, leading to potentially skewed marketing insights and decisions. As the digital landscape evolves, more sophisticated and nuanced approaches like Multi-Touch Attribution are gaining prominence to address these limitations. The short sketch below illustrates how these rule-based allocations work in practice.
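The following is a minimal Python sketch of the rule-based allocations above. The function names and the example journey are our own illustration, not from the article; each function returns the list of values Vi for a journey of n touchpoints with conversion value C.

Python
# Illustrative rule-based allocations (not from the article)
def last_click(C, n):
    # All credit goes to the final touchpoint: Vn = C
    return [0.0] * (n - 1) + [C]

def first_click(C, n):
    # All credit goes to the first touchpoint: V1 = C
    return [C] + [0.0] * (n - 1)

def linear(C, n):
    # Credit is split evenly: Vi = C / n
    return [C / n] * n

def u_shaped(C, n, edge_share=0.4):
    # 40% to the first and last touchpoints, the remainder spread over the middle ones
    if n == 1:
        return [C]
    if n == 2:
        return [C / 2, C / 2]
    middle = (C - 2 * edge_share * C) / (n - 2)
    return [edge_share * C] + [middle] * (n - 2) + [edge_share * C]

# Example: a journey with 4 touchpoints and a conversion value of 100
print(linear(100, 4))    # [25.0, 25.0, 25.0, 25.0]
print(u_shaped(100, 4))  # [40.0, 10.0, 10.0, 40.0]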
Advanced Models

Time Decay Attribution Model
The Time Decay Attribution Model is a popular method used in marketing analytics to attribute credit for conversions based on the timing of customer touchpoints. This model operates on the principle that touchpoints closer in time to the conversion are more influential than earlier ones.

Concept
The Time Decay model assigns more credit to marketing interactions that occur closer to the time of conversion. It's based on the rationale that these later interactions are likely more impactful in influencing the customer's final decision. It is particularly useful in long sales cycles where multiple touchpoints occur over an extended period, allowing marketers to weigh recent interactions more heavily.

Approach
Touchpoint identification: All touchpoints along the customer journey, from the first interaction to the conversion, are identified.
Time-based weighting: Each touchpoint is assigned a weight that increases as it gets closer to the conversion event. The weighting typically follows an exponential or logarithmic function, where the increase in credit allocation accelerates as the touchpoint gets closer to the moment of conversion.
Credit allocation: The model calculates the attribution by distributing the total conversion value among the touchpoints, based on their assigned weights.

A common approach to represent the Time Decay model is through an exponential decay function. If t represents the time of a touchpoint and T is the time of conversion, the weight W assigned to a touchpoint can be expressed as:

W(t) = e^(-λ(T - t))

Where:
e is the base of the natural logarithm.
λ is a decay rate constant that determines how rapidly the weight of a touchpoint decreases over time. A higher λ means a faster decay.
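To make the decay formula concrete, here is a minimal Python sketch (our own illustration, not code from the article) that computes the weights W(t), normalizes them, and splits a conversion value accordingly. The timestamps, decay rate, and conversion value are made-up example numbers.

Python
import math

def time_decay_credits(touchpoint_times, conversion_time, C, decay_rate=0.1):
    # Raw weight for each touchpoint: W(t) = e^(-lambda * (T - t))
    weights = [math.exp(-decay_rate * (conversion_time - t)) for t in touchpoint_times]
    total = sum(weights)
    # Normalize the weights and distribute the conversion value C proportionally
    return [C * w / total for w in weights]

# Example values only: touchpoints at day 0, 5, and 9; conversion on day 10; value 100
print(time_decay_credits([0, 5, 9], 10, 100, decay_rate=0.1))
# -> approximately [19.6, 32.3, 48.2]; the touchpoint closest to the conversion earns the most credit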
Markov Chain Attribution Models
The Markov Chain Model in MTA is a sophisticated method used to evaluate the effectiveness of different marketing touchpoints in a customer's journey. In MTA, the Markov Chain Model treats the customer journey as a sequence of states, corresponding to various touchpoints. The key property of a Markov Chain is that the probability of moving to the next state depends only on the current state, not on the previous states. Each touchpoint in a customer's journey is a state in the Markov Chain. The model analyzes transitions between these states to understand how customers move through the sales funnel and how each touchpoint influences their journey toward conversion.

Background
Markov Chains were developed by Andrey Markov in the early 20th century. Their adoption in marketing attribution is a relatively recent innovation, leveraging their capacity to model complex, non-linear customer journeys. The use of Markov Chains in marketing attribution became prominent with the rise of multi-channel digital marketing strategies. In an environment where customers interact with multiple marketing touchpoints across different channels before converting, traditional attribution models like last-click or first-click became insufficient. Markov Chain models offered a more dynamic and holistic view.

Algorithmic Approach
Defining states: Each unique touchpoint, along with the start, conversion, and non-conversion, is defined as a state.
Transition probability matrix: Construct a matrix that represents the probabilities of transitioning from one state (touchpoint) to another, based on historical data.
Building the chain: Use the transition probabilities to model the customer journey as a Markov Chain.
Calculating conversion probabilities: Compute the likelihood of reaching the conversion state from each touchpoint.
Assessing touchpoint influence: Analyze the impact of removing individual touchpoints on the overall conversion probability, indicating their contribution to the journey.

The table below shows the journeys of four customers and whether each converted.

Customer | Channel | Conversion
A | Email -> House Ads | No
B | Search Ads -> House Ads | Yes
C | House Ads | No
D | Search Ads -> Social | Yes

The customer journeys in the table above can be visualized as a Directed Acyclic Graph (DAG) with a probability for each transition, as below.

Figure 2: Customer Journey DAG

Removal Effect
An important aspect of Markov chain attribution is how the removal of a given touchpoint from the graph affects the likelihood of conversion. Let’s remove the Email node from the graph above to understand this behavior.

Figure 3: DAG with Email node removed

By removing Email, the conversion probability is reduced from 50% to 41.67%. The removal effect of a channel can then be calculated using the formula below:

Removal Effect(channel) = 1 - (conversion probability with the channel removed / conversion probability with all channels)

Based on this formula, the removal effect of Email can be calculated as 1 - (0.4167 / 0.50) ≈ 0.1667, or 16.67%. Similarly, the removal effect of the other channels can be calculated with the same formula, and the share of each channel is its removal effect divided by the sum of all removal effects (a small sketch reproducing this arithmetic follows this section):

Channel | Conversion Probability (channel removed) | Removal Effect | Share
House Ads | 25% | 50% | 0.273
Search Ads | 16.67% | 66.67% | 0.364
Social | 25% | 50% | 0.273
Email | 41.67% | 16.67% | 0.090

Markov Chain Limitations
Markov Chains assume that the next state (or touchpoint) only depends on the current state and not on the history of states. This assumption might not accurately represent marketing scenarios where the effect of a touchpoint could depend on previous interactions. The model often overlooks the influence of one channel on the effectiveness of another, potentially underestimating the synergistic or suppressive effects between different marketing channels. Markov Chain models also require comprehensive and granular data on customer interactions across all channels and touchpoints.
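For readers who want to reproduce the removal-effect numbers above, here is a minimal Python sketch. It is purely illustrative: the conversion probabilities are the figures quoted in the example, not derived from a full transition matrix.

Python
# Removal-effect arithmetic from the Markov example above (illustrative sketch;
# the probabilities are the figures quoted in the article, not derived here).
base_conversion = 0.50  # conversion probability with all channels present

# Conversion probability after removing each channel from the graph
without_channel = {
    "House Ads": 0.25,
    "Search Ads": 0.1667,
    "Social": 0.25,
    "Email": 0.4167,
}

removal_effect = {ch: 1 - p / base_conversion for ch, p in without_channel.items()}
total = sum(removal_effect.values())
share = {ch: effect / total for ch, effect in removal_effect.items()}

for ch in removal_effect:
    print(f"{ch}: removal effect {removal_effect[ch]:.2%}, share {share[ch]:.3f}")
# House Ads and Social get a share of ~0.273 each, Search Ads ~0.364, Email ~0.09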
Shapley Value Attribution Model
The Shapley Value Model, originating from cooperative game theory, offers a unique and equitable approach to MTA in marketing. It allocates credit to each touchpoint in a way that fairly represents its contribution to the overall success of a marketing campaign. Its goal is a fair attribution method that considers all possible combinations of touchpoints, ensuring each one gets credit proportional to its impact.

Concept and Background
Developed by Lloyd Shapley in 1953, the Shapley Value is a solution concept in cooperative game theory. It's designed to fairly distribute the payoff among players who cooperate and contribute differently to the coalition. In the scenario of a cooperative game where multiple players join forces to create coalitions, thereby increasing the chances of a successful outcome (or payoff), the Shapley value offers a method for equitably distributing the payoff among the participants. At its core, the Shapley value calculates the average contribution of each player to the coalitions they participate in. This calculation takes into account the variability in the influence (or worth) each player brings and the order in which they join the coalitions, considering that every sequence of joining has an equal chance of occurring. Therefore, players are compensated based on their contribution across all possible permutations.

When applied to marketing analytics, the players in this scenario are the various campaign channels, and the coalitions represent the different ways these channels interact and engage with accounts throughout the customer's journey. Utilizing cooperative game theory and the Shapley value, we can achieve a stable and fair measure of each channel’s influence, allocating credit for sales conversions among them proportionally to their individual contributions to the overall outcome.

Algorithmic Approach
Enumerate all possible coalitions: List all possible combinations (subsets) of touchpoints that might lead to a conversion.
Calculate the payoff of each coalition: Determine the value (e.g., conversion rate) that each subset of touchpoints achieves.
Distribute value among touchpoints: For each touchpoint, calculate its contribution across all possible coalitions it's part of, based on the difference it makes to the coalition’s value.

The Shapley Value for a touchpoint is calculated using the formula:

ϕi(v) = Σ over all S ⊆ N \ {i} of [ |S|! (|N| - |S| - 1)! / |N|! ] × ( v(S ∪ {i}) - v(S) )

Where:
ϕi(v) is the Shapley Value for touchpoint i.
N is the set of all touchpoints.
S is a subset of touchpoints excluding i.
v(S) is the payoff (value) of the subset S.
The sum is taken over all subsets S of N that don't include i.

Let’s take the example of touchpoints involved in conversion: N = { Search Ads, Social, Email }. Following is the ratio of each coalition that resulted in conversion:

Coalition | Channels | Ratio
S1 | Email | 0.04
S2 | Search Ads | 0.12
S3 | Social | 0.08
S4 | Email + Search Ads | 0.17
S5 | Social + Search Ads | 0.22
S6 | Email + Social | 0.11
S7 | Email + Search Ads + Social | 0.26

The payoff or worth of each coalition is determined by the characteristic function. In this example, the worth is represented as the sum of the conversion ratios of each coalition containing those channels. To find the payoff value of Coalition S5, use:

v(S5) = S2 + S3 + S5 = 0.12 + 0.08 + 0.22 = 0.42

So the payoff of each coalition can be calculated as shown below:

Function | Channels | Calculation | Payoff
v(S1) | Email | S1 | 0.04
v(S2) | Search Ads | S2 | 0.12
v(S3) | Social | S3 | 0.08
v(S4) | Email + Search Ads | S1 + S2 + S4 | 0.33
v(S5) | Social + Search Ads | S2 + S3 + S5 | 0.42
v(S6) | Email + Social | S1 + S3 + S6 | 0.23
v(S7) | Email + Search Ads + Social | S1 + S2 + S3 + S4 + S5 + S6 | 1.0

Understanding the value contributed by each coalition allows for the calculation of Shapley values. These are determined by averaging the incremental contribution (marginal contribution) of each channel across all possible sequences of coalition formation. Essentially, the Shapley value method offers a systematic approach to apportion the total value generated by the grand coalition (the collective payoff) among the three channels. This approach ensures a fair distribution based on the unique contribution each channel makes to the overall outcome. Indeed, the motivation behind the formulation of Shapley Values lies in accounting for the specific timing at which each channel or touchpoint joins a coalition. This timing is crucial because it affects the player's marginal contribution to the overall outcome. In essence, the Shapley Value method is about calculating each channel's incremental contribution, averaged across all potential sequences in which the channel or touchpoint could join the group. If the channel or touchpoint comes first in the order, its individual payoff is its marginal contribution; if it comes later, its marginal contribution is the payoff of the coalition formed by it and the prior touchpoints in the sequence minus the payoff of that coalition without the current channel or touchpoint.
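This permutation-based computation is easy to reproduce. Below is a minimal Python sketch, our own illustration rather than the article's code, using the payoff values from the table above; small differences from hand calculations are due to rounding in the input ratios.

Python
from itertools import permutations

channels = ["Email", "Search Ads", "Social"]

# Characteristic function: payoff v(S) for every coalition, taken from the table above (illustrative data)
payoff = {
    frozenset(): 0.0,
    frozenset(["Email"]): 0.04,
    frozenset(["Search Ads"]): 0.12,
    frozenset(["Social"]): 0.08,
    frozenset(["Email", "Search Ads"]): 0.33,
    frozenset(["Social", "Search Ads"]): 0.42,
    frozenset(["Email", "Social"]): 0.23,
    frozenset(["Email", "Search Ads", "Social"]): 1.0,
}

shapley = {ch: 0.0 for ch in channels}
orders = list(permutations(channels))
for order in orders:
    seen = set()
    for ch in order:
        # Marginal contribution of ch given the touchpoints that arrived before it in this ordering
        marginal = payoff[frozenset(seen | {ch})] - payoff[frozenset(seen)]
        shapley[ch] += marginal
        seen.add(ch)

for ch in channels:
    shapley[ch] /= len(orders)
    print(ch, round(shapley[ch], 3))
# Email ~0.267, Search Ads ~0.402, Social ~0.332; the three values sum to v(N) = 1.0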
The Shapley value is the average expected marginal contribution of one channel or touchpoint after all possible combinations have been considered. In this scenario, that involves simulating every possible order in which the touchpoints (Email, Social, and Search Ads) could engage with the customer. For each of these sequences, you assess the additional value (marginal payoff) brought by each touchpoint when it's added to the sequence. Then, by averaging these incremental values across all sequences, you obtain the Shapley Value for each touchpoint. This method ensures a fair and comprehensive evaluation of each touchpoint’s contribution by considering every possible way they could interact in the customer's journey, thereby reflecting their true value in the grand scheme of the marketing strategy. Let’s consider the grand coalition S7 and find the Shapley value to distribute the payoff to each channel based on the arrival order of each channel.

Arrival Order | Email Marginal Contribution | Social Marginal Contribution | Search Ads Marginal Contribution
Email + Social + Search | v(S1) = 0.04 | v(S6) - v(S1) = 0.19 | v(S7) - v(S6) = 0.77
Email + Search + Social | v(S1) = 0.04 | v(S7) - v(S4) = 0.67 | v(S4) - v(S1) = 0.29
Social + Email + Search | v(S6) - v(S3) = 0.15 | v(S3) = 0.08 | v(S7) - v(S6) = 0.77
Social + Search + Email | v(S7) - v(S5) = 0.58 | v(S3) = 0.08 | v(S5) - v(S3) = 0.34
Search + Email + Social | v(S4) - v(S2) = 0.21 | v(S7) - v(S4) = 0.67 | v(S2) = 0.12
Search + Social + Email | v(S7) - v(S5) = 0.58 | v(S5) - v(S2) = 0.30 | v(S2) = 0.12
Shapley Value (average marginal contribution) | 0.27 | 0.33 | 0.40

Note that the three Shapley values sum to v(S7) = 1.0, as they must.

Shapley Value Limitations
Calculating the Shapley value can be computationally expensive, especially with a large number of players (or marketing channels). The model requires the evaluation of every possible combination of players, which grows exponentially with the number of players. When direct information about specific coalitions is missing, you can use available data to estimate their values. This can be done through statistical modeling, machine learning techniques, or even simpler heuristic methods.

Bayesian Probability Models
The Bayesian Attribution Model is an advanced approach within the realm of MTA that leverages Bayesian statistics to infer the impact of various marketing touchpoints on consumer behavior and conversion rates. This model is particularly notable for its ability to handle uncertainty and integrate prior knowledge into its analytical framework.

Concept and Functionality
Bayesian Attribution is rooted in Bayesian probability, which updates the probability estimate for a hypothesis as more evidence or information becomes available. This approach is particularly useful in situations where data is incomplete or uncertain. In the context of MTA, the Bayesian model assesses the probability of conversion given the exposure to different marketing touchpoints. It updates these probabilities as new data becomes available, making it a dynamic and continuously evolving model.

Algorithmic Approach
Defining prior probabilities: Start with initial assumptions or "priors" about the effectiveness of different touchpoints. These priors can be based on historical data or expert opinion. If no prior data is available, a uniform probability distribution or other statistical methods can be used.
Collecting data: Gather data on customer interactions with various touchpoints along their journey toward a conversion.
Updating probabilities: As new data comes in, the model updates the probabilities using Bayes' Theorem. This theorem combines prior probabilities with new evidence to produce updated (posterior) probabilities.
Continuous learning: The model keeps updating its understanding of touchpoint effectiveness as more interaction data is collected, refining its insights over time.

Bayesian Attribution uses Bayes' Theorem, which in its basic form is:

P(A|B) = [ P(B|A) × P(A) ] / P(B)

Where:
P(A|B) is the posterior probability (e.g., the probability of a conversion given exposure to a specific touchpoint).
P(B|A) is the likelihood (e.g., the likelihood of observing the data given the touchpoint's effectiveness).
P(A) is the prior probability (initial assumption about the touchpoint's effectiveness).
P(B) is the marginal probability of the data.

Let's explore Bayesian Multi-Touch Attribution in a customer journey example involving Display Ads, Search Ads, Social Media, and Email Ads. Let's assume we have prior beliefs (based on historical data or expert opinions) about the effectiveness of each channel:

Channel | Prior Probability
Display Ads (A) | 0.35
Search Ads (B) | 0.30
Social (C) | 0.25
Email (D) | 0.10

Calculate the likelihood based on the data points. For example, if 70% of conversions involve Search Ads in the journey, then the likelihood of Search Ads is 0.7.

Channel | Likelihood
Display Ads (A) | 0.75
Search Ads (B) | 0.70
Social (C) | 0.50
Email (D) | 0.35

Marginal Probability = Sum(Likelihood of channel i × Prior Probability of channel i), over every channel.

Marginal Probability based on the above data = (0.75*0.35) + (0.7*0.3) + (0.5*0.25) + (0.35*0.1) = 0.633

Using Bayes' Theorem, the posterior probability of each channel can be calculated:

Channel | Posterior Probability
Display Ads (A) | (0.75*0.35)/0.633 = 0.42
Search Ads (B) | (0.7*0.3)/0.633 = 0.33
Social (C) | (0.5*0.25)/0.633 = 0.20
Email (D) | (0.35*0.10)/0.633 = 0.05

Limitations
The model's accuracy is partly dependent on the prior probabilities assigned to the effectiveness of different channels. These priors can be subjective and might skew the model if not accurately set.
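The posterior table above is straightforward to verify. Below is a minimal Python sketch of the Bayesian update; it is our own illustration, and the priors and likelihoods are the example numbers from the tables, not real campaign data.

Python
# Bayesian update for channel-level attribution (illustrative numbers from the tables above, not real data)
priors = {"Display Ads": 0.35, "Search Ads": 0.30, "Social": 0.25, "Email": 0.10}
likelihoods = {"Display Ads": 0.75, "Search Ads": 0.70, "Social": 0.50, "Email": 0.35}

# Marginal probability of the observed data: sum of likelihood * prior over all channels
marginal = sum(likelihoods[ch] * priors[ch] for ch in priors)

# Bayes' theorem: posterior = likelihood * prior / marginal
posteriors = {ch: likelihoods[ch] * priors[ch] / marginal for ch in priors}

print(round(marginal, 3))  # ~0.633
print({ch: round(p, 3) for ch, p in posteriors.items()})
# The posteriors match the table above up to rounding (~0.42, ~0.33, ~0.20, ~0.05)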
Machine Learning Attribution Models

Machine learning attribution models include regression models, decision trees, random forests, and neural networks. These models analyze complex interactions among touchpoints and can handle various types of data, including unstructured data. They are capable of handling large datasets and finding non-linear relationships, and they adapt as new data becomes available, providing continuously refined insights. Machine learning (ML) algorithms have significantly advanced the field of MTA by introducing sophisticated methods to analyze complex customer journeys. These algorithms can decipher intricate patterns in large datasets, enabling marketers to understand and attribute the impact of various touchpoints more accurately.

Concept and Application

ML algorithms in MTA use data-driven approaches to model and predict the impact of each marketing touchpoint on the customer's path to conversion. They go beyond traditional rule-based attribution models by learning from data how different touchpoints contribute to conversions. ML algorithms are employed to analyze the customer journey across multiple channels and touchpoints. They can handle vast and varied datasets, accounting for non-linear relationships and interactions among touchpoints.

Types of Machine Learning Algorithms

1. Supervised learning:
Concept: Supervised learning involves training a model on a labeled dataset, where the input (features) and the desired output (labels) are known. The model learns to map inputs to outputs.
Common algorithms: Regression models, decision trees, random forests, support vector machines, and neural networks.
2. Unsupervised learning:
Concept: Unsupervised learning finds patterns or structures in a dataset without pre-existing labels. The algorithms discover inherent groupings or associations in the data.
Examples: Clustering algorithms like K-means, hierarchical clustering, and principal component analysis (PCA).

Conclusion

Multi-Touch Attribution (MTA) has emerged as a crucial tool in modern marketing analytics, offering a sophisticated way to understand and quantify the impact of various touchpoints in a customer's journey. By moving beyond the limitations of traditional single-touch attribution models, MTA provides a more nuanced and comprehensive view of the effectiveness of different marketing channels and strategies. Its ability to distribute credit for conversions more accurately across multiple interactions helps marketers optimize their campaigns, allocate budgets efficiently, and tailor customer experiences more effectively. However, the complexity and data-intensive nature of MTA models, along with the need for advanced analytical skills, mean that their implementation can be challenging. Despite these challenges, the insights gained from MTA are invaluable for businesses looking to navigate the complex, multi-channel landscape of modern digital marketing. As technology advances and data becomes more accessible, MTA is likely to become even more integral to effective marketing strategy development and evaluation.

References

Kakalejčík, L., Bucko, J., Resende, P.A., and Ferencova, M., 2018. Multichannel marketing attribution using Markov chains. Journal of Applied Management and Investments, 7(1), pp. 49-60.
Zhao, K., Mahboobi, S.H., and Bagheri, S.R., 2018. Shapley value methods for attribution modeling in online advertising. arXiv preprint arXiv:1804.05327.
Sinha, R., Arbour, D., and Puli, A.M., 2022. Bayesian modeling of marketing attribution. arXiv preprint arXiv:2205.15965.
Berman, R., 2018. Beyond the last touch: Attribution in online advertising. Marketing Science, 37(5), pp. 771-792.
Romero Leguina, J., Cuevas Rumín, Á., and Cuevas Rumín, R., 2020. Digital marketing attribution: Understanding the user path. Electronics, 9(11), p. 1822.
One Feature Attribution Method to (Supposedly) Rule Them All: Shapley Values
Data-Driven Marketing Attribution
Markov Chain Attribution Modeling [Complete Guide]
In the contemporary data landscape, characterized by vast volumes of diverse data sources, the need for anomaly detection intensifies. As organizations aggregate substantial datasets from disparate origins, the identification of anomalies plays a pivotal role in reinforcing security protocols, streamlining operational workflows, and upholding stringent quality standards. Through the application of sophisticated methodologies encompassing statistical analysis, machine learning, and data visualization, anomaly detection emerges as a potent instrument for uncovering latent insights, mitigating risks, and facilitating real-time decision-making.

This article centers on a focused application scenario: the detection of anomalies within a video/audio streaming platform to gauge real-time content delivery quality. Our objective is clear: to assess the quality of streaming video/audio content, ultimately enhancing the customer experience. Central to this discussion is the use of Quality of Service (QoS) metrics, complemented by GEO-IP services, to enrich data capture and enable proactive monitoring, detection, and intervention.

What Is Quality of Service?

Quality of Service (QoS) refers to the measurement of the precision and reliability of the services provided to a platform, assessed through various metrics. It is a commonly employed concept in networking circles to ensure the optimal performance of a platform. This article focuses on establishing QoS metrics tailored specifically for video or audio content. We achieve this by extracting the necessary metrics at the client edge (customer devices) and enriching their attributes to provide deeper insights for business purposes.

Why Quality of Service?

The importance of quality of service lies in its ability to fulfill the specific needs of consumers. For instance, when customers are enjoying a live sports event through OTT streaming platforms like YouTube, it becomes paramount for the streaming company to assess the video quality across various regions. This necessity extends beyond video streaming to other sectors such as podcasting, audiobooks, and even award-show streaming services.

How QoS Metrics Can Help in Anomaly Detection

Integral to anomaly detection, QoS metrics furnish the data and insights needed to pinpoint abnormal behavior and potential security risks across applications, systems, and networks. Continuous monitoring of metrics such as buffering ratio, bandwidth, and throughput enables the detection of anomalies through deviations from established thresholds or behavioral patterns, triggering alerts for swift intervention. Furthermore, QoS metrics facilitate root cause analysis by pinpointing the underlying causes of anomalies, guiding the formulation of effective corrective actions.

We need to design a solution that identifies anomalies in three states (New York, New Jersey, and Tamil Nadu) for a streaming platform and ensures smooth streaming quality. We will leverage AWS components to complement this solution.

How Can We Solve This Problem Using Streaming Architecture?

To comprehensively analyze the situation, we require additional attributes beyond just geographical location. For instance, in cases of streaming quality issues, organizations must ascertain whether the problem stems from the Internet Service Provider or is linked to recent code releases, potentially affecting specific operating systems on devices.
Overall, there is a need for a Quality of Service (QoS) API service capable of collecting pertinent data from client devices and relaying it to an API, which in turn disseminates these attributes to downstream components. With the initial details provided by the client, the downstream components can enrich the dataset. The JSON object below illustrates the basic information transmitted by the client device for a single event.

Sample JSON event from client device:

JSON
{
  "video_start_time": "2023-09-10 10:30:30",
  "video_end_time": "2023-09-10 10:30:33",
  "total_play_time_mins": "60",
  "uip": "10.122.9.22",
  "video_id": "xxxxxxxxxxxxxxx",
  "device_type": "ios",
  "device_model": "iphone11"
}

Architecture Option 1

The application code on the device can call the API Gateway, linked to a Kinesis proxy, which connects to a Kinesis stream. This setup facilitates near real-time analysis of client data at this layer. Subsequently, data transformation can occur in a Lambda function, followed by storage in S3 for further analysis. This architecture addresses two primary use cases: first, the capability to analyze incoming QoS data in near real time through the Kinesis stream, leveraging AWS tools like Kinesis Analytics for ad hoc analytics with reduced latency; second, the ability to write data to S3 with simple Lambda code so that batch analytics can be conducted. This approach effectively addresses scalability concerns in a streaming solution by leveraging various AWS components. In our specific use case, enriching incoming data with geo-IP attributes is essential, since we need information like country, state, and ISP. To achieve this, we can utilize a geo-IP service, such as MaxMind, to incorporate geo-location, IP address, and other relevant dimensions. Alternatively, let's explore an architecture that assumes analytics are performed every minute, eliminating the need for a streaming layer and focusing solely on a delivery layer.

Architecture Option 2

In this scenario, we'll illustrate the process of enriching data with geo and ISP-specific attributes to facilitate anomaly detection. Clients initiate the process by calling the API Gateway and passing along the relevant attributes. These values are then transmitted to Kinesis Firehose via the Kinesis proxy. A transformation Lambda function within Kinesis Firehose executes a straightforward Python script to retrieve geo-IP details from the MaxMind service. Subsequently, Kinesis Firehose batches the data and transfers it to S3. S3 serves as the central source of truth for anomaly detection, housing all the data needed for analysis. Below is a sample code snippet for calling the service to retrieve geo-IP details. As depicted, the code primarily centers on retrieving information from the MaxMind .mmdb file supplied by the provider. Various methods exist for obtaining geo-IP data; in this instance, I've chosen to make the .mmdb file accessible via an S3 path. Alternatively, you can opt to retrieve it through API calls. The enriched data is then returned to Kinesis Firehose, where it undergoes batching, compression, and subsequent delivery to S3.
Python
import base64
import json
import urllib.request

import geoip2.database

s3_city_url = "<maxmind_s3_url_path_for_city_details_mmdb_file>"
s3_isp_url = "<maxmind_s3_url_path_for_isp_details_mmdb_file>"

# Download the MaxMind databases once per Lambda container and open readers.
urllib.request.urlretrieve(s3_city_url, "/tmp/city.mmdb")
urllib.request.urlretrieve(s3_isp_url, "/tmp/isp.mmdb")
city_reader = geoip2.database.Reader("/tmp/city.mmdb")
isp_reader = geoip2.database.Reader("/tmp/isp.mmdb")


def qos_handler(event, context):
    def enrich_record(record):
        try:
            # Firehose delivers each record base64-encoded.
            decoded_data = base64.b64decode(record["data"])
            streaming_event = json.loads(decoded_data.decode("utf-8"))

            # Look up geo and ISP attributes for the client IP.
            city_response = city_reader.city(streaming_event["uip"])
            isp_response = isp_reader.isp(streaming_event["uip"])

            streaming_event["cityname"] = city_response.city.name
            streaming_event["postalcode"] = city_response.postal.code
            streaming_event["metrocode"] = city_response.location.metro_code
            streaming_event["timezone"] = city_response.location.time_zone
            streaming_event["countryname"] = city_response.country.name
            streaming_event["countryisocode"] = city_response.country.iso_code
            streaming_event["origip"] = streaming_event["uip"]
            streaming_event["ispname"] = isp_response.isp

            json_data = json.dumps(streaming_event)
            encoded_data = base64.b64encode(json_data.encode("utf-8"))
            return {
                "recordId": record["recordId"],
                "result": "Ok",
                "data": encoded_data.decode("utf-8"),
            }
        except Exception as e:
            print("enrichment failed:", e)
            # Tell Firehose the record could not be transformed.
            return {
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            }

    output = list(map(enrich_record, event["records"]))
    return {"records": output}

Analytics on Streamed Data

After the data reaches S3, we can conduct ad hoc analytics on it. Various options are available for analyzing the data once it resides in S3. It can be loaded into a data warehousing platform such as Redshift or Snowflake. Alternatively, if a data lake or data mesh serves as the source of truth, the data can be replicated there. During the analysis in S3, we primarily calculate the buffering ratio using the following formula:

Plain Text
Buffering_ratio = buffering_time / total_play_time

In our example:

"video_start_time":"2023-09-10 10:30:30",
"video_end_time":"2023-09-10 10:30:33",
"total_play_time_mins" : "60",

Buffering_ratio = diff(video_end_time, video_start_time) / total_play_time
Buffering_ratio = 3 seconds / 3,600 seconds ≈ 0.00083

Detecting Anomalies

To continue further, the following attributes will be available as rows in tabular format during the ETL operation at the data warehousing (DWH) stage. These values will be stored for each video/audio ID. By establishing a materialized view over the records stored during a certain period, we can compute the average and percentile values of the buffering ratio metric mentioned earlier.

Sample JSON event with buffering ratio:

JSON
{
  "video_start_time": "2023-09-10 10:30:30",
  "video_end_time": "2023-09-10 10:30:33",
  "total_play_time_mins": "60",
  "uip": "10.122.9.22",
  "video_id": "xxxxxxxxxxxxxxx",
  "device_type": "ios",
  "device_model": "iphone11",
  "buffering_ratio": "0.00083",
  "isp": "isp1",
  "country": "USA",
  "state": "NJ"
}

For simplicity, let's focus on one metric — buffering ratio — to gauge the streaming quality of sports matches or podcasts for customers.
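To make the arithmetic concrete, here is a small Python sketch that derives the buffering ratio from the sample event above, treating the start/end difference as buffering time just as the formula does:

Python
from datetime import datetime

event = {
    "video_start_time": "2023-09-10 10:30:30",
    "video_end_time": "2023-09-10 10:30:33",
    "total_play_time_mins": "60",
}

def buffering_ratio(evt):
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(evt["video_start_time"], fmt)
    end = datetime.strptime(evt["video_end_time"], fmt)
    buffering_seconds = (end - start).total_seconds()       # 3 seconds
    play_seconds = float(evt["total_play_time_mins"]) * 60  # 3,600 seconds
    return buffering_seconds / play_seconds

print(round(buffering_ratio(event), 5))  # 0.00083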
After capturing the real-time events and visualizing the tabular data, it is clear that NY exhibits the highest buffering ratio of the three states the organization is interested in, indicating that viewers there may experience sluggish content delivery. This observation prompts further investigation into potential ISP or networking issues by delving into the other dimensions gathered from GEO-IP and device attributes. As a first step, content providers drill down into geographical dimensions at the city level and identify that Manhattan has the highest buffering ratio among the top three cities in NY. Following this, content providers examine the metrics associated with internet service provider (ISP) details specifically for Manhattan to identify potential causes. This examination uncovers that ISP1 exhibited a higher buffering ratio, and upon further investigation, it appears that ISP1 encountered internet speed issues only in Manhattan. These proactive analyses empower content providers to detect anomalies, evaluate their repercussions on consumers in particular regions, and proactively reach out to affected consumers. Comparable analyses can be extended to other factors such as device types and models. These steps demonstrate how anomaly detection can be carried out with robust data engineering, streaming solutions, and business intelligence in place. This data, in turn, can also feed machine learning algorithms for enhanced detection.

Conclusion

This article delved into leveraging QoS metrics for anomaly detection during content streaming in video or audio applications. A particular emphasis was placed on enriching data with GEO-IP details using the MaxMind service, facilitating issue triage across specific dimensions such as country, state, city, or ISP. Architectural options were also presented for implementing streaming solutions, accommodating both ad hoc near real-time and batch analytics to pinpoint anomalies. I trust this article serves as a helpful starting point for exploring anomaly detection approaches within your organization. Notably, the discussed solution extends beyond OTT platforms and is applicable to diverse domains, such as the financial sector, where near real-time anomaly detection is essential.
Businesses can react quickly and effectively to user behavior patterns by using real-time analytics. This allows them to take advantage of opportunities that might otherwise pass them by and to prevent problems from getting worse. Apache Kafka, a popular event streaming platform, can be used for real-time ingestion of data/events generated from various sources across multiple verticals such as IoT, financial transactions, inventory, etc. This data can then be streamed into multiple downstream applications or engines for further processing and eventual analysis to support decision-making. Apache Flink serves as a powerful engine for refining or enhancing streaming data by modifying, enriching, or restructuring it upon arrival at a Kafka topic. In essence, Flink acts as a downstream application that continuously consumes data streams from Kafka topics for processing and then ingests the processed data into various Kafka topics. Eventually, Apache Druid can be integrated to consume the processed streaming data from Kafka topics for analysis, querying, and making instantaneous business decisions.

In my previous write-up, I explained how to integrate Flink 1.18 with Kafka 3.7.0. In this article, I will outline the steps to transfer processed data from Flink 1.18.1 to a Kafka 2.13-3.7.0 topic. A separate article detailing the ingestion of streaming data from Kafka topics into Apache Druid for analysis and querying was published a few months ago. You can read it here.

Execution Environment

We configured a multi-node cluster (three nodes) where each node has a minimum of 8 GB RAM and a 250 GB SSD, with Ubuntu 22.04.2 amd64 as the operating system.
OpenJDK 11 is installed with the JAVA_HOME environment variable configured on each node.
Python 3 or Python 2, along with Perl 5, is available on each node.
A three-node Apache Kafka 3.7.0 cluster is up and running with Apache ZooKeeper 3.5.6 on two nodes.
Apache Druid 29.0.0 has been installed and configured on the node in the cluster where ZooKeeper has not been installed for the Kafka broker. ZooKeeper has been installed and configured on the other two nodes. The leader broker is up and running on the node where Druid is running.
A simulator was developed using the Datafaker library to produce fake financial transactional JSON records every 10 seconds and publish them to the created Kafka topic. Here is the JSON data feed generated by the simulator:

JSON
{
  "timestamp": "2024-03-14T04:31:09Z",
  "upiID": "9972342663@ybl",
  "name": "Kiran Marar",
  "note": " ",
  "amount": "14582.00",
  "currency": "INR",
  "geoLocation": "Latitude: 54.1841745 Longitude: 13.1060775",
  "deviceOS": "IOS",
  "targetApp": "PhonePe",
  "merchantTransactionId": "ebd03de9176201455419cce11bbfed157a",
  "merchantUserId": "65107454076524@ybl"
}

The Apache Flink-1.18.1-bin-scala_2.12.tgz archive was extracted on the node where neither Druid nor the leader Kafka broker is running.

Running a Streaming Job in Flink

We will dig into the process of extracting data from a Kafka topic where incoming messages are published by the simulator, performing processing tasks on it, and then reintegrating the processed data back into a different topic of the multi-node Kafka cluster. We developed a Java program (StreamingToFlinkJob.java) that was submitted as a job to Flink to perform these steps, considering a window of 2 minutes and calculating the average amount transacted from the same mobile number (UPI ID) on the simulated UPI transactional data stream.
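The simulator itself is a Java program built with Datafaker; as a rough stand-in, here is a minimal Python sketch of an equivalent producer using the kafka-python and Faker packages (the broker address, topic name, and field generators are illustrative assumptions):

Python
import json
import random
import time
from datetime import datetime, timezone

from faker import Faker          # pip install faker
from kafka import KafkaProducer  # pip install kafka-python

fake = Faker("en_IN")
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed leader broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    event = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "upiID": f"{fake.msisdn()[3:]}@ybl",
        "name": fake.name(),
        "amount": f"{random.uniform(10, 20000):.2f}",
        "currency": "INR",
        "deviceOS": random.choice(["IOS", "Android"]),
        "targetApp": random.choice(["PhonePe", "GPay", "Paytm"]),
    }
    producer.send("upi-transactions", event)  # assumed input topic name
    time.sleep(10)  # one fake record every 10 seconds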
The following list of JAR files has been included in the project build path or classpath.

Using the code below, we can get the Flink execution environment inside the developed Java class:

Java
Configuration conf = new Configuration();
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

Now we should read the messages/stream that have already been published by the simulator to the Kafka topic inside the Java program. Here is the code block:

Java
KafkaSource<UPITransaction> kafkaSource = KafkaSource.<UPITransaction>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS) // IP address with port 9092 where the leader broker is running in the cluster
    .setTopics(IKafkaConstants.INPUT_UPITransaction_TOPIC_NAME)
    .setGroupId("upigroup")
    .setStartingOffsets(OffsetsInitializer.latest())
    .setValueOnlyDeserializer(new KafkaUPISchema())
    .build();

To retrieve information from Kafka, setting up a deserialization schema within Flink is crucial for processing events in JSON format, converting raw data into a structured form. Importantly, setParallelism needs to be set to the number of Kafka topic partitions; otherwise, the watermark won't work for the source, and data is not released to the sink.

Java
DataStream<UPITransaction> stream = env.fromSource(
    kafkaSource,
    WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(2)),
    "Kafka Source").setParallelism(1);

With successful event retrieval from Kafka, we can enhance the streaming job by incorporating processing steps. The subsequent code snippet reads Kafka data, organizes it by mobile number (upiID), and computes the average amount per mobile number. To accomplish this, we developed a custom window function for calculating the average and implemented watermarking to handle event-time semantics effectively. Here is the code snippet:

Java
SerializableTimestampAssigner<UPITransaction> sz = new SerializableTimestampAssigner<UPITransaction>() {
    @Override
    public long extractTimestamp(UPITransaction transaction, long l) {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            Date date = sdf.parse(transaction.eventTime);
            return date.getTime();
        } catch (Exception e) {
            return 0;
        }
    }
};

WatermarkStrategy<UPITransaction> watermarkStrategy = WatermarkStrategy
    .<UPITransaction>forBoundedOutOfOrderness(Duration.ofMillis(100))
    .withTimestampAssigner(sz);
DataStream<UPITransaction> watermarkDataStream = stream.assignTimestampsAndWatermarks(watermarkStrategy);

// Instead of event time, we could use windows based on processing time (TumblingProcessingTimeWindows).
DataStream<TransactionAgg> groupedData = watermarkDataStream
    .keyBy("upiID")
    .window(TumblingEventTimeWindows.of(Time.minutes(2)))
    .apply(new TransactionAgg());

Eventually, the processing logic (computing the average amount for the same UPI ID, i.e., mobile number, over a 2-minute window on the continuous transaction stream) is executed inside Flink. Here is the code block for the window function that calculates the average amount for each UPI ID or mobile number.
Java
public class TransactionAgg implements WindowFunction<UPITransaction, TransactionAgg, Tuple, TimeWindow> {

    // Output fields carried by this aggregate (declarations added for completeness).
    public String upiID;
    public long eventTime;
    public Integer avgAmount;

    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<UPITransaction> values, Collector<TransactionAgg> out) {
        Integer sum = 0; // Consider whole numbers only
        int count = 0;
        String upiID = null;
        for (UPITransaction value : values) {
            sum += value.amount;
            upiID = value.upiID;
            count++;
        }
        TransactionAgg output = new TransactionAgg();
        output.upiID = upiID;
        output.eventTime = window.getEnd();
        output.avgAmount = sum / count;
        out.collect(output);
    }
}

We have processed the data. The next step is to serialize the object and send it to a different Kafka topic. Add a KafkaSink in the developed Java code (StreamingToFlinkJob.java) to send the processed data from the Flink engine to a different Kafka topic created on the multi-node Kafka cluster. Here is the code snippet to serialize the object before sending/publishing it to the Kafka topic:

Java
public class KafkaTransactionSinkSchema implements KafkaRecordSerializationSchema<TransactionAgg> {

    private final String topic;
    private static final ObjectMapper objectMapper = new ObjectMapper(); // Jackson mapper (declaration added for completeness)

    public KafkaTransactionSinkSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(TransactionAgg aggTransaction, KafkaSinkContext context, Long timestamp) {
        try {
            return new ProducerRecord<>(
                topic,
                null, // partition not specified, so set to null
                aggTransaction.eventTime,
                aggTransaction.upiID.getBytes(),
                objectMapper.writeValueAsBytes(aggTransaction));
        } catch (Exception e) {
            throw new IllegalArgumentException("Exception on serializing record: " + aggTransaction, e);
        }
    }
}

And below is the code block to sink the processed data back to a different Kafka topic:

Java
KafkaSink<TransactionAgg> sink = KafkaSink.<TransactionAgg>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS)
    .setRecordSerializer(new KafkaTransactionSinkSchema(IKafkaConstants.OUTPUT_UPITRANSACTION_TOPIC_NAME))
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();
groupedData.sinkTo(sink); // the TransactionAgg DataStream created above
env.execute();

Connecting Druid With the Kafka Topic

In this final step, we need to integrate Druid with the Kafka topic to consume the processed data stream that is continuously published by Flink. With Apache Druid, we can connect to Apache Kafka directly so that real-time data can be ingested continuously and subsequently queried to make business decisions on the spot, without involving any third-party system or application. Another beauty of Apache Druid is that we do not need to configure or install any third-party UI application to view the data that lands on or is published to the Kafka topic. To keep this article concise, I omitted the steps for integrating Druid with Apache Kafka. However, a few months ago I published an article on this topic (linked earlier in this article); you can read it and follow the same approach.

Final Note

The provided code snippets are for understanding purposes only. They illustrate the sequential steps of consuming messages/data streams from a Kafka topic, processing the consumed data, and eventually sending/pushing the modified data into a different Kafka topic, which allows Druid to pick up the modified data stream for querying and analysis as a final step. We will upload the entire codebase to GitHub later if you are interested in executing it on your own infrastructure. I hope you enjoyed reading this. If you found this article valuable, please consider liking and sharing it.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Artificial intelligence (AI) has evolved from a futuristic idea into a fundamental aspect of contemporary software development. This evolution has introduced significant milestones, reshaping both our interactions with technology and the methodologies of software creation. This article delves into AI's impact on the realm of software development, highlighting how professionals can adapt to and thrive amidst these transformative changes.

Positive Impacts of AI on Developers' Jobs

AI excels in automating repetitive tasks, ranging from code generation to intricate testing and deployment procedures. Tools like Jenkins and Azure DevOps streamline deployments, enhancing reliability and efficiency, while AI-driven IDEs provide real-time code analysis and bug detection, elevating coding precision and speed. In addition, the advent of AI-assisted tools marks a significant advancement, improving not only coding but also project management.

Negative Impacts of AI on Developers' Jobs

Despite AI's benefits, there's apprehension over job displacement, with predictions suggesting a significant portion of programming roles may become automated. Additionally, the sophistication of AI systems introduces complexity and necessitates a higher level of expertise, potentially sidelining those without specialized knowledge in AI and machine learning (ML). Some AI tools are now capable of generating complex code structures, which may reduce the need for entry-level programming jobs. According to researchers from OpenAI and the University of Pennsylvania, it is predicted that 80% of the U.S. workforce could see an effect on at least 10% of their tasks. Furthermore, as AI systems become more sophisticated, the complexity of understanding and maintaining these systems increases. For example, the development and maintenance of AI models in platforms like Google's TensorFlow or OpenAI's GPT-3 require specialized knowledge of ML, a skill set not all developers possess. Lastly, heavy reliance on AI tools can lead to a scenario where developers lack a deep understanding of the underlying code, leading to challenges in troubleshooting and customization.

The Challenge of Staying Up to Date

The fast-paced nature of AI advancements means that tools and techniques can quickly become outdated. For instance, ML frameworks are continuously updated, requiring developers to constantly learn new methodologies. This was evident when TensorFlow 2.0 was released with significant changes from its predecessor, requiring developers to adapt quickly. The need for continuous learning can be overwhelming, especially for developers who are already managing a full workload. The pace of change can lead to skill gaps, as seen in industries like finance and healthcare, where the adoption of AI has outpaced the workforce's ability to keep up with new technologies.

Balancing AI and Human Skills in Development

While AI is unparalleled in its ability to sift through and analyze extensive datasets, it's the human element — creativity, intuition, and ethical foresight — that propels truly innovative solutions. The realm of video gaming serves as a prime example of innovation through creativity, where AI assists in crafting intricate environments and behaviors.
Yet it's the human touch that weaves the captivating storylines, character arcs, and overall design, reflecting a deep understanding of narrative and emotional engagement. Striking the right balance in ethical considerations and decision-making is imperative. Particularly in healthcare, AI's capacity to sift through patient data and recommend treatments is revolutionary. However, it's the human practitioner's role to weigh these suggestions within an ethical framework and make the final call on patient care, ensuring that technology serves humanity's best interests.

AI: A Collaborative Companion, Not a Competitor

Viewing AI as an ally in the development process is crucial for leveraging its full potential without undermining the value of human expertise. For example:

In cybersecurity, AI's efficiency in identifying threats is invaluable. Nonetheless, it's the human expert's critical thinking and contextual judgment that are irreplaceable in formulating an appropriate response to these threats.
The advent of collaborative robots (cobots) in manufacturing illustrates the harmonious blend of AI's precision with human dexterity and adaptability, enhancing productivity and safety.

The Symbiotic Relationship Between AI and Human Intelligence

Collaboration between human intelligence and AI's capabilities offers a balanced approach to solving complex challenges, leveraging the strengths of both. In financial sectors, AI excels at processing and analyzing market data to unearth trends. Yet it's the nuanced interpretation and strategic decision-making by humans, considering broader economic and geopolitical factors, that drive impactful outcomes. Leading tech firms, including Google and IBM, underscore the necessity of human oversight in AI's evolution. This ensures that AI technologies not only advance in capabilities but also align with ethical standards and human values, fostering a tech ecosystem that respects and enhances human dignity and welfare. The integration of AI in software development is not about displacing human roles but enriching them. By valuing the unique contributions of human creativity, ethical judgment, and strategic thinking alongside AI's analytical prowess, we pave the way for a future where technology amplifies human potential, driving forward innovation in a manner that is both ethical and impactful.

Leveraging AI for Innovation

The role of AI in software development transcends mere efficiency improvements, acting as a pivotal force for innovation. AI empowers developers to extend the realms of feasibility, facilitating the creation of software solutions that are more advanced, intuitive, and impactful.

AI-Driven Creative Problem-Solving

AI's unparalleled data processing and analysis capabilities unlock novel approaches to creative problem-solving within software development. Take, for example, predictive analytics for enhanced consumer insights: in the e-commerce domain, AI algorithms predict consumer behavior, allowing businesses to customize their offerings. A notable illustration is Amazon's recommendation system, which leverages AI to analyze consumer interactions and tailor shopping experiences accordingly. Additionally, AI has significantly advanced natural language processing (NLP), enabling the development of user interfaces that mimic human conversation. Siri by Apple exemplifies this, utilizing NLP to interpret and respond to user inquiries in a conversational manner.
Pioneering New Software Solutions With AI

AI's application spans a diverse array of industries, driving the development of innovative software solutions. AI plays a crucial role in healthcare by enabling the early detection of diseases and personalizing medical treatments. Google's DeepMind, for instance, has developed algorithms capable of identifying eye diseases from retinal scans, marking a significant leap forward in medical diagnostics. In the fintech sector, AI-driven algorithms offer automated trading systems that meticulously analyze market data to execute trades strategically, optimizing financial outcomes.

Illustrative Case Studies of AI in Action

The integration of AI in real-world development projects showcases its potential to redefine industry standards.

Table 1. Case studies of AI in action

| Sector | Example |
|---|---|
| Automotive | Tesla's Autopilot system exemplifies AI's capacity to innovate, employing ML to interpret sensor data for autonomous driving decisions. This represents a harmonious blend of AI's analytical prowess with advanced software engineering techniques. |
| Entertainment | Netflix leverages AI for content recommendation and optimization, analyzing viewer preferences to personalize content and guide original production decisions. This not only enhances the user experience but also optimizes content creation strategies. |
| Retail operations | Walmart's application of AI in managing inventory and enhancing customer service demonstrates its transformative impact. AI enables Walmart to adjust stock levels dynamically and offer personalized shopping experiences, showcasing the broad applicability and potential of AI across different market segments. |

Overcoming Challenges in AI Adoption

The journey toward integrating AI into software development is fraught with unique challenges. Addressing these effectively demands a strategic focus on education, skill acquisition, and adherence to ethical standards.

Bridging the Skills Divide Through Education and Training

The swift evolution of AI technologies has precipitated a notable skills gap within the industry, necessitating a concerted effort toward continuous education and specialized training. This commitment to education may encompass engaging in specialized online courses, participating in workshops, and becoming actively involved in AI development communities to stay abreast of the latest trends and tools. Giants like IBM and Microsoft have forged alliances with academic institutions, offering AI and machine learning courses and certifications. These initiatives aim to arm developers with the expertise needed to harness AI technologies effectively. Meanwhile, Google has set a precedent with its internal AI training programs, ensuring its workforce remains at the forefront of AI advancements by familiarizing them with the latest tools and methodologies. The future will demand developers who blend AI proficiency with a broad spectrum of skills, including ethical considerations in AI, data science, and specialized industry knowledge. This holistic skill set will enable developers to leverage AI effectively across various application domains.

Simplifying AI Adoption Through Accessible Tools and Resources

The intricacies of AI tools and frameworks present a significant hurdle, particularly for newcomers to the field. Mastery over these technologies necessitates a considerable investment of time and resources. Efforts by companies with platforms such as Amazon SageMaker exemplify the industry's move toward simplifying AI application development.
These platforms streamline the process of building, training, and deploying machine learning models, making AI more accessible. The open-source ecosystem also plays a pivotal role in democratizing AI adoption. Tools like TensorFlow and PyTorch are bolstered by extensive documentation and a supportive community, facilitating a smoother learning curve for developers.

Upholding Data Privacy and Security

In an era where AI systems frequently handle sensitive data, ensuring privacy and security is imperative. Adhering to stringent regulations such as GDPR and HIPAA is non-negotiable. IBM's AI ethics guidelines offer a blueprint for crafting AI solutions that honor privacy and security principles. The healthcare industry exemplifies the critical importance of data privacy, too. Firms like Epic Systems have integrated AI into their offerings while strictly complying with patient privacy regulations, setting a standard for ethical AI deployment. Overcoming the hurdles associated with AI adoption in software development is an endeavor that extends beyond mere technical implementation. It encompasses a holistic approach involving educational outreach, simplification of technological complexities, and a steadfast commitment to ethical practices. By addressing these facets, the industry can pave the way for a future where AI augments development processes in a manner that is both responsible and inclusive.

The Future of AI in Development

The trajectory of AI in software development is set toward groundbreaking shifts, fueled by relentless technological advancements and broader AI integration across diverse sectors. This forward-looking perspective offers insights into potential developments and the opportunities they may unveil.

Emerging AI Trends and Future Directions

As AI becomes increasingly entrenched in software development, we stand on the cusp of significant innovations. Innovations by AI platforms illustrate the future of AI in enhancing code quality: these tools are set to extend beyond mere error detection to offer actionable recommendations for optimization, potentially setting new standards for coding efficiency and robustness. And in an era of evolving cyber threats, AI's capacity to preemptively identify and mitigate security risks will be indispensable. Future AI systems are expected to proactively counteract threats, offering a dynamic shield against cyber vulnerabilities. The future of AI in software development is not merely an extension of its current state but a revolution in how we conceive, develop, and optimize software. As we look ahead, the integration of AI promises not only to streamline development processes but also to inspire innovations that were previously unimaginable. The key to thriving in this evolving landscape lies in embracing continuous learning and interdisciplinary expertise, ensuring developers remain at the forefront of this technological renaissance.

Conclusion

The integration of AI in software development marks a transformative era, bringing both unparalleled opportunities and significant challenges. As innovative, AI-driven solutions reshape the development landscape, it becomes imperative for developers to commit to continuous education in order to balance AI's advanced capabilities with the irreplaceable nuances of human creativity and ethical judgment. Embracing this AI-centric future means not just leveraging its power for efficiency and innovation, but also navigating its complexities with a focus on sustainable and responsible development.
Ultimately, the synergy between human intellect and artificial intelligence will define the next frontier in software development, leading to a more efficient, creative, and ethically grounded technological future.

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering. Read the Free Report
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

AI continues to transform businesses, but this leads to enterprises facing new challenges in terms of digital transformation and organizational change. Based on a 2023 Forbes report, those challenges can be summarized as follows:

Companies whose analytical tech stacks are built around analytical/batch workloads need to start adapting to real-time data processing (Forbes). This change affects not only the way data is collected; it also calls for new data processing and data analytics architectural models.
AI regulations need to be considered as part of AI/ML architectural models. According to Forbes, "Gartner predicts that by 2025, regulations will force companies to focus on AI ethics, transparency, and privacy." Hence, those platforms will need to comply with upcoming standards.
Specialized AI teams must be built, capable not only of building and maintaining AI platforms but also of collaborating with other teams to support models' lifecycles through those platforms.

The answer to these new challenges seems to be MLOps, or machine learning operations. MLOps builds on top of DevOps and DataOps as an attempt to facilitate machine learning (ML) applications and a way to better manage the complexity of ML systems. The goal of this article is to provide a systematic overview of MLOps architectural challenges and demonstrate ways to manage that complexity.

MLOps Application: Setting Up the Use Case

For this article, our example use case is a financial institution that has been conducting macroeconomic forecasting and investment risk management for years. Currently, the forecasting process is based on partially manual loading and postprocessing of external macroeconomic data, followed by statistical modeling using various tools and scripts based on personal preferences. However, according to the institution's management, this process is no longer acceptable due to recently announced banking regulations and security requirements. In addition, the delivery of calculated results is too slow and financially unacceptable compared to competitors in the market. Investment in a new digital solution requires a good understanding of the complexity and the expected cost. It should start with gathering requirements and subsequently building a minimum viable product.
Requirements Gathering

For solution architects, the design process starts with a specification of the problems that the new architecture needs to solve — for example:

Manual data collection is slow, error prone, and requires a lot of effort.
Real-time data processing is not part of the current data loading approach.
There is no data versioning; hence, reproducibility is not supported over time.
The model's code is triggered manually on local machines and constantly updated without versioning.
Data and code sharing via a common platform is completely missing.
The forecasting process is not represented as a business process; all the steps are distributed and unsynchronized, and most of them require manual effort.
Experiments with the data and models are not reproducible and not auditable.
Scalability is not supported in the case of increased memory consumption or CPU-heavy operations.
Monitoring and auditing of the whole process are currently not supported.

The following diagram demonstrates the four main components of the new architecture: a monitoring and auditing platform, a model deployment platform, a model development platform, and a data management platform.

Figure 1. MLOps architecture diagram

Platform Design Decisions

The two main strategies to consider when designing an MLOps platform are:

Developing from scratch vs. selecting a platform
Choosing between a cloud-based, on-premises, or hybrid model

Developing From Scratch vs. Choosing a Fully Packaged MLOps Platform

Building an MLOps platform from scratch is the most flexible solution. It provides the possibility to address any future needs of the company without depending on other companies and service providers. It is a good choice if the company already has the required specialists and trained teams to design and build an ML platform. A prepackaged solution is a good option for modeling a standard ML process that does not need many customizations. One option could even be to buy a pretrained model (e.g., model as a service), if available on the market, and build only the data loading, monitoring, and tracking modules around it. The disadvantage of this type of solution is that if new features need to be added, it might be hard to get those additions on time. Buying a platform as a black box often requires building additional components around it. An important criterion to consider when choosing a platform is the possibility of extending or customizing it.

Cloud-Based, On-Premises, or Hybrid Deployment Model

Cloud-based solutions are already on the market, with popular options provided by AWS, Google, and Azure. In the absence of strict data privacy requirements and regulations, cloud-based solutions are a good choice due to the unlimited infrastructure resources for model training and model serving. An on-premises solution is preferable under very strict security requirements or if the infrastructure is already available within the company. A hybrid solution is an option for companies that already have part of the systems built but want to extend them with additional services — e.g., to buy a pretrained model and integrate it with locally stored data or incorporate it into an existing business process model.

MLOps Architecture in Practice

The financial institution from our use case does not have enough specialists to build a professional MLOps platform from scratch, but it also does not want to invest in an end-to-end managed MLOps platform due to regulations and additional financial restrictions.
The institution's architectural board has decided to adopt an open-source approach and buy tools only when needed. The architectural concept is built around the idea of developing minimalistic components and a composable system. The general idea is built around microservices covering non-functional requirements like scalability and availability. Striving for maximal simplicity of the system, the following decisions were made for the system components.

Data Management Platform

The data collection process will be fully automated. There will be a separate data loading component for each data source due to the heterogeneity of external data providers. The database choice is crucial when it comes to writing real-time data and reading large amounts of data. Due to the time-based nature of the macroeconomic data and the institution's available relational database specialists, they chose the open-source database TimescaleDB. The possibility to provide a standard SQL-based API, perform data analytics, and conduct data transformations using standard relational database GUI clients will decrease the time to deliver a first prototype of the platform. Data versions and transformations can be tracked and saved into separate data versions or tables.

Model Development Platform

The model development process consists of four steps (a minimal sketch of these steps appears at the end of this section):

Data reading and transformation
Model training
Model serialization
Model packaging

Once the model is trained, the parametrized and trained instance is usually stored as a packaged artifact. The most common solution for code storage and versioning is Git. Furthermore, the financial institution is already equipped with a solution like GitHub, providing functionality to define pipelines for building, packaging, and publishing code. The architecture of Git-based systems usually relies on a set of distributed worker machines executing the pipelines. That option will be used as part of the minimalistic MLOps architectural prototype to also train the model. After training a model, the next step is to store it in a model repository as a released and versioned artifact. Storing the model in a database as a binary file, on a shared file system, or even in an artifacts repository are all acceptable options at that stage. Later, a model registry or a blob storage service could be incorporated into the pipeline. A model API microservice will expose the model's functionality for macroeconomic projections.

Model Deployment Platform

The decision to keep the MLOps prototype as simple as possible applies to the deployment phase as well. The deployment model is based on a microservices architecture. Each model can be deployed in a Docker container as a stateless service and scaled on demand. That principle applies to the data loading components, too. Once this first deployment step is achieved and the dependencies of all the microservices are clarified, a workflow engine might be needed for orchestrating the established business processes.

Model Monitoring and Auditing Platform

Traditional microservices architectures are already equipped with tools for gathering, storing, and monitoring log data. Tools like Prometheus, Kibana, and Elasticsearch are flexible enough for producing specific auditing and performance reports.
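As referenced above, here is a minimal Python sketch of the four model development steps: reading and transforming data, training, serializing, and packaging a versioned artifact. It is illustrative only; the CSV path, model choice, and artifact naming scheme are assumptions rather than the institution's actual pipeline.

Python
import pickle
import zipfile

import pandas as pd
from sklearn.linear_model import LinearRegression

MODEL_VERSION = "0.1.0"  # assumed versioning scheme

# 1. Data reading and transformation (hypothetical macroeconomic dataset).
df = pd.read_csv("macro_indicators.csv").dropna()
X, y = df[["inflation", "unemployment"]], df["gdp_growth"]

# 2. Model training.
model = LinearRegression().fit(X, y)

# 3. Model serialization.
model_file = f"forecast-model-{MODEL_VERSION}.pkl"
with open(model_file, "wb") as f:
    pickle.dump(model, f)

# 4. Model packaging: bundle the serialized model as a versioned artifact
# that a CI pipeline could publish to an artifact repository.
with zipfile.ZipFile(f"forecast-model-{MODEL_VERSION}.zip", "w") as archive:
    archive.write(model_file)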
Open-Source MLOps Platforms

A minimalistic MLOps architecture is a good start for the initial digital transformation of a company. However, keeping track of available MLOps tools in parallel is crucial for the next design phase. The following table provides a summary of some of the most popular open-source tools.

Table 1. Open-source MLOps tools for initial digital transformations

| Tool | Description | Functional Areas |
|---|---|---|
| Kubeflow | Makes deployments of ML workflows on Kubernetes simple, portable, and scalable | Tracking and versioning, pipeline orchestration, and model deployment |
| MLflow | Is an open-source platform for managing the end-to-end ML lifecycle | Tracking and versioning |
| BentoML | Is an open standard and SDK for AI apps and inference pipelines; provides features like auto-generation of API servers, REST APIs, gRPC, and long-running inference jobs; and offers auto-generation of Docker container images | Tracking and versioning, pipeline orchestration, model development, and model deployment |
| TensorFlow Extended (TFX) | Is a production-ready platform designed for deploying and managing ML pipelines; includes components for data validation, transformation, model analysis, and serving | Model development, pipeline orchestration, and model deployment |
| Apache Airflow, Apache Beam | Are flexible frameworks for defining and scheduling complex workflows — data workflows in particular, including ML | Pipeline orchestration |

Summary

MLOps is often called DevOps for machine learning, and it is essentially a set of architectural patterns for ML applications. However, despite the similarities with many well-known architectures, the MLOps approach brings some new challenges for MLOps architects. On one side, the focus must be on the compatibility and composition of MLOps services. On the other side, AI regulations will force existing systems and services to constantly adapt to new regulatory rules and standards. I suspect that as the MLOps field continues to evolve, a new type of service providing AI ethics and regulatory analytics will soon become a focus of businesses in the ML domain.

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering. Read the Free Report
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

In today's digital age, data has become the cornerstone of decision-making across various domains, from business and healthcare to education and government. The ability to collect, analyze, and derive insights from data has transformed how organizations operate, offering unprecedented opportunities for innovation, efficiency, and growth.

What Is a Data-Driven Approach?

A data-driven approach is a methodology that relies on data analysis and interpretation to guide decision-making and strategy development. This approach encompasses a range of techniques, including data collection, storage, analysis, visualization, and interpretation, all aimed at harnessing the power of data to drive organizational success. Key principles include:

Data collection – Gathering relevant data from diverse sources is foundational to ensuring its quality and relevance for subsequent analysis.
Data analysis – Processing and analyzing collected data using statistical and machine learning (ML) techniques reveals valuable insights for informed decision-making.
Data visualization – Representing insights visually through charts and graphs facilitates understanding and aids decision-makers in recognizing trends and patterns.
Data-driven decision-making – Integrating data insights into decision-making processes across all levels of an organization enhances risk management and process optimization.
Continuous improvement – Embracing a culture of ongoing data collection, analysis, and action fosters innovation and adaptation to changing environments.

Data Integration Strategies Using AI

Data integration combines data from various sources into a unified view. Artificial intelligence (AI) improves integration by automating tasks, boosting accuracy, and managing diverse data volumes. Here are the top four data integration strategies/patterns using AI:

Automated data matching and merging – AI algorithms, such as ML and natural language processing (NLP), can match and automatically merge data from disparate sources (a rough sketch of this pattern appears below).
Real-time data integration – AI technologies, such as stream processing and event-driven architectures, can facilitate real-time data integration by continuously ingesting, processing, and integrating data as it becomes available.
Schema mapping and transformation – AI-driven tools can automate the process of mapping and transforming data schemas from different formats or structures. This includes converting data between relational databases, NoSQL databases, and other data formats — plus handling schema evolution over time.
Knowledge graphs and graph-based integration – AI can build and query knowledge graphs representing relationships between entities and concepts. Knowledge graphs enable flexible and semantic-driven data integration by capturing rich contextual information and supporting complex queries across heterogeneous data sources.

Data integration is the backbone of modern data management strategies, which are pivotal in providing organizations with a comprehensive understanding of their data landscape. Data integration ensures a cohesive and unified view of organizational data assets by seamlessly combining data from disparate sources, such as databases, applications, and systems.
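To ground the matching-and-merging pattern referenced in the list above, here is a minimal Python sketch that fuzzily matches customer records from two systems and merges the pairs that clear a similarity threshold. The records, weights, and use of difflib are illustrative assumptions; a production system would use a trained matcher or an entity resolution service.

Python
from difflib import SequenceMatcher

# Hypothetical records from two systems that lack a shared key.
crm = [{"id": 1, "name": "Jon A. Smith", "city": "Boston"},
       {"id": 2, "name": "Mary O'Neil", "city": "Denver"}]
erp = [{"id": "A7", "name": "John A Smith", "city": "Boston"},
       {"id": "B3", "name": "Marie Onel", "city": "Austin"}]

def similarity(a, b):
    """Blend name and city similarity into a single score."""
    name_score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    city_score = SequenceMatcher(None, a["city"].lower(), b["city"].lower()).ratio()
    return 0.7 * name_score + 0.3 * city_score

THRESHOLD = 0.8  # illustrative cutoff

merged = []
for left in crm:
    best = max(erp, key=lambda right: similarity(left, right))
    if similarity(left, best) >= THRESHOLD:
        # Merge the two records and keep both source IDs for lineage.
        merged.append({**best, **left, "source_ids": (left["id"], best["id"])})

print(merged)  # the Jon/John Smith records merge; Mary and Marie do not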
One of the primary benefits of data integration is its ability to enhance data quality. By consolidating data from multiple sources, organizations can identify and rectify inconsistencies, errors, and redundancies, thus improving their data's accuracy and reliability. This, in turn, empowers decision-makers to make informed choices based on trustworthy information. Let's look closely at how we can utilize generative AI for data-related processes.

Exploring the Impact of Generative AI on Data-Related Processes

Generative AI has revolutionized various industries and data-related processes in recent years. Generative AI encompasses a wide array of methodologies, spanning generative adversarial networks (GANs) and variational autoencoders (VAEs) to transformer-based models such as GPT (generative pre-trained transformer). These algorithms showcase impressive abilities to produce lifelike images, text, audio, and even video, closely emulating human creativity by generating fresh data samples.

Using Generative AI for Enhanced Data Integration

Now we've come to the practical part: the role of generative AI in enhanced data integration. Below, I've provided some real-world scenarios. This will bring more clarity to AI's role in data integration.

Table 1. Real-world use cases

| Industry/Application | Example |
|---|---|
| Healthcare/image recognition | Generating synthetic medical images for data augmentation in deep learning models; using GANs to create realistic medical images; supplementing limited training data; enhancing the performance of image recognition algorithms; facilitating tasks like disease diagnosis and medical imaging analysis |
| E-commerce | Automating schema mapping and transformation for product catalog integration; leveraging generative AI techniques; automatically aligning product attributes and specifications from various vendors; creating a unified schema; facilitating seamless integration of product catalogs; enhancing the shopping experience for customers on e-commerce platforms |
| Social media | Utilizing NLP models to extract metadata from user-generated content; analyzing text-based content, including social media posts or comments; extracting valuable metadata such as sentiment, topic, and user preferences; integrating extracted metadata into recommendation systems; personalizing content delivery based on user preferences; enhancing user engagement on social media platforms through personalized recommendations |
| Cybersecurity | Using generative AI to detect network traffic anomalies; training on synthetic data resembling real-world patterns; enhancing cybersecurity against threats; improving intrusion detection and response |
| Financial services | Integrating diverse market data in real time; using generative AI to aggregate data from various sources; enabling informed decisions and trade execution; continuously updating strategies for changing market conditions; improving investment outcomes and risk management |

Ensuring Data Accuracy and Consistency Using AI and ML

Organizations struggle to maintain accurate and reliable data in today's data-driven world. AI and ML help detect anomalies, identify errors, and automate cleaning processes. Let's look into those patterns a bit closer.

Validation and Data Cleansing

Data validation and cleansing are often laborious tasks, requiring significant time and resources. AI-powered tools streamline and speed up these processes. ML algorithms learn from past data to automatically identify and fix common quality issues. They can standardize formats, fill in missing values, and reconcile inconsistencies.
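As a rough illustration of such automated cleansing (the library choice and rules are assumptions, not any specific product's behavior), here is a minimal pandas sketch that standardizes formats and imputes a missing value from historical data:

Python
import pandas as pd

# Hypothetical raw records with mixed date formats and a missing amount.
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "05/01/2024", "2024-01-09"],
    "amount": [120.0, None, 80.0],
    "country": ["us", "US", "usa"],
})

# Standardize formats: parse mixed dates, normalize country codes.
# (format="mixed" requires pandas 2.x)
raw["order_date"] = pd.to_datetime(raw["order_date"], format="mixed", dayfirst=True)
raw["country"] = raw["country"].str.upper().replace({"USA": "US"})

# Fill the missing value from what past data suggests (here, the median).
raw["amount"] = raw["amount"].fillna(raw["amount"].median())

print(raw)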
Uncovering Patterns and Insights

AI and ML algorithms can uncover hidden patterns, trends, and correlations within datasets. By analyzing vast amounts of data, these algorithms can identify relationships that may not be apparent to human analysts. AI and ML can also understand the underlying causes of data quality issues and develop strategies to address them. For example, ML algorithms can identify common errors or patterns contributing to data inconsistencies. Organizations can then implement new processes to improve data collection, enhance data entry guidelines, or identify employee training needs.

Anomalies in Data

AI and ML algorithms reveal hidden patterns, trends, and correlations in datasets, analyzing vast amounts of data to uncover insights not readily apparent to humans. They also help pinpoint the root causes of data quality issues by identifying common errors or patterns causing inconsistencies. This enables organizations to implement new processes, such as refining data collection methods or enhancing employee training, to address these issues.

Detecting Anomalies in Data

ML models excel at detecting patterns, including deviations from norms. With ML, organizations can analyze large volumes of data, compare them against established patterns, and flag potential issues. Organizations can then identify anomalies and determine how to correct, update, or augment their data to ensure its integrity. Let's have a look at services that can validate data and detect anomalies.

Detecting Anomalies Using Stream Analytics

Azure Stream Analytics, AWS Kinesis, and Google Cloud Dataflow are examples of tools that provide built-in anomaly detection capabilities, both in the cloud and at the edge, enabling vendor-neutral solutions. These platforms offer various functions and operators for anomaly detection, allowing users to monitor anomalies, including temporary and persistent ones. For example, based on my experience building validation using Stream Analytics, here are several key points to consider:

The model's accuracy improves with more data in the sliding window, which it treats as the expected behavior for that timeframe.
It focuses on event history in the window to spot anomalies, discarding old values as the window moves.
Functions establish a normal baseline by comparing past data and identifying outliers within a confidence level.
Set the window size based on the minimum number of events needed for practical training.
Response time increases with history size, so include only the events necessary for good performance.

Using ML, you can monitor temporary anomalies such as spikes and dips in a time-series event stream with the AnomalyDetection_SpikeAndDip operator. If a second spike within the same sliding window is smaller than the first, its score might not be significant enough compared to the first spike within the specified confidence level. To address this, consider adjusting the model's confidence level. However, if you receive too many alerts, use a higher confidence interval.
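The operator above is specific to Azure Stream Analytics, but the sliding-window idea behind it can be sketched in plain Python. The window size, minimum history, and z-score threshold below are illustrative assumptions, standing in for the history size and confidence level discussed above; this is not the Stream Analytics algorithm itself.

```python
from collections import deque
from statistics import mean, stdev

def detect_spikes(events, window_size=120, threshold=3.0):
    """Flag values that deviate sharply from the recent sliding window.
    window_size and threshold are illustrative; tune them the way you would
    tune the history size and confidence level described above."""
    window = deque(maxlen=window_size)
    anomalies = []
    for i, value in enumerate(events):
        if len(window) >= 10:  # require a minimum history before scoring
            mu, sigma = mean(window), stdev(window)
            deviation = abs(value - mu)
            if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation / sigma > threshold):
                anomalies.append((i, value))
        window.append(value)  # old values fall out as the window slides
    return anomalies

# A steady signal with one spike and one later dip
stream = [10.0] * 50 + [42.0] + [10.0] * 20 + [1.0] + [10.0] * 20
print(detect_spikes(stream, window_size=30, threshold=3.0))
# Prints [(50, 42.0)]: the dip at index 71 scores low because the earlier spike
# inflated the window's variance, mirroring the confidence-level caveat above.
```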
Leveraging Generative AI for Data Transformation and Augmentation

Generative AI helps with data augmentation and transformation, which are also part of the data validation process. Generative models can generate synthetic data that resembles actual data samples, which is particularly useful when the available dataset is small or lacks diversity. Generative models can also be trained to translate data from one domain to another, or to transform data while preserving its underlying characteristics. For example, sequence-to-sequence models like transformers can be used in NLP for tasks such as language translation or text summarization, effectively transforming the input data into a different representation.

The data transformation process can also be used to solve problems in legacy systems built on an old codebase. Organizations can unlock numerous benefits by transitioning to modern programming languages. For instance, many legacy systems are built on outdated programming languages such as COBOL, Lisp, and Fortran. To modernize them and enhance their performance, we must migrate or rewrite them using high-performance modern programming languages like Python, C#, or Go. Let's look at the diagram below to see how generative AI can be used to facilitate this migration process:

Figure 1. Using generative AI to rewrite legacy code

The architecture above is based on the following components and workflow:

Azure Data Factory is the main ETL (extract, transform, load) service for data orchestration and transformation. It connects to the source Git repositories. Alternatively, we can use AWS Glue for data integration or Google Cloud Data Fusion for ETL operations.
OpenAI is the generative AI service used to transform COBOL and C++ into Python, C#, and Golang (or any other language). The OpenAI service is connected to Data Factory. Alternatives to OpenAI are Amazon SageMaker and Google Cloud AI Platform.
Azure Logic Apps and Google Cloud Functions are utility services that provide data mapping and file management capabilities.
DevOps CI/CD provides pipelines to validate, compile, and interpret the generated code.

Data Validation and AI: Chatbot Call Center Use Case

An automated call center setup is a great use case for demonstrating data validation. The following example provides an automation and database solution for call centers:

Figure 2. Call center chatbot architecture

The automation and database solution extracts data from the speech bot deployed in call centers or from interactions with real people. It then stores, analyzes, and validates this data using OpenAI's ChatGPT and an AI sentiment analysis service. Subsequently, the analyzed data is visualized using business intelligence (BI) dashboards for comprehensive insights. The processed information is also integrated into customer relationship management (CRM) systems for human validation and further action.

The solution ensures accurate understanding and interpretation of customer interactions by leveraging ChatGPT, an advanced NLP model. BI dashboards offer intuitive and interactive data visualization capabilities, allowing stakeholders to gain actionable insights at a glance. Integrating the analyzed data into CRM systems enables seamless collaboration between automated analysis and human validation.
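As a rough sketch of the analysis step, the snippet below sends a call transcript to a chat-completion endpoint and asks for sentiment plus a simple validation verdict. It assumes the official OpenAI Python client with an API key in the environment; the model name, prompt, and transcript are illustrative and not part of any particular reference architecture.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def analyze_call(transcript: str) -> dict:
    """Ask the model for sentiment and a simple validation verdict on one call."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You analyze call center transcripts. Reply only with JSON "
                        "containing 'sentiment' (positive/neutral/negative), "
                        "'issue_resolved' (true/false), and 'summary' (one sentence)."},
            {"role": "user", "content": transcript},
        ],
    )
    # A production pipeline would request structured output and handle malformed replies.
    return json.loads(response.choices[0].message.content)

transcript = (
    "Agent: Thanks for calling, how can I help? "
    "Customer: My invoice was charged twice this month. "
    "Agent: I see the duplicate charge and have refunded it."
)
result = analyze_call(transcript)
print(result["sentiment"], result["issue_resolved"])
```

The returned fields could then be written to the analytics store feeding the BI dashboards and attached to the CRM record for human review.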
Conclusion

In the ever-evolving landscape of enterprise AI, achieving data excellence is crucial. Data and generative AI services that provide data analysis, ETL, and NLP enable robust integration strategies for unlocking the full potential of data assets. By combining data-driven approaches with advanced technologies, businesses can pave the way for enhanced decision-making, productivity, and innovation.

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering. Read the Free Report
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering.

Generative AI, a subset of artificial intelligence (AI), stands as a transformative technology. Leveraging deep learning models, it exhibits a unique ability to interpret inputs spanning text, image, audio, video, or code and seamlessly generate novel content across various modalities. This innovation has broad applications, ranging from turning textual inputs into visual representations to transforming videos into textual narratives. Its proficiency lies in its capacity to generate high-quality, contextually relevant outputs, a testament to its potential for reshaping content creation. An example is shown in Figure 1, where a text prompt has been converted into an image.

Figure 1. DALL·E 2 generates an image from a text prompt

Journey of Generative AI

The fascinating journey of AI started a couple of centuries back. Table 1 below highlights the key milestones in the evolution of generative AI, covering significant launches and advancements over the years:

Table 1. Key milestones in the evolution of generative AI

1805: First neural network (NN)/linear regression
1925: First recurrent neural network (RNN) architecture
1958: Multi-layer perceptron — no deep learning
1965: First deep learning
1972: Published artificial RNNs
1980: Release of autoencoders
1986: Invention of backpropagation
1990: Introduction of GAN/Curiosity
1995: Release of LeNet-5
1997: Introduction of LSTM
2014: Variational autoencoder, GAN, GRU
2017: Transformers
2018: GPT, BERT
2021: DALL·E
2022: Latent diffusion, DALL·E 2, Midjourney, Stable Diffusion, ChatGPT, AudioLM
2023: GPT-4, Falcon, Bard, MusicGen, AutoGPT, LongNet, Voicebox, LLaMA
2024: Sora, Stable Cascade

Generative AI Across Modalities

Generative AI spans various modalities, as listed in Table 2 below, showcasing its versatile capabilities:

Table 2. Generative AI modalities and major open-source tools

Text – OpenAI GPT, Transformer models (TensorFlow, PyTorch), BERT (Google)
Code – CodeT5, PolyCoder
Image – StyleGAN (NVlabs), DALL·E (OpenAI), CycleGAN (junyanz), BigGAN (Google), Stable Diffusion, StableStudio, Waifu Diffusion
Audio – WaveNet (DeepMind), Tacotron 2 (Google), MelGAN (descriptinc)
3D object – 3D-GANs, PyTorch3D
Video – Video Generation with GANs, Temporal Generative Adversarial Nets (TGANs)

How Does Generative AI Work?

Generative AI realizes its full potential through pathbreaking models such as transformers, generative adversarial networks, and variational autoencoders.

The Transformer Model

The transformer architecture relies on a self-attention mechanism, discarding the sequential processing constraints found in recurrent neural networks. The model's attention mechanism allows it to weigh input tokens differently, enabling the capture of long-range dependencies and improving parallelization during training. Transformers consist of an encoder-decoder structure with multiple layers of self-attention and feedforward sub-layers. Models like OpenAI's GPT series utilize transformer architectures for autoregressive language modeling, where each token is generated based on the preceding context. The bidirectional nature of self-attention, coupled with the ability to handle context dependencies effectively, results in the creation of coherent and contextually relevant sequences, making transformers a cornerstone in the development of large language models (LLMs) for diverse generative applications like machine translation, text summarization, question answering, and text generation.

Figure 2. Transformer architecture
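To make the mechanism concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the building block described above. It omits masking, multiple heads, positional encodings, and the feedforward sub-layers; the dimensions and random inputs are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each token attends to every other
    weights = softmax(scores, axis=-1)   # attention weights per token
    return weights @ V                   # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Because every token attends to every other token in one matrix multiplication, long-range dependencies are captured without the step-by-step recurrence that limits RNN parallelism.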
Generative Adversarial Networks

Comprising two neural networks, namely the discriminator and the generator, generative adversarial networks (GANs) operate through adversarial training to achieve unparalleled results in unsupervised learning. The generator, driven by random noise, endeavors to deceive the discriminator, which, in turn, aims to accurately distinguish between genuine and artificially produced data. This competitive interaction propels both networks toward continuous improvement, generating realistic and high-quality samples. GANs find versatility in a myriad of applications, notably image synthesis, style transfer, and text-to-image synthesis.

Variational Autoencoders

Variational autoencoders (VAEs) are designed to capture and learn the underlying probability distribution of input data, enabling them to generate new samples that share similar characteristics. The architecture of a VAE consists of an encoder network, responsible for mapping input data to a latent space, and a decoder network, which reconstructs the input data from the latent space representation. A key feature of VAEs lies in their ability to model the uncertainty inherent in the data by learning a probabilistic distribution in the latent space. This is achieved through the introduction of a variational inference framework, which incorporates a probabilistic sampling process during training. Their applications span various domains, including image and text generation, and data representation learning in complex, high-dimensional spaces.

Figure 3. Q/A generation from image
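The probabilistic sampling step mentioned above is commonly implemented with the reparameterization trick. Below is a tiny, illustrative NumPy sketch of that step alone; the latent dimensionality and the encoder outputs are made-up values, and the encoder and decoder networks themselves are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).
    Writing the sample this way keeps it differentiable with respect to mu and
    log_var, which lets the encoder learn a distribution rather than a point."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Pretend encoder outputs for one input, in a 2-D latent space (illustrative values)
mu = np.array([0.3, -1.2])
log_var = np.array([-0.5, 0.1])

z = reparameterize(mu, log_var)      # latent code the decoder would reconstruct from
z_new = rng.standard_normal((5, 2))  # sampling the prior yields codes for brand-new data
print(z, z_new.shape)
```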
The State of the Art

Generative AI, with its disruptive innovation, is leaving a profound impact across industries.

Generative Use Cases and Applications

Generative AI exhibits a broad range of applications across various industries, revolutionizing processes and fostering innovation. Table 3 showcases how it is reshaping various industries:

Table 3. Applications of generative AI across industries

Healthcare – Medical image generation and analysis, drug discovery, personalized treatment plans
Finance – Personalized risk assessment and financial advice, compliance monitoring
Marketing – Content creation, ad copy generation, personalized marketing campaigns
Manufacturing – 3D model generation for product design
Retail – Personalized product recommendations, virtual try-on experiences
Education – Adaptive learning materials, content generation for e-learning platforms
Legal – Document summarization, contract drafting, legal research assistance
Entertainment – Scriptwriting assistance, video game content generation, music composition
Human resources – Employee training content generation

The Business Benefits

Generative AI offers a myriad of business benefits, including the amplification of creative capabilities, empowering enterprises to autonomously produce expansive and innovative content. It creates significant time and cost efficiencies by automating tasks that previously required human intervention. Hyper-personalized experiences are achieved through customer data, generating recommendations and offers tailored to individual preferences. Furthermore, generative AI enhances operational efficiency by automating intricate processes, optimizing workflows, and facilitating realistic simulations for training and entertainment. The technology's adaptive learning capabilities allow continuous improvement based on feedback and new data, culminating in refined performance over time. Lastly, generative AI elevates customer interaction with dynamic AI agents capable of providing responses that mimic human conversation, contributing to an enhanced customer experience.

Managing the Risks of Generative AI

Effectively managing the risks associated with the widespread adoption of generative AI is crucial as this technology transforms various business aspects. Ethical guidelines focused on accuracy, safety, honesty, empowerment, and sustainability provide a framework for responsible AI development. Integrating generative AI requires using reliable data, ensuring transparency, and maintaining a human-in-the-loop approach. Ongoing testing, oversight, and feedback mechanisms are essential to prevent unintended consequences.

Generative AI for Enterprises

This section delves into the key methodologies enterprises can use to make a transformative leap in innovation and productivity.

Build Foundation Models

Foundation models (FMs) like BERT and GPT are trained on extensive, generalized, and unlabeled datasets, enabling them to excel in diverse tasks, including language understanding, text and image generation, and natural language conversation. These FMs serve as base models for specialized downstream applications, having evolved over a decade to handle increasingly complex tasks. The ability to continually learn from data inputs during inference enhances their effectiveness, supporting tasks like language processing, visual comprehension, code generation, human-centered engagement, and speech-to-text applications.

Figure 4. Foundation model

Bring your own model (BYOM) is a commitment to amplifying a platform's versatility, fostering a collaborative environment, and propelling a new era of AI innovation. BYOM's promise lies in the freedom to innovate, offering a personalized approach to AI solutions that aligns with individual visions. Improving an existing model involves a multifaceted approach encompassing fine-tuning, dataset augmentation, and architectural enhancements.

Fine-Tuning

While pre-trained language models offer the advantage of being trained on massive datasets and generating text akin to human language, they may not always deliver optimal performance in specific applications or domains. Fine-tuning involves updating pre-trained models with new information or data, allowing them to adapt to particular tasks or domains. Fine-tuning pre-trained models is crucial for achieving high accuracy and relevance in generated outputs, especially when dealing with specific and nuanced tasks within various domains.

Reinforcement Learning From Human Feedback

The primary objective of reinforcement learning from human feedback (RLHF) is to leverage human feedback to enhance the efficiency and accuracy of ML models, specifically those employing reinforcement learning methodologies to maximize rewards. The RLHF process involves stages such as data collection, supervised fine-tuning of a language model, building a separate reward model, and optimizing the language model against the reward model.
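Of the adaptation techniques above, plain supervised fine-tuning is the simplest to show in code. The sketch below uses the Hugging Face transformers Trainer to fine-tune a small pre-trained model on a sentiment task; the checkpoint, dataset, data slice, and hyperparameters are all illustrative assumptions, and a classification head is used only to keep the example compact.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Illustrative fine-tuning of a small pre-trained model on a sentiment task.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-sentiment",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small slice for the sketch
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()
```

The same general pattern extends to generative models by swapping in a causal language model and a domain-specific text corpus.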
Retrieval-Augmented Generation

LLMs are instrumental in tasks like question answering and language translation. However, inherent challenges, such as potential inaccuracies and the static nature of training data, can impact reliability and user trust. Retrieval-augmented generation (RAG) addresses these issues by seamlessly integrating domain-specific or organizational knowledge into LLMs, enhancing their relevance, accuracy, and utility without necessitating retraining.

Figure 5. Retrieval-augmented generation
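A minimal, self-contained sketch of the retrieve-then-augment step is shown below. It uses toy bag-of-words retrieval in place of a real embedding model and vector store, and simply prints the augmented prompt rather than calling an LLM; the document text and the question are made up for illustration.

```python
import math
from collections import Counter

# A toy in-memory "knowledge base"; in practice this would be a vector store
# populated with embeddings of organizational documents.
documents = [
    "Refunds are processed within 5 business days of approval.",
    "Enterprise support is available 24/7 via the customer portal.",
    "Data retention for call recordings is 90 days by default.",
]

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(question, k=2):
    q = bow(question)
    return sorted(documents, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

question = "How long are call recordings kept?"
context = "\n".join(retrieve(question))
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

Because the model answers from retrieved context rather than from its frozen training data, organizational knowledge can be updated by re-indexing documents instead of retraining the model.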
The Tech Stack

The LLMOps tech stack encompasses five key areas. Table 4 below shows the key components of each area:

Table 4. LLMOps tech stack components

Data management – Data storage and retrieval; data processing; quality control; data distribution
Model management – Hosting the model; model testing; version control and model tracking; model training and fine-tuning
Model deployment – Frameworks; event-driven architecture
Prompt engineering and optimization – Prompt development and testing; prompt analysis; prompt versioning; prompt chaining and orchestration
Monitoring and logging – Performance monitoring; logging

Performance Evaluation

Quantitative methods offer objective metrics, utilizing scores like inception score, Fréchet inception distance, or precision and recall for distributions to quantitatively measure the alignment between generated and real data distributions. Qualitative methods delve into visual and auditory inspection, employing techniques like visual inspection, pairwise comparison, or preference ranking to gauge the realism, coherence, and appeal of generated data. Hybrid methods integrate both quantitative and qualitative approaches, such as human-in-the-loop evaluation, adversarial evaluation, or Turing tests.

What's Next? The Future of Generative AI

Looking at the future of generative AI, three transformative avenues stand prominently on the horizon.

The Genesis of Artificial General Intelligence

The advent of artificial general intelligence (AGI) heralds a transformative era. AGI aims to surpass current AI limitations, allowing systems to excel in tasks beyond predefined domains. It distinguishes itself through autonomous self-control, self-understanding, and the ability to acquire new skills akin to human problem-solving capacities. This juncture marks a critical moment in the pursuit of AGI, envisioning a future where AI systems possess generalized human cognitive abilities and transcend current technological limitations.

Integrating Perceptual Systems Through Human Senses

Sensory AI stands at the forefront of generative AI evolution. Beyond computer vision, sensory AI encompasses touch, smell, and taste, aiming for a nuanced, human-like understanding of the world. The emphasis on diverse sensory inputs, including tactile sensing, olfactory, and gustatory AI, signifies a move toward human-like interaction and recognition capabilities.

Computational Consciousness Modeling

Focused on attributes like fairness, empathy, and transparency, computational consciousness modeling (CoCoMo) employs consciousness modeling, reinforcement learning, and prompt template formulation to instill knowledge and compassion in AI agents. CoCoMo guides generative AI toward a future where ethical and emotional dimensions seamlessly coexist with computational capabilities, fostering responsible and empathetic AI agents.

Parting Thoughts

This article moved from the foundational concepts of generative AI to its diverse applications across modalities, and delved into the underlying mechanisms, highlighting the power of the transformer model and the creativity of GANs and VAEs. The journey encompassed business benefits, risk management, and a forward-looking perspective on unprecedented advancements and the potential emergence of AGI, sensory AI, and artificial consciousness. Finally, readers are encouraged to contemplate the future implications and ethical dimensions of generative AI, acknowledging that this transformative journey presents both opportunities and responsibilities as we integrate generative AI into our daily lives.

Repositories:

A curated list of modern Generative Artificial Intelligence projects and services
Home of CodeT5: Open Code LLMs for Code Understanding and Generation
StableStudio
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

This is an excerpt from DZone's 2024 Trend Report, Enterprise AI: The Emerging Landscape of Knowledge Engineering. Read the Free Report