Developers working on Explainable AI (XAI) must address several key aspects: the problem it solves, its scope, and potential solutions, along with the specific use cases and benefits that can enhance an organization's credibility when it implements or relies on this technology. As AI is incorporated into more sectors, developers play a critical role in making those systems interpretable and transparent. XAI makes AI models easier to interpret and debug, and it supports the responsible use of highly complex AI technologies, which should be fair, transparent, and accountable to users and stakeholders.

Importance of Explainability in AI

The first issue in AI development is making the technology transparent and explainable. According to a McKinsey Global Institute report, AI could add $2.6 trillion to $4.4 trillion annually in global corporate profits, and the World Economic Forum estimates an economic contribution of up to $15.7 trillion by 2030. These figures are a reminder of AI's growing impact on society and of why systems must be not only powerful but also explainable and trustworthy.

Developer's View on Explainable AI

Complexity vs. Interpretability

The biggest challenge developers face with XAI is the tension between complexity (accurate models) and interpretability. Deep learning, ensemble methods, and support vector machines (SVMs) are among the most accurate AI/ML models. On the downside, these models are often considered "black boxes," and the decisions they produce are difficult to understand. Developers must therefore provide meaningful explanations of how such models work without hampering model performance.

Nascent Techniques

XAI tools are immature because the practice of explainability is new. They need to become more transparent before they can instill trust in AI systems. However, post-hoc explainability methods such as SHAP and LIME already offer insight into a model's decision-making process.

Tailoring Explanations to Context

Another challenge for developers is putting the explanation in context. Machine learning models are deployed in varied environments and serve different user groups, from highly technical experts to users who need the simplest possible explanation. An XAI system should vary its explanation based on the type of user requesting it.

Scope

Interpretable Models

Developers can use inherently interpretable models such as decision trees, linear models, or rule-based systems. Although these models are less complex and use fewer features, they provide explicit decision pathways.

Post-Hoc Explainability

This approach keeps the structure of black-box models intact while developers explain predictions using techniques such as SHAP, LIME, or feature importance visualizations. For example, in an autonomous vehicle, the decisions made by the model installed in each car must be explainable because they bear on safety. Deep learning models can be used for perception tasks, paired with post-hoc explanations.

Algorithmic Transparency

This is especially important in sectors where a decision made by an opaque AI carries significant legal or ethical liability. Wherever decisions are accountable, the algorithm has to be accountable too. AI governance requires developers to ensure their models meet regulatory and ethical standards by providing clear, understandable explanations of how decisions are reached.

Benefits

At its core, XAI builds trust with users and stakeholders. In high-stakes decisions, XAI helps build trust between people and AI systems by showing users how the AI arrived at its predictions. This is even more pronounced in a field like finance, where tying a model's output to investment health can support better decisions and risk forecasts and help avoid losses.

XAI also builds public trust by demonstrating that organizations run tightly controlled and explainable AI systems, for instance in industries where AI makes decisions in finance, healthcare, or legal services. Explainable, localized results build the consumer confidence that is needed before the wider population can be expected to adopt AI-based technology.

Ethics and responsibility are central to any state-of-the-art AI deployment. Explainable AI is one way to support an ethical and responsible deployment of the AV model, ensuring that the algorithm does not behave like a black box in which biases stay hidden.

Conclusion

Developers are key to advancing XAI by tackling the problem of designing AI systems that are both powerful and interpretable. This improves the usability and adoption of AI technologies and helps organizations build trust in the marketplace. XAI is both a technical necessity and a notable advantage: it enables more AI systems to be deployed with trust, regulatory compliance, and ethics in mind, leaving room for further growth and broader influence across industries.
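To make the post-hoc idea discussed above concrete, here is a minimal sketch of permutation feature importance, one simple form of feature importance (it is not SHAP or LIME). The model, the data, and every function name are hypothetical and chosen only for illustration: the "black box" is a plain function so that the example stays self-contained.

```python
import random

def black_box(row):
    # Hypothetical opaque model: depends strongly on feature 0,
    # weakly on feature 1, and not at all on feature 2.
    return 3.0 * row[0] + 0.5 * row[1]

def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, seed=0):
    """Return the increase in error observed when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, shuffled, targets) - baseline)
    return importances

# Synthetic data: three uniform random features, targets produced by the model itself.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [black_box(r) for r in rows]

importances = permutation_importance(black_box, rows, targets, n_features=3)
for j, score in enumerate(importances):
    print(f"feature {j}: error increase {score:.4f}")
```

The model is never opened up: the explanation comes only from observing how its error changes when one input is scrambled, which is what makes the technique post-hoc. A feature the model ignores shows no error increase at all.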
Previous Articles on Snowflake

- Tour of Snowflake ingestion using CockroachDB and Redpanda Connect
- Integrating Snowflake with Trino

Previous Articles on CockroachDB CDC

- Emitting Protocol Buffers with CockroachDB CDC Queries
- Using CockroachDB CDC with Apache Pulsar
- Using CockroachDB CDC with Azure Event Hubs
- SaaS Galore: Integrating CockroachDB with Confluent Kafka, FiveTran, and Snowflake
- Using CockroachDB CDC with Confluent Cloud Kafka and Schema Registry
- CockroachDB CDC using Minio as cloud storage sink
- CockroachDB CDC using Hadoop Ozone S3 Gateway as cloud storage sink

Motivation

This article builds upon the previous discussion in "Tour of Snowflake ingestion using CockroachDB and Redpanda Connect," where we investigated the process of streaming changefeeds from CockroachDB to Snowflake using Redpanda Connect and Snowpipe in batch mode. Here, we will shift our focus to Kafka Connect and demonstrate how both batch and streaming modes can be utilized for data ingestion into Snowflake.

Overview

- Deploy a CockroachDB cluster with enterprise changefeeds
- Deploy Snowflake
- Deploy Kafka Connect
- Verify
- Conclusion

Detailed Instructions

Deploy a CockroachDB Cluster With Enterprise Changefeeds

Start by either launching a CockroachDB instance or utilizing a managed service. To enable CDC, execute the following commands:

    SET CLUSTER SETTING cluster.organization = '<organization name>';
    SET CLUSTER SETTING enterprise.license = '<secret>';
    SET CLUSTER SETTING kv.rangefeed.enabled = true;

Verify that changefeeds are enabled:

    SHOW CLUSTER SETTING kv.rangefeed.enabled;

If the value is false, update it to true.
Create a source table:

    CREATE TABLE cockroachdb (
        id INT PRIMARY KEY,
        value STRING DEFAULT md5(random()::text),
        created_at TIMESTAMPTZ DEFAULT now(),
        updated_at TIMESTAMPTZ DEFAULT NULL);

Insert random data:

    INSERT INTO cockroachdb SELECT (generate_series(1, 10000));

Update a row:

    UPDATE cockroachdb SET value = 'UPDATED', updated_at = now() WHERE id = 1;

Create a changefeed job pointing to a local instance of Redpanda:

    CREATE CHANGEFEED FOR TABLE cockroachdb INTO 'kafka://redpanda:29092';

Inspect the data:

    SELECT * FROM cockroachdb LIMIT 5;

      id |              value               |          created_at           |          updated_at
    -----+----------------------------------+-------------------------------+--------------------------------
       1 | UPDATED                          | 2024-09-09 13:17:57.837984+00 | 2024-09-09 13:17:57.917108+00
       2 | 27a41183599c44251506e2971ba78426 | 2024-09-09 13:17:57.837984+00 | NULL
       3 | 3bf8bc26a750a15691ec4d7ddbb7f5e5 | 2024-09-09 13:17:57.837984+00 | NULL
       4 | b8c5786e8651ddfb3a68eabeadb52f2e | 2024-09-09 13:17:57.837984+00 | NULL
       5 | 3a24df165773639ce89d0d877e7103b7 | 2024-09-09 13:17:57.837984+00 | NULL
    (5 rows)

The next step is to set up the Snowflake Kafka connector.

Deploy Snowflake

Create a database and schema for outputting changefeed data:

    USE ROLE SYSADMIN;
    CREATE OR REPLACE DATABASE KAFKADB;
    CREATE OR REPLACE SCHEMA kafka_schema;

Follow the Snowflake documentation to configure the Kafka connector. Create the necessary tables:

    create or replace table kafkatb_batch(
        RECORD_METADATA VARIANT,
        RECORD_CONTENT VARIANT
    );

    create or replace table kafkatb_streaming(
        RECORD_METADATA VARIANT,
        RECORD_CONTENT VARIANT
    );

Set up roles and permissions:

    -- Use a role that can create and manage roles and privileges.
    USE ROLE securityadmin;

    -- Create a Snowflake role with the privileges to work with the connector.
    CREATE OR REPLACE ROLE kafka_connector_role_1;

    -- Grant privileges on the database.
    GRANT USAGE ON DATABASE kafkadb TO ROLE kafka_connector_role_1;

    -- Grant privileges on the schema.
    GRANT USAGE ON SCHEMA KAFKADB.kafka_schema TO ROLE kafka_connector_role_1;
    GRANT CREATE TABLE ON SCHEMA KAFKADB.kafka_schema TO ROLE kafka_connector_role_1;
    GRANT CREATE STAGE ON SCHEMA KAFKADB.kafka_schema TO ROLE kafka_connector_role_1;
    GRANT CREATE PIPE ON SCHEMA KAFKADB.kafka_schema TO ROLE kafka_connector_role_1;

    -- Only required if the Kafka connector will load data into an existing table.
    GRANT OWNERSHIP ON TABLE KAFKADB.KAFKA_SCHEMA.kafkatb_batch TO ROLE kafka_connector_role_1;
    GRANT OWNERSHIP ON TABLE KAFKADB.KAFKA_SCHEMA.kafkatb_streaming TO ROLE kafka_connector_role_1;

    -- Grant the custom role to an existing user.
    GRANT ROLE kafka_connector_role_1 TO USER username;

    -- Set the custom role as the default role for the user.
    -- If you encounter an 'Insufficient privileges' error, verify the role that has the OWNERSHIP privilege on the user.
    ALTER USER username SET DEFAULT_ROLE = kafka_connector_role_1;

Ensure you follow the documentation for setting up key pair authentication for the Snowflake Kafka connector.

Deploy Kafka Connect

Run Redpanda using Docker Compose:

    docker compose -f compose-redpandadata.yaml up -d

Once up, navigate to the Redpanda Console.
Click into the cockroachdb topic:

Install the Snowflake Kafka connector:

    confluent-hub install --no-prompt snowflakeinc/snowflake-kafka-connector:latest

Use the following configuration for Kafka Connect in distributed mode, saved as connect-distributed.properties:

    bootstrap.servers=172.18.0.3:29092
    group.id=connect-cluster
    key.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter=org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable=true
    value.converter.schemas.enable=true
    offset.storage.topic=connect-offsets
    offset.storage.replication.factor=1
    config.storage.topic=connect-configs
    config.storage.replication.factor=1
    status.storage.topic=connect-status
    status.storage.replication.factor=1
    offset.flush.interval.ms=10000
    plugin.path=/usr/share/confluent-hub-components,/usr/local/share/kafka/plugins,/usr/share/filestream-connectors

Deploy Kafka Connect in distributed mode:

    ./kafka-connect/bin/connect-distributed.sh connect-distributed.properties

Register the Snowflake connector with the following configuration, saved as snowflake-sink-batch.json:

    {
      "name": "snowflake-sink-batch",
      "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "8",
        "topics": "cockroachdb",
        "snowflake.topic2table.map": "cockroachdb:kafkatb_batch",
        "buffer.count.records": "10000",
        "buffer.flush.time": "60",
        "buffer.size.bytes": "5000000",
        "snowflake.url.name": "account-name:443",
        "snowflake.user.name": "username",
        "snowflake.private.key": "private-key",
        "snowflake.private.key.passphrase": "",
        "snowflake.database.name": "kafkadb",
        "snowflake.schema.name": "kafka_schema",
        "snowflake.role.name": "kafka_connector_role_1",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
      }
    }

Publish the connector configuration:

    curl -d @"snowflake-sink-batch.json" -H "Content-Type: application/json" -X POST http://kafka-connect:8083/connectors
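For readers who prefer to script the registration instead of keeping a JSON file, the batch configuration above can be generated and posted from Python. This is only a sketch: the helper names `build_batch_sink_config` and `register_connector` are hypothetical, the account, user, and key values are the same placeholders used in the article, and the POST targets the same /connectors endpoint as the curl command.

```python
import json
import urllib.request

def build_batch_sink_config(topic, table, flush_seconds=60):
    """Build the batch-mode sink payload; Kafka Connect expects string values."""
    return {
        "name": "snowflake-sink-batch",
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "tasks.max": "8",
            "topics": topic,
            "snowflake.topic2table.map": f"{topic}:{table}",
            "buffer.count.records": "10000",
            "buffer.flush.time": str(flush_seconds),
            "buffer.size.bytes": "5000000",
            "snowflake.url.name": "account-name:443",    # placeholder
            "snowflake.user.name": "username",           # placeholder
            "snowflake.private.key": "private-key",      # placeholder
            "snowflake.private.key.passphrase": "",
            "snowflake.database.name": "kafkadb",
            "snowflake.schema.name": "kafka_schema",
            "snowflake.role.name": "kafka_connector_role_1",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
        },
    }

def register_connector(connect_url, payload):
    """POST the payload to the Kafka Connect REST API and return the HTTP status."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

payload = build_batch_sink_config("cockroachdb", "kafkatb_batch")
print(json.dumps(payload, indent=2))
# Requires a running Kafka Connect worker, so it is left commented out here:
# register_connector("http://kafka-connect:8083", payload)
```

Generating the payload from a function keeps the topic-to-table mapping and the flush interval in one place, which helps when several connectors differ only in those values.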
Verify the connector in the Kafka Connect UI and in the Kafka Connect section of the Redpanda Console. If you click on the snowflake-sink-batch sink, you can see additional information. The comprehensive steps needed to set this up are thoroughly outlined in the tutorial.

Data will now flow into Snowflake in batch mode, with updates occurring every 60 seconds as determined by the buffer.flush.time parameter. You can now query the data in Snowflake:

    select * from kafkatb_batch limit 5;

If everything is configured correctly, the data from CockroachDB should be available in Snowflake in real-time or in batches, depending on your configuration.

record_metadata:

    {
      "CreateTime": 1725887877966,
      "key": "[3]",
      "offset": 30007,
      "partition": 0,
      "topic": "cockroachdb"
    }

record_content:

    {
      "after": {
        "created_at": "2024-09-09T13:17:57.837984Z",
        "id": 1,
        "updated_at": "2024-09-09T13:17:57.917108Z",
        "value": "UPDATED"
      }
    }

The next step is to configure the connector in streaming mode. First, stop the current connector with the following command:

    curl -X DELETE http://localhost:8083/connectors/snowflake-sink-batch

The updated connector configuration will appear as follows:

    {
      "name": "snowflake-sink-streaming",
      "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "8",
        "topics": "cockroachdb",
        "snowflake.topic2table.map": "cockroachdb:kafkatb_streaming",
        "buffer.count.records": "10000",
        "buffer.flush.time": "10",
        "buffer.size.bytes": "5000000",
        "snowflake.url.name": "<snowflake-account>:443",
        "snowflake.user.name": "username",
        "snowflake.private.key": "private-key",
        "snowflake.private.key.passphrase": "",
        "snowflake.database.name": "kafkadb",
        "snowflake.schema.name": "kafka_schema",
        "snowflake.role.name": "kafka_connector_role_1",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
        "errors.log.enable": "true",
        "schemas.enable": "false"
      }
    }

Take note of the snowflake.ingestion.method parameter. This feature removes the need to wait 60 seconds to push data to Snowflake, allowing us to reduce the buffer.flush.time to 10 seconds. To deploy the connector, use the following command:

    curl -d @"snowflake-sink-streaming.json" -H "Content-Type: application/json" -X POST http://kafka-connect:8083/connectors

Shortly after deployment, the data will be available in the Snowflake table.

The previous examples demonstrated how data was ingested into predefined Snowflake tables. The following method will automatically infer the schema from the Kafka messages:

    {
      "name": "snowflake-sink-streaming-schematized",
      "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "tasks.max": "8",
        "topics": "cockroachdb",
        "snowflake.topic2table.map": "cockroachdb:kafkatb_streaming_schematized",
        "buffer.count.records": "10000",
        "buffer.flush.time": "10",
        "buffer.size.bytes": "5000000",
        "snowflake.url.name": "<snowflake-account>:443",
        "snowflake.user.name": "username",
        "snowflake.private.key": "private-key",
        "snowflake.private.key.passphrase": "",
        "snowflake.database.name": "kafkadb",
        "snowflake.schema.name": "kafka_schema",
        "snowflake.role.name": "kafka_connector_role_1",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
        "errors.log.enable": "true",
        "schemas.enable": "false",
        "snowflake.enable.schematization": "TRUE"
      }
    }

Save this as snowflake-sink-streaming-schematized.json and deploy it using:

    curl -d @"snowflake-sink-streaming-schematized.json" -H "Content-Type: application/json" -X POST http://kafka-connect:8083/connectors

Upon deployment, a new table will be created in Snowflake with the following schema:

    create or replace TABLE KAFKADB.KAFKA_SCHEMA.KAFKATB_STREAMING_SCHEMATIZED (
        RECORD_METADATA VARIANT COMMENT 'created by automatic table creation from Snowflake Kafka Connector',
        AFTER VARIANT COMMENT 'column created by schema evolution from Snowflake Kafka Connector'
    );

To inspect the table, use the following query:

    SELECT after AS record FROM kafkatb_streaming_schematized LIMIT 5;

Sample result:

    {
      "created_at": "2024-09-09T16:39:34.993226Z",
      "id": 18712,
      "updated_at": null,
      "value": "0d6bd8a4a790aab95c97a084d17bd820"
    }

Verify

We can flatten the data for easier manipulation using the following query:

    USE ROLE securityadmin;
    GRANT CREATE VIEW ON SCHEMA KAFKADB.kafka_schema TO ROLE kafka_connector_role_1;

    USE ROLE kafka_connector_role_1;
    USE DATABASE KAFKADB;
    USE SCHEMA KAFKA_SCHEMA;

    CREATE VIEW v_kafkatb_batch_flattened AS
        SELECT PARSE_JSON(record_content:after):id AS ID,
               PARSE_JSON(record_content:after):value AS VALUE,
               PARSE_JSON(record_content:after):created_at AS CREATED_AT,
               PARSE_JSON(record_content:after):updated_at AS UPDATED_AT
        FROM kafkatb_batch;

    SELECT * FROM v_kafkatb_batch_flattened limit 1;

    ID  VALUE      CREATED_AT                     UPDATED_AT
    1   "UPDATED"  "2024-09-09T13:17:57.837984Z"  "2024-09-09T13:17:57.917108Z"

Alternatively, for the schematized table, the view creation statement would be:

    CREATE VIEW v_kafkatb_streaming_schematized_flattened AS
        SELECT PARSE_JSON(after):id AS ID,
               PARSE_JSON(after):value AS VALUE,
               PARSE_JSON(after):created_at AS CREATED_AT,
               PARSE_JSON(after):updated_at AS UPDATED_AT
        FROM kafkatb_streaming_schematized;

To verify the data flow, make an update in CockroachDB and check for the changes in Snowflake:

    UPDATE cockroachdb SET value = 'UPDATED', updated_at = now() WHERE id = 20000;

In Snowflake, execute the following query to confirm the update:

    SELECT * FROM v_kafkatb_streaming_schematized_flattened where VALUE = 'UPDATED';

Sample result:

    ID     VALUE      CREATED_AT                     UPDATED_AT
    20000  "UPDATED"  "2024-09-09T18:15:13.460078Z"  "2024-09-09T18:16:56.550778Z"
    19999  "UPDATED"  "2024-09-09T18:15:13.460078Z"  "2024-09-09T18:15:27.365272Z"

The architectural diagram is below:

Conclusion

In this process, we explored Kafka Connect as a solution to stream changefeeds into Snowflake. This approach provides greater control over how messages are delivered to Snowflake, leveraging the Snowflake Kafka Connector with Snowpipe Streaming for real-time, reliable data ingestion.
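The flattening that the views perform in Snowflake can also be checked on the client side, for example when inspecting messages read straight from the topic. The sketch below mirrors the column names of the flattened views and uses the record_content sample shown earlier as input; the function name `flatten_change_event` is hypothetical, and returning None for a message without an "after" image is an assumption about how such messages should be handled.

```python
import json

def flatten_change_event(record_content):
    """Return the 'after' image of a changefeed message as a flat row dict."""
    event = json.loads(record_content) if isinstance(record_content, str) else record_content
    after = event.get("after")
    if after is None:
        # Assumption: a message with no "after" image carries no row to flatten.
        return None
    return {
        "ID": after.get("id"),
        "VALUE": after.get("value"),
        "CREATED_AT": after.get("created_at"),
        "UPDATED_AT": after.get("updated_at"),
    }

# Same shape as the record_content sample from the batch table.
sample = json.dumps({
    "after": {
        "created_at": "2024-09-09T13:17:57.837984Z",
        "id": 1,
        "updated_at": "2024-09-09T13:17:57.917108Z",
        "value": "UPDATED",
    }
})

row = flatten_change_event(sample)
print(row)
```

Comparing a row produced this way with the output of the flattened view is a quick way to confirm that nothing was lost between the topic and the Snowflake table.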
Roy Fielding introduced REST in his doctoral dissertation. After reading it, I would boil it down to three basic elements:

- A document that describes object state
- A transport mechanism to transmit the object state back and forth between systems
- A set of operations to perform on the state

While Roy was focused solely on HTTP, I don't see why another transport could not be used. Here are some examples:

- Mount a WebDAV share (WebDAV is an HTTP extension, so this still uses HTTP). Copy a spreadsheet (.xls, .xlsx, .csv, .ods) into the mounted folder, where each row is the new/updated state. The act of copying into the share indicates the operation of upserting, the name of the file indicates the type of data, and the columns are the fields. The server responds with (document name)-status.(document suffix), which provides a key for each row, a status, and possibly an error message. In this case, it does not really make sense to request data.
- Use gRPC. The object transmitted is the document, HTTP/2 is the transport, and the name of the remote method is the operation. Data can be both provided and requested.
- Use FTP. Similar to WebDAV, it is file-based. The PUT command is upserting, and the GET command is requesting. GET only provides a filename, so it generally provides all data of the specified type. It is possible to allow for special filenames that indicate a hard-coded filter to GET a subset of data.

Whenever I see REST implementations in the wild, they often do not follow basic HTTP semantics, and I have never seen any explanation given for this, just a bunch of varying opinions. None of those I found referenced the RFC. Most seem to figure that:

- POST = Create
- PUT = Update the whole document
- PATCH = Update a portion of a document
- GET = Retrieve the whole document

This is counter to what HTTP states regarding POST and PUT:

- PUT is "create" or "update". GET generally returns whatever was last PUT. If PUT creates, it MUST return 201 Created. If PUT updates, it MUST return 200 OK or 204 No Content. The RFC suggests the content for 200 OK of a PUT should be the status of the action. I think it would be OK in the case of SQL to return the new row from a select statement. This has the advantage that any generated columns are returned to the caller without having to perform a separate GET.
- POST processes a resource according to its own semantics. Older RFCs said POST is for subordinates of a resource. All versions give the example of posting an article to a mailing list; all versions say that if a resource is created, 201 Created SHOULD be returned.

I would argue that effectively what POST really means is:

- Any data manipulation except create, full/partial update, or delete
- Any operation that is not data manipulation, such as:
  - Perform a full-text search for rows that match a phrase.
  - Generate a GIS object to display on a map.

The word MUST means your implementation is only HTTP compliant if you do what is stated. Using PUT only for updates obviously won't break anything; it just isn't RFC compliant. If you provide clients that handle all the details of sending/receiving data, then what verbs get used won't matter much to the user of the client.

I'm the kind of guy who wants a reason for not following the RFC. I have never understood the importance of separating create from update in REST APIs, any more than in web apps. Think about cell phone apps like calendar appointments, notes, contacts, etc.:

- "Create" is hitting the plus icon, which displays a new form with empty or default values.
- "Update" is selecting an object and hitting the pencil icon, which displays an entry form with current values.

Once the entry form appears, it works exactly the same in terms of field validations. So why should REST APIs and web front ends be any different than cell phone apps? If it is helpful for phone users to get the same data entry form for "create" and "update," wouldn't it be just as helpful to API and web users?
If you decide to use PUT as "create" or "update", and you're using SQL as a store, most vendors have an upsert query of some sort. Unfortunately, that does not help to decide when to return 200 OK or 201 Created. You'd have to look at the information your driver provides when a DML query executes to find a way to distinguish insert from update for an upsert, or use another query strategy.

A simple example would be to perform an update set ... where pk column = pk value. If one row was affected, then the row exists and was updated; otherwise, the row does not exist and an insert is needed.

On Postgres, you can take advantage of the RETURNING clause, which can actually return anything, not just row data, as follows:

    INSERT INTO <table> VALUES (...)
    ON CONFLICT(<pk column>) DO UPDATE SET (...)
    RETURNING (SELECT COUNT(<pk column>) FROM <table> WHERE <pk column> = <pk value>) AS exists

The genius of this is that:

- The subselect in the RETURNING clause does not see the effects of the statement it belongs to, so it reports whether the row existed before the INSERT ... ON CONFLICT ... DO UPDATE query executed.
- The result of the query is one column named "exists", which is 1 if the row existed before the query executed, 0 if it did not.
- The RETURNING clause can also return the columns of the row, including anything generated that was not provided.

You only have to figure out once how to determine whether an insert or an update happened, and then make a simple abstraction that all your PUTs can call to handle 200 OK or 201 Created.

One nice benefit of using PUT as intended is that as soon as you see a POST you know for certain it is not retrieval or persistence, and conversely, you know to search for POST to find the code for any operation that is not retrieval or persistence.

I think the benefits of using PUT and POST as described in the RFC outweigh whatever reasons people have for using them in a way that is not RFC-compliant.
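The "update first, insert if nothing was affected" strategy described above can be wrapped in a small abstraction. The following sketch uses Python's built-in sqlite3 module rather than Postgres so that it runs anywhere; the table `notes` and the function `put_note` are hypothetical names used only for illustration.

```python
import sqlite3

def put_note(conn, note_id, body):
    """Upsert a row and return the HTTP status a PUT handler would send."""
    cursor = conn.execute("UPDATE notes SET body = ? WHERE id = ?", (body, note_id))
    if cursor.rowcount == 1:
        status = 200  # the row existed and was updated
    else:
        conn.execute("INSERT INTO notes (id, body) VALUES (?, ?)", (note_id, body))
        status = 201  # the row did not exist and was created
    conn.commit()
    return status

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

first = put_note(conn, 1, "draft")   # no row yet, so this creates it
second = put_note(conn, 1, "final")  # row exists, so this updates it
print(first, second)
```

Every PUT handler can call one function like this and simply forward the returned status. Note that with concurrent writers the update and the insert would need to run in one transaction with suitable locking, or be replaced by a single-statement upsert such as the Postgres query shown earlier.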
Most organizations face challenges while adapting to data platform modernization. The critical challenge is improving the scalability and performance of data processing as the volume, variety, and velocity of data used for analytics increase. This article summarizes answers to some of the hard questions of data platform modernization:

- How can we onboard new data sources with little or no code?
- What steps are required to improve data integrity among various data source systems?
- How can continuous integration/continuous delivery (CI/CD) workflows across environments be simplified?
- How can we improve the testing process?
- How do we identify data quality issues early in the pipeline?

Evolution of Data Platforms

Data platforms and their tooling have advanced considerably, driven by the vast volume and complexity of data. Data platforms have long been used to consolidate data by extracting it from a wide array of heterogeneous source systems and integrating it through cleansing, enriching, and curating, so that it is easily accessible to business users and teams across an organization.

On-premises Extract, Transform, Load (ETL) tools were designed to process data for large-scale analysis and integrate it into a central repository optimized for read-heavy operations. These tools manage structured data.

As Big Data rose, organizations started dealing with vast amounts of data. Hadoop, a distributed computing framework for processing large data sets, enabled cost-effective handling of vast data with tools like HDFS and MapReduce.

Traditional ETL tools encountered data complexity, scalability, and cost challenges, which led to NoSQL databases such as MongoDB, Cassandra, and Redis. These platforms excel at handling unstructured or semi-structured data and provide scalability for high-velocity applications.
The need for faster insights led data integration tools to support real-time and near-real-time ingestion and processing, for example Apache Kafka for real-time data streaming, Apache Storm for real-time analytics and real-time machine learning, and Apache Pulsar for distributed messaging and streaming. Many more data streaming applications are available.

Cloud-based database and data warehouse services such as Amazon RDS, Google BigQuery, and Snowflake offer scalable and flexible database services with on-demand resources.

Data lakes and lakehouses on cloud platforms such as AWS S3 and Azure Data Lake allow raw, unstructured data to be stored in its native format. This approach provides a more flexible and scalable alternative to traditional data warehouses, enabling more advanced analytics and data processing. These platforms provide a clear separation between compute and storage, with managed services for transforming data within the database.

With the integration of AI/ML into data platforms through tools such as Azure Machine Learning, AWS Machine Learning, and Google AI, data analysis has advanced considerably. Automated insights, predictive analytics, and natural language querying are becoming more prevalent, enhancing the value extracted from data.

Challenges While Adapting a Data Platform Modernization

Data platform modernization is essential for staying competitive and unlocking the full potential of data. The key challenges are:

- Legacy systems integration: Making apples-to-apples comparisons is complex because outdated legacy source systems are hard to integrate with modern data platforms.
- Data migration and quality: Data cleansing and quality issues are hard to fix during data migration.
- Cost management: Data modernization is expensive, so budgeting and managing the cost of a project are significant challenges.
- Skills shortage: Finding and retaining people with highly specialized skills is difficult.
- Data security and privacy: Implementing robust security and privacy policies can be complex, as new technologies on new platforms come with new risks.
- Scalability and flexibility: Data platforms should scale and adapt to changing business needs as the organization grows.
- Performance optimization: New platforms must perform efficiently under various data loads and scales, which is challenging as data volumes and query counts increase.
- Data governance and compliance: Implementing data governance policies and complying with regulatory requirements in a new environment is difficult when no organization-wide data strategy exists.
- Vendor lock-in: Organizations should look for interoperability and portability while modernizing instead of locking themselves into a single vendor.
- User adoption: Getting end users' buy-in requires practical training and communication strategies.

ETL Framework and Performance

The ETL framework affects performance in several aspects of any data integration. The framework's performance is evaluated against the following metrics:

- Process utilization
- Memory usage
- Time
- Network bandwidth utilization

Let us review how cloud-based ETL tools, as a framework, support fundamental data operations principles. This article covers how to simplify data operations with advanced ETL tools, using the Coalesce cloud-based ETL tool as an example.
- Collaboration: Advanced cloud-based ETL tools allow data transformations to be written in platform-native code and documented within the models, generating clear documentation that makes it easier for data teams to understand and collaborate on data transformations.
- Automation: These tools allow data transformations and test cases to be written as code with explicit dependencies, so scheduled data pipelines and CI/CD jobs automatically run in the correct order.
- Version control: These tools integrate with GitHub, Bitbucket, Azure DevOps, and GitLab, enabling the tracking of model changes and allowing teams to work on different versions of models, which facilitates parallel development and testing.
- Continuous Integration and Continuous Delivery (CI/CD): ETL frameworks allow businesses to automate deployment by identifying changes and running the impacted models and their dependencies along with the test cases, ensuring the quality and integrity of data transformations.
- Monitoring and observability: Modern data integration tools can run data freshness and quality checks to identify potential issues and trigger alerts.
- Modularity and reusability: These tools encourage breaking down transformations into smaller, reusable models and allow sharing models as packages, facilitating code reuse across projects.

Coalesce Is One of the Choices

Coalesce is a cloud-based ELT (Extract, Load, and Transform) and ETL (Extract, Transform, and Load) tool that adopts data operations principles and uses tools that natively support them. It is backed by the Snowflake framework for modern data platforms. Figure 1 shows an automated process for data transformation on the Snowflake platform. Coalesce is a no/low-code data transformation platform that generates Snowflake-native SQL code.
Figure 1: Automating the data transformation process using Coalesce
The Coalesce application comprises a GUI front end and a backend cloud data warehouse. Coalesce has both GUI and code-based environments. Figure 2 shows a high-level Coalesce application architecture diagram.
Figure 2: Coalesce Application Architecture (Image Credit: Coalesce)
Coalesce is a data transformation tool that uses graph-like data pipelines to develop and define transformation rules for various data models on modern platforms while generating Structured Query Language (SQL) statements. Figure 3 shows the combination of templates and nodes, like data lineage graphs with SQL, which makes it more powerful for defining the transformation rules. Coalesce's code-first, GUI-driven approach makes building, testing, and deploying data pipelines easier. The Coalesce framework improves the data pipeline development workflow compared to creating directed acyclic graphs (DAGs) purely with code. Coalesce also has built-in, column-aware functionality in the repository, which lets you see data lineage for any column in the graphs.
Figure 3: Directed Acyclic Graph with various types of nodes (Image Credit: Coalesce)
Set up projects and repositories: Coalesce supports a Continuous Integration (CI)/Continuous Delivery (CD) workflow without the need to define the execution order of the objects. Coalesce supports various DevOps providers such as GitHub, Bitbucket, GitLab, and Azure DevOps. Each Coalesce project should be tied to a single Git repository, allowing easy version control and collaboration.
Figure 4: Browser Git Integration Data Flow (Image Credit: Coalesce)
Figure 4 demonstrates the steps for browser Git integration with Coalesce. The numbered steps in the figure are described next; the linked reference guide provides detailed steps for configuring Git with Coalesce.
When a user submits a Git request from the browser, an API call sends an authenticated request to the Coalesce backend (1). Upon successful authentication (2), the backend retrieves the user's Git personal access token (PAT) from an industry-standard credential manager (3) in preparation for the Git provider request. The backend then communicates directly over HTTPS/TLS with the Git provider (4) (GitHub, Bitbucket, Azure DevOps, or GitLab), proxying requests (for CORS purposes) over HTTPS/TLS back to the browser (5). The communication in step 5 uses the native Git HTTP protocol over HTTPS/TLS (the same protocol used when performing a git clone with an HTTP Git repository URL).
Set up the workspace: Within a project, we can create one or more Development Workspaces, each with its own set of code and configurations. Each project also has its own set of deployable Environments, which can be used to test and deploy code changes to production. In the tool itself, we configure Storage Locations and Mappings. A good rule is to create target schemas in Snowflake for DEV, QA, and Production, and then map them in Coalesce.
The build interface is where we will spend most of our time creating nodes, building graphs, and transforming data. Coalesce comes with default node types that are not editable. However, they can be duplicated and edited, or new ones can be made from scratch. The standard nodes are the source node, stage node, persistent stage node, fact node, dimension node (with SCD Type 1 and Type 2 support), and view node. We can create nodes and configure their properties in a few clicks. A graph represents a SQL pipeline. Each node is a logical representation and can materialize as a table or a view in the database.
User-defined nodes: Coalesce has User-Defined Nodes (UDN) for any particular object types or standards an organization may want to enforce.
Coalesce packages have built-in nodes and templates for building Data Vault objects like Hubs, Links, PIT, Bridge, and Satellites. For example, the package ID for Data Vault 2.0 can be installed in the project's workspace.
With graphs in place, we can investigate data issues without inspecting the entire pipeline by narrowing the analysis with a lineage graph and sub-graphs. Adding new data objects is easy because there is no need to worry about orchestration or to define the execution order. Tests execute through dependent objects and catch errors early in the pipeline. Node tests can run before or after the node's transformations, and this is user-configurable.
Deployment interface: Deploy data pipelines to the data warehouse using the Deployment Wizard. We can select the branch to deploy, override default parameters if required, and review the plan and deployment status. This GUI can deploy the code across all environments.
Data refresh: A pipeline can be refreshed only after it has been deployed successfully. Refresh runs the data transformations defined in the data warehouse metadata. Use refresh to update the pipeline with any new changes from the data warehouse. To refresh only a subset of data, use Jobs. A job is a subset of nodes, defined by a selector query, that runs during a refresh. In the Coalesce build interface, create a job, commit it to Git, and deploy it to an environment before it can be used.
Orchestration: Coalesce orchestrates the execution of a transformation pipeline and gives users the freedom and flexibility to choose a scheduling mechanism for deployments and job refreshes that fits their organization's current workflows. Many tools, such as Azure Data Factory, Apache Airflow, GitLab, Azure DevOps, and others, can automate execution according to time or via specific triggers (e.g., upon code deployment). Snowflake tasks can also be used to schedule jobs on Snowflake. Apache Airflow is a commonly used orchestrator with Coalesce.
Rollback: To roll back a deployment in Coalesce and restore the environment's data structures to their prior state, redeploy the commit that was deployed just before the deployment you want to roll back.
Documentation: Coalesce automatically produces and updates documentation as developers work, freeing them to work on higher-value deliverables.
Security: Coalesce never stores data at rest, and data in motion is always encrypted. Data is secured in the Snowflake account.
Upsides of Coalesce
Feature: Benefits
Template-driven development: Speeds development; change once, update all
Auto-generates code: Enforces standards without reviews
Scheduled execution: Automates pipelines with third-party orchestration tools such as Airflow, Git, or Snowflake tasks to schedule the jobs
Flexible coding: Facilitates self-service and is easy to code
Data lineage: Supports impact analysis
Auto-generates documentation: Makes it quick to onboard new staff
Downsides of Coalesce
Although Coalesce is a comprehensive data transformation platform with robust data integration capabilities, it has some potential drawbacks as an ELT/ETL tool:
Coalesce is built exclusively to support Snowflake.
Reverse engineering a schema from Snowflake into Coalesce is not straightforward. Certain YAML files and configuration specifications must be updated, and the YAML file must be built to the specifications that reverse engineering into graphs requires.
The lack of logs after deployment and during the data refresh phase can result in vague errors that are difficult to resolve.
Infrastructure changes can be difficult to test and maintain, leading to frequent job failures. CI/CD should be performed in a strictly controlled manner.
No built-in scheduler is available in the Coalesce application to orchestrate jobs, unlike other ETL tools such as DataStage, Talend, Fivetran, Airbyte, and Informatica.
Conclusions
Here are the key takeaways from this article:
As data platforms become more complex, managing them becomes difficult, and embracing data operations principles is the way to address these challenges.
We looked at the capabilities of ETL frameworks and their performance.
We examined Coalesce as a solution that supports data operations principles and allows us to build automated, scalable, agile, well-documented data transformation pipelines on a cloud-based data platform.
We discussed the upsides and downsides of Coalesce.
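The idea that a pipeline's run order follows from declared dependencies, rather than from a hand-written schedule, can be illustrated with a small sketch. This is not Coalesce code: the node names are hypothetical, and Python's standard-library `graphlib` stands in for whatever the tool does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it reads from.
# These names are illustrative only and are not taken from Coalesce.
DEPENDENCIES = {
    "stg_orders": {"src_orders"},
    "stg_customers": {"src_customers"},
    "dim_customer": {"stg_customers"},
    "fct_orders": {"stg_orders", "dim_customer"},
}


def execution_order(dependencies):
    """Return one valid run order in which every node follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


def upstream(node, dependencies):
    """Return every node that `node` depends on, directly or indirectly.

    This mirrors narrowing an investigation to a sub-graph of the lineage.
    """
    seen = set()
    stack = [node]
    while stack:
        for parent in dependencies.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


order = execution_order(DEPENDENCIES)
print(order)
print(sorted(upstream("fct_orders", DEPENDENCIES)))
```

Adding a new node only requires declaring what it reads from; the order is recomputed from the graph, which is the property the article attributes to graph-based pipelines.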
Continuous Delivery is a practice and methodology that helps you build and deploy your software faster so that it can be released to production systems at any time. It facilitates shortening the lifecycle times of various development and operations processes. Effectively applying the concepts of Continuous Integration (CI) and Continuous Deployment (CD) helps achieve the benefits of the continuous delivery principles, also enabling faster software releases. We explore the challenges encountered by software teams implementing CI/CD and demonstrate how feature flags can help mitigate these risks. Introduction To CI/CD CI/CD ensures that the development teams frequently integrate their code changes into the main product. CI involves frequent integration of code into a shared repository with automated testing to catch issues early, while CD extends this by automating deployments for reliable and frequent releases. However, teams face challenges such as complex tool integration, maintaining extensive automated tests, ensuring environment consistency, and overcoming cultural resistance. Mitigating Continuous Delivery Challenges with Feature Flags Technical Challenges Complex Merging and Integration Issues Challenge: Frequent changes to the code can cause merge conflicts, which makes it challenging to smoothly integrate different branches of the project. Solution with feature flags: Feature flags allow new features to be integrated into the main branch while still being hidden from users. This approach helps reduce the need for long-lived branches and minimizes merge conflicts because the code can be merged more frequently. Testing Bottlenecks Challenge: As the codebase expands, it can become increasingly challenging to ensure thorough test coverage and keep automated test suites up to date. Solution with feature flags: Feature flags let you test new features in a live production environment without exposing them to all users. 
This allows for more thorough real-world testing and gradual rollouts, reducing the pressure on automated test suites. Environment Consistency Challenge: Maintaining consistency across different deployment environments can be challenging, often resulting in configuration drift and potential issues during deployment. Solution with feature flags: Feature flags can be used to manage environment-specific configurations, to make sure that features behave consistently across environments by toggling them as needed. Deployment Failures Challenge: Managing failed deployments gracefully and implementing a rollback strategy is essential to keep the system stable. Solution with feature flags: Feature flags provide a quick way to disable troublesome features without needing to roll back the entire deployment. This helps reduce downtime and enables quick recovery from deployment issues. Tooling and Infrastructure Challenge: Choosing and setting up the right CI/CD tools, as well as maintaining the CI/CD infrastructure, can be complicated and require a lot of resources. Solution with feature flags: Feature flags can reduce the dependency on complex infrastructure by enabling gradual rollouts and testing in production, which can reduce the reliance on CI/CD tools and infrastructure. Organizational Challenges Cultural Resistance Challenge: Overcoming resistance to change and fostering a culture of continuous improvement can be difficult. Solution with feature flags: Feature flags promote a culture of experimentation and continuous delivery by allowing teams to release features incrementally and gather feedback early, demonstrating the benefits of agile practices. Skill Gaps Challenge: Training team members on CI/CD best practices and keeping up with the latest technologies Solution with feature flags: Feature flags offer gradual rollout and rollback options, acting as a safety net that lets teams slowly and safely adopt new practices and technologies. 
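The gradual-rollout and quick-disable behaviors described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the API of any particular feature flag product: the `FeatureFlag` class, the flag name, and the hashing scheme are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; real systems fetch this from a service."""
    name: str
    enabled: bool = True      # kill switch: False disables the feature for everyone
    rollout_percent: int = 0  # share of users (0 to 100) who see the feature


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FeatureFlag, user_id: str) -> bool:
    """Evaluate the flag for one user: kill switch first, then rollout share."""
    if not flag.enabled:
        return False
    return bucket(flag.name, user_id) < flag.rollout_percent


flag = FeatureFlag(name="new-checkout", rollout_percent=10)
print(is_enabled(flag, "user-42"))

# Turning the feature off requires no redeployment, only a flag change.
flag.enabled = False
print(is_enabled(flag, "user-42"))
```

Because the bucket depends only on the flag name and user ID, a user who sees the feature at 10% keeps seeing it when the rollout widens to 50%, which keeps the experience stable during a gradual release.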
Process-Related Challenges
Defining Effective Pipelines
Challenge: Designing and continuously optimizing efficient CI/CD pipelines.
Solution with feature flags: Feature flags simplify pipeline design by decoupling deployment from release, leading to simpler, faster pipelines with fewer dependencies and less complexity.
Maintaining High Velocity
Challenge: Balancing the speed of delivery with quality and stability.
Solution with feature flags: Feature flags help deliver features quickly by allowing them to be deployed in a controlled manner, ensuring both high quality and stability while keeping up the pace.
Continuous Monitoring and Feedback
Monitoring and Observability
Challenge: Implementing effective monitoring and observability practices to quickly detect and resolve issues.
Solution with feature flags: Feature flags can be monitored and toggled based on their performance metrics and user feedback, allowing for quick response to issues and keeping the system reliable.
Feedback Loops
Challenge: Establishing rapid feedback loops from production to continuously improve.
Solution with feature flags: Feature flags allow for A/B testing and controlled rollouts, giving valuable feedback on new features and enabling continuous improvement based on real user data.
Best Practices for Using Feature Flags With CI/CD Pipelines
Integrating a centralized feature flag management system into CI/CD pipelines can significantly enhance deployment processes. Here are a few best practices:
Choose a feature flag management system that integrates well with your CI/CD tools and workflows. It helps if the feature flag system supports workflows, because the deployment process involves workflow management for change requests.
Use consistent and descriptive names for feature flags to avoid confusion.
Establish clear processes for creating, updating, and retiring feature flags.
Use CI/CD pipeline scripts or APIs provided by the feature flag management system to automate the creation, modification, and deletion of feature flags. Introduce feature flags at the beginning of the development lifecycle. Use feature flags to perform canary releases and gradual rollouts, starting with a small subset of users and expanding gradually. Track feature flag usage, performance, and impact on system metrics. Integrate feature flag data with monitoring and analytics tools to gain insights and make informed decisions. Implement role-based access control (RBAC) to restrict who can create, modify, or delete feature flags. Include feature flags in your automated testing processes. Configure feature flags differently for development, testing, staging, and production environments. Utilize secret-type support within the feature flag storage to securely store all sensitive configuration data used in the pipelines. Feature Toggle Management Tools There are several feature flag management systems that can be integrated with CI/CD pipelines to enhance the deployment process. Here are some options: IBM Cloud App Configuration: IBM Cloud App Configuration is a centralized feature management and configuration service available on IBM Cloud for use with web and mobile applications, microservices, and distributed environments. This has a native integration with IBM Cloud Continuous delivery toolchains. LaunchDarkly: A feature flag management tool that allows you to control the release of new features and changes using feature flags; integrates with popular CI/CD tools like Jenkins, CircleCI, and GitLab Unleash: An open-source feature flag management system that provides flexibility for custom integrations; this works well with CI/CD tools such as Jenkins, GitHub Actions, and GitLab CI. 
Optimizely: A feature flagging and experimentation platform that focuses on A/B testing and performance optimization; supports integrations with CI/CD tools such as Jenkins, CircleCI, and GitHub Actions FeatureHub: An open-source feature management service that can be integrated with CI/CD tools such as Jenkins and GitHub Actions Conclusion Feature flags have become a powerful tool for continuous delivery processes. By weaving feature flags into CI/CD pipelines, development teams can enjoy greater control, flexibility, and safety in their deployments. When you embrace feature flags not just in development but also throughout the deployment process, you pave the way for smoother releases, happier users, and a more dynamic approach to software development and delivery.
The Meta of Design With several decades of experience, I love building enterprise applications for companies. Each solution requires a set of models: an SQL database, an API (Application Programming Interface), declarative rules, declarative security (role-based access control), test-driven scenarios, workflows, and user interfaces. The "meta" approach to design requires thinking of how each of these components interacts with the other. We also need to understand how changes in the scope of the project impact each of these meta-components. While I have worked in many different languages (APL, Revelation/PICK, BASIC, Smalltalk, Object/1, Java, JavaScript, Node.js, Python) these models are always the foundation that influences the final integrated solution. Models are meta abstractions that describe how the shape, content, and ability of the object will behave in the running environment regardless of language, platform, or operating system (OS). Model First Approach Starting with an existing SQL Schema and a good ORM allows the abstraction of the database and the generation of an API. I have been working with ApiLogicServer (a GenAI-powered Python open-source platform) which has a command line interface to connect the major SQL databases and create an SQLAlchemy ORM (Object-Relational Model). From this model, an Open API (aka Swagger) for JSON API is created, and a YAML file (model) drives a react-admin runtime. The YAML file is also used to build an Ontimize (Angular) user interface. Note that the GenAI part of ApiLogicServer lets me use a prompt-driven approach to get this entire running stack using just a few keywords. Command Line Tools The CLI (Command Line Interface) is used to create a new ApiLogicServer (ALS) Python project, connect to an SQL database, use KeyCloak for single sign-on authentication, rebuild the SQLAlchemy ORM if the database changes, generate an Angular application from the API, and much more. 
The CLI does most of the work of building an API: mapping tables and columns and dealing with datatypes, defaults, column aliases, quoted identifiers, and relationships between parent/child tables. The real power of this tool is the things you cannot see.
Command line to build the Northwind demo:
    als create --project-name=demo --db-url=nw+
Developer Perspective
As a developer/consultant, I need more than one framework and set of tools to build and deliver a complete microservice solution. ApiLogicServer is a framework that works with the developer to enhance and extend these various models with low code and DSL (Domain Specific Language) services.
VSCode with a debugger is an absolute requirement.
Copilot for code completion and code generation
Python (3.12) open-source framework and libraries
Kafka integration (producer and consumer)
KeyCloak framework for single sign-on
LogicBank declarative rules engine integrated with the ORM model and all CRUD operations
GitHub integration for source code management (VSCode extension)
SQLAlchemy ORM/Flask and JSON API open-source libraries
Declarative security for role-based access control
Support both react-admin and Angular UI using a YAML model
Docker tools to build and deploy containers
Behave Test Driven tools
Optimistic Locking (optional) on all API endpoints
Open Source (no license issues) components
Access to Python libraries for extensibility
API Model Lifecycles
Database First
Every application will undergo change as stakeholders and end users interact with the system. The earlier the feedback, the easier it will be to modify and test the results. The first source model is the SQL schema(s): missing attributes, foreign key lookups, datatype changes, default values, and constraints require a rebuild of the ORM. ApiLogicServer uses a command-line feature, "rebuild-from-database," that rebuilds the SQLAlchemy ORM model and the YAML files used by the various UI tools.
This approach requires knowledge of SQL to define tables, columns, keys, and constraints, and to insert data. The GenAI feature allows an iterative and incremental approach to building the database, but in the end, an actual database developer is needed to complete the effort.
Model First (GenAI)
An interesting feature of SQLAlchemy is the ability to modify the ORM and rebuild the SQL database. This can be useful for a new application without existing data. This is how the GenAI feature works out of the box: it asks ChatGPT to build an SQLAlchemy ORM model and then builds a database from the model. This seems to work very well for prototypes and quick solutions. GenAI can create the model and populate a small SQLite database. If the system has existing data, adding columns or new tables for aggregations requires a bit more effort and SQL knowledge.
Virtual Columns and Relationships
There are many use cases that prevent the developer from "touching" the database. This requires that the framework be able to declare virtual columns (like check_sum for optimistic locking) and virtual relationships to define one-to-many and many-to-one relationships between entities. SQLAlchemy and ALS support both of these features.
Custom API Definitions
There are many use cases that require API endpoints that do not map directly to the SQLAlchemy model. ApiLogicServer provides an extensible framework to define and implement new API endpoints. Further, there are use cases that require a JSON response to be formatted in a manner suitable for the consumer (e.g., nested documents) or transforms on the results that simple JSON API cannot support. This is probably one of the best features of ALS: the extensible nature of custom user endpoints.
LogicBank: Declarative Logic
Rules are written in an easy-to-understand DSL to support derivations (formula, sums, counts, parent copy), constraints (reject when), and events.
Rules can be extended with Python functions (e.g., a commit event calling a Kafka producer). Rules can be added or changed without knowledge of the order of operations (like a spreadsheet); rules operate on state changes of dependent entities and fields. These LogicBank rules can be partially generated using Copilot for formulas, sums, counts, and constraints. Sometimes, the introduction of sums and counts requires the addition of parent tables and relationships to store the column aggregates.
    Rule.formula(derive=LineItem.Total, as_expression=lambda row: row.UnitPrice * row.Quantity)
    Rule.copy(derive=LineItem.UnitPrice, from_parent=Product.UnitPrice)
Events
This is the point where developers can integrate business and API transactions with external systems. Events are applied to an entity (early, row, commit, or flush), and the existing integration with a Kafka broker demonstrates how a triggering event can be used to produce a message. This can also be used to interface with a workflow system. For example, if the commit event is used on an Order, then once all the rules and constraints have completed successfully, the commit event is called and a Python function is used to send mail, produce a Kafka message, or call another microservice API to ship the order.
    def send_order_to_shipping(row: models.Order, old_row: models.Order, logic_row: LogicRow):
        """ #als: Send Kafka message formatted by OrderShipping RowDictMapper

        Format row per shipping requirements, and send (e.g., a message)

        NB: the after_flush event makes Order.Id available.

        Args:
            row (models.Order): inserted Order
            old_row (models.Order): n/a
            logic_row (LogicRow): bundles curr/old row, with ins/upd/dlt logic
        """
        # Send only when an order becomes ready: inserted as ready, or updated from not ready to ready
        if (logic_row.is_inserted() and row.Ready == True) or \
           (logic_row.is_updated() and row.Ready == True and old_row.Ready == False):
            kafka_producer.send_kafka_message(logic_row=logic_row,
                                              row_dict_mapper=OrderShipping,
                                              kafka_topic="order_shipping",
                                              kafka_key=str(row.Id),
                                              msg="Sending Order to Shipping")

    Rule.after_flush_row_event(on_class=models.Order, calling=send_order_to_shipping)
Declarative Security Model
A single sign-on provider such as KeyCloak handles authentication, while authorization is declared per user-defined role. Each role can have read, insert, update, or delete permissions, and a grant can give a role specific permissions on a specific entity (API) and even apply row-level filters. This fine-grained approach can be added and tested anytime in the development lifecycle.
    DefaultRolePermission(to_role = Roles.public, can_read=True, ... can_delete=False)
    DefaultRolePermission(to_role = Roles.customer, can_read=True, ... can_delete=True)
    # customers can only see their own account
    Grant(on_entity = models.Customer,
          to_role = Roles.customer,
          filter = lambda : models.Customer.Id == Security.current_user().id)
Summary
ApiLogicServer (ALS) and GenAI-powered development change the deployment of microservice applications. ALS has the features and functionality most developers need and is based on open-source components. LogicBank requires a different way of thinking about data, but the investment pays off in less time spent writing code. ALS is well-suited for database transaction systems that need an API and the ability to build a custom front-end user interface. Model-driven development is the way to implement GenAI-powered applications, and ALS is a platform for developers/consultants to deliver these solutions.
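To make the derivation rules discussed above (parent copy, formula, sum) more concrete, here is a plain-Python sketch of what such rules compute. It deliberately does not use LogicBank or SQLAlchemy: the classes, field names, and the `recompute` function are hypothetical, and the sketch recomputes everything in a fixed order on each call, whereas a rules engine reacts to state changes and works out the order itself.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Product:
    unit_price: Decimal


@dataclass
class LineItem:
    product: Product
    quantity: int
    unit_price: Decimal = Decimal("0")  # derived: copied from the parent Product
    total: Decimal = Decimal("0")       # derived: unit_price * quantity


@dataclass
class Order:
    items: List[LineItem] = field(default_factory=list)
    amount_total: Decimal = Decimal("0")  # derived: sum of the line item totals


def recompute(order: Order) -> None:
    """Apply the three derivations in dependency order (simplified)."""
    for item in order.items:
        item.unit_price = item.product.unit_price     # parent copy
        item.total = item.unit_price * item.quantity  # formula
    order.amount_total = sum((i.total for i in order.items), Decimal("0"))  # sum


laptop = Product(unit_price=Decimal("10.00"))
cable = Product(unit_price=Decimal("2.50"))
order = Order(items=[LineItem(laptop, quantity=3), LineItem(cable, quantity=2)])
recompute(order)
print(order.amount_total)
```

In the hand-written version, the author has to know that the copy must run before the formula and the formula before the sum. The spreadsheet-like quality described in the article is that a declarative engine derives that ordering from the rule dependencies.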
In today’s rapidly evolving enterprise landscape, managing and synchronizing data across complex environments is a significant challenge. As businesses increasingly adopt multi-cloud strategies to enhance resilience and avoid vendor lock-in, they are also turning to edge computing to process data closer to the source. This combination of multi-cloud and edge computing offers significant advantages, but it also presents unique challenges, particularly in ensuring seamless and reliable data synchronization across diverse environments. In this post, we’ll explore how the open-source KubeMQ’s Java SDK provides an ideal solution for these challenges. We’ll focus on a real-life use case involving a global retail chain that uses KubeMQ to manage inventory data across its multi-cloud and edge infrastructure. Through this example, we’ll demonstrate how the solution enables enterprises to achieve reliable, high-performance data synchronization, transforming their operations. The Complexity of Multi-Cloud and Edge Environments Enterprises today are increasingly turning to multi-cloud architectures to optimize costs, enhance system resilience, and avoid being locked into a single cloud provider. However, managing data across multiple cloud providers is far from straightforward. The challenge is compounded when edge computing enters the equation. Edge computing involves processing data closer to where it’s generated, such as in IoT devices or remote locations, reducing latency and improving real-time decision-making. When multi-cloud and edge computing are combined, the result is a highly complex environment where data needs to be synchronized not just across different clouds but also between central systems and edge devices. Achieving this requires a robust messaging infrastructure capable of managing these complexities while ensuring data consistency, reliability, and performance. 
KubeMQ’s Open-Source Java SDK: A Unified Solution for Messaging Across Complex Environments KubeMQ is a messaging and queue management solution designed to handle modern enterprise infrastructure. The KubeMQ Java SDK is particularly appropriate for developers working within Java environments, offering a versatile toolset for managing messaging across multi-cloud and edge environments. Key features of the KubeMQ Java SDK include: All messaging patterns in one SDK: KubeMQ’s Java SDK supports all major messaging patterns, providing developers with a unified experience that simplifies integration and development. Utilizes GRPC streaming for high performance: The SDK leverages GRPC streaming to deliver high performance, making it suitable for handling large-scale, real-time data synchronization tasks. Simplicity and ease of use: With numerous code examples and encapsulated logic, the SDK simplifies the development process by managing complexities typically handled on the client side. Real-Life Use Case: Retail Inventory Management Across Multi-Cloud and Edge To illustrate how to use KubeMQ’s Java SDK, let’s consider a real-life scenario involving a global retail chain. This retailer operates thousands of stores worldwide, each equipped with IoT devices that monitor inventory levels in real-time. The company has adopted a multi-cloud strategy to enhance resilience and avoid vendor lock-in while leveraging edge computing to process data locally at each store. The Challenge The retailer needs to synchronize inventory data from thousands of edge devices across different cloud providers. Ensuring that every store has accurate, up-to-date stock information is critical for optimizing the supply chain and preventing stockouts or overstock situations. This requires a robust, high-performance messaging system that can handle the complexities of multi-cloud and edge environments. 
The Solution
Using the KubeMQ Java SDK, the retailer implements a messaging system that synchronizes inventory data across its multi-cloud and edge infrastructure. Here's how the solution is built:
Store Side Code
Step 1: Install KubeMQ SDK
Add the following dependency to your Maven pom.xml file:
    <dependency>
        <groupId>io.kubemq.sdk</groupId>
        <artifactId>kubemq-sdk-Java</artifactId>
        <version>2.0.0</version>
    </dependency>
Step 2: Synchronizing Inventory Data Across Multi-Clouds
    import io.kubemq.sdk.queues.QueueMessage;
    import io.kubemq.sdk.queues.QueueSendResult;
    import io.kubemq.sdk.queues.QueuesClient;

    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    public class StoreInventoryManager {
        // One client per cloud endpoint; both receive every inventory update
        private final QueuesClient client1;
        private final QueuesClient client2;
        private final String queueName = "store-1";

        public StoreInventoryManager() {
            this.client1 = QueuesClient.builder()
                    .address("cloudinventory1:50000")
                    .clientId("store-1")
                    .build();
            this.client2 = QueuesClient.builder()
                    .address("cloudinventory2:50000")
                    .clientId("store-1")
                    .build();
        }

        public void sendInventoryData(String inventoryData) {
            QueueMessage message = QueueMessage.builder()
                    .channel(queueName)
                    .body(inventoryData.getBytes(StandardCharsets.UTF_8))
                    .metadata("Inventory Update")
                    .id(UUID.randomUUID().toString())
                    .build();
            try {
                // Send to cloudinventory1
                QueueSendResult result1 = client1.sendQueuesMessage(message);
                System.out.println("Sent to cloudinventory1, error: " + result1.isError());
                // Send to cloudinventory2
                QueueSendResult result2 = client2.sendQueuesMessage(message);
                System.out.println("Sent to cloudinventory2, error: " + result2.isError());
            } catch (RuntimeException e) {
                System.err.println("Failed to send inventory data: " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            StoreInventoryManager manager = new StoreInventoryManager();
            manager.sendInventoryData("{\"item\": \"Laptop\", \"quantity\": 50}");
        }
    }
Cloud Side Code
Step 1: Install KubeMQ SDK
Add the following dependency to your Maven pom.xml file:
    <dependency>
        <groupId>io.kubemq.sdk</groupId>
        <artifactId>kubemq-sdk-Java</artifactId>
        <version>2.0.0</version>
    </dependency>
Step 2: Managing Data on Cloud Side
    import io.kubemq.sdk.queues.QueueMessage;
    import io.kubemq.sdk.queues.QueuesPollRequest;
    import io.kubemq.sdk.queues.QueuesPollResponse;
    import io.kubemq.sdk.queues.QueuesClient;

    import java.nio.charset.StandardCharsets;

    public class CloudInventoryManager {
        private final QueuesClient client;
        private final String queueName = "store-1";

        public CloudInventoryManager() {
            this.client = QueuesClient.builder()
                    .address("cloudinventory1:50000")
                    .clientId("cloudinventory1")
                    .build();
        }

        public void receiveInventoryData() {
            QueuesPollRequest pollRequest = QueuesPollRequest.builder()
                    .channel(queueName)
                    .pollMaxMessages(1)
                    .pollWaitTimeoutInSeconds(10)
                    .build();
            try {
                while (true) {
                    QueuesPollResponse response = client.receiveQueuesMessages(pollRequest);
                    if (!response.isError()) {
                        for (QueueMessage msg : response.getMessages()) {
                            String inventoryData = new String(msg.getBody(), StandardCharsets.UTF_8);
                            System.out.println("Received inventory data: " + inventoryData);
                            // Process the data here
                            // Acknowledge the message
                            msg.ack();
                        }
                    } else {
                        System.out.println("Error receiving messages: " + response.getError());
                    }
                    // Wait for a bit before polling again
                    Thread.sleep(1000);
                }
            } catch (RuntimeException | InterruptedException e) {
                System.err.println("Failed to receive inventory data: " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            CloudInventoryManager manager = new CloudInventoryManager();
            manager.receiveInventoryData();
        }
    }
The Benefits of Using KubeMQ for Retail Inventory Management
Implementing KubeMQ's Java SDK in this retail scenario offers several benefits:
Improved inventory accuracy: The retailer can ensure that all stores have accurate, up-to-date stock information, reducing the risk of stockouts and overstock.
Optimized supply chain: Accurate data flow from the edge to the cloud streamlines the supply chain, reducing waste and improving response times.
Enhanced resilience: The multi-cloud and edge approach provides a resilient infrastructure that can adapt to regional disruptions or cloud provider issues. Conclusion KubeMQ’s open-source Java SDK provides a powerful solution for enterprises looking to manage data across complex multi-cloud and edge environments. In the retail use case discussed, the SDK enables seamless data synchronization, transforming how the retailer manages its inventory across thousands of stores worldwide. For more information and support, check out their quick start, documentation, tutorials, and community forums. Have a really great day!
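The store-side code sends the same update to two clouds and prints each result. One way to make partial failures actionable is to collect the targets that failed so they can be retried later. The sketch below is only an illustration of that idea and does not use the KubeMQ SDK: the class `InventoryFanOut`, its methods, and the `Predicate<String>` sender abstraction are all hypothetical names. In a real setup, each sender would presumably wrap a client call and return whether the send succeeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper: sends one payload to several named targets and reports which ones failed. */
final class InventoryFanOut {

    // Target name -> sender function that returns true when the send succeeded.
    // The sender is an assumption standing in for a real messaging client call.
    private final Map<String, Predicate<String>> targets = new LinkedHashMap<>();

    void addTarget(String name, Predicate<String> sender) {
        targets.put(name, sender);
    }

    /** Sends the payload to every target and returns the names of the targets that failed. */
    List<String> send(String payload) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Predicate<String>> target : targets.entrySet()) {
            boolean ok;
            try {
                ok = target.getValue().test(payload);
            } catch (RuntimeException e) {
                ok = false; // a thrown exception counts as a failed delivery
            }
            if (!ok) {
                failed.add(target.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        InventoryFanOut fanOut = new InventoryFanOut();
        fanOut.addTarget("cloudinventory1", payload -> true);
        fanOut.addTarget("cloudinventory2", payload -> {
            throw new IllegalStateException("unreachable");
        });
        // Only the second target fails, so only its name is reported
        System.out.println("Failed targets: "
                + fanOut.send("{\"item\": \"Laptop\", \"quantity\": 50}"));
    }
}
```

Because one failing target does not stop delivery to the others, the caller can decide whether to retry, queue the update locally, or raise an alert.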
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.

With organizations increasingly relying on cloud-based services and remote work, the security landscape is becoming more dynamic and challenging than ever before. Cyberattacks and data breaches are on the rise, with high-profile organizations making headlines regularly. These incidents not only cause significant financial loss but also result in irreparable reputational damage and loss of customer trust. Organizations need a more robust security framework that can adapt to the ever-evolving security landscape to combat these threats. Traditional perimeter-based security models, which assume that everything inside the network is trustworthy, are proving to be insufficient in handling sophisticated cyber threats.

Enter zero trust, a security framework built on the principle of "never trust, always verify." Instead of assuming trust within the network perimeter, zero trust mandates verification at every access attempt, using stringent authentication measures and continuous monitoring. Unlike traditional security models that often hinder agility and innovation, this framework empowers developers to build and deploy applications with security as a core component. By understanding the core principles of zero trust, developers can play a crucial role in fortifying an organization's overall security posture, while maintaining development velocity.

Core Principles of Zero Trust

Understanding the core principles of zero trust is crucial to successfully implementing this security framework. These principles serve as a foundation for building a secure environment where every access is continuously verified. Let's dive into the key concepts that drive zero trust.

Identity Verification

The core principle of zero trust is identity verification.
This means that every user, device, and application accessing organizational resources is authenticated using multiple factors before access is granted. This often includes multi-factor authentication (MFA) and strong password policies. Treating every access attempt as a potential threat can significantly reduce an organization's attack surface.

Least-Privilege Access

The principle of least privilege revolves around limiting user access to the minimum resources necessary to perform their specific tasks. By limiting permissions, organizations can mitigate the potential damage caused by a security breach. This can be done by designing and implementing applications with role-based access controls (RBAC) to reduce the risk of insider threats and lateral movement within the network.

Micro-Segmentation

Zero trust advocates for micro-segmentation to further isolate and protect critical assets. This involves dividing the network into smaller, manageable segments to prevent lateral movement of threats. With this, organizations isolate potential breaches and minimize the impact of a successful attack. Developers can support this strategy by designing and implementing systems with modular architectures that can be easily segmented.

Continuous Monitoring

The zero-trust model is not static. Organizations should have robust monitoring and threat detection systems to identify and respond to suspicious activities. A proactive monitoring approach helps identify anomalies and threats before they can cause any harm. This involves collecting and analyzing data from multiple sources like network traffic, user behavior, and application logs. Having all this in place is crucial for the agility of the zero-trust framework.

Assume Breach

Always operate under the assumption that a breach will occur. Rather than hoping to prevent all attacks, organizations should focus on quick detection and response to minimize the impact and recovery time when a breach occurs.
This can be done by implementing well-defined incident response procedures, running regular penetration tests and vulnerability assessments, keeping regular data backups, and raising security awareness across the organization.

The Cultural Shift: Moving Toward a Zero-Trust Mindset

Adopting a zero-trust model requires a cultural shift within the organization rather than only a technological implementation. It demands collaboration across teams, a commitment to security best practices, and a willingness to change deeply integrated traditional mindsets and practices that have governed IT security for some time. To understand the magnitude of this transformation, let's compare the traditional security model with the zero-trust approach.

Traditional Security Models vs. Zero Trust

With traditional security models, organizations rely on a strong perimeter and focus on protecting it with firewalls and intrusion detection systems. The assumption is that if you can secure the perimeter, everything inside it is trustworthy. This worked well in environments where data and applications were bounded within corporate networks. However, with the rise of cloud-based systems, remote work, and BYOD (bring your own device) policies, the boundaries of these networks have become blurred, making traditional security models no longer effective.

Figure 1. Traditional security vs. zero trust

The zero-trust model, on the other hand, assumes that threats can come from anywhere, even from within the organization. It treats each access attempt as potentially malicious until proven otherwise. This is why the model requires ongoing authentication and authorization and is able to anticipate threats and take preventive actions. This paradigm shift requires a move away from implicit trust to a model where continuous verification is the norm.
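To make "deny until proven otherwise" concrete, here is a minimal sketch of a per-request access check that combines identity verification, device compliance, and least-privilege RBAC. It is an illustration under stated assumptions, not a reference implementation: the class `AccessDecision`, the role names, and the permission strings are all hypothetical, and a real system would source these facts from an identity provider and a device posture service.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical zero-trust style check: a request is denied unless every check passes. */
final class AccessDecision {

    // Role -> granted permissions, kept deliberately small to reflect least privilege.
    // Roles and permission names are illustrative assumptions.
    private static final Map<String, Set<String>> ROLE_PERMISSIONS = new HashMap<>();
    static {
        ROLE_PERMISSIONS.put("support", new HashSet<>(Arrays.asList("accounts:read")));
        ROLE_PERMISSIONS.put("auditor", new HashSet<>(Arrays.asList("accounts:read", "logs:read")));
    }

    /** Evaluated on every request; network location is deliberately not an input. */
    static boolean isAllowed(boolean mfaVerified, boolean deviceCompliant,
                             String role, String permission) {
        if (!mfaVerified) {
            return false; // identity not verified for this request
        }
        if (!deviceCompliant) {
            return false; // device does not meet policy
        }
        // Unknown roles get an empty permission set, so the default is deny
        Set<String> granted = ROLE_PERMISSIONS.getOrDefault(role, Collections.<String>emptySet());
        return granted.contains(permission);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(true, true, "auditor", "logs:read"));  // true
        System.out.println(isAllowed(true, true, "support", "logs:read"));  // false
        System.out.println(isAllowed(false, true, "auditor", "logs:read")); // false
    }
}
```

The check runs per request rather than once per session, which is the practical difference from a perimeter model where a successful login implies trust afterward.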
Changing Mindsets: From Implicit Trust to Continuous Verification

Making this shift isn't just about implementing new technologies but also about shifting the entire organizational mindset around security. Zero trust fosters a culture of vigilance, where every access attempt is scrutinized, and trust must be earned every time. That's why it requires buy-in from all levels of the organization, from top management to frontline employees. It requires strong leadership support, employee education and training, and a security-first mindset throughout the organization.

Benefits and Challenges of Zero Trust Adoption

Although the zero-trust model provides a resilient and adaptable framework for modern threats, the journey to implementation is not without its challenges. Understanding these obstacles, as well as the advantages, is crucial for navigating the transition to this new paradigm and adopting the model successfully.

Some of the most substantial benefits of zero trust are:

- Enhanced security posture. By eliminating implicit trust and continuously verifying user identities and device compliance, organizations can significantly reduce their attack surface against sophisticated threats.
- Improved visibility and control over network activities. With real-time monitoring and detailed analytics, organizations gain a comprehensive view of network traffic and user behavior.
- Improved incident response. Visibility also helps detect anomalies and potential threats quickly, which enables fast and effective incident response.
- Adaptability to modern work environments. Zero trust is designed for today's dynamic workspaces that include cloud-based applications and remote work environments. It enables seamless collaboration and secure access regardless of location.
While the benefits of zero trust are significant, the implementation journey also comes with challenges, the most common being:

- Resistance to change. Shifting to a zero-trust mindset means overcoming the entrenched belief that everything inside the network can be trusted and gaining buy-in from all levels of the organization. Additionally, employees need to be educated and made aware of this mindset.
- Balancing security with usability and user experience. Implementing strict access control policies can impact user productivity and satisfaction.
- Potential costs and complexities. The continuous verification process can increase administrative overhead as well as require a significant investment in resources and technology.
- Overcoming technical challenges. The zero-trust model involves changes to existing infrastructure, processes, and workflows. The architecture can be complex and requires the right technology and expertise to navigate. Also, many organizations still rely on legacy systems and infrastructure that may not be compatible with zero-trust principles.

By carefully considering the benefits and challenges of an investment in zero-trust security, organizations can develop a strategic roadmap for implementation.

Implementing Zero Trust: Best Practices

Adopting the zero-trust approach can be a complex task, but with the right strategies and best practices, organizations can overcome common challenges and build a robust security posture.

Table 1. Zero trust best practices

| Practice | Description |
|---|---|
| Define a clear zero-trust strategy | Establish a comprehensive roadmap outlining your organization's goals, objectives, and implementation timeline. |
| Conduct a thorough risk assessment | Identify existing vulnerabilities, critical assets, and potential threats to inform your zero-trust strategy and allocate resources. |
| Implement identity and access control | Adopt MFA and single sign-on to enhance security. Implement IAM to enforce authentication and authorization policies. |
| Create micro-segmentation of networks | Divide your network into smaller segments to isolate sensitive systems and data and reduce the impact of potential breaches. |
| Leverage advanced threat protection | Employ artificial intelligence and machine learning tools to detect anomalies and predict potential threats. |
| Continuously monitor | Maintain constant vigilance over your system with continuous real-time monitoring and analysis of security data. |

Conclusion

The zero-trust security model is an essential component of today's cybersecurity landscape because threats are growing rapidly and becoming more sophisticated. Traditional security measures are no longer sufficient, so transitioning from implicit trust to a state where trust is constantly checked adds a layer of strength to an organization's security framework.

However, implementing this model will require a change in organizational culture. Leadership must adopt a security-first mindset that involves every department and employee contributing to safety and security. Cultural transformation is crucial for a new environment where security is a natural component of everyone's activities.

Implementing zero trust is not a one-time effort but requires ongoing commitment and adaptation to new technologies and processes. Because threats and cyberattacks keep changing, organizations need to keep assessing and adjusting their security measures to stay ahead of potential risks.

For all organizations looking to enhance their security, now is the best time to begin the zero-trust journey. Although it appears to be a complex change, its long-term benefits outweigh the challenges. Zero trust is a security model that helps prevent exposure to today's threats, and it also represents a general strategy to help withstand threats in the future.
Here are some additional resources to get you started:

- Getting Started With DevSecOps by Caroline Wong, DZone Refcard
- Cloud-Native Application Security by Samir Behara, DZone Refcard
- Advanced Cloud Security by Samir Behara, DZone Refcard
- "Building an Effective Zero Trust Security Strategy for End-To-End Cyber Risk Management" by Susmitha Tammineedi

This is an excerpt from DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.

Access and secrets management involves securing and managing sensitive information such as passwords, API keys, and certificates. In today's cybersecurity landscape, this practice is essential for protecting against breaches, ensuring compliance, and enhancing DevOps and cloud security. By implementing effective secrets management, organizations can reduce risk, improve operational efficiency, and respond to incidents more quickly. For developers, it provides a secure, convenient, and collaborative way to handle sensitive information, allowing them to focus on coding without worrying about the complexities of secure secrets handling.

This article explores the importance of access and secrets management, why organizations should care, and how it benefits developers.

Access and Secrets Management: Recent Industry Shifts

As we continue to embrace cloud-native patterns in our development environments, new terms surface. "Decentralized" is a newer term (at least to me) as there's growing traction for fast development cycles using a decentralized approach with the cloud. Decentralization improves scalability and security by isolating sensitive data and reducing the risk of large-scale breaches. Cloud security and identity management ensure that authentication and authorization processes are more robust and flexible, protecting user identities across a distributed environment.

An open-source tool example is Hyperledger Aries, part of the Hyperledger Foundation under the Linux Foundation. It provides the infrastructure for building, deploying, and using interoperable decentralized identity solutions. Aries provides the tools and libraries necessary for creating and managing decentralized identities based on verifiable credentials.
Aries focuses on interoperability, ensuring that credentials issued by one party can be verified by another, regardless of the underlying system. Aries includes support for secure messaging and protocols to ensure that identity-related data is transmitted securely.

An excellent example of leveraging blockchain technology in AWS is the Managed Block Chain Identities. This service facilitates secure and efficient identity management, where identities are verified and managed through a decentralized network, ensuring robust security and transparency.

Let's look into another concept: zero-trust architecture.

Zero-Trust Architecture

Unlike traditional security models that rely on a well-defined perimeter, zero-trust architecture (ZTA) is a cybersecurity framework that assumes no user or device, inside or outside the network, can be trusted by default. This model requires strict verification for every user and device accessing resources on a private network. The core principle of zero trust is "never trust, always verify," ensuring continuous authentication and authorization.

Figure 1. Zero-trust architecture

One of the key components of ZTA is micro-segmentation, which divides the network into smaller, more isolated segments to minimize the impact of potential breaches. This approach limits lateral movement within the network, therefore containing threats and reducing the attack surface. By implementing micro-segmentation, organizations can achieve finer-grained control over their network traffic, further supporting the principle of least privilege.

ZTA employs robust identity and access management (IAM) systems to enforce least-privilege access, ensuring users and devices only have the permissions necessary for their roles. By continuously verifying every access request and applying the least-privilege principle, ZTA can effectively identify and mitigate threats in real time.
This proactive approach to security, grounded in micro-segmentation and least-privilege access, aligns with regulatory compliance requirements and enhances overall resilience against cyberattacks.

Another security feature is multi-factor authentication (MFA). Let's have a look at it.

MFA and Ways to Breach It

Advancements in MFA involve enhancing security by requiring multiple forms of verification before granting access to systems and data. These advancements make it harder for attackers to gain unauthorized access since they need multiple pieces of identification to authenticate. However, MFA can be compromised with "MFA prompt bombing," a key concern for security. Imagine an attacker who has stolen a password and tries to log in, causing the user's device to receive multiple MFA prompts. The attacker hopes the user will either accept a prompt because it looks legitimate or accept it out of frustration to stop the constant notifications.

Threat intelligence from ZDNET reveals how the hacking group 0ktapus uses this method. After phishing login credentials, they bombard users with endless MFA prompts until one is accepted. They might also use social engineering, like posing as Uber security on Slack, to trick users into accepting a push notification. Additionally, 0ktapus employs phone calls, SMS, and Telegram to impersonate IT staff and either harvests credentials directly or exploits MFA fatigue.

Behavioral Analytics With AI for Access Management

As cybersecurity threats grow more sophisticated, integrating AI and machine learning (ML) into access management systems is becoming crucial. AI technologies are continuously enhancing IAM by improving security, streamlining processes, and refining user experiences. Key implementations include:

- User behavior analytics (UBA) – AI-driven UBA solutions analyze user behavior patterns to detect anomalous activities and potential security threats.
  For example, accessing sensitive data at unusual times or from unfamiliar locations might trigger alerts.
- Adaptive authentication – AI-powered systems use ML algorithms to assess real-time risks, adjusting authentication requirements based on user location, device type, and historical behavior. For example, if a user who typically logs in from a home computer suddenly tries to access the account from a new or unfamiliar device, the system might trigger additional verification steps.
- Identity governance and administration – AI technologies automate identity lifecycle management and improve access governance. They accurately classify user roles and permissions, enforce least privilege, and streamline access certification by identifying high-risk rights and recommending policy changes.

Core Use Cases of Access and Secrets Management

Effective access and secrets management is crucial for safeguarding sensitive data and ensuring secure access to resources. It encompasses various aspects, from IAM to authentication methods and secrets management. Typical use cases are listed below:

- IAM – Manage identity and access controls within an organization to ensure that users have appropriate access to resources. This includes automating user onboarding processes to assign roles and permissions based on user roles and departments and performing regular access reviews to adjust permissions and maintain compliance with security policies.
- Authentication and authorization – Implement and manage methods to confirm user identities and control their access to resources. This includes single sign-on (SSO) to allow users to access multiple applications with one set of login credentials and role-based access control to restrict access based on the user's role and responsibilities within the organization.
- Secrets management – Securely manage sensitive data such as API keys, passwords, and other credentials.
  This involves storing and rotating these secrets regularly to protect them from unauthorized access. Additionally, manage digital certificates to ensure secure communication channels and maintain data integrity across systems.

Secrets Management: Cloud Providers and On-Premises

Secrets management is a critical aspect of cybersecurity, focusing on the secure handling of sensitive information required to access systems, services, and applications. What constitutes a secret can vary but typically includes API keys, passwords, and digital certificates. These secrets are essential for authenticating and authorizing access to resources, making their protection paramount to prevent unauthorized access and data breaches.

Table 1

| Environment | Overview | Features | Benefits |
|---|---|---|---|
| Azure Key Vault | A cloud service for securely storing and accessing secrets | Secure storage for API keys, passwords, and certificates; key management capabilities | Centralized secrets management, integration with Azure services, robust security features |
| AWS Secrets Manager | Manages secrets and credentials in the cloud | Rotation, management, and retrieval of database credentials, API keys, and other secrets | Automated rotation, integration with AWS services, secure access control |
| On-premises secrets management | Managing and storing secrets within an organization's own infrastructure | Secure vaults and hardware security modules for storing sensitive information; integration with existing IT infrastructure | Complete control over secrets, compliance with specific regulatory requirements, enhanced data privacy |
| Encrypted storage | Uses encryption to protect secrets stored on-premises or in the cloud | Secrets are stored in an unreadable format, accessible only with decryption keys | Enhances security by preventing unauthorized access, versatile across storage solutions |
| HashiCorp Vault | Open-source tool for securely accessing secrets and managing sensitive data | Dynamic secrets, leasing and renewal, encryption as a service, and access control policies | Strong community support, flexibility, and integration with various systems and platforms |
| Keycloak | Open-source IAM solution | Supports SSO, social login, and identity brokering | Free to use, customizable, provides enterprise-level features without the cost |

Let's look at an example scenario of access and secrets management.

Use Case: Secured Banking Solution

This use case outlines a highly secured banking solution that leverages the Azure AI Document Intelligence service for document recognition, deployed on an Azure Kubernetes Service (AKS) cluster. The solution incorporates Azure Key Vault, HashiCorp Vault, and Keycloak for robust secrets management and IAM, all deployed within the AKS cluster. However, this use case is not limited to the listed tools.

Figure 2. Banking solutions architecture

The architecture consists of the following components:

- The application, accessible via web and mobile app, relies on Keycloak for user authentication and authorization. Keycloak handles secure authentication and SSO using methods like biometrics and MFA, and manages user sessions and roles.
- For secrets management, Azure Key Vault plays a crucial role. It stores API keys, passwords, and certificates, which the banking app retrieves securely to interact with the Azure AI Document Intelligence service. This setup ensures that all secrets are encrypted and access controlled.
- Within the AKS cluster, HashiCorp Vault is deployed to manage dynamic secrets and encryption keys. It provides temporary credentials on demand and offers encryption as a service to ensure data privacy.
- The application utilizes the Azure AI Document Intelligence service for document recognition tasks. Access to this service is secured through Azure Key Vault, and documents are encrypted using keys managed by HashiCorp Vault.

Conclusion

Access and secrets management is crucial for safeguarding sensitive information like passwords and API keys in today's cybersecurity landscape.
Effective management practices are vital for preventing breaches, ensuring compliance, and enhancing DevOps and cloud security. By adopting robust secrets management strategies, organizations can mitigate risks, streamline operations, and respond to security incidents swiftly.

Looking ahead, access and secrets management will become more advanced as cyber threats evolve. Expect increased use of AI for automated threat detection, broader adoption of decentralized identity systems, and development of solutions for managing secrets in complex multi-cloud environments. Organizations must stay proactive to protect sensitive information and ensure robust security.

This is an excerpt from DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.
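The adaptive authentication idea described in the access and secrets management article above (stepping up verification when a login comes from an unfamiliar device or location) can be sketched in a few lines. This is only a rough illustration: the article describes ML-based risk assessment, whereas the sketch below substitutes fixed, hand-picked weights and thresholds. The class `AdaptiveAuth`, the signal names, and every number in it are assumptions made for the example.

```java
/** Hypothetical risk-based step-up check; fixed weights stand in for an ML risk model. */
final class AdaptiveAuth {

    enum Step { ALLOW, REQUIRE_MFA, DENY }

    /** Adds up illustrative risk points for each unusual signal. */
    static int riskScore(boolean knownDevice, boolean usualLocation, boolean usualHours) {
        int score = 0;
        if (!knownDevice) {
            score += 40; // new or unfamiliar device
        }
        if (!usualLocation) {
            score += 40; // login from an unfamiliar location
        }
        if (!usualHours) {
            score += 20; // access at an unusual time
        }
        return score;
    }

    /** Maps a risk score to an authentication requirement; thresholds are assumptions. */
    static Step decide(int score) {
        if (score >= 80) {
            return Step.DENY;
        }
        if (score >= 40) {
            return Step.REQUIRE_MFA;
        }
        return Step.ALLOW;
    }

    public static void main(String[] args) {
        System.out.println(decide(riskScore(true, true, true)));   // ALLOW
        System.out.println(decide(riskScore(false, true, true)));  // REQUIRE_MFA
        System.out.println(decide(riskScore(false, false, true))); // DENY
    }
}
```

Keeping scoring and decision separate means the scoring function could later be replaced by a learned model without touching the policy thresholds.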
Agile transformations can be tough. They’re messy, time-consuming, and more often than not, they fail to deliver the promises that got everyone excited in the first place. That’s why it’s so important to approach an Agile transformation as a full-scale organizational change rather than just a shift in how our development teams work.

In my years as a change management consultant, I have studied and applied various change management models, from John Kotter’s 8-Step Change Model to ADKAR and Lean Change Management by Jason Little. I have learned through these experiences and countless transformations that there isn’t a one-size-fits-all approach.

That’s why I have developed the VICTORY framework. It’s a straightforward approach, blending the best practices from multiple models with practical insights from leading Agile transformations at scale. The idea is to make it easy to remember and apply, no matter the size or complexity of the organization.

The VICTORY Framework for Transformation

The VICTORY framework is designed to guide organizations through the often chaotic and challenging process of organizational transformation, not just Agile transformation. Following this framework ensures the change is not just strategic but sustainable. Here’s how it works:

V: Validate the Need for Change

Every transformation has to start with a strong reason. Before diving into the change, it’s crucial to validate why the transformation is necessary. What are the core issues driving this change? What happens if we don’t make these changes? We need to establish a sense of urgency to get everyone aligned and committed. Without a compelling “Why,” it’s tough to get the buy-in needed for a successful transformation.

Steps To Take

- Analyze the current challenges and pain points.
- Engage with key stakeholders to understand their perspectives.
- Clearly communicate the risks of staying the course without change.
I: Initiate Leadership Support

Strong leadership is the backbone of any successful transformation. We start by securing solid support from executive leaders and finding champions within the organization who can help drive the change. These leaders will be our advocates, offering feedback and refining the transformation goals as we go along.

Steps To Take

- Get top executives on board and invested in the change.
- Identify and empower champions across different levels of the organization.
- Set up channels for continuous communication and feedback.

C: Craft a Clear Vision

A transformation without a clear vision is like setting off on a journey without a map. We need a vision that is motivating, realistic, and capable of bringing everyone together. This vision should clearly explain why the change is necessary and what the organization will look like once the transformation is complete. It’s also important to test this vision with small groups to make sure it resonates with people at all levels.

Steps To Take

- Develop a vision statement that aligns with the organization’s overall goals.
- Communicate this vision consistently across the organization.
- Gather feedback to ensure the vision is clear and inspiring.

T: Target Goals and Outcomes

With our vision in place, it’s time to get specific about what we want to achieve. We define clear, measurable goals and outcomes. Establishing metrics is crucial: these will keep us on track and provide a way to measure success. This is also the stage where we’ll need to create or adapt tools that will help us track progress effectively.

Steps To Take

- Set specific, achievable goals aligned with the vision.
- Define key objectives and results to monitor progress.
- Review and adjust goals regularly as the transformation unfolds.

O: Onboard With Pilot Teams

Instead of launching the transformation organization-wide from the get-go, we start with pilot teams. These teams will help us test new structures, roles, tools, and processes.
It’s essential to provide them with the necessary training and support to set them up for success. The insights we gain from these pilots will be invaluable in identifying potential challenges and making adjustments before scaling up.

Steps To Take

- Choose pilot teams that represent a cross-section of the organization.
- Provide tailored training and ongoing support.
- Monitor the pilot phase closely to gather insights.

R: Review and Adapt

Continuous improvement is at the heart of Agile, and that applies to our transformation process, too. We regularly review how the pilot teams are progressing, gather feedback, and measure outcomes. This approach allows us to learn from early experiences and make necessary adjustments before the transformation goes organization-wide.

Steps To Take

- Hold regular retrospectives with pilot teams to gather insights.
- Adjust the transformation strategy based on what’s working (and what’s not).
- Share learnings across the organization to keep everyone informed and engaged.

Y: Yield To Continuous Scaling

Once our pilots are running smoothly, it’s time to scale the transformation across the organization, but gradually. Expanding in phases allows us to manage the change more effectively. During this phase, we ensure that governance structures, roles, and performance metrics evolve alongside the new ways of working. Keeping leadership engaged is critical to removing obstacles and celebrating wins as we go.

Steps To Take

- Plan a phased rollout of the transformation.
- Align governance structures with the new processes.
- Maintain executive engagement and celebrate every milestone.

Don’t Forget the Individual Impact

As our organization undergoes this transformation, it’s crucial not to overlook the individuals who will be affected by these changes. This means understanding how the transformation will impact roles, responsibilities, and workflows at a personal level.
Each person should feel that they have something positive to look forward to, whether it’s new opportunities for growth, skill development, or simply a more satisfying job.

Steps To Take

- Assess how each role will be impacted by the transformation.
- Align individual roles with the new ways of working, making sure everyone understands the benefits.
- Offer opportunities for growth and development that align with the transformation’s goals.

Wrapping It Up

The VICTORY framework provides a structured yet flexible approach to transformation. By validating the need for change, securing leadership support, crafting a clear vision, targeting specific goals, onboarding with pilot teams, continuously reviewing and adapting, and scaling the transformation gradually, we can navigate the complexities of any kind of transformation effectively.

Moreover, focusing on the individual impact of the transformation ensures that the change is not just successful at the organizational level but also embraced by the people who make up the organization.

This framework offers a practical roadmap for organizations looking to become more Agile and adaptive in today’s rapidly changing business environment. By following the VICTORY framework, we can increase our chances of a successful, sustainable transformation that benefits both the organization and the individuals within it.