Open Source

Open source refers to non-proprietary software that allows anyone to modify, enhance, or view the source code behind it. Our resources enable programmers to work or collaborate on projects created by different teams, companies, and organizations.

Latest Refcards and Trend Reports
Trend Report
Enterprise Application Security
Refcard #357
NoSQL Migration Essentials
Refcard #297
GitOps for Kubernetes

DZone's Featured Open Source Resources

Dedicated Device Deployment Essentials
Refcard #360

Getting Started With Apache Iceberg
Refcard #382

Top Three Docker Alternatives To Consider
By Ruchita Varma
10 Most Popular Frameworks for Building RESTful APIs
By Derric Gilling
Key Considerations When Implementing Virtual Kubernetes Clusters
By Hanumantha (Hemanth) Kavuluru
Apache Kafka Introduction, Installation, and Implementation Using .NET Core 6

We will go over Apache Kafka basics, installation, and operation, as well as a step-by-step implementation using a .NET Core 6 web application. Prerequisites Visual Studio 2022 .NET Core 6 SDK SQL Server Java JDK 11 Apache Kafka Agenda Overview of Event Streaming Introduction to Apache Kafka. Main concepts and foundation of Kafka. Different Kafka APIs. Use cases of Apache Kafka. Installation of Kafka on Windows 10. Step-by-step implementation Overview of Event Streaming Events are the things that happen within our application when we navigate something. For example, we sign up on any website and order something, so, these are the events. The event streaming platform records different types of data like transaction, historical, and real-time data. This platform is also used to process events and allow different consumers to process results immediately and in a timely manner. An event-driven platform allows us to monitor our business and real-time data from different types of devices like IoT and many more. After analyzing, it provides a good customer experience based on different types of events and needs. Introduction to Apache Kafka Below, are a few bullet points that describe Apache Kafka: Kafka is a distributed event store and stream-processing platform. Kafka is open source and is written in Java and Scala. The primary purpose to designed Kafka by Apache foundation is to handle real-time data feeds and provide high throughput and low latency platforms. Kafka is an event streaming platform that has many capabilities to publish (write) and subscribe to (read) streams of events from a different system. Also, to store and process events durably as long as we want, by default, Kafka stores events from seven days of the time period, but we can increase that as per need and requirement. Kafka has distributed system, which has servers and clients that can communicate via TCP protocol. It can be deployed on different virtual machines and containers in on-premise and cloud environments as per requirements. In the Kafka world, a producer sends messages to the Kafka broker. The messages will get stored inside the topics and the consumer subscribes to that topic to consume messages sent by the producer. ZooKeeper is used to manage the metadata of Kafka-related things, it tracks which brokers are part of the Kafka cluster and partitions of different topics. Lastly, it manages the status of Kafka nodes and maintains a list of Kafka topics and messages. Main Concepts and Foundation of Kafka 1. Event An event or record is the message that we read and write to the Kafka server; we do this in the form of events in our business world, and it contains a key, a value, a timestamp, and other metadata headers. The key, value, and time stamp, in this case, are as follows: Key: “Jaydeep” Value: “Booked BMW” Event Timestamp: “Dec. 11, 2022, at 12:00 p.m.” 2. Producer The producer is a client application that sends messages to the Kafka node or broker. 3. Consumer The consumer is an application that receives data from Kafka. 4. Kafka Cluster The Kafka cluster is the set of computers that share the workload with each other with varying purposes. 5. Broker The broker is a Kafka server that acts as an agent between the producer and consumer, who communicate via the broker. 6. Topic The events are stored inside the “topic,” it’s similar to our folder in which we store multiple files. Each topic has one or more producers and consumers, which write and reads data from the topic. 
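To make events, keys, and topics concrete before moving on, here is a small, hypothetical console session (not part of the original walkthrough) that sends a keyed event and reads it back with the key printed. It assumes a broker on localhost:9092 and the testdata topic that is created later in this article; the --property flags are standard options of the console tools that ship with Kafka.

Shell
D:\Kafka\bin\windows> kafka-console-producer.bat --bootstrap-server localhost:9092 --topic testdata --property "parse.key=true" --property "key.separator=:"
>Jaydeep:Booked BMW

D:\Kafka\bin\windows> kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic testdata --from-beginning --property print.key=true
Jaydeep	Booked BMW

Because the consumer is started with --from-beginning, it re-reads the persisted event, which illustrates the retention behavior described next.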
Events in a “topic” can be read as often as needed because Kafka persists events; unlike other messaging systems, it does not remove messages after they are consumed. 7. Partitions Topics are partitioned, meaning a topic is spread over multiple partitions that we create inside the topic. When the producer sends an event to the topic, it is stored inside a particular partition, and then the consumer can read the event from the corresponding topic partition in sequence. 8. Offset Kafka assigns one unique ID to each message stored inside a topic partition when the message arrives from the producer. 9. Consumer Groups In the Kafka world, the consumer group acts as a single logical unit. 10. Replica In Kafka, to make data fault-tolerant and highly available, we can replicate topics across different regions and brokers. So, in case something goes wrong with the data in one topic, we can easily recover it from a replica. Different Kafka APIs Kafka has five core APIs that serve different purposes: Admin API: This API manages topics, brokers, and other Kafka objects. Producer API: This API is used to write/publish events to different Kafka topics. Consumer API: This API is used to receive the messages from the topics that are subscribed to by the consumer. Kafka Streams API: This API is used to perform different types of operations like windowing, joins, aggregation, and many others. Basically, it is used to transform objects. Kafka Connect API: This API works as a connector to Kafka, which helps different systems connect with Kafka easily. It has different types of ready-to-use connectors related to Kafka. Use Cases of Apache Kafka Messaging User activity tracking Log aggregation Stream processing Real-time data analytics Installation of Kafka on Windows 10 Step 1 Download and install the Java SDK, version 8 or higher. Note: I have Java 11, which is why I use that path in all the commands here. Step 2 Open and install the EXE. Step 3 Set the environment variable for Java using the command prompt as admin. Command: setx -m JAVA_HOME “C:\Program Files\Java\jdk-11.0.16.1” setx -m PATH “%JAVA_HOME%\bin;%PATH%” Step 4 After that, download and install Apache Kafka. Step 5 Extract the downloaded Kafka file and rename the folder to “Kafka.” Step 6 Open D:\Kafka\ and create “zookeeper-data” and “kafka-logs” folders inside it. Step 7 Next, open the D:\Kafka\config\zookeeper.properties file and add the folder path inside it: D:\Kafka\config\zookeeper.properties dataDir=D:/Kafka/zookeeper-data Step 8 After that, open the D:\Kafka\config\server.properties file and change the log path there: D:\Kafka\config\server.properties log.dirs=D:/Kafka/kafka-logs Step 9 Save and close both files. Step 10 Run ZooKeeper: D:\Kafka> .\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties Step 11 Start Kafka: D:\Kafka> .\bin\windows\kafka-server-start.bat .\config\server.properties Step 12 Create a Kafka topic: D:\Kafka\bin\windows>kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic testdata Step 13 Create a producer and send some messages (you can send them once the consumer in the next step is also running): D:\Kafka\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic testdata Step 14 Next, create a consumer.
After, you will see the message the producer sent: D:\Kafka\bin\windows>kafka-console-consumer.bat — bootstrap-server localhost:9092 — topic testdata Step-by-Step Implementation Let’s start with practical implementation. Step 1 Create a new .NET Core Producer Web API: Step 2 Configure your application: Step 3 Provide additional details: Step 4 Install the following two NuGet packages: Step 5 Add configuration details inside the appsettings.json file: JSON { "Logging": { "LogLevel": { "Default": "Information", "Microsoft.AspNetCore": "Warning" } }, "AllowedHosts": "*", "producerconfiguration": { "bootstrapservers": "localhost:9092" }, "TopicName": "testdata" } Step 6 Register a few services inside the “Program” class: C# using Confluent.Kafka; var builder = WebApplication.CreateBuilder(args); // Add services to the container. var producerConfiguration = new ProducerConfig(); builder.Configuration.Bind("producerconfiguration", producerConfiguration); builder.Services.AddSingleton<ProducerConfig>(producerConfiguration); builder.Services.AddControllers(); // Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle builder.Services.AddEndpointsApiExplorer(); builder.Services.AddSwaggerGen(); var app = builder.Build(); // Configure the HTTP request pipeline. if (app.Environment.IsDevelopment()) { app.UseSwagger(); app.UseSwaggerUI(); } app.UseHttpsRedirection(); app.UseAuthorization(); app.MapControllers(); app.Run(); Step 7 Next, create the CarDetails model class: C# using Microsoft.AspNetCore.Authentication; namespace ProducerApplication.Models { public class CarDetails { public int CarId { get; set; } public string CarName { get; set; } public string BookingStatus { get; set; } } } Step 8 Now, create the CarsController class: C# using Confluent.Kafka; using Microsoft.AspNetCore.Mvc; using Microsoft.Extensions.Configuration; using Newtonsoft.Json; using ProducerApplication.Models; namespace ProducerApplication.Controllers { [Route("api/[controller]")] [ApiController] public class CarsController : ControllerBase { private ProducerConfig _configuration; private readonly IConfiguration _config; public CarsController(ProducerConfig configuration, IConfiguration config) { _configuration = configuration; _config = config; } [HttpPost("sendBookingDetails")] public async Task<ActionResult> Get([FromBody] CarDetails employee) { string serializedData = JsonConvert.SerializeObject(employee); var topic = _config.GetSection("TopicName").Value; using (var producer = new ProducerBuilder<Null, string>(_configuration).Build()) { await producer.ProduceAsync(topic, new Message<Null, string> { Value = serializedData }); producer.Flush(TimeSpan.FromSeconds(10)); return Ok(true); } } } } Step 9 Finally, run the application and send a message: Step 10 Now, create a “consumer” application: For that, create a new .NET Core console application: Step 11 Configure your application: Step 12 Provide additional information: Step 13 Install the NuGet below: Step 14 Add the following code, which consumes messages sent by the consumer: C# using Confluent.Kafka; var config = new ConsumerConfig { GroupId = "gid-consumers", BootstrapServers = "localhost:9092" }; using (var consumer = new ConsumerBuilder<Null, string>(config).Build()) { consumer.Subscribe("testdata"); while (true) { var bookingDetails = consumer.Consume(); Console.WriteLine(bookingDetails.Message.Value); } } Step 15 Finally, run the producer and consumer, send a message using the producer app, and you will see the message immediately 
inside the consumer console, sent by the producer. Here is the GitHub URL I used in this article. Conclusion In this article, we discussed an introduction to Apache Kafka, how it works, its benefits, and a step-by-step implementation using .NET Core 6. Happy coding!

By Jaydeep Patil
Express Hibernate Queries as Type-Safe Java Streams

As much as the JPA Criteria builder is expressive, JPA queries are often equally verbose, and the API itself can be unintuitive to use, especially for newcomers. In the Quarkus ecosystem, Panache is a partial remedy for these problems when using Hibernate. Still, I find myself juggling Panache’s helper methods, preconfigured enums, and raw strings when composing anything but the simplest of queries. You could claim I am just inexperienced and impatient or, instead, acknowledge that the perfect API is frictionless to use for everyone. Thus, the user experience of writing JPA queries can be further improved in that direction. Introduction One of the remaining shortcomings is that raw strings are inherently not type-safe, meaning my IDE denies me the helping hand of code completion and wishes me good luck at best. On the upside, Quarkus facilitates application relaunches in a split second to issue quick verdicts on my code. And nothing beats the heartfelt joy and genuine surprise when I have composed a working query on the fifth, rather than the tenth, attempt... With this in mind, we built the open-source library JPAstreamer to make the process of writing Hibernate queries more intuitive and less time-consuming while leaving your existing codebase intact. It achieves this goal by allowing queries to be expressed as standard Java Streams. Upon execution, JPAstreamer translates the stream pipeline to an HQL query for efficient execution and avoids materializing anything but the relevant results. Let me take an example: in some random database exists a table called Person, represented in a Hibernate application by the following standard Entity: Java @Entity @Table(name = "person") public class Person { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) @Column(name = "person_id", nullable = false, updatable = false) private Integer actorId; @Column(name = "first_name", nullable = false, columnDefinition = "varchar(45)") private String firstName; @Column(name = "last_name", nullable = false, columnDefinition = "varchar(45)") private String lastName; @Column(name = "created_at", nullable = false, updatable = false) private LocalDateTime createdAt; // Getters for all fields will follow from here } To fetch the Person with an Id of 1 using JPAstreamer, all you need is the following: Java @ApplicationScoped public class PersonRepository { @PersistenceUnit EntityManagerFactory entityManagerFactory; private final JPAStreamer jpaStreamer; public PersonRepository(EntityManagerFactory entityManagerFactory) { jpaStreamer = JPAStreamer.of(entityManagerFactory); <1> } @Override public Optional<Person> getPersonById(int id) { return this.jpaStreamer.from(Person.class) <2> .filter(Person$.personId.equal(id)) <3> .findAny(); } } <1> Initialize JPAstreamer in one line; the underlying JPA provider handles the DB configuration. <2> The stream source is set to be the Person table. <3> The filter operation is treated as an SQL WHERE clause, and the condition is expressed type-safely with JPAstreamer predicates (more on this to follow). Despite it looking as if JPAstreamer operates on all Person objects, the pipeline is optimized to a single query, in this case: Plain Text select person0_.person_id as person_id1_0_, person0_.first_name as first_na2_0_, person0_.last_name as last_nam3_0_, person0_.created_at as created_4_0_ from person person0_ where person0_.person_id=1 Thus, only the Person matching the search criteria is ever materialized.
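As a brief aside (a hypothetical sketch, not taken from the article): because the pipeline is a standard Java Stream, the generated Person$ fields also compose with ordinary Stream terminal operations. The two helper methods below could, for example, be added to the PersonRepository above; they assume the getters declared on the Person entity and imports of java.util.List and java.util.stream.Collectors.

Java
// Hypothetical additions to the PersonRepository shown above.
public long countByLastNamePrefix(String prefix) {
    return jpaStreamer.stream(Person.class)
            .filter(Person$.lastName.startsWith(prefix)) // type-safe predicate, as before
            .count();                                    // standard Stream terminal operation
}

public List<String> firstNamesStartingWith(String prefix) {
    return jpaStreamer.stream(Person.class)
            .filter(Person$.firstName.startsWith(prefix))
            .sorted(Person$.firstName.comparator())
            .map(Person::getFirstName)                   // plain Java mapping on the streamed entities
            .collect(Collectors.toList());
}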
Next, we can look at a more complex example in which I am searching for Persons with a first name ending with an “A” and a last name that starts with “B.” The matches are sorted primarily by first name and secondarily by last name. I further decide to apply an offset of 5, excluding the first five results, and to limit the total results to 10. Here is the stream pipeline to achieve this task: Java List<Person> list = jpaStreamer.stream(Person.class) .filter(Person$.firstName.endsWith("A").and(Person$.lastName.startsWith("B"))) <1> .sorted(Person$.firstName.comparator().thenComparing(Person$.lastName.comparator())) <2> .skip(5) <3> .limit(10) <4> .collect(Collectors.toList()); <1> Filters can be combined with the and/or operators. <2> Easily sort on one or more properties. <3> Skip the first 5 Persons. <4> Return at most 10 Persons. In the context of queries, the stream operators filter, sorted, limit, and skip all have a natural mapping that makes the resulting query expressive and intuitive to read while remaining compact. This query is translated by JPAstreamer to the following HQL statement: Plain Text select person0_.person_id as person_id1_0_, person0_.first_name as first_na2_0_, person0_.last_name as last_nam3_0_, person0_.created_at as created_4_0_ from person person0_ where (person0_.first_name like ?) and (person0_.last_name like ?) order by person0_.first_name asc, person0_.last_name asc limit ?, ? How JPAstreamer Works Okay, it looks simple. But how does it work? JPAstreamer uses an annotation processor to form a meta-model at compile time. It inspects any classes marked with the standard JPA annotation @Entity, and for every entity Foo.class, a corresponding Foo$.class is created. The generated classes represent entity attributes as Fields used to form predicates of the form User$.firstName.startsWith("A") that can be interpreted by JPAstreamer’s query optimizer. It is worth repeating that JPAstreamer does not alter or disturb the existing codebase but merely extends the API to handle Java stream queries. Installing the JPAstreamer Extension JPAstreamer is installed as any other Quarkus extension, using a Maven dependency: XML <dependency> <groupId>io.quarkiverse.jpastreamer</groupId> <artifactId>quarkus-jpastreamer</artifactId> <version>1.0.0</version> </dependency> After the dependency is added, rebuild your Quarkus application to trigger JPAstreamer’s annotation processor. The installation is complete once the generated fields reside in /target/generated-sources; you’ll recognize them by the trailing $ in the class names, e.g., Person$.class. Note: JPAstreamer requires an underlying JPA provider, such as Hibernate. For this reason, JPAstreamer needs no additional configuration, as the database integration is taken care of by the JPA provider. JPAstreamer and Panache Any Panache fan will note that JPAstreamer shares some of its objectives with Panache in simplifying many common queries. Still, JPAstreamer distinguishes itself by instilling more confidence in the queries with its type-safe stream interface. However, no one is forced to choose, as Panache and JPAstreamer work seamlessly alongside each other. Note: Here is an example Quarkus application that uses both JPAstreamer and Panache. At the time of writing, JPAstreamer does not have support for Panache’s Active Record Pattern, as it relies on standard JPA Entities to generate its meta-model. This will likely change in the near future.
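To illustrate the point about coexistence, here is a hypothetical repository (class and method names are mine, not from the example application linked above) that mixes a Panache-style query with a JPAstreamer pipeline. It assumes a Quarkus project with both the Panache and JPAstreamer extensions on the classpath; package names and jakarta.* versus javax.* imports may need adjusting to your Quarkus version.

Java
import com.speedment.jpastreamer.application.JPAStreamer;
import io.quarkus.hibernate.orm.panache.PanacheRepository;
import jakarta.annotation.PostConstruct;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.PersistenceUnit;
import java.util.List;
import java.util.stream.Collectors;

@ApplicationScoped
public class PersonQueries implements PanacheRepository<Person> {

    @PersistenceUnit
    EntityManagerFactory entityManagerFactory;

    private JPAStreamer jpaStreamer;

    @PostConstruct
    void init() {
        // Initialize JPAstreamer from the same persistence unit Panache uses.
        jpaStreamer = JPAStreamer.of(entityManagerFactory);
    }

    // Panache-style query using its string-based shorthand.
    public List<Person> byLastName(String lastName) {
        return list("lastName", lastName);
    }

    // JPAstreamer-style query using the generated, type-safe Person$ fields.
    public List<Person> byFirstNamePrefix(String prefix) {
        return jpaStreamer.stream(Person.class)
                .filter(Person$.firstName.startsWith(prefix))
                .collect(Collectors.toList());
    }
}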
Summary JPA in general, and Hibernate in particular, have greatly simplified application database access, but the API sometimes forces unnecessary complexity. With JPAstreamer, you can utilize JPA while keeping your codebase clean and maintainable.

By Julia Gustafsson
Get Started With Trino and Alluxio in Five Minutes

Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. Trino was designed to handle data warehousing, ETL, and interactive analytics by large amounts of data and producing reports. Alluxio is an open-source data orchestration platform for large-scale analytics and AI. Alluxio sits between compute frameworks such as Trino and Apache Spark and various storage systems like Amazon S3, Google Cloud Storage, HDFS, and MinIO. This is a tutorial for deploying Alluxio as the caching layer for Trino using the Iceberg connector. Why Do We Need Caching for Trino? A small fraction of the petabytes of data you store is generating business value at any given time. Repeatedly scanning the same data and transferring it over the network consumes time, compute cycles, and resources. This issue is compounded when pulling data from disparate Trino clusters across regions or clouds. In these circumstances, caching solutions can significantly reduce the latency and cost of your queries. Trino has a built-in caching engine, Rubix, in its Hive connector. While this system is convenient as it comes with Trino, it is limited to the Hive connector and has not been maintained since 2020. It also lacks security features and support for additional compute engines. Trino on Alluxio Alluxio connects Trino to various storage systems, providing APIs and a unified namespace for data-driven applications. Alluxio allows Trino to access data regardless of the data source and transparently cache frequently accessed data (e.g., tables commonly used) into Alluxio distributed storage. Using Alluxio Caching via the Iceberg Connector Over MinIO File Storage We’ve created a demo that demonstrates how to configure Alluxio to use write-through caching with MinIO. This is achieved by using the Iceberg connector and making a single change to the location property on the table from the Trino perspective. In this demo, Alluxio is run on separate servers; however, it’s recommended to run it on the same nodes as Trino. This means that all the configurations for Alluxio will be located on the servers where Alluxio runs, while Trino’s configuration remains unaffected. The advantage of running Alluxio externally is that it won’t compete for resources with Trino, but the disadvantage is that data will need to be transferred over the network when reading from Alluxio. It is crucial for performance that Trino and Alluxio are on the same network. To follow this demo, copy the code located here. Trino Configuration Trino is configured identically to a standard Iceberg configuration. Since Alluxio is running external to Trino, the only configuration needed is at query time and not at startup. Alluxio Configuration The configuration for Alluxio can all be set using the alluxio-site.properties file. To keep all configurations colocated on the docker-compose.yml, we are setting them using Java properties via the ALLUXIO_JAVA_OPTS environment variable. This tutorial also refers to the master node as the leader and the workers as followers. Master Configurations alluxio.master.mount.table.root.ufs=s3://alluxio/ The leader exposes ports 19998 and 19999, the latter being the port for the web UI. Worker Configurations alluxio.worker.ramdisk.size=1G alluxio.worker.hostname=alluxio-follower The follower exposes ports 29999 and 30000, and sets up a shared memory used by Alluxio to store data. 
This is set to 1G via the shm_size property and is referenced from the alluxio.worker.ramdisk.size property. Shared Configurations Between Leader and Follower alluxio.master.hostname=alluxio-leader # Minio configs alluxio.underfs.s3.endpoint=http://minio:9000 alluxio.underfs.s3.disable.dns.buckets=true alluxio.underfs.s3.inherit.acl=false aws.accessKeyId=minio aws.secretKey=minio123 # Demo-only configs alluxio.security.authorization.permission.enabled=false The alluxio.master.hostname needs to be set on all nodes, leaders and followers. The majority of the shared configs point Alluxio to the underfs, which is MinIO in this case. alluxio.security.authorization.permission.enabled is set to “false” to keep the Docker setup simple. Note: This is not recommended in a production or CI/CD environment. Running Services First, you want to start the services. Make sure you are in the trino-getting-started/iceberg/trino-alluxio-iceberg-minio directory. Now, run the following command: docker-compose up -d You should expect to see the following output. Docker may also have to download the Docker images before you see the “Created/Started” messages, so there could be extra output: [+] Running 10/10 ⠿ Network trino-alluxio-iceberg-minio_trino-network Created 0.0s ⠿ Volume "trino-alluxio-iceberg-minio_minio-data" Created 0.0s ⠿ Container trino-alluxio-iceberg-minio-mariadb-1 Started 0.6s ⠿ Container trino-alluxio-iceberg-minio-trino-coordinator-1 Started 0.7s ⠿ Container trino-alluxio-iceberg-minio-alluxio-leader-1 Started 0.9s ⠿ Container minio Started 0.8s ⠿ Container trino-alluxio-iceberg-minio-alluxio-follower-1 Started 1.5s ⠿ Container mc Started 1.4s ⠿ Container trino-alluxio-iceberg-minio-hive-metastore-1 Started Open Trino CLI Once this is complete, you can log into the Trino coordinator node. We will do this by using the exec command to run the trino CLI executable on that container. Notice the container name is trino-alluxio-iceberg-minio-trino-coordinator-1, so the command you will run is: docker container exec -it trino-alluxio-iceberg-minio-trino-coordinator-1 trino When you start this step, you should see the trino cursor once the startup is complete. It should look like this when it is done: trino> To best understand how this configuration works, let’s create an Iceberg table using a CTAS (CREATE TABLE AS) query that pushes data from one of the TPC connectors into Iceberg, which points to MinIO. The TPC connectors generate data on the fly, so we can run simple tests like this. First, run a command to show the catalogs and confirm the tpch and iceberg catalogs are there, since these are what we will use in the CTAS query: SHOW CATALOGS; You should see that the Iceberg catalog is registered. MinIO Buckets and Trino Schemas Upon startup, the following command is executed on an initialization container that includes the mc CLI for MinIO. This creates a bucket in MinIO called /alluxio, which gives us a location to write our data to, and we can tell Trino where to find it: /bin/sh -c " until (/usr/bin/mc config host add minio http://minio:9000 minio minio123) do echo '...waiting...' && sleep 1; done; /usr/bin/mc rm -r --force minio/alluxio; /usr/bin/mc mb minio/alluxio; /usr/bin/mc policy set public minio/alluxio; exit 0; " Note: This bucket will act as the mount point for Alluxio, so the schema directory alluxio://lakehouse/ in Alluxio will map to s3://alluxio/lakehouse/.
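If you want to double-check the mount from the Alluxio side at this point, one option (a hypothetical spot check, not part of the original tutorial) is to run the Alluxio shell inside the leader container created by docker-compose above; if the alluxio launcher is not on the PATH in your image, invoke it via its full install path (commonly /opt/alluxio/bin/alluxio):

Shell
$ docker exec -it trino-alluxio-iceberg-minio-alluxio-leader-1 alluxio fs mount
$ docker exec -it trino-alluxio-iceberg-minio-alluxio-leader-1 alluxio fs ls /

The first command prints the mount table (the root should map to s3://alluxio/), and the second lists the root of the Alluxio namespace.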
Querying Trino Let’s move to creating our SCHEMA that points us to the bucket in MinIO and then run our CTAS query. Back in the terminal, create the iceberg.lakehouse SCHEMA. This will be the first call to the metastore to save the schema location in the Alluxio namespace. Notice that we will need to specify the hostname alluxio-leader and port 19998 since we did not set Alluxio as the default file system. Take this into consideration if you want Alluxio caching to be the default usage and transparent to users managing DDL statements: CREATE SCHEMA iceberg.lakehouse WITH (location = 'alluxio://alluxio-leader:19998/lakehouse/'); Now that we have a SCHEMA that references the bucket where we store our tables in Alluxio, which syncs to MinIO, we can create our first table. Optional: To view your queries as they run, log into the Trino UI using any username (it doesn’t matter since no security is set up). Move the customer data from the tiny generated TPCH data into MinIO using a CTAS query. Run the following query, and if you like, watch it running on the Trino UI: CREATE TABLE iceberg.lakehouse.customer WITH ( format = 'ORC', location = 'alluxio://alluxio-leader:19998/lakehouse/customer/' ) AS SELECT * FROM tpch.tiny.customer; Go to the Alluxio UI and the MinIO UI, and browse the Alluxio and MinIO files. You will now see a lakehouse directory that contains a customer directory holding the data written by Trino to Alluxio, which Alluxio in turn writes to MinIO. Now that there is a table under Alluxio and MinIO, you can query this data by running the following: SELECT * FROM iceberg.lakehouse.customer LIMIT 10; How can we be sure that Trino is actually reading from Alluxio and not MinIO? Let’s delete the data in MinIO and run the query again just to be sure. Once you delete this data, you should still see data return. Stopping Services Once you complete this tutorial, the resources used for this exercise can be released by running the following command: docker-compose down Conclusion At this point, you should have a better understanding of Trino and Alluxio, how to get started with deploying Trino and Alluxio, and how to use Alluxio caching with an Iceberg connector and MinIO file storage. I hope you enjoyed this article. Be sure to like this article and comment if you have any questions!

By Brian Olsen
OpenTelemetry Auto-Instrumentation With Jaeger

In earlier days, it was easy to deduct and debug a problem in monolithic applications because there was only one service running in the back end and front end. Now, we are moving toward microservices architecture, where applications are divided into multiple independently deployable services. These services have their own goal and logic to serve. In this kind of application architecture, it becomes difficult to observe how one service depends on or affects other services. To make the system observable, some logs, metrics, or traces must be emitted from the code, and this data must be sent to an observability back end. This is where OpenTelemetry and Jaeger come into the picture. In this article, we will see how to monitor application trace data (Traces and Spans) with the help of OpenTelemetry and Jaeger. A trace is used to observe the requests as they propagate through the services in a distributed system. Spans are a basic unit of the trace; they represent a single event within the trace, and a trace can have one or multiple spans. A span consists of log messages, time-related data, and other attributes to provide information about the operation it tracks. We will use the distributed tracing method to observe requests moving across microservices, generating data about the request and making it available for analysis. The produced data will have a record of the flow of requests in our microservices, and it will help us understand our application's performance. OpenTelemetry Telemetry is the collection and transmission of data using agents and protocols from the source in observability. The telemetry data includes logs, metrics, and traces, which help us understand what is happening in our application. OpenTelemetry (also known as OTel) is an open source framework comprising a collection of tools, APIs, and SDKs. OpenTelemetry makes generating, instrumenting, collecting, and exporting telemetry data easy. The data collected from OpenTelemetry is vendor-agnostic and can be exported in many formats. OpenTelemetry is formed after merging two projects OpenCensus and OpenTracing. Instrumenting The process of adding observability code to your application is known as instrumentation. Instrumentation helps make our application observable, meaning the code must produce some metrics, traces, and logs. OpenTelemetry provides two ways to instrument our code: Manual instrumentation Auto instrumentation 1. Manual Instrumentation The user needs to add an OpenTelemetry code to the application. The manual instrumentation provides more options for customization in spans and traces. Languages supported for manual instrumentations are C++, .NET, Go, Java, Python, and so on. 2. Automatic Instrumentation This is the easiest way of instrumentation as it requires no code changes and no need to recompile the application. It uses an intelligent agent that gets attached to an application, reads its activity, and extracts the traces. Automatic instrumentation supports Java, NodeJS, Python, and so on. Difference Between Manual and Automatic Instrumentation Both manual and automatic instrumentation have advantages and disadvantages that you might consider while writing your code. A few of them are listed below: Manual Instrumentation Automatic Instrumentation Code changes are required. Code changes are not required. It supports maximum programming languages. Currently, .Net, Java, NodeJS, and Python are supported. It consumes a lot of time as code changes are required. 
Easy to implement as we do not need to touch the code. Provide more options for the customization of spans and traces. As you have more control over the telemetry data generated by your application. Fewer options for customization. Possibilities of error are high as manual changes are required. No error possibilities. As we don't have to touch our application code. To make the instrumentation process hassle-free, use automatic instrumentation, as it does not require any modification in the code and reduces the possibility of errors. Automatic instrumentation is done by an agent which reads your application's telemetry data, so no manual changes are required. For the scope of this post, we will see how you can use automatic instrumentation in a Kubernetes-based microservices environment. Jaeger Jaeger is a distributed tracing tool initially built by Uber and released as open-source in 2015. Jaeger is also a Cloud Native Computing Foundation graduate project and was influenced by Dapper and OpenZipkin. It is used for monitoring and troubleshooting microservices-based distributed systems. The Jaeger components which we have used for this blog are: Jaeger Collector Jaeger Query Jaeger UI / Console Storage Backend Jaeger Collector: The Jaeger distributed tracing system includes the Jaeger collector. It is in charge of gathering and keeping the information. After receiving spans, the collector adds them to a processing queue. Collectors need a persistent storage backend, hence Jaeger also provides a pluggable span storage mechanism. Jaeger Query: This is a service used to get traces out of storage. The web-based user interface for the Jaeger distributed tracing system is called Jaeger Query. It provides various features and tools to help you understand the performance and behavior of your distributed application and enables you to search, filter, and visualise the data gathered by Jaeger. Jaeger UI/Console: Jaeger UI lets you view and analyze traces generated by your application. Storage Back End: This is used to store the traces generated by an application for the long term. In this article, we are going to use Elasticsearch to store the traces. What Is the Need for Integrating OpenTelemetry With Jaeger? OpenTelemetry and Jaeger are the tools that help us in setting the observability in microservices-based distributed systems, but they are intended to address different issues. OpenTelemetry provides an instrumentation layer for the application, which helps us generate, collect and export the telemetry data for analysis. In contrast, Jaeger is used to store and visualize telemetry data. OpenTelemetry can only generate and collect the data. It does not have a UI for the visualization. So we need to integrate Jaeger with OpenTelemetry as it has a storage backend and a web UI for the visualization of the telemetry data. With the help of Jaeger UI, we can quickly troubleshoot microservices-based distributed systems. Note: OpenTelemetry can generate logs, metrics, and traces. Jaeger does not support logs and metrics. Now you have an idea about OpenTelemetry and Jaeger. Let's see how we can integrate them with each other to visualize the traces and spans generated by our application. Implementing OpenTelemetry Auto-Instrumentation We will integrate OpenTelemetry with Jaeger, where OpenTelemetry will act as an instrumentation layer for our application, and Jaeger will act as the back-end analysis tool to visualize the trace data. Jaeger will get the telemetry data from the OpenTelemetry agent. 
It will store the data in the storage backend, from where we will query the stored data and visualize it in the Jaeger UI. Prerequisites for this article are: The target Kubernetes cluster is up and running. You have access to run the kubectl command against the Kubernetes cluster to deploy resources. Cert manager is installed and running. You can install it from the website cert-manager.io if it is not installed. We assume that you have all the prerequisites and now you are ready for the installation. The files we have used for this post are available in this GitHub repo. Installation The installation part contains three steps: Elasticsearch installation Jaeger installation OpenTelemetry installation Elasticsearch By default, Jaeger uses in-memory storage to store spans, which is not a recommended approach for the production environment. There are various tools available to use as a storage back end in Jaeger; you can read about them in the official documentation of Jaeger span storage back end. In this article, we will use Elasticsearch as a storage back end. You can deploy Elasticsearch in your Kubernetes cluster using the Elasticsearch Helm chart. While deploying Elasticsearch, ensure you have enabled the password-based authentication and deploy that Elasticsearch in observability namespaces. Elasticsearch is deployed in our Kubernetes cluster, and you can see the output by running the following command. Shell $ kubectl get all -n observability NAME READY STATUS RESTARTS AGE pod/elasticsearch-0 1/1 Running 0 17m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 17m NAME READY AGE statefulset.apps/elasticsearch 1/1 17m Jaeger Installation We are going to use Jaeger to visualize the trace data. Let's deploy the Jaeger Operator on our cluster. Before proceeding with the installation, we will deploy a ConfigMap in the observability namespace. In this ConfigMap, we will pass the username and password of the Elasticsearch which we have deployed in the previous step. Replace the credentials based on your setup. YAML kubectl -n observability apply -f - <<EOF apiVersion: v1 kind: ConfigMap metadata: name: jaeger-configuration labels: app: jaeger app.kubernetes.io/name: jaeger data: span-storage-type: elasticsearch collector: | es: server-urls: http://elasticsearch:9200 username: elastic password: changeme collector: zipkin: http-port: 9411 query: | es: server-urls: http://elasticsearch:9200 username: elastic password: changeme agent: | collector: host-port: "jaeger-collector:14267" EOF If you are going to deploy Jaeger in another namespace and you have changed the Jaeger collector service name, then you need to change the values of the host-port value under the agent collector. Jaeger Operator The Jaeger Operator is a Kubernetes operator for deploying and managing Jaeger, an open source, distributed tracing system. It works by automating the deployment, scaling, and management of Jaeger components on a Kubernetes cluster. The Jaeger Operator uses custom resources and custom controllers to extend the Kubernetes API with Jaeger-specific functionality. It manages the creation, update, and deletion of Jaeger components, such as the Jaeger collector, query, and agent components. When a Jaeger instance is created, the Jaeger Operator deploys the necessary components and sets up the required services and configurations. We are going to deploy the Jaeger Operator in the observability namespace. 
Use the below-mentioned command to deploy the operator. Shell $ kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.38.0/jaeger-operator.yaml -n observability We are using the latest version of Jaeger, which is 1.38.0 at the time of writing this article. By default, the Jaeger script is provided for cluster-wide mode. Suppose you want to watch only a particular namespace. In that case, you need to change the ClusterRole to Role and ClusterBindingRole to RoleBinding in the operator manifest and set the WATCH_NAMESPACE env variable on the Jaeger Operator deployment. To verify whether Jaeger is deployed successfully or not, run the following command: Shell $ kubectl get all -n observability NAME READY STATUS RESTARTS AGE pod/elasticsearch-0 1/1 Running 0 17m pod/jaeger-operator-5597f99c79-hd9pw 2/2 Running 0 11m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 17m service/jaeger-operator-metrics ClusterIP 172.20.220.212 <none> 8443/TCP 11m service/jaeger-operator-webhook-service ClusterIP 172.20.224.23 <none> 443/TCP 11m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/jaeger-operator 1/1 1 1 11m NAME DESIRED CURRENT READY AGE replicaset.apps/jaeger-operator-5597f99c79 1 1 1 11m NAME READY AGE statefulset.apps/elasticsearch 1/1 17m As we can see in the above output, our Jaeger Operator is deployed successfully, and all of its pods are up and running; this means Jaeger Operator is ready to install the Jaeger instances (CRs). The Jaeger instance will contain Jaeger components (Query, Collector, Agent); later, we will use these components to query OpenTelemetry metrics. Jaeger Instance A Jaeger Instance is a deployment of the Jaeger distributed tracing system. It is used to collect and store trace data from microservices or distributed applications, and provide a UI to visualize and analyze the trace data. To deploy the Jaeger instance, use the following command. 
Shell $ kubectl apply -f https://raw.githubusercontent.com/infracloudio/Opentelemertrywithjaeger/master/jaeger-production-template.yaml To verify the status of the Jaeger instance, run the following command: Shell $ kubectl get all -n observability NAME READY STATUS RESTARTS AGE pod/elasticsearch-0 1/1 Running 0 17m pod/jaeger-agent-27fcp 1/1 Running 0 14s pod/jaeger-agent-6lvp2 1/1 Running 0 15s pod/jaeger-collector-69d7cd5df9-t6nz9 1/1 Running 0 19s pod/jaeger-operator-5597f99c79-hd9pw 2/2 Running 0 11m pod/jaeger-query-6c975459b6-8xlwc 1/1 Running 0 16s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 17m service/jaeger-collector ClusterIP 172.20.24.132 <none> 14267/TCP,14268/TCP,9411/TCP,14250/TCP 19s service/jaeger-operator-metrics ClusterIP 172.20.220.212 <none> 8443/TCP 11m service/jaeger-operator-webhook-service ClusterIP 172.20.224.23 <none> 443/TCP 11m service/jaeger-query LoadBalancer 172.20.74.114 a567a8de8fd5149409c7edeb54bd39ef-365075103.us-west-2.elb.amazonaws.com 80:32406/TCP 16s service/zipkin ClusterIP 172.20.61.72 <none> 9411/TCP 18s NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/jaeger-agent 2 2 2 2 2 <none> 16s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/jaeger-collector 1/1 1 1 21s deployment.apps/jaeger-operator 1/1 1 1 11m deployment.apps/jaeger-query 1/1 1 1 18s NAME DESIRED CURRENT READY AGE replicaset.apps/jaeger-collector-69d7cd5df9 1 1 1 21s replicaset.apps/jaeger-operator-5597f99c79 1 1 1 11m replicaset.apps/jaeger-query-6c975459b6 1 1 1 18s NAME READY AGE statefulset.apps/elasticsearch 1/1 17m As we can see in the above screenshot, our Jaeger instance is up and running. OpenTelemetry To install the OpenTelemetry, we need to install the OpenTelemetry Operator. The OpenTelemetry Operator uses custom resources and custom controllers to extend the Kubernetes API with OpenTelemetry-specific functionality, making it easier to deploy and manage the OpenTelemetry observability stack in a Kubernetes environment. The operator manages two things: Collectors: It offers a vendor-agnostic implementation of how to receive, process, and export telemetry data. Auto-instrumentation of the workload using OpenTelemetry instrumentation libraries. It does not require the end-user to modify the application source code. OpenTelemetry Operator To implement the auto-instrumentation, we need to deploy the OpenTelemetry operator on our Kubernetes cluster. To deploy the k8s operator for OpenTelemetry, follow the K8s operator documentation. You can verify the deployment of the OpenTelemetry operator by running the below-mentioned command: Shell $ kubectl get all -n opentelemetry-operator-system NAME READY STATUS RESTARTS AGE pod/opentelemetry-operator-controller-manager-7f479c786d-zzfd8 2/2 Running 0 30s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/opentelemetry-operator-controller-manager-metrics-service ClusterIP 172.20.70.244 <none> 8443/TCP 32s service/opentelemetry-operator-webhook-service ClusterIP 172.20.150.120 <none> 443/TCP 31s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/opentelemetry-operator-controller-manager 1/1 1 1 31s NAME DESIRED CURRENT READY AGE replicaset.apps/opentelemetry-operator-controller-manager-7f479c786d 1 1 1 31s As we can see in the above output, the opentelemetry-operator-controller-manager deployment is running in the opentelemetry-operator-system namespace. 
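For reference, the install step described in the operator documentation essentially comes down to applying a single released manifest; the command below is a sketch of what that looks like at the time of writing (check the documentation for the current release, and remember that cert-manager must already be installed, as listed in the prerequisites):

Shell
$ kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml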
OpenTelemetry Collector The OpenTelemetry facilitates the collection of telemetry data via the OpenTelemetry Collector. Collector offers a vendor-agnostic implementation on how to receive, process, and export the telemetry data. The collector is made up of the following components: Receivers: It manages how to get data into the collector. Processors: It manages the processing of data. Exporters: Responsible for sending the received data. We also need to export the telemetry data to the Jaeger instance. Use the following manifest to deploy the collector. YAML kubectl apply -f - <<EOF apiVersion: opentelemetry.io/v1alpha1 kind: OpenTelemetryCollector metadata: name: otel spec: config: | receivers: otlp: protocols: grpc: http: processors: exporters: logging: jaeger: endpoint: "jaeger-collector.observability.svc.cluster.local:14250" tls: insecure: true service: pipelines: traces: receivers: [otlp] processors: [] exporters: [logging, jaeger] EOF In the above code, the Jaeger endpoint is the address of the Jaeger service which is running inside the observability namespace. We need to deploy this manifest in the same namespace where our application is deployed, so that it can fetch the traces from the application and export them to Jaeger. To verify the deployment of the collector, run the following command. Shell $ kubectl get deploy otel-collector NAME READY UP-TO-DATE AVAILABLE AGE otel-collector 1/1 1 1 41s OpenTelemetry Auto-Instrumentation Injection The above-deployed operator can inject and configure the auto-instrumentation libraries of OpenTelemetry into an application's codebase as it runs. To enable the auto-instrumentation on our cluster, we need to configure an instrumentation resource with the configuration for the SDK and instrumentation. Use the below-given manifest to create the auto-instrumentation. YAML kubectl apply -f - <<EOF apiVersion: opentelemetry.io/v1alpha1 kind: Instrumentation metadata: name: my-instrumentation spec: exporter: endpoint: http://otel-collector:4317 propagators: - tracecontext - baggage - b3 sampler: type: parentbased_traceidratio argument: "0.25" EOF In the above manifest, we have used three things: exporter, propagator, and sampler. Exporter: Used to send data to OpenTelemetry collector at the specified endpoint. In our scenario, it is "http://otel-collector:4317". Propagators: Carry traces, context, and baggage data between distributed tracing systems. They have three propagation mechanisms: tracecontext: This refers to the W3C Trace Context specification, which defines a standard way to propagate trace context information between services. baggage: This refers to the OpenTelemetry baggage mechanism, which allows for the propagation of arbitrary key-value pairs along with the trace context information. b3: This refers to the B3 header format, which is a popular trace context propagation format used by the Zipkin tracing system. Sampler: Uses a "parent-based trace ID ratio" strategy with a sample rate of 0.25 (25%). This means that when tracing a request, if any of its parent requests has already been sampled (with a probability of 0.25), then this request will also be sampled, otherwise it will not be traced. To verify that our custom resource is created or not, we can use the below-mentioned command. Shell $ kubectl get otelinst NAME AGE ENDPOINT SAMPLER SAMPLER ARG my-instrumentation 6s http://otel-collector:4317 parentbased_traceidratio 0.25 This means our custom resource is created successfully. 
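Because the logging exporter is part of the traces pipeline configured above, a convenient sanity check, once the application from the next section is deployed and receiving traffic, is to tail the collector logs and confirm that spans are arriving; a hypothetical check:

Shell
$ kubectl logs deployment/otel-collector --tail=20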
We are using the OpenTelemetry auto-instrumented method, so we don’t need to write instrumentation code in our application. All we need to do is, add an annotation in the pod of our application for auto-instrumentation. As we are going to demo a Java application, the annotation that we will have to use here is: Shell instrumentation.opentelemetry.io/inject-java: "true" Note: The annotation can be added to a namespace as well so that all pods within that namespace will get instrumentation, or by adding the annotation to individual PodSpec objects, available as part of Deployment, Statefulset, and other resources. Below is an example of how your manifest will look after adding the annotations. In the below example, we are using annotation for a Java application. YAML apiVersion: apps/v1 kind: Deployment metadata: name: demo-sagar spec: replicas: 1 selector: matchLabels: app: demo-sagar template: metadata: labels: app: demo-sagar annotations: instrumentation.opentelemetry.io/inject-java: "true" instrumentation.opentelemetry.io/container-names: "spring" spec: containers: - name: spring image: sagar27/petclinic-demo ports: - containerPort: 8080 We have added instrumentation “inject-java” and “container-name” under annotations. If you have multiple container pods, you can add them in the same “container-names” annotation, separated by a comma. For example, “container-name1,container-name-2,container-name-3” etc. After adding the annotations, deploy your application and access it on the browser. Here in our scenario, we are using port-forward to access the application. Shell $ kubectl port-forward service/demo-sagar 8080:8080 To generate traces, either you can navigate through all the pages of this website or you can use the following Bash script: Shell while true; do curl http://localhost:8080/ curl http://localhost:8080/owners/find curl http://localhost:8080/owners?lastName= curl http://localhost:8080/vets.html curl http://localhost:8080/oups curl http://localhost:8080/oups sleep 0.01 done The above-given script will make a curl request to all the pages of the website, and we will see the traces of the request on the Jaeger UI. We are making curl requests to https://localhost:8080 because we use the port-forwarding technique to access the application. You can make changes in the Bash script according to your scenario. Now let’s access the Jaeger UI, as our service jaeger-query uses service type LoadBalancer, we can access the Jaeger UI on the browser by using the load balancer domain/IP. Paste the load balancer domain/IP on the browser and you will see the Jaeger UI there. We have to select our app from the service list and it will show us the traces it generates. In the above screenshot, we have selected our app name “demo-sagar” under the services option and its traces are visible on Jaeger UI. We can further click on the traces to get more details about it. Summary In this article, we have gone through how you can easily instrument your application using the OpenTelemetry auto-instrumentation method. We also learned how this telemetric data could be exported to the Elasticsearch backend and visualized it using Jaeger. Integrating OpenTelemetry with Jaeger will help you in monitoring and troubleshooting. It also helps perform root cause analysis of any bug/issues in your microservice-based distributed systems, performance/latency optimization, service dependency analysis, and so on. We hope you found this post informative and engaging. References OpenTelemetry Jaeger Tracing

By Sagar Parmar
GitOps: Flux vs Argo CD

GitOps is a software development and operations methodology that uses Git as the source of truth for deployment configurations. It involves keeping the desired state of an application or infrastructure in a Git repository and using Git-based workflows to manage and deploy changes. Two popular open-source tools that help organizations implement GitOps for managing their Kubernetes applications are Flux and Argo CD. In this article, we’ll take a closer look at these tools, their pros and cons, and how to set them up. Common Use Cases for Flux and Argo CD Flux Continuous delivery: Flux can be used to automate the deployment pipeline and ensure that changes are automatically deployed as soon as they are pushed to the Git repository. Configuration management: Flux allows you to store and manage your application’s configuration as code, making it easier to version control and track changes. Immutable infrastructure: Flux helps enforce an immutable infrastructure approach—where changes are made only through the Git repository and not through manual intervention on the cluster. Blue-green deployments: Flux supports blue-green deployments—where a new version of an application is deployed alongside the existing version, and traffic is gradually shifted to the new version. Argo CD Continuous deployment: Argo CD can be used to automate the deployment process, ensuring that applications are always up-to-date with the latest changes from the Git repository. Application promotion: Argo CD supports application promotion—where applications can be promoted from one environment to another. For example, from development to production. Multi-cluster management: Argo CD can be used to manage applications across multiple clusters, ensuring the desired state of the applications is consistent across all clusters. Rollback management: Argo CD provides rollback capabilities, making it easier to revert changes in case of failures. The choice between the two tools depends on the specific requirements of the organization and application, but both tools provide a GitOps approach to simplify the deployment process and reduce the risk of manual errors. They both have their own pros and cons, and in this article, we’ll take a look at what they are and how to set them up. What Is Flux? Flux is a GitOps tool that automates the deployment of applications on Kubernetes. It works by continuously monitoring the state of a Git repository and applying any changes to a cluster. Flux integrates with various Git providers such as GitHub, GitLab, and Bitbucket. When changes are made to the repository, Flux automatically detects them and updates the cluster accordingly. Pros of Flux Automated deployments: Flux automates the deployment process, reducing manual errors and freeing up developers to focus on other tasks. Git-based workflow: Flux leverages Git as a source of truth, which makes it easier to track and revert changes. Declarative configuration: Flux uses Kubernetes manifests to define the desired state of a cluster, making it easier to manage and track changes. Cons of Flux Limited customization: Flux only supports a limited set of customizations, which may not be suitable for all use cases. Steep learning curve: Flux has a steep learning curve for new users and requires a deep understanding of Kubernetes and Git. How To Set Up Flux Prerequisites A running Kubernetes cluster. Helm installed on your local machine. A Git repository for your application's source code and Kubernetes manifests. 
• The repository URL and an SSH key for the Git repository.

Step 1: Add the Flux Helm Repository

The first step is to add the Flux Helm repository to your local machine. Run the following command to add the repository:

Shell
helm repo add fluxcd https://charts.fluxcd.io

Step 2: Install Flux

Now that the Flux Helm repository is added, you can install Flux on the cluster. Run the following command to install Flux:

Shell
helm upgrade -i flux fluxcd/flux \
  --set git.url=git@github.com:<your-org>/<your-repo>.git \
  --set git.path=<path-to-manifests> \
  --set git.pollInterval=1m \
  --set git.ssh.secretName=flux-git-ssh

In the above command, replace the placeholder values with your own Git repository information. The git.url parameter is the URL of the Git repository, the git.path parameter is the path to the directory containing the Kubernetes manifests, and the git.ssh.secretName parameter is the name of the SSH secret containing the SSH key for the repository.

Step 3: Verify the Installation

After running the above command, you can verify the installation by checking the status of the Flux pods. Run the following command to view the pods:

Shell
kubectl get pods -n <flux-namespace>

If the pods are running, Flux has been installed successfully.

Step 4: Connect Flux to Your Git Repository

The final step is to connect Flux to your Git repository. Run the following command to generate an SSH key and create a secret:

Shell
ssh-keygen -t rsa -b 4096 -f id_rsa
kubectl create secret generic flux-git-ssh \
  --from-file=id_rsa=./id_rsa --namespace=<flux-namespace>

In the above command, replace the <flux-namespace> placeholder with the namespace where Flux is installed. Now, add the generated public key as a deploy key in your Git repository.
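Flux should now begin syncing the repository. As a quick sanity check, the sketch below prints the public key to register as the deploy key and then follows the Flux logs; it assumes the chart created a deployment named flux in your chosen namespace (verify with kubectl get deploy if unsure):

Shell
# Public half of the key generated above; add it as a read-only deploy key
# in your Git provider (GitHub, GitLab, Bitbucket, etc.).
cat id_rsa.pub

# Follow the Flux logs and watch for successful sync activity.
# The deployment name "flux" is an assumption based on the release name used above.
kubectl -n <flux-namespace> logs deployment/flux -f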
You have successfully set up Flux using Helm. Whenever changes are made to the Git repository, Flux will detect them and update the cluster accordingly. In conclusion, setting up Flux using Helm is quite a simple process. By using Git as a source of truth and continuously monitoring the state of the cluster, Flux helps simplify the deployment process and reduce the risk of manual errors.

What Is Argo CD?

Argo CD is an open-source GitOps tool that automates the deployment of applications on Kubernetes. It allows developers to declaratively manage their applications and keeps the desired state of the applications in sync with the live state. Argo CD integrates with Git repositories and continuously monitors them for changes. Whenever changes are detected, Argo CD applies them to the cluster, ensuring the application is always up to date. With Argo CD, organizations can automate their deployment process, reduce the risk of manual errors, and benefit from Git's version control capabilities. Argo CD provides a graphical user interface and a command-line interface, making it easy to use and manage applications at scale.

Pros of Argo CD
• Advanced deployment features: Argo CD provides advanced deployment features, such as rolling updates and canary deployments, making it easier to manage complex deployments.
• User-friendly interface: Argo CD provides a user-friendly interface that makes it easier to manage deployments, especially for non-technical users.
• Customizable: Argo CD allows for greater customization, making it easier to fit the tool to specific use cases.

Cons of Argo CD
• Steep learning curve: Argo CD has a steep learning curve for new users and requires a deep understanding of Kubernetes and Git.
• Complexity: Argo CD has a more complex architecture than Flux, which can make it more difficult to manage and troubleshoot.

How To Set Up Argo CD

Argo CD can be installed on a Kubernetes cluster using Helm, a package manager for Kubernetes. In this section, we'll go through the steps to set up Argo CD using Helm.

Prerequisites
• A running Kubernetes cluster.
• Helm installed on your local machine.
• A Git repository for your application's source code and Kubernetes manifests.

Step 1: Add the Argo CD Helm Repository

The first step is to add the Argo Helm repository to your local machine. Run the following command to add the repository:

Shell
helm repo add argo https://argoproj.github.io/argo-helm

Step 2: Install Argo CD

Now that the Argo Helm repository is added, you can install Argo CD on the cluster. Run the following command to install Argo CD:

Shell
helm upgrade -i argocd argo/argo-cd --set server.route.enabled=true

Step 3: Verify the Installation

After running the above command, you can verify the installation by checking the status of the Argo CD pods. Run the following command to view the pods:

Shell
kubectl get pods -n argocd

If the pods are running, Argo CD has been installed successfully.

Step 4: Connect Argo CD to Your Git Repository

The final step is to connect Argo CD to your Git repository. Argo CD provides a graphical user interface that you can use to create applications and connect to your Git repository. To access the Argo CD interface, run the following command to get the URL:

Shell
kubectl get routes -n argocd

Note that routes are an OpenShift concept (enabled above with server.route.enabled=true); on other Kubernetes distributions, expose or port-forward the argocd-server service instead. Use the URL in a web browser to access the Argo CD interface. Once you're in the interface, you can create a new application by providing the Git repository URL and the path to the Kubernetes manifests. Argo CD will continuously monitor the repository for changes and apply them to the cluster. You have now successfully set up Argo CD using Helm.

Conclusion

GitOps is a valuable approach for automating the deployment and management of applications on Kubernetes. Flux and Argo CD are two popular GitOps tools that provide a simple and efficient way to automate the deployment process, enforce an immutable infrastructure, and manage applications in a consistent and predictable way. Flux focuses on automating the deployment pipeline and providing configuration management as code, while Argo CD provides a more complete GitOps solution, including features such as multi-cluster management, application promotion, and rollback management. Both tools have their own strengths and weaknesses, and the choice between the two will depend on the specific requirements of the organization and the application. Regardless of the tool chosen, GitOps provides a valuable approach for simplifying the deployment process and reducing the risk of manual errors. By keeping the desired state of the applications in sync with the Git repository, GitOps ensures that changes are made in a consistent and predictable way, resulting in a more reliable and efficient deployment process.
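As a complement to Step 4 above: rather than creating the application through the UI, the same connection can be declared with an Argo CD Application resource. The manifest below is a minimal, hypothetical sketch (application name, repository URL, path, and target namespace are placeholders to adapt):

Shell
cat <<'EOF' | kubectl apply -n argocd -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                     # hypothetical application name
spec:
  project: default
  source:
    repoURL: https://github.com/<your-org>/<your-repo>.git
    path: <path-to-manifests>
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app              # namespace the manifests should land in
  syncPolicy:
    automated: {}                  # sync changes from Git automatically
EOF

Argo CD picks up such an Application just as if it had been created in the UI and keeps the target namespace in sync with the repository.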

By Kushagra Shandilya
Utilize OpenAI API to Extract Information From PDF Files

Why It's Hard to Extract Information From PDF Files

PDF, or Portable Document Format, is a popular file format that is widely used for documents such as invoices, purchase orders, and other business documents. However, extracting information from PDFs can be a challenging task for developers. One reason is that the format is not structured. Unlike HTML, which has a specific format for tables and headers that developers can easily identify, PDFs do not have a consistent layout for information. This makes it harder for developers to know where to find the specific information they need. Another reason is that there is no standard layout for information: each system generates invoices and purchase orders differently, so developers must often write custom code to extract information from each individual document. This can be a time-consuming and error-prone process. Additionally, PDFs can contain both text and images, making it difficult for developers to programmatically extract information from the document. OCR (optical character recognition) can be used to extract text from images, but this adds complexity to the process and may result in errors if the OCR software is not accurate.

Existing Solutions

Existing solutions for extracting information from PDFs include:
• Using regex to match patterns in text after converting the PDF to plain text. Examples include invoice2data and traprange-invoice. However, this method requires knowledge of the format of the data fields.
• AI-based cloud services that use machine learning to extract structured data from PDFs. Examples include pdftables and docparser, but these are not open-source friendly.

Yet Another Solution for PDF Data Extraction: Using OpenAI

One solution to extract information from PDF files is to use OpenAI's natural language processing capabilities to understand the content of the document. However, OpenAI is not able to work with PDF or image formats directly, so the first step is to convert the PDF to text while retaining the relative positions of the text items. One way to achieve this is to use the PDFLayoutTextStripper library, which uses PDFBox to read through all text items in the PDF file and organize them in lines, keeping the relative positions the same as in the original PDF file. This is important because, for example, in an invoice's items table, if the amount ends up in the same column as the quantity because the layout was not preserved, querying for the total amount and total quantity will return incorrect values.

Here is an example of the output from the stripper:

*PO-003847945* Page.........................: 1 of 1 Address...........: Aeeee Consumer Good Co.(QSC) Purchase Order P.O.Box 1234 Dooo, PO-003847945 ABC TL-00074 Telephone........: USR\S.Morato 5/10/2020 3:40 PM Fax...................: 100225 Aaaaaa Eeeeee Date...................................: 5/10/2020 Expected DeliveryDate...: 5/10/2020 Phone........: Attention Information Fax.............: Vendor : TL-00074 AAAA BBBB CCCCCAAI W.L.L. Payment Terms Current month plus 60 days Discount Barcode Item number Description Quantity Unit Unit price Amount Discount 5449000165336 304100 CRET ZERO 350ML PET 5.00 PACK24 54.00 270.00 0.00 0.00 350 5449000105394 300742 CEEOCE EOE SOFT DRINKS 1.25LTR 5.00 PACK6 27.00 135.00 0.00 0.00 1.25 (truncated...)
Once the PDF has been converted to text, the next step is to call the OpenAI API and pass the text along with queries such as "Extract fields: 'PO Number', 'Total Amount'". The response will be in JSON format, and GSON can be used to parse it and extract the final results. This two-step process of converting the PDF to text and then using OpenAI's natural language processing capabilities can be an effective solution for extracting information from PDF files.

The query is as simple as follows, with %s replaced by the PO text content:

Java
private static final String QUERY = """
    Want to extract fields: "PO Number", "Total Amount" and "Delivery Address".
    Return result in JSON format without any explanation.
    The PO content is as follows:
    %s
    """;

The query consists of two components:
• Specifying the desired fields.
• Formatting the field values as JSON data for easy retrieval from the API response.

And here is the example response from OpenAI:

JSON
{
  "object": "text_completion",
  "model": "text-davinci-003",
  "choices": [
    {
      "text": "\\n{\\n \\"PO Number\\": \\"PO-003847945\\",\\n \\"Total Amount\\": \\"1,485.00\\",\\n \\"Delivery Address\\": \\"Peera Consumer Good Co.(QSC), P.O.Box 3371, Dohe, QAT\\"\\n}",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  // ... some more fields
}

Decoding the text field's JSON string yields the desired fields:

JSON
{
  "PO Number": "PO-003847945",
  "Total Amount": "1,485.00",
  "Delivery Address": "Peera Consumer Good Co.(QSC), P.O.Box 3371, Dohe, QAT"
}

Run Sample Code

Prerequisites:
• Java 16+
• Maven

Steps:
1. Create an OpenAI account.
2. Log in and generate an API key.
3. Replace OPENAI_API_KEY in Main.java with your key.
4. Update SAMPLE_PDF_FILE if needed.
5. Execute the code and view the results from the output.
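For reference, the request that the sample code sends can also be reproduced from the command line. The sketch below targets OpenAI's completions endpoint with the same model shown in the response above; the prompt is abbreviated, and OPENAI_API_KEY is assumed to be set in your environment:

Shell
curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
        "model": "text-davinci-003",
        "prompt": "Want to extract fields: \"PO Number\", \"Total Amount\" and \"Delivery Address\". Return result in JSON format without any explanation. The PO content is as follows: ...",
        "max_tokens": 256,
        "temperature": 0
      }'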

By Tho Luong CORE
Open-Source Authorization as a Service

Background Information

The story starts back in 2007 when our founders, Omri Gazitt and Gert Drapers, were working on what would eventually become Azure Active Directory. At that time, Active Directory was a keystone workload for Windows Server. It enabled IT admins to map users and groups into the roles that enterprise apps exposed. However, when enterprise software moved to the cloud, there was no longer a server operating system that could authenticate the user and keep track of what groups they're a member of. As a result, every cloud application was forced to reinvent both authentication and authorization. The Azure Access Control Service and Azure Active Directory were early efforts towards reimagining identity and access for the age of SaaS and cloud.

Fine-Grained Access Control as a Service

Fast forward fifteen years, and we now have an interoperable identity fabric built on standards like OAuth2, OpenID Connect, SAML, and JWT, supported by all major cloud platforms. In addition, companies like Okta, Auth0, OneLogin, and PingID have developed cloud-neutral solutions, so no one has to reinvent login. Authorization, on the other hand, remains widely underserved. In mid-2020, when Omri and Gert were searching for the next hard problem to solve, they immediately thought of creating a definitive solution for application and API access control. CTOs and VPs of engineering confirm that authorization is a pain point. They find themselves continuously building and rebuilding their access control systems based on ever-evolving requirements. IT is frustrated with every app authorizing differently, based on separate sets of permissions, data, and backend models. Having to navigate through dozens of consoles to manage policies is no walk in the park either. Omri and Gert knew there had to be a better way.

Modern Authorization

Let's start by defining the problem. Authentication is the process of proving who you are to a system. It can be done through a combination of user ID and password, or through single sign-on (SSO), multi-factor authentication, or biometrics. Authorization, or access control, is downstream from authentication. It is the process of evaluating what a logged-in user can do in the context of your application. This is a different problem than authentication, and it is surprisingly complex. In the beginning, simple, rudimentary roles, like admin and viewer, might suffice. But as you grow and evolve your application and onboard more sophisticated customers with advanced requirements, these simple roles no longer cut it. At that point, you need fine-grained access control. Modern authorization is fine-grained, policy-based, real-time access control. It is based on your resource hierarchy and domain model, and streams real-time data to local decision points to allow for millisecond enforcement based on the latest data. It is secure by default and employs principles of least privilege. It is easy to integrate and fits into your environments. Most importantly, it offers developers the flexibility to evolve access control models and policies as needs evolve.

Open-Source Access Control System

We built Aserto leveraging the best open-source, cloud-native projects, including Open Policy Agent (OPA), which is the basis of our decision engine. Rather than invent our own policy engine and language, we chose to join the OPA ecosystem, which has a good general-purpose decision engine.
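To make the OPA building block concrete, here is a minimal sketch of evaluating a policy with the opa CLI. The package name, rule, and input document are hypothetical, and the classic Rego syntax shown assumes an OPA release prior to 1.0:

Shell
# A tiny policy that only allows users with the admin role (illustrative only).
cat <<'EOF' > example.rego
package example

default allow = false

allow {
    input.user.role == "admin"
}
EOF

# Evaluate the policy against a sample input document.
echo '{"user": {"role": "admin"}}' > input.json
opa eval --data example.rego --input input.json "data.example.allow"

This is the kind of decision engine Aserto builds on; the hard parts discussed next are distributing the policies and keeping the data they need in sync.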
We're focusing Aserto on solving the hard problem, which is: how do you build scalable API and application authorization on top of those open-source assets? One of the hardest challenges is getting data from policy information points to the decision engine, caching it to enable execution over local data, but keeping it in sync with the source. Aserto solves this problem by streaming user attribute and resource information to the policy decision points in your cloud in real time, so you can make authorization decisions in milliseconds, based on real-time data. Our open-source strategy is what we call "open edge": Topaz, the authorizer software that applications call to make authorization decisions, is open-source, while our control plane is proprietary. The control plane is where we feel we add value for organizations that are looking to coordinate or manage the lifecycle of their policies, connect all their identity providers, bring all that data to the edge, and bring decision logs from the edge back to the control plane.

Policy CLI

Along the way, we've created some general-purpose open-source projects to help move the ecosystem forward. By default, OPA policies are built into tarballs. The Policy CLI brings a Docker-like workflow for building OPA policies into OCI images, which you can sign with cosign, to provide a secure software supply chain for your policy images.

Policy-as-Code and Policy-as-Data

There's an interesting debate in the industry between two ecosystems, which we call "policy-as-code" and "policy-as-data."
• Policy-as-code: Argues that you can define everything in your policy. You can build general-purpose rules, use a logic engine to evaluate those rules, and decide whether or not a user has permission to perform an operation on a resource.
• Policy-as-data: Stems from a belief that most access control problems fit within a relationship-based model, where the rule structure relates a subject, action, and object, essentially constructing a relationship graph between subjects and objects. This isn't a new idea, but it has been revived by the Google Zanzibar paper, which describes how Google built the permissioning system for Google Docs.

We don't believe this is an either-or: you actually get a more interesting and flexible system when you combine the two. The Aserto directory is built around the Zanzibar model, where you can define a set of object types like organizations, projects, teams, folders, or lists. You can then define a set of subject types like users and groups. Finally, you create relation types that connect the two, and hang permissions off of those relationships. But if you want to extend the relationship-based model with attribute-based access control rules, you can create a single policy that does both. Bringing these concepts together is the foundation for a flexible system that will grow with you: a system that lets you start simple but scales with your requirements over time.

Conclusion

Today, every cloud application is forced to build and rebuild its own access control system. Application authorization seems simple at first, but it is surprisingly complex. Authorization is in the critical path of every application request, and getting the most up-to-date data to the decision engine to allow for millisecond decisions is a distributed systems problem that most engineering teams simply cannot justify solving.

By Noa Shavit
Top 7 Trends in Open-Source Technology in 2022

Open-source technology refers to software that is distributed with its source code, enabling programmers to change the behavior of an application or program. If a programmer has access to the source code of a particular piece of software, they can examine, amend, and improve it, upgrading it with new features or fixing broken parts to increase its efficiency. The idea of open-source technology emerged in 1983 when Richard Stallman, a programmer and researcher at MIT, floated the idea that software source code should be open. He wanted more freedom for programmers, as he believed that programmers can create better versions of software and bring revolutionary changes to technology if they are provided with the source code. This idea eventually led to the creation of the Open Source Initiative, or OSI, in 1998 (Adey, 2021). Over the years, many things have changed and new trends have emerged in the open-source software domain, giving birth to new ideas and creating more opportunities for programmers to learn, adapt, apply what they know, and contribute to the well-being of open-source software (Wallen, 2022). Let's look at some of the biggest trends about to happen in the open-source technology realm, which will change this industry for years to come.

1. Demand for People with Open-Source IT Skills Will Be on the Rise

There will be a surge in demand for full-stack developers and IT professionals with open-source skills. The diversity of stacks used in the development, modeling, and operations of software systems will provide programmers and developers with massive opportunities to enhance their skill sets. Experience in fields such as cloud computing, DevOps tooling, Kubernetes, Python, and PyTorch will enable programmers to optimize businesses and increase their revenue. Businesses and enterprises, whether large or small, will be looking to fill the gaps as they invest in the power of open-source technologies and the positive impact they can have in solving customers' problems (Kamaruzzaman, 2021).

2. The Adoption Rate for Containers and Kubernetes Will Increase

Adoption of Kubernetes, which acts as an open-source container orchestration platform for cloud applications, will be on the rise. This will enable the widespread use of compatible open-source container formats as described in the Open Container Initiative. Although the learning curve for implementing Kubernetes is steep, things are going smoothly, as a large number of IT teams around the globe have realized the true potential this technology holds. Kubernetes is regarded as the most important open-source technology, and its adoption will increase in 2022 (Wallen, 2022).

3. Snap and Flatpak Will Be Accepted on a Larger Scale

Snap and Flatpak are both systems designed for distributing Linux apps. Although they have been ridiculed over the course of time, these systems simplify the installation process of applications and make room for more applications on the desktop. Thanks to Snap and Flatpak, applications like Slack, Spotify, and Skype can be installed without any hassle. These two systems are now needed, and the Linux community will understand their significance sooner or later. In the near future, a distribution will be released whose app store defaults entirely to Snap and Flatpak. It's going to be a treat for new users (Kamaruzzaman, 2021).

4. Emphasis on Open-Source Security Will Increase and Attacks on Supply Chains Will Be Averted

As the penetration of open-source technology increases in today's IT world, so does the need to strengthen security measures to prevent cyberattacks on this technology. New tools that can scan for vulnerabilities in open-source software are going to be introduced and will be used frequently to mitigate any harm. IT firms and organizations dealing with open-source technology will invest in acquiring new versions of software and patches that improve the overall security posture. Hackers intrude into the software supply chain when they find unpatched open-source vulnerabilities where they can insert malicious software. Organizations like the Linux Foundation are stepping up their efforts to prevent hackers from achieving their malicious designs. The development of advanced open-source tools, such as digital signing services, will continue to evolve in 2022 and beyond (Wallen, 2022).

5. The Launch of a New Open-Source Social Network May Occur in 2022

This one is more of a wish that may easily see the light of day in 2022. A completely new social network that is open-source from all sides is possible, and it could give Facebook a tough time. This social network could completely alter the way social networking is done and would give programmers and developers more freedom to improve the user experience for the masses.

6. It's High Time to Implement AI with Full Force

Technologies like artificial intelligence (AI), machine learning (ML), deep learning (DL), and data-driven technologies are here to stay and will see a rapid increase in their implementation and execution. AI can be of great help to humans, as it can perform dull, monotonous tasks over and over again and save a lot of time for developers and programmers, who can shift their focus to more intelligent tasks at hand. Using GPT-3 and other NLP libraries, AI has the ability to automate these tasks. Several AI assistants, such as Tabnine, GitHub Copilot, and Codota, are smart enough to generate source code for developers. They are still in the early phases but are maturing with each passing day (Wallen, 2022).

7. Steam Deck Will Prove that Linux Can Provide a Better Gaming Experience

Steam Deck is a portable handheld gaming device, and in 2022 it is set to prove that Linux can game. Linux is not going to dethrone Windows in the realm of desktop gaming, but Steam Deck will show that Linux is also a viable option when it comes to playing games through Steam (Martinez-Torres & Diaz-Fernandez, 2013).

References

• Wallen, J. (2022, January 5). Open-source predictions for 2022: Snap, Flatpak, CentOS Stream, Linux job demand and more. TechRepublic. Retrieved January 29, 2022, from https://www.techrepublic.com/article/open-source-predictions-for-2022/
• Kamaruzzaman, M. (2021, December 31). 22 predictions about the software development trends in 2022. Medium. Retrieved January 29, 2022, from https://towardsdatascience.com/22-predictions-about-the-software-development-trends-in-2022-fcc82c263788
• Adey, O. (2021, December 29). Tech trends 2022: A good year for open source and the cloud. The Latest News. Retrieved January 29, 2022, from https://gettotext.com/tech-trends-2022-a-good-year-for-open-source-and-the-cloud/
• Martinez-Torres, M. R., & Diaz-Fernandez, M. C. (2013). Current issues and research trends on open-source software communities. Technology Analysis & Strategic Management, 26(1), 55–68. https://doi.org/10.1080/09537325.2013.850158
• Pollock, R. (n.d.). Innovation, imitation and open source. Multi-Disciplinary Advancement in Open Source Software and Processes, 114–127. https://doi.org/10.4018/978-1-60960-513-1.ch008

By Neeraj Agarwal
Monitor Kubernetes Events With Falco For Free

Kubernetes is now the platform of choice for many companies to manage their applications, both on-premises and in the cloud. Its emergence a few years ago drastically changed the way we work. The flexibility of the platform has allowed engineering teams to increase their productivity, which in turn requires new working methods adapted to this dynamic environment. Kubernetes also required security control processes to adapt in order to keep these systems reliable. Falco is a tool that fits into this ecosystem.

What Is Falco?

Falco is an open-source tool, created by Sysdig, to continuously detect risks and threats on Kubernetes platforms, containers, on-premise systems, and even cloud activity. Falco can be seen as an agent deployed on each node (master and worker) to observe and alert in real time on unexpected behaviors such as configuration changes, intrusions, or data theft. Falco is now supported by the Cloud Native Computing Foundation (CNCF) and a huge community that continues to improve and maintain the project. Falco is mainly used by security engineers (CISOs, SREs, security analysts, etc.) to detect and alert on any deviant behavior on any system as soon as possible, and potentially to automate playbooks that fix any issue detected. To do so, Falco relies on predefined and/or custom rules that a security team can use to extend Falco's detection range.

What Is a Falco Rule?

The way Falco manages these rules fits perfectly in the context of the Security as Code methodology, where security and policy decisions are codified so they can be shared and potentially maintained by multiple teams. Falco rules are the central component of the application for identifying deviant behavior of any component on a cluster. Their definition consists of macros, lists, and conditions defined in YAML files deployed in the default folder, or in a specific directory, to be interpreted automatically by Falco at startup.

(Figure: Example of a Falco rule definition identifying the execution of a shell in a container; a sketch of such a rule appears at the end of this article.)

A set of default rules is maintained by the community to help monitor baseline conditions such as:
• Installation of a package in a container
• Execution of specific commands in a container, like shell, Bash, Zsh, etc.
• Manipulation of any file on the filesystem
• Unexpected SSH connections to remote locations
• Unusual network activities
• Attempts to start a container in privileged mode

This is only a subset of the complete list of rules defined and used by default by the Falco agent. For more information, refer to the YAML file definition available on GitHub. These rules cover most of the common anomalies identified by the community. They are a good basis for ensuring a minimum of control over what happens in your Kubernetes clusters. However, they can be supplemented by custom rules in order to fit your company's context. For this, it is recommended to use the default rules as examples and to read the online documentation in order to follow good practices and, obviously, to codify and version all of the configuration. Having a project shared with all teams can facilitate the adoption and maintenance of these security policies and therefore their relevance.

How To Deploy Falco

Falco has several installation options, but the best option on a Kubernetes cluster is the Helm chart developed and maintained by Sysdig and the Falco community. Falco does not require a lot of resources to operate in optimal conditions: between 150 and 300Mi of memory and 100m of CPU.
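A minimal sketch of such a deployment using the community Helm chart is shown below. The chart repository URL and the falcosidekick values come from the falcosecurity charts (Falcosidekick is covered later in this article); the namespace is a placeholder, and options should be verified against the chart's values.yaml:

Shell
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# Install Falco on every node, with Falcosidekick and its web UI enabled.
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true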
It is highly recommended to define a priority class to ensure that the agent runs continuously, even on hosts under pressure.

How to Monitor Events

Falco, in itself, has two functions: the identification of an anomaly based on predefined or customized rules, and the generation of an event, in the form of logs, for each anomaly detected. These logs are usually sent to stdout to be readable via any client (command line or web interface). It is strongly recommended to extract these logs and centralize them on a storage space external to the cluster, where they can be processed and ingested by the log management system in order to keep a history of anomalies, but also to generate alerts using the available internal tools. However, there are two tools that improve the user experience by allowing you to view detected events in real time, add a visual aspect to the data, and generate alerts.

Falcosidekick

Falcosidekick is an add-on for Falco, developed by Sysdig, that improves how the events generated by Falco are used and consumed. The tool initially consisted of a very simple user interface for browsing anomalies and extracting some metrics. This user interface makes it possible to quickly identify the criticality level of anomalies in order to prioritize the necessary actions. Viewing alerts in real time is a good thing, but receiving an alert (mail, chat, SMS, call, etc.) is even better. Falcosidekick has an internal alert system that can be easily connected to a set of alert managers, such as Slack, Teams, RocketChat, Alertmanager, PagerDuty, OpsGenie, etc., and any SMTP service, to get paged on critical issues. This add-on also allows automatic storage of events on an external platform such as Prometheus, Datadog, InfluxDB, Elasticsearch, etc., which is useful for integration with a monitoring system already in place. Another advantage of this application is its integration with event-based serverless systems. This aspect is obviously very important when a security team is mature enough to automate actions based on a particular event, such as re-creating a pod when the installation of a package in a container is detected. This requires an understanding of the feasibility and the time it takes to set up, but it brings a significant benefit for the security of a Kubernetes platform. Finally, Falcosidekick integrates with Falco's Helm chart as a dependency, facilitating its installation and configuration, even though it can also be installed as a separate project. For more information on Falcosidekick, please refer to the GitHub project.

Prometheus Exporter

Another way to extract data for processing is to use the Prometheus exporter dedicated to events generated by Falco. Falcosidekick can also handle this part, but depending on the context, the exporter can be a better option. The Prometheus exporter is available on GitHub and can be deployed with Helm as a standalone application. The community also provides a Grafana dashboard to easily render the collected metrics. It is up to you to decide whether the Prometheus exporter or Falcosidekick is better suited to your use case. Both are good options for shipping the anomalies detected by Falco outside the cluster.

How To Start With Falco

Starting with Falco on Kubernetes is pretty easy, as it comes with a preconfigured Helm chart to deploy the stack and watch, immediately after the deployment, for unexpected behavior on the platform.
However, if you are looking for more information or hands-on experience before testing it on your platform, take a look at the free Falco 101 training delivered by Sysdig. This five-hour training walks through all aspects of the stack, including Falcosidekick, the Prometheus exporter, Helm, and more, to properly guide you on your journey with Falco. It is definitely the right way to get started with the tool.

Next

The Falco stack is a great open-source solution for improving the security of a Kubernetes platform. Customizable, and easy to install and maintain, it is the perfect combination to start your journey! For more information on Falco, please refer to this documentation:
• Falco project website
• Falco GitHub project
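For reference, a custom rule file follows the same structure as the default rules (lists, macros, and rules in YAML). Below is a minimal, hypothetical sketch in the spirit of the shell-in-a-container example mentioned earlier; the condition and output are illustrative and should be checked against the official rules documentation before use:

Shell
# Write a custom rule file; it can be supplied through the Helm chart's
# customRules value or dropped into /etc/falco/rules.d on the nodes.
cat <<'EOF' > custom-rules.yaml
- rule: Terminal shell in container
  desc: A shell was spawned inside a container (illustrative custom rule)
  condition: >
    spawned_process and container and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in a container (user=%user.name container=%container.name
    command=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
EOF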

By Nicolas Giron
BPMN Workflows Version Management With Milestone Camunda Cawemo

This article contains a step-by-step guide on how to manage BPMN file versioning and avoid conflicts in Cawemo when multiple team members are working together. It will also help you avoid overwriting BPMN files during updates and shows how to keep files in sync between the repository and Cawemo.

What Is a Milestone?

The milestone feature in Cawemo is used to maintain versioning of BPMN files. We can see a BPMN file's full history, who made each change, and the last-changed date. We can upload a new version of a BPMN file without removing or deleting its older versions.

Steps to Follow
1. Create a new directory (e.g., Demo) in Cawemo for signed-off, deployed BPMN files.
2. Upload the working/developed and integrated BPMN (e.g., Demo_BPMN) from the code repository or your local machine to the newly created directory.
3. Create a duplicate copy of the BPMN that needs to be updated and move it to the In Review folder.
4. Move the file into the In Review – Demo folder.
5. Open the file in the In Review – Demo folder and remove "copy" from the file name.
6. Update the file; changes are autosaved. Update the BPMN design as per the requirements.
7. Click on Milestones.
8. Click on "Create a new milestone" (the + symbol) on the latest version.
9. Make the changes on the same file every time, create a new version, and name the milestones 0.1, 0.2, and so on.

Next, complete the following steps:
1. Review the BPMN with the Business Analyst, Solution Architect, Designers, and Technology Architect.
2. Download the signed-off version.
3. Take the same BPMN for development and add the configuration.
4. Test the standalone workflow with a Postman collection.
5. Hand over the tested, signed-off version of the BPMN to the development team for integration.
6. After successful integration, copy it with a new milestone version into the original directory in Cawemo.

Alternative Way of Uploading Updated BPMN Files to Cawemo and Versioning the Milestone
1. Update the BPMN in Camunda Modeler as per the requirements and save it locally.
2. Go to the respective folder in Cawemo and open the BPMN.
3. Click on the BPMN name (e.g., Replace via upload) and select the appropriate file from your local machine. It will be uploaded and autosaved.
4. Click on Milestones.
5. Edit the name as 0.1.

By Radhika K

Top Open Source Experts


Mark Gardner

Independent Contractor,
The Perl Shop

I've been a software developer since I was ten years old, typing BASIC on a Commodore 64. I help professional Perl developers to engineer modern, disciplined applications in the cloud so they can become experts that write easy-to-maintain code with confidence, increase their relevance in the market and get the best positions, high salaries, and work on interesting projects.

Nuwan Dias

VP and Deputy CTO,
WSO2


Radivoje Ostojic

Principal Software Engineer,
BrightMarbles


Adam Houghton

Senior Software Developer,
SAS Institute

The Latest Open Source Topics

Getting Started With Prometheus Workshop: Introduction to the Query Language
Interested in open-source observability? Learn about Prometheus Query Language and how to set up a demo project to provide more realistic data for querying.
March 29, 2023
by Eric D. Schabell CORE
· 756 Views · 2 Likes
The Current State of Open-Source Careers and Jobs
Despite the current economic climate, open-source jobs are out there, with the tech unemployment rate decreasing by 1.8% in 2023.
March 28, 2023
by Saqib Jan
· 1,027 Views · 1 Like
The Architecture and MVC Pattern of an ASP.NET E-Commerce Platform
The architecture and source code organization of an e-commerce platform can be kept clear. Take a look at a summary of the architecture and its MVC design pattern.
March 27, 2023
by Dmitriy Kulagin CORE
· 1,266 Views · 2 Likes
Introduction To OpenSSH
Readers will learn about OpenSSH, an open-source suite of secure networking utilities, including its history, background, and functions.
March 27, 2023
by Aditya Bhuyan
· 1,292 Views · 3 Likes
19 Most Common OpenSSL Commands for 2023
Leverage the power of OpenSSL through our comprehensive list of the most common commands. Easily understand what each command does and why it is important.
March 21, 2023
by Janki Mehta
· 3,020 Views · 4 Likes
Monitoring Linux OS Using Open Source Real-Time Monitoring HertzBeat
Use the open-source real-time monitoring system HertzBeat to monitor and alert on the Linux operating system, all in 5 minutes!
March 20, 2023
by gong tom
· 1,918 Views · 2 Likes
Simulating and Troubleshooting BLOCKED Threads in Kotlin [Video]
As we continue a look into simulating and troubleshooting performance problems in Kotlin, let’s discuss how to make threads go into a BLOCKED state.
March 17, 2023
by Ram Lakshmanan CORE
· 4,209 Views · 2 Likes
Introducing Remult: The Open Source Backend to Frontend Framework You Always Wanted
In this article, readers will learn about Remult, an open source backend to frontend framework, including the backstory, things we learned, and things to come.
March 14, 2023
by Noam Honig
· 1,894 Views · 2 Likes
Getting Started With Prometheus Workshop: Installing Prometheus
Interested in open-source observability, but lack the knowledge to dive in? In this post, gain an understanding of available open-source observability tooling.
March 14, 2023
by Eric D. Schabell CORE
· 2,333 Views · 2 Likes
OWASP Kubernetes Top 10
The OWASP Kubernetes Top 10 puts all possible risks in order of overall commonality or probability.
March 13, 2023
by Nigel Douglas
· 9,326 Views · 7 Likes
10 Easy Steps To Start Using Git and GitHub
Make your entry into the world of Git and GitHub with this guide! Learn how to set up a repository and branches and how to commit and push changes in 10 simple steps.
March 10, 2023
by Bhavesh Patel
· 3,354 Views · 1 Like
3 Main Pillars in ReactJS
This article explains ReactJS and the three main pillars of ReactJS, which are component, state, and props.
March 10, 2023
by Muthuramalingam Duraipandi
· 3,869 Views · 1 Like
Supply Chain Security: What Is SLSA? Part I
Attacks on software supply chains have evolved into dangerous threats. Let’s discuss the SLSA framework to understand where supply chain security is headed.
March 9, 2023
by Tiexin Guo
· 3,456 Views · 1 Like
Solving the Enduring Pain of Authorization With Aserto’s Co-Founder and CEO, Omri Gazitt
Security requirements such as authorization and access can be a pain. Dev Interrupted interviews Omri for some tips and tricks.
March 8, 2023
by Dan Lines CORE
· 2,895 Views · 2 Likes
Maven Troubleshooting, Unstable Builds, and Open-Source Infrastructure
This is a story about unstable builds and troubleshooting. More importantly, this story is written to thank all contributors to basic software infrastructure — the infrastructure we all use and take for granted.
March 7, 2023
by Jaromir Hamala
· 3,341 Views · 1 Like
Open Source Maintenance Is Community Organizing
Yes it’s free, but it’s also a duty to our users. When a user commits to your hobby project, it stops being a hobby. It becomes a product. Read to learn more.
March 7, 2023
by Shai Almog CORE
· 3,338 Views · 2 Likes
11 Observability Tools You Should Know
This article looks at the features, limitations, and important selling points of eleven popular observability tools to help you select the best one for your project.
March 7, 2023
by Lahiru Hewawasam
· 10,300 Views · 10 Likes
Exploring Google's Open Images V7
Dig into the new features in Google's Open Images V7 dataset using the open-source computer vision toolkit FiftyOne!
March 6, 2023
by Jacob Marks
· 573 Views · 2 Likes
Unlock the Power of Jakarta EE With These Awesome Resources!
In this article, we'll provide some of the most helpful resources for getting started or becoming highly productive with Jakarta EE.
March 5, 2023
by Ondro Mihalyi
· 4,308 Views · 1 Like
Getting Started With Prometheus Workshop: Introduction to Prometheus
Interested in open-source observability but lack the knowledge to just dive right in? If so, learn about a workshop that is designed for you!
March 3, 2023
by Eric D. Schabell CORE
· 3,800 Views · 2 Likes
