Observability and Performance
The dawn of observability across the software ecosystem has fully disrupted standard performance monitoring and management. Enhancing these approaches with sophisticated, data-driven, and automated insights allows your organization to better identify anomalies and incidents across applications and wider systems. While monitoring and standard performance practices are still necessary, they now serve to complement organizations' comprehensive observability strategies. This year's Observability and Performance Trend Report moves beyond metrics, logs, and traces — we dive into essential topics around full-stack observability, like security considerations, AIOps, the future of hybrid and cloud-native observability, and much more.
A few years ago, at my previous company, I found myself on a familiar quest: hunting down a specific Jira issue. What I discovered was both amusing and alarming — three versions of the same problem statement, each with different solutions spaced four to six months apart. Every solution was valid in its context, but the older ones had become obsolete.

This scenario is all too common in the software development world. New ideas constantly emerge, priorities shift, and tasks often get put on hold. As a result, the same issues resurface repeatedly, leading to a chaotic backlog with multiple solutions for identical problems. This clutter makes it challenging to grasp our true roadmap and impedes our ability to achieve objectives.

Digging deeper, we found hundreds of issues languishing in our backlog. In a rather humorous yet sobering realization, our back-of-the-envelope calculations suggested it would take over eight years to address all those features. From this experience, we concluded that any issue left unresolved for over six months should be deemed outdated and permanently deleted. If a problem remains unsolved for that long, it likely needs to be revisited and addressed from scratch.

Based on these lessons, I'd like to share some best practices to help you manage your development process more effectively.

Issues in Backlog: No Longer Than 4 Months

Trello recommends regular backlog grooming to keep issues relevant and actionable.

Why It's Important

Issues that linger in the backlog for more than four months often lose context and priority. Regularly grooming the backlog ensures that the team focuses on the most valuable tasks and avoids unnecessary clutter. You can read more about how Jira recommends backlog grooming here.

When the team is aware of the age of a Jira issue, it encourages them to be more mindful of prioritization and to spend their time more effectively. Regular reviews and a four-month limit on backlog items help maintain a clear, actionable roadmap and ensure that outdated tasks are addressed promptly. Techniques such as weighted scoring or the MoSCoW method (Must have, Should have, Could have, Won't have) can be useful in prioritizing tasks effectively.

Backlog ages like milk, not like wine.

Branches: No Longer Than a Month

GitHub recommends regularly merging branches to avoid merge conflicts.

Engineering leader: "Jamie, what's the status of this client request?"
Developer: "It's all done, currently in QA. It should go out in a day or so."
A week later:
Engineering leader: "Jamie, did we push this feature to the client?"
Developer: "Sorry, it's waiting for another feature due to some merge conflicts."

If you've ever experienced this, you know how frustrating it can be. Regularly merging branches can prevent these delays by ensuring that code conflicts are addressed early, maintaining code quality, and keeping the development process smooth. Keeping branches open for more than a month can lead to significant merge conflicts and integration issues. Regularly merging branches helps maintain code quality, reduces technical debt, and ensures that the team is working on the latest version of the code.

Best Practices for Managing Branches

1. Frequent Integration
Integrate changes at least once a week, if not more frequently, to ensure your branch doesn't diverge significantly from the main codebase and avoid merge conflicts.

2. Small, Incremental Changes
Make small, incremental changes rather than large, sweeping updates.
This makes it easier to integrate and review code, reduces the risk of conflicts, and speeds up and improves the review process.

Epics: No Longer Than a Quarter

Atlassian recommends breaking down epics so that they can be completed within a quarter.

Higher up in the hierarchy, engineering leaders rely heavily on epics to understand where the team's efforts are being invested. Often, I've noticed that some epics, like those for technical debt or enhancements, end up containing hundreds of issues. These catch-all epics bloat badly because engineers are forced to associate every issue with an epic. As a result, it becomes difficult to distinguish between efforts spent on roadmap items versus technical debt or KTLO (Keep The Lights On) tasks. This leads to epics dragging on for years, making tracking progress difficult.

Long epics can become unwieldy and difficult to manage. Teams can maintain momentum and deliver incremental value by ensuring that epics are completed within a quarter. This practice also facilitates better planning and tracking, ensuring that large projects are broken down into manageable parts that can be tackled effectively. Read more about this in Atlassian's guide.

Best Practices for Managing Epics

1. Define Clear Boundaries
Ensure each epic has a well-defined scope and objective. Avoid using catch-all epics by creating specific epics for distinct tasks.

2. Regular Review and Pruning
Regularly review and break down large epics. Move completed tasks out and create new epics for ongoing work to keep the list manageable. For example, you can have a tech debt epic for every quarter.

3. Prioritize and Categorize
Clearly categorize epics based on their purposes, such as roadmap items, technical debt, or KTLO. This helps in tracking where the team's efforts are being invested.

4. Limit Epic Duration
Set a time limit for how long an epic can remain open. This ensures that long-term tasks are broken down into achievable milestones, facilitating better progress tracking.

By managing epics effectively, engineering leaders can gain better insights into the team's workload, ensure that efforts are aligned with strategic goals, and reduce the risk of bloated, unmanageable epics.

Tickets: No Longer Than a Sprint

Scrum recommends that user stories should be completable within a sprint, usually 2-4 weeks.

One of the challenges in agile development for both developers and managers is dealing with issues that spill over from sprint to sprint. In sprint 1, 20% of the work is done; in sprint 2, 30% is completed, and so on, but some issues always get carried over. The story points for these issues keep changing, setting wrong expectations for product managers and stakeholders. This can be demotivating for developers as it feels like progress is being stalled when, in reality, the ticket is simply too large for a single sprint. Instead, these large tickets should be treated as epics, broken down into multiple issues, and spread across sprints to set the right expectations.

Keeping tickets manageable within a sprint ensures that tasks are bite-sized and achievable, leading to more predictable progress and faster delivery cycles. This practice also helps maintain team morale and clear focus. Ideally, each ticket or user story should deliver value to the end user and be independently completed within a given sprint.

Implementing these time-based best practices can significantly enhance your software development process, ensuring that projects stay on track and deliver value consistently.
By keeping tasks and initiatives timely, you can maintain focus, reduce waste, and drive continuous improvement.
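To make the four-month backlog rule actionable, here is a small, hypothetical Python sketch that uses the Jira Cloud REST API to surface backlog issues that have gone untouched for roughly four months. The search endpoint and JQL syntax are standard Jira features, but the project key, environment variable names, and the choice of the updated field are assumptions you would adapt to your own setup.

Python
import os
import requests

JIRA_BASE_URL = "https://your-domain.atlassian.net"  # assumption: Jira Cloud
AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"])  # hypothetical env vars

# JQL: open issues in project ABC not updated in the last 16 weeks (~4 months)
JQL = 'project = ABC AND statusCategory != Done AND updated <= "-16w" ORDER BY updated ASC'

def find_stale_issues(jql: str = JQL, page_size: int = 50):
    """Yield (key, summary, last-updated) for issues matching the staleness JQL."""
    start_at = 0
    while True:
        resp = requests.get(
            f"{JIRA_BASE_URL}/rest/api/2/search",
            params={"jql": jql, "startAt": start_at, "maxResults": page_size,
                    "fields": "summary,updated"},
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        for issue in data["issues"]:
            yield issue["key"], issue["fields"]["summary"], issue["fields"]["updated"]
        start_at += page_size
        if start_at >= data["total"]:
            break

if __name__ == "__main__":
    for key, summary, updated in find_stale_issues():
        print(f"{key}  last updated {updated}  {summary}")

Whether stale issues are closed automatically or simply surfaced for the next grooming session is a team decision; the sketch only reports them.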
With the advent of open-source software and the acceptance of these solutions in creating complex systems, the ability to develop applications that can run seamlessly across multiple hardware platforms becomes inherently important. There is a constant need to develop the software on one architecture but have the capability to execute it on other target architectures. One common technique to achieve this is cross-compilation of the application for the target architecture.

Cross-compilation is significant in embedded systems where the intent is to run applications on specialized hardware like ARM and PowerPC boards. These systems are resource-constrained, and hence a direct compilation is not an option. Thus, developers leverage the common x86 architecture as a host and use toolchains specific to the target hardware, generating binaries compatible with the target hardware.

This article covers one such case study where cross-compilation of an open-source package was done for PowerPC. The article will cover the details of the tools and toolchains leveraged and a step-by-step tutorial on how cross-compilation was achieved for this architecture.

The Problem Statement

Given a target board with PowerPC architecture, the intent was to add L3 routing capabilities to this board. For this purpose, a popular open-source routing protocol suite, FRRouting (FRR), was considered. FRR is a protocol suite that enables any Linux machine to behave like a full-fledged router. It is packaged for amd64, arm64, armhf, and i386 but not for PowerPC. This necessitated FRR's cross-compilation.

                     Build host       Target host
CPU Arch             x86_64           PowerPC (32-bit)
Operating System     Ubuntu 18.04     QorIQ SDK
CPU                  12               2 (e5500)
RAM                  12GB             1GB

Table 1. Difference in build and target platform

The Cross-Compilation Journey

There are two major stages in cross-compilation: configuring the build environment and pre-compiled toolchain, and resolving the dependencies in that toolchain.

Configuring the Build Environment and Pre-Compiled Toolchain

1. Install the required build tools in the environment. Common build tools include autoconf, make, cmake, build-essentials, pkg-config, libtool, etc.

2. Set up the pre-compiled toolchain specific to the target host environment. CPU/board vendors provide their own architecture-specific toolchains. The target board-specific toolchain was obtained from the vendor's product website.

3. The toolchain comes with an environment file, which is used to set the environment variables like CC, GCC, PKG_CONFIG_PATH, etc., that are required for cross-compilation. Edit the environment file /opt/fsl-qoriq/2.0/environment-setup-ppce5500-fsl-linux and update the path of the variables with respect to the path of the toolchain directory.
Shell
export SDKTARGETSYSROOT=/opt/fsl-qoriq/2.0/sysroots/ppce5500-fsl-linux
export PATH=/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/../x86_64-fslsdk-linux/bin:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux-uclibc:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux-musl:$PATH
export CCACHE_PATH=/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/../x86_64-fslsdk-linux/bin:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux-uclibc:/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr/bin/powerpc-fsl-linux-musl:$CCACHE_PATH
export PKG_CONFIG_SYSROOT_DIR=$SDKTARGETSYSROOT
export PKG_CONFIG_PATH=$SDKTARGETSYSROOT/usr/lib/pkgconfig
export CONFIG_SITE=/opt/fsl-qoriq/2.0/site-config-ppce5500-fsl-linux
export PYTHONHOME=/opt/fsl-qoriq/2.0/sysroots/x86_64-fslsdk-linux/usr
unset command_not_found_handle
export CC="powerpc-fsl-linux-gcc -m32 -mhard-float -mcpu=e5500 --sysroot=$SDKTARGETSYSROOT"
export CXX="powerpc-fsl-linux-g++ -m32 -mhard-float -mcpu=e5500 --sysroot=$SDKTARGETSYSROOT"
export CPP="powerpc-fsl-linux-gcc -E -m32 -mhard-float -mcpu=e5500 --sysroot=$SDKTARGETSYSROOT"
export AS="powerpc-fsl-linux-as"
export LD="powerpc-fsl-linux-ld --sysroot=$SDKTARGETSYSROOT"
export GDB=powerpc-fsl-linux-gdb
export STRIP=powerpc-fsl-linux-strip
export RANLIB=powerpc-fsl-linux-ranlib
export OBJCOPY=powerpc-fsl-linux-objcopy
export OBJDUMP=powerpc-fsl-linux-objdump
export AR=powerpc-fsl-linux-ar
export NM=powerpc-fsl-linux-nm
export M4=m4
export TARGET_PREFIX=powerpc-fsl-linux-
export CONFIGURE_FLAGS="--target=powerpc-fsl-linux --host=powerpc-fsl-linux --build=x86_64-linux --with-libtool-sysroot=$SDKTARGETSYSROOT"
export CFLAGS=" -O2 -pipe -g -feliminate-unused-debug-types"
export CXXFLAGS=" -O2 -pipe -g -feliminate-unused-debug-types"
export LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed"
export CPPFLAGS=""
export KCFLAGS="--sysroot=$SDKTARGETSYSROOT"
export OECORE_DISTRO_VERSION="2.0"
export OECORE_SDK_VERSION="2.0"
export ARCH=powerpc
export CROSS_COMPILE=powerpc-fsl-linux-

Resolving Dependencies in the Pre-Compiled Toolchain

Each software package has its own set of dependencies (tools/libraries), which need to be resolved before the cross-compilation. Depending on the package availability of these tools/libraries for the target architecture, there are two options to resolve them: either install them directly from available packages or cross-compile them as well from the source. Specific to the FRR compilation, the libpcre2, libyang, clippy, libelf, and json-c libraries need to be built from the source. Aside from these, protobuf and libcap library packages were available for the PPC (PowerPC) arch, which can be installed directly into the toolchain.
1. Installing Libraries from Packages

Library packages available for the target architecture can be installed into the toolchain sysroot in two ways. The first way uses the Ubuntu/Debian-based dpkg-deb tool for directly installing Debian packages, as shown below:

Shell
$ dpkg-deb -x <pkg_name>.deb <toolchain_directory_path>

#Example of libcap:
$ wget http://launchpadlibrarian.net/222364908/libcap-dev_2.24-12_powerpc.deb
$ dpkg-deb -x libcap-dev_2.24-12_powerpc.deb /opt/fsl-qoriq/2.0/sysroots/ppce5500-fsl-linux/

Note: Download all dependency packages and install them in order. Library packages may get installed to a different directory structure. Copy those library files to the correct directories as per the toolchain.

In the second way, Debian/RPM packages are extracted and manually placed into the toolchain directory path, as shown below. For extracting a Debian package, use the ar and tar tools:

Plain Text
$ ar -x <package>.deb
$ tar -xJf data.tar.xz

Note: This method is useful for systems without dpkg-deb support. For RPM package extraction, use the rpm2cpio tool, as in the command below:

Plain Text
$ rpm2cpio <package>.rpm | cpio -idmv

Package extraction and file placement with the libcap example:

Shell
# Extract .deb package
$ ar -x libcap-dev_2.24-12_powerpc.deb

# Three files will be extracted (control.tar.gz, data.tar.xz, debian-binary)
$ ls
control.tar.gz data.tar.xz debian-binary libcap-dev_2.24-12_powerpc.deb

# data.tar.xz has the package's program files
# control.tar.gz has the metadata of the package
# debian-binary has the version of the deb file format

# Untar data.tar.xz to extract the package's program files.
$ tar -xJf data.tar.xz

# NOTE: rpm2cpio <package>.rpm will directly extract program files.
# Do the same for all dependent Debian or RPM packages at the same path; this will extract all program files and symlinks required in the same directory structure.

# Extracted files under the usr/lib, usr/include, and usr/bin directories should be copied to toolchain_directory_path's /usr directory, alongside the existing files already present.
$ cp usr/include/sys/capability.h /opt/fsl-qoriq/2.0/sysroots/ppce5500-fsl-linux/usr/include/sys/

To verify if the packages/libraries got installed successfully, run the command below:

Shell
# Make sure to export the PKG_CONFIG_PATH variable from the toolchain's env file
$ pkg-config --list-all | grep <package_library_name>

Note: Install pkg-config if not already installed.

2. Cross-Compiling Libraries

Library packages that are not available for the target arch are compiled from the source. Before starting compilation, load the environment file packaged with the toolchain to set all the necessary parameters required for cross-compilation:

Shell
$ source <env_file_path>

Follow the compilation steps given in the library's README file. Additionally, set the below parameters in the build procedure steps. Set the --host parameter when running the ./configure script:

Shell
$ ./configure --host=<target_host_parameter>
# Example:
$ ./configure --host=powerpc-fsl-linux

Note: <target_host_parameter> is the system for which the library/tool is being built. It can be found in the environment file of the toolchain; it is a common prefix found in $CC, $LD, etc. There will be two types of dependent libraries: one is only required for the compilation process, and the other is a dependency required for execution on the target host. Set the --host parameter accordingly. When using "make install" for building dependent libraries, set DESTDIR to the toolchain sysroot directory.
Shell
$ make DESTDIR=<toolchain_directory_path> install

# Example:
$ make DESTDIR=/opt/fsl-qoriq/2.0/sysroots/ppce5500-fsl-linux/ install

Conclusion

The differences in system architectures, libraries, dependencies, and toolchains make cross-compilation a complex technique to execute. To ease these complexities, this article uncovered the phases of cross-compilation. FRR and PowerPC were taken as examples for the desired software and the target hardware, respectively. However, the steps covered in this article provide a reasonable strategy for the cross-compilation of any software for specific hardware. The environment settings, the toolchain, and the dependencies will vary based on the requirements.
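One practical way to confirm that a cross-compiled artifact really targets the intended architecture is to inspect its ELF header (equivalent to what the file command reports). The following is a minimal, illustrative Python sketch; the binary name passed on the command line is a placeholder and the constants come from the ELF specification.

Python
import struct
import sys

# e_machine values from the ELF specification
EM_PPC, EM_PPC64, EM_X86_64 = 20, 21, 62

def elf_machine(path: str) -> int:
    """Return the e_machine field of an ELF file (raises if the file is not ELF)."""
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # Byte 5 of e_ident gives the endianness of the remaining header fields.
    endian = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack(endian + "H", header[18:20])
    return machine

if __name__ == "__main__":
    binary = sys.argv[1] if len(sys.argv) > 1 else "zebra"  # placeholder binary name
    machine = elf_machine(binary)
    names = {EM_PPC: "PowerPC (32-bit)", EM_PPC64: "PowerPC64", EM_X86_64: "x86_64"}
    print(f"{binary}: e_machine={machine} ({names.get(machine, 'other')})")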
Data analytics teams often need to do long-term trend analysis to study patterns over time. Some of the common analyses are WoW (week over week), MoM (month over month), and YoY (year over year). This would usually require data to be stored across multiple years. However, this takes up a lot of storage, and querying across years' worth of partitions is inefficient and expensive. On top of this, if we have to do user attribute cuts, it will be more cumbersome. To overcome this issue, we can implement an efficient solution using datelists.

What Are Datelists?

The most commonly used data formats in Hive are the simple types like int, bigint, varchar, and boolean. However, there are other complex types like Array<int>, Array<boolean>, and map<varchar, varchar>, which give us more flexibility in terms of what we can achieve.

To create a datelist, we simply store the metric values from different date partitions in each index position of an array, with a start_date column to indicate the date corresponding to index 0 in the array. For example, a datelist array could look like [5, 3, 4] with a start_date column value of, say, 10/1, which means the first value, 5, in index 0 corresponds to the metric value that was recorded on 10/1, and so on.

If you look at the table below, you will see how traditional systems store data, where each row corresponds to each transaction that occurred. This causes redundancy, which can be avoided by transforming this data into a datelist format.

Traditional Data Storage

Datelist Data Storage

As you must have noticed, the number of rows has significantly reduced as there is no redundancy, i.e., each date partition holds only one row per user. This is because we have aggregated all the different metric values corresponding to a user into a single array.

Designing a Datelist

Designing a datelist involves joining the metric values of a user from today's source table with yesterday's target table and storing the corresponding results again into the target table, but into today's partition value. If it's a new user who has not yet been active, then we will create an empty array with all zeroes whose length would be the difference between start_date and today. Then, we would append today's metric value to this newly created array. If the user already exists in yesterday's partition, we simply append it to the already existing array. For example, if start_date is 10/1, and a user first appears on 10/3, their array would be initialized as [0, 0], and the value for 10/3 would be appended, resulting in [0, 0, 7].

Sequence of Events

From 10/1 to 10/4, the datelist grows as follows:

On 10/1:

On 10/2:

Each day, as the data pipeline runs, the array would keep growing in length indefinitely. You can put some limits on how long you want the array to be; i.e., if you are only interested in doing WoW analysis for the last 6 months, the array can be trimmed to fit those needs, and the start_date value updated accordingly every time the job runs. But this is not really necessary, as arrays are generally very efficient, so even a long array shouldn't cause any performance issues.
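Before looking at the SQL, here is a small, illustrative Python sketch of the same append logic, purely to make the mechanics concrete; the real pipeline would do this in SQL as shown next, and the dates mirror the example above.

Python
from datetime import date

START_DATE = date(2024, 10, 1)  # the partition that maps to index 0

def append_to_datelist(datelist, today, metric_value, start_date=START_DATE):
    """Return a new datelist with metric_value appended at today's index.

    Missing days (user inactive or brand new) are padded with zeros so the
    array index always equals (day - start_date).
    """
    target_index = (today - start_date).days
    padded = datelist + [0] * (target_index - len(datelist))  # pad gaps with 0
    return padded + [metric_value]

# A user first seen on 10/3 with a metric value of 7:
datelist = append_to_datelist([], date(2024, 10, 3), 7)
print(datelist)  # [0, 0, 7]

# The same user active again on 10/4 with a value of 2:
datelist = append_to_datelist(datelist, date(2024, 10, 4), 2)
print(datelist)  # [0, 0, 7, 2]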
Here is a simple example of how to set up the SQL query to create a datelist in Presto:

SQL
WITH today_source AS (
    SELECT * FROM (
        VALUES
            ('2024-10-02', 123, 10),
            ('2024-10-02', 234, 45)
    ) AS nodes (ds, userid, time_spent)
),
yest_target AS (
    SELECT * FROM (
        VALUES
            ('2024-10-01', 123, ARRAY[4])
    ) AS nodes (ds, userid, time_spent)
)
SELECT
    '2024-10-02' AS dateid,
    userid,
    COALESCE(y.time_spent, ARRAY[0]) || t.time_spent AS time_spent_datelist
FROM today_source t
FULL OUTER JOIN yest_target y USING (userid)

Which would yield an output like this:

Querying the Datelist Table to Calculate Ln (n = 1, 7, 28, …) Metrics

SQL
SELECT
    id,
    ARRAY_SUM(SLICE(metric_values, -1, 1)) AS L1,
    ARRAY_SUM(SLICE(metric_values, -7, 7)) AS L7,
    ARRAY_SUM(SLICE(metric_values, -28, 28)) AS L28,
    *
FROM dim_table_a
WHERE ds = '<LATEST_DS>'

In the above example, we look at a sample SQL query where we can easily calculate the L1, L7, and L28 results of a metric by simply querying the latest partition from the table, using a slice to get the subset of the array that we need, and summing it. This helps in reducing the retention of a table to as little as 7 days of partitions while still being able to do analysis that spans across years.

Benefits

Storage savings: We get considerable storage savings as we don't have to store partitions beyond 7-10 days, since all the data we need is compressed into the array, which can span across years, and we store only one row per user per date partition.

Long-term trend analysis: Simpler queries, as we just fetch the data from the latest partition and sum the subset of values needed from the datelist array for long-term analysis.

Privacy compliance: If we need to delete a user record (for example, they deactivated their account, so we can't store/use their data anymore), then we just have to delete it from a few partitions instead of having to clean it up across various partitions, especially if it's a tenured user.

Fast processing and reduced compute: The time complexity would be O(n), and storage would be 'retention value' * O(n), where n is the number of users active on the app.

Conclusion

Datelists are a valuable tool that every data engineer can take advantage of. They are easy to implement and maintain, and the benefits we get out of them are vast. However, we need to be cautious about backfills, and we need to build a framework that can properly update the right index values in the array. Once this is tested and implemented, we can simply reuse it whenever backfills are required.
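As an illustration of the backfill caveat above, the core of such a framework is just index arithmetic. Here is a hypothetical Python sketch (the function name and arguments are made up for this example) that overwrites the value for one historical day in a datelist.

Python
from datetime import date

def backfill_datelist(datelist, start_date, backfill_date, corrected_value):
    """Overwrite the datelist entry for backfill_date with corrected_value.

    Assumes the datelist already covers backfill_date; raises otherwise so the
    pipeline can fall back to a full rebuild for that user.
    """
    index = (backfill_date - start_date).days
    if not 0 <= index < len(datelist):
        raise ValueError("backfill date is outside the datelist range")
    patched = list(datelist)  # avoid mutating the input row in place
    patched[index] = corrected_value
    return patched

# Correct the 10/3 value for the user from the earlier example:
print(backfill_datelist([0, 0, 7, 2], date(2024, 10, 1), date(2024, 10, 3), 9))
# [0, 0, 9, 2]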
Threading is a fundamental concept in modern programming that allows applications to perform multiple operations concurrently. Rust, with its focus on memory safety and zero-cost abstractions, provides powerful tools for handling concurrent operations. In this article, we'll explore how threading works in Rust through practical examples.

Introduction to Threading in Rust

Rust's threading model is designed with safety in mind. The language's ownership and type systems help prevent common concurrent programming mistakes like data races at compile time. This approach makes concurrent programming more reliable and easier to reason about.

Single-Threaded Execution: A Starting Point

Let's begin with a simple example of sequential execution. Consider this code snippet from a command-line application:

Rust
use std::thread;
use std::time::Duration;

fn count_slowly(counting_number: i32) {
    let handle = thread::spawn(move || {
        for i in 0..counting_number {
            println!("Counting slowly {i}!");
            thread::sleep(Duration::from_millis(500));
        }
    });

    if let Err(e) = handle.join() {
        println!("Error while counting: {:?}", e);
    }
}

This code creates a single thread that counts up to a specified number, with a delay between each count. The thread::spawn function creates a new thread and returns a JoinHandle. The move keyword is crucial here as it transfers ownership of any captured variables to the new thread. The handle.join() call ensures our main thread waits for the spawned thread to complete before proceeding. This is important for preventing our program from terminating before the counting is finished.

Parallel Execution: Leveraging Multiple Threads

Now, let's examine a more sophisticated approach using parallel execution:

Rust
use futures::future::join_all;
use std::time::Duration;
use tokio::time::sleep;

async fn count_fast(counting_number: i32) {
    let mut counting_tasks = vec![];

    for i in 0..counting_number {
        counting_tasks.push(async move {
            println!("Counting in parallel: {i}");
            sleep(Duration::from_millis(500)).await;
        });
    }

    join_all(counting_tasks).await;
    println!("Parallel counting complete!");
}

This implementation demonstrates a different approach using async/await and parallel task execution. Instead of using a single thread, we create multiple asynchronous tasks that can run concurrently. The join_all function from the futures crate allows us to wait for all tasks to complete before proceeding.

Understanding the Key Differences

The key distinction between these approaches lies in how they handle concurrent operations. The first example (count_slowly) uses a traditional threading model where a single thread executes sequentially. This is suitable for operations that need to maintain order or when you want to limit resource usage. The second example (count_fast) leverages Rust's async/await syntax and the tokio runtime to handle multiple tasks concurrently. This approach is more efficient for I/O-bound operations or when you need to perform many similar operations in parallel.

Thread Safety and Rust's Guarantees

One of Rust's strongest features is its compile-time guarantees around thread safety. The ownership system ensures that data can only be mutated in one place at a time, preventing data races. For example, in our count_fast implementation, the move keyword ensures each task owns its copy of the loop variable i, preventing any potential data races.

Best Practices and Considerations

When implementing threading in Rust, consider these important factors:

Thread creation is relatively expensive, so spawning thousands of threads isn't always the best solution.
The async approach (count_fast) is often more scalable as it uses a thread pool under the hood.

Error handling is crucial in threaded applications. Notice how our count_slowly implementation properly handles potential thread panics using if let Err(e) = handle.join().

Thanks to its ownership system, resource cleanup is automatic in Rust, but you should still be mindful of resource usage in long-running threads.

Conclusion

Threading in Rust provides a powerful way to handle concurrent operations while maintaining safety and performance. Through the examples of count_slowly and count_fast, we've seen how Rust offers different approaches to concurrency, each with its use cases and benefits. Whether you choose traditional threading or async/await depends on your specific requirements for ordering, resource usage, and scalability.

By understanding these concepts and following Rust's safety guidelines, you can write concurrent code that is both efficient and reliable. The combination of Rust's ownership system and threading capabilities makes it an excellent choice for building high-performance, concurrent applications.
Google Maps is probably the first thing that comes to mind when considering a routing and distance calculation solution. However, its pricing may discourage its use in open-source projects or projects with severe budget constraints. This article will present two alternatives encapsulated by a free library known as router4j.

Geospatial APIs

As stated by its developer, Ryan McCaffery, the Geospatial API, or simply geo.dev, is:

a prototype to experiment with Geospatial Data from OpenStreetMap and explore Cloudflare Serverless Services.

The API offers three endpoints, as detailed in its documentation. For the scope of this article, only two are presented:

Text Search: Search the world with any text.
Distance: Calculate the distance in a straight line or, as indicated in the documentation, as the crow flies.

Pros

The project does not require any API key to use the endpoints.
There is no restriction on the number of requests.
Low global latency for requests and responses.

Cons

As assumed by its developer, the project is a prototype, so there is no guarantee of API uptime.
There is no route distance calculation — the most common distance used by applications that need to deal with the path length between two geolocated points.
The Text Search endpoint is more imprecise than other APIs. As an example, searching for "Curitiba, Paraná," the smartest city in the world in the year 2023, the API returns seven records. Other APIs hit the nail on the head — just one record.

Open Route Service

The Open Route Service (ORS) is a project maintained by the Heidelberg Institute for Geoinformation Technology. According to the official documentation, the API:

consumes user-generated and collaboratively collected free geographic data, directly from OpenStreetMap.

The project is open source and freely available for all to download and contribute to on GitHub. The API is made up of nine endpoints, all of which are well-documented. Using the API requires registration to obtain a private key. Users can control the API usage through the dashboards available after login. An example is the "Token quota" table, which indicates the number of requests consumed and the number left. The quota is renewed every 24 hours.

Token quota

This article will focus only on endpoints analogous to those provided by the Geospatial API:

Geocode Search Structured: Returns a formatted list of objects corresponding to the search input.
Matrix: Returns a duration or routing distance matrix for multiple source and destination points.

Pros

It is a well-supported and stable project which already serves relevant clients.
The free token quota is suitable for small and medium-sized projects. Broader limits may be granted to humanitarian, academic, government, or non-profit organizations.

Cons

The need to obtain an API token to consume the endpoints.
The token quota is renewed only after 24 hours.

Router4j

Router4j is an open-source project that creates an abstraction layer over APIs focused on calculating routes and distances. Its first version includes only the Geospatial and ORS APIs. To use the library in a Java project, simply add the Maven dependency to the pom.xml file.
XML
<dependency>
    <groupId>io.github.tnas</groupId>
    <artifactId>router4j</artifactId>
    <version>1.0.0</version>
</dependency>

The RouterApi main interface of router4j provides four methods:

Java
public interface RouterApi {

    Distance getRoadDistance(Point from, Point to, String apiKey);

    Locality getLocality(String name, String region, String apiKey);

    Locality getLocality(String name, String region, String country, String apiKey);

    ApiQuota getApiQuota();
}

The next code snippet describes how to look up the geographic coordinates of a place — "Curitiba, Paraná." The code will use the Geospatial API under the hood, as indicated by the first command that instantiates the GeoDevRouterApi class. Note that no API key is passed to the method getLocality, as the underlying API does not require a private token.

Java
RouterApi geoDevRouterApi = new GeoDevRouterApi();
Locality locality = geoDevRouterApi.getLocality("Curitiba", "Paraná", null);

assertEquals(7, locality.getLocations().length);

var location = Stream.of(locality.getLocations())
    .filter(l -> l.getName().equals("Curitiba"))
    .findFirst()
    .orElse(null);

assertNotNull(location);
assertEquals(-49.28433, location.getPoint().getLongitude());
assertEquals(-25.49509, location.getPoint().getLatitude());
assertEquals("Curitiba", location.getName());
assertEquals("South Region", location.getRegion());

The code to calculate the distance is analogous to the above. But, in this case, as the ORS is the subjacent API, it is mandatory to pass the token (API key).

Java
String apiKey = "ORS_API_TOKEN";
RouterApi orsRouterApi = new OrsRouterApi();

Point from = PointBuilder.newBuilder().apiType(ApiType.ORS)
    .longitude(-49.279708).latitude(-25.46005)
    .build();

Point to = PointBuilder.newBuilder().apiType(ApiType.ORS)
    .longitude(-50.311719).latitude(-23.302293)
    .build();

Distance distance = orsRouterApi.getRoadDistance(from, to, apiKey);

assertEquals(382.56, distance.getValue());
assertEquals(-25.46005, distance.getFrom().getLatitude());
assertEquals(-49.279708, distance.getFrom().getLongitude());
assertEquals(-23.302293, distance.getTo().getLatitude());
assertEquals(-50.311719, distance.getTo().getLongitude());
assertEquals(Metric.KM, distance.getMetric());

Conclusion

A very useful feature of the Google Maps API is the route distance calculation. There would be no downside to integrating it into applications (web and mobile) if it weren't for its pricing policy. In view of this, router4j arises as a very simple alternative to just one of the many features of Google Maps. As the project is in its early stages, only two underlying APIs are covered by the proposed abstraction layer. Despite this, the library can be a good option for projects that can use the ORS endpoints within the limits defined by the free quota.
Traditional internal developer platforms (IDPs) have transformed how organizations manage code and infrastructure. By standardizing workflows through tools like CI/CD pipelines and Infrastructure as Code (IaC), these platforms have enabled rapid deployments, reduced manual errors, and improved developer experience. However, their focus has primarily been on operational efficiency, often treating data as an afterthought. This omission becomes critical in today's AI-driven landscape.

While traditional IDPs excel at managing infrastructure, they fall short when it comes to the foundational elements required for scalable and compliant AI innovation:

Governance: Ensuring data complies with policies and regulatory standards is often a manual or siloed effort.
Traceability: Tracking data lineage and transformations across workflows is inconsistent, if not entirely missing.
Quality: Validating data for reliability and AI readiness lacks automation and standardization.

To meet these challenges, data must be elevated to a first-class citizen within the IDP. A data-first IDP goes beyond IaC, directly embedding governance, traceability, quality, and Policy as Code (PaC) into the platform's core. This approach transforms traditional automation into a comprehensive framework that operationalizes data workflows alongside infrastructure, enabling Data Products as Code (DPaC). This architecture supports frameworks like the Open Data Product Specification (ODPS) and the Open Data Contract (ODC), which standardize how data products are defined and consumed. While resource identifiers (RIDs) are critical in enabling traceability and interoperability, the heart of the data-first IDP lies in meta-metadata, which provides the structure, rules, and context necessary for scalable and compliant data ecosystems.

The Data-First Approach: Extending Automation

Templates and recipes are critical technologies that enable the IDP to achieve a high level of abstraction and componentize the system landscape. A recipe is a parameterized configuration (IaC) that defines how specific resources or workloads are provisioned, deployed, or managed within the platform. Recipes are customized and reusable to fit particular contexts or environments, ensuring standardization while allowing flexibility for specific use cases. A template is a group of recipes forming a "Golden Path" for developers. For an architectural design pattern, such as a data ingestion pattern for Streaming, API, or File, the template creates a manifest, which is built, validated, and executed in the delivery plane.

A data-first IDP adds the "Data Product" specification as a component, a resource, and, therefore, a recipe to the IDP; this could be a parameterized version of the ODPS and ODC. The lifecycle and management of software are far more mature than that of data. The concept of DPaC goes a long way toward changing this; it aligns the maturity of data management with the well-established principles of software engineering. DPaC transforms data management by treating data as a programmable, enforceable asset, aligning its lifecycle with proven software development practices. By bridging the maturity gap between data and software, DPaC empowers organizations to scale data-driven operations with confidence, governance, and agility. As IaC revolutionized infrastructure, DPaC is poised to redefine how we manage and trust our data.
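To make the recipe and DPaC ideas concrete, here is a deliberately simplified Python sketch of what a data-product recipe and its guardrail check might look like. The field names and policy rules are illustrative assumptions only and are not taken from the actual ODPS or ODC schemas.

Python
# Hypothetical "Data Product as Code" recipe: a parameterized, versioned definition
# that the platform could validate before provisioning anything.
recipe = {
    "rid": "rid:customer-transactions:data-product:erp-a:v1.0",  # illustrative RID
    "owner": "payments-domain-team",
    "classification": "confidential",
    "retention_days": 365,
    "inputs": ["rid:customer-transactions:storage:s3-bucket-a:v1.0"],
    "quality_checks": ["not_null(customer_id)", "freshness < 24h"],
}

# Minimal guardrail: the centrally defined fields every recipe must declare.
REQUIRED_FIELDS = {"rid", "owner", "classification", "retention_days", "quality_checks"}

def validate_recipe(candidate: dict) -> list:
    """Return a list of policy violations; an empty list means the recipe passes."""
    violations = [f"missing field: {f}" for f in REQUIRED_FIELDS - candidate.keys()]
    if candidate.get("classification") == "confidential" and candidate.get("retention_days", 0) > 730:
        violations.append("confidential data may not be retained beyond 730 days")
    return violations

print(validate_recipe(recipe))  # [] -> the recipe satisfies the baseline policies

In a real platform, checks like these would be expressed as Policy as Code and enforced by the pipeline rather than by hand-written scripts; the sketch only shows the shape of the idea.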
The Data Marketplace, discussed in the previous article, is a component, a resource, and a recipe, which may rely on other services such as observability, a data quality service, and a graph database, which are also components and part of the CI/CD pipeline.

Governance and Engineering Baseline

Governance and engineering baselines can be codified into policies that are managed, versioned, and enforced programmatically through PaC. By embedding governance rules and engineering standards into machine-readable formats (e.g., YAML, JSON, Rego), compliance is automated and consistency is maintained across resources.

Governance policies: Governance rules define compliance requirements, access controls, data masking, retention policies, and more. These ensure that organizational and regulatory standards are consistently applied.
Engineering baselines: Baselines establish the minimum technical standards for infrastructure, applications, and data workflows, such as resource configurations, pipeline validation steps, and security protocols.

The Role of RIDs

While meta-metadata drives the data-first IDP, RIDs operationalize its principles by providing unique references for all data-related resources. RIDs ensure the architecture supports traceability, quality, and governance across the ecosystem.

Facilitating lineage: RIDs are unique references for data products, storage, and compute resources, allowing external tools to trace dependencies and transformations.
Simplifying observability: RIDs allow objects to be tracked across the landscape.

Example RID Format

rid:<context>:<resource-type>:<resource-name>:<version>

Data product RID: rid:customer-transactions:data-product:erp-a:v1.0
Storage RID: rid:customer-transactions:storage:s3-bucket-a:v1.0

Centralized Management and Federated Responsibility With Community Collaboration

A data-first IDP balances centralized management, federated responsibility, and community collaboration to create a scalable, adaptable, and compliant platform. Centralized governance provides the foundation for consistency and control, while federated responsibility empowers domain teams to innovate and take ownership of their data products. Integrating a community-driven approach results in a dynamically evolving framework to meet real-world needs, leveraging collective expertise to refine policies, templates, and recipes.

Centralized Management: A Foundation for Consistency

Centralized governance defines global standards, such as compliance, security, and quality rules, and manages critical infrastructure like unique RIDs and metadata catalogs. This layer provides the tools and frameworks that enable decentralized execution.

Standardized policies: Global policies are codified using PaC and integrated into workflows for automated enforcement.

Federated Responsibility: Shift-Left Empowerment

Responsibility and accountability are delegated to domain teams, enabling them to customize templates, define recipes, and manage data products closer to their sources.
This shift-left approach ensures compliance and quality are applied early in the lifecycle while maintaining flexibility:

Self-service workflows: Domain teams use self-service tools to configure resources, with policies applied automatically in the background.
Customization within guardrails: Teams can adapt central templates and policies to fit their context, such as extending governance rules for domain-specific requirements.
Real-time validation: Automated feedback ensures non-compliance is flagged early, reducing errors and fostering accountability.

Community Collaboration: Dynamic and Adaptive Governance

The environment encourages collaboration to evolve policies, templates, and recipes based on real-world needs and insights. This decentralized innovation layer ensures the platform remains relevant and adaptable:

Contributions and feedback: Domain teams contribute new recipes or propose policy improvements through version-controlled repositories or pull requests.
Iterative improvement: Cross-domain communities review and refine contributions, ensuring alignment with organizational goals.
Recognition and incentives: Teams are incentivized to share best practices and reusable artifacts, fostering a culture of collaboration.

Automation as the Enabler

Automation ensures that governance and standards are consistently applied across the platform, preventing deviation over time. Policies and RIDs are managed programmatically, enabling:

Compliance at scale: New policies are integrated seamlessly, validated early, and enforced without manual intervention.
Measurable outcomes

Extending the Orchestration and Adding the Governance Engine

A data-first IDP extends the orchestration engine to automate data-centric workflows and introduces a governance engine to enforce compliance and maintain standards dynamically.

Orchestration Enhancements

Policy integration: Validates governance rules (PaC) during workflows, blocking non-compliant deployments.
Resource awareness: Uses RIDs to trace and enforce lineage, quality, and compliance.
Data automation: Automates schema validation, metadata enrichment, and lineage registration.

Governance Engine

Centralized policies: Defines compliance rules as PaC and applies them automatically.
Dynamic enforcement: Monitors and remediates non-compliance, preventing drift from standards.
Real-time feedback: Provides developers with actionable insights during deployment.

Together, these engines ensure proactive compliance, scalability, and developer empowerment by embedding governance into workflows, automating traceability, and maintaining standards over time.

The Business Impact

Governance at scale: Meta-metadata and ODC ensure compliance rules are embedded and enforced across all data products.
Improved productivity: Golden paths reduce cognitive load, allowing developers to deliver faster without compromising quality or compliance.
Trust and transparency: ODPS and RIDs ensure that data products are traceable and reliable, fostering stakeholder trust.
AI-ready ecosystems: The framework enables reliable AI model training and operationalization by reducing data prep and commoditizing data with all the information that adds value and resilience to the solution.

The success of a data-first IDP hinges on meta-metadata, which provides the foundation for governance, quality, and traceability.
Supported by frameworks like ODPS and ODC and operationalized through RIDs, this architecture reduces complexity for developers while meeting the business's needs for scalable, compliant data ecosystems. The data-first IDP is ready to power the next generation of AI-driven innovation by embedding smart abstractions and modularity.
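As a small closing illustration of the RID format described above, here is a hypothetical Python sketch that parses and validates RIDs so lineage tooling can reason about them. The format follows the example in this article; the character rules in the pattern are assumptions.

Python
import re
from typing import NamedTuple

# rid:<context>:<resource-type>:<resource-name>:<version>  (format from this article)
RID_PATTERN = re.compile(
    r"^rid:(?P<context>[a-z0-9-]+):(?P<resource_type>[a-z0-9-]+):"
    r"(?P<resource_name>[a-z0-9-]+):(?P<version>v\d+\.\d+)$"
)

class Rid(NamedTuple):
    context: str
    resource_type: str
    resource_name: str
    version: str

def parse_rid(value: str) -> Rid:
    """Parse an RID string, raising ValueError if it does not match the format."""
    match = RID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a valid RID: {value}")
    return Rid(**match.groupdict())

# The two examples from the article:
for rid in ("rid:customer-transactions:data-product:erp-a:v1.0",
            "rid:customer-transactions:storage:s3-bucket-a:v1.0"):
    print(parse_rid(rid))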
When creating a new app or service, what begins as learning just one new tool can quickly turn into needing a whole set of tools and frameworks. For Python devs, jumping into HTML, CSS, and JavaScript to build a usable app can be daunting. For web devs, many Python-first backend tools work in JavaScript but are often outdated. You're left with a choice: stick with JavaScript or switch to Python for access to the latest features.

FastHTML bridges the gap between these two groups. For Python devs, it makes creating a web app straightforward — no JavaScript required! For web devs, it makes creating a Python app quick and easy, with the option to extend using JavaScript — you're not locked in.

As a web developer, I'm always looking for ways to make Python dev more accessible. So, let's see how quickly we can build and deploy a FastHTML app. I'll follow the image generation tutorial and then deploy it to Heroku. Let's go!

Intro to FastHTML

Never heard of FastHTML before? Here's how FastHTML describes itself:

FastHTML is a new next-generation web framework for fast, scalable web applications with minimal, compact code. It's designed to be:
Powerful and expressive enough to build the most advanced, interactive web apps you can imagine.
Fast and lightweight, so you can write less code and get more done.
Easy to learn and use, with a simple, intuitive syntax that makes it easy to build complex apps quickly.

FastHTML promises to enable you to generate usable, lightweight apps quickly. Too many web apps are bloated and heavy, requiring a lot of processing and bandwidth for simple tasks. Most web apps just need something simple, beautiful, and easy to use. FastHTML aims to make that task easy.

You may have heard of FastAPI, designed to make creating APIs with Python a breeze. FastHTML is inspired by FastAPI's philosophy, seeking to do the same for front-end applications.

Opinionated About Simplicity and Ease of Use

Part of the FastHTML vision is to "make it the easiest way to create quick prototypes, and also the easiest way to create scalable, powerful, rich applications." As a developer tool, FastHTML seems to be opinionated about the right things — simplicity and ease of use without limiting you in the future.

FastHTML gets you up and running quickly while also making it easy for your users. It does this by selecting key core technologies such as ASGI and HTMX. The foundations page from FastHTML introduces these technologies and gives the basics (though you don't need to know about these to get started).

Get Up and Running Quickly

The tutorials from FastHTML offer several examples of different apps, each with its own use case. I was curious about the Image Generation App tutorial and wanted to see how quickly I could get a text-to-image model into a real, working app.

The verdict? It was fast. Really fast. In less than 60 lines of code, I created a fully functioning web app where a user can type in a prompt and receive an image from the free Pollinations text-to-image model. Here's a short demo of the tutorial app:

In this tutorial app, I got a brief glimpse of the power of FastHTML. I learned how to:

Submit data through a form
Interact with external APIs
Display some loading text while waiting

What's impressive is that it only took one tiny Python file to complete this, and the final app is lightweight and looks good.
Here’s the file I ended up with: Python from fastcore.parallel import threaded from fasthtml.common import * import os, uvicorn, requests, replicate from PIL import Image app = FastHTML(hdrs=(picolink,)) # Store our generations generations = [] folder = f"gens/" os.makedirs(folder, exist_ok=True) # Main page @app.get("/") def home(): inp = Input(id="new-prompt", name="prompt", placeholder="Enter a prompt") add = Form(Group(inp, Button("Generate")), hx_post="/", target_id='gen-list', hx_swap="afterbegin") gen_list = Div(id='gen-list') return Title('Image Generation Demo'), Main(H1('Magic Image Generation'), add, gen_list, cls='container') # A pending preview keeps polling this route until we return the image preview def generation_preview(id): if os.path.exists(f"gens/{id}.png"): return Div(Img(src=f"/gens/{id}.png"), id=f'gen-{id}') else: return Div("Generating...", id=f'gen-{id}', hx_post=f"/generations/{id}", hx_trigger='every 1s', hx_swap='outerHTML') @app.post("/generations/{id}") def get(id:int): return generation_preview(id) # For images, CSS, etc. @app.get("/{fname:path}.{ext:static}") def static(fname:str, ext:str): return FileResponse(f'{fname}.{ext}') # Generation route @app.post("/") def post(prompt:str): id = len(generations) generate_and_save(prompt, id) generations.append(prompt) clear_input = Input(id="new-prompt", name="prompt", placeholder="Enter a prompt", hx_swap_oob='true') return generation_preview(id), clear_input # URL (for image generation) def get_url(prompt): return f"https://image.pollinations.ai/prompt/{prompt.replace(' ', '%20')}?model=flux&width=1024&height=1024&seed=42&nologo=true&enhance=true" @threaded def generate_and_save(prompt, id): full_url = get_url(prompt) Image.open(requests.get(full_url, stream=True).raw).save(f"{folder}/{id}.png") return True if __name__ == '__main__': uvicorn.run("app:app", host='0.0.0.0', port=int(os.getenv("PORT", default=5000))) Looking for more functionality? The tutorial continues, adding some CSS styling, user sessions, and even payment tracking with Stripe. While I didn’t go through it all the way, the potential is clear: lots of functionality and usability without a lot of boilerplate or using both Python and JavaScript. Deploy Quickly to Heroku Okay, so now that I have a pure Python app running locally, what do I need to do to deploy it? Heroku makes this easy. I added a single file called Procfile with just one line in it: Shell web: python app.py This simple text file tells Heroku how to run the app. With the Procfile in place, I can use the Heroku CLI to create and deploy my app. And it’s fast… from zero to done in less than 45 seconds. With two commands, I created my project, built it, and deployed it to Heroku. And let’s just do a quick check. Did it actually work? And it’s up for the world to see! Conclusion When I find a new tool that makes it easier and quicker to build an app, my mind starts spinning with the possibilities. If it’s that easy, then maybe next time I need to spin up something, I can do it this way and integrate it with this tool and that other thing. So much of programming is assembling the right tools for the job. FastHTML has opened the door to a whole set of Python-based applications for me, and Heroku makes it easy to get those apps off my local machine and into the world. That said, several of the foundations of FastHTML are new to me, and I look forward to understanding them more deeply as I use it more. I hope you have fun with FastHTML and Heroku! Happy coding!
GenAI Logic using ApiLogicServer has recently introduced a workflow integration using n8n.io. The tool has over 250 existing integrations, and the developer community supplies prebuilt solutions called templates (over 1,000), including AI integrations to build chatbots. GenAI Logic can build the API transaction framework from a prompt and use natural language rules (and rule suggestions) to help get the user started on a complete system.

Eventually, most systems require additional tooling to support features like email, push notifications, payment systems, or integration into corporate data stores. While ApiLogicServer is an existing API platform, writing 250 integration endpoints with all the nuances of security, transformations, logging, and monitoring — not to mention the user interface — would require a huge community effort. ApiLogicServer found the solution with n8n.io (one of many workflow engines on the market). What stands out is that n8n.io offers a community version using a native Node.js solution for local testing (npx n8n) as well as a hosted cloud version.

N8N Workflow

In n8n, you create a Webhook from the ApiLogicServer object, which creates a URL that can accept an HTTP GET, POST, PUT, or DELETE, with added basic authentication (user: admin, password: p) to test the webhook. The Convert to JSON block provides a transformation of the body (a string) into a JSON object using JavaScript. The Switch block allows routing based on different JSON payloads. The If Inserted block decides if the Employee was an insert or update (which is passed in the header). The SendGrid blocks register a SendGrid API key and format an email to send (selecting the email from the JSON using drag-n-drop). Finally, the Respond to Webhook block returns a status code of 200 to the ApiLogicServer event.

Employees, Customers, and Orders are all sent to the same Webhook

Configuration

There are two parts to the configuration. The first is the installation of the workflow engine n8n.io (either on-premise, Docker, or cloud), and then the creation of the webhook object in the workflow diagram (http://localhost:5678). This will generate a unique name and path that is passed to the ApiLogicServer project in config/config.py; in this example, a simple basic authorization (user/password).

Note: In an ApiLogicServer project's integration/n8n folder, this sample JSON file is available to import this example into your own n8n project!

Webhook Output

ApiLogicServer Logic and Webhook

The real power of this is the ability to add a business logic rule to trigger the webhook, adding some configuration information (n8n server, port, key, and path plus authorization). So the actual rule (after_flush_row_event) is called anytime an insert event occurs on an API endpoint. The actual implementation is simply a call to the Python code to post the payload (e.g., requests.post(url=n8n_webhook_url, json=payload, headers=headers)).

Configuration to call the n8n webhook in config/config.py:

Python
wh_scheme = "http"
wh_server = "localhost"  # or cloud.n8n.io...
wh_port = 5678
wh_endpoint = "webhook-test"  # from n8n Webhook URL
wh_path = "002fa0e8-f7aa-4e04-b4e3-e81aa29c6e69"  # from n8n Webhook URL
token = "YWRtaW46cA=="  # base64 encode of user/password admin:p
N8N_PRODUCER = {"authorization": f"Basic {token}",
                "n8n_url": f"{wh_scheme}://{wh_server}:{wh_port}/{wh_endpoint}/{wh_path}"}

# Or enter the n8n_url directly:
N8N_PRODUCER = {"authorization": f"Basic {token}",
                "n8n_url": "http://localhost:5678/webhook-test/002fa0e8-f7aa-4e04-b4e3-e81aa29c6e69"}

#N8N_PRODUCER = None # comment out to enable N8N producer

Call a business rule (after_flush_row_event) on the API entity:

Python
def call_n8n_workflow(row: Employee, old_row: Employee, logic_row: LogicRow):
    """ Webhook Workflow: When Employee is inserted = post to n8n webhook """
    if logic_row.is_inserted():
        status = send_n8n_message(logic_row=logic_row)
        logic_row.log(status)

Rule.after_flush_row_event(on_class=models.Employee, calling=call_n8n_workflow)

Declarative Logic (Rules)

ApiLogicServer is an open-source platform based on the SQLAlchemy ORM and Flask. SQLAlchemy provides a hook (before flush) that allows LogicBank (another open-source tool) to let developers declare "rules." These rules fall into 3 categories: derivations, constraints, and events. Derivations are similar to spreadsheet rules in that they operate on a selected column (cell): formula, sums, counts, and copy. Constraints operate on the API entity to validate the row and will roll back a multi-table event if the constraint test does not pass. Finally, the events (early, row, commit, and flush) allow the developer to call "user-defined functions" to execute code during the lifecycle of the API entity. The WebGenAI feature (a chatbot to build applications) was trained on these rules to use natural language prompts (this can also be done in the IDE using Copilot).

Notice that the rules are declared and unordered. New rules can be added or changed and are not actually processed until the state change of the API or attribute is detected. Further, these rules can impact other API endpoints (e.g., sums, counts, or formula), which in turn can trigger constraints and events. Declarative rules can easily be 40x more concise than code.

Natural language rules generated by WebGenAI:

Use LogicBank to enforce the Check Credit requirement:
1. The Customer's balance is less than the credit limit
2. The Customer's balance is the sum of the Order amount_total where date_shipped is null
3. The Order's amount_total is the sum of the Item amount
4. The Item amount is the quantity * unit_price
5. The Item unit_price is copied from the Product unit_price
Declarative Logic (Rules)
ApiLogicServer is an open-source platform based on the SQLAlchemy ORM and Flask. SQLAlchemy provides a hook (before flush) that allows LogicBank (another open-source tool) to let developers declare "rules." These rules fall into three categories: derivations, constraints, and events. Derivations are similar to spreadsheet rules in that they operate on a selected column (cell): formula, sum, count, and copy. Constraints operate on the API entity to validate the row and will roll back a multi-table transaction if the constraint test does not pass. Finally, events (early, row, commit, and flush) allow the developer to call user-defined functions during the lifecycle of the API entity. The WebGenAI feature (a chatbot for building applications) was trained on these rules to accept natural language prompts (this can also be done in the IDE using Copilot). Notice that the rules are declared and unordered: new rules can be added or changed, and they are not processed until a state change of the API entity or attribute is detected. Further, these rules can impact other API endpoints (e.g., sums, counts, or formulas), which in turn can trigger constraints and events. Declarative rules can easily be 40x more concise than code.

Natural language rules generated by WebGenAI:

Use LogicBank to enforce the Check Credit requirement:
1. The Customer's balance is less than the credit limit
2. The Customer's balance is the sum of the Order amount_total where date_shipped is null
3. The Order's amount_total is the sum of the Item amount
4. The Item amount is the quantity * unit_price
5. The Item unit_price is copied from the Product unit_price

These become the following rules in logic/declare_logic.py:

Python
# ApiLogicServer: basic rules - 5 rules vs. 200 lines of code
# (the logic design translates directly into rules)
Rule.constraint(validate=Customer,
                as_condition=lambda row: row.Balance <= row.CreditLimit,
                error_msg="balance ({round(row.Balance, 2)}) exceeds credit ({round(row.CreditLimit, 2)})")

# adjust iff AmountTotal or ShippedDate or CustomerID changes
Rule.sum(derive=Customer.Balance,
         as_sum_of=Order.AmountTotal,
         where=lambda row: row.ShippedDate is None and row.Ready == True)

# adjust iff Amount or OrderID changes
Rule.sum(derive=Order.AmountTotal, as_sum_of=OrderDetail.Amount)

Rule.formula(derive=OrderDetail.Amount,
             as_expression=lambda row: row.UnitPrice * row.Quantity)

# get the Product price (e.g., on insert, or when ProductId changes)
Rule.copy(derive=OrderDetail.UnitPrice, from_parent=Product.UnitPrice)

SendGrid Email
N8N has hundreds of integration features that follow the same pattern: add a node to your diagram, attach the input, configure the settings (here, a SendGrid API key is added), and test to see the output. SendGrid responds with a messageId (which can be returned to the caller or stored in a database or Google Sheet). Workflows can be downloaded and stored in GitHub or uploaded into the cloud version.

SendGrid input and output (use drag and drop to build the email message)

AI Integration: A Chatbot Example
The community contributes workflow "templates" that anyone can pick up and use in their own workflow. One template takes documents from S3 and feeds them to Pinecone (a vector data store), then uses the AI block to link this to ChatGPT; the template even provides the code to insert into your webpage, making this a seamless end-to-end chatbot integration. Imagine taking your product documentation in Markdown and trying this out on a new website to help users chat and get answers to their questions.

AI workflow to build a chatbot

Summary
GenAI Logic is the new kid on the block. It combines the power of AI chat, natural language rules, and an API automation framework to instantly deliver running applications. The source is easily downloaded into a local IDE, and the work for the dev team begins. With the API in place, the UI/UX team can use the Ontimize (Angular) framework to polish the front end, the developer team can add logic and security to handle the business requirements, and the integration team can build the workflows to meet the business use case requirements.

ApiLogicServer also has a Kafka integration for producers and consumers. This extends real-time workflow integration: a rule can produce a Kafka message, and a consumer can start the workflow (and log, track, and retry if needed). N8N provides an integration space that gives ApiLogicServer new tools to meet most system integration needs. I have also tested the Zapier webhook (a cloud-based solution), which works the same way. Try WebGenAI for free to get started building apps and logic from prompts.
Snowflake Cortex enables seamless integration of generative AI (GenAI) capabilities within the Snowflake Data Cloud. It allows organizations to use pre-trained large language models (LLMs) and create applications for tasks like content generation, text summarization, sentiment analysis, and conversational AI, all without managing external ML infrastructure.

Prerequisites for Snowflake Cortex Setup

Snowflake Environment
Enterprise Edition or higher is required as a baseline for using advanced features like External Functions and Snowpark.

Cortex Licensing
Snowflake Cortex requires an additional license or subscription. Ensure the Cortex license is included as part of your Snowflake agreement.

External Integration and Data Preparation
Set up secure API access to LLMs (e.g., OpenAI or Hugging Face) for embedding and text generation. Prepare clean data in Snowflake tables, and configure networking for secure external function calls.

Key Features of Snowflake Cortex for GenAI

Pre-Trained LLMs
Access to pre-trained models for text processing and generation, like OpenAI's GPT models or Snowflake's proprietary embeddings.

Text Embeddings
Generate high-dimensional vector embeddings from textual data for semantic search, clustering, and contextual understanding.

Vector Support
A native VECTOR data type to store embeddings, perform similarity comparisons, and optimize GenAI applications.

Integration With SQL
Leverage Cortex functions (e.g., EMBEDDINGS, MATCH, MATCH_SCORE) directly in SQL queries.

Use Case: Build a Product FAQ Bot With GenAI
Develop a GenAI-powered bot to answer product-related questions using Snowflake Cortex.

Step 1: Create a Knowledge Base Table
Start by storing your FAQs in Snowflake.

SQL
CREATE OR REPLACE TABLE product_faq (
    faq_id INT,
    question STRING,
    answer STRING,
    question_embedding VECTOR(768)
);

Step 2: Insert FAQ Data
Populate the table with sample questions and answers.

SQL
INSERT INTO product_faq (faq_id, question, answer) VALUES
    (1, 'How do I reset my password?', 'You can reset your password by clicking "Forgot Password" on the login page.'),
    (2, 'What is your return policy?', 'You can return products within 30 days of purchase with a receipt.'),
    (3, 'How do I track my order?', 'Use the tracking link sent to your email after placing an order.');

Step 3: Generate Question Embeddings
Generate vector embeddings for each question using Snowflake Cortex.

SQL
UPDATE product_faq
SET question_embedding = EMBEDDINGS('cortex_default', question);

What this does: it converts each question into a 768-dimensional vector using Cortex's default LLM and stores the vector in the question_embedding column.

Step 4: Query for Answers Using Semantic Search
When a user asks a question, match it to the most relevant FAQ in the database.

SQL
SELECT question,
       answer,
       MATCH_SCORE(question_embedding,
                   EMBEDDINGS('cortex_default', 'How can I reset my password?')) AS relevance
FROM product_faq
ORDER BY relevance DESC
LIMIT 1;

Explanation: The user's query ('How can I reset my password?') is converted into a vector, MATCH_SCORE calculates the similarity between the query vector and the FAQ embeddings, and the query returns the most relevant answer.

Step 5: Automate Text Generation
Use GenAI capabilities to auto-generate answers for queries the FAQ table does not cover.

SQL
SELECT GENERATE_TEXT('cortex_default', 'How do I update my email address?') AS generated_answer;

What this does: it generates a text response for the query using the cortex_default LLM; the result can be stored back in the FAQ table for future use.
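To call Steps 4 and 5 from an application, a minimal Python sketch might look like the following. It assumes the snowflake-connector-python package, placeholder connection parameters, an illustrative 0.8 relevance threshold, and the same EMBEDDINGS, MATCH_SCORE, and GENERATE_TEXT functions used in the SQL above:

Python
# Minimal sketch: answer a user question from the FAQ table, falling back to
# generated text when no stored FAQ is a close match. Connection parameters
# and the relevance threshold are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="your_wh", database="your_db", schema="your_schema",
)

def answer_question(user_question: str) -> str:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT answer,
                   MATCH_SCORE(question_embedding,
                               EMBEDDINGS('cortex_default', %s)) AS relevance
            FROM product_faq
            ORDER BY relevance DESC
            LIMIT 1
            """,
            (user_question,),
        )
        row = cur.fetchone()
        if row and row[1] >= 0.8:  # close enough: reuse the stored FAQ answer
            return row[0]
        # otherwise fall back to generated text (Step 5)
        cur.execute("SELECT GENERATE_TEXT('cortex_default', %s)", (user_question,))
        return cur.fetchone()[0]

print(answer_question("How can I reset my password?"))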
Advanced Use Cases

Document Summarization
Summarize lengthy product manuals or policy documents for quick reference.

SQL
SELECT GENERATE_TEXT('cortex_default', 'Summarize: Return policy allows refunds within 30 days...') AS summary;

Personalized Recommendations
Combine vector embeddings with user preferences to generate personalized product recommendations.

SQL
SELECT product_name,
       MATCH_SCORE(product_embedding,
                   EMBEDDINGS('cortex_default', 'Looking for lightweight gaming laptops')) AS relevance
FROM product_catalog
ORDER BY relevance DESC
LIMIT 3;

Chatbot Integration
Integrate Cortex-powered GenAI into chat applications using frameworks like Streamlit or API connectors.

Best Practices

Optimize Embedding Generation
Use clean, concise text to improve embedding quality, and preprocess input text to remove irrelevant data.

Use VECTOR Indexes
Speed up similarity searches for large datasets:

SQL
CREATE VECTOR INDEX faq_index
USING cortex_default
ON product_faq (question_embedding);

Monitor Model Performance
Track MATCH_SCORE to assess query relevance, and fine-tune queries or improve data quality for low-confidence results.

Secure Sensitive Data
Limit access to tables and embeddings containing sensitive or proprietary information.

Batch Processing for Scalability
Process embeddings and queries in batches for high-volume use cases.

Benefits of Snowflake Cortex for GenAI

No Infrastructure Overhead
Use pre-trained LLMs directly within Snowflake without managing external systems.

Seamless Integration
Combine GenAI capabilities with Snowflake's data analytics features.

Scalability
Handle millions of embeddings or GenAI tasks with Snowflake's scalable architecture.

Flexibility
Build applications like chatbots, recommendation engines, and content generators.

Cost-Effective
Leverage on-demand GenAI capabilities without investing in separate ML infrastructure.

Next Steps
Extend: Add advanced use cases like multilingual support or real-time chat interfaces.
Explore: Try other Cortex features like clustering, sentiment analysis, and real-time text generation.
Integrate: Use external tools like Streamlit or Flask to build user-facing applications (see the sketch below).

Snowflake Cortex makes it easy to bring the power of GenAI into your data workflows. Whether you're building a chatbot, summarizing text, or creating personalized recommendations, Cortex provides a seamless, scalable platform to achieve your goals.
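As a sketch of the "Integrate" step above, the snippet below wraps the FAQ lookup in a minimal Streamlit front end. It assumes the streamlit and snowflake-connector-python packages, placeholder connection parameters, and the same Cortex-style SQL used in the earlier examples:

Python
# streamlit_faq_bot.py -- run with: streamlit run streamlit_faq_bot.py
# Minimal sketch of a user-facing FAQ bot; credentials are placeholders.
import snowflake.connector
import streamlit as st

@st.cache_resource
def get_connection():
    return snowflake.connector.connect(
        account="your_account", user="your_user", password="your_password",
        warehouse="your_wh", database="your_db", schema="your_schema",
    )

st.title("Product FAQ Bot")
question = st.text_input("Ask a question about the product:")

if question:
    with get_connection().cursor() as cur:
        cur.execute(
            """
            SELECT answer,
                   MATCH_SCORE(question_embedding,
                               EMBEDDINGS('cortex_default', %s)) AS relevance
            FROM product_faq
            ORDER BY relevance DESC
            LIMIT 1
            """,
            (question,),
        )
        answer, relevance = cur.fetchone()
    st.write(answer)
    st.caption(f"Relevance score: {relevance:.2f}")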
The industry's increasing focus on secure container images is undeniable. Companies like Chainguard, which specializes in delivering container images free of CVEs, have demonstrated the demand by recently raising an impressive $140 million at a $1.1 billion valuation. In the open-source ecosystem, Cloud Native Buildpacks, an incubating CNCF project, and its vibrant community deliver a comparable value proposition by automating the creation of optimized and secure container images. In this article, I'll explore Buildpacks' core concepts, comparing them with Docker to illustrate their functionality and highlight how they provide a community-driven alternative to the value Chainguard brings to container security.

What Are Buildpacks?
Buildpacks automate the process of preparing your application for deployment: detecting dependencies, building runtime artifacts, and packaging everything into a container image. They abstract away the manual effort of building images efficiently. In other words, if Docker allows you to define explicitly how a container is built through a Dockerfile, Buildpacks operate at a higher level of abstraction. They offer opinionated defaults that help developers ship production-ready images quickly.

Comparing a Few Concepts
Buildpacks do containerization differently and more efficiently. For those unfamiliar with the technology, let's review a few key Docker concepts and see how they translate to the Buildpacks world.

Entrypoint and Start Commands
In a Dockerfile, the ENTRYPOINT or CMD specifies the command that runs when the container starts. For example:

Dockerfile
CMD ["java", "-jar", "app.jar"]

Buildpacks abstract this step; you have nothing to do. They automatically detect the appropriate start command for your application based on the runtime and build process. For example, when using a Java Buildpack, the resulting image includes the logic to start your application with java -jar app.jar or a similar command. You don't need to configure it explicitly; Buildpacks "just know" how to start applications based on best practices.

Writing a Dockerfile
The concept of not doing anything goes even further: you don't even need to write the equivalent of a Dockerfile. Buildpacks take care of everything needed to containerize your application into an OCI image.

Multi-Stage Builds
That abstraction does not come at the cost of optimization. For example, multi-stage builds are a common technique in Docker to create lean images by separating the build environment from the runtime environment. For instance, you might compile a Java binary in one stage and copy it to a minimal base image in the final stage:

Dockerfile
# Build stage
FROM maven:3.8-openjdk-11 as builder
WORKDIR /app
COPY . .
RUN mvn package

# Runtime stage
FROM openjdk:11
COPY --from=builder /app/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]

Buildpacks handle the equivalent of multi-stage builds behind the scenes. During the build process, they:

Detect your application's dependencies
Build artifacts (e.g., compiled binaries for Java)
Create a final image with only the necessary runtime components

This is again done automatically, requiring no explicit configuration.
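To make this concrete, the sketch below drives the pack CLI from Python to build such an image with no Dockerfile at all. It assumes the pack binary and a local Docker daemon are available and uses the Paketo base builder; the image name and source path are placeholders:

Python
# Minimal sketch: build an OCI image with Cloud Native Buildpacks via the
# `pack` CLI (no Dockerfile required). Assumes `pack` and Docker are installed.
import subprocess

subprocess.run(
    [
        "pack", "build", "my-java-app:latest",               # placeholder image name
        "--builder", "paketobuildpacks/builder-jammy-base",  # Paketo base builder
        "--path", ".",                                       # application source directory
    ],
    check=True,  # raise CalledProcessError if the build fails
)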
About Security
Let's jump into the security part and explore a few ways that the Buildpacks ecosystem can be seen as an OSS alternative to Chainguard.

Non-Root Containers
Running containers as non-root users is a best practice that improves security. In Dockerfiles, this typically involves creating a new user and configuring permissions. Buildpacks enforce non-root execution by default: the resulting container image is configured to run as an unprivileged user, with no extra effort required from the developer.

CVEs
Security is a significant focus for open-source Buildpacks communities like Paketo Buildpacks and Google Cloud Buildpacks. What these communities offer can be seen as the open-source alternative to Chainguard. By default, Buildpacks use pre-configured, community-maintained base images that are regularly updated to eliminate known vulnerabilities (CVEs). For example, Paketo Buildpacks stacks (build image and run image) are rebuilt whenever a package is patched to fix a CVE, and every stack is rebuilt weekly to ensure packages without CVEs are also up to date. The community releases stack updates that fix high and critical CVEs within 48 hours of the patch release, and within two weeks for low and medium CVEs.

SBOM
Buildpacks can provide a software bill of materials (SBOM) describing the dependencies they provide, with support for three report formats: CycloneDX, SPDX, or Syft. Paketo Buildpacks also uses SBOM generation to provide a detailed record of all dependencies in the images they provide, making it easier to track and audit components for vulnerabilities.

A Solid OSS Chainguard Alternative
Buildpacks offer a simple, secure, and standardized way to create production-ready container images, making them a potential cornerstone of a platform engineering strategy. By automating tasks like dependency management, non-root execution, and security updates, Buildpacks provide a community-driven alternative to commercial security solutions like Chainguard. For teams looking to streamline workflows and enhance container security without the complexity of Dockerfiles or the cost and limitations of Chainguard, Buildpacks can be a solid starting point.