Integration refers to the process of combining software parts (or subsystems) into one system. An integration framework is a lightweight utility that provides libraries and standardized methods to coordinate messaging among different technologies. As software connects the world in increasingly complex ways, integration makes it all possible by facilitating app-to-app communication. Learn more about this necessity for modern software development by keeping a pulse on industry topics such as integrated development environments, API best practices, service-oriented architecture, enterprise service buses, communication architectures, integration testing, and more.
Building robust and efficient applications requires a deep understanding of various architectural components in today's rapidly evolving technology landscape. While choices like microservices, monolithic architectures, event-driven approaches, and containerization garner significant attention, one fundamental aspect often overlooked is the persistence layer. This article explores the benefits of the book "Persistence Best Practices for Java Applications" and why the persistence layer is vital for modern applications.

The Significance of the Persistence Layer

The persistence layer is the part of an application responsible for storing and retrieving data. In Java applications, it plays a role similar to data stores in cloud-native solutions. Here are some key reasons why the persistence layer is crucial in today's application development landscape:

- Data Integration and Interoperability: Modern applications often need to interact with various data sources, including databases, APIs, and external services. An efficient persistence layer ensures seamless data integration and interoperability, enabling applications to exchange information effortlessly.
- Readability and Maintainability: A well-designed persistence layer enhances the readability and maintainability of the entire application. By following established patterns and standards, developers can create clean and organized code that is easier to understand and modify.
- Performance Optimization: The design of the persistence layer can significantly impact an application's performance. Developers can improve performance and responsiveness by implementing proper indexing strategies, optimizing database schema designs, and using caching techniques.
- Cloud-Native Technologies: With the increasing adoption of cloud-native technologies, applications are expected to be scalable, resilient, and easily deployable in cloud environments. The persistence layer must adapt to these cloud-native principles to ensure the application's success in modern cloud-native architectures.
- Data Modernization: As applications evolve, data modernization is often needed. The persistence layer plays a vital role in this process by enabling smooth data migration and integration, ensuring that legacy data can coexist with new data sources seamlessly.

The book "Persistence Best Practices for Java Applications" offers valuable insights into the world of persistence layers in Java applications. Here are some of the critical benefits of the book:

- Database Patterns: The book delves into database patterns that help design readable and maintainable architectures for Java applications. These patterns guide how to structure databases effectively, making them a valuable resource for developers.
- Persistence Challenges: It addresses various challenges developers may encounter in their projects. Developers can build more robust applications by understanding these challenges and mastering techniques to overcome them.
- Modernization Strategies: In an era of cloud adoption and cost reduction through stack modernization, the book provides strategies for painless application modernization. It explores how cloud-native technologies and event-driven architectures can facilitate modernization with minimal impact on existing legacy systems.
- Performance Optimization: The book emphasizes the impact of design patterns on program performance. By following best practices, developers can learn how to optimize their applications for better speed and efficiency.
- Frameworks and Technologies: It provides insights into the role of cloud-native technologies in modern application persistence. Developers can better understand which frameworks and technologies to leverage in their projects.

The Issues of Ignoring the Persistence Layer

Neglecting the persistence layer in software development can lead to a range of issues and challenges that affect an application's functionality, performance, and maintainability. Here are some key issues that can arise when the persistence layer is not given the attention it deserves:

- Data Integrity and Security Concerns: Inadequate attention to the persistence layer can result in data integrity issues. Without proper validation and data storage mechanisms, data can become corrupted or compromised, leading to security vulnerabilities. This neglect can also make it easier for unauthorized users to access sensitive information.
- Reduced Performance: A poorly designed persistence layer can significantly impact an application's performance. Inefficient database queries, lack of indexing, and improper data caching can result in slow response times and decreased user satisfaction. This can be especially problematic in high-traffic or data-intensive applications.
- Maintenance Challenges: Neglecting the persistence layer makes the application's codebase less maintainable. Over time, developers may find it increasingly difficult to understand and modify the code. This can lead to higher maintenance costs, longer development cycles, and an increased risk of introducing bugs when making changes.
- Inflexibility: Applications that do not pay attention to the persistence layer may struggle to adapt to changing requirements or new data sources. This lack of flexibility can limit the application's ability to evolve and integrate with new technologies, such as cloud-native solutions or modern databases.
- Data Migration and Modernization Hurdles: As technology evolves, there is often a need to migrate or modernize the application's data storage. Neglecting the persistence layer can make this process more challenging and error-prone, potentially leading to data loss or compatibility issues when transitioning to newer systems.
- Scalability Problems: Inadequate design of the persistence layer can hinder an application's scalability. Without proper data partitioning, sharding, or clustering strategies, scaling the application to handle growing user loads becomes difficult.
- Difficulty in Testing: Neglected persistence layers can be challenging to test thoroughly. This can result in incomplete or inadequate test coverage, making it harder to identify and fix issues before they reach production.
- Inefficient Resource Utilization: A poorly designed persistence layer may consume excessive server resources, such as CPU and memory. This inefficiency can lead to higher infrastructure costs and may require over-provisioning to maintain acceptable performance levels.
- Complex Codebase: Neglecting the persistence layer can lead to scattered, ad hoc data access code throughout the application, making the codebase more complex and less cohesive. This can hinder collaboration among development teams and make it harder to enforce coding standards.
- Lack of Disaster Recovery and Redundancy: Without proper attention to data storage and redundancy mechanisms in the persistence layer, an application may be more susceptible to data loss during hardware failures or disasters.
Overlooking the persistence layer in software development can result in a wide range of issues that affect data integrity, application performance, maintenance efforts, and the application's ability to adapt to changing technology landscapes. To build robust, efficient, and maintainable applications, prioritizing the persistence layer's design and implementation from the outset is essential.

Data as a First-Class Citizen of the Architecture

Paying careful attention to the persistence layer in software development yields numerous benefits from both short-term and long-term architectural perspectives. Here are the key advantages:

Short-Term Benefits

- Data Integrity: A well-designed persistence layer ensures data consistency and prevents data corruption or loss. This means your application operates with reliable information, reducing the risk of errors and inaccuracies.
- Performance Optimization: A finely tuned persistence layer can improve immediate performance. Properly indexed databases and optimized queries result in faster data retrieval, enhancing the user experience and reducing latency.
- Maintenance Efficiency: Maintenance tasks become more straightforward with a well-structured persistence layer. Developers can quickly understand and modify the code, reducing the time and effort required for updates, bug fixes, and enhancements.
- Flexibility and Scalability: A thoughtfully designed persistence layer enables your application to scale efficiently. This adaptability is vital when dealing with changing user loads, ensuring your application remains responsive and available.

Long-Term Benefits

- Architectural Integrity: A solid persistence layer contributes to the overall architectural integrity of your application. It ensures that data management adheres to best practices, making it easier to maintain a coherent and maintainable codebase as the application evolves.
- Compatibility with New Features: As your application evolves, the persistence layer plays a pivotal role in accommodating new features. It can adapt to changes in data requirements, support additional data sources, and enable the integration of new technologies and APIs.
- Scalability and Future-Proofing: A well-architected persistence layer helps future-proof your application. It enables seamless integration with emerging technologies and data storage solutions, allowing your application to remain relevant and competitive in the long term.
- Reduced Technical Debt: Prioritizing the persistence layer minimizes technical debt. It avoids accumulating suboptimal data management practices and code, reducing the burden of addressing these issues later, which can be costly and time-consuming.
- Enhanced Security: A carefully crafted persistence layer can incorporate robust security measures to protect your data. It ensures that sensitive information is appropriately encrypted, access controls are in place, and audit trails can be maintained to meet compliance requirements.

Paying attention to the persistence layer brings immediate benefits by improving data integrity, performance, maintenance, and scalability. From a long-term architectural perspective, it ensures the application's adaptability, compatibility with new features, scalability, and reduced technical debt. This proactive approach safeguards your application's integrity, security, and competitiveness as it evolves and meets changing user needs.
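To make the indexing and schema-design points above a bit more concrete, here is a minimal JPA sketch (a hypothetical Order entity, not taken from the book) showing how such decisions can be expressed directly in the persistence layer's mapping. It assumes a Jakarta Persistence (JPA 3.x) setup.

```java
import jakarta.persistence.*;

// Hypothetical entity, for illustration only.
@Entity
@Table(name = "orders",
       indexes = @Index(name = "idx_orders_customer_id", columnList = "customer_id"))
public class Order {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Frequently used in WHERE clauses, so it is backed by the index declared above.
    @Column(name = "customer_id", nullable = false)
    private Long customerId;

    @Column(nullable = false, length = 32)
    private String status;

    protected Order() {
        // no-arg constructor required by JPA
    }

    public Order(Long customerId, String status) {
        this.customerId = customerId;
        this.status = status;
    }

    public Long getId() {
        return id;
    }

    public String getStatus() {
        return status;
    }
}
```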
Conclusion

The persistence layer is critical to building successful, scalable, and maintainable applications in today's software development landscape. "Persistence Best Practices for Java Applications" offers a wealth of knowledge and practical guidance for developers, engineers, and software architects, enabling them to master the intricacies of the persistence layer and apply best practices to their Java solutions. By emphasizing the importance of this often overlooked aspect of application development, the book empowers professionals to create high-quality, efficient, and adaptable applications that meet the demands of the modern technology ecosystem.
In a constantly evolving enterprise landscape, integration remains the linchpin for seamless interactions between applications, data, and business processes. As Robert C. Martin aptly said, "A good architecture allows for major decisions to be deferred," emphasizing the need for agile and adaptable integration strategies. The advent of cloud technologies has fundamentally reimagined how businesses approach integration. While traditional paradigms offer a foundational perspective, cloud-native integration patterns bring a transformative element to the table, reshaping the conventional wisdom around integrating modern business systems.

The New Playground: Why Cloud-Native?

Cloud-native architecture has become the new frontier for businesses looking to scale, adapt, and innovate in an increasingly interconnected world. But why is going cloud-native such a critical move? One primary reason is scalability. Traditional architectures, while robust, often face limitations in their ability to adapt to fluctuating demands. As Simon Wardley, a researcher in the field of innovation, once observed, "Historically, our approach to creating scalable, reliable systems required building bigger machines." But cloud-native architectures flip this script. They allow organizations to break free from the limitations of monolithic systems, embracing microservices and containers that scale horizontally.

Another compelling advantage is resilience. Cloud-native patterns offer fault-tolerance and self-healing capabilities, thanks to the orchestrators that manage the architecture. In a cloud-native environment, if one component fails, it doesn't bring the whole system down. Instead, it either self-heals or redistributes the load to other functioning units, thus maintaining uninterrupted service. As Werner Vogels, Amazon's CTO, emphasizes, "Everything fails all the time," which underlines the need for architectures designed to manage and adapt to failures gracefully.

Furthermore, cloud-native architectures are conducive to continuous delivery and integration, thereby accelerating time-to-market and enhancing the customer experience. The ability to update a single microservice without affecting the entire application landscape means that you can push changes faster, experiment more, and adapt to market needs. Hence, it's not just a playground; it's a strategic arena where businesses can gain a competitive edge.

Kubernetes: The Maestro of Cloud-Native Integration

What sets Kubernetes apart in the cloud-native landscape? Firstly, it's its portability. Kubernetes is not tied to a specific cloud provider, allowing businesses to avoid vendor lock-in. This gives organizations the flexibility to choose the best environment for their applications, be it on-premises, in the public cloud, or in a hybrid setting. Secondly, Kubernetes enhances operational efficiency. Through its powerful orchestration capabilities, Kubernetes automates many operational tasks, such as load balancing, scaling, and updates. This automation not only reduces the operational burden but also lowers the chances of human error. Lastly, Kubernetes is extensible and adaptable. With a thriving community and a myriad of third-party extensions, Kubernetes can be customized to fit a wide range of use cases, from simple web applications to complex machine-learning models.
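To ground the horizontal-scaling and automation claims above, here is a minimal, illustrative Kubernetes manifest (the service and image names are hypothetical, not tied to anything mentioned in this article): a Deployment that runs several replicas with rolling updates, plus a HorizontalPodAutoscaler that adds or removes pods as load changes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service            # hypothetical microservice
spec:
  replicas: 3                     # scale horizontally: more pods, not bigger machines
  selector:
    matchLabels:
      app: orders-service
  strategy:
    type: RollingUpdate           # update one microservice without downtime
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: orders-service
    spec:
      containers:
        - name: orders-service
          image: registry.example.com/orders-service:1.4.2   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:         # route traffic only to healthy pods
            httpGet:
              path: /healthz
              port: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The point is the declarative model: you state the desired replica count and scaling policy, and the orchestrator continuously reconciles the running system toward it.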
As cloud-native thought leader Sarah Novotny suggests, "Orchestration and choreography need to be carefully balanced to optimize the efficiency of cloud-native systems," and Kubernetes provides the tools to strike this balance effectively.

iPaaS: Integration as a Managed Service

Integration Platform as a Service, commonly known as iPaaS, has emerged as a robust, scalable solution for managing and automating data flows between disparate systems. Gartner's Massimo Pezzini, a veteran analyst focusing on integration, has characterized iPaaS as "the cornerstone of any digital transformation strategy," and rightfully so. It enables the seamless exchange of data and functionalities between different services and applications, both on-premises and in the cloud, without the need to install, manage, and maintain middleware.

One of the critical benefits of iPaaS is its ease of use, especially when it comes to managing complex integrations. As the term "as a Service" implies, iPaaS takes a lot of the operational burden off IT teams. It automates not only the data pipelines but also the maintenance and monitoring aspects, allowing technical teams to focus more on building features rather than troubleshooting integration issues.

Furthermore, the flexibility iPaaS offers in terms of connecting various types of software is unparalleled. Whether it's linking legacy systems with new cloud-native applications or integrating SaaS platforms with on-premises databases, iPaaS solutions like Martini are designed to handle multiple scenarios, all while maintaining data consistency and integrity.

However, with this ease and flexibility comes the challenge of governance. In an iPaaS environment, managing who has access to what data and keeping a log of data transactions becomes extremely important. Thankfully, leading iPaaS solutions come with built-in features for security and compliance, ensuring that the data flowing through the pipes is secure and compliant with regulatory standards.

Event-Driven Architecture in the Cloud

Event-driven architecture (EDA) has gained momentum as a core pattern in cloud-native environments. Martin Fowler, one of the leading voices in software architecture, describes EDA as "an architectural paradigm with its own set of concepts, practices, and mechanisms" designed to produce, detect, consume, and react to events. In the cloud, this architecture thrives because it enables applications to be decoupled and to interact asynchronously.

The essence of an event-driven architecture in the cloud is its responsiveness. Instead of a traditional request-response model, EDA relies on events to trigger specific actions or reactions. This capability allows for immediate adaptation to changes, be it a user action, a system update, or even an external trigger from another integrated system.

Another compelling aspect of EDA is its support for real-time analytics and decision-making. With a continuous flow of events, data can be analyzed in real time, offering valuable insights almost instantaneously. This is particularly beneficial for applications requiring immediate response, such as fraud detection systems or customer engagement platforms.

Security: The Never-Ending Concern

Security remains a critical, ongoing concern in any integration strategy, cloud-native or otherwise. As Bruce Schneier, an internationally renowned security technologist, says, "Security is not a product, but a process."
This rings especially true in a world where integration points are proliferating and the surface area for potential attacks is expanding. In a cloud-native landscape, security must be "baked in" rather than "bolted on." This implies incorporating security measures at every layer of the architecture, from the network up to the application level. Modern security practices like zero-trust architecture, identity and access management, and end-to-end encryption are not just optional but mandatory for ensuring a secure environment.

However, security doesn't end with implementing measures; it extends to continuous monitoring and updating. The dynamic nature of cloud-native architectures, coupled with an ever-evolving threat landscape, necessitates constant vigilance. Security incident and event management (SIEM) solutions integrated with cloud-native systems offer real-time monitoring and alerts, providing an additional layer of security.

For those dealing with highly sensitive data, compliance becomes a significant aspect of security. Cloud-native architectures must adhere to industry-specific regulations such as GDPR, HIPAA, or PCI-DSS, which adds another layer of complexity but is non-negotiable for maintaining trust and integrity.

AI and Cloud-Native Integration

"Integration is harder than you think, and it's going to be even more critical in the future," James Governor, co-founder of RedMonk, once noted. As we look ahead, technologies like Artificial Intelligence and Machine Learning offer intriguing possibilities. Imagine a self-healing integration pattern that can adapt in real time to workload changes or security threats.

The Transformative Role of Cloud-Native Integration Patterns

As we stand at the threshold of an era defined by digital transformation, the cloud-native approach to integration patterns isn't just a trend; it's an imperative. It takes the core objective of integration — orchestrating the seamless communication between disparate systems — and elevates it to meet the demands of today's dynamic cloud environments. Gartner's Yefim Natis reminds us that "architectural choices are among the most expensive choices that an enterprise can make." As we step further into the labyrinth of digital interconnectedness, making a well-informed architectural choice has never been more critical.

Cloud-native integration patterns are shaping not just how we think about integration but how we conceive, develop, and execute enterprise strategies in a cloud-centric world. The conventional methods may still have their place, but as we advance, the future clearly belongs to cloud-native paradigms. They're not just orchestrating the symphony of interconnected systems; they're composing the future of enterprise landscapes.
Despite the advancement of dApps' capabilities over the past year, adoption has been slowed by terrible user experience. Users are required to complete a complicated and onerous series of steps: download a wallet, learn about gas costs, obtain tokens to pay gas, save seed phrases, and more. This poses a significant hurdle for users new to the blockchain, or those who are just uncomfortable with holding crypto. Often, they just give up.

To solve this problem, walletless dApps on Flow have emerged. With this approach, users can easily sign up for dApps using credentials they are already comfortable with (social logins, email accounts). This allows them to get started quickly, without needing to understand the complexities of wallets or blockchain.

In this two-part series, we'll explore how walletless dApps on Flow work. We'll look at some use cases and walk you through the steps of building and deploying your walletless web3 dApp using the Flow Wallet API and account abstraction. Here in part one, we'll focus on building the backend for our walletless dApp. In part two, we'll wrap up the walkthrough by building the front end.

How Flow Solves Web3 Onboarding

Flow is a highly scalable blockchain with a design philosophy that prioritizes mainstream use: easy user logins, mobile-ready, fast development time, 99.99% up-time, and more. It's made for dApps that have real-world usage.

Account abstraction (AA) on Flow falls right into this philosophy. AA, in combination with hybrid custody, creates a walletless onboarding experience for users. What does this mean? Typically, accounts on a blockchain are owned either by a user with a private key (called an externally owned account, or EOA) or by a smart contract (called a contract account). Account abstraction combines these ideas. It allows the user to control the wallet while abstracting away the idea of the wallet altogether by letting the contract also have control. On Flow, this is called Hybrid Custody.

Use Cases of Account Abstraction

There are lots of advantages to this "account delegation." Since one account (the child/EOA account) can delegate control to another account (the parent/app account), developers can provide users with a seamless onboarding and in-app experience while giving them a sense of actual ownership and self-sovereignty. This is particularly useful for new users who are not familiar with the intricacies of blockchain technology and can help increase adoption and engagement with the application. For example, developers can build apps that create wallets for users, manage keys and transactions, and even abstract away the blockchain altogether. But in the end, the user still has control and ownership over any assets in the wallet. Other use cases include:

Key recovery with multi-sig transactions
An account may be set up to require multiple signatures to complete a transaction. This creates a wide variety of new use cases and enhances the usability of Web3 apps.

Gasless experiences with sponsored transactions
The stress that comes with paying fees before newbies can use a Web3 app and execute a transaction can be a barrier. Sponsored transactions give developers room to subsidize these charges on behalf of users.

Seamless experiences with bundled transactions
Flow's Cadence programming language, which introduces new features to smart contracts, also separates contracts from transactions. It supports bundling transactions from several contracts into a single transaction at the protocol level.
Social logins with walletless onboarding + hybrid custody
Flow enables its developers to deliver a familiar experience while progressively exposing new users to the benefits of Web3. Users can realize the benefits of both custodial and non-custodial experiences, which is what hybrid custody entails. All this together makes it much easier for users to get started with web3 dApps.

Building a Sample Walletless Login dApp

So let's walk through creating a walletless dApp. We'll build a dApp that integrates Google social login/signup and creates a Flow wallet for the user at signup time. Here is a quick breakdown of what we're going to do in part one of this walkthrough:

- Set up a dockerized Flow Wallet API application to use the Flow Testnet
- Test the Flow Wallet API application

Then, in part two:

- Create a new Next.js application
- Set up Prisma for backend user management
- Build the Next.js application frontend functionality
- Test our Next.js application

Are you ready? Let's go!

Set Up the Flow Wallet API Application

To make our lives easier, we'll use the Flow Wallet API. This is a REST HTTP service that allows developers to quickly integrate wallet functionality into dApps. The API was developed in Golang, so knowledge of Go will help you, though proficiency is unnecessary to make this app work! The Flow Wallet API is currently not maintained; however, at the time of writing, the API was working perfectly.

Clone the project folder with the following command:

```shell
$ git clone https://github.com/flow-hydraulics/flow-wallet-api.git
```

To facilitate development, we will use Docker. If you don't have it installed on your OS, you can find instructions at this link: https://docs.docker.com/engine/install/

From the terminal, navigate to the newly created flow-wallet-api directory. I recommend taking the time to review the folder structure. This API was very well done and is complete code with unit tests included! To keep this article as short as possible, we will focus on executing the code.

macOS M1 Chip Config

Note: If your computer is a Mac with an M1 chip, you will need to change the Dockerfile in the docker/wallet/ folder. If you are not using an M1, you can skip this section.

In the Dockerfile, edit the first line of code. We will change the Golang version used by the container to:

```dockerfile
FROM golang:1.20rc1-alpine3.17 AS dependencies
```

You will also need to change the value of the GOARCH field:

```dockerfile
GOARCH=arm64
```

That's it! These are the required changes for Mac M1 machines.

Configure Environment Variables

Before we can spin up the Flow Wallet API, we need to make some configuration changes. The first step is to rename the .env.example file to .env. The application will use this file to import the environment variables. Within the .env file, we need to change some values.

Flow Network Fields

Since we will be running the application against Flow's Testnet network, we need to change the environment variables FLOW_WALLET_ACCESS_API_HOST and FLOW_WALLET_CHAIN_ID. In the .env file, comment out the following lines:

```shell
# emulator
# FLOW_WALLET_ACCESS_API_HOST=localhost:3000
# FLOW_WALLET_CHAIN_ID=flow-emulator
```

Then, uncomment the following lines:

```shell
# testnet
FLOW_WALLET_ACCESS_API_HOST=access.testnet.nodes.onflow.org:9000
FLOW_WALLET_CHAIN_ID=flow-testnet
```

With this change, we've modified the configuration of the application to use the Testnet network.
Flow Admin Account Fields

To use the Testnet, we need to create a Testnet account and update the FLOW_WALLET_ADMIN_ADDRESS and FLOW_WALLET_ADMIN_PRIVATE_KEY fields inside the .env file. To create a Flow Testnet account, you must first install the Flow CLI on your operating system. You can find instructions on how to do this here. With the Flow CLI installed, run the command:

```shell
$ flow keys generate
```

This command will generate an asymmetric public-private key pair. Save the private key somewhere. Then, copy the public key; we will use it in the next step to create an account on the Flow blockchain!

Create a Flow Testnet Account

Open the Flow faucet in your browser: https://testnet-faucet.onflow.org/

In the first input, paste in the public key you just generated. Leave the rest with the default settings, then perform the CAPTCHA verification and click the Create Account button. This will generate a Testnet address. Copy the address and use it as the value for the FLOW_WALLET_ADMIN_ADDRESS field in the .env file. Also, update the FLOW_WALLET_ADMIN_PRIVATE_KEY field with the private key that you generated in the previous step. In my case, it looks like this:

```shell
FLOW_WALLET_ADMIN_ADDRESS=0x7cc7be2796e8cf29
FLOW_WALLET_ADMIN_PRIVATE_KEY=73bc408436c14befd74cb01fa4c54217c6c860ff97df1a77a3063a2807c7067f
```

We're ready! Our Testnet account has been created. This is the admin account that will execute, sign, and pay for the creation of new wallets for our application.

Proposal Key Field

Within an account, it is possible to have several proposal keys. Since we will call several transactions with the same account, to avoid concurrency problems we will need to create new proposal keys for the admin account. You can read more about proposal keys here. We'll change the configuration in .env to create 10 new proposal keys. For our tests, this is a sufficient amount. However, if you start having concurrency problems when executing transactions, you can increase this value and restart the application.

```shell
FLOW_WALLET_ADMIN_PROPOSAL_KEY_COUNT=10
```

Idempotency Middleware Field

We'll also add a property to the .env file to disable the idempotency middleware. Since we are just testing the application (and using it to facilitate our POST requests), we will leave it disabled by setting the following value to true. However, it is recommended to activate it in a production environment.

```shell
FLOW_WALLET_DISABLE_IDEMPOTENCY_MIDDLEWARE=true
```

Update the docker-compose.yml File

Since we made some changes to our .env (including using the Testnet network and disabling the idempotency middleware), we can remove a few lines of unnecessary code in the docker-compose.yml file. Our application will not be using the redis and emulator containers, so we can remove them. After removing them, your docker-compose.yml file will look like this:

```yaml
version: "3.9"

networks:
  private:

services:
  db:
    image: postgres:13-alpine
    environment:
      POSTGRES_DB: wallet
      POSTGRES_USER: wallet
      POSTGRES_PASSWORD: wallet
    networks:
      - private
    ports:
      - "5432:5432"
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "pg_isready --username=${POSTGRES_USER:-wallet} --dbname=${POSTGRES_DB:-wallet}",
        ]
      interval: 10s
      timeout: 5s
      retries: 10

  api:
    build:
      context: .
      dockerfile: ./docker/wallet/Dockerfile
      target: dist
      network: host # docker build sometimes has problems fetching from alpine's CDN
    networks:
      - private
    ports:
      - "3000:3000"
    env_file:
      - ./.env
    environment:
      FLOW_WALLET_DATABASE_DSN: postgresql://wallet:wallet@db:5432/wallet
      FLOW_WALLET_DATABASE_TYPE: psql
    depends_on:
      db:
        condition: service_healthy
```

Note that we also removed some lines from the api service:

- FLOW_WALLET_ACCESS_API_HOST and FLOW_WALLET_CHAIN_ID in the environment section
- The lines for redis and emulator in the depends_on section

We spin up our containers with the following command:

```shell
$ docker compose up
```

Test the Flow Wallet API Application

When we run the above command, Docker will spin up the following:

- flow-wallet-api-db-1: This is the PostgreSQL database container where all data will be stored, including the users' private keys. Remember that we are only running the application in a test environment, using the Flow Testnet network. If you use this API in production (Mainnet), protect the users' private keys. One option is to use key management system services; the Flow Wallet API has easy and fast integration with Google KMS and AWS KMS.
- flow-wallet-api-api-1: This is a Golang application that connects to the Flow network and performs actions such as creating wallets, transactions, scripts, and much more.

Done! Our application runs in these two Docker containers, and we can make calls using (the default) port 3000. All endpoints available in the application can be found in the documentation here.

Check the Application's Health

The endpoint to check if the application is healthy is /v1/health/ready. We send a curl request to check this endpoint, and this is what we receive as a response:

```shell
$ curl -X GET -i http://localhost:3000/v1/health/ready

HTTP/1.1 200 OK
Vary: Accept-Encoding
Date: Fri, 05 May 2023 17:27:05 GMT
Content-Length: 0
```

With the 200 response, we are assured that our application is up and running.

Create a Wallet

Creating a new wallet via the API is very easy! We send a POST request to the /v1/accounts endpoint:

```shell
$ curl -X POST http://localhost:3000/v1/accounts

{
  "jobId":"2876f90d-d9ec-4007-935b-4aba3cb8e45e",
  "type":"account_create",
  "state":"INIT",
  "error":"",
  "errors":null,
  "result":"",
  "transactionId":"",
  "createdAt":"2023-05-05T17:29:34.800299551Z",
  "updatedAt":"2023-05-05T17:29:34.800299551Z"
}
```

Under the hood, the API creates an asymmetric key pair and executes a transaction on the Flow blockchain to create an account.

Jobs

Since a transaction takes a few seconds and can fail, the API creates jobs. These jobs are records/units that are stored in the database. Each time a transaction is called or queued, a job is created, and the status of the transaction is stored in this job. Notice how the transaction state returned by our call above is INIT. The API is monitoring the account creation transaction.

Get Job Data and Account Addresses

To get the updated state of this job, we can send a GET request to the /v1/jobs/{JOB_ID} endpoint:

```shell
$ curl -X GET \
  http://localhost:3000/v1/jobs/2876f90d-d9ec-4007-935b-4aba3cb8e45e

{
  "jobId":"2876f90d-d9ec-4007-935b-4aba3cb8e45e",
  "type":"account_create",
  "state":"COMPLETE",
  "error":"",
  "errors":null,
  "result":"0xb23d449bc23d9d04",
  "transactionId":"ab68527578c323b68caf9d1b1533ebf4a3486f22b0d6f7df4339c57e48d9c4ca",
  "createdAt":"2023-05-05T17:29:34.800299Z",
  "updatedAt":"2023-05-05T17:29:47.507803Z"
}
```

We see the result, with a newly created address.
It is also possible to check all the addresses created by the application using the /v1/accounts endpoint:

```shell
$ curl -X GET http://localhost:3000/v1/accounts

[
  {
    "address":"0xb23d449bc23d9d04",
    "keys":null,
    "type":"custodial",
    "createdAt":"2023-05-05T17:29:47.504906Z",
    "updatedAt":"2023-05-05T17:29:47.504906Z"
  },
  {
    "address":"0x7cc7be2796e8cf29",
    "keys":null,
    "type":"custodial",
    "createdAt":"2023-05-05T17:25:51.847239Z",
    "updatedAt":"2023-05-05T17:25:51.847239Z"
  }
]
```

With our wallet API working, we can start implementing the application that will send requests to the wallet API to create new accounts. We'll cover this in part two of our walkthrough.
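As a preview of part two, here is a rough sketch (not part of the Flow Wallet API project itself) of how our backend could call the two endpoints we just exercised with curl: create an account, then poll its job until it completes. It assumes the wallet API is running locally on port 3000 and a runtime with a global fetch (for example, Node.js 18+).

```javascript
const WALLET_API = "http://localhost:3000";

async function createFlowAccount() {
  // Start the account-creation job (same call as the curl POST above).
  const res = await fetch(`${WALLET_API}/v1/accounts`, { method: "POST" });
  const job = await res.json();

  // Poll the job until the underlying transaction completes; "result" holds the new address.
  for (let attempt = 0; attempt < 30; attempt++) {
    const jobRes = await fetch(`${WALLET_API}/v1/jobs/${job.jobId}`);
    const status = await jobRes.json();
    if (status.state === "COMPLETE") return status.result; // e.g. "0xb23d449bc23d9d04"
    if (status.error) throw new Error(status.error);
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait a second before retrying
  }
  throw new Error("Timed out waiting for the account-creation job");
}

// Usage (for example, inside a signup handler):
// const address = await createFlowAccount();
```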
When testing a FastAPI application with two different async sessions to the database, the following problem may occur:

- In the test, an object is created in the database (the test session).
- A request is made to the application itself, in which this object is changed (the application session).
- The object is loaded from the database in the test, but the required changes are not there (the test session).

Let's find out what's going on. Most often, we use two different sessions: one in the application and one in the test. Moreover, in the test, we usually wrap the session in a fixture that prepares the database for tests; after the tests, everything is cleaned up. Below is an example of the application.

A file with the database connection, app/database.py:

```python
"""
Database settings file
"""
from typing import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine, async_sessionmaker
from sqlalchemy.orm import declarative_base

DATABASE_URL = "postgresql+asyncpg://user:password@host:5432/dbname"

engine = create_async_engine(DATABASE_URL, echo=True, future=True)
async_session = async_sessionmaker(bind=engine, class_=AsyncSession, expire_on_commit=False)


async def get_session() -> AsyncGenerator:
    """
    Returns async session
    """
    async with async_session() as session:
        yield session


Base = declarative_base()
```

A file with the model description, app/models.py:

```python
"""
Model file
"""
from sqlalchemy import Integer, String
from sqlalchemy.orm import Mapped, mapped_column

from .database import Base


class Lamp(Base):
    """
    Lamp model
    """
    __tablename__ = 'lamps'

    id: Mapped[int] = mapped_column(Integer, primary_key=True, index=True)
    status: Mapped[str] = mapped_column(String, default="off")
```

A file with the endpoint description, app/main.py:

```python
"""
Main file
"""
import logging

from fastapi import FastAPI, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from .database import get_session
from .models import Lamp

app = FastAPI()


@app.post("/lamps/{lamp_id}/on")
async def check_lamp(
    lamp_id: int, session: AsyncSession = Depends(get_session)
) -> dict:
    """
    Lamp on endpoint
    """
    results = await session.execute(select(Lamp).where(Lamp.id == lamp_id))
    lamp = results.scalar_one_or_none()
    if lamp:
        logging.error("Status before update: %s", lamp.status)
        lamp.status = "on"
        session.add(lamp)
        await session.commit()
        await session.refresh(lamp)
        logging.error("Status after update: %s", lamp.status)
    return {}
```

I have added logging and a few extra requests to the example on purpose to make things clear. Here, a session is created using Depends.
Below is the file with the test example, tests/test_lamp.py:

```python
"""
Test lamp
"""
import logging
from typing import AsyncGenerator

import pytest
import pytest_asyncio
from httpx import AsyncClient
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

from app.database import Base, engine
from app.main import app, Lamp


@pytest_asyncio.fixture(scope="function", name="test_session")
async def test_session_fixture() -> AsyncGenerator:
    """
    Async session fixture
    """
    async_session = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
    async with async_session() as session:
        async with engine.begin() as conn:
            await conn.run_sync(Base.metadata.create_all)
        yield session
        async with engine.begin() as conn:
            await conn.run_sync(Base.metadata.drop_all)
        await engine.dispose()


@pytest.mark.asyncio
async def test_lamp_on(test_session):
    """
    Test lamp switch on
    """
    lamp = Lamp()
    test_session.add(lamp)
    await test_session.commit()
    await test_session.refresh(lamp)
    logging.error("New client status: %s", lamp.status)
    assert lamp.status == "off"

    async with AsyncClient(app=app, base_url="http://testserver") as async_client:
        response = await async_client.post(f"/lamps/{lamp.id}/on")
        assert response.status_code == 200

    results = await test_session.execute(select(Lamp).where(Lamp.id == lamp.id))
    new_lamp = results.scalar_one_or_none()
    logging.error("Updated status: %s", new_lamp.status)
    assert new_lamp.status == "on"
```

This is a regular pytest test that gets a database session from a fixture. In this fixture, all tables are created before the session is returned, and after the test they are deleted. Please note again that the test uses a session from the test_session fixture, while the main code uses the session from the app/database.py file. Even though we use the same engine, different sessions are generated. This is important.

Here is the expected sequence of database requests, at the end of which status = on should be returned from the database.

In the test, I create an object in the database first. This is a usual INSERT through a session from the test. Let's call it Session 1. At this moment, only this session is connected to the database; the application session is not connected yet. After creating the object, I perform a refresh. This is a SELECT of the newly created object with an instance update via Session 1. As a result, I make sure that the object is created correctly and the status field is filled with the needed value, off.

Then, I perform a POST request to the /lamps/1/on endpoint. This is turning on the lamp. To make the example shorter, I don't use a fixture here. As soon as the request starts working, a new session to the database is created. Let's call it Session 2. With this session, I load the needed object from the database and output the status to the log. It is off. After that, I update this status and save the update in the database. The following request is made to the database:

```sql
BEGIN (implicit)
UPDATE lamps SET status=$1::VARCHAR WHERE lamps.id = $2::INTEGER
parameters: ('on', 1)
COMMIT
```

Note that the COMMIT command is also present. Even though the transaction is implicit, its result is instantly available after COMMIT in other sessions.

Next, I make a request to get the updated object from the database using refresh and output the status. Its value is now on. It would seem that everything should work. The endpoint stops working, closes Session 2, and transfers control back to the test. In the test, I make a usual request from Session 1 to get the modified object.
But in the status field, I see the off value. Below is a diagram of the sequence of actions in the code.

[Diagram: sequence of actions in the code]

At the same time, according to all the logs, the last SELECT request to the database was executed and returned status = on. Its value is definitely equal to on in the database at this moment. This is the value that the asyncpg engine receives in response to the SELECT request. So, what happened?

Here is what happened. It turned out that the request made to get a new object did not update the current one but found and used the existing one. In the beginning, I added the lamp object using the ORM. I changed it in another session. When the change was made, the current session knew nothing about it, and the commit made in Session 2 did not trigger the expire_all method in Session 1. To fix this, you can do one of the following:

- Use a shared session for the test and the application.
- Refresh the instance rather than trying to get it from the database.
- Forcibly expire the instance.
- Close the session.

Dependency Overrides

To use the same session, you can simply override the session in the application with the one created in the test. It's easy. To do this, we need to add the following code to the test:

```python
from app.database import get_session  # the dependency being overridden


async def _override_get_db():
    yield test_session


app.dependency_overrides[get_session] = _override_get_db
```

If you want, you can wrap this part in a fixture to use it in all tests. The resulting algorithm is as follows:

[Diagram: steps in the code when using dependency overrides]

Below is the test code with session substitution:

```python
@pytest.mark.asyncio
async def test_lamp_on(test_session):
    """
    Test lamp switch on
    """
    async def _override_get_db():
        yield test_session

    app.dependency_overrides[get_session] = _override_get_db

    lamp = Lamp()
    test_session.add(lamp)
    await test_session.commit()
    await test_session.refresh(lamp)
    logging.error("New client status: %s", lamp.status)
    assert lamp.status == "off"

    async with AsyncClient(app=app, base_url="http://testserver") as async_client:
        response = await async_client.post(f"/lamps/{lamp.id}/on")
        assert response.status_code == 200

    results = await test_session.execute(select(Lamp).where(Lamp.id == 1))
    new_lamp = results.scalar_one_or_none()
    logging.error("Updated status: %s", new_lamp.status)
    assert new_lamp.status == "on"
```

However, if the application uses multiple sessions (which is possible), this may not be the best way. Also, if commit or rollback is not called in the tested function, this will not help.

Refresh

The second solution is the simplest and most logical. We should not create a new request to get the object. To update it, it is enough to call refresh immediately after processing the request to the endpoint. Internally, it calls expire, so the saved instance is not reused for the new request and the data is filled in anew. This solution is the most logical and easiest to understand.

```python
await test_session.refresh(lamp)
```

After it, you do not need to try to load the new_lamp object again; it is enough to check the same lamp. Below is the code scheme using refresh:

[Diagram: steps in the code when using refresh]

Below is the test code with the update.
```python
@pytest.mark.asyncio
async def test_lamp_on(test_session):
    """
    Test lamp switch on
    """
    lamp = Lamp()
    test_session.add(lamp)
    await test_session.commit()
    await test_session.refresh(lamp)
    logging.error("New client status: %s", lamp.status)
    assert lamp.status == "off"

    async with AsyncClient(app=app, base_url="http://testserver") as async_client:
        response = await async_client.post(f"/lamps/{lamp.id}/on")
        assert response.status_code == 200

    await test_session.refresh(lamp)
    logging.error("Updated status: %s", lamp.status)
    assert lamp.status == "on"
```

Expire

But if we change a lot of objects, it might be better to call expire_all. Then all instances will be read from the database, and consistency will not be broken.

```python
test_session.expire_all()
```

You can also call expire on a particular instance, and even on an instance attribute.

```python
test_session.expire(lamp)
```

After these calls, you will have to read the objects from the database manually. Below is the sequence of steps in the code when using expire:

[Diagram: steps in the code when using expire]

Below is the test code with expire:

```python
@pytest.mark.asyncio
async def test_lamp_on(test_session):
    """
    Test lamp switch on
    """
    lamp = Lamp()
    test_session.add(lamp)
    await test_session.commit()
    await test_session.refresh(lamp)
    logging.error("New client status: %s", lamp.status)
    assert lamp.status == "off"

    async with AsyncClient(app=app, base_url="http://testserver") as async_client:
        response = await async_client.post(f"/lamps/{lamp.id}/on")
        assert response.status_code == 200

    test_session.expire_all()
    # OR:
    # test_session.expire(lamp)

    results = await test_session.execute(select(Lamp).where(Lamp.id == 1))
    new_lamp = results.scalar_one_or_none()
    logging.error("Updated status: %s", new_lamp.status)
    assert new_lamp.status == "on"
```

Close

In fact, the last approach, closing the session, also calls expire_all, but the session can still be used afterwards, and when reading the new data, we will get the up-to-date objects.

```python
await test_session.close()
```

This should be called immediately after the request to the application is completed and before the checks begin. Below are the steps in the code when using close:

[Diagram: steps in the code when using close]

Below is the test code with session closure:

```python
@pytest.mark.asyncio
async def test_lamp_on(test_session):
    """
    Test lamp switch on
    """
    lamp = Lamp()
    test_session.add(lamp)
    await test_session.commit()
    await test_session.refresh(lamp)
    logging.error("New client status: %s", lamp.status)
    assert lamp.status == "off"

    async with AsyncClient(app=app, base_url="http://testserver") as async_client:
        response = await async_client.post(f"/lamps/{lamp.id}/on")
        assert response.status_code == 200

    await test_session.close()

    results = await test_session.execute(select(Lamp).where(Lamp.id == 1))
    new_lamp = results.scalar_one_or_none()
    logging.error("Updated status: %s", new_lamp.status)
    assert new_lamp.status == "on"
```

Calling rollback() will help as well. It also calls expire_all, but it explicitly rolls back the transaction. If the transaction needs to be executed, commit() also executes expire_all. But in this example, neither rollback nor commit is relevant, since the transaction in the test has already been completed, and the transaction in the application does not affect the session from the test. In fact, this feature only works in SQLAlchemy ORM in async mode in transactions.
However, the behavior in which I explicitly make a request to the database to get a new object, yet still receive a cached one rather than the value actually read from the database, seems illogical. It can be confusing when debugging the code, but when the session is used correctly, this is how it is supposed to work.

Conclusion

When working in async mode with SQLAlchemy ORM, you have to keep track of transactions and of what happens in parallel sessions. If all this seems too difficult, use SQLAlchemy ORM in synchronous mode. Everything is much simpler there.
When building real-time multimedia applications, the choice of server technology is pivotal. Two big players in this space are Janus and MediaSoup, both enabling WebRTC capabilities but doing so in distinctly different ways. This comprehensive guide aims to provide a deep dive into the architecture, code examples, and key differentiators of each, helping you make an informed choice for your next project.

The Role of a WebRTC Server

Before diving into the specifics, let's clarify what a WebRTC server does. Acting as the middleman in real-time web applications, a WebRTC server manages a plethora of tasks like signaling, NAT traversal, and media encoding/decoding. The choice of server can significantly affect the performance, scalability, and flexibility of your application.

Janus: The General-Purpose WebRTC Gateway

Janus is an open-source, general-purpose WebRTC gateway designed for real-time communication. Its modular architecture, broad protocol support, and extensive capabilities make it one of the most popular solutions in the realm of real-time multimedia applications. Janus serves as a bridge between different multimedia components, translating protocols and enabling varied real-time functionalities. It's not just restricted to WebRTC but also supports SIP, RTSP, and plain RTP, among other protocols. Janus can be extended using a plugin architecture, making it suitable for various use cases like streaming, video conferencing, and recording.

Architecture

- Core design: Janus follows a modular architecture, acting as a gateway that can be customized using plugins.
- Plugins: The real functionality of Janus is implemented via plugins, which can be loaded and unloaded dynamically. Pre-built plugins for popular tasks like the SIP gateway or Video Room are available.
- API layer: Janus exposes APIs via HTTP and WebSocket. It communicates with clients through a JSON-based messaging format, offering a higher-level interface for applications.
- Media engine: Janus leans on GStreamer and libav for media-related tasks but isn't exclusively tied to them. You can use different engines if desired.
- Scalability: Janus is designed for horizontal scalability, which means you can add more instances behind a load balancer to handle more connections.
- Session management: Janus maintains user sessions and allows for complex state management.

Code Example

Here's a simplified JavaScript snippet using Janus.js to create a video room:

```javascript
var janus = new Janus({
  server: "wss://your-janus-instance",
  success: function() {
    janus.attach({
      plugin: "janus.plugin.videoroom",
      success: function(pluginHandle) {
        pluginHandle.send({ message: { request: "join", room: 1234 } });
      },
      onmessage: function(msg, jsep) {
        // Handle incoming messages and media
      }
    });
  }
});
```

Advantages

- Modular and extensible design.
- A rich ecosystem of pre-built plugins.
- Wide protocol support.
- Horizontal scalability.

Disadvantages

- Steeper learning curve due to its modular nature.
- The primary language is C, which may not be preferred for web services.

MediaSoup: The WebRTC Specialist

MediaSoup is an open-source WebRTC Selective Forwarding Unit (SFU) that specializes in delivering a highly efficient, low-latency server-side WebRTC engine. Designed for simplicity, performance, and scalability, MediaSoup is often the go-to choice for building cutting-edge, real-time video conferencing solutions. In this article, we'll dive deep into what MediaSoup is, its architecture, design, advantages, and disadvantages, topped off with a code snippet to get you started.
MediaSoup serves as a powerful WebRTC SFU, facilitating real-time transport of video, audio, and data between multiple clients. Its primary focus is on achieving low latency, high efficiency, and performance in multi-party communication scenarios. Unlike some other solutions that attempt to be protocol-agnostic, MediaSoup is a WebRTC specialist.

Architecture

- Worker processes: MediaSoup utilizes a multi-core architecture with workers running in separate Node.js processes, taking full advantage of modern CPU capabilities.
- Routers: Within each worker process, routers manage the media streams. They decide which media streams to forward to which connected clients.
- Transports: Transports handle the underlying communication layer, dealing with DTLS, ICE, and other transport protocols essential for WebRTC.
- Producers and consumers: MediaSoup uses producers to represent incoming media streams and consumers for outgoing media streams.

Code Example

On the server side, using Node.js:

```javascript
const mediaSoup = require("mediasoup");

let router;

(async () => {
  // createWorker() returns a promise, so it must be awaited before use
  const mediaSoupWorker = await mediaSoup.createWorker();
  const mediaCodecs = [
    { kind: "audio", mimeType: "audio/opus", clockRate: 48000, channels: 2 }
  ];
  router = await mediaSoupWorker.createRouter({ mediaCodecs });
})();
```

On the client side:

```javascript
// Device comes from the mediasoup-client package
const device = new Device();
await device.load({ routerRtpCapabilities: router.rtpCapabilities });
const sendTransport = await device.createSendTransport(transportOptions);
```

A short follow-up sketch showing how a local track is then produced over this transport appears at the end of this comparison.

Advantages

- Low latency and high efficiency.
- Vertical scalability.
- Modern C++ and JavaScript codebase.

Disadvantages

- Limited to WebRTC.
- Requires you to build higher-level features yourself.

Architectural Comparisons

- Modularity vs. focus: Janus is modular, enabling a wide range of functionalities through plugins. MediaSoup, however, offers a more streamlined and focused architecture.
- API and usability: Janus provides a higher-level API exposed through HTTP and WebSocket, while MediaSoup offers a more programmatic, lower-level API.
- Scalability: Janus focuses on horizontal scalability, whereas MediaSoup is optimized for vertical scalability within a single server.
- Session management: Janus manages sessions internally, while MediaSoup expects this to be managed by the application layer.
- Protocol support: Janus supports multiple protocols, but MediaSoup is specialized for WebRTC.

Conclusion: Making the Right Choice

Both Janus and MediaSoup are robust and capable servers but serve different needs:

- Choose Janus if you want a modular, highly extensible solution that can handle a variety of real-time communication needs and protocols.
- Choose MediaSoup if your primary concern is performance and low latency in a WebRTC-centric environment.

Understanding their architectural differences, advantages, and disadvantages will help you align your choice with your project's specific needs. Whether it's the modular and expansive ecosystem of Janus or the high-performance, WebRTC-focused architecture of MediaSoup, knowing what each offers can equip you to make a well-informed decision.
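As promised above, here is a short, illustrative continuation of the MediaSoup client-side snippet (a sketch, not a complete application): it captures a microphone track and produces it over the send transport. It assumes the browser grants getUserMedia access and that your signaling layer already handles the transport's "connect" and "produce" events.

```javascript
// Capture a local audio track from the microphone.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

// Hand the track to MediaSoup; the SFU can now forward it to consumers on other clients.
const producer = await sendTransport.produce({ track });

producer.on("transportclose", () => {
  // The underlying transport was closed; release the local capture device.
  track.stop();
});
```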
In the world of integration and data exchange, XML (eXtensible Markup Language) continues to play a crucial role due to its flexibility and widespread adoption. However, to ensure seamless communication between different systems, validating XML data against predefined rules becomes essential. In this article, we will explore the process of validating XML requests against an XML schema in Mule 4.

The steps involved in validating an XML schema in Mule 4 are as follows:

1. Create a Mule project
2. Define an XML schema
3. Add the schema to the project
4. Add and configure the XML validation module
5. Perform data validation
6. Handle validation results

Before We Start, Let's Understand What XML Schema Is

XML schema, also known as XSD (XML Schema Definition), is written in XML format and includes elements and attributes that describe the structure of the XML data. It serves as a set of rules or constraints defining the allowed elements, attributes, and relationships in an XML document. XML schema is used to ensure data consistency and validation in XML-based systems. XML Schema provides a standardized way to define the structure of XML documents, making it easier to share and understand the data requirements across different applications.

Use Case: XML Schema Validation

XML payload:

```xml
<root>
  <success>true</success>
  <message>XML schema validated</message>
  <number>1234</number>
</root>
```

For this tutorial, we will pass the above payload with a POST request to the /xml endpoint and validate it using an XML schema. I have created a demo project with a /json endpoint and an /xml endpoint for JSON and XML validation, respectively. In this tutorial, we will use the /xml endpoint. For XML validation, we will use xml-validation-flow with the /xml endpoint. Before configuring the validation module, if we make a call to the /xml endpoint with the above XML payload, it returns a 200 status code with a success message.

Steps To Validate the Payload Against the XML Schema

Step 1: Add the XML module to the project in Anypoint Studio from Anypoint Exchange. Verify whether the XML module has been added or not.

Step 2: Prepare the XML schema. To create the XML schema, you can use any free online XML-to-XML-schema generator tool. (For this tutorial, I have used this one.) Select "XML to XSD" from the left-side menu. In the schema generator, add the payload and generate the schema. Generated schema for the above XML payload:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Liquid Technologies Online Tools 1.0 (https://www.liquid-technologies.com) -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="success" type="xs:boolean" />
        <xs:element name="message" type="xs:string" />
        <xs:element name="number" type="xs:unsignedShort" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Step 3: In the project's "src/main/resources" folder, create a folder with the name "schemas" (or any name you prefer). Create a file with the name "xml-schema.xsd" inside the "schemas" folder. Note: Make sure that the file type of the XML schema is ".xsd".

Step 4: Copy and paste the generated XML schema into the "xml-schema.xsd" file.

Step 5: Add the XML "Validate schema" processor before the "Transform Message" (or any other processor that needs the validated XML data). Configure the XML schema path in the "Validate schema" processor.
Step 6: In the "Error handling" section, add an "On Error Propagate" scope and set its error type to "XML-MODULE:SCHEMA_NOT_HONOURED". Step 7: Add the "httpStatus" variable and error message in the error handler. (In this tutorial, I have set the "httpStatus" to 400 for the error response). Step 8: Set the httpStatus variable as the "Status code" in the HTTP Listener's "Responses" section. Step 9: Deploy the project and send a valid request to the /xml endpoint. We should get a successful response with a status code of 200. Now, make some changes to the request payload. We should get an error response with a status code of 400. (Here, I changed the "message" field to "messages"). Step 10: At this point, if we pass any extra fields in the XML request body, it will not allow the extra field to pass and will return an error message with 400 as the status code. Note: In the example above, the "id" field is extra; we did not define the "id" field in the XML schema, and that's why the request failed. If we want to allow extra fields that are not defined in the schema, we have to add "<xs:any processContents="lax" minOccurs="0" />" to the schema. Step 11: Save the schema and redeploy the app. Now make a call to the /xml endpoint with extra fields in the request payload. It should allow the extra field to pass and should return a successful response. This way, we can validate an XML request payload against an XML schema. For more information about the XML schema properties, please refer to the documentation here. You can get the source code of the above application here. I hope you find this tutorial helpful.
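If you want to check a payload against the generated XSD outside of Mule (for example, in a quick local experiment before wiring up the flow), the snippet below is a minimal sketch in Python using the lxml library; the schema path and payload are illustrative, and this is not part of the Mule project itself.
Python
# Minimal sketch: validate the sample payload against the XSD generated in Step 2.
# The schema path and payload are illustrative.
from lxml import etree

schema = etree.XMLSchema(etree.parse("schemas/xml-schema.xsd"))

payload = b"""<root>
  <success>true</success>
  <message>XML schema validated</message>
  <number>1234</number>
</root>"""

doc = etree.fromstring(payload)
if schema.validate(doc):
    print("Payload honours the schema (Mule would return 200)")
else:
    # Comparable to the XML-MODULE:SCHEMA_NOT_HONOURED error in Mule
    print("Schema violation:", schema.error_log)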
JetBrains IDEs based on the IntelliJ platform are probably one of the most common IDEs in existence nowadays. Their popularity is especially visible within the JVM languages community, where IntelliJ IDEA remains the go-to IDE pick for most developers. All of this is despite some new competitors showing up and old competitors overcoming their previous shortcomings and coming back to the table. In this text, I would like to describe the plugins for IntelliJ IDEA that may be a great help in your daily fight with your tasks and that will make your work easier. Some plugins will be language agnostic, while others can be language dependent. Downloadable Plugins Linter Static code checking is a great tool and helps us fight for our code quality. Additionally, it can give us an entry point into the overall system state when we start working on an already existing system. Fortunately, there is also a great number of plugins we can use to run such checks. SonarLint is probably the chief amongst them and is especially helpful when you are using SonarQube in your CI process – you can integrate your local SonarLint to use the same rules as CI SonarQube. As for overall UX, using SonarLint from the IDE gives quite a good feeling, but Sonar is a relatively simple tool from the user's perspective, so that should be expected. Some checks on the overall project could be faster, but after a certain number of classes, it is understandable. It can also be a reasonable way to enforce some general practices among the team. If you want to use some other static check tools, I am aware that: PyCharm supports the Pylint plugin. WebStorm supports ESLint. Probably other IDEs support other more specialized linters, but I have no experience working with them. Kubernetes Most of us nowadays are using Kubernetes in one way or another via self-hosted or managed cloud services. The Kubernetes Plugin can help you interact with your K8s deployments, as it provides an extensive set of functionality for working with Kubernetes. The most notable of them are: Browsing cluster objects Extracting and editing configurations Describing deployments and pods Viewing and downloading pod logs Attaching pod console Running shell in a pod Forwarding ports to a pod Additionally, the plugin adds support for working with Kubernetes remotely (or locally) from your IDE. De facto, it adds a UI over kubectl to the functionalities of the IDE. If you are bored or tired of using your other Kubernetes tools like kube-dashboard or Lens, then give the K8s IDE plugin a try, as it can be the way to go for you. As far as I know, the plugin is supported by all JetBrains IDEs. .ignore Probably 110% of us work with some version control systems (some with more than one) - either Git, Mercurial, or, god forbid, SVN or anything older. Additionally, we are working with software that sometimes requires a tremendous amount of configuration that we may not want to share with others. In such circumstances, the need to “hide” some files from others and not send them to remote repositories or not include them in our Docker containers is totally understandable. Of course, most of the tools offer their own type of ignore files - files that allow us to exclude certain other files from being sent to remote places - like ".gitignore" or ".dockerignore," but their default support in the IDE is neither great nor terrible: it just is. The .ignore plugin aims to help us work with such files through syntax highlighting and rules inspection.
Moreover, the plugin can mark excluded files in the IDE project view based on the configuration from a particular ".*ignore" file. Besides support for the previously mentioned ".gitignore" and ".dockerignore" files, it supports other file types like ".npmignore" or ".helmignore". The full list of supported files is long and available on the plugin home page. Key Promoter X Using hotkeys and keyboard shortcuts for doing stuff inside the IDE is a great way to speed up your development process. Additionally, a good set of such keys can greatly improve your general UX of using the tool. However, remembering all the shortcuts, or even the bigger part of them - in fact, anything besides the ones that we are using every day - can be at least problematic, if not impossible. As in most modern-day IDEs, they number in dozens, and our brain-built-in RAM cannot contain them all, especially when we are using at most 5-10 shortcuts in our daily work (I do not have any hard data, it is just an educated guess based on some of my experience). Here comes the Key Promoter X plugin to the rescue. The plugin knows all the hotkeys and keeps reminding us about them each time we use a feature by manually clicking instead of using a particular shortcut. It does so by showing a nice pop-up in the bottom-right corner of the IDE. Here you can see that I missed the shortcut for opening the terminal window 37 times and that Key Promoter reminds me of the correct one - right in my face. I agree that in the long term, such pop-ups can be distracting and annoying. However, the Key Promoter allows us to disable certain alerts, just as you can see on the screen. Thus, the number of alerts can be greatly decreased. Moreover, you can just configure the promoter to work only for certain shortcuts from the start. Personally, I like to have this pop-up anyway. Maybe by accident, I will learn something new and useful. Using the Key Promoter can save you time reading IDE docs. Additionally, Key Promoter can be a good way to learn IDE hotkeys if you switch between systems - i.e., from Linux to Mac, or vice versa. Cloud Tools Probably a fair share of us (software engineers) are using some kind of cloud. You name it: AWS, GCP, Azure, or some other less commonly known provider. Fortunately for you and me (I am also using a cloud), JetBrains IDEs also have plugins for that, namely: AWS Toolkit Azure Toolkit for IntelliJ Google Cloud Code Alibaba Cloud Toolkit (Alibaba is one of the world's biggest cloud providers and the chief player in Asian markets) In general, the plugins allow you to interact with your chosen cloud from the IDE and manage your cloud services without changing the windows you work on. A deeper description of all of them in detail is worth an article itself, so I have just added links to each plugin homepage on the JetBrains marketplace - one probably cannot find better intro-level descriptions. AI Coding Buddy The importance of prompt engineering and machine-learning-based code helpers cannot be overstated nowadays. As the saying goes – you will not be replaced by AI, but you will be replaced by a person using AI. No matter if you prefer Copilot or ChatGPT, JetBrains IDEs have plugins for all of that. Each tool has its own unique plugin – in the case of ChatGPT there are even a few plugins, so you can choose whichever suits you best. Of course, some problems may arise in case you are interested in less commonly known coding helpers, but maybe there is also a plugin for them.
There are even plugins for quick Stack Overflow search (more than one) if you prefer a more “old-fashioned” approach to prompt-supported coding. .env Files Support This is a great plugin, especially when you are working a lot with all kinds of environment variables. It promises and delivers better support for name completion, go-to usage and definition (same as the normal go-to included in the base IDE), and of course, syntax highlighting. Such a set of features can be very helpful while working with Docker or Docker Compose files, which in many cases have at least a couple of environment variables inside. Additionally, if you are using PyCharm and .env files, the plugin also promises additional support over the one provided by the IDE. Here is an example of .env files supported in PyCharm. Without plugin: With plugin: For me, the colored one looks better, and for you? Rainbow Brackets This is a very interesting and not-so-little plugin implementing a very simple yet extremely useful idea. It is just using different colors to mark openings and closings of brackets inside our code. Additionally, each variable has its unique color, which is constant through all of its usage in a particular scope; thus, we can easily catch which variable is used where. Unfortunately, the plugin is not open source, and only a part of the features is available in the free version. On the other hand, the cost is relatively low - a yearly subscription costs $15 USD (full pricing can be viewed here). Below is a simple presentation of what Rainbow is doing with your code. For me, it looks way more readable than plain old IntelliJ white, and you? Image from the official Rainbow Brackets plugin page gRPC Even if you are not a particular fan of Google and its doings, you must have heard about gRPC. If not, then this piece of text may be interesting for you. Over recent years, gRPC has gained quite an audience. JetBrains also addressed the issue of its support through their IDEs. Thus, the gRPC plugin was created. It adds standard IDE support like syntax highlighting and go-to options for ".proto" files alongside some easily available documentation for gRPC building blocks. What is more, it allows us to create gRPC calls in the IDE's built-in HTTP client, effectively giving us a gRPC client we can use to call local and remote APIs. They also have decent documentation on how to do that - here is the link. Randomness This is quite a powerful utility plugin that specializes in generating all kinds of dummy data. The plugin is especially useful when writing tests – personally, I always have a problem with all the naming there, and in most cases, I end up with things like String testName = “test-{n}”. As for now, the plugin supports five basic types of data: Integers, such as 7,826,922, in any base from binary to hexatrigesimal (base 36) Decimals, such as 8,816,573.10, using customizable separators Strings, such as "PaQDQqSBEH," with custom symbol lists Words, such as "Bridge," with custom word lists UUIDs, such as 0caa7b28-fe58-4ba6-a25a-9e5beaaf8f4b, with or without dashes String Manipulation The plugin can do all kinds of magic with plain text for you. First of all, it gives you the possibility to easily switch the case of your text between kebab-case/snake_case and PascalCase/camelCase. Besides that, it allows for things like encoding text to HTML. Moreover, it can do all kinds of operations on plain text - swap words, reverse letters or multi-replace, and many others.
I advise you to visit the plugin home page and check its complete feature list. You may find the one feature that you were missing until this point, and that will change your view. IdeaVim The plugin adds an extensive set of Vim features to the IDE, from simple inserts and removes to Vim hotkeys. It also supports Vim macros and plugins, effectively creating a fully functional Vim layer over your IDE. Personally, I am not a fan; however, I can see certain benefits, especially if you are a Vim fan and have high proficiency in using it. In such cases, the plugin can boost your coding speed. On the other hand, if you are a Vim newcomer, the plugin can also be a decent way of learning how to use Vim – at least quitting Vim is easier here than in the terminal. CPU Usage Indicator A “small” utility plugin that adds information about our current CPU usage in the bottom right corner of the IDE screen. Additionally, it adds information about the system CPU consumed by the IDE itself. It also has some options that may be especially useful for troubleshooting potential IDE memory problems, like taking a thread dump from the last IDE freeze. Just please keep in mind that constantly asking for CPU usage can be an “expensive” operation. Nyan Progress Bar Here comes the real champion of all the plugins for JetBrains IDEs, the plugin that will change your life and the way you are using your IDE, the Nyan Progress Bar. The plugin replaces the classic JetBrains progress bar with a super extra Nyan Cat animation. There is nothing more I can say here but join me in the Nyan Progress Bar club – it is totally worth it. Themes Bundles The ability to customize the look of our IDE - probably the most viewed single window in our daily life - and express ourselves in some way may be quite an important thing for many people. Thus, JetBrains IDEs also have plugins for that - in fact, quite a few of them, starting from “simple” color changes in the form of plugins like Material Theme UI through additional icon packs in the form of plugins like Atom Material Icons. Everyone can pick something which suits their needs – just be careful. Choosing and customizing your perfect color design can take a very, very long time (trust me - been there, done that, wasted a lot of time). JMH Plugin If you are a software engineer related to the JVM ecosystem, you probably have heard about JMH – the microbenchmark framework for JVM applications. This plugin adds full support for JMH to the IDE. The level of support it provides is on par with the one the IDE already has for libraries like JUnit or TestNG. We can run a single benchmark or the whole suite from IntelliJ, or pass the configuration from the standard IDE window. Scala I would not be myself (Scala Software Engineer) if I did not mention this plugin. It adds full support for Scala syntax, build tools, and test libraries. Basically, it provides a similar level of support to what Java has in IntelliJ. Of course, there are some corner cases – like more complex implicits or Scala 3 support, but nevertheless, the level of support as a whole is pretty good. The plugin even has its own blog and Twitter profile, so if you want to know what is up in the plugin, both things may be worth following or checking from time to time. IntelliJ with this plugin is by far my favorite Scala IDE despite Metals rapidly growing in strength and fame. Built-In Plugins Docker The plugin was added to the default IDE plugin bundle in the November 2017 release.
It focuses on extending the already existing set of IDE capabilities with Docker integration. It allows for connecting to a local or remote Docker runtime and its images. Besides, it adds standard (go-to, syntax highlight) support for Docker Compose and Dockerfiles, making them easier to work with. Additionally, it allows running and debugging Docker images from the IDE. Despite not being as advanced as Docker Desktop, it can be a reasonable replacement if, for some reason, you cannot use it. Lombok This plugin is a vital point of interest for Java software engineers as it adds standard IDE support for Lombok annotations. Lombok, on the other hand, tries to address a few mundane problems the Java language has. Whether the Lombok way is correct or not is another matter, and it is quite out of the scope of this article. The plugin is relatively “simple;” however, it is an interesting case to observe JetBrains’ reaction to feedback from their community. The plugin started as a community plugin and was then added by JetBrains to the standard IntelliJ plugins bundle based on the feedback from the users' community. Summary JetBrains IDEs are quite powerful beasts by themselves, but via the usage of plugins, we can bring their set of features to a whole new level. We essentially end up with an all-in-one machine for doing all kinds of things related to our daily work without even switching windows: that is the level of time-management optimization. Thank you for your time.
If you are looking to create App Connect resources programmatically or provide your own monitoring and administration capabilities, this article offers an introduction to using the public API, with a worked example using Bash for those looking to get started. With the App Connect public API, you can programmatically perform the following tasks: Create and administer integration runtimes. Create and administer configurations. Create and administer BAR files. Create and administer traces for an integration runtime. In Part One, we are going to: Create a configuration that gives us access to a GitHub repository that stores a BAR file. In this example, my repository is public, so the configuration won't actually be used; if you were to use a private repository, the configuration would be necessary so App Connect can authenticate and pull in the BAR file from your repository. Create an integration runtime that will use the configuration. Call the flow’s endpoint that's running on our integration runtime. In Part Two, we will then do something more advanced that involves: Creating a new BAR file that contains a different flow from what is in the GitHub repository. Creating a new integration runtime that will run this BAR file. Editing the first integration runtime to use the BAR file that we have created and observing the changes. We will finish by cleaning up all of the above resources. At the time of this post, you can use this feature in the following regions: North Virginia, Frankfurt, London, Sydney, Jakarta, and Mumbai. Prerequisites You need a current App Connect trial or paid subscription. For more information, see App Connect on AWS. From the App Connect Dashboard for your instance, navigate to the Settings panel and click the "Public API credentials" tab. Click on the "Generate" button and enter a name when prompted. This will give you a client ID and a client secret. Keep these safe and consider them both sensitive. From the same page, you will see a link to generate an API key. You will need to follow the steps on that page as well to get an API key. Once you have the client ID, client secret, and API key, these are not expected to change frequently. The current expiry date at the time of this post is two years. You will use these three pieces of information as well as your App Connect instance ID to generate an access token that you can use to interact with App Connect. You will need to use this access token for your operations. Note that at the time of this post, this access token will be active for a period of 12 hours only. To get the access token, you will need to use your client ID, client secret, API key, and instance ID, with our /api/v1/tokens POST endpoint. Your token will not automatically renew, so make sure you call this often enough to be able to continue with your App Connect API requirements. Here is the API overview. Note that the format of the token is a JSON Web Token and within the token, you will be able to determine its expiry date. Continue only once you have your token. What follows is a worked example that combines the above information with code snippets for you to adjust for your own needs. Note that for this example I have chosen to use Bash. As our API accepts JSON payloads, you will find lots of quotes around the strings: it is expected that you might use a request helper application or perform your API calls using a language that lets you easily create HTTP requests such as JavaScript, TypeScript, Node.js, Go, or Java – the choice is yours.
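As a rough illustration of that token exchange in a general-purpose language, here is a hypothetical sketch in Python; the header and payload field names used for /api/v1/tokens are assumptions on my part, so check the API specification for the actual contract before relying on it.
Python
# Hypothetical sketch: exchange the client ID, client secret, API key, and instance ID
# for a 12-hour access token. Header and payload names are assumptions; consult the
# App Connect API specification for the real contract.
import requests

APPCON_ENDPOINT = "<the endpoint>"        # the regional endpoint you chose
INSTANCE_ID = "<the instance ID>"
CLIENT_ID = "<the client ID>"
CLIENT_SECRET = "<the client secret>"
API_KEY = "<the API key>"

response = requests.post(
    f"https://{APPCON_ENDPOINT}/api/v1/tokens",
    headers={
        "x-ibm-instance-id": INSTANCE_ID,
        "X-IBM-Client-Id": CLIENT_ID,
        "X-IBM-Client-Secret": CLIENT_SECRET,   # assumed header name
        "Accept": "application/json",
    },
    json={"apiKey": API_KEY},                   # assumed payload shape
)
response.raise_for_status()
token = response.json()["access_token"]          # assumed response field (a JWT)
print("Access token acquired; it expires after roughly 12 hours.")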
To view the API documentation, click "API specification" on the Public API credentials page. You can also see the complete API specification in our documentation. Getting Started Make sure you have the following values to hand (in my example, I am going to set them as variables in my Bash shell, because they are going to be used a lot): Your App Connect instance ID, which I will be setting and using in the cURL commands with: export appConInstanceID=<the instance ID> Your App Connect client ID, which I will be setting and using in the cURL commands with: export appConClientID=<the client ID> Your App Connect authentication token (that you made earlier and lasts twelve hours), which I will be setting and using in the cURL commands with: export appConToken=<the App Connect authentication token> The App Connect endpoint that your instance is valid for, which I will be setting and using in the cURL commands with: export appConEndpoint=<the endpoint> The endpoint can be set to any of the regions that we mention above (check the documentation for the latest availability) but be aware of your own data processing and any regional needs - you may want to use the endpoint closest to you, or you may have legal, or data handling, requirements to use only a particular region. You can determine this endpoint from the OpenAPI document that can be downloaded from within App Connect, or via the public documentation we provide. Part One Creating a Configuration Note that configuration documentation applicable for this environment is available on the configuration types for integration runtimes page. In this example, you will create a configuration that stores a personal access token for a GitHub repository that contains a BAR file. Make sure that you have your personal access token to hand, with sufficient access scope and restrictions as you see fit. It is important that you keep this token to yourself. This personal access token is not to be confused with the token you will be using with App Connect. It is an example of a sensitive piece of data (in the form of an App Connect configuration) and is used for simplicity’s sake. Shell myGitHubToken="thetokenhere" gitHubAuthData=" { \"authType\":\"BASIC_AUTH\", \"credentials\":{ \"username\":\"token\", \"password\":\"${myGitHubToken}\" } }" encodedAuthData=$(echo -e "${gitHubAuthData}" | base64) configurationBody="{ \"metadata\": { \"name\": \"my-github-configuration\" }, \"spec\": { \"data\": \"${encodedAuthData}\", \"description\": \"Authentication for GitHub\", \"type\": \"barauth\" } }" curl -X POST https://${appConEndpoint}/api/v1/configurations \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" \ -d "${configurationBody}" If successful, you will then be able to see the configuration in the App Connect Dashboard. You could also perform an HTTP GET call to either list all configurations you have access to, or to get a particular configuration’s details.
To get all instances of a particular resource (in this case, a configuration), you would use the following command: Shell curl -X GET https://${appConEndpoint}/api/v1/configurations \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" To perform an HTTP GET operation on a named resource (in this case, called "my-github-configuration" that we created earlier), you would use the following command: Shell curl -X GET https://${appConEndpoint}/api/v1/configurations/my-github-configuration \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" Creating an Integration Runtime Using the Configuration For more information on creating an integration runtime, see the creating an integration runtime documentation. Now that we have a configuration, we can create an integration runtime that uses the configuration like so: Shell irBody=' { "metadata": { "name": "http-echo-service" }, "spec": { "template": { "spec": { "containers": [ { "name": "runtime" } ] } }, "barURL": [ "https://github.com/<your GitHub org>/<your repo with Bar files in>/raw/main/<your BAR file name>.bar" ], "configurations": [ "my-github-configuration" ], "version": "12.0", "replicas": 1 } }' curl -X POST https://${appConEndpoint}/api/v1/integration-runtimes \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" \ -d "${irBody}" Use the "spec" field to shape your resource. For more information, see the documentation. Note that we prevent the use of resource names that have certain prefixes in their name; e.g., "default-integration-runtime" (nor do we allow you to get, retrieve, create again, or delete it). You can also see this integration runtime running and using the configuration in the App Connect Dashboard. If you click to edit the integration runtime, you can see the configuration that is in use. In my case, I am using my own GitHub repository so I would see this: Once the integration runtime has started successfully, you will be able to invoke your flow as you would any other flow in App Connect. Invoking the Endpoint This depends on the flow that's defined in the BAR file that you've used. In my example, it is a simple HTTP echo service, and I can use the following cURL command to invoke it and receive a response. Consult the App Connect documentation for how you would retrieve the endpoint to use in this case. This request: curl -X POST https://http-echo-service-https-<my App Connect endpoint which has my instance ID in>/Echo gave me this response: <Echo><DateStamp>2023-08-07T13:04:13.143955Z</DateStamp></Echo> Part Two At this point, we know how to use the API to create a couple of simple resources. What we have not yet covered is uploading a BAR file for use in a new integration runtime. We have not yet covered the editing or deleting of resources either. Uploading a New BAR File These instructions assume that you already have a BAR file. To learn how to create a BAR file, refer to the App Connect Enterprise documentation. For more information about what resources are supported in BAR files that you import to App Connect, see the supported resources in imported BAR files documentation.
You can use the App Connect API to upload a BAR file like so: Shell curl -X PUT https://${appConEndpoint}/api/v1/bar-files/TestBlogAPI \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/octet-stream" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" \ --data-binary @/Users/adam/Downloads/TestBlogAPI.bar Notice the use of "--data-binary" here in order to prevent the BAR file from being unusable once uploaded. It is important to note that at the time of this post, the validation of the BAR file occurs when it is used by an integration runtime. In my case, I have downloaded the BAR file from my GitHub repository. You don’t need to include ".bar" in the path for the API call, either. If successful, you will be able to see this BAR file in the App Connect Dashboard. Also, in the HTTP response, you will see the location of this BAR URL on the App Connect content server. It will be of the form: {"name":"TestBlogAPI.bar","url":"https://dataplane-api-dash.appconnect:3443/v1/ac0ikbdsupj/directories/TestBlogAPI?"} Important: you will need to use the "url" part of this in your next command in order to have the integration runtime use this BAR file. Creating a New Integration Runtime That Uses the New BAR File Shell irBody=' { "metadata": { "name": "second-ir-using-bar" }, "spec": { "template": { "spec": { "containers": [ { "name": "runtime" } ] } }, "barURL": [ "the exact URL from the previous step – including the question mark" ], "version": "12.0", "replicas": 1 } }' curl -X POST https://${appConEndpoint}/api/v1/integration-runtimes \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" \ -d "${irBody}" The key differences here are the removal of the configurations section and the different barURL. On success, you should be able to see this integration runtime in the App Connect Dashboard. Updating the First Integration Runtime To Use This BAR File Instead In my example, I now have two integration runtimes that use the same BAR file because I used the one from my GitHub repository. Let’s assume that I want to: Keep the first integration runtime Have it use this BAR file that I’ve uploaded using the API (instead of pulling from GitHub) Delete the second integration runtime We can do this like so: Shell irBody=' { "metadata": { "name": "http-echo-service" }, "spec": { "template": { "spec": { "containers": [ { "name": "runtime" } ] } }, "barURL": [ "the exact URL from earlier – including the question mark" ], "version": "12.0", "replicas": 1 } }' curl -X PUT https://${appConEndpoint}/api/v1/integration-runtimes/http-echo-service \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" \ -d "${irBody}" The BAR URL differs, and we no longer need to provide a configurations section, because the BAR file is now served from the App Connect content server rather than pulled from a GitHub repository that requires authorization. On success, you will again be able to see this integration runtime in the App Connect Dashboard. Cleaning Up Each resource can be cleaned up programmatically through its appropriate delete HTTP request API calls. The order in which you perform these operations doesn't matter.
To delete the configuration created in this example: Shell curl -X DELETE https://${appConEndpoint}/api/v1/configurations/my-github-configuration \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" To delete the BAR file created in this example: Shell curl -X DELETE https://${appConEndpoint}/api/v1/bar-files/TestBlogAPI \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" To delete both integration runtimes created in this example (although if you were following my commands, you should only have the first one): Shell curl -X DELETE https://${appConEndpoint}/api/v1/integration-runtimes/http-echo-service \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" Shell curl -X DELETE https://${appConEndpoint}/api/v1/integration-runtimes/second-ir-using-bar \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" Conclusion Using the public API, you can programmatically create resources for App Connect as you see fit (within the limits we define). The API provides an alternative to using the App Connect Dashboard, although you will be able to see changes in the Dashboard that you've made through the API. The public API provides equivalent functionality to the Dashboard only, and not to the App Connect Designer or App Connect Enterprise Toolkit. This example has demonstrated a broad range of features, but more intricate scenarios using different connectors and flows can be explored.
OpenAI’s GPT has emerged as the foremost AI tool globally and is proficient at addressing queries based on its training data. However, it cannot answer questions about unknown topics: Recent events after Sep 2021 Your non-public documents Information from past conversations This task gets even more complicated when you deal with real-time data that frequently changes. Moreover, you cannot feed extensive content to GPT, nor can it retain your data over extended periods. In this case, you need to build a custom LLM (Large Language Model) app efficiently to give context to the answer process. This piece will walk you through the steps to develop such an application utilizing the open-source LLM App library in Python. The source code is on GitHub (linked below in the section "Build a ChatGPT Python API for Sales"). Learning Objectives You will learn the following throughout the article: The reason why you need to add custom data to ChatGPT How to use embeddings, prompt engineering, and ChatGPT for better question-answering Build your own ChatGPT with custom data using the LLM App Create a ChatGPT Python API for finding real-time discounts or sales prices Why Provide ChatGPT With a Custom Knowledge Base? Before jumping into the ways to enhance ChatGPT, let’s first explore the manual methods and identify their challenges. Typically, ChatGPT is expanded through prompt engineering. Assume you want to find real-time discounts/deals/coupons from various online markets. For example, when you ask ChatGPT “Can you find me discounts this week for Adidas men’s shoes?”, a standard response you can get from the ChatGPT UI interface without having custom knowledge is: As evident, GPT offers general advice on locating discounts but lacks specificity regarding where or what type of discounts, among other details. Now to help the model, we supplement it with discount information from a trustworthy data source. You must engage with ChatGPT by adding the initial document content prior to posting the actual questions. We will collect this sample data from the Amazon products deal dataset and insert only a single JSON item we have into the prompt: As you can see, you get the expected output, and this is quite simple to achieve since ChatGPT is context-aware now. However, the issue with this method is that the model’s context is restricted (GPT-4's maximum text length is 8,192 tokens). This strategy will quickly become problematic when the input data is huge: you may have thousands of items discovered in sales, and you cannot provide this large amount of data as an input message. Also, once you have collected your data, you may want to clean, format, and preprocess data to ensure data quality and relevancy. If you utilize the OpenAI Chat Completion endpoint or build custom plugins for ChatGPT, it introduces other problems as follows: Cost — By providing more detailed information and examples, the model’s performance might improve, though at a higher cost (for GPT-4 with an input of 10k tokens and an output of 200 tokens, the cost is $0.624 per prediction). Repeatedly sending identical requests can escalate costs unless a local cache system is utilized. Latency — A challenge with utilizing ChatGPT APIs for production, like those from OpenAI, is their unpredictability. There is no guarantee regarding the provision of consistent service. Security — When integrating custom plugins, every API endpoint must be specified in the OpenAPI spec for functionality.
This means you’re revealing your internal API setup to ChatGPT, a risk many enterprises are skeptical of. Offline Evaluation — Conducting offline tests on code and data output or replicating the data flow locally is challenging for developers. This is because each request to the system may yield varying responses. Using Embeddings, Prompt Engineering, and ChatGPT for Question-Answering A promising approach you find on the internet is utilizing LLMs to create embeddings and then constructing your applications using these embeddings, such as for search and ask systems. In other words, instead of querying ChatGPT using the Chat Completion endpoint, you would do the following query: Given the following discounts data: {input_data}, answer this query: {user_query}. The concept is straightforward. Rather than posting a question directly, the method first creates vector embeddings through the OpenAI API for each input document (text, image, CSV, PDF, or other types of data), then indexes the generated embeddings for fast retrieval and stores them in a vector database, and finally leverages the user’s question to search and obtain relevant documents from the vector database. These documents are then presented to ChatGPT along with the question as a prompt. With this added context, ChatGPT can respond as if it’s been trained on the internal dataset. On the other hand, if you use Pathway’s LLM App, you don’t even need a vector database. It implements real-time in-memory data indexing, directly reading data from any compatible storage, without having to query a vector document database, which comes with costs like increased prep work, infrastructure, and complexity. Keeping sources and vectors in sync is painful. Also, it is even harder if the underlying input data is changing over time and requires re-indexing. ChatGPT With Custom Data Using LLM App The simple steps below explain a data pipelining approach to building a ChatGPT app for your data with the LLM App. Collect: Your app reads the data from various data sources (CSV, JSON Lines, SQL databases, Kafka, Redpanda, Debezium, and so on) in real-time when a streaming mode is enabled with Pathway (or you can test data ingestion in static mode too). It also maps each data row into a structured document schema for better managing large data sets. Preprocess: Optionally, you do easy data cleaning by removing duplicates, irrelevant information, and noisy data that could affect your responses’ quality, and extracting the data fields you need for further processing. Also, at this stage, you can mask or hide private data to avoid it being sent to ChatGPT. Embed: Each document is embedded with the OpenAI API, and the embedding result is retrieved. Indexing: Constructs an index on the generated embeddings in real time. Search: Given a user question (let’s say from an API-friendly interface), generate an embedding for the query with the OpenAI API. Using this embedding, retrieve the most relevant entries from the index on the fly. Ask: Insert the question and the most relevant sections into a message to GPT. Return GPT’s answer (chat completion endpoint). Build a ChatGPT Python API for Sales Now that we have a clear picture of how the LLM App works from the previous section, you can follow the steps below to understand how to build a discount finder app. The project source code can be found on GitHub.
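Before walking through the full project, here is a small, framework-free sketch of the embed, index, search, and ask loop described above; it assumes the pre-1.0 openai Python client, and the sample documents and model names are only illustrative rather than taken from the LLM App itself.
Python
# Illustrative sketch of the retrieve-then-ask pattern (not the LLM App implementation).
# Assumes the pre-1.0 "openai" client and an OPENAI_API_KEY in the environment.
import numpy as np
import openai

docs = [
    "Deal of the day: Crocs clogs, now $35.48 (list price $49.99).",
    "Lightning deal: Adidas men's running shoes, 30% off this week.",
]

def embed(texts):
    # One embedding vector per input text
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in resp["data"]])

doc_vectors = embed(docs)  # the in-memory "index"

def answer(query):
    q_vec = embed([query])[0]
    # Cosine similarity between the query and every document embedding
    sims = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    best_doc = docs[int(np.argmax(sims))]
    prompt_text = f"Given the following discounts data: \n {best_doc} \nanswer this query: {query}"
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt_text}]
    )
    return chat["choices"][0]["message"]["content"]

print(answer("Can you find me discounts this week for Adidas men's shoes?"))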
If you want to quickly start using the app, you can skip this part, clone the repository, and run the code sample by following the instructions in the README.md file there. Sample Project Objective Inspired by this article around enterprise search, our sample app should expose an HTTP REST API endpoint in Python to answer user queries about current sales by retrieving the latest deals from various sources (CSV, JSON Lines, API, message brokers, or databases) and leverage OpenAI API Embeddings and Chat Completion endpoints to generate AI assistant responses. Step 1: Data Collection (Custom Data Ingestion) For simplicity, we can use any JSON Lines as a data source. The app takes JSON Lines files like discounts.jsonl and uses this data when processing user queries. The data source expects to have a doc object for each line. Make sure that you convert your input data first to JSON Lines. Here is an example of a JSON Lines file with a single row: {"doc": "{'position': 1, 'link': 'https://www.amazon.com/deal/6123cc9f', 'asin': 'B00QVKOT0U', 'is_lightning_deal': False, 'deal_type': 'DEAL_OF_THE_DAY', 'is_prime_exclusive': False, 'starts_at': '2023-08-15T00:00:01.665Z', 'ends_at': '2023-08-17T14:55:01.665Z', 'type': 'multi_item', 'title': 'Deal on Crocs, DUNLOP REFINED(\u30c0\u30f3\u30ed\u30c3\u30d7\u30ea\u30d5\u30a1\u30a4\u30f3\u30c9)', 'image': 'https://m.media-amazon.com/images/I/41yFkNSlMcL.jpg', 'deal_price_lower': {'value': 35.48, 'currency': 'USD', 'symbol': '$', 'raw': '35.48'}, 'deal_price_upper': {'value': 52.14, 'currency': 'USD', 'symbol': '$', 'raw': '52.14'}, 'deal_price': 35.48, 'list_price_lower': {'value': 49.99, 'currency': 'USD', 'symbol': '$', 'raw': '49.99'}, 'list_price_upper': {'value': 59.99, 'currency': 'USD', 'symbol': '$', 'raw': '59.99'}, 'list_price': {'value': 49.99, 'currency': 'USD', 'symbol': '$', 'raw': '49.99 - 59.99', 'name': 'List Price'}, 'current_price_lower': {'value': 35.48, 'currency': 'USD', 'symbol': '$', 'raw': '35.48'}, 'current_price_upper': {'value': 52.14, 'currency': 'USD', 'symbol': '$', 'raw': '52.14'}, 'current_price': {'value': 35.48, 'currency': 'USD', 'symbol': '$', 'raw': '35.48 - 52.14', 'name': 'Current Price'}, 'merchant_name': 'Amazon Japan', 'free_shipping': False, 'is_prime': False, 'is_map': False, 'deal_id': '6123cc9f', 'seller_id': 'A3GZEOQINOCL0Y', 'description': 'Deal on Crocs, DUNLOP REFINED(\u30c0\u30f3\u30ed\u30c3\u30d7\u30ea\u30d5\u30a1\u30a4\u30f3\u30c9)', 'rating': 4.72, 'ratings_total': 6766, 'page': 1, 'old_price': 49.99, 'currency': 'USD'}"} The cool part is that the app is always aware of changes in the data folder. If you add another JSON Lines file, the LLM app does magic and automatically updates the AI model’s response. Step 2: Data Loading and Mapping With Pathway’s JSON Lines input connector, we will read the local JSON Lines file, map data entries into a schema, and create a Pathway Table. See the full source code in app.py: ... sales_data = pw.io.jsonlines.read( "./examples/data", schema=DataInputSchema, mode="streaming" ) Map each data row into a structured document schema. See the full source code in app.py: class DataInputSchema(pw.Schema): doc: str Step 3: Data Embedding Each document is embedded with the OpenAI API, and the embedding result is retrieved. See the full source code in embedder.py: ...
embedded_data = embeddings(context=sales_data, data_to_embed=sales_data.doc) Step 4: Data Indexing Then we construct an instant index on the generated embeddings: index = index_embeddings(embedded_data) Step 5: User Query Processing and Indexing We create a REST endpoint, take a user query from the API request payload, and embed the user query with the OpenAI API. ... query, response_writer = pw.io.http.rest_connector( host=host, port=port, schema=QueryInputSchema, autocommit_duration_ms=50, ) embedded_query = embeddings(context=query, data_to_embed=pw.this.query) Step 6: Similarity Search and Prompt Engineering We perform a similarity search by using the index to identify the most relevant matches for the query embedding. Then we build a prompt that merges the user’s query with the fetched relevant data results and send the message to the ChatGPT Completion endpoint to produce a proper and detailed response. responses = prompt(index, embedded_query, pw.this.query) We followed the same in-context learning approach when we crafted the prompt and added internal knowledge to ChatGPT in the prompt.py. prompt = f"Given the following discounts data: \\n {docs_str} \\nanswer this query: {query}" Step 7: Return the Response The final step is just to return the API response to the user. # Return the generated answer to the user via the HTTP response writer response_writer(responses) Step 8: Put Everything Together Now, if we put all the above steps together, you have an LLM-enabled Python API for custom discount data ready to use, as you can see in the implementation of the app.py Python script. import pathway as pw from common.embedder import embeddings, index_embeddings from common.prompt import prompt def run(host, port): # Given a user question as a query from your API query, response_writer = pw.io.http.rest_connector( host=host, port=port, schema=QueryInputSchema, autocommit_duration_ms=50, ) # Real-time data coming from external data sources such as jsonlines file sales_data = pw.io.jsonlines.read( "./examples/data", schema=DataInputSchema, mode="streaming" ) # Compute embeddings for each document using the OpenAI Embeddings API embedded_data = embeddings(context=sales_data, data_to_embed=sales_data.doc) # Construct an index on the generated embeddings in real-time index = index_embeddings(embedded_data) # Generate embeddings for the query from the OpenAI Embeddings API embedded_query = embeddings(context=query, data_to_embed=pw.this.query) # Build prompt using indexed data responses = prompt(index, embedded_query, pw.this.query) # Feed the prompt to ChatGPT and obtain the generated answer. response_writer(responses) # Run the pipeline pw.run() class DataInputSchema(pw.Schema): doc: str class QueryInputSchema(pw.Schema): query: str (Optional) Step 9: Add an Interactive UI To make your app more interactive and user-friendly, you can use Streamlit to build a front-end app. See the implementation in this app.py file. Running the App Follow the instructions in the "How to run the project" section of the README.md file (linked earlier), and you can start to ask questions about discounts; the API will respond according to the discounts data source you have added. After we give this knowledge to GPT using the UI (applying a data source), look how it replies: The app takes both Rainforest API and discounts.csv file documents into account (merging data from these sources instantly), indexes them in real time, and uses this data when processing queries.
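As a quick smoke test once the pipeline is running, a client call to the REST endpoint might look like the sketch below; the host, port, and root path are assumptions based on the rest_connector configuration above, so adjust them to whatever you passed to run().
Python
# Hypothetical client call to the endpoint exposed by pw.io.http.rest_connector.
# Host, port, and path are assumptions; the payload matches QueryInputSchema above.
import requests

resp = requests.post(
    "http://localhost:8080/",  # adjust to the host and port passed to run()
    json={"query": "Can you find me discounts this week for Adidas men's shoes?"},
    timeout=120,
)
print(resp.text)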
Further Improvements We’ve only explored a few capabilities of the LLM App by adding domain-specific knowledge like discounts to ChatGPT. There are more things you can achieve: Incorporate additional data from external APIs, along with various files (such as JSON Lines, PDF, Doc, HTML, or Text format), databases like PostgreSQL or MySQL, and stream data from platforms like Kafka, Redpanda, or Debezium. Maintain a data snapshot to observe variations in sales prices over time, as Pathway provides a built-in feature to compute differences between two alterations. Beyond making data accessible via API, the LLM App allows you to relay processed data to other downstream connectors, such as BI and analytics tools. For instance, set it up to receive alerts upon detecting price shifts.
First, let’s figure out the terms we are using and why API vs SDK are even paired together. What Is an API? API (an acronym for Application Programming Interface) is an interface that enables intercommunication between two applications. It includes a standardized set of rules that define how this interaction can undergo, i.e., what kind of information to exchange, what actions to carry out, etc. The first software sends out a standardized request, and the second responds in the manner described by the API. Here’s a Metaphor To Explain How an API Works Imagine you want to watch the Succession finale on a Sunday evening. You reach for a TV remote, click the power button, and then choose the HBO channel. In this metaphor, you are the software that wants to interact with other software (a TV) and uses a specific interface (a TV remote). It defines the rules by which you can interact with each other and what responses you can get. APIs are separated into four categories based on their availability and function. Public APIs can be used by any developer, be it an independent programmer or a business employee. It’s typically easily accessible and doesn’t require any specific authorization from you. Partner API is offered by an owner business mainly to other companies and demands stronger authorization, often involving signing contracts. As a result, parties receive the rights and licenses to access the necessary data through partner APIs. It’s mainly used for a business-to-business data transfer with a secure network — for example, to share customer data with someone outside a company. Private API can also be referred to as an internal API. It is used to transfer information within a company. Some examples of its usage include passing data from the front end to the back end or exchanging sensitive data between different departments. Composite API is the API that has more than a single request that it’s able to process. It’s a sequence of requests that are put together and sent out at the same time. There are multiple approaches to an API’s architecture and the ways it is built (REST and gRPC, to name a few), but the core concept and functionality stay the same. Since we are discussing API vs. SDK, let's see how an SDK's definition compares to API. What Is an SDK? An SDK, i.e., Software Development Kit, is a pack of various tools that can be used to build software for a particular platform, like Microsoft Windows or Java. An SDK is comprised of relevant programming languages, compilers, runtime environments, documentation, debuggers, and even samples of code that carry out specific functions. For example, Android’s SDK even has the standard Android buttons you can put in your app. Such kits can become quite large, and the popular ones are often updated with even more features and tools for software developers. Moreover, SDKs commonly include APIs. For example, Android SDK contains multiple APIs for other services and all the necessary elements to create an API for your own product. Thus, if an API is a TV remote, an SDK is a box with all the parts and schemas that you need to build a TV. You could say that posing the question as "API vs SDK" is a false dichotomy: you need both to create a new app, and the former is actually included in the latter. The disambiguation of "API vs. SDK" comes from the fact that they both involve a second technical agent, a platform, or software that you want to work with. I would suggest replacing vs. in API vs. SDK for and, but the SEO gods would not allow me. 
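As a tiny illustration of the request-and-response contract an API defines, the sketch below calls a made-up public REST endpoint; the URL, parameters, and response fields are hypothetical and only meant to show the shape of the interaction.
Python
# Illustration of the API contract described above: the client sends a request in the
# documented format, and the provider answers in the agreed-upon structure.
# The endpoint URL and response fields are hypothetical.
import requests

response = requests.get(
    "https://api.example.com/v1/tv/channels",   # made-up public API endpoint
    params={"genre": "drama"},                  # the request follows the API's rules
    headers={"Accept": "application/json"},
    timeout=10,
)
response.raise_for_status()
for channel in response.json().get("channels", []):
    print(channel.get("name"))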
Now, let’s talk about the use cases for these instruments. As mentioned above, SDKs and APIs are almost ubiquitous in software development, but what are they actually used for? When Do We Need To Use an API vs. SDK? The question “when to use an SDK” has quite a simple answer. Obviously, an SDK comes in handy when you want to create a product for a specific platform: an app for Android or iOS, Microsoft Windows programs, etc. In theory, you could do it yourself completely from scratch and without any assistance, but using SDKs makes the software development process faster, simpler, and more accessible. It’s extremely rare to not use an SDK in software development. APIs are used when you need another app to respond to a specific request under a certain scenario, such as sending a specific set of data, registering a user, carrying out a transaction, etc. APIs are also quite common and are often used in the development of software that entails user authorization, monetary transactions, analysis of data from various sources, messaging, and so on. Let’s say you want to develop an app that lets people on vacation rent seaboats nearby. You can use the Android SDK to help you build the app itself and the API from Google to let people log into it with their Gmail accounts (once again, proving that you don't need to choose between API and SDK). You can also add the PayPal API to let people pay for the boats. Now that we’ve thoroughly reviewed these tools, let’s look at the noticeable distinctions between API and SDK. The Differences Between SDKs and APIs There are three main differences between API and SDK that we can point out. Some were brought up already, but here they are summarized: Functions As covered extensively above, APIs and SDKs solve different problems; one enables you to engage with other software while the other streamlines the development of your software. Thus, their main difference is in the goals they help to accomplish, but there are some more we can point out. Size A software development kit is quite a hefty collection of software-building tools that includes libraries, programming languages, documentation, code samples, and guides to help create new software. All of this can be quite challenging to navigate. Additionally, SDKs take up a lot of space on your computer; for example, the Android SDK is typically around 50GB (depending on the version). APIs, on the other hand, are extremely light and streamlined and only include the details needed to carry out their functions. They are very flexible and can be easily changed and adapted if needed. Usage The actual process of using these tools is also dissimilar. When you want to employ an SDK, you need to download the packs from the respective provider. Oftentimes, SDKs are free, so you don’t need to pay before downloading the files. On the other hand, most APIs from popular companies (social media, payment platforms) are commercial and require a signed contract to operate. The API itself looks like a set of code that specifies the requests to the provider’s systems. The Benefits of API vs. SDK We mentioned repeatedly in this article that SDKs and APIs are used all over and for good reasons. Here are some of the benefits that API vs. SDK provides. SDKs and APIs Save Time on Development One of the main purposes behind inventing these tools in the first place was to cut the time needed for developing new apps and programs.
Software development kits gather all the necessary resources you need to build a product from scratch, so you don’t have to waste your time locating solutions and looking for the best options and ways to build what you want. APIs eliminate the need to write a bunch of code every time you need to exchange information between two endpoints. Since they are standardized, you only need to choose an appropriate API and add it to your software - the job is done. Thus, APIs and SDKs save hours of coding and problem-solving. SDKs and APIs Provide Security Of course, the security of the product will always be one of the top priorities for developers. Companies put a lot of resources into safeguarding their software from potential breaches and loopholes. Trusted SDK and API providers commit to running regular security checks for the tools they offer. This lifts some part of the responsibility off developers’ shoulders and ensures there won’t be any data breaches when using their own product and interacting with other applications. Moreover, there are many open-source APIs whose security can be verified by outside teams. Most Commonly-Used APIs and SDKs Are Regularly Updated The world of software development is, pardon my pun, constantly in development. New technical solutions that make developers more efficient appear all the time. Popular SDKs and APIs, such as the Java Development Kit or Facebook’s Graph API, are updated every couple of months. SDK updates typically add new features and code samples, mitigate security breaches if there are any, etc. All updates are aimed at giving you the best version of the product possible. Thus, you don’t have to revise your own code every couple of months to come up with better ways to authorize your users through Facebook, for example; this is done for you. What Are the Challenges That Come With API vs. SDK? The main challenge with APIs is their volatility; they are quite easy to change, and sometimes, an API provider can introduce updates that are incompatible with your software. This forces you to revise your product and troubleshoot the problems brought about by the update, which takes time and effort. Fortunately, such cases are not that common. As for SDKs, the main challenge for developers is their sheer size. They can be enormous and include lots and lots of tools and examples, which sometimes can be frustrating if you just need to find a couple of functions for your product. Some providers solve this issue by offering different smaller packs designed for more specific needs.