The essential mathematics for artificial intelligence (AI) and quantum computing is foundational to understanding and advancing both fields. In AI, concepts like linear algebra, calculus, probability theory, and optimization are pivotal for modeling data, training machine learning algorithms, and making predictions. Similarly, in quantum computing, these mathematical pillars are indispensable for representing quantum states, designing quantum algorithms, and analyzing quantum phenomena. Whether it's optimizing neural networks or harnessing the power of quantum superposition, a solid grasp of these mathematical principles is crucial for pushing the boundaries of artificial intelligence and quantum computing alike.

Complex Numbers

Complex numbers, which consist of a real and an imaginary part (a + ib), along with complex arithmetic and functions, are fundamental to quantum mechanics. They allow for the representation of quantum states and the mathematical operations performed on them. For example, a qubit state is written |ψ⟩ = α|0⟩ + β|1⟩, where the amplitudes α and β are complex numbers satisfying |α|² + |β|² = 1. In AI, complex numbers have also found applications in areas like neural networks and signal processing.

Linear Algebra

Linear algebra, including concepts like vectors, matrices, linear transformations, and eigenvalues/eigenvectors, is crucial for both quantum computing and many AI techniques. It provides the mathematical framework for representing and manipulating the states and operators in quantum systems, as well as the data structures and algorithms used in AI.

Calculus and Optimization

Calculus and optimization are crucial for training and tuning AI models, as well as for understanding the dynamics of quantum systems. The key concepts to understand are differentiation and integration, gradient-based optimization techniques, and variational methods. A good understanding of convex optimization is an additional asset in the context of optimization algorithms and loss minimization; refer to Convex Optimization by Boyd and Vandenberghe.

Hilbert Spaces

Quantum mechanics utilizes the mathematical structure of Hilbert spaces, which generalize the concepts of vectors and linear algebra to infinite dimensions. This allows quantum states to be represented as vectors in a Hilbert space. Some AI models, such as those based on kernel methods, also make use of Hilbert space structures.

Probability and Statistics

Both quantum computing and AI rely heavily on probability theory and statistical methods. Quantum mechanics describes the probabilistic nature of measurements, while many AI algorithms, like Bayesian networks and reinforcement learning, are built on probabilistic foundations.

Group Theory and Representation Theory

Symmetry groups, unitary transformations, and irreducible representations are advanced mathematical concepts that are important for understanding the underlying structure of quantum systems and some quantum algorithms.

Conclusion

While the depth of understanding required may vary, a solid grasp of these core mathematical areas is essential both for advancing AI, including deep learning, and for developing quantum computing technologies. The essential mathematics for AI and quantum computing shares several key concepts. Linear algebra serves as a cornerstone, enabling the representation of data and quantum states through vectors and matrices. Probability theory underpins both fields, facilitating the understanding of uncertainty in AI models and the probabilistic nature of quantum phenomena.
Optimization techniques play a vital role in training machine learning models and optimizing quantum algorithms. Additionally, concepts from calculus provide the mathematical framework for gradient-based optimization and understanding quantum dynamics. Together, these mathematical foundations form the basis for advancing research and innovation in both AI and quantum computing domains.
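To ground the optimization discussion, here is a minimal, self-contained gradient descent sketch in Java; the quadratic loss f(x) = (x - 3)² and the learning rate are arbitrary choices for illustration only:

Java
public class GradientDescentDemo {
    public static void main(String[] args) {
        // Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
        double x = 0.0;            // starting point
        double learningRate = 0.1; // step size (hyperparameter)
        for (int step = 0; step < 100; step++) {
            double gradient = 2 * (x - 3);
            x -= learningRate * gradient; // move against the gradient
        }
        System.out.println("Converged near x = " + x); // approaches 3.0
    }
}

The same move-against-the-gradient loop, generalized to millions of parameters, is what trains neural networks.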
Wireshark, the free, open-source packet sniffer and network protocol analyzer, has cemented itself as an indispensable tool in network troubleshooting, analysis, and security (on both sides). This article delves into the features, uses, and practical tips for harnessing the full potential of Wireshark, expanding on aspects that may have been glossed over in discussions or demonstrations. Whether you're a developer, a security expert, or just curious about network operations, this guide will enhance your understanding of Wireshark and its applications.

Introduction to Wireshark

Wireshark, originally created by Gerald Combs (under the name Ethereal), is designed to capture and analyze network packets in real time. Its capabilities extend across various network interfaces and protocols, making it a versatile tool for anyone involved in networking. Unlike its command-line counterpart, tcpdump, Wireshark's graphical interface simplifies the analysis process, presenting data in a user-friendly "proto view" that organizes packets in a hierarchical structure. This facilitates quick identification of protocols, ports, and data flows. The key features of Wireshark are:

Graphical user interface (GUI): Eases the analysis of network packets compared to command-line tools
Proto view: Displays packet data in a tree structure, simplifying protocol and port identification
Compatibility: Supports a wide range of network interfaces and protocols

Browser Network Monitors

Firefox and Chrome contain a far superior network monitor tool built into them. It is superior because it is simpler to use and works with secure websites out of the box. If you can use the browser to debug your network traffic, you should do that. In cases where your traffic requires low-level protocol information or originates outside of the browser, Wireshark is the next best thing.

Installation and Getting Started

To begin with Wireshark, visit the official website for the download. The installation process is straightforward, but attention should be paid to the installation of the command-line tools, which may require separate steps. Upon launching Wireshark, users are greeted with a selection of network interfaces. Choosing the correct interface is crucial for capturing relevant data: when debugging a local server (localhost), use the loopback interface; traffic to remote servers will typically appear on the en0 network adapter. You can use the activity graph next to each network adapter to identify active interfaces for capture.

Navigating Through Noise With Filters

One of the challenges of using Wireshark is the overwhelming amount of data captured, including irrelevant "background noise." Wireshark addresses this with powerful display filters, allowing users to hone in on specific ports, protocols, or data types. For instance, filtering TCP traffic on port 8080 can significantly reduce unrelated data, making it easier to debug specific issues. Notice that there is a completion widget at the top of the Wireshark UI that helps you discover valid filter fields. In this case, we filter by port with tcp.port == 8080, the port typically used by Java servers (e.g., Spring Boot/Tomcat). But this isn't enough, as HTTP is more concise: we can filter by protocol by adding http to the filter, which narrows the view to only HTTP requests and responses.
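For reference, a few more display filters that come up constantly in practice; these use standard Wireshark display-filter syntax, and the IP address is just a placeholder value:

tcp.port == 8080 (TCP traffic on a specific port)
http (only HTTP requests and responses)
ip.addr == 192.168.1.25 (traffic to or from a single host)
tcp.port == 8080 && http (filters combine with boolean operators)
tls (TLS handshakes and encrypted records)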
Deep Dive Into Data Analysis

Wireshark excels in its ability to dissect and present network data in an accessible manner. For example, HTTP responses carrying JSON data are automatically parsed and displayed in a readable tree structure within the packet analysis pane. This feature is invaluable for developers and analysts, providing insights into the data exchanged between clients and servers without manual decoding. Wireshark also offers both hexadecimal and ASCII views of the raw packet data.

Beyond Basic Usage

While Wireshark's basic functionalities cater to a wide range of networking tasks, its true strength lies in advanced features such as Ethernet network analysis, HTTPS decryption, and debugging across devices. These tasks, however, may involve complex configuration steps and a deeper understanding of network protocols and security measures. There are two big challenges when working with Wireshark:

HTTPS decryption: Decrypting HTTPS traffic requires additional configuration but offers visibility into secure communications.
Device debugging: Wireshark can be used to troubleshoot network issues on various devices, which requires specific knowledge of network configurations.

The Basics of HTTPS Encryption

HTTPS uses Transport Layer Security (TLS), or its predecessor, Secure Sockets Layer (SSL), to encrypt data. This encryption mechanism ensures that any data transferred between the web server and the browser remains confidential and unaltered. The process involves a series of steps including the handshake, data encryption, and data integrity checks. Decrypting HTTPS traffic is often necessary for developers and network administrators to troubleshoot communication errors, analyze application performance, or ensure that sensitive data is correctly encrypted before transmission. It's a powerful capability for diagnosing complex issues that cannot be resolved by simply inspecting unencrypted traffic or server logs.

Methods for Decrypting HTTPS in Wireshark

Important: Decrypting HTTPS traffic should only be done on networks and systems you own or have explicit permission to analyze. Unauthorized decryption of network traffic can violate privacy laws and ethical standards.

Pre-Master Secret Key Logging

One common method involves using the pre-master secret keys to decrypt HTTPS traffic. Browsers like Firefox and Chrome can log the pre-master secret keys to a file when configured to do so. Wireshark can then use this file to decrypt the traffic:

Configure the browser: Set an environment variable (SSLKEYLOGFILE) to specify a file where the browser will save the encryption keys.
Capture traffic: Use Wireshark to capture the traffic as usual.
Decrypt the traffic: Point Wireshark to the file with the pre-master secret keys (through Wireshark's preferences) to decrypt the captured HTTPS traffic.

Using a Proxy

Another approach involves routing traffic through a proxy server that decrypts HTTPS traffic and then re-encrypts it before sending it to the destination. This method requires setting up a dedicated decryption proxy that can handle the TLS encryption/decryption:

Set up a decryption proxy: Tools like mitmproxy or Burp Suite can act as an intermediary that decrypts and logs HTTPS traffic.
Configure the network to route through the proxy: Ensure the client's network settings route traffic through the proxy.
Inspect traffic: Use the proxy's tools to inspect the decrypted traffic directly.
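To make the key-logging steps concrete, here is a minimal sketch for a Unix-like shell; the log file path is an arbitrary choice, and the browser must be launched from the same shell so it inherits the variable:

$ export SSLKEYLOGFILE=$HOME/tls-keys.log
$ firefox &
# Then, in Wireshark: Preferences > Protocols > TLS, and set
# "(Pre)-Master-Secret log filename" to the same file.

In recent Wireshark versions this setting lives under the TLS protocol preferences; older versions list it under SSL.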
Integrating tcpdump With Wireshark for Enhanced Network Analysis

While Wireshark offers a graphical interface for analyzing network packets, there are scenarios where using it directly may not be feasible due to security policies or operational constraints. tcpdump, a powerful command-line packet analyzer, becomes invaluable in these situations, providing a flexible and less intrusive means of capturing network traffic.

The Role of tcpdump in Network Troubleshooting

tcpdump allows for the capture of network packets without a graphical user interface, making it ideal for use in environments with strict security requirements or limited resources. It operates on the principle of capturing network traffic to a file, which can then be analyzed at a later time or on a different machine using Wireshark.

Key Scenarios for tcpdump Usage

High-security environments: In places like banks or government institutions where running network sniffers might pose a security risk, tcpdump offers a less intrusive alternative.
Remote servers: Debugging issues on a cloud server can be challenging with Wireshark due to its graphical interface; tcpdump captures can be transferred and analyzed locally.
Security-conscious customers: Customers may be hesitant to allow third-party tools to run on their systems; tcpdump's command-line operation is often more palatable.

Using tcpdump Effectively

Capturing traffic with tcpdump involves specifying the network interface and an output file for the capture. This process is straightforward but powerful, allowing for detailed analysis of network interactions:

Command syntax: The basic command structure for initiating a capture involves specifying the network interface (e.g., en0 for wireless connections) and the output file name.
Execution: Once the command is run, tcpdump silently captures network packets. The capture continues until it is manually stopped, at which point the captured data is saved to the specified file.
Opening captures in Wireshark: The file generated by tcpdump can be opened in Wireshark for detailed analysis, utilizing Wireshark's advanced features for dissecting and understanding network traffic.

The following shows the tcpdump command and its output:

$ sudo tcpdump -i en0 -w output
Password:
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
3845 packets captured
4189 packets received by filter
0 packets dropped by kernel

Challenges and Considerations

Identifying the correct network interface for capture on remote systems might require additional steps, such as using the ifconfig command to list the available interfaces. This step is crucial for ensuring that relevant traffic is captured for analysis.

Final Word

Wireshark stands out as a powerful tool for network analysis, offering deep insights into network traffic and protocols. Whether it's for low-level networking work, security analysis, or application development, Wireshark's features and capabilities make it an essential tool in the tech arsenal. With practice and exploration, users can leverage Wireshark to uncover detailed information about their networks, troubleshoot complex issues, and secure their environments more effectively. Wireshark's blend of ease of use and profound analytical depth ensures it remains a go-to solution for networking professionals across the spectrum. Its continuous development and wide-ranging applicability underscore its position as a cornerstone in the field of network analysis.
Combining tcpdump's capabilities for capturing network traffic with Wireshark's analytical prowess offers a comprehensive solution for network troubleshooting and analysis. This combination is particularly useful in environments where direct use of Wireshark is not possible or ideal. While both tools have a steep learning curve due to their powerful and complex feature sets, they collectively form an indispensable toolkit for network administrators, security professionals, and developers alike. This integrated approach not only addresses the challenges of capturing and analyzing network traffic in various operational contexts but also highlights the versatility and depth of the tools available for understanding and securing modern networks.
Executive engineers are crucial in directing a technology-driven organization’s strategic direction and technological innovation. As a staff engineer, it is essential to understand the significance of executive engineering. It goes beyond recognizing the hierarchy within an engineering department to appreciating the profound impact these roles have on individual contributors’ day-to-day technical work and long-term career development. Staff engineers are deep technical experts who focus on solving complex technical challenges and defining architectural pathways for projects. However, their success is closely linked to the broader engineering strategy set by the executive team. This strategy determines staff engineers' priorities, technologies, and methodologies. Therefore, alignment between executive decisions and technical implementation is essential for the engineering team to function effectively and efficiently.

The roles of executive engineers, such as chief technology officers (CTOs) and vice presidents (VPs) of engineering, extend beyond mere technical oversight; they embody the bridge between cutting-edge engineering practices and business outcomes. They are tasked with anticipating technological trends and aligning them with the business’s needs and market demands. In doing so, they ensure that the engineering teams are not just functional but are proactive agents of innovation and growth. For staff engineers, the strategies and decisions made at the executive level deeply influence their work environment, the tools they use, the scope of their projects, and their approach to innovation. Thus, understanding and engaging with executive engineering is essential for staff engineers who aspire to contribute significantly to their organizations and potentially advance into leadership roles. In this dynamic, the relationship between staff and executive engineers becomes a critical axis around which much of the company’s success revolves. This introduction explores why executive engineering is vital from the staff engineer’s perspective and how it shapes an organization's technological and operational landscape.

Hierarchical Structure of Engineering Roles

In the hierarchical structure of engineering roles, understanding the unique responsibilities and contributions of each position (staff engineer, engineering manager, and engineering executive) is crucial for effective career progression and organizational success.

Staff engineers are primarily responsible for high-level technical problem-solving and creating architectural blueprints. They guide projects technically but usually only indirectly manage people.
Engineering managers oversee teams, focusing on managing personnel and ensuring that projects align with organizational goals. They act as the bridge between the technical team and the broader business objectives.
Engineering executives, such as CTOs or VPs of engineering, shape the strategic vision of the technology department and ensure its alignment with the company’s overarching goals. They are responsible for high-level decisions about the direction of technology and infrastructure, often dealing with cross-departmental coordination and external business concerns.

The connection between a staff engineer and an engineering executive is pivotal in crafting and executing an effective strategy. While executives set the strategic direction, staff engineers are instrumental in grounding this strategy with their deep technical expertise and practical insights.
This collaboration ensures that strategic initiatives are both visionary and technically feasible, enabling the organization to innovate while maintaining robust operational standards.

The Engineering Executive’s Primer: Impactful Technical Leadership

Will Larson’s book, The Engineering Executive’s Primer: Impactful Technical Leadership, is an essential guide for those aspiring to or currently in engineering leadership roles. With his extensive experience as a CTO, Larson offers a roadmap from securing an executive position to mastering the complexities of technical and strategic leadership in engineering.

Key Insights From the Book

Transitioning to Leadership
Larson discusses the nuances of obtaining an engineering executive role, from negotiation to the critical first steps post-hire. This guidance is vital for engineers transitioning from technical to executive positions, helping them avoid common pitfalls.

Strategic Planning and Communication
The book outlines how to run engineering planning processes effectively and maintain clear organizational communication. These skills are essential for aligning various engineering activities with company goals and facilitating inter-departmental collaboration.

Operational Excellence
Larson delves into managing crucial meetings, performance management systems, and the strategic hiring and onboarding of new engineers. These processes are fundamental to maintaining a productive engineering team and fostering a high-performance culture.

Personal Management
Managing one’s own priorities and energy is another focus of the book, one that is often overlooked in technical fields. Larson provides strategies for staying effective and resilient in the face of challenges.

Navigational Tools for Executive Challenges
From mergers and acquisitions to interacting with CEOs and peer executives, the book provides insights into the broader corporate interactions an engineering executive will navigate.

Conclusion

The engineering executive’s role is pivotal in setting a vision that integrates with the organization’s strategic objectives, but it is the symbiotic relationship with staff engineers that brings this vision to fruition. Larson’s The Engineering Executive’s Primer is an invaluable resource for engineers at all levels, especially those aiming to bridge the gap between deep technical expertise and impactful leadership. Through this primer, engineering leaders can learn to manage, inspire, and drive technological innovation within their companies.
In today's digital world, mobile apps play a crucial role in our daily lives. They serve a range of purposes, from transactions and online shopping to social interactions and work efficiency, making them essential. However, with their widespread use comes an increased risk of security threats. Ensuring the security of an app requires a comprehensive approach, from development methods through continuous monitoring. Prioritizing security is key to safeguarding your users and upholding the trustworthiness of your app. Remember, security is an ongoing responsibility rather than a one-time task: stay updated on emerging risks and adjust your security strategies accordingly. The following sections discuss the importance of security measures and outline the steps for developing a secure mobile app.

What Is Mobile App Security and Why Does It Matter?

Mobile app security involves practices and precautions to shield apps from vulnerabilities, attacks, and unauthorized entry. It encompasses elements such as data safeguarding, authentication processes, authorization mechanisms, secure coding principles, and encryption techniques.

The Significance of Ensuring Mobile App Security

User trust: Users expect their personal information to be kept safe when using apps. A breach damages trust and reputation.
Compliance with laws and regulations: Most countries have data protection laws, such as GDPR, which organizations are required to adhere to. Not following these regulations can result in penalties.
Financial consequences: Security breaches can lead to financial losses from remediation costs, compensation, and recovery efforts.
Sustaining business operations: A compromised app has the potential to disrupt business functions and affect revenue streams.

Guidelines for Developing a Secure Mobile App

Creating a secure application entails various crucial steps aimed at fortifying the app against possible security risks. The following is a detailed roadmap for constructing such an app.

1. Recognize and Establish Security Requirements
Prior to commencing development, outline the security prerequisites specific to your app. Take into account aspects like authentication, data storage, encryption, and access management.

2. Choose a Reliable Cloud Platform
Choose a cloud service provider that offers strong security functionality. Popular choices include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

3. Ensure Safe Development Practices
• Educate developers on secure coding methods to steer clear of vulnerabilities such as SQL injection, cross-site scripting (XSS), and insecure APIs.
• Conduct routine code reviews to detect security weaknesses at an early stage.

4. Implement Authentication and Authorization Measures
• Employ robust authentication methods like multi-factor authentication (MFA) for heightened user login security.
• Utilize role-based access control (RBAC) to assign permissions based on user roles, limiting access to functionality.

5. Safeguard Data Through Encryption
• Utilize HTTPS for communication between the application and server for in-transit encryption.
• Encrypt sensitive data stored in databases or files for at-rest encryption.

6. Ensure the Security of APIs
• Validate input, employ API keys, and set up rate limiting for API security.
• Securely handle user authentication and authorization with the OAuth and OpenID Connect protocols.

7. Conduct Regular Security Assessments
• Perform penetration testing periodically to identify vulnerabilities.
• Leverage automated scanning tools to detect security issues efficiently.
8. Monitor Activities and Respond to Incidents
• Keep track of app behavior in real time to spot any irregularities or anomalies promptly.
• Have a plan in place for handling security incidents.

What Is Involved in Mobile Application Security Testing?

Implementing robust security testing methods is crucial for ensuring the integrity and resilience of mobile applications. Static application security testing (SAST), dynamic application security testing (DAST), and mobile app penetration testing are fundamental approaches that help developers identify and address security vulnerabilities. These methodologies not only fortify the security posture of apps but also contribute to maintaining user trust and confidence. Let's delve deeper into each of these testing techniques to understand their significance in securing mobile apps effectively.

Static Application Security Testing (SAST)

This method involves identifying security vulnerabilities in applications during the development stage. It entails examining the application's source code or binary without executing it, which helps detect security flaws early in the development process. SAST scans the codebase for vulnerabilities like injection flaws, broken authentication, insecure data storage, and other typical security issues. Automated scanning tools are used to analyze the code and pinpoint problems such as hardcoded credentials, improper input validation, and exposure of sensitive data. By detecting security weaknesses before deployment, SAST allows developers to make the improvements necessary to strengthen the application's security stance. Integrating SAST into the development workflow also aids in meeting industry standards and regulatory mandates. In essence, SAST strengthens mobile application resilience against cyber threats by protecting information and upholding user confidence in today's interconnected environment.

Dynamic Application Security Testing (DAST)

This method is used to test the security of apps while they are running, assessing their security in real time. Unlike static analysis, which looks at the app's source code, DAST evaluates how the app behaves in a live setting. DAST tools emulate real-world attacks by interacting with the app as a user would, sending different inputs and observing the reactions. By analyzing how the app operates during runtime, DAST can pinpoint security issues such as injection vulnerabilities, weak authentication measures, and improper error handling. DAST mainly focuses on uncovering vulnerabilities that may not be obvious from examining the code. Some common techniques used in DAST include fuzz testing, where the app is bombarded with unexpected inputs to reveal vulnerabilities, and penetration testing conducted by ethical hackers to exploit security flaws. By using DAST, developers can detect vulnerabilities that malicious actors could exploit to compromise the confidentiality, integrity, or availability of an app's data. Integrating DAST into mobile app development allows developers to find and fix security weaknesses before deployment, thereby reducing the chances of security breaches and strengthening application security.

Mobile App Penetration Testing

This proactive approach is employed to pinpoint weaknesses and vulnerabilities in apps. It involves simulating real-world attacks to assess the security stance of an application and its underlying infrastructure. Penetration tests can be conducted manually by cybersecurity experts or automated using specialized tools and software.
The testing procedure includes several phases:

Reconnaissance: Gather details about the application's structure, features, and possible attack paths.
Vulnerability scanning: Use automated tools to pinpoint security vulnerabilities in the app.
Exploitation: Attempt to exploit identified vulnerabilities to gain entry or elevate privileges.
Post-exploitation: Document the consequences of breaches and offer recommendations for mitigation.

Mobile app penetration testing helps organizations uncover and rectify security weaknesses, reducing the risk of data breaches, financial harm, and damage to reputation. By evaluating the security of their apps, companies can enhance their security standing and maintain the confidence of their clients. By combining the above methodologies, mobile app security testing helps identify and rectify security vulnerabilities during the development process, ensuring that mobile apps are strong, resilient, and protected against cybersecurity risks. This helps safeguard user data and maintain user trust in today's interconnected world.

Common Mobile App Security Threats

Data Leakage

Data leakage refers to the unauthorized exposure of sensitive information stored or transmitted via mobile apps. This poses significant risks for both individuals and companies, including identity theft, financial scams, damage to reputation, and legal ramifications. For individuals, data leaks can compromise details such as names, addresses, social security numbers, and financial information, impacting their privacy and security. Moreover, leaks of health or personal data can tarnish someone's reputation and well-being. On the business front, data leaks can result in financial losses, regulatory fines, and erosion of customer trust. Breaches involving customer data can harm a company's image, leading to customer loss, which in turn affects revenue and competitiveness. Failure to secure sensitive information can also lead to severe consequences and penalties, especially in regulated industries like healthcare, finance, or e-commerce. Therefore, implementing robust security measures is crucial to protect information and maintain user trust in mobile apps.

Man-in-the-Middle (MITM) Attacks

Man-in-the-middle (MITM) attacks happen when someone secretly intercepts and alters communication between two parties. In the context of mobile apps, this involves a hacker inserting themselves between a user's device and the server, allowing them to spy on shared information. MITM attacks are risky, potentially leading to data theft and identity fraud, as hackers can access login credentials, financial transactions, and personal data. To prevent MITM attacks, developers should use encryption methods such as HTTPS/TLS, while users should avoid public Wi-Fi networks and consider using VPNs for added security. Remaining vigilant and taking precautions are essential in protecting against MITM attacks.

Injection Attacks

Injection attacks pose significant security risks to apps, as malicious actors exploit vulnerabilities to insert and execute unauthorized code. Common examples include SQL injection and JavaScript injection. During these attacks, perpetrators tamper with input fields to inject commands, gaining unauthorized access to data or disrupting app functions. Injection attacks can lead to data breaches, data tampering, and system compromise. To prevent these attacks, developers should enforce input validation, use parameterized queries, and adhere to secure coding practices, as the sketch below illustrates.
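Here is a minimal sketch of the parameterized-query idea in Java using the standard JDBC API; the table and column names are invented for the example:

Java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserLookup {
    // The user-supplied value is bound as a parameter, never concatenated
    // into the SQL string, so it cannot change the statement's structure.
    public static ResultSet findUser(Connection conn, String untrustedName)
            throws SQLException {
        String sql = "SELECT id, name FROM users WHERE name = ?";
        PreparedStatement stmt = conn.prepareStatement(sql);
        stmt.setString(1, untrustedName); // driver escapes/binds the value
        return stmt.executeQuery();
    }
}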
Regular security assessments and tests are crucial for pinpointing and addressing these vulnerabilities in apps.

Insecure Authentication

Insecure authentication methods can lead to vulnerabilities, opening the door to unauthorized entry and data breaches. Common issues include weak passwords, the absence of two-factor authentication, and improper session management. Cyber attackers exploit these weaknesses to impersonate users, access data unlawfully, or seize control of user accounts. A compromised authentication system jeopardizes user privacy, data accuracy, and availability, posing risks to individuals and organizations. To address this risk, developers should implement security measures such as two-factor authentication and secure session tokens. Regular updates and enhancements to security protocols are crucial to stay ahead of evolving threats.

Data Storage

Ensuring secure data storage is crucial in today's technology landscape, especially for apps. It's vital to protect sensitive information and financial records to prevent unauthorized access and data breaches. Secure data storage includes encrypting information both at rest and in transit using strong encryption methods and secure storage techniques. Moreover, setting up access controls and authentication procedures, and conducting regular security checks, is essential to uphold the confidentiality and integrity of stored data. By prioritizing these data storage practices and security protocols, developers can ensure that user information remains shielded from risks and vulnerabilities.

Faulty Encryption

Faulty encryption and flawed security measures can lead to vulnerabilities within apps, putting sensitive data at risk of unauthorized access and misuse. If encryption algorithms are weak or not implemented correctly, encrypted data could be easily decoded by malicious actors. Poor key management, like storing encryption keys insecurely, worsens these threats. Additionally, security protocols lacking proper authentication or authorization controls create opportunities for attackers to bypass security measures. The consequences of inadequate encryption and security measures can be substantial and can include data breaches, financial losses, and a decline in user trust. To address these risks effectively, developers should prioritize strong encryption algorithms, secure key management practices, and thorough security protocols in their mobile apps.

The Unauthorized Use of Device Functions

The misuse of device capabilities within apps presents a security concern, putting user privacy and device security at risk. Malicious apps or attackers could exploit weaknesses to access features like the camera, microphone, or GPS without permission, leading to privacy breaches. This unauthorized access may result in monitoring, unauthorized audio/video recording, and location tracking, compromising user confidentiality. Additionally, unauthorized use of device functions could allow attackers to carry out activities such as sending premium SMS messages or making calls that incur costs or violate privacy. To address this issue effectively, developers should enforce strict permission controls and carefully evaluate third-party tools and integrations to prevent misuse of device capabilities.

Reverse Engineering and Altering Code

Altering the code within apps can pose security risks, putting the app's integrity and confidentiality at risk. Bad actors might decompile the code to find weaknesses, extract data, or alter its functions for malicious purposes.
These activities allow attackers to bypass security measures, insert malicious code, or create vulnerabilities leading to data breaches, unauthorized access, and financial harm. Moreover, tampering with code can enable hackers to circumvent licensing terms or protections for developers' intellectual property, impacting their revenue streams. To effectively address this threat, developers should employ techniques like code obfuscation to obscure the code's meaning and make it harder for attackers to decipher. They should also establish runtime safeguards and regularly audit the codebase for any signs of tampering or unauthorized modifications. These proactive measures help mitigate the risks associated with code alteration and maintain the app's security and integrity.

Third-Party Collaborations

Third-party collaborations in apps bring both advantages and risks. While connecting with third-party services can improve features and user satisfaction, it also exposes the app to security threats and privacy issues. Thoroughly evaluating third-party partners, following security protocols, and monitoring regularly are steps to manage these risks. Neglecting to assess third-party connections can lead to data breaches, compromised user privacy, and harm to the app's reputation. Therefore, developers should be cautious and diligent when entering into collaborations with third parties to safeguard the security and credibility of their apps.

Social Manipulation Strategies

Social manipulation strategies present a security risk for apps by leveraging human behavior to mislead users and jeopardize their safety. Attackers can use methods like phishing emails, deceptive phone calls, or misleading messages to trick users into sharing sensitive data like passwords or financial information. Moreover, these tactics can influence user actions, such as clicking on malicious links or downloading apps containing malware. Such strategies erode user trust and may lead to data breaches, identity theft, or financial scams. To address this, it's important for users to understand social manipulation tactics and be cautious when dealing with suspicious requests, messages, or links in mobile apps. Developers should also incorporate security measures like two-factor authentication and anti-phishing tools to safeguard users against social engineering attacks.

Conclusion

Always keep in mind that security is an ongoing responsibility, not a one-time job. Stay informed about threats and adapt your security measures accordingly. Developing a secure app is crucial for safeguarding user data, establishing trust, and averting security breaches.
This article is part of a series exploring a workshop guiding you through the open source project Fluent Bit, what it is, a basic installation, and setting up the first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases covering collection, aggregation, transformation, and forwarding from any source to any destination. The previous article in this series saw us building our first telemetry pipelines with Fluent Bit. In this article, we continue onwards with some more specific use cases that pipelines solve. You can find more details in the accompanying workshop lab.

Before we get started, it's important to review the phases of a telemetry pipeline: each incoming event goes from input to parser to filter to buffer to routing before being sent to its final output destination(s). For clarity in this article, we'll split up the configuration into files that are imported into a main Fluent Bit configuration file that we'll name workshop-fb.conf.

Parsing Multiple Events

One of the more common use cases for telemetry pipelines is having multiple event streams producing data, creating a situation where keys are not unique if parsed without some cleanup. Let's illustrate how Fluent Bit can easily provide us with a means to both parse and filter events from multiple input sources to clean up any duplicate keys before sending them onward to a destination. To provide an example, we start with an inputs.conf file containing a configuration using the dummy plugin to generate two types of events, both using the same key to cause confusion if we try querying without cleaning them up first:

# This entry generates a success message.
[INPUT]
    Name     dummy
    Tag      event.success
    Dummy    {"message":"true 200 success"}

# This entry generates an error message.
[INPUT]
    Name     dummy
    Tag      event.error
    Dummy    {"message":"false 500 error"}

Our configuration is tagging each successful event with event.success and failure events with event.error. The confusion will be caused by configuring the dummy message with the same key, message, for both event definitions. This will make our incoming events confusing to deal with. The file called outputs.conf contains but one destination, as shown in the following configuration:

# This entry directs all tags (it matches any we encounter)
# to print to standard output, which is our console.
#
[OUTPUT]
    Name     stdout
    Match    *

With our inputs and outputs configured, we can now bring them together in the single main configuration file we mentioned at the start. Let's create a new file called workshop-fb.conf in our favorite editor and add the following configuration; for now, just importing our other two files:

# Fluent Bit main configuration file.
#
# Imports section, assumes these files are in the same
# directory as the main configuration file.
#
@INCLUDE inputs.conf
@INCLUDE outputs.conf

To see if our configuration works, we can test run it with our Fluent Bit installation. Depending on the chosen install method used from the previous articles in this series, we have the option to run it from source or using container images. First, we show how to run it using the source install execution from the directory we created to hold all our configuration files:
# source install.
#
$ [PATH_TO]/fluent-bit --config=workshop-fb.conf

The console output should look something like this, noting that we've cut out the ASCII logo shown at startup:

...
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.0] initializing
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.1] initializing
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
[2024/04/05 16:49:33] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/05 16:49:33] [ info] [sp] stream processor started
[0] event.success: [[1712328574.915990000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328574.917728000, {}], {"message"=>"false 500 error"}]
[0] event.success: [[1712328575.915732000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328575.916608000, {}], {"message"=>"false 500 error"}]
[0] event.success: [[1712328576.915161000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328576.915288000, {}], {"message"=>"false 500 error"}]
...

Note the alternating generated event lines with messages that are hard to separate when using the same key. These events alternate in the console until exiting with CTRL-C.

Next, we show how to run our telemetry pipeline configuration using a container image. The first thing needed is a file called Buildfile. This is going to be used to build a new container image and insert our configuration files. Note this file needs to be in the same directory as your configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we'll build a new container image as follows using the Buildfile, naming it with a version tag and assuming you are in the same directory (using Podman as discussed in previous articles):

$ podman build -t workshop-fb:v4 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v4
--> b06df84452b6
Successfully tagged localhost/workshop-fb:v4
b06df84452b6eb7a040b75a1cc4088c0739a6a4e2a8bbc2007608529576ebeba

Now, to run our new container image:

$ podman run workshop-fb:v4

The output looks exactly like the source output above, just with different timestamps. Again, you can stop the container using CTRL-C.

Now we have dirty ingested data coming into our pipeline, showing that we have multiple messages on the same key. To be able to clean this up for usage before passing it on to the backend (output), we need to make use of both the Parser and Filter phases. First up is the Parser phase, where unstructured data is converted into structured data: we'll make use of the built-in regex parser plugin to structure the duplicate messages into something more usable. To set up the parser configuration, we create a new file called parsers.conf in our favorite editor.
Add the following configuration, where we define a PARSER, naming the parser message_cleaning_parser, selecting the built-in regex format, and applying the regular expression shown here to convert each message into a structured format (note this is actually applied to incoming messages in the next phase of the telemetry pipeline):

# This parser uses the built-in regex format and applies the
# regex to all incoming events.
#
[PARSER]
    Name     message_cleaning_parser
    Format   regex
    Regex    ^(?<valid_message>[^ ]+) (?<code>[^ ]+) (?<type>[^ ]+)$

Next up is the Filter phase, where the previously defined parser is put to the test. To set up the filter configuration, we create a new file called filters.conf in our favorite editor. Add the following configuration, where we define a FILTER using the parser plugin, matching all incoming messages to apply this filter, looking for the key message to select the value to be fed into the parser, and applying the parser message_cleaning_parser to it:

# This filter is applied to all events and uses the named parser to
# apply values found with the chosen key if it exists.
#
[FILTER]
    Name        parser
    Match       *
    Key_Name    message
    Parser      message_cleaning_parser

To make sure the new filter and parser are included, we update our main configuration file workshop-fb.conf as follows:

# Fluent Bit main configuration file.

[SERVICE]
    parsers_file    parsers.conf

# Imports section.
@INCLUDE inputs.conf
@INCLUDE outputs.conf
@INCLUDE filters.conf

To verify that our configuration works, we can test run it with our Fluent Bit installation. Depending on the chosen install method, here we show how to run it using the source installation, followed by the container version. Below, the source install is shown from the directory we created to hold all our configuration files:

# source install.
#
$ [PATH_TO]/fluent-bit --config=workshop-fb.conf

The console output should look something like this, noting that we've cut out the ASCII logo shown at startup:

...
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.0] initializing
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.1] initializing
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
[2024/04/09 16:19:42] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/09 16:19:42] [ info] [sp] stream processor started
[0] event.success: [[1712672383.962198000, {}], {"valid_message"=>"true", "code"=>"200", "type"=>"success"}]
[0] event.error: [[1712672383.964528000, {}], {"valid_message"=>"false", "code"=>"500", "type"=>"error"}]
[0] event.success: [[1712672384.961942000, {}], {"valid_message"=>"true", "code"=>"200", "type"=>"success"}]
[0] event.error: [[1712672384.962105000, {}], {"valid_message"=>"false", "code"=>"500", "type"=>"error"}]
...

Note the alternating generated event lines with parsed messages that now contain keys to simplify later querying. This runs until exiting with CTRL-C.

Let's now try testing our configuration by running it using a container image. The first thing needed is to open the file Buildfile in our favorite editor. This is going to be expanded to include the filters and parsers configuration files.
Note this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COPY ./filters.conf /fluent-bit/etc/filters.conf
COPY ./parsers.conf /fluent-bit/etc/parsers.conf

Now we'll build a new container image, naming it with a version tag, as follows using the Buildfile and assuming you are in the same directory (using Podman as discussed in previous articles):

$ podman build -t workshop-fb:v4 -f Buildfile

STEP 1/6: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1
STEP 2/6: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> 7eee3091e091
STEP 3/6: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> 53ff32210b0e
STEP 4/6: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
--> 62168aa0c600
STEP 5/6: COPY ./filters.conf /fluent-bit/etc/filters.conf
--> 08f0878ded1e
STEP 6/6: COPY ./parsers.conf /fluent-bit/etc/parsers.conf
COMMIT workshop-fb:v4
--> 92825169e230
Successfully tagged localhost/workshop-fb:v4
92825169e230a0cc36764d6190ee67319b6f4dfc56d2954d267dc89dab8939bd

Now to run our new container image:

$ podman run workshop-fb:v4

The output looks exactly like the source output above; note that the alternating generated event lines with parsed messages now contain keys to simplify later querying. This completes our use cases for this article; be sure to explore this hands-on experience with the accompanying workshop lab.

What's Next?

This article walked us through a telemetry pipeline use case for multiple events using parsing and filtering. The series continues with the next step, where we'll explore how to collect metrics using a telemetry pipeline. Stay tuned for more hands-on material to help you with your cloud-native observability journey.
Businesses can react quickly and effectively to user behavior patterns by using real-time analytics. This allows them to take advantage of opportunities that might otherwise pass them by and to prevent problems from getting worse. Apache Kafka, a popular event streaming platform, can be used for real-time ingestion of data/events generated from various sources across multiple verticals such as IoT, financial transactions, inventory, etc. This data can then be streamed into multiple downstream applications or engines for further processing and eventual analysis to support decision-making. Apache Flink serves as a powerful engine for refining or enhancing streaming data by modifying, enriching, or restructuring it upon arrival at the Kafka topic. In essence, Flink acts as a downstream application that continuously consumes data streams from Kafka topics for processing, and then ingests the processed data into various Kafka topics. Eventually, Apache Druid can be integrated to consume the processed streaming data from Kafka topics for analysis, querying, and making instantaneous business decisions.

In my previous write-up, I explained how to integrate Flink 1.18 with Kafka 3.7.0. In this article, I will outline the steps to transfer processed data from Flink 1.18.1 to a Kafka 2.13-3.7.0 topic. A separate article detailing the ingestion of streaming data from Kafka topics into Apache Druid for analysis and querying was published a few months ago; you can read it here.

Execution Environment

We configured a multi-node cluster (three nodes) where each node has a minimum of 8 GB RAM and a 250 GB SSD, with Ubuntu 22.04.2 amd64 as the operating system. OpenJDK 11 is installed, with the JAVA_HOME environment variable configured, on each node. Python 3 or Python 2 along with Perl 5 is available on each node. A three-node Apache Kafka 3.7.0 cluster is up and running with Apache ZooKeeper 3.5.6 on two nodes. Apache Druid 29.0.0 has been installed and configured on the node in the cluster where ZooKeeper has not been installed for the Kafka broker; ZooKeeper has been installed and configured on the other two nodes. The leader broker is up and running on the node where Druid is running. We developed a simulator using the Datafaker library to produce fake real-time financial transaction JSON records every 10 seconds and publish them to the created Kafka topic. Here is the JSON data feed generated by the simulator:

JSON
{"timestamp":"2024-03-14T04:31:09Z ","upiID":"9972342663@ybl","name":"Kiran Marar","note":" ","amount":"14582.00","currency":"INR","geoLocation":"Latitude: 54.1841745 Longitude: 13.1060775","deviceOS":"IOS","targetApp":"PhonePe","merchantTransactionId":"ebd03de9176201455419cce11bbfed157a","merchantUserId":"65107454076524@ybl"}

Finally, we extracted the Apache Flink archive (flink-1.18.1-bin-scala_2.12.tgz) on the node where neither Druid nor the leader broker of Kafka is running.

Running a Streaming Job in Flink

We will dig into the process of extracting data from a Kafka topic where incoming messages are being published by the simulator, performing processing tasks on it, and then reintegrating the processed data back into a different topic of the multi-node Kafka cluster. We developed a Java program (StreamingToFlinkJob.java) that was submitted as a job to Flink to perform the above-mentioned steps, considering a window of 2 minutes and calculating the average amount transacted from the same mobile number (UPI ID) on the simulated UPI transaction data stream.
The required jar files were included on the project build path or classpath. Using the code below, we can get the Flink execution environment inside the developed Java class:

Java
Configuration conf = new Configuration();
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

Now we should read the messages/stream that has already been published by the simulator to the Kafka topic inside the Java program. Here is the code block:

Java
KafkaSource<UPITransaction> kafkaSource = KafkaSource.<UPITransaction>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS) // IP address with port 9092 where the leader broker is running in the cluster
    .setTopics(IKafkaConstants.INPUT_UPITransaction_TOPIC_NAME)
    .setGroupId("upigroup")
    .setStartingOffsets(OffsetsInitializer.latest())
    .setValueOnlyDeserializer(new KafkaUPISchema())
    .build();

To retrieve information from Kafka, setting up a deserialization schema within Flink is crucial for processing events in JSON format, converting raw data into a structured form. Importantly, setParallelism needs to be set to the number of Kafka topic partitions; otherwise, the watermark won't work for the source, and data is not released to the sink:

Java
DataStream<UPITransaction> stream = env.fromSource(
    kafkaSource,
    WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(2)),
    "Kafka Source").setParallelism(1);

With successful event retrieval from Kafka, we can enhance the streaming job by incorporating processing steps. The subsequent code snippet reads Kafka data, organizes it by mobile number (upiID), and computes the average amount per mobile number. To accomplish this, we developed a custom window function for calculating the average and implemented watermarking to handle event-time semantics effectively. Here is the code snippet:

Java
SerializableTimestampAssigner<UPITransaction> sz = new SerializableTimestampAssigner<UPITransaction>() {
    @Override
    public long extractTimestamp(UPITransaction transaction, long l) {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            Date date = sdf.parse(transaction.eventTime);
            return date.getTime();
        } catch (Exception e) {
            return 0;
        }
    }
};

WatermarkStrategy<UPITransaction> watermarkStrategy = WatermarkStrategy.<UPITransaction>forBoundedOutOfOrderness(Duration.ofMillis(100)).withTimestampAssigner(sz);
DataStream<UPITransaction> watermarkDataStream = stream.assignTimestampsAndWatermarks(watermarkStrategy);

// Instead of event time, we could use windows based on processing time
// by using TumblingProcessingTimeWindows.
DataStream<TransactionAgg> groupedData = watermarkDataStream
    .keyBy("upiId")
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2500), Time.milliseconds(500)))
    .apply(new TransactionAgg());

Eventually, the processing logic (computation of the average amount for the same UPI ID, based on the mobile number, over a window of 2 minutes on the continuous transaction stream) is executed inside Flink. Here is the code block for the window function that calculates the average amount for each UPI ID or mobile number.
Java
public class TransactionAgg implements WindowFunction<UPITransaction, TransactionAgg, Tuple, TimeWindow> {

    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<UPITransaction> values, Collector<TransactionAgg> out) {
        Integer sum = 0; // consider whole numbers only
        int count = 0;
        String upiID = null;
        for (UPITransaction value : values) {
            sum += value.amount;
            upiID = value.upiID;
            count++;
        }

        TransactionAgg output = new TransactionAgg();
        output.upiID = upiID;
        output.eventTime = window.getEnd();
        output.avgAmount = (sum / count); // integer division; cast to double if a fractional average is needed
        out.collect(output);
    }
}

We have processed the data. The next step is to serialize the object and send it to a different Kafka topic. Add a KafkaSink in the developed Java code (StreamingToFlinkJob.java) to send the processed data from the Flink engine to a different Kafka topic created on the multi-node Kafka cluster. Here is the code snippet to serialize the object before sending/publishing it to the Kafka topic (the topic and objectMapper fields are shown here so the snippet is self-contained):

Java
public class KafkaTransactionSinkSchema implements KafkaRecordSerializationSchema<TransactionAgg> {

    private final String topic;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public KafkaTransactionSinkSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(TransactionAgg aggTransaction, KafkaSinkContext context, Long timestamp) {
        try {
            return new ProducerRecord<>(
                topic,
                null, // partition not specified, so setting null
                aggTransaction.eventTime,
                aggTransaction.upiID.getBytes(),
                objectMapper.writeValueAsBytes(aggTransaction));
        } catch (Exception e) {
            throw new IllegalArgumentException("Exception on serialize record: " + aggTransaction, e);
        }
    }
}

And below is the code block that sinks the processed data back to a different Kafka topic.

Java
KafkaSink<TransactionAgg> sink = KafkaSink.<TransactionAgg>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS)
    .setRecordSerializer(new KafkaTransactionSinkSchema(IKafkaConstants.OUTPUT_UPITRANSACTION_TOPIC_NAME))
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();

groupedData.sinkTo(sink); // the DataStream of TransactionAgg created above
env.execute();

Connecting Druid With the Kafka Topic In this final step, we need to integrate Druid with the Kafka topic to consume the processed data stream that is continuously published by Flink. With Apache Druid, we can connect directly to Apache Kafka so that real-time data can be ingested continuously and subsequently queried to make business decisions on the spot, without involving any third-party system or application. Another beauty of Apache Druid is that we do not need to configure or install any third-party UI application to view the data that lands on or is published to the Kafka topic. To keep this article concise, I omitted the steps for integrating Druid with Apache Kafka. However, a few months ago I published an article on this topic (linked earlier in this article); you can read it and follow the same approach. Final Note The code snippets provided above are for understanding purposes only. They illustrate the sequential steps of consuming messages/data streams from a Kafka topic, processing the consumed data, and eventually sending/pushing the modified data into a different Kafka topic, which allows Druid to pick up the modified data stream for querying and analysis as a final step. The entire codebase will be uploaded to GitHub later, in case you are interested in executing it on your own infrastructure. I hope you enjoyed reading this. If you found this article valuable, please consider liking and sharing it.
Unit testing is an essential practice in software development that involves testing individual codebase components to ensure they function correctly. In Spring-based applications, developers often use Aspect-Oriented Programming (AOP) to separate cross-cutting concerns, such as logging, from the core business logic, thus enabling modularization and cleaner code. However, testing aspects in Spring AOP poses unique challenges due to their interception-based nature. Developers need to employ appropriate strategies and best practices to facilitate effective unit testing of Spring AOP aspects. This comprehensive guide aims to provide developers with detailed and practical insights on effectively unit testing Spring AOP aspects. The guide covers various topics, including the basics of AOP, testing pointcut expressions, testing around advice, testing before and after advice, testing after-returning advice, testing after-throwing advice, and testing introduction advice. Moreover, the guide provides sample Java code to help developers understand how to apply the strategies and best practices effectively. By following the guide's recommendations, developers can improve the quality of their Spring-based applications and ensure that their code is robust, reliable, and maintainable. Understanding Spring AOP Before implementing effective unit testing strategies, it is important to have a comprehensive understanding of Spring AOP. AOP, or Aspect-Oriented Programming, is a programming paradigm that enables the separation of cross-cutting concerns shared across different modules in an application. Spring AOP is a widely used aspect-oriented framework that is primarily implemented using runtime proxy-based mechanisms. The primary objective of Spring AOP is to provide modularity and flexibility in designing and implementing cross-cutting concerns in a Java-based application. The key concepts that one must understand in Spring AOP include: Aspect: An aspect is a module that encapsulates cross-cutting concerns that are applied across multiple objects in an application. Aspects are defined using aspect-oriented programming techniques and are typically independent of the application's core business logic. Join point: A join point is a point in the application's execution where an aspect can be applied. In Spring AOP, a join point always represents a method execution (the broader AspectJ model also covers constructs such as field access and exception handlers). Advice: Advice is the action taken when a join point is reached during the application's execution. In Spring AOP, advice can be applied before, after, or around a join point. Pointcut: A pointcut is a set of join points where an aspect's advice should be applied. In Spring AOP, pointcuts are defined using expressions that select join points based on method signatures, annotations, or other criteria. By understanding these key concepts, developers can effectively design and implement cross-cutting concerns in a Java-based application using Spring AOP. Challenges in Testing Spring AOP Aspects Unit testing Spring AOP aspects can be challenging compared to testing regular Java classes due to the unique nature of AOP aspects. Some of the key challenges include: Interception-based behavior: AOP aspects intercept method invocations or join points, which makes it difficult to test their behavior in isolation. To overcome this challenge, it is recommended to use mock objects to simulate the behavior of the intercepted objects.
Dependency injection: AOP aspects may rely on dependencies injected by the Spring container, which requires special handling during testing. It is important to ensure that these dependencies are properly mocked or stubbed so that the aspect is tested in isolation and not affected by other components. Dynamic proxying: Spring AOP relies on dynamic proxies, which makes it difficult to directly instantiate and test aspects. To overcome this challenge, it is recommended to use Spring's built-in support for creating and configuring dynamic proxies. Complex pointcut expressions: Pointcut expressions can be complex, making it challenging to ensure that advice is applied to the correct join points. To overcome this challenge, it is recommended to use a combination of unit tests and integration tests to ensure that the aspect is being applied correctly. Transaction management: AOP aspects may interact with transaction management, introducing additional complexity in testing. To overcome this challenge, it is recommended to use a combination of mock objects and integration tests to ensure that the aspect works correctly within the context of the application. Despite these challenges, effective unit testing of Spring AOP aspects is crucial for ensuring the reliability, maintainability, and correctness of the application. By understanding these challenges and using the recommended testing approaches, developers can ensure that their AOP aspects are thoroughly tested and work as intended. Strategies for Unit Testing Spring AOP Aspects Unit testing Spring AOP aspects can be challenging, given the system's complexity and the multiple pieces of advice involved. However, developers can use various strategies and best practices to overcome these challenges and ensure effective unit testing. One of the most crucial strategies is to isolate aspects from their dependencies when writing unit tests. This isolation ensures that the tests focus solely on the aspect's behavior, without interference from other modules. Developers can accomplish this by using mocking frameworks such as Mockito, EasyMock, or PowerMockito, which allow them to simulate dependencies' behavior and control the test environment. Another best practice is to test each piece of advice separately. AOP aspects typically consist of multiple pieces of advice, such as "before," "after," or "around" advice; testing each one separately ensures that it behaves correctly in isolation. It is also essential to verify that the pointcut expressions are correctly configured and target the intended join points. Writing tests that exercise different scenarios helps ensure the correctness of pointcut expressions. Aspects in Spring-based applications often rely on beans managed by the ApplicationContext. Mocking the ApplicationContext allows developers to provide controlled dependencies to the aspect during testing, avoiding the need for a fully initialized Spring context. Developers should also define clear expectations for the behavior of the aspect and use assertions to verify that the aspect behaves as expected under different conditions. Assertions help ensure that the aspect's behavior aligns with the intended functionality. Finally, if aspects involve transaction management, developers should consider testing transactional behavior separately.
This can be accomplished by mocking transaction managers or using in-memory databases to isolate the transactional aspect of the code for testing. By employing these strategies and best practices, developers can ensure effective unit testing of Spring AOP aspects, resulting in robust and reliable systems. Sample Code: Testing a Logging Aspect To gain a better understanding of testing Spring AOP aspects, let's take a closer look at some sample code. We will analyze the testing process step by step, emphasizing important factors to take into account and providing further information to ensure clarity. Let's assume that we will be writing unit tests for the following main class:

Java
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class LoggingAspect {

    @Before("execution(* com.example.service.*.*(..))")
    public void logBefore(JoinPoint joinPoint) {
        System.out.println("Logging before " + joinPoint.getSignature().getName());
    }
}

The LoggingAspect class logs method executions with a single advice method, logBefore, which executes before methods in the com.example.service package. The LoggingAspectTest class contains unit tests for the LoggingAspect. Let's examine each part of the test method testLogBefore() in detail:

Java
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.Signature;
import org.junit.jupiter.api.Test;
import static org.mockito.Mockito.*;

public class LoggingAspectTest {

    @Test
    void testLogBefore() {
        // Given
        LoggingAspect loggingAspect = new LoggingAspect();

        // Creating mock objects
        JoinPoint joinPoint = mock(JoinPoint.class);
        Signature signature = mock(Signature.class);

        // Configuring mock behavior
        when(joinPoint.getSignature()).thenReturn(signature);
        when(signature.getName()).thenReturn("methodName");

        // When
        loggingAspect.logBefore(joinPoint);

        // Then
        // Verifying interactions with mock objects
        verify(joinPoint, times(1)).getSignature();
        verify(signature, times(1)).getName();
        // Additional assertions can be added to ensure correct logging behavior
    }
}

In the above code, several sections play a vital role in testing. First, the Given section sets up the test scenario. We do this by creating an instance of the LoggingAspect and mocking the JoinPoint and Signature objects. By doing so, we can control the behavior of these objects during testing. Next, we create mock objects for the JoinPoint and Signature using the Mockito mocking framework. This allows us to simulate behavior without invoking real instances, providing a controlled environment for testing. We then use Mockito's when() method to specify the behavior of the mock objects. For example, we define that when the getSignature() method of the JoinPoint is called, it should return the mock Signature object we created earlier. In the When section, we invoke the logBefore() method of the LoggingAspect with the mocked JoinPoint. This simulates the execution of the advice before a method call, which triggers the logging behavior. Finally, we use Mockito's verify() method to assert that specific methods of the mocked objects were called during the execution of the advice. For example, we verify that the getSignature() and getName() methods were each called once. Although not demonstrated in this simplified example, additional assertions can be added to ensure the correctness of the aspect's behavior.
For instance, we could assert that the logging message produced by the aspect matches the expected format and content. Additional Considerations Testing pointcut expressions: Pointcut expressions define where advice should be applied within the application. Writing tests to verify the correctness of pointcut expressions ensures that the advice is applied to the intended join points (see the sketch at the end of this article). Testing aspect behavior: Aspects may perform more complex actions beyond simple logging. Unit tests should cover all of the aspect's behavior to ensure its correctness, including handling method parameters, logging additional information, or interacting with other components. Integration testing: While unit tests focus on isolating aspects, integration tests may be necessary to verify the interactions between aspects and other components of the application, such as service classes or controllers. By following these principles and best practices, developers can create thorough and reliable unit tests for Spring AOP aspects, ensuring the stability and maintainability of their applications. Conclusion Unit testing Spring AOP aspects is crucial for reliable and correct aspect-oriented code. To create robust tests, isolate aspects, use mocking frameworks, test each piece of advice separately, verify pointcut expressions, and assert expected behavior. The sample code above provides a starting point for Java applications. With proper testing strategies in place, developers can confidently maintain and evolve AOP-based functionality in their Spring applications.
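As referenced above, here is a minimal, hedged sketch of a pointcut-expression test using Spring's AspectJExpressionPointcut (spring-aop and aspectjweaver on the test classpath are assumed; MyService and its placeOrder method are hypothetical stand-ins in the com.example.service package):

Java
import java.lang.reflect.Method;

import org.junit.jupiter.api.Test;
import org.springframework.aop.aspectj.AspectJExpressionPointcut;

import com.example.service.MyService; // hypothetical service class, used only for illustration

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class LoggingAspectPointcutTest {

    @Test
    void pointcutMatchesOnlyServicePackageMethods() throws NoSuchMethodException {
        // The same expression used by the LoggingAspect above
        AspectJExpressionPointcut pointcut = new AspectJExpressionPointcut();
        pointcut.setExpression("execution(* com.example.service.*.*(..))");

        // A method inside the targeted package should match
        Method serviceMethod = MyService.class.getMethod("placeOrder");
        assertTrue(pointcut.matches(serviceMethod, MyService.class));

        // A method outside the targeted package should not match
        Method stringMethod = String.class.getMethod("isEmpty");
        assertFalse(pointcut.matches(stringMethod, String.class));
    }
}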
The Data Story At the core of every software application, from the simplest to the most complex ones operating at scale to serve millions of users with low-latency requests, lies a foundational element: data. For over three decades, relational database management systems (RDBMS) have been at the forefront of this domain. These systems, which began by simply storing data in tables consisting of rows for records and columns for attributes, have undergone significant advancements and innovations that have revolutionized the storage of structured and semi-structured data. Relational database models have established themselves as the foundation of structured data handling; they are renowned for their reliability and have been battle-tested at massive scale in enterprise applications. However, as we move deeper into the era of big data and artificial intelligence (AI), the limitations of traditional RDBMS in handling unstructured data, such as images, videos, audio, and natural language, have become increasingly apparent. Enter the vector database, a cutting-edge innovation tailored for the age of AI that is significantly changing recommendation systems. Unlike RDBMS, which excel at managing structured data, vector databases are designed to handle and query high-dimensional vector embeddings, a form of unstructured data representation that is central to modern machine learning algorithms. Introduction: Vector DB Vector embeddings allow complex data like text, images, and sounds to be transformed into numerical vectors, capturing the essence of the data in a way that machines can process. This transformation is crucial for tasks such as similarity search, recommendation systems, and natural language processing, where understanding the nuanced relationships between data points is key. Vector databases leverage specialized indexing and search algorithms to efficiently query these embeddings, enabling applications that were previously challenging or impossible with traditional RDBMS. Fundamental Differences Between RDBMS and Vector Databases An application interacts with an RDBMS by executing various transactions and actions, which are stored in the form of rows and columns. With a vector database, the interaction looks a bit different: many types of files (text, images, audio, and more) are read and processed by various AI models, which create the vector embeddings that the database stores and indexes. Example in Action Consider the process of transforming a comprehensive movie database, such as IMDB, into a format where each movie is represented by vector embeddings and stored in a vector database. This transformation allows the database to leverage the power of vector embeddings to significantly enhance the user search experience. Because these vectors are organized within a common high-dimensional space, search engineers can more efficiently perform queries across the movie database. This spatial organization not only streamlines the retrieval process but also enables the implementation of sophisticated search functionalities, such as finding movies with similar themes or genres, thereby creating a more intuitive and responsive search experience for users. Now, we will demonstrate in Python how to convert textual movie data, such as the movie records described above, into vector representations using BERT (Bidirectional Encoder Representations from Transformers), a pre-trained deep learning model developed by Google.
This process entails several crucial steps for transforming the text into a format that the model can process, followed by the extraction of meaningful embeddings. Let's break down each step. Step 1

Python
# Import libraries
import sqlite3
from transformers import BertTokenizer, BertModel
import torch

sqlite3: This imports the SQLite3 library, which allows Python to interact with SQLite databases. It's used here to access a database containing IMDB movie information. from transformers import BertTokenizer, BertModel: These imports from the Hugging Face transformers library bring in the necessary tools to tokenize text data (BertTokenizer) and to load the pre-trained BERT model (BertModel) for generating vector embeddings. import torch: This imports PyTorch, a deep learning framework that BERT and many other models in the transformers library are built on. It's used for managing tensors, which are multi-dimensional arrays that serve as the basic building blocks of data for neural networks. Step 2

Python
# Initialize tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

tokenizer: This initializes the BERT tokenizer, configuring it to split input text into tokens that the BERT model can understand. The from_pretrained('bert-base-uncased') method loads a tokenizer trained on lowercase English text. model: This initializes the BERT model itself, also using the from_pretrained method to load a version trained on lowercase English. This model is what will generate the embeddings from the tokenized text. Step 3

Python
# Connect to the database and fetch movie data
conn = sqlite3.connect('path/to/your/movie_database.db')
cursor = conn.cursor()
cursor.execute("SELECT name, genre, release_date, length FROM movies")
movies = cursor.fetchall()

conn = sqlite3.connect('path/to/your/movie_database.db'): Opens a connection to an SQLite database file that contains your movie data. cursor = conn.cursor(): Creates a cursor object, which is used to execute SQL commands through the connection. cursor.execute(...): Executes an SQL command to select specific columns (name, genre, release date, length) from the movies table. movies = cursor.fetchall(): Retrieves all the rows returned by the SQL query and stores them in the variable movies. Step 4

Python
# Convert movie data to vector embeddings
movie_vectors = []
for movie in movies:
    movie_data = ', '.join(str(field) for field in movie)
    inputs = tokenizer(movie_data, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    movie_vector = outputs.last_hidden_state[:, 0, :].numpy()
    movie_vectors.append(movie_vector)

movie_vectors = []: Initializes an empty list to store the vector embeddings for each movie. For loop: Iterates over each movie retrieved from the database. movie_data = ', '.join(...): Concatenates the movie's details into a single string. inputs = tokenizer(...): Uses the BERT tokenizer to prepare the concatenated string for the model, converting it into a tensor. with torch.no_grad():: Temporarily disables gradient computation, which is unnecessary during inference. outputs = model(**inputs): Feeds the tokenized input to the BERT model to get the embeddings. movie_vector = ...: Extracts the embedding of the [CLS] token, which represents the entire input sequence. movie_vectors.append(movie_vector): Adds the movie's vector embedding to the list. Output movie_vectors: At the end of this script, you have a list of
vector embeddings, one for each movie in your database. These vectors encapsulate the semantic information of the movies' names, genres, release dates, and durations in a form that machine learning models can work with. Conclusion In our vector database example, movies such as "Inception" and "The Matrix," known for their action-packed, thought-provoking narratives, or "La La Land" and "Eternal Sunshine of the Spotless Mind," which explore complex romantic themes, are transformed into high-dimensional vectors using BERT, a deep learning model. These vectors capture not just overt categories like genre or release year but also the subtler thematic and emotional nuances encoded in their descriptions. Once stored in a vector database, these embeddings can be queried efficiently to perform similarity searches. When a user searches for a film with a particular vibe or thematic element, the streaming service can quickly identify and suggest films that are "near" the user's interests in the vector space, even if the user's search terms don't directly match the movie's title, genre, or other metadata. For instance, a search for "dream manipulation movies" might not only return "Inception" but also suggest "The Matrix," given their thematic similarities represented in the vector space. This method of storage and retrieval significantly enriches the user experience on streaming platforms, facilitating a discovery process that aligns content with both the user's interests and current mood. It's designed to lead to "aha moments," where users uncover hidden gems, which is especially valuable when navigating the vast catalogs and offerings of streaming services. By detailing the creation and application of vector embeddings from textual movie data, we have demonstrated the significant role of machine learning and vector databases in revolutionizing search capabilities and elevating the user experience in digital content ecosystems, particularly within streaming video services.
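As a footnote to the similarity-search discussion above, here is a minimal, self-contained sketch of the "nearness" computation that vector databases typically rank results by: cosine similarity. It is written in Java for consistency with the rest of this report, and the three-dimensional vectors are illustrative stand-ins for real 768-dimensional BERT embeddings.

Java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (||a|| * ||b||); values closer to 1.0 mean more similar
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Tiny illustrative "embeddings"; real BERT vectors have 768 dimensions
        double[] inception = {0.9, 0.1, 0.4};
        double[] theMatrix = {0.8, 0.2, 0.5};
        double[] laLaLand  = {0.1, 0.9, 0.2};

        System.out.println(cosine(inception, theMatrix)); // high: thematically close
        System.out.println(cosine(inception, laLaLand));  // lower: different themes
    }
}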
What Is the C4 Model? The C4 model is a hierarchical framework designed to help software architects and developers visualize and communicate the essential aspects of software architecture in a clear and structured way. Unlike traditional diagramming approaches, which often result in cluttered and overly complex diagrams, the C4 model focuses on simplicity and abstraction to convey architectural concepts effectively. The next question is which tool to use to create said diagrams. You can use Visio, draw.io, PlantUML, even PowerPoint, or whatever tool you normally use for creating diagrams. However, these tools do not check whether naming, relations, etc. are used consistently across the different diagrams. Besides that, it can be difficult to review new versions of diagrams because it is not clear which changes were made. To solve these problems, Simon Brown, the author of the C4 model, created Structurizr. What Is Structurizr? Structurizr allows you to create diagrams as code. Based on the code, Structurizr visualizes the diagrams for you, and you can interact with the visualization. Because the diagrams are maintained in code, you can add them to your version control system (git), and changes in the diagrams are tracked and can be easily reviewed. In a previous article, some features of Structurizr were explored. Structurizr Lite was used, which supports only one workspace. However, if you have a more diverse system landscape, Structurizr Lite is not sufficient anymore: you will have multiple workspaces, one for every software system, and you probably also want an overview of your entire system landscape. In this article, you will explore how you can use Structurizr to maintain not only the software architecture of one system but your entire system landscape as code. The sources used in this blog can be found on GitHub. Prerequisites Prerequisites for this blog are: Basic knowledge of the C4 model Basic knowledge of Docker Basic knowledge of Structurizr Linux is used; if you are using a different operating system, you will need to adjust the commands accordingly Installation As mentioned before, Structurizr Lite cannot be used for this scenario. Instead, you need to install Structurizr on-premises. Create a data directory in the root of the repository. This directory will be mapped as a volume in the Docker container. If you followed the previous blog, ensure that you clean the data directory first. With Structurizr Lite, it is intended that you edit files in this data directory; with Structurizr on-premises, it is advised not to alter the files in the data directory. Structurizr on-premises should be run on a separate server, and a normal user should not have access to the data directory anyway. Execute the following command from within the root of the repository:

Shell
$ docker run -it --rm -p 8080:8080 -v ./data:/usr/local/structurizr structurizr/onpremises

Navigate in your browser to http://localhost:8080, log in with the default user structurizr and password password, and the Structurizr webpage is shown. Single Workspace First, let's see how you can create a single workspace with Structurizr on-premises. Click New workspace, and an empty workspace is created. Unlike with Structurizr Lite, it is not possible to edit files on your host machine. So, how can you upload your DSL files to the workspace? To do so, you need the Structurizr CLI. At the moment of writing, v2024.02.22 is the latest version, which can be downloaded as a zip file from GitHub.
Unpack the zip file and add the directory to your path. You will upload the latest version of the software system from the previous blog. The DSL is located in the workspaces/3-basic-styles directory. Navigate to this directory. To push the DSL to Structurizr, you will make use of the push command. The push command needs some parameters, which can be found in the settings of the Structurizr workspace: you need the information shown under API details, and below this information, the parameters can easily be copied. Execute the following command, replacing the parameters for your situation:

Shell
$ structurizr.sh push -url http://localhost:8080/api -id 1 -key 2607de22-7ce0-4eb1-9f28-1e7e9979121a -secret 09528dfd-0c0a-4380-85cb-766b8da5e1dc -workspace workspace.dsl
Pushing workspace 1 to http://localhost:8080/api
 - creating new workspace
 - parsing model and views from /<path to project directory>/MyStructurizrPlanet/workspaces/3-basic-styles/workspace.dsl
 - merge layout from remote: true
 - storing previous version of workspace in null
 - pushing workspace
Getting workspace with ID 1
Putting workspace with ID 1
{"success":true,"message":"OK","revision":2}
 - finished

If everything goes well, the DSL is pushed successfully, and the System Context and Container diagrams are added to the workspace. Workspace Features In this section, some interesting features of Structurizr on-premises are shown. Version Control Every upload automatically creates a new version. It is also possible to retrieve an older version. Error Checking The Inspections item in the left menu gives you an overview of errors in your DSL. Reviews When you open a diagram, you can create a review. When creating the review, you can choose which diagrams need to be reviewed, what kind of review you are requesting, and whether unauthenticated access is allowed. The reviewer can, of course, add comments. Next to the Public review text, a link to a checklist is present, which can help you execute the review. Create System Landscape Using DSL Only The above examples consist of diagrams for a single software system. Often, multiple software systems are used in an organization. These software systems interact with each other and thus together form a system landscape. Each team will be responsible for its own software system diagrams, but it is also necessary to have a diagram containing the larger picture. Let's explore whether this is possible using Structurizr. You will be using an example based on the enterprise example provided in the Structurizr GitHub repository. The files can be found in workspaces/4-system-landscape. Create a new workspace via the UI, navigate to the 4-system-landscape directory, and push the customer-service DSL to this workspace.

Shell
$ structurizr.sh push -url http://localhost:8080/api -id 2 -key f24fe705-a508-4f8d-9cf7-3fc7b323f293 -secret 02c6597f-c750-47e0-9b88-f6e26fccdf38 -workspace customer-service/workspace.dsl

In the same way, create a workspace for the invoice-service and the order-service, and push the corresponding DSL to each workspace. A separate system-landscape DSL is present, which uses a plugin to create the relationships between the software systems. Create a workspace for this DSL and push it.

Shell
$ structurizr.sh push -url http://localhost:8080/api -id 5 -key cb18cabb-61c7-4c3a-a58e-2e97ff0fa285 -secret a638aa99-73cd-427d-8188-3788e678129f -workspace system-landscape/workspace.dsl

This creates the system landscape overview.
However, two issues are encountered with this view: It is not possible to click on, for example, the Order Service in order to open the software system diagram for the Order Service. The DSL of the Customer Service does not define the relationships with the Order Service and the Invoice Service. It would be nice if this inconsistency were reported one way or the other. I asked a question about this on GitHub and used the answer to create a solution, which can be found in the following paragraphs. Create System Landscape Using Java The solution to the absence of links to the different services is to make use of the Structurizr Java library. With this library, you have much more control to achieve the desired functionality. I used the source code from the example in the Structurizr repository and added it to the directory workspaces/5-system-landscape. The pom file contains the necessary dependencies to run the code, and the maven-assembly-plugin is added to create a fat jar. The code executes the following steps: Create a workspace for the system landscape. Create workspaces for each service. Generate the system landscape by parsing the workspaces' metadata, create the necessary relationships, add a link to the services, and create a view for the system landscape. Execute the following command from within the workspaces/5-system-landscape directory in order to build the fat jar.

Shell
$ mvn clean package

Run the code, and an error occurs.

Shell
$ java -jar target/mystructurizrplanet-1.0-SNAPSHOT-jar-with-dependencies.jar
Mar 02, 2024 11:41:12 AM com.structurizr.api.AdminApiClient createWorkspace
SEVERE: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
Exception in thread "main" com.structurizr.api.StructurizrClientException: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
	at com.structurizr.api.AdminApiClient.createWorkspace(AdminApiClient.java:109)
	at com.mydeveloperplanet.mystructurizrplanet.CreateSystemLandscape.main(CreateSystemLandscape.java:30)
Caused by: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
	at com.structurizr.api.AdminApiClient.createWorkspace(AdminApiClient.java:105)
	... 1 more

To use the Java library, you need an API key. This API key is disabled by default. To enable it, you need to add a file structurizr.properties to your data directory. In the properties file, you set the API key to its bcrypt-encoded value.

Properties files
structurizr.apiKey=$2a$10$ekjju1h3fC1y2YAln7wqxuJ.q0gBjQoFPX/Wvmzr.L5aIdoqvUIwa

Add read permissions to the file.

Shell
$ chmod o+r data/structurizr.properties

Restart the Docker container and execute the jar file again.
Shell
$ java -jar target/mystructurizrplanet-1.0-SNAPSHOT-jar-with-dependencies.jar
Mar 02, 2024 11:50:03 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 7
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 7
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 8
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 8
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 1
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 2
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 3
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 4
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 5
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 7
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 8
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}

If you open the system landscape workspace, it is now possible to double-click one of the services, and you will be navigated to the corresponding service. Great, but there are some caveats to mention: This source code always creates new workspaces every time you run it. This is just an example of what is possible using the Java library; if you want to update existing workspaces, you will need to alter the source code for this purpose. The source code contains a hardcoded API key in plain text. You should not do this in a production environment. Validate Relationships Is it possible to validate the relationships using the Java library? Yes, it is. An example of the source code can be found in the directory workspaces/6-validate-relationships. This code validates offline whether the DSL contains the correct relationships. It is only intended to prove that the validation can be done. For use in production, the source code needs to be made more robust. Build the code and run it.
Shell
$ mvn clean package
$ java -jar target/validaterelationships-1.0-SNAPSHOT-jar-with-dependencies.jar
missing relation in CustomerService {2 | Order Service | } ---[Manages customer data using]---> {4 | Customer Service | }
missing relation in CustomerService {3 | Invoice Service | } ---[Gets customer data from]---> {4 | Customer Service | }

The validation finds the two errors in the Customer Service. Add the relationships to the Customer Service DSL.

Plain Text
model {
    !extend customerService {
        api = container "Customer API"
        database = container "Customer Database"
        api -> database "Reads from and writes to"

        orderService -> customerService "Gets customer data from"
        invoiceService -> customerService "Gets customer data from"
    }
}

Build the code and run it again. The errors are gone, and the relationships become visible in the Customer Service if you re-run the code from the previous paragraph. Conclusion Structurizr offers many features to help you get a grip on your software architecture. It also allows you to generate a system landscape and to implement several customizations, e.g., custom validation checks. You need to learn the Structurizr Java library, but the learning curve is not very steep.
Serverless architectures have emerged as a paradigm-shifting approach to building fast, scalable, and cost-efficient applications. While serverless architectures provide unparalleled flexibility, they also introduce new challenges in terms of monitoring and troubleshooting. In this article, we'll explore how Quarkus integrates with AWS X-Ray and how using a Jakarta CDI interceptor can keep your code clean while adding custom instrumentation. Quarkus and AWS Lambda Quarkus is a Java-based framework tailored for GraalVM and HotSpot that delivers an amazingly fast boot time and an incredibly low memory footprint. It offers near-instant scale-up and high-density memory utilization, which can be very useful for container orchestration platforms like Kubernetes or serverless runtimes like AWS Lambda. Building AWS Lambda functions can be as easy as starting a Quarkus project, adding the quarkus-amazon-lambda dependency, and defining your AWS Lambda handler function (a minimal handler sketch is shown at the end of this section).

XML
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-amazon-lambda</artifactId>
</dependency>

An extensive guide on how to develop AWS Lambda functions with Quarkus can be found in the official Quarkus AWS Lambda Guide. Enabling X-Ray for Your Lambda Functions Quarkus provides out-of-the-box support for X-Ray, but you will need to add a dependency to your project and configure some settings to make it work with GraalVM/native-compiled Quarkus applications. Let's start by adding the quarkus-amazon-lambda-xray dependency.

XML
<!-- adds a dependency on the required X-Ray classes and adds support for GraalVM native images -->
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-amazon-lambda-xray</artifactId>
</dependency>

Don't forget to enable tracing for your Lambda function; otherwise, it won't work. One way to do that is to set the tracing argument to Tracing.ACTIVE within your AWS CDK code.

Java
function = Function.Builder.create(this, "feed-parsing-function")
    ...
    .memorySize(512)
    .tracing(Tracing.ACTIVE)
    .runtime(Runtime.PROVIDED_AL2023)
    .logRetention(RetentionDays.ONE_WEEK)
    .build();

After deploying and invoking your function, you should be able to see the X-Ray traces from within the CloudWatch interface. By default, it will show you some basic timing information for your function, like the initialization and the invocation duration. Adding More Instrumentation Now that the dependencies are in place and tracing is enabled for our function, we can enrich the traces in X-Ray by leveraging the X-Ray SDK's TracingInterceptor. For instance, for the SQS and DynamoDB clients, you can explicitly set the interceptor inside the application.properties file.

Plain Text
quarkus.dynamodb.async-client.type=aws-crt
quarkus.dynamodb.interceptors=com.amazonaws.xray.interceptors.TracingInterceptor
quarkus.sqs.async-client.type=aws-crt
quarkus.sqs.interceptors=com.amazonaws.xray.interceptors.TracingInterceptor

After putting these properties in place, redeploying, and executing the function, the TracingInterceptor will wrap each API call to SQS and DynamoDB and record the call information alongside the trace. This is very useful for debugging purposes, as it allows you to validate your code and check for any mistakes. Requests to AWS services are part of the pricing model, so if you make a mistake in your code and issue too many calls, it can become quite costly.
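As promised at the start of this section, here is a minimal sketch of what a Lambda handler might look like in a Quarkus project using quarkus-amazon-lambda. The class name, payload types, and @Named value are assumptions for illustration; when a project contains multiple handlers, the active one is selected by setting the quarkus.lambda.handler property to the bean's @Named value.

Java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import jakarta.inject.Named;

// Quarkus discovers this CDI bean and wires it up as the Lambda entry point
@Named("feed-parsing") // hypothetical name, referenced via quarkus.lambda.handler when multiple handlers exist
public class FeedParsingHandler implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String input, Context context) {
        // Business logic would go here; kept trivial for this sketch
        return "parsed: " + input;
    }
}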
Custom Subsegments With the AWS SDK TracingInterceptor configured, we get information about the calls to the AWS APIs, but what if we want to see information about our own code or remote calls to services outside of AWS? The Java SDK for X-Ray supports the concept of adding custom subsegments to your traces. You can add subsegments to a trace by adding a few lines of code to your own business logic, as you can see in the following code snippet.

Java
public void someMethod(String argument) {
    // wrap in a subsegment
    Subsegment subsegment = AWSXRay.beginSubsegment("someMethod");
    try {
        // Your business logic
    } catch (Exception e) {
        subsegment.addException(e);
        throw e;
    } finally {
        AWSXRay.endSubsegment();
    }
}

Although this is trivial to do, it becomes quite messy if you have a lot of methods you want to apply tracing to. This isn't ideal, and it would be better if we didn't have to mix our own code with the X-Ray instrumentation. Quarkus and Jakarta CDI Interceptors The Quarkus programming model is based on the Lite version of the Jakarta Contexts and Dependency Injection 4.0 specification. Besides dependency injection, the specification also describes other features like: Lifecycle callbacks — A bean class may declare lifecycle @PostConstruct and @PreDestroy callbacks. Interceptors — Used to separate cross-cutting concerns from business logic. Decorators — Similar to interceptors, but because they implement interfaces with business semantics, they are able to implement business logic. Events and observers — Beans may also produce and consume events to interact in a completely decoupled fashion. As mentioned, CDI interceptors are used to separate cross-cutting concerns from business logic. As tracing is a cross-cutting concern, this sounds like a great fit. Let's take a look at how we can create an interceptor for our AWS X-Ray instrumentation. How to Create an Interceptor for AWS X-Ray Instrumentation We start by defining our interceptor binding, which we will call XRayTracing. Interceptor bindings are intermediate annotations that may be used to associate interceptors with target beans.

Java
package com.jeroenreijn.aws.quarkus.xray;

import jakarta.annotation.Priority;
import jakarta.interceptor.InterceptorBinding;
import java.lang.annotation.Retention;

import static java.lang.annotation.RetentionPolicy.RUNTIME;

@InterceptorBinding
@Retention(RUNTIME)
@Priority(0)
public @interface XRayTracing { }

The next step is to define the actual interceptor logic, which is the code that will add the additional X-Ray instructions for creating the subsegment and wrapping it around our business logic.

Java
package com.jeroenreijn.aws.quarkus.xray;

import com.amazonaws.xray.AWSXRay;
import jakarta.interceptor.AroundInvoke;
import jakarta.interceptor.Interceptor;
import jakarta.interceptor.InvocationContext;

@Interceptor
@XRayTracing
public class XRayTracingInterceptor {

    @AroundInvoke
    public Object tracingMethod(InvocationContext ctx) throws Exception {
        AWSXRay.beginSubsegment("## " + ctx.getMethod().getName());
        try {
            return ctx.proceed();
        } catch (Exception e) {
            AWSXRay.getCurrentSubsegment().addException(e);
            throw e;
        } finally {
            AWSXRay.endSubsegment();
        }
    }
}

An important part of the interceptor is the @AroundInvoke annotation, which means that this interceptor code will be wrapped around the invocation of our own business logic. Now that we've defined both our interceptor binding and our interceptor, it's time to start using it.
Every method that we want to create a subsegment for can now be annotated with the @XRayTracing annotation.

Java
@XRayTracing
public SyndFeed getLatestFeed() {
    InputStream feedContent = getFeedContent();
    return getSyndFeed(feedContent);
}

@XRayTracing
public SyndFeed getSyndFeed(InputStream feedContent) {
    try {
        SyndFeedInput feedInput = new SyndFeedInput();
        return feedInput.build(new XmlReader(feedContent));
    } catch (FeedException | IOException e) {
        throw new RuntimeException(e);
    }
}

That looks much better. Pretty clean, if I say so myself. Based on the hierarchy of subsegments for a trace, X-Ray will be able to show a nested tree structure with the timing information. Closing Thoughts The integration between Quarkus and X-Ray is quite simple to enable, and the developer experience is really good out of the box, with the interceptors defined on a per-client basis. With the help of CDI interceptors, you can keep your code clean without worrying too much about X-Ray-specific code inside your business logic. An alternative to building your own interceptor might be to start using AWS Powertools for Lambda (Java). Powertools for Java is a great way to boost your developer productivity, but it can be used for more than X-Ray, so I'll save it for another post.