Enterprise AI
Artificial intelligence (AI) has continued to change the way the world views what is technologically possible. Moving from theoretical to implementable, the emergence of technologies like ChatGPT allowed users of all backgrounds to leverage the power of AI. Now, companies across the globe are taking a deeper dive into their own AI and machine learning (ML) capabilities; they’re measuring the modes of success needed to become truly AI-driven, moving beyond baseline business intelligence goals and expanding to more innovative uses in areas such as security, automation, and performance.

In DZone’s Enterprise AI Trend Report, we take a pulse on the industry nearly a year after the ChatGPT phenomenon and evaluate where individuals and their organizations stand today. Through our original research that forms the “Key Research Findings” and articles written by technical experts in the DZone Community, readers will find insights on topics like ethical AI, MLOps, generative AI, large language models, and much more.
In the landscape of software development, efficiently processing large datasets has become paramount, especially with the advent of multicore processors. The Java Stream interface provided a leap forward by enabling sequential and parallel operations on collections. However, fully exploiting modern processors' capabilities while retaining the Stream API’s simplicity posed a challenge. Responding to this, I created an open-source library aimed at experimenting with a new method of parallelizing stream operations. This library diverges from traditional batching methods by processing each stream element in its own virtual thread, offering a more refined level of parallelism. In this article, I will talk about the library and its design, in more detail than you need to simply use it.

The library is available on GitHub and also as a dependency in Maven Central:

<dependency>
    <groupId>com.github.verhas</groupId>
    <artifactId>vtstream</artifactId>
    <version>1.0.1</version>
</dependency>

Check the actual version number on the Maven Central site or on GitHub. This article is based on version 1.0.1 of the library.

Parallel Computing

Parallel computing is not a new thing; it has been around for decades. The first computers executed tasks in batches, hence serially, but soon the idea of time-sharing came into the picture. The first time-sharing computer system was installed in 1961 at the Massachusetts Institute of Technology (MIT). This system, known as the Compatible Time-Sharing System (CTSS), allowed multiple users to log into a mainframe computer simultaneously, working in what appeared to be a private session. CTSS was a groundbreaking development in computer science, laying the foundation for modern operating systems and computing environments that support multitasking and multi-user operations. It was not a parallel computing system per se: CTSS ran on a single mainframe computer, the IBM 7094, which had one CPU, so the code was executed serially.

Today we have multicore processors and multiple processors in a single computer; I am editing this article on a machine with 10 processor cores. To execute tasks concurrently, there are two approaches, plus a third that mixes them:

- Define the algorithm in a concurrent way; for example, reactive programming.
- Define the algorithm the good old sequential way and let some program decide on the concurrency.
- Mix the two.

When we program a reactive algorithm or define streams as in the Java 8 Stream API, we help the application execute the tasks concurrently. We define small parts and their interdependence so that the environment can decide which parts can be executed concurrently. The actual execution is then done by the framework, using virtual threads, threads, or perhaps processes. The difference between these lies in the scheduler: who decides which processor executes which task at the next moment. In the case of threads or processes, the scheduler is the operating system. The difference between thread and process execution is that threads belonging to the same process share the same memory space, while processes have their own. Similarly, virtual threads share the operating-system carrier threads they are mounted on. Transitioning from processes to threads to virtual threads, we encounter a reduction in shared resources and, consequently, overhead. This makes virtual threads significantly less costly than traditional threads. While a machine might support thousands of threads and processes, it can accommodate millions of virtual threads.
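To make the cost difference concrete, here is a minimal, self-contained sketch that starts a large number of virtual threads using only the standard Java 21 thread API. It is illustrative and not part of the library:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        // 100,000 platform threads would typically exhaust operating-system
        // limits; the same number of virtual threads is routine.
        for (int i = 0; i < 100_000; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(100)); // simulated blocking work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join(); // wait for every virtual thread to finish
        }
        System.out.println("All virtual threads finished");
    }
}

While a virtual thread blocks in sleep, its carrier thread is freed to run other virtual threads, which is what makes this scale.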
In defining a task with streams, you are essentially outlining a series of operations to be performed on multiple elements. The decision to execute these operations concurrently rests with the framework, which may or may not choose to do so. However, Stream in Java serves as a high-level interface, offering us the flexibility to implement a version that facilitates concurrent execution of tasks.

Implementing Streams in Threads

The library contains two primary classes located in the main directory:

- ThreadedStream
- Command

ThreadedStream is the class responsible for implementing the Stream interface:

public class ThreadedStream<T> implements Stream<T> {

The Command class encompasses nested classes that implement the functionality of the stream operations:

public static class Filter<T> extends Command<T, T> {
public static class AnyMatch<T> extends Command<T, T> {
public static class FindFirst<T> extends Command<T, T> {
public static class FindAny<T> extends Command<T, T> {
public static class NoOp<T> extends Command<T, T> {
public static class Distinct<T> extends Command<T, T> {
public static class Skip<T> extends Command<T, T> {
public static class Peek<T> extends Command<T, T> {
public static class Map<T, R> extends Command<T, R> {

All the mentioned operators are intermediaries. The terminal operators are implemented within the ThreadedStream class, which converts the threaded stream into a regular stream before invoking the terminal operator on that stream. An example of this approach is the implementation of the collect method:

@Override
public <R> R collect(Supplier<R> supplier,
                     BiConsumer<R, ? super T> accumulator,
                     BiConsumer<R, R> combiner) {
    return toStream().collect(supplier, accumulator, combiner);
}

The source of the elements is also a stream, which means that the threading functionality is layered atop the existing stream implementation. This setup allows for the utilization of streams both as data sources and as destinations for processed data. Threading occurs in the interim, facilitating the parallel execution of the intermediary commands. Therefore, the core of the implementation, and its most intriguing aspect, lies in the construction of the structure and its subsequent execution. We will first examine the structure of the stream data and then explore how the class executes operations utilizing virtual threads.

Stream Data Structure

The ThreadedStream class maintains its data through the following member variables:

private final Command<Object, T> command;
private final ThreadedStream<?> downstream;
private final Stream<?> source;
private long limit = -1;
private boolean chained = false;

- command represents the Command object to be executed on the data. It may be a no-operation (NoOp) command, or null if there is no specific command to execute.
- downstream points to the preceding ThreadedStream in the processing chain. A ThreadedStream retrieves data either from the immediate downstream stream, if available, or directly from the source if it is the first in the chain.
- source is the initial data stream. It remains defined even when a downstream is specified, in which case the source of both streams is identical.
- limit specifies the maximum number of elements this stream is configured to process. Implementing a limit requires a workaround, as stream element processing starts immediately rather than being "pulled" by the terminal operation. Consequently, infinite streams cannot feed a ThreadedStream.
- chained is a boolean flag indicating whether the stream is part of a processing chain. When true, it signifies that a subsequent stream depends on this one’s output, preventing execution in cases of processing forks. This mechanism mirrors the approach found in the JVM’s standard stream implementations.
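The code excerpts later in the article rely on Command.Result<T> and its result(), exception(), and isDeleted() accessors. Purely as an illustration of the shape such a per-element command API can take (this is not the library’s actual code; see GitHub for the real definitions), a stand-alone sketch might look like this:

import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch only; the real Command/Result classes live in vtstream
// and almost certainly differ in detail.
abstract class MiniCommand<T, R> {

    // A result carries a value, an exception, or a "deleted" marker used when
    // a filter drops the element (mirroring the isDeleted() checks shown later).
    record Result<R>(R result, Throwable exception, boolean isDeleted) {}

    abstract Result<R> execute(T input);

    static final class Map<T, R> extends MiniCommand<T, R> {
        private final Function<? super T, ? extends R> mapper;

        Map(Function<? super T, ? extends R> mapper) {
            this.mapper = mapper;
        }

        @Override
        Result<R> execute(T input) {
            try {
                return new Result<>(mapper.apply(input), null, false);
            } catch (Throwable t) {
                return new Result<>(null, t, false); // exception captured, not thrown
            }
        }
    }

    static final class Filter<T> extends MiniCommand<T, T> {
        private final Predicate<? super T> predicate;

        Filter(Predicate<? super T> predicate) {
            this.predicate = predicate;
        }

        @Override
        Result<T> execute(T input) {
            return predicate.test(input)
                    ? new Result<>(input, null, false)
                    : new Result<>(null, null, true); // marked deleted: filtered out
        }
    }
}

public class MiniCommandDemo {
    public static void main(String[] args) {
        var map = new MiniCommand.Map<Integer, Integer>(x -> x * 2);
        System.out.println(map.execute(21).result());        // 42
        var even = new MiniCommand.Filter<Integer>(x -> x % 2 == 0);
        System.out.println(even.execute(3).isDeleted());     // true
    }
}

The key design point the sketch captures is that a command never throws: failures and filtered-out elements are encoded in the result so that the worker thread always terminates normally.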
Stream Build

The stream data structure is constructed dynamically as intermediary operations are chained together. The process starts by creating an initial element with the static method threaded on the ThreadedStream class. A line from the unit tests illustrates this initiation:

final var k = ThreadedStream.threaded(Stream.of(1, 2, 3));

This line creates a ThreadedStream instance named k, initialized with a source stream consisting of the elements 1, 2, and 3. The threaded method serves as the entry point for transforming a regular stream into a ThreadedStream, setting the stage for further operations that can leverage virtual threads for concurrent execution.

When an intermediary operation is appended, a new ThreadedStream instance is created. This new instance designates the preceding ThreadedStream as its downstream, and its source stream remains identical to the source stream of its predecessor. This design ensures a seamless flow of data through the chain of operations. For example, when we call

final var t = k.map(x -> x * 2);

the map method is invoked, which is:

public <R> ThreadedStream<R> map(Function<? super T, ? extends R> mapper) {
    return new ThreadedStream<>(new Command.Map<>(mapper), this);
}

It generates a new ThreadedStream object wherein the preceding ThreadedStream acts as the downstream, and the command field is populated with a new instance of the Command.Map class, configured with the specified mapper function. This process effectively constructs a linked list of ThreadedStream objects, which guides the flow of operation execution once a terminal operation triggers the execution phase.

It is crucial to understand that the ThreadedStream class refrains from performing any operations on the data until a terminal operation is called. Once execution commences, it proceeds concurrently. To facilitate independent execution of these operations, ThreadedStream instances are designed to be immutable. They are instantiated during the setup phase and undergo a single mutation when they are linked together. During execution, these instances serve as a read-only data structure, which ensures thread safety and consistency throughout concurrent processing, allowing for efficient and reliable stream handling.
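Putting the build steps together, here is a minimal end-to-end usage sketch assembled from the API calls quoted above. The package name in the import is an assumption, so check the project’s documentation:

import com.github.verhas.vtstream.ThreadedStream; // assumed package; see the project on GitHub

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class VtStreamExample {
    public static void main(String[] args) {
        // Build the chain lazily; nothing runs until the terminal operation.
        List<Integer> doubled = ThreadedStream.threaded(Stream.of(1, 2, 3))
                .map(x -> x * 2)                  // intermediary operation
                .collect(Collectors.toList());    // terminal operation starts execution
        System.out.println(doubled);              // [2, 4, 6] for an ordered source
    }
}

At the collect call, each element is processed in its own virtual thread, as described next.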
Stream Execution

Stream execution commences when a terminal operation is invoked. Terminal operations are executed by first transforming the threaded stream back into a conventional stream, upon which the terminal operation is then performed. This bridges the gap between the concurrent execution facilitated by virtual threads and the conventional stream processing model of Java: by converting the ThreadedStream into a standard Stream, the library leverages the rich ecosystem of terminal operations already available, ensuring compatibility with minimal overhead. The collect method is a prime example:

@Override
public <R> R collect(Supplier<R> supplier,
                     BiConsumer<R, ? super T> accumulator,
                     BiConsumer<R, R> combiner) {
    return toStream().collect(supplier, accumulator, combiner);
}

The toStream() method represents the core functionality of the library. It marks the commencement of stream execution, initiating a new virtual thread for each element of the source stream. The method differentiates between ordered and unordered execution through two distinct implementations:

- toUnorderedStream()
- toOrderedStream()

The choice between the two is determined by the isParallel() status of the source stream. It is worth noting that executing an ordered stream in parallel can still be advantageous: although the results may be produced out of order, parallel processing accelerates the operation, and care is taken to collect the results sequentially. Unordered processing can yield higher efficiency, because elements can be passed to the resulting stream as soon as they become available, without waiting for the preceding elements.

The implementation of toStream() is designed to minimize unnecessary buffering of elements. Elements are forwarded to the resulting stream immediately upon readiness in the case of unordered streams, and, for ordered streams, as soon as they are ready and the preceding element has been forwarded. The following sections delve into the specifics of these two execution methodologies.

Unordered Stream Execution

Unordered execution promptly forwards results as they become ready. It employs a concurrent list for result storage, so threads can deposit results while the target stream simultaneously retrieves them, preventing excessive list growth. Iterating over the source stream creates a new virtual thread for each element. When a limit is imposed, it is applied directly to the source stream, diverging from traditional stream implementations where limit acts as a genuine intermediary operation. The implementation of unordered stream execution is as follows:

private Stream<T> toUnorderedStream() {
    final var result = Collections.synchronizedList(new LinkedList<Command.Result<T>>());
    final AtomicInteger n = new AtomicInteger(0);
    final Stream<?> limitedSource = limit >= 0 ? source.limit(limit) : source;
    limitedSource.forEach(
            t -> {
                Thread.startVirtualThread(() -> result.add(calculate(t)));
                n.incrementAndGet();
            });
    return IntStream.range(0, n.get())
            .mapToObj(i -> {
                while (result.isEmpty()) {
                    Thread.yield();
                }
                return result.removeFirst();
            })
            .filter(f -> !f.isDeleted())
            .peek(r -> {
                if (r.exception() != null) {
                    throw new ThreadExecutionException(r.exception());
                }
            })
            .map(Command.Result::result);
}

The counter n tallies the number of threads started. The resulting stream is constructed using this counter by mapping the numbers 0 to n-1 to the elements of the concurrent list as they become ready. If the list lacks elements at any point, the process waits for the next element to arrive. The waiting mechanism is a loop around Thread.yield(), which relinquishes the processor between checks to limit unnecessary CPU consumption while waiting for the next result.
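The yield-based busy wait works, but the same producer-consumer hand-off can also be expressed with a blocking queue, which parks the consumer instead of polling. The following stand-alone sketch shows that alternative design for illustration; it is not the library’s code:

import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class QueueHandOff {
    public static void main(String[] args) {
        var queue = new LinkedBlockingQueue<Integer>();
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        // Producer side: one virtual thread per element; results land in the queue.
        input.forEach(i -> Thread.startVirtualThread(() -> queue.add(i * i)));
        // Consumer side: take() blocks (parks the thread) until an element
        // arrives, so no CPU is spent spinning as in the yield loop above.
        Stream<Integer> results = IntStream.range(0, input.size())
                .mapToObj(ignored -> {
                    try {
                        return queue.take();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                });
        System.out.println(results.toList()); // squares, in completion order
    }
}

Parking is particularly cheap for virtual threads, which is one reason blocking hand-offs pair well with this execution model.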
Ordered Stream Execution

Ordered stream execution takes a more nuanced approach than its unordered counterpart. It introduces a local class named Task, designed specifically to await the completion of a particular thread. As in the unordered case, a concurrent list is used, but with a key distinction: the elements of this list are the tasks themselves rather than the results, and the list is populated by the code that creates the threads, not by the threads themselves. Because the list is fully populated up front, there is no need for a separate counter to track thread initiation. The process then waits on each thread sequentially, in list order, relaying each thread’s output to the target stream in sequence. This preserves the ordered integrity of the stream’s elements, despite the concurrent nature of their processing, by aligning the execution flow with the sequence of the original stream.

private Stream<T> toOrderedStream() {
    class Task {
        Thread workerThread;
        volatile Command.Result<T> result;

        /**
         * Wait for the thread calculating the result of the task to be finished.
         * This method is blocking.
         *
         * @param task the task to wait for
         */
        static void waitForResult(Task task) {
            try {
                task.workerThread.join();
            } catch (InterruptedException e) {
                task.result = deleted();
            }
        }
    }

    final var tasks = Collections.synchronizedList(new LinkedList<Task>());
    final Stream<?> limitedSource = limit >= 0 ? source.limit(limit) : source;
    limitedSource.forEach(
            sourceItem -> {
                Task task = new Task();
                tasks.add(task);
                task.workerThread = Thread.startVirtualThread(() -> task.result = calculate(sourceItem));
            }
    );
    return tasks.stream()
            .peek(Task::waitForResult)
            .map(f -> f.result)
            .peek(r -> {
                if (r.exception() != null) {
                    throw new ThreadExecutionException(r.exception());
                }
            })
            .filter(r -> !r.isDeleted())
            .map(Command.Result::result);
}

Summary and Takeaway

Having explored an implementation that facilitates the parallel execution of stream operations, it is worth noting that this library is open source, so you can either use it as is or reference its design and implementation to craft your own version. The detailed exposition provided here aims to shed light on both the conceptual underpinnings and the practical aspects of the library’s construction. It is important to acknowledge, however, that the library has not undergone extensive testing. It received a review from Istvan Kovacs, who has considerable expertise in concurrent programming, but that review is no absolute assurance of the library’s reliability or absence of bugs. Should you decide to integrate this library into your projects, proceed with caution and conduct thorough testing to ensure it meets your requirements and standards. The library is provided "as is," with the understanding that users adopt it at their own risk, underpinning the importance of due diligence in its deployment.
TL;DR: Scrum Master Interview Questions on Creating Value With Scrum

If you are looking to fill a Scrum Master (or agile coach) position in your organization, you may find the following 12th set of Scrum Master interview questions useful for identifying the right candidate. They are derived from my eighteen years of practical experience with XP and Scrum, serving as both Product Owner and Scrum Master, as well as from interviewing dozens of Scrum Master candidates on behalf of my clients. So far, this Scrum Master interview guide has been downloaded more than 27,000 times.

Scrum Master Interview Questions: How We Organized Questions and Answers

Scrum has proven time and again to be the most popular framework for software development. Given that software is eating the world, a seasoned Scrum Master remains in high demand even now, in the frosty economic climate of Spring 2024. That demand draws new professionals into the market from other project management branches, some probably believing that reading one or two Scrum books will be sufficient, which makes any Scrum Master interview a challenging task.

The Scrum Master Interview Questions ebook provides both the questions and guidance on the range of suitable answers. These should allow an interviewer to dive deep into a candidate’s understanding of Scrum and her agile mindset. However, please note:

- The answers reflect the personal experience of the authors and may not be valid for every organization: what works for organization A may not work in organization B.
- There are no suitable multiple-choice questions to identify a candidate’s agile mindset, given the complexity of applying “Agile” to any organization.
- The authors share a holistic view of agile practices: agility covers the whole arc from product vision (our grand idea of how to improve mankind’s fate) to product discovery (what to build) to product delivery (how to build it).

Creating Value as a Scrum Master

The following questions and answers are designed to draw out a nuanced understanding of a candidate’s experience and skills in applying agile product development principles to improve customer value, improve the economics of delivery, and enhance predictability in various organizational contexts, addressing the current economic climate:

Question 74: Resistant Industries

How have you tailored Scrum practices to elevate customer value, particularly in industries resistant to Agile practices?

Background: This question probes the candidate’s ability to adapt Scrum principles to sectors where Agile is not the norm, emphasizing customer-centric product development. It seeks insights into the candidate’s innovative application of Scrum to foster customer engagement and satisfaction, even in challenging environments. It is also an opportunity for the candidate to build confidence in the interview process and rapport with the interviewers.

Acceptable Answer: An excellent response would detail a scenario where the candidate navigated resistance by demonstrating Agile’s benefits through small-scale pilot projects or workshops. They would probably even describe specific adjustments to Scrum events or artifacts to align with industry-specific constraints, culminating in enhanced customer feedback loops and ultimately leading to product features that directly addressed customer pain points.
Question 75: Reducing Product Costs

Please describe a scenario in which you significantly reduced production costs through strategic Scrum application without compromising the product’s quality.

Background: This delves into the candidate’s proficiency in supporting the optimization of a team’s capacity allocation and streamlining workflows within the Scrum framework to cut costs. It is about balancing high-quality standards with cost-effectiveness through Agile practices.

Acceptable Answer: Look for a narrative where the candidate identifies wasteful practices or bottlenecks in the development process and implements targeted Scrum practices to address them. Examples include refining the Product Backlog to focus on high-impact features, improving cross-functional collaboration to reduce dependencies, or leveraging automated testing to shorten lead time while preserving quality standards. The answer should highlight the candidate’s analytical problem-solving approach and ability to help the team adopt a cost-conscious, entrepreneurial stance toward solving customer problems without sacrificing quality.

Question 76: Improving Predictability in a Volatile Market

Please share an experience where you used Scrum to improve the predictability of product delivery in a highly volatile market.

Background: This question explores the candidate’s capability to use Scrum to enhance delivery predictability amid market fluctuations. It is about leveraging Agile’s flexibility to adapt to changing priorities while maintaining a steady pace of delivery.

Acceptable Answer: The candidate should recount an instance where they utilized Scrum artifacts and events to better forecast delivery timelines in a shifting landscape. This example might involve adjusting Sprint lengths, prioritizing Product Backlog items more dynamically, or closer stakeholder engagement to reassess priorities during Sprint Reviews or other alignment-creating opportunities, for example, User Story Mapping sessions. The story should underscore their strategic thinking in balancing flexibility with predictability and their communication skills in setting realistic expectations with stakeholders.

Question 77: Successfully Promoting Scrum Despite Skepticism

How have you promoted the value of Scrum in organizations where leadership and middle management met Agile practices with skepticism?

Background: This question examines the candidate’s ability to champion Scrum in environments resistant to change. Such an environment requires a deep understanding of Agile principles as well as strong advocacy and education skills.

Acceptable Answer: Successful candidates will describe a multifaceted strategy that includes educating leadership on Agile benefits, organizing interactive workshops to demystify Scrum practices, and securing quick wins to demonstrate value. They might also discuss establishing a community of practice to sustain Agile learning and sharing success stories to build momentum. The answer should reflect their perseverance, persuasive communication, and role as a change agent. (Learn more about successful stakeholder communication tactics during transformations here.)

Question 78: Effective Change

Please describe your approach to conducting effective Sprint Retrospectives that drive continuous improvement.

Background: The question probes the candidate’s techniques for facilitating Retrospectives that genuinely contribute to team growth and product enhancement.
It seeks to understand how they ensure these events are productive, inclusive, and actionable.

Acceptable Answer: A comprehensive response would outline a structured approach to Retrospectives, including preparation, facilitation, follow-up practices, and valuable enhancements to the framework, for example, embracing the idea of a directly responsible individual to drive change the team considers beneficial. The candidate might mention using a variety of formats to keep the sessions engaging, techniques to ensure all team members contribute, and strategies for prioritizing action items. They should emphasize their method for tracking improvements over time to ensure accountability and demonstrate the Retrospective’s impact on the team’s performance and morale. Again, this question allows candidates to distinguish themselves in the core competence of any Scrum Master.

Question 79: Balancing Demands With Principles

Please explain how you have balanced stakeholder demands with Agile principles to help the Scrum team prioritize work effectively.

Background: This question seeks insights into the candidate’s ability to support the Scrum team in general and the Product Owner in particular in navigating competing demands, aligning stakeholder expectations with Agile principles to focus the team’s efforts on the most impactful work from the customers’ perception and the organization’s perspective.

Acceptable Answer: The candidate should provide an example of supporting the Product Owner by employing prioritization techniques, such as User Story Mapping, in collaboration with stakeholders to align on priorities that offer the most value, leading to the creation of valuable Product Goals and roadmaps in the process. They should highlight their negotiation skills, ability to facilitate consensus, and adeptness at transparent communication to manage expectations and maintain a sustainable pace for the team.

Question 80: Boring Projects and Motivation

How do you sustain team motivation and engagement in long-term projects with high levels of task repetition?

Background: This question explores the candidate’s strategies for keeping the team engaged and motivated through the monotony of prolonged projects or repetitive tasks. While we would all like to work on cutting-edge technology all the time, everyday operations often comprise work that we consider less glamorous yet grudgingly accept as valuable, too. The question gauges a candidate’s ability to uphold enthusiasm and maintain high performance in a potentially less motivating environment.

Acceptable Answer: Expect the candidate to discuss innovative approaches like introducing gamification elements for mundane tasks, rotating roles within the team to provide fresh challenges, and setting up regular skill-enhancement workshops. They might also mention the importance of celebrating small wins, giving recognition, for example, with Kudo cards, and ensuring that the team’s work aligns with individual growth goals. The response should underline their commitment to maintaining a positive and stimulating work environment, even under challenging circumstances.

Question 81: Onboarding New Team Members

Please describe your experience integrating a new team member into an established Scrum team, ensuring a seamless transition and maintaining team productivity.

Background: This question assesses the candidate’s approach to onboarding new team members in a way that minimizes disruption and maximizes integration speed.
This approach is critical for maintaining an existing team’s cohesive and productive dynamics, acknowledging that Scrum teams will regularly change composition.

Acceptable Answer: Look for answers detailing a structured and inclusive onboarding plan that includes, for example:

- Mentorship programs
- A buddy system
- Clear documentation of team norms and expectations, such as a working agreement and a Definition of Done
- Team activities
- Gradual immersion into the Scrum team’s projects through pair programming or shadowing

The candidate should highlight the importance of fostering an inclusive team culture that welcomes questions and supports new members in their learning journey, ensuring they feel valued and part of the team from day one.

Question 82: Conflict Resolution

How do you approach conflict resolution within a Scrum team or between the team and stakeholders to ensure continued progress and collaboration?

Background: Conflicts are inevitable in any team dynamic. This question probes the candidate’s skills in navigating and resolving disagreements in a way that strengthens the team and stakeholder relationships rather than undermining them.

Acceptable Answer: The candidate should describe their ability to act as a neutral mediator, actively listen to understand all perspectives, and facilitate problem-solving sessions focusing on interests rather than positions. They might also discuss creating forums for open dialogue, such as conflict-themed Retrospectives, and the importance of fostering a culture of trust and psychological safety where conflicts can be aired constructively. The response should convey their adeptness at turning conflicts into opportunities for growth and deeper understanding. However, the candidate should also make clear that not all disputes among team members may be solvable and that, once all team-based options have been exhausted, the Scrum Master needs to ask for management support to bring the conflict to a conclusion.

Question 83: Scaling Scrum?

Please reflect on a time when scaling Scrum across multiple teams presented significant challenges. How did you address these challenges to ensure the organization’s success with its Agile transformation?

Background: Scaling Agile practices is a complex endeavor that can highlight organizational impediments and resistance. This question delves into the candidate’s experience in successfully scaling Scrum, ensuring alignment and cohesion among multiple teams, and helping everyone see the value in a transformation.

Acceptable Answer: This open question allows candidates to address their familiarity with frameworks like LeSS or Nexus, or to share their opinion on whether SAFe is useful. Moreover, at a philosophical level, it opens the discussion of whether “Agile” is scalable at all, given that most scaling frameworks apply more process to the issue. The opposing opinion points to the need to descale the organization instead, by empowering those closest to the problems to decide within the given constraints and governance rules. The candidate should emphasize the importance of maintaining a shared vision and goals, creating communities of practice to share knowledge and best practices, and addressing cultural barriers to change. They should also reflect on the importance of executive sponsorship, the strategic engagement of key stakeholders to champion and support the scaling effort, and the necessity of a failure culture.
How To Use the Scrum Master Interview Questions

Scrum has always been a hands-on business, and to be successful at it, a candidate needs a passion for getting her hands dirty. While the basic rules are trivial, getting a group of individuals with different backgrounds, levels of engagement, and personal agendas to form and perform as a team is a complex task. (As always, you might say, when humans and communication are involved.) Moreover, the larger the organization, the more management levels there are, and the more likely failure is lurking around the corner. The questions are not necessarily suited to turning an inexperienced interviewer into an agile expert. But in the hands of a seasoned practitioner, they can help determine which candidates have worked in the agile trenches in the past.
Alternative Text: This comic depicts an interaction between two characters and is split into four panes. In the upper-left pane, Character 1 enters the scene with a slightly agitated expression and comments to Character 2, "Your PR makes SQL injection possible!" Character 2, who is typing away at their computer, responds happily, "Wow, that wasn't even my intention," as if Character 1 has paid them a compliment. In the upper-right pane, Character 1, now with an increasingly agitated expression, says, "I mean, your code is vulnerable." Character 2, now standing and facing Character 1, is almost proudly embarrassed at what they take as positive feedback and replies, "Stop praising me, I get shy." In the lower-left pane, Character 1, now shown with sharp teeth and a scowl, points a finger at Character 2 and shouts, "Vulnerable is bad!" Character 2 seems shocked at this statement, standing with their mouth and eyes wide open. In the lower-right and final pane of the comic, Character 2, smiling once again, replies with the comment, "At least it can do SQL injection!" Character 1 stares back at Character 2 with a blank expression.
People initially became interested in blockchain several years ago after learning about it as a decentralized digital ledger. It supports transparency because no one can change information once it is stored on the chain. People can also watch transactions as they happen, further enhancing visibility. But how does blockchain support the integrity of cloud-stored data?

3 Ways Blockchain Supports the Integrity of Cloud-Stored Data

1. Protecting and Facilitating the Sharing of Medical Records

Technological advancements have undoubtedly improved the ease of sharing medical records between providers. When patients go to new healthcare facilities, all involved parties can easily see those individuals’ histories, treatments, test results, and more. Such records keep everyone updated about what has happened to patients, which significantly reduces the likelihood of redundancies and confusion that could extend a health management timeline.

Cloud computing has also accelerated information-sharing efforts within healthcare and other industries. It allows medical professionals to access and collaborate through scalable platforms. Many healthcare workers also appreciate that they can access cloud apps from anywhere. That convenience supports physicians who must travel for continuing medical education events, travel nurses, surgeons who split their time between multiple hospitals, and others who often work from numerous locations.

However, despite these cloud computing benefits, a security-related downside is that these platforms use a centralized infrastructure to allow record sharing across users. That characteristic leaves cloud tools open to data breaches. In one case, researchers proposed addressing this shortcoming with a blockchain architecture to authenticate users and enable opportunities for sharing medical records securely. The group prioritized blockchain due to its immutability while seeking to create a system that allowed patients and their providers to share and store medical records securely. The researchers also wanted to design something that was not at risk of data loss or other failures.

The researchers implemented so-called “special recognition keys” to identify medical-related parties, such as doctors, patients, and hospitals. When testing their system, the metrics studied included the time to complete a transaction and how well the communication-related attributes performed. The outcomes suggested the researchers’ approach worked far better than existing solutions.

2. Improving Access Control

Data breaches can be costly, catastrophic events. Although there is no single solution for preventing them, people can make meaningful progress by focusing on access control. One of the most convenient things about the cloud is that it allows all authorized users to access content regardless of their location. However, as the number of people engaging with a cloud platform increases, so does the risk of compromised credentials that could allow hackers to enter networks and wreak havoc.

Many corporate leaders have prioritized cloud-first strategies. That approach can strengthen cybersecurity because service providers have numerous security features to supplement internal measures. Additionally, cloud-based backup capabilities facilitate faster data recovery if cyberattacks occur. However, research suggests some access control practices used by cloud administrators have significant shortcomings that could make cyberattacks more likely.
For example, one study about access management for cloud platforms found that 49% of administrators store passwords in a spreadsheet. That is a huge security risk for many reasons, and it highlights the need for better password hygiene practices. Fortunately, blockchain is well positioned to help solve this problem. In one example, researchers developed a blockchain system that uses attribute-based encryption to improve how cloud users access content. The setup also contains an audit contract that dynamically manages who can use the cloud and when.

The team built a fine-grained, searchable system that maintained access control, strengthening cloud security and achieving the desired results without excessive computing power. Results also showed this system increased storage capacity. When the group performed a security analysis on their blockchain creation, they found it stopped chosen-plaintext attacks and breaches based on guessed keywords. A theoretical examination and associated experiments suggested the tool worked better, from a computing power and storage efficiency perspective, than comparable alternatives.

3. Curbing Emerging Technologies’ Potential Threats

Even as new technologies show tremendous progress and excite people about the future, some individuals specifically investigate how they could harm others through technological advancements. Developments associated with ChatGPT and other generative AI tools are excellent examples. These chatbots can save people time by assisting with tasks such as idea generation or outline creation. However, because these tools create believable-sounding paragraphs in seconds, some cybercriminals use generative artificial intelligence (genAI) chatbots to write phishing emails much faster than before. It is easy to imagine the ramifications of a cybercriminal who writes a convincing phishing message and uses it to access someone’s cloud-stored information.

ChatGPT runs on a cloud platform built by OpenAI, the company that created the chatbot. A lesser-known issue affecting data integrity is that OpenAI uses interactions with the tool to train future versions of the algorithms. People can opt out of having their conversations become part of the training data, but many haven’t done so or don’t know how. As workers eagerly tested ChatGPT and similar tools, some committed potential security breaches without realizing it. Consider a web developer who enters a proprietary code string into ChatGPT and asks the tool for help debugging it. That seemingly minor decision could result in sensitive information becoming part of the training data and no longer being carefully protected by the developer’s employer.

Some leaders quickly established rules for appropriate usage or banned generative AI tools to address these threats. Even so, a February 2024 study showed some workers kept entering sensitive information into ChatGPT despite knowing the associated risks. It is still unclear how blockchain will support data integrity for people using cloud-based generative AI tools, but many professionals are upbeat about the potential.

Conclusion: Using Blockchain for Cloud Data Protection

Entities ranging from government agencies to e-commerce stores use cloud platforms daily. These options are incredibly convenient because they eliminate geographical barriers and allow people to use them through an active internet connection anywhere in the world.
However, many cloud tools store sensitive data, such as health records or payment details. Since cloud platforms hold such a wealth of information, hackers will likely continue targeting them. Although most cloud providers have built-in security features, cybercriminals continually seek ways to circumvent such protections. The examples here show why the blockchain is an excellent candidate for much-needed additional safeguards.
The essential mathematics for artificial intelligence (AI) and quantum computing is foundational to understanding and advancing these cutting-edge fields. In AI, concepts like linear algebra, calculus, probability theory, and optimization are pivotal for modeling data, training machine learning algorithms, and making predictions. Similarly, in quantum computing, these mathematical pillars are indispensable for representing quantum states, designing quantum algorithms, and analyzing quantum phenomena. Whether it's optimizing neural networks or harnessing the power of quantum superposition, a solid grasp of these mathematical principles is crucial for pushing the boundaries of artificial intelligence and quantum computing alike.

Complex Numbers

Complex numbers, which consist of a real and an imaginary part (a + ib), together with complex arithmetic and functions, are fundamental to quantum mechanics. They allow for the representation of quantum states and the mathematical operations performed on them. In AI, complex numbers have also found applications in areas like neural networks and signal processing.

[Image: A complex number]

Linear Algebra

Linear algebra, including concepts like vectors, matrices, linear transformations, and eigenvalues/eigenvectors, is crucial for both quantum computing and many AI techniques. It provides the mathematical framework for representing and manipulating the states and operators of quantum systems, as well as the data structures and algorithms used in AI.

Calculus and Optimization

Calculus and optimization are crucial for training and tuning AI models, as well as for understanding the dynamics of quantum systems. The key concepts requiring a basic understanding are differentiation and integration, gradient-based optimization techniques, and variational methods. Additionally, a good understanding of convex optimization helps in the context of optimization algorithms and loss minimization. Refer to Convex Optimization by Boyd and Vandenberghe.

[Image: Mathematics for AI and Quantum]

Hilbert Spaces

Quantum mechanics utilizes the mathematical structure of Hilbert spaces, which generalize the concepts of vectors and linear algebra to infinite dimensions. This allows for the representation of quantum states as vectors in a Hilbert space. Some AI models, such as those based on kernel methods, also make use of Hilbert space structures.

Probability and Statistics

Both quantum computing and AI rely heavily on probability theory and statistical methods. Quantum mechanics describes the probabilistic nature of measurements, while many AI algorithms, like Bayesian networks and reinforcement learning, are built on probabilistic foundations.

Group Theory and Representation Theory

Symmetry groups, unitary transformations, and irreducible representations are advanced mathematical concepts that are important for understanding the underlying structure of quantum systems and some quantum algorithms.
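As a compact illustration of how several of these areas meet (complex numbers, linear algebra, Hilbert spaces, and probability), consider a single qubit, written in standard notation:

\[
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad \alpha, \beta \in \mathbb{C},
\qquad |\alpha|^2 + |\beta|^2 = 1.
\]

The state is a unit vector in the Hilbert space \(\mathbb{C}^2\); measuring in the computational basis yields 0 with probability \(|\alpha|^2\) and 1 with probability \(|\beta|^2\). Quantum gates are unitary matrices acting on this vector; for example, the Hadamard gate creates the superposition used in many quantum algorithms:

\[
H = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad
H\,|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}.
\]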
Conclusion

While the depth of understanding required may vary, a solid grasp of these core mathematical areas is essential both for advancing AI, including deep learning, and for developing quantum computing technologies. The essential mathematics for AI and quantum computing share several key concepts. Linear algebra serves as a cornerstone, enabling the representation of data and quantum states through vectors and matrices. Probability theory underpins both fields, facilitating the understanding of uncertainty in AI models and the probabilistic nature of quantum phenomena. Optimization techniques play a vital role in training machine learning models and optimizing quantum algorithms. Additionally, concepts from calculus provide the mathematical framework for gradient-based optimization and understanding quantum dynamics. Together, these mathematical foundations form the basis for advancing research and innovation in both AI and quantum computing domains.
Wireshark, the free, open-source packet sniffer and network protocol analyzer, has cemented itself as an indispensable tool in network troubleshooting, analysis, and security (on both sides). This article delves into the features, uses, and practical tips for harnessing the full potential of Wireshark, expanding on aspects that may have been glossed over in discussions or demonstrations. Whether you're a developer, security expert, or just curious about network operations, this guide will enhance your understanding of Wireshark and its applications.

Introduction to Wireshark

Wireshark, initially developed by Gerald Combs, is designed to capture and analyze network packets in real time. Its capabilities extend across various network interfaces and protocols, making it a versatile tool for anyone involved in networking. Unlike its command-line counterpart, tcpdump, Wireshark's graphical interface simplifies the analysis process, presenting data in a user-friendly "proto view" that organizes packets in a hierarchical structure. This facilitates quick identification of protocols, ports, and data flows. The key features of Wireshark are:

- Graphical user interface (GUI): Eases the analysis of network packets compared to command-line tools
- Proto view: Displays packet data in a tree structure, simplifying protocol and port identification
- Compatibility: Supports a wide range of network interfaces and protocols

Browser Network Monitors

Firefox and Chrome contain a far superior network monitor tool built into them. It is superior because it is simpler to use and works with secure websites out of the box. If you can use the browser to debug the network traffic, you should do that. In cases where your traffic requires low-level protocol information or originates outside of the browser, Wireshark is the next best thing.

Installation and Getting Started

To begin with Wireshark, visit the official website for the download. The installation process is straightforward, but attention should be paid to the installation of the command-line tools, which may require separate steps. Upon launching Wireshark, users are greeted with a selection of network interfaces. Choosing the correct interface is crucial for capturing relevant data: when debugging a local server (localhost), use the loopback interface; remote servers will usually correspond to the en0 network adapter. You can use the activity graph next to each network adapter to identify active interfaces for capture.

Navigating Through Noise With Filters

One of the challenges of using Wireshark is the overwhelming amount of data captured, including irrelevant "background noise." Wireshark addresses this with powerful display filters, allowing users to hone in on specific ports, protocols, or data types. For instance, filtering TCP traffic on port 8080 can significantly reduce unrelated data, making it easier to debug specific issues. A completion widget at the top of the Wireshark UI helps you discover filter field names and values. In this case, we filter by port with tcp.port == 8080, the port typically used by Java servers (e.g., Spring Boot/Tomcat). But this isn't always enough, as HTTP is more concise: adding http to the filter narrows the view to HTTP requests and responses only.
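For orientation, here are a few more display filter expressions of the same kind, in standard Wireshark display filter syntax; the parenthetical notes are descriptions, not part of the filter:

tcp.port == 8080              (TCP traffic to or from port 8080)
http                          (only HTTP requests and responses)
http && tcp.port == 8080      (both conditions combined)
ip.addr == 192.168.1.10       (traffic to or from a specific host)
tcp.flags.syn == 1            (TCP connection attempts)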
Deep Dive Into Data Analysis

Wireshark excels in its ability to dissect and present network data in an accessible manner. For example, HTTP responses carrying JSON data are automatically parsed and displayed in a readable tree structure within the packet analysis pane. This feature is invaluable for developers and analysts, providing insights into the data exchanged between clients and servers without manual decoding. Wireshark also offers both hexadecimal and ASCII views of the raw packet data.

Beyond Basic Usage

While Wireshark's basic functionalities cater to a wide range of networking tasks, its true strength lies in advanced features such as Ethernet network analysis, HTTPS decryption, and debugging across devices. These tasks, however, may involve complex configuration steps and a deeper understanding of network protocols and security measures. There are two big challenges when working with Wireshark:

- HTTPS decryption: Decrypting HTTPS traffic requires additional configuration but offers visibility into secure communications.
- Device debugging: Wireshark can be used to troubleshoot network issues on various devices, requiring specific knowledge of network configurations.

The Basics of HTTPS Encryption

HTTPS uses Transport Layer Security (TLS) or its predecessor, Secure Sockets Layer (SSL), to encrypt data. This encryption mechanism ensures that any data transferred between the web server and the browser remains confidential and unaltered. The process involves a series of steps including the handshake, data encryption, and data integrity checks. Decrypting HTTPS traffic is often necessary for developers and network administrators to troubleshoot communication errors, analyze application performance, or ensure that sensitive data is correctly encrypted before transmission. It is a powerful capability for diagnosing complex issues that cannot be resolved by simply inspecting unencrypted traffic or server logs.

Methods for Decrypting HTTPS in Wireshark

Important: Decrypting HTTPS traffic should only be done on networks and systems you own or have explicit permission to analyze. Unauthorized decryption of network traffic can violate privacy laws and ethical standards.

Pre-Master Secret Key Logging

One common method involves using the pre-master secret keys to decrypt HTTPS traffic. Browsers like Firefox and Chrome can log the pre-master secret keys to a file when configured to do so. Wireshark can then use this file to decrypt the traffic:

1. Configure the browser: Set an environment variable (SSLKEYLOGFILE) to specify a file where the browser will save the encryption keys.
2. Capture traffic: Use Wireshark to capture the traffic as usual.
3. Decrypt the traffic: Point Wireshark to the file with the pre-master secret keys (through Wireshark's preferences) to decrypt the captured HTTPS traffic.

Using a Proxy

Another approach involves routing traffic through a proxy server that decrypts HTTPS traffic and then re-encrypts it before sending it to the destination. This method might require setting up a dedicated decryption proxy that can handle the TLS encryption/decryption:

1. Set up a decryption proxy: Tools like mitmproxy or Burp Suite can act as an intermediary that decrypts and logs HTTPS traffic.
2. Configure the network to route through the proxy: Ensure the client's network settings route traffic through the proxy.
3. Inspect traffic: Use the proxy's tools to inspect the decrypted traffic directly.
Integrating tcpdump With Wireshark for Enhanced Network Analysis

While Wireshark offers a graphical interface for analyzing network packets, there are scenarios where using it directly may not be feasible due to security policies or operational constraints. tcpdump, a powerful command-line packet analyzer, becomes invaluable in these situations, providing a flexible and less intrusive means of capturing network traffic.

The Role of tcpdump in Network Troubleshooting

tcpdump allows for the capture of network packets without a graphical user interface, making it ideal for use in environments with strict security requirements or limited resources. It operates on the principle of capturing network traffic to a file, which can then be analyzed at a later time or on a different machine using Wireshark.

Key Scenarios for tcpdump Usage

- High-security environments: In places like banks or government institutions where running network sniffers might pose a security risk, tcpdump offers a less intrusive alternative.
- Remote servers: Debugging issues on a cloud server can be challenging with Wireshark due to the graphical interface; tcpdump captures can be transferred and analyzed locally.
- Security-conscious customers: Customers may be hesitant to allow third-party tools to run on their systems; tcpdump's command-line operation is often more palatable.

Using tcpdump Effectively

Capturing traffic with tcpdump involves specifying the network interface and an output file for the capture. This process is straightforward but powerful, allowing for detailed analysis of network interactions:

- Command syntax: The basic command structure for initiating a capture involves specifying the network interface (e.g., en0 for wireless connections) and the output file name.
- Execution: Once the command is run, tcpdump silently captures network packets. The capture continues until it is manually stopped, at which point the captured data is saved to the specified file.
- Opening captures in Wireshark: The file generated by tcpdump can be opened in Wireshark for detailed analysis, utilizing Wireshark's advanced features for dissecting and understanding network traffic.

The following shows the tcpdump command and its output:

$ sudo tcpdump -i en0 -w output
Password:
tcpdump: listening on en0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
3845 packets captured
4189 packets received by filter
0 packets dropped by kernel

Challenges and Considerations

Identifying the correct network interface for capture on remote systems might require additional steps, such as using the ifconfig command to list available interfaces. This step is crucial for ensuring that relevant traffic is captured for analysis.

Final Word

Wireshark stands out as a powerful tool for network analysis, offering deep insights into network traffic and protocols. Whether it's for low-level networking work, security analysis, or application development, Wireshark's features and capabilities make it an essential tool in the tech arsenal. With practice and exploration, users can leverage Wireshark to uncover detailed information about their networks, troubleshoot complex issues, and secure their environments more effectively. Wireshark's blend of ease of use with profound analytical depth ensures it remains a go-to solution for networking professionals across the spectrum. Its continuous development and wide-ranging applicability underscore its position as a cornerstone in the field of network analysis.
Combining tcpdump's capabilities for capturing network traffic with Wireshark's analytical prowess offers a comprehensive solution for network troubleshooting and analysis. This combination is particularly useful in environments where direct use of Wireshark is not possible or ideal. While both tools have a steep learning curve due to their powerful and complex features, they collectively form an indispensable toolkit for network administrators, security professionals, and developers alike. This integrated approach not only addresses the challenges of capturing and analyzing network traffic in various operational contexts but also highlights the versatility and depth of the tools available for understanding and securing modern networks.
Executive engineers are crucial in directing a technology-driven organization's strategic direction and technological innovation. As a staff engineer, it is essential to understand the significance of executive engineering. It goes beyond recognizing the hierarchy within an engineering department to appreciating the profound impact these roles have on individual contributors' day-to-day technical work and long-term career development. Staff engineers are deep technical experts who focus on solving complex technical challenges and defining architectural pathways for projects. However, their success is closely linked to the broader engineering strategy set by the executive team. This strategy determines staff engineers' priorities, technologies, and methodologies. Therefore, alignment between executive decisions and technical implementation is essential for the engineering team to function effectively and efficiently. Executive engineers, such as Chief Technology Officers (CTOs) and Vice Presidents (VPs) of Engineering, do more than provide technical oversight; they embody the bridge between cutting-edge engineering practices and business outcomes. They are tasked with anticipating technological trends and aligning them with the business's needs and market demands. In doing so, they ensure that the engineering teams are not just functional but are proactive agents of innovation and growth. For staff engineers, the strategies and decisions made at the executive level deeply influence their work environment, the tools they use, the scope of their projects, and their approach to innovation. Thus, understanding and engaging with executive engineering is essential for staff engineers who aspire to contribute significantly to their organizations and potentially advance into leadership roles. In this dynamic, the relationship between staff and executive engineers becomes a critical axis around which much of the company's success revolves. This introduction explores why executive engineering is vital from the staff engineer's perspective and how it shapes an organization's technological and operational landscape. Hierarchical Structure of Engineering Roles In the hierarchical structure of engineering roles, understanding the unique responsibilities and contributions of each position (staff engineer, engineering manager, and engineering executive) is crucial for effective career progression and organizational success. Staff Engineers are primarily responsible for high-level technical problem-solving and creating architectural blueprints. They guide projects technically but usually do not directly manage people. Engineering Managers oversee teams, focusing on managing personnel and ensuring that projects align with the organizational goals. They act as the bridge between the technical team and the broader business objectives. Engineering Executives, such as CTOs or VPs of Engineering, shape the strategic vision of the technology department and ensure its alignment with the company's overarching goals. They are responsible for high-level decisions about the direction of technology and infrastructure, often dealing with cross-departmental coordination and external business concerns. The connection between a staff engineer and an engineering executive is pivotal in crafting and executing an effective strategy. While executives set the strategic direction, staff engineers are instrumental in grounding this strategy with their deep technical expertise and practical insights.
This collaboration ensures that strategic initiatives are both visionary and technically feasible, enabling the organization to innovate while maintaining robust operational standards. The Engineering Executive's Primer: Impactful Technical Leadership Will Larson's book, The Engineering Executive's Primer: Impactful Technical Leadership, is an essential guide for those aspiring to or currently in engineering leadership roles. With his extensive experience as a CTO, Larson offers a roadmap from securing an executive position to mastering the complexities of technical and strategic leadership in engineering. Key Insights From the Book Transitioning to Leadership Larson discusses the nuances of obtaining an engineering executive role, from negotiation to the critical first steps post-hire. This guidance is vital for engineers transitioning from technical to executive positions, helping them avoid common pitfalls. Strategic Planning and Communication The book outlines how to run engineering planning processes effectively and maintain clear organizational communication. These skills are essential for aligning various engineering activities with company goals and facilitating inter-departmental collaboration. Operational Excellence Larson delves into managing crucial meetings, performance management systems, and the strategic hiring and onboarding of new engineers. These processes are fundamental to maintaining a productive engineering team and fostering a high-performance culture. Personal Management Another focus of the book, often overlooked in technical fields, is the importance of managing one's own priorities and energy. Larson provides strategies for staying effective and resilient in the face of challenges. Navigational Tools for Executive Challenges From mergers and acquisitions to interacting with CEOs and peer executives, the book provides insights into the broader corporate interactions an engineering executive will navigate. Conclusion The engineering executive's role is pivotal in setting a vision that integrates with the organization's strategic objectives. Still, it is the symbiotic relationship with staff engineers that brings this vision to fruition. Larson's The Engineering Executive's Primer is an invaluable resource for engineers at all levels, especially those aiming to bridge the gap between deep technical expertise and impactful leadership. Through this primer, engineering leaders can learn to manage, inspire, and drive technological innovation within their companies.
In today's digital world, mobile apps play a crucial role in our daily lives. They serve a range of purposes, from transactions and online shopping to social interactions and work efficiency, making them essential. However, with their widespread use comes an increased risk of security threats. Ensuring the security of an app requires a comprehensive approach, from secure development methods to continuous monitoring. Prioritizing security is key to safeguarding your users and upholding the trustworthiness of your app. Remember, security is an ongoing responsibility rather than a one-time task: stay updated on emerging risks and adjust your security strategies accordingly. The following sections discuss the importance of security measures and outline the steps for developing a secure mobile app. What Is Mobile App Security and Why Does It Matter? Mobile app security involves practices and precautions to shield apps from vulnerabilities, attacks, and unauthorized entry. It encompasses elements such as data safeguarding, authentication processes, authorization mechanisms, secure coding principles, and encryption techniques. The Significance of Ensuring Mobile App Security User Trust: Users expect their personal information to be kept safe when using apps. A breach damages both trust and reputation. Compliance With Laws and Regulations: Most countries have data protection laws, such as GDPR, which organizations are required to adhere to. Not following these regulations can result in penalties. Financial Consequences: Security breaches can lead to financial losses from remediation costs, compensation, and recovery efforts. Sustaining Business Operations: A compromised app has the potential to disrupt business functions and affect revenue streams. Guidelines for Developing a Secure Mobile App Creating a secure application entails several crucial steps aimed at fortifying the app against possible security risks. The following is a detailed roadmap for constructing such an app. 1. Recognize and Establish Security Requirements Prior to commencing development, outline the security prerequisites specific to your app. Take into account aspects like authentication, data storage, encryption, and access management. 2. Choose a Reliable Cloud Platform Choose a cloud service provider that offers robust security features. Popular choices include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). 3. Ensure Safe Development Practices • Educate developers on secure coding methods to steer clear of vulnerabilities such as SQL injection, cross-site scripting (XSS), and insecure APIs. • Conduct routine code reviews to detect security weaknesses at an early stage. 4. Implement Authentication and Authorization Measures • Employ robust authentication methods like multi-factor authentication (MFA) for heightened user login security. • Utilize Role-Based Access Control (RBAC) to assign permissions based on user roles, limiting access to sensitive functionality. 5. Safeguard Data Through Encryption (see the code sketch after this list) • Utilize HTTPS for communication between the application and server for in-transit encryption. • Encrypt sensitive data stored in databases or files for at-rest encryption. 6. Ensure the Security of APIs • Validate all input, employ API keys, and set up rate limiting for API security. • Securely handle user authentication and authorization with the OAuth and OpenID Connect protocols. 7. Conduct Regular Security Assessments • Perform penetration testing periodically to identify vulnerabilities. • Leverage automated scanning tools to detect security issues efficiently. 8. Monitor Activities and Respond to Incidents • Keep track of app behavior in real time to spot any irregularities or anomalies promptly. • Having a plan for handling security incidents is crucial.
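To make step 5 concrete, below is a minimal sketch of at-rest encryption using AES-GCM via the standard javax.crypto API. It is illustrative only: generating a throwaway key in memory is a simplification, and in a real app the key should come from a platform keystore (for example, the Android Keystore) rather than being created on the fly.

Java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class AtRestEncryptionSketch {
    private static final int GCM_TAG_BITS = 128; // authentication tag length
    private static final int IV_BYTES = 12;      // recommended IV size for GCM

    public static void main(String[] args) throws Exception {
        // Simplification: in production, load the key from a secure keystore instead.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // A fresh, random IV must be used for every encryption operation.
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal("sensitive user data".getBytes(StandardCharsets.UTF_8));

        // Decryption requires the same key and IV; GCM also verifies integrity.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        System.out.println(new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}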
What Is Involved in Mobile Application Security Testing? Implementing robust security testing methods is crucial for ensuring the integrity and resilience of mobile applications. Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), and Mobile App Penetration Testing are fundamental approaches that help developers identify and address security vulnerabilities. These methodologies not only fortify the security posture of apps but also contribute to maintaining user trust and confidence. Let's delve deeper into each of these testing techniques to understand their significance in securing mobile apps effectively. Static Application Security Testing (SAST) This method involves identifying security vulnerabilities in applications during the development stage. It entails examining the application's source code or binary without executing it, which helps detect security flaws early in the development process. SAST scans the codebase for vulnerabilities like injection flaws, broken authentication, insecure data storage, and other typical security issues. Automated scanning tools are used to analyze the code and pinpoint problems such as hardcoded credentials, improper input validation, and exposure of sensitive data. By detecting security weaknesses before deployment, SAST allows developers to make the necessary improvements to enhance the application's security stance. Integrating SAST into the development workflow also aids in meeting industry standards and regulatory mandates. In essence, SAST strengthens mobile application resilience against cyber threats, protecting information and upholding user confidence in today's interconnected environment. Dynamic Application Security Testing (DAST) This method is used to test the security of apps while they are running, assessing their security in real time. Unlike static analysis, which looks at the app's source code, DAST evaluates how the app behaves in a running environment. DAST tools emulate real-world attacks by interacting with the app as a user would, sending different inputs and observing the reactions. By analyzing how the app operates during runtime, DAST can pinpoint security issues such as injection vulnerabilities, weak authentication measures, and improper error handling. DAST mainly focuses on uncovering vulnerabilities that may not be obvious from examining the code. Some common techniques used in DAST include fuzz testing, where the app is bombarded with unexpected inputs to reveal vulnerabilities, and penetration testing conducted by ethical hackers attempting to exploit security flaws. By using DAST, developers can detect vulnerabilities that malicious actors could exploit to compromise the confidentiality, integrity, or availability of an app's data. Integrating DAST into mobile app development allows developers to find and fix security weaknesses before deployment, thereby reducing the chances of security breaches and strengthening application security. Mobile App Penetration Testing This proactive approach is employed to pinpoint weaknesses and vulnerabilities in apps. It simulates real-world attacks to assess the security stance of an application and its underlying infrastructure. Penetration tests can be conducted manually by cybersecurity experts or automated using specialized tools and software.
The testing procedure includes several phases: Reconnaissance: Gather details about the application's structure, features, and possible attack paths. Vulnerability Scanning: Use automated tools to pinpoint security vulnerabilities in the app. Exploitation: Attempt to exploit identified vulnerabilities to gain entry or elevate privileges. Post-Exploitation: Document the consequences of breaches and offer recommendations for mitigation. Mobile App Penetration Testing helps organizations uncover and rectify security weaknesses and reduces the risk of data breaches, financial harm, and damage to reputation. By evaluating the security of their apps, companies can enhance their security standing and maintain the confidence of their clients. By combining the above methodologies, Mobile App Security Testing helps identify and rectify security vulnerabilities during the development process, ensuring that mobile apps are strong, resilient, and protected against cybersecurity risks. This helps safeguard user data and maintain user trust in today's interconnected world. Common Mobile App Security Threats Data Leakage Data leakage refers to the unauthorized exposure of sensitive information stored or transmitted via mobile apps. This poses significant risks for both individuals and companies, including identity theft, financial scams, damage to reputation, and legal ramifications. For individuals, data leaks can compromise details such as names, addresses, social security numbers, and financial information, impacting their privacy and security. Moreover, leaks of health or personal data can harm someone's reputation and well-being. On the business front, data leaks can result in financial losses, regulatory fines, and erosion of customer trust. Breaches involving customer data can harm a company's image, leading to customer loss, which can affect revenue and competitiveness. Failure to secure sensitive information can also lead to severe consequences and penalties, especially in regulated industries like healthcare, finance, or e-commerce. Therefore, implementing robust security measures is crucial to protect information and maintain user trust in mobile apps. Man-in-the-Middle (MITM) Attacks Man-in-the-middle (MITM) attacks happen when someone secretly intercepts and alters communication between two parties. In the context of apps, this involves a hacker inserting themselves between a user's device and the server, allowing them to spy on shared information. MITM attacks are risky, potentially leading to data theft and identity fraud, as hackers can access login credentials, financial transactions, and personal data. To prevent MITM attacks, developers should use encryption methods such as HTTPS/TLS, while users should avoid public Wi-Fi networks and consider using VPNs for added security. Remaining vigilant and taking precautions are essential in protecting against MITM attacks. Injection Attacks Injection attacks pose significant security risks to apps as malicious actors exploit vulnerabilities to insert and execute unauthorized code. Common examples include SQL injection and JavaScript injection. During these attacks, perpetrators tamper with input fields to inject commands, gaining unauthorized access to data or disrupting app functions. Injection attacks can lead to data breaches, data tampering, and system compromise. To prevent these attacks, developers should enforce input validation, use parameterized queries, and adhere to secure coding practices.
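As an illustration of parameterized queries, here is a minimal JDBC sketch; the JDBC URL, table, and column names are hypothetical. The user-supplied value is bound as a parameter rather than concatenated into the SQL string, so it can never be interpreted as SQL code:

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ParameterizedQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hostile input: with naive string concatenation this would alter the SQL.
        String userInput = "alice'; DROP TABLE accounts; --";
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db");
             // The '?' placeholder keeps data and SQL strictly separate.
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, balance FROM accounts WHERE username = ?")) {
            stmt.setString(1, userInput); // bound as a value, never executed as SQL
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getLong("balance"));
                }
            }
        }
    }
}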
Regular security assessments and tests are crucial for pinpointing and addressing vulnerabilities in apps. Insecure Authentication Insecure authentication methods can lead to vulnerabilities, opening the door to unauthorized entry and data breaches. Common issues include weak passwords, the absence of two-factor authentication, and improper session management. Cyber attackers exploit these weaknesses to impersonate users, access data unlawfully, or seize control of user accounts. A compromised authentication system jeopardizes user privacy, data integrity, and availability, posing risks to individuals and organizations. To address this risk, developers should implement security measures such as two-factor authentication and securely managed session tokens. Regular updates and enhancements to security protocols are crucial to stay ahead of evolving threats. Data Storage Ensuring secure data storage is crucial in today's technology landscape, especially for apps. It's vital to protect sensitive information and financial records to prevent unauthorized access and data breaches. Secure data storage includes encrypting information both at rest and in transit using strong encryption and secure storage techniques. Moreover, setting up access controls and authentication procedures and conducting regular security checks are essential to uphold the confidentiality and integrity of stored data. By prioritizing these data storage practices and security protocols, developers can ensure that user information remains shielded from risks and vulnerabilities. Faulty Encryption Faulty encryption and flawed security measures can lead to vulnerabilities within apps, putting sensitive data at risk of unauthorized access and misuse. If encryption algorithms are weak or not implemented correctly, encrypted data could be easily decoded by malicious actors. Poor key management, like storing encryption keys insecurely, worsens these threats. Additionally, security protocols lacking proper authentication or authorization controls create opportunities for attackers to bypass security measures. The consequences of inadequate encryption and security measures can be substantial and can include data breaches, financial losses, and a decline in user trust. To address these risks effectively, developers should prioritize strong encryption algorithms, secure key management practices, and thorough security protocols in their mobile apps. The Unauthorized Use of Device Functions The misuse of device capabilities within apps presents a security concern, putting user privacy and device security at risk. Malicious apps or attackers could exploit weaknesses to access features like the camera, microphone, or GPS without permission, leading to privacy breaches. This unauthorized access may result in covert monitoring, unauthorized audio/video recording, and location tracking, compromising user confidentiality. Additionally, unauthorized use of device functions could allow attackers to carry out activities such as sending premium SMS messages or making calls that incur costs or violate privacy. To address this issue effectively, developers should enforce strict permission controls and carefully evaluate third-party tools and integrations to prevent misuse of device capabilities. Reverse Engineering and Altering Code Reverse engineering and altering the code within apps pose serious security risks, putting the app's integrity and confidentiality at risk. Bad actors might decompile the code to find weaknesses, extract data, or alter its functions for malicious purposes.
These activities allow attackers to bypass security measures, insert malicious code, or create vulnerabilities, leading to data breaches, unauthorized access, and financial harm. Moreover, tampering with code can enable hackers to circumvent licensing terms or the protections on developers' intellectual property, impacting their revenue streams. To effectively address this threat, developers should employ techniques like code obfuscation to obscure the code's meaning and make it harder for attackers to decipher. They should also establish runtime safeguards and regularly audit the codebase for any signs of tampering or unauthorized modifications. These proactive measures help mitigate the risks associated with code alteration and maintain the app's security and integrity. Third-Party Collaborations Third-party collaborations in apps bring both advantages and risks. While connecting with third-party services can improve features and user satisfaction, it also exposes the app to security threats and privacy issues. Thoroughly evaluating third-party partners, following security protocols, and monitoring regularly are essential steps to manage these risks. Neglecting to assess third-party connections can lead to data breaches, compromised user privacy, and harm to the app's reputation. Therefore, developers should be cautious and diligent when entering into collaborations with third parties to safeguard the security and credibility of their apps. Social Manipulation Strategies Social manipulation strategies present a security risk for apps, leveraging human behavior to mislead users and jeopardize their safety. Attackers can use methods like phishing emails, deceptive phone calls, or misleading messages to trick users into sharing sensitive data such as passwords or financial information. Moreover, these tactics can influence user actions, such as clicking on malicious links or downloading apps containing malware. Such strategies erode user trust and may lead to data breaches, identity theft, or financial scams. To address this, it's important for users to understand social manipulation tactics and be cautious when dealing with suspicious requests, messages, or links in mobile apps. Developers should also incorporate security measures like two-factor authentication and anti-phishing tools to safeguard users against social engineering attacks. Conclusion Always keep in mind that security is an ongoing responsibility and not a one-time job. Stay informed about emerging threats and adapt your security measures accordingly. Developing a secure app is crucial for safeguarding user data, establishing trust, and averting security breaches.
This article is part of a series exploring a workshop guiding you through the open source project Fluent Bit, what it is, a basic installation, and setting up the first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases covering collection, aggregation, transformation, and forwarding from any source to any destination. The previous article in this series saw us building our first telemetry pipelines with Fluent Bit. In this article, we continue onwards with some more specific use cases that pipelines solve. You can find more details in the accompanying workshop lab. Let's get started with this use case. Before we get started, it's important to review the phases of a telemetry pipeline: each incoming event goes from input to parser to filter to buffer to routing before being sent to its final output destination(s). For clarity in this article, we'll split up the configuration into files that are imported into a main Fluent Bit configuration file that we'll name workshop-fb.conf. Parsing Multiple Events One of the more common use cases for telemetry pipelines is having multiple event streams producing data, which creates a situation where keys are not unique unless some cleanup is done during parsing. Let's illustrate how Fluent Bit can easily provide us with a means to both parse and filter events from multiple input sources to clean up any duplicate keys before sending them onward to a destination. To provide an example, we start with an inputs.conf file containing a configuration that uses the dummy plugin to generate two types of events, both using the same key to cause confusion if we try querying without cleaning them up first:

# This entry generates a success message.
[INPUT]
    Name dummy
    Tag event.success
    Dummy {"message":"true 200 success"}

# This entry generates an error message.
[INPUT]
    Name dummy
    Tag event.error
    Dummy {"message":"false 500 error"}

Our configuration is tagging each successful event with event.success and failure events with event.error. The confusion comes from configuring the dummy message with the same key, message, for both event definitions, which makes the incoming events hard to tell apart. The file called outputs.conf contains but one destination, as shown in the following configuration:

# This entry directs all tags (it matches any we encounter)
# to print to standard output, which is our console.
#
[OUTPUT]
    Name stdout
    Match *

With our inputs and outputs configured, we can now bring them together in the single main configuration file we mentioned at the start. Let's create a new file called workshop-fb.conf in our favorite editor. Add the following configuration; for now, just importing our other two files:

# Fluent Bit main configuration file.
#
# Imports section, assumes these files are in the same
# directory as the main configuration file.
#
@INCLUDE inputs.conf
@INCLUDE outputs.conf

To see if our configuration works, we can test run it with our Fluent Bit installation. Depending on the install method chosen in the previous articles in this series, we have the option to run it from source or using a container image. First, we show how to run it using the source install, executed from the directory we created to hold all our configuration files:

# source install.
#
$ [PATH_TO]/fluent-bit --config=workshop-fb.conf

The console output should look something like this, noting that we've cut out the ASCII logo at startup:

...
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.0] initializing
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.1] initializing
[2024/04/05 16:49:33] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
[2024/04/05 16:49:33] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/05 16:49:33] [ info] [sp] stream processor started
[0] event.success: [[1712328574.915990000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328574.917728000, {}], {"message"=>"false 500 error"}]
[0] event.success: [[1712328575.915732000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328575.916608000, {}], {"message"=>"false 500 error"}]
[0] event.success: [[1712328576.915161000, {}], {"message"=>"true 200 success"}]
[0] event.error: [[1712328576.915288000, {}], {"message"=>"false 500 error"}]
...

Note the alternating generated event lines with messages that are hard to separate because they use the same key. These events keep alternating in the console until you exit with CTRL-C. Next, we show how to run our telemetry pipeline configuration using a container image. The first thing we need is a file called Buildfile. This is going to be used to build a new container image with our configuration files inserted. Note that this file needs to be in the same directory as your configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we'll build a new container image as follows, using the Buildfile and naming the image with a version tag, assuming you are in the same directory (using Podman, as discussed in previous articles):

$ podman build -t workshop-fb:v4 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v4
--> b06df84452b6
Successfully tagged localhost/workshop-fb:v4
b06df84452b6eb7a040b75a1cc4088c0739a6a4e2a8bbc2007608529576ebeba

Now, to run our new container image:

$ podman run workshop-fb:v4

The output looks exactly like the source output above, just with different timestamps. Again, you can stop the container with CTRL-C. Now we have dirty ingested data coming into our pipeline, showing that we have multiple messages on the same key. To be able to clean this up for usage before passing it on to the backend (output), we need to make use of both the Parser and Filter phases. First up is the Parser phase, where unstructured data is converted into structured data; we'll make use of the built-in regex parser plugin to structure the duplicate messages into something more usable. To set up the parser configuration, we create a new file called parsers.conf in our favorite editor.
Add the following configuration, where we are defining a PARSER, naming the parser message_cleaning_parser, selecting the built-in regex parser, and applying the regular expression shown here to convert each message into a structured format (note this is actually applied to incoming messages in the next phase of the telemetry pipeline):

# This parser uses the built-in parser plugin and applies the
# regex to all incoming events.
#
[PARSER]
    Name message_cleaning_parser
    Format regex
    Regex ^(?<valid_message>[^ ]+) (?<code>[^ ]+) (?<type>[^ ]+)$

Next up is the Filter phase, where the previously defined parser is put to the test. To set up the filter configuration, we create a new file called filters.conf in our favorite editor. Add the following configuration, where we are defining a FILTER that uses the parser filter plugin, matching all incoming messages to apply this filter, looking for the key message to select the value to be fed into the parser, and applying the parser message_cleaning_parser to it:

# This filter is applied to all events and uses the named parser to
# apply values found with the chosen key if it exists.
#
[FILTER]
    Name parser
    Match *
    Key_Name message
    Parser message_cleaning_parser

To make sure the new filter and parser are included, we update our main configuration file workshop-fb.conf as follows:

# Fluent Bit main configuration file.

[SERVICE]
    parsers_file parsers.conf

# Imports section.
@INCLUDE inputs.conf
@INCLUDE outputs.conf
@INCLUDE filters.conf

To verify that our configuration works, we can test run it with our Fluent Bit installation. Depending on the chosen install method, here we show how to run it using the source installation, followed by the container version. Below, the source install is shown from the directory we created to hold all our configuration files:

# source install.
#
$ [PATH_TO]/fluent-bit --config=workshop-fb.conf

The console output should look something like this, noting that we've cut out the ASCII logo at startup:

...
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.0] initializing
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.1] initializing
[2024/04/09 16:19:42] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
[2024/04/09 16:19:42] [ info] [output:stdout:stdout.0] worker #0 started
[2024/04/09 16:19:42] [ info] [sp] stream processor started
[0] event.success: [[1712672383.962198000, {}], {"valid_message"=>"true", "code"=>"200", "type"=>"success"}]
[0] event.error: [[1712672383.964528000, {}], {"valid_message"=>"false", "code"=>"500", "type"=>"error"}]
[0] event.success: [[1712672384.961942000, {}], {"valid_message"=>"true", "code"=>"200", "type"=>"success"}]
[0] event.error: [[1712672384.962105000, {}], {"valid_message"=>"false", "code"=>"500", "type"=>"error"}]
...

Note the alternating generated event lines with parsed messages that now contain keys to simplify later querying. This runs until you exit with CTRL-C. Let's now try testing our configuration by running it using a container image. The first thing we need is to open the file Buildfile in our favorite editor. This is going to be expanded to include the filters and parsers configuration files.
Note that this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COPY ./filters.conf /fluent-bit/etc/filters.conf
COPY ./parsers.conf /fluent-bit/etc/parsers.conf

Now we'll build a new container image, naming it with a version tag, as follows using the Buildfile and assuming you are in the same directory (using Podman as discussed in previous articles):

$ podman build -t workshop-fb:v4 -f Buildfile

STEP 1/6: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.1
STEP 2/6: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> 7eee3091e091
STEP 3/6: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> 53ff32210b0e
STEP 4/6: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
--> 62168aa0c600
STEP 5/6: COPY ./filters.conf /fluent-bit/etc/filters.conf
--> 08f0878ded1e
STEP 6/6: COPY ./parsers.conf /fluent-bit/etc/parsers.conf
COMMIT workshop-fb:v4
--> 92825169e230
Successfully tagged localhost/workshop-fb:v4
92825169e230a0cc36764d6190ee67319b6f4dfc56d2954d267dc89dab8939bd

Now to run our new container image:

$ podman run workshop-fb:v4

The output looks exactly like the source output above; note that the alternating generated event lines with parsed messages now contain keys to simplify later querying. This completes our use case for this article; be sure to explore this hands-on experience with the accompanying workshop lab. What's Next? This article walked us through a telemetry pipeline use case for multiple events using parsing and filtering. The series continues with the next step, where we'll explore how to collect metrics using a telemetry pipeline. Stay tuned for more hands-on material to help you with your cloud-native observability journey.
Businesses can react quickly and effectively to user behavior patterns by using real-time analytics. This allows them to take advantage of opportunities that might otherwise pass them by and prevent problems from getting worse. Apache Kafka, a popular event streaming platform, can be used for real-time ingestion of data/events generated from various sources across multiple verticals such as IoT, financial transactions, and inventory. This data can then be streamed into multiple downstream applications or engines for further processing and eventual analysis to support decision-making. Apache Flink serves as a powerful engine for refining or enhancing streaming data by modifying, enriching, or restructuring it upon arrival at the Kafka topic. In essence, Flink acts as a downstream application that continuously consumes data streams from Kafka topics for processing and then ingests the processed data into various Kafka topics. Eventually, Apache Druid can be integrated to consume the processed streaming data from Kafka topics for analysis, querying, and making instantaneous business decisions. In my previous write-up, I explained how to integrate Flink 1.18 with Kafka 3.7.0. In this article, I will outline the steps to transfer processed data from Flink 1.18.1 to a Kafka 2.13-3.7.0 topic. A separate article detailing the ingestion of streaming data from Kafka topics into Apache Druid for analysis and querying was published a few months ago; you can read it here. Execution Environment We configured a multi-node cluster (three nodes) where each node has a minimum of 8 GB RAM and a 250 GB SSD, along with Ubuntu 22.04.2 amd64 as the operating system. OpenJDK 11 is installed, with the JAVA_HOME environment variable configured, on each node. Python 3 or Python 2 along with Perl 5 is available on each node. A three-node Apache Kafka 3.7.0 cluster has been up and running with Apache ZooKeeper 3.5.6 on two nodes. Apache Druid 29.0.0 has been installed and configured on the node in the cluster where ZooKeeper has not been installed for the Kafka broker; ZooKeeper has been installed and configured on the other two nodes. The leader broker is up and running on the node where Druid is running. We developed a simulator using the Datafaker library to produce fake real-time financial transaction JSON records every 10 seconds and publish them to the created Kafka topic. Here is the JSON data feed generated by the simulator:

JSON
{"timestamp":"2024-03-14T04:31:09Z ","upiID":"9972342663@ybl","name":"Kiran Marar","note":" ","amount":"14582.00","currency":"INR","geoLocation":"Latitude: 54.1841745 Longitude: 13.1060775","deviceOS":"IOS","targetApp":"PhonePe","merchantTransactionId":"ebd03de9176201455419cce11bbfed157a","merchantUserId":"65107454076524@ybl"}

The Apache Flink-1.18.1-bin-scala_2.12.tgz archive was extracted on the node where neither Druid nor the Kafka leader broker is running. Running a Streaming Job in Flink We will dig into the process of extracting data from a Kafka topic, where incoming messages are being published from the simulator, performing processing tasks on the data, and then reintegrating the processed data back into a different topic of the multi-node Kafka cluster. We developed a Java program (StreamingToFlinkJob.java) that was submitted as a job to Flink to perform the above-mentioned steps, considering a window of 2 minutes and calculating the average amount transacted from the same mobile number (UPI ID) on the simulated UPI transactional data stream.
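Before diving into the code, note that the job needs the Flink streaming API and the Kafka connector on the classpath. As a rough sketch only (the artifact versions shown are assumptions based on the Flink 1.18.1 release, not the author's exact list), a Maven setup would look something like this:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <version>1.18.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>1.18.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>3.1.0-1.18</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.15.2</version>
</dependency>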
These libraries, or the corresponding jar files, have been included in the project build path or classpath. Using the code below, we can get the Flink execution environment inside the developed Java class:

Java
Configuration conf = new Configuration();
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

Now we should read the messages/stream that has already been published by the simulator to the Kafka topic inside the Java program. Here is the code block:

Java
KafkaSource<UPITransaction> kafkaSource = KafkaSource.<UPITransaction>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS) // IP address with port 9092 where the leader broker is running in the cluster
    .setTopics(IKafkaConstants.INPUT_UPITransaction_TOPIC_NAME)
    .setGroupId("upigroup")
    .setStartingOffsets(OffsetsInitializer.latest())
    .setValueOnlyDeserializer(new KafkaUPISchema())
    .build();

To retrieve information from Kafka, setting up a deserialization schema within Flink is crucial for processing events in JSON format, converting raw data into a structured form. Importantly, setParallelism needs to be set to the number of Kafka topic partitions; otherwise, the watermark won't work for the source, and data is not released to the sink.

Java
DataStream<UPITransaction> stream = env.fromSource(kafkaSource, WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(2)), "Kafka Source").setParallelism(1);

With successful event retrieval from Kafka, we can enhance the streaming job by incorporating processing steps. The subsequent code snippet reads Kafka data, organizes it by mobile number (upiID), and computes the average amount per mobile number. To accomplish this, we developed a custom window function for calculating the average and implemented watermarking to handle event time semantics effectively. Here is the code snippet:

Java
SerializableTimestampAssigner<UPITransaction> sz = new SerializableTimestampAssigner<UPITransaction>() {
    @Override
    public long extractTimestamp(UPITransaction transaction, long l) {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            Date date = sdf.parse(transaction.eventTime);
            return date.getTime();
        } catch (Exception e) {
            return 0;
        }
    }
};

WatermarkStrategy<UPITransaction> watermarkStrategy = WatermarkStrategy.<UPITransaction>forBoundedOutOfOrderness(Duration.ofMillis(100)).withTimestampAssigner(sz);
DataStream<UPITransaction> watermarkDataStream = stream.assignTimestampsAndWatermarks(watermarkStrategy);

// Instead of event time, we can use a window based on processing time (TumblingProcessingTimeWindows).
DataStream<TransactionAgg> groupedData = watermarkDataStream.keyBy("upiID")
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2500), Time.milliseconds(500)))
    .apply(new TransactionAgg());

Eventually, the processing logic (computing the average amount for the same UPI ID, i.e., mobile number, over the window on the continuous flow of the transaction stream) is executed inside Flink. Here is the code block for the window function that calculates the average amount for each UPI ID or mobile number:
Java
public class TransactionAgg implements WindowFunction<UPITransaction, TransactionAgg, Tuple, TimeWindow> {
    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<UPITransaction> values, Collector<TransactionAgg> out) {
        Integer sum = 0; // consider whole numbers only
        int count = 0;
        String upiID = null;
        for (UPITransaction value : values) {
            sum += value.amount;
            upiID = value.upiID;
            count++;
        }
        TransactionAgg output = new TransactionAgg();
        output.upiID = upiID;
        output.eventTime = window.getEnd();
        output.avgAmount = (sum / count); // integer division; use a double if fractional averages are needed
        out.collect(output);
    }
}

We have processed the data. The next step is to serialize the object and send it to a different Kafka topic. We add a KafkaSink in the developed Java code (StreamingToFlinkJob.java) to send the processed data from the Flink engine to a different Kafka topic created on the multi-node Kafka cluster. Here is the code snippet to serialize the object before sending/publishing it to the Kafka topic:

Java
public class KafkaTrasactionSinkSchema implements KafkaRecordSerializationSchema<TransactionAgg> {
    // The target topic and the Jackson objectMapper are fields initialized in the constructor (not shown).
    @Override
    public ProducerRecord<byte[], byte[]> serialize(TransactionAgg aggTransaction, KafkaSinkContext context, Long timestamp) {
        try {
            return new ProducerRecord<>(
                topic,
                null, // partition not specified, so setting null
                aggTransaction.eventTime,
                aggTransaction.upiID.getBytes(),
                objectMapper.writeValueAsBytes(aggTransaction));
        } catch (Exception e) {
            throw new IllegalArgumentException("Exception on serialize record: " + aggTransaction, e);
        }
    }
}

And below is the code block to sink the processed data, sending it back to a different Kafka topic:

Java
KafkaSink<TransactionAgg> sink = KafkaSink.<TransactionAgg>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS)
    .setRecordSerializer(new KafkaTrasactionSinkSchema(IKafkaConstants.OUTPUT_UPITRANSACTION_TOPIC_NAME))
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();
groupedData.sinkTo(sink); // the DataStream of TransactionAgg created above
env.execute();

Connecting Druid With the Kafka Topic In this final step, we need to integrate Druid with the Kafka topic to consume the processed data stream that is continuously published by Flink. With Apache Druid, we can connect directly to Apache Kafka so that real-time data can be ingested continuously and subsequently queried to make business decisions on the spot, without involving any third-party system or application. Another beauty of Apache Druid is that we need not configure or install any third-party UI application to view the data that lands on or is published to the Kafka topic. To condense this article, I omitted the steps for integrating Druid with Apache Kafka. However, a few months ago, I published an article on this topic (linked earlier in this article); you can read it and follow the same approach. Final Note The code snippets provided above are for understanding purposes only. They illustrate the sequential steps of consuming messages/data streams from a Kafka topic, processing the consumed data, and eventually sending/pushing the modified data into a different Kafka topic. This allows Druid to pick up the modified data stream for querying and analysis as a final step. Later, we will upload the entire codebase on GitHub if you are interested in executing it on your own infrastructure. I hope you enjoyed reading this. If you found this article valuable, please consider liking and sharing it.