Convergence in Cybersecurity and Cloud Operations: Lessons From NOC and SOC Fusion
By building a common infrastructure and merging the processes of NOC and SOC, organizations can enhance reliability and resiliency, all at a lower cost.
The IT organizational structure is in massive flux, as traditional models no longer fit the modern demands of cloud infrastructure and digital transformation. While this is happening in many different shapes and forms, one of the most interesting shifts is the convergence between network operations center (NOC) and security operations center (SOC) teams. To understand why this is not just interesting but also critical, let me give an example.
In one company, the SOC team investigated a suspected data breach for ten days without alerting any other part of the organization. During this period, the SOC team searched for new information, pausing when they needed more data, and examined the traffic that seemed nefarious. Their approach prevented any communication with other stakeholders. Neither the legal team nor the internal press team was aware that a crisis might be unfolding. No one else provided additional input or data or was engaged to help thwart the event.
This example shines a light on how dangerous the silos between the NOC and SOC can be. To improve the resilience and responsiveness of their networks and security incident management, many cybersecurity teams are beginning a critical convergence of the NOC and SOC. The rationale behind such a move is that the mission of the NOC (to keep the network running reliably and resolve operational problems quickly) and the mission of the SOC (to identify and resolve security incidents) are increasingly using the same types of data, automation, and analytics.
By building a common infrastructure and a merged process, organizations can enhance reliability and resiliency, all at a lower cost. This convergence is still at an early stage, but already we can see lessons for technology operations teams implementing DevOps and DevSecOps practices, and also expanding automation overall.
Forces Driving the NOC/SOC Convergence
A NOC comprises IT technicians who monitor, maintain, and oversee enterprise networks, including internal or external networks if the organization supports managed services, such as cloud, for clients. NOC teams use remote monitoring tools to oversee network activity, respond to network availability disruptions, and optimize performance. As traditional security perimeters dissolve, the entire IT ecosystem becomes a prime target for cybercriminal activity and network incidents. Organizations are responding by standing up SOCs, whose analysts complement automated systems with manual work to manage the organization's increasingly complex cybersecurity environment.
Both NOCs and SOCs identify, investigate, prioritize, escalate, and resolve issues to mitigate or altogether avoid customer and business-impacting events. NOC teams solve network performance and availability issues while the SOC team receives information about security threats and responds to them. While both teams continuously monitor logs and events, they use different tools and processes, leading to challenges in understanding which team should respond.
In a recent book on cybersecurity, Fight Fire with Fire: Proactive Cybersecurity Strategies for Today’s Leaders, author Renee Tarun describes the confusion that results when the NOC and SOC remain siloed:
When a problem occurs, each side deploys its own response team—often using duplicative tools and looking through two different lenses at the same set of data. Not surprisingly, the network team approaches incidents from a network availability standpoint while the security team approaches the same issues from the perspective of malicious intent or security vulnerabilities. What appears to be a network problem actually may be an attack or another cyber issue; and what looks like a cyber incident or threat may in reality be a network issue. Some issues fall into both categories.
Advantages of Centralizing the Incident Management Process
Breaking down these silos requires a centralized approach, but this approach also requires ownership. The NOC is best suited to own the incident management process because it manages a broader range of incident types, not only those related to security. When the NOC team takes ownership of incident management, it paves the way for convergence. Fully reaching this convergence means coming together through shared processes and tooling (such as ticketing, communication, and monitoring tools).
Better Communication With Cross-Organizational Stakeholders
As anyone who has been on call knows all too well, investigating an incident is only part of the job. Communicating with internal and external stakeholders is a crucial element. By managing the incident response process, the NOC ensures the appropriate level of communication happens across the organization. Keeping all stakeholders in the loop, including those who only need to be informed rather than to mitigate or investigate, ensures everyone can be ready to act, no matter what the outcome.
Improved Data Access and Sharing
The NOC is well suited as the connector between the security team and the product managers, technical program managers, or individual service owners. Because the NOC team maintains relationships with many stakeholders, they generally have access to the right data and expertise across the organization. In this way, the NOC becomes a subject matter expert to the SOC. Centralized incident management brings the automated analytics from each group together. At some point within the incident response process, the teams determine which expert owns what part of the problem.
Isolation Allows Risk to Flourish
Ultimately, convergence happens through partnership. Each team sees the other as filling a complementary role in mitigating an incident’s impact on the business or customer. Unfortunately, despite the benefits, many organizations still operate in silos like the company described above. The SOC team spent ten days investigating a suspected data breach, all within a silo and never including the NOC, press team, legal, or any other important business stakeholder.
How Single Events Catalyze Change
Events like this catalyze change. In the example above, the suspected data breach was assessed as benign and a crisis didn’t occur. Still, what if the threat had turned out to be real? When a highly visible incident kicks off in a bubble, it prevents anyone else from engaging, mitigating, and protecting customers and the business.
Fortunately, this company used this misstep as an opportunity for change. The NOC introduced a simplified incident management process requiring a ticket to be raised for any suspected activity needing a security investigation. This meant that the SOC would never be working on a problem in the dark again. Centralizing the incident management process opened up communication channels, enabled subject matter experts to be brought in when needed, and ensured data sharing occurred, all while maintaining visibility of the incident.
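To make the ticket-first process concrete, here is a minimal sketch of what "raise a ticket for any suspected activity" could look like in code. Everything in it is an assumption for illustration: the Ticket fields, the open_ticket helper, and the notification policy (which teams are told about which category) are hypothetical and not taken from the company described above.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical policy: the NOC owns every ticket and is always notified;
# other stakeholders are looped in depending on the incident category.
ALWAYS_NOTIFY = ("noc",)
NOTIFY_BY_CATEGORY = {
    "security": ("soc", "legal", "communications"),
    "network": ("service-owner",),
}

_ticket_ids = count(1)  # simple in-memory ID generator for the sketch


@dataclass
class Ticket:
    id: int
    category: str
    summary: str
    owner: str = "noc"
    notified: list = field(default_factory=list)


def open_ticket(category: str, summary: str) -> Ticket:
    """Create a ticket and record which teams must be kept in the loop."""
    if category not in NOTIFY_BY_CATEGORY:
        raise ValueError(f"unknown incident category: {category!r}")
    ticket = Ticket(id=next(_ticket_ids), category=category, summary=summary)
    ticket.notified = list(ALWAYS_NOTIFY + NOTIFY_BY_CATEGORY[category])
    return ticket


ticket = open_ticket("security", "suspected data exfiltration")
print(ticket.owner, ticket.notified)
```

The point of the sketch is the design choice rather than the data model: opening the ticket and notifying stakeholders happen in the same step, so an investigation cannot start without the rest of the organization knowing it exists.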
Converging Technology and Data to Break Down Silos
Breaking down silos between teams is tricky but rewarding. Collaboration and integration across NOCs and SOCs not only ease budget and staffing constraints but also reduce risk.
As the digital system landscape broadens, it may take longer to know which experts are needed when things go wrong. Centralized incident management is the on-ramp to shortening the time it takes to engage multiple kinds of expertise and respond. However, even with a converged process, the explosion of best-of-breed cloud tools and services makes it difficult to converge data.
Currently, individual service owners build their own services and microservices and use their own monitoring tools, each of which recognizes its own set of events. While teams may be using separate tooling (each with its own data sets), the goal should be to have a shared platform that can create structured data from every system.
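As a rough illustration of "structured data from every system," the sketch below maps payloads from two tools into one shared schema. The tools ("netmon" and "siem"), their payload fields, and the Event schema are all hypothetical; a real platform would need one adapter per tool it ingests from.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common schema shared by NOC and SOC tooling (hypothetical)."""
    source: str          # tool that emitted the event
    service: str         # affected service or host
    kind: str            # "network" or "security"
    severity: int        # 1 (low) to 5 (critical)
    timestamp: datetime  # timezone-aware
    message: str


def from_netmon(raw: dict) -> Event:
    """Adapter for a hypothetical network monitor (numeric level, epoch time)."""
    return Event(
        source="netmon",
        service=raw["host"],
        kind="network",
        severity=int(raw["level"]),
        timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        message=raw["text"],
    )


def from_siem(raw: dict) -> Event:
    """Adapter for a hypothetical SIEM (named severity, ISO 8601 time)."""
    levels = {"low": 1, "medium": 3, "high": 4, "critical": 5}
    return Event(
        source="siem",
        service=raw["asset"],
        kind="security",
        severity=levels[raw["severity"].lower()],
        timestamp=datetime.fromisoformat(raw["time"]),
        message=raw["summary"],
    )


net = from_netmon({"host": "api-gw", "level": "4",
                   "epoch": 1700000000, "text": "packet loss"})
sec = from_siem({"asset": "api-gw", "severity": "High",
                 "time": "2023-11-14T22:15:00+00:00",
                 "summary": "unusual outbound traffic"})
print(net.service == sec.service, net.severity, sec.severity)
```

Once both tools describe the same host with the same field names and the same severity scale, the two teams are looking at one data set instead of two.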
Pushing all these events into a single place allows you to normalize and correlate them. Together, multiple event alerts from disparate systems build a complete story. When grouped, these alerts represent an incident with context, helping organizations shorten the time it takes to begin initial triage or mitigation steps. The routed alerts can trigger automation directly rather than merely kicking off a manual process. As a result, the single pane of glass concept that everyone was sold for the last 15 years can finally be realized.
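Correlation can be sketched in a few lines once alerts share a schema. The example below groups alerts on the same service that arrive close together in time into one incident. The Alert and Incident types, the correlate function, and the ten-minute window are assumptions chosen for illustration; production correlation engines typically use richer rules such as topology or dependency data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str
    service: str
    timestamp: datetime
    message: str


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    @property
    def sources(self) -> set:
        """Which tools contributed alerts to this incident."""
        return {a.source for a in self.alerts}


def correlate(alerts, window=timedelta(minutes=10)):
    """Group alerts per service; a gap longer than `window` starts a new incident."""
    incidents = []
    latest = {}  # service -> most recent Incident for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest.get(alert.service)
        if (current is not None
                and alert.timestamp - current.alerts[-1].timestamp <= window):
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            latest[alert.service] = current
            incidents.append(current)
    return incidents


t0 = datetime(2024, 1, 1, 12, 0)
incidents = correlate([
    Alert("netmon", "api-gw", t0, "latency spike"),
    Alert("siem", "api-gw", t0 + timedelta(minutes=3), "unusual outbound traffic"),
    Alert("netmon", "db", t0 + timedelta(minutes=4), "disk nearly full"),
])
for inc in incidents:
    print(inc.service, sorted(inc.sources), len(inc.alerts))
```

In this run, the network alert and the security alert on the same service land in one incident, which is exactly the case where neither team alone would see the full story.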
How Cybersecurity Convergence Is a Proving Ground for Tech Ops
The convergence between NOC and SOC teams is a proving ground for solving the same sorts of problems that will increasingly appear in TechOps. As the landscape of operational systems grows, there will be a need for specialists to attend to each system. At the same time, just like in cybersecurity, it will take analysis and collaboration to first understand what’s going on so that the right team can take over or the right set of teams can collaborate. The practices of centralized incident management and shared analytical tools along with a streamlined process for analyzing events and parceling out responsibility will be one of many patterns of convergence that will be repeated over and over as TechOps becomes a far broader and more pervasive field.