Incident Review — What Was Behind the September 7 Spectrum Outage: A Case of Dr. BGP Hijack or Mr. BGP Mistake?
In this post, we share our investigation into the outage and our hunch (backed by real data) that it may have been caused by a BGP hijack.
September 7, 2021, 16:36 UTC: an outage hit Spectrum cable customers in the Midwest of the U.S., including Ohio, Wisconsin, and Kentucky. Users of their broadband and TV services hit social media to voice their annoyance at the disruption it was causing.
Everything was resolved at around 18:11 UTC, and services were restored to users.
Fact Checking the Spectrum Investigation
Spectrum was vague about the nature of the issue behind the outage. Our investigation shows it may have involved a BGP route hijack. A BGP route hijack happens when an autonomous system (AS) claims to be the origin for a network that has been assigned to another AS. If the hijack is accidental, it can lead to a denial of service. If it is deliberate, at its worst, the hijacking AS could attempt to steal sensitive information.
To better understand what happened, let’s investigate data collected by rrc10, the RIS route collector deployed by RIPE NCC (RIPE Network Coordination Centre) at the Milan Internet Exchange (MIX), in Italy. Looking at the closest RIB snapshot available, we can see that Spectrum (AS10796) was announcing 690 networks, most of which routed via its backbone (AS7843).
From the update files provided by the rrc10 collector between 16:30 and 18:30 UTC, we can further see that KHS USA (AS398994) started to originate 449 of the 690 networks previously originated by Spectrum. This marked the start of the outage. The hijacked routes were seen by most of the RIS peers until 18:11 UTC, when KHS stopped announcing any routes.
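This kind of origin change can be spotted programmatically by comparing the origin AS of each announcement against a baseline RIB snapshot. Here is a minimal sketch of that logic; the prefixes and AS paths below are illustrative examples, not the actual rrc10 data:

```python
# Minimal sketch: flag prefixes whose origin AS differs from a baseline RIB.
# The prefixes and AS paths below are illustrative, not real rrc10 data.

def detect_origin_changes(baseline, updates):
    """baseline: dict mapping prefix -> expected origin ASN.
    updates: iterable of (prefix, as_path) announcements, where the
    origin AS is the last ASN in the AS path.
    Returns a list of (prefix, expected_origin, seen_origin) tuples."""
    suspicious = []
    for prefix, as_path in updates:
        origin = as_path[-1]
        expected = baseline.get(prefix)
        if expected is not None and origin != expected:
            suspicious.append((prefix, expected, origin))
    return suspicious

# Baseline: AS10796 normally originates these (made-up) prefixes.
baseline = {"65.25.0.0/16": 10796, "66.61.0.0/16": 10796}

# An update where AS398994 appears as the origin, reached through the
# Spectrum backbone (AS7843) and AS10796, mirrors the outage pattern.
updates = [
    ("65.25.0.0/16", [3269, 6762, 7843, 10796, 398994]),
    ("66.61.0.0/16", [3269, 6762, 7843, 10796]),  # unchanged origin
]

print(detect_origin_changes(baseline, updates))
# → [('65.25.0.0/16', 10796, 398994)]
```

In practice the baseline would come from a RIB snapshot and the updates from MRT files, but the comparison at the heart of the detection is this simple.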
So…Was It Dr. BGP Hijack Or Mr. BGP Mistake?
One of the peculiarities of this hijack is that it started in the belly of the attacked AS: KHS announced the Spectrum networks through Spectrum's own AS (AS10796), and the Spectrum backbone (AS7843) propagated them into the wild. Only a few routes were propagated via Spectrum's AS itself (AS10796) and Tata Communications (AS6453), one of Spectrum's providers.
Another oddity is that KHS was not announcing any networks before or after the hijack. This can be seen from the BGP Update activity widget provided by RIPE Stat.
The outage might have been caused by an experiment at Spectrum. Alternatively, similar scenarios can occur if someone exploits a security vulnerability inside the carrier's network and finds a way to open a BGP session with one of its routers, then hijacks routes to create an outage on purpose. We have seen similar scenarios in the past where this was the case.
Lessons for Network Administrators
Either way, there are a few lessons network administrators can take away from what happened.
Apply Bulletproof Security To Routers
A BGP hijack may not be what happened at Spectrum, but either way, it is worth stressing that unrecognized sources absolutely must not be able to set up a BGP session on their own and announce networks at will. MANRS (Mutually Agreed Norms for Routing Security) provides a useful set of rules to prevent events like this one from spreading.
Set Up Automated Controls
Network administrators need to set up automated controls that drop any route announcement for networks their customers are not allowed to announce. We suspect that this level of control was missing on the Spectrum router; otherwise, AS398994 would not have been able to hijack routes belonging to 449 Spectrum networks.
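Such a control is essentially a per-customer prefix filter applied at the session where the customer's announcements enter the network. A minimal sketch of the logic, with entirely made-up ASNs and prefixes:

```python
# Sketch of a per-customer prefix filter: drop any announcement for a
# prefix the customer ASN is not authorized to originate.
# The ASN-to-prefix mapping below is illustrative only.

ALLOWED = {
    398994: {"198.51.100.0/24"},  # hypothetical prefix set for this customer
}

def accept_announcement(customer_asn, prefix):
    """Return True only if the customer is on the allow-list for this prefix."""
    return prefix in ALLOWED.get(customer_asn, set())

print(accept_announcement(398994, "198.51.100.0/24"))  # → True (authorized)
print(accept_announcement(398994, "65.25.0.0/16"))     # → False (dropped)
```

On real routers this is implemented with prefix lists or route filters built from IRR or RPKI data, but the decision per announcement reduces to this membership check.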
However, simple controls are not enough for a transit AS. Even if any of them had set up a list of networks that AS10796 was allowed to announce, that would still not have stopped the hijack from spreading in the wild. Indeed, AS10796 was the legitimate owner of the networks and had signed most of its routes in RPKI, including 431 of the 449 hijacked networks.
Dropping routes found invalid via RPKI origin validation would have mitigated the spread of the hijack. Even that, however, would not have stopped the hijack itself.
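RPKI origin validation works by comparing a route's prefix and origin AS against signed ROAs (Route Origin Authorizations). A simplified sketch in the spirit of RFC 6811; the ROA entries here are illustrative, not Spectrum's actual ROAs:

```python
import ipaddress

# Simplified RPKI route origin validation (in the spirit of RFC 6811).
# Each ROA authorizes an origin ASN to announce a prefix up to a maximum
# length. The entries below are illustrative, not Spectrum's real ROAs.
ROAS = [
    # (authorized prefix, max length, authorized origin ASN)
    (ipaddress.ip_network("65.25.0.0/16"), 20, 10796),
]

def validate(prefix_str, origin_asn):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    prefix = ipaddress.ip_network(prefix_str)
    covered = False
    for roa_net, max_len, roa_asn in ROAS:
        if prefix.version == roa_net.version and prefix.subnet_of(roa_net):
            covered = True  # some ROA covers this prefix
            if prefix.prefixlen <= max_len and origin_asn == roa_asn:
                return "valid"
    return "invalid" if covered else "not-found"

print(validate("65.25.0.0/16", 10796))    # → valid (legitimate origin)
print(validate("65.25.0.0/16", 398994))   # → invalid (wrong origin AS)
print(validate("203.0.113.0/24", 64512))  # → not-found (no covering ROA)
```

A transit AS dropping every "invalid" route would have stopped the hijacked announcements from propagating, which is exactly the mitigation described above.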
Take Proactive Countermeasures Against BGP Issues
Last but not least, network administrators should take immediate countermeasures against events like this one by leaning on 24/7 BGP monitoring tools. At Catchpoint, we inform customers about hijacks within seconds.
Understand More About Proactively Tackling BGP Insecurity
Check out Catchpoint's five-part blog series on BGP security issues like hijacks and route leaks, featuring more real-world examples and technical deep dives.
Published at DZone with permission of Alessandro Improta. See the original article here.
Opinions expressed by DZone contributors are their own.