A Jan. 25 outage that affected Microsoft’s Azure, Teams, and Outlook has been linked to rapid Border Gateway Protocol (BGP) router updates, according to a ThousandEyes analysis of the incident.
The network intelligence company cites the repeated, rapid readvertising of BGP prefixes as the culprit.
- The analysis states that the 90-minute incident “was triggered by an external BGP change by Microsoft that impacted connected service providers, leading to destabilization of global routes to its prefixes, which led to significant packet loss and diminished reachability of its services and those of its customers.”
- The “external BGP change” refers to the withdrawal of BGP routes that had directed internet traffic along the best path, as chosen by the BGP best-path selection algorithm.
- The continuous readvertising of BGP prefixes caused rapid changes in traffic paths, which destabilized route tables and disrupted service.
- Microsoft plans to release a post-incident review in the next few weeks.
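
To make the idea of route-table instability more concrete, the following is a minimal sketch of how repeated withdrawals and readvertisements of a prefix could be counted as "flaps" in a log of BGP updates. It is not the method ThousandEyes used. The update log, the function names `count_flaps` and `unstable_prefixes`, the 60-second window, and the threshold of three flaps are all assumptions made for illustration, and the prefixes come from reserved documentation ranges rather than Microsoft's address space.

```python
from collections import defaultdict

# Hypothetical update log: (seconds since start, prefix, action).
# Prefixes are reserved documentation ranges, not real Microsoft prefixes.
UPDATES = [
    (0, "192.0.2.0/24", "announce"),
    (5, "192.0.2.0/24", "withdraw"),
    (9, "192.0.2.0/24", "announce"),
    (14, "192.0.2.0/24", "withdraw"),
    (20, "192.0.2.0/24", "announce"),
    (0, "198.51.100.0/24", "announce"),
]


def count_flaps(updates, window_seconds):
    """Count announce/withdraw state changes per prefix up to window_seconds."""
    last_action = {}
    flaps = defaultdict(int)
    for timestamp, prefix, action in sorted(updates):
        if timestamp > window_seconds:
            continue  # ignore updates outside the observation window
        previous = last_action.get(prefix)
        if previous is not None and previous != action:
            flaps[prefix] += 1  # the prefix changed state: one flap
        last_action[prefix] = action
    return dict(flaps)


def unstable_prefixes(updates, window_seconds=60, threshold=3):
    """Return prefixes whose flap count meets the (assumed) threshold."""
    flaps = count_flaps(updates, window_seconds)
    return sorted(p for p, n in flaps.items() if n >= threshold)


if __name__ == "__main__":
    print(count_flaps(UPDATES, 60))
    print(unstable_prefixes(UPDATES))
```

In this toy log, the first prefix alternates between announced and withdrawn several times within seconds, while the second is announced once and stays stable. Each state change forces neighboring routers to recompute their best path, which is the kind of churn the analysis describes.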