Last night (11-14-22), at about 9:24 PM, SFCN experienced an unusual service outage. We quickly determined that the service interruption originated beyond the Salt Lake City connection point to our primary bandwidth provider.
As we were preparing to manually fail over to our backup provider, services appeared to be restored through our primary provider at about 9:34 PM. As we monitored the network traffic, services were impacted again at 9:39 PM. We then decided to fail over to our backup provider until the issues with our primary provider could be resolved. This action solved our connectivity issues, and normal network traffic was restored at 9:46 PM. As of this morning we are still running normally through our backup provider. In case of any unforeseen issues, we will not fail back to our primary circuit until the least impactful hours of tomorrow morning.
The Salt Lake City connection point has redundant circuits to the rest of the world. Earlier in the day yesterday, one of those fiber circuits failed, leaving all traffic on a single path out. Last night, as the fiber vendor responsible for repairing the failed path was working to restore it, they mistakenly cut the wrong fiber, taking down the functional path instead of the failed one. This did not affect our connectivity to the Salt Lake City location, but it did effectively isolate that location from the rest of the world.
SFCN has some excellent automatic failover methods in place, but unfortunately they are not capable of detecting issues beyond our connections to our providers. Because our connection to our primary provider was not impacted, these methods saw a healthy link and did not reroute our traffic. Identifying the source of a failure like this one and failing over requires human intervention, which takes a few extra minutes.
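For readers curious about the difference, here is a minimal sketch in Python of a check that only looks at our own link to a provider versus one that also asks whether destinations beyond the provider answer. It is an illustration only, not our actual failover system: the names `ProviderStatus`, `link_only_healthy`, `end_to_end_healthy`, and `choose_provider`, the probe counts, and the 50% threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ProviderStatus:
    """Snapshot of one upstream provider (hypothetical fields)."""
    name: str
    link_up: bool                  # is our own connection to the provider up?
    remote_targets_reachable: int  # far-side probe targets that answered
    remote_targets_total: int      # far-side probe targets that were tried


def link_only_healthy(p: ProviderStatus) -> bool:
    # Looks only at our connection to the provider.
    return p.link_up


def end_to_end_healthy(p: ProviderStatus, min_fraction: float = 0.5) -> bool:
    # Also requires that enough destinations beyond the provider answered.
    if not p.link_up or p.remote_targets_total == 0:
        return False
    return p.remote_targets_reachable / p.remote_targets_total >= min_fraction


def choose_provider(
    providers: Sequence[ProviderStatus],
    healthy: Callable[[ProviderStatus], bool],
) -> Optional[str]:
    # Providers are listed in order of preference; pick the first healthy one.
    for p in providers:
        if healthy(p):
            return p.name
    return None


# Roughly last night's situation: our link to the primary provider was up,
# but nothing beyond the Salt Lake City location could be reached through it.
primary = ProviderStatus("primary", link_up=True,
                         remote_targets_reachable=0, remote_targets_total=3)
backup = ProviderStatus("backup", link_up=True,
                        remote_targets_reachable=3, remote_targets_total=3)

link_only_choice = choose_provider([primary, backup], link_only_healthy)
end_to_end_choice = choose_provider([primary, backup], end_to_end_healthy)
```

In this sketch the link-only check keeps selecting the primary provider because its link is up, while the end-to-end check selects the backup. A real end-to-end check would need carefully chosen probe targets and thresholds so that it does not fail over needlessly.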
While these types of outages are extremely rare on our network, they can occur. We thank you all for your patience and understanding. We truly do have the best customers in the world.