On the last day of February, millions of websites and web-based services were offline for a large part of the day. How does this happen? Well, what they had in common was that they all rely on Amazon's cloud storage service, S3, which is short for "Simple Storage Service". S3 allows small, medium and even very large companies to host their data in the cloud, on Amazon's hardware in data centres strategically placed all over the globe.
S3 is part of Amazon Web Services, the world's largest cloud computing platform by a wide margin, so even though only a small portion of the service went offline, it created huge disruptions.
Even our own company was affected: we use a CRM and automation management platform that is hosted on S3, so we were unable to use that system for most of the day.
What caused this? Well, it turns out that Amazon engineers were trying to resolve a billing system issue, and what essentially amounts to a typo in one of the commands they entered took down a large swath of servers in Amazon's East Coast (US-EAST-1, Northern Virginia) region, forcing a full restart of the affected systems. That restart takes many hours to complete, and this equipment had not been fully restarted for several years, which created a whole additional raft of problems.
Luckily, services were restored towards the end of the day and are working well again now. I should also add that Amazon's S3 service is normally extremely reliable, which is probably why this came as such a shock to the individuals and businesses that use it regularly.