To improve our API and be transparent about any issues that may affect our users, we intend to post a quick postmortem of each service interruption when it occurs.
Last night (13/3/2017), our database server went offline, which kicked off a series of automated events to get us back online with our latest backup.
Total outage time: ~7 minutes
The following is the timeline of events:
- 12:24 UTC - Database server was lost due to a downstream cloud provider issue
- 12:24 UTC - Connection to database server lost from API servers
- ~12:24 - 12:31 UTC - API load balancer detected unhealthy API servers and replaced them in an attempt to restore service
- 12:24 UTC - New database server started to replace lost one
- ~12:30 UTC - Automatic database restore started
- ~12:31 UTC - Restore completed
- ~12:31 UTC - Database and API back online and taking requests
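For readers wondering why the load balancer kept replacing API servers during the outage, here is a minimal sketch of a health check that depends on the database. It is not our actual implementation: `health_check`, `ping_database`, `db_up` and `db_down` are hypothetical names used only for illustration, and we are assuming a probe that reports an HTTP-style status code.

```python
from typing import Callable, Tuple


def health_check(ping_database: Callable[[], None]) -> Tuple[int, str]:
    """Return an HTTP-style (status, body) pair for a load balancer probe.

    `ping_database` is a hypothetical callable that raises an exception
    when the database cannot be reached.
    """
    try:
        ping_database()
    except Exception as exc:
        # Database unreachable: report unhealthy so the load balancer
        # can take this server out of rotation or replace it.
        return 503, f"unhealthy: {exc}"
    return 200, "ok"


def db_up() -> None:
    """Stand-in for a successful database ping (assumption)."""
    return None


def db_down() -> None:
    """Stand-in for a failed database ping (assumption)."""
    raise ConnectionError("database connection lost")
```

With a check like this, every API server reports unhealthy for as long as the database is gone, so replacing the servers cannot restore service until the database itself is back.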
We will learn what we can from this outage to improve our response times and, where possible, avoid outages like this. Though our API is still in its ‘Beta’ period, we try to ensure as much uptime and availability as we practically can, and we are sorry for any inconvenience this outage caused.