API Downtime
Incident Report for Clerk.io
Resolved
We have now fully traced the incident and determined its source.

This Saturday night we implemented an extra backup service for emergency backups (a backup of the backups). To keep the system online during the process, we tuned some configuration settings to allow for extra capacity. When we were done, we reset all of the settings... except one, which was left too high!

This resulted in internal processing queues slowly growing too large and finally crashing the system this Monday afternoon.
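
To illustrate the kind of slow growth described above, here is a minimal sketch of a check that flags a queue whose depth keeps rising across consecutive readings. The function name, the sample values, and the five-reading window are assumptions made for illustration, not details of the actual system.

```python
def is_growing_steadily(depths, min_samples=5):
    """Return True if the last `min_samples` readings are strictly increasing.

    `depths` is a list of queue-depth readings, oldest first.
    The window size is a hypothetical default, not a tuned value.
    """
    if len(depths) < min_samples:
        return False  # not enough data to judge a trend
    recent = depths[-min_samples:]
    # Compare each reading with the one that follows it.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical readings: one queue that keeps growing, one that fluctuates.
growing = [100, 120, 150, 200, 260, 340]
stable = [100, 90, 110, 95, 105, 100]

print(is_growing_steadily(growing))
print(is_growing_steadily(stable))
```

A check like this warns on the trend rather than on an absolute size, so it can fire well before a queue becomes large enough to cause a crash.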

Our entire software infrastructure is decomposed into individual services, so as soon as we received the alerts from our monitoring systems we were able to identify the crashed service and get everything back online within minutes.

To prevent this in the future we will of course keep detailed checklists when making system changes. We will also look into more detailed monitoring for our early warning systems.
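
One way to back up a manual checklist is an automated comparison of the settings after a maintenance window against a saved baseline. The sketch below assumes configuration is available as plain dictionaries. The setting names and values are hypothetical and do not come from the real system.

```python
def find_config_drift(baseline, current):
    """Return {key: (expected, actual)} for every setting that differs.

    A key present on only one side is reported with None for the
    missing value.
    """
    drift = {}
    for key in sorted(set(baseline) | set(current)):
        expected = baseline.get(key)
        actual = current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical settings captured before and after the maintenance work.
baseline = {"max_queue_size": 10000, "worker_count": 8, "request_timeout_s": 30}
after_maintenance = {"max_queue_size": 500000, "worker_count": 8, "request_timeout_s": 30}

drift = find_config_drift(baseline, after_maintenance)
for key, (expected, actual) in drift.items():
    print(f"{key}: expected {expected}, found {actual}")
```

Running a comparison like this as the last step of a change would surface a single value that was not reset, which is easy to miss when reviewing settings by hand.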

We apologize for the trouble this has caused.
Posted Aug 11, 2014 - 18:27 CEST
Monitoring
We have had an outage of approximately 10 minutes on our API service. We have resolved the issue by restarting the service and are now closely monitoring it.

We will be back with an in-depth post-mortem when we have explored the cause of the outage.
Posted Aug 11, 2014 - 13:45 CEST