Earlier today the Clerk.io API was unavailable for a total of 18 minutes. A failure in our cache infrastructure increased the average response time from 9 milliseconds to 17 seconds.
The failure was caused by a memory leak in a monitoring application on the central cache server, which made the server rapidly run out of memory.
The monitoring application has been temporarily disabled and a bug has been reported to the company providing the service. Tonight we will perform a maintenance update to prevent this from happening again.
Here is what happened (times are in UTC):
A single cache server should never be able to take down the entire service. We will look into both adding more spare memory to our servers and using cache mirroring services for improved stability.
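To illustrate the principle that a slow cache should degrade the service rather than stop it, here is a minimal sketch of a cache read with a time budget and a fallback to the origin. It is an illustration only and not a description of our production code: the names `get_with_fallback`, `cache_get`, `origin_get` and the 50 ms budget are all hypothetical.

```python
import concurrent.futures

# Shared worker pool for cache lookups (hypothetical; the size is arbitrary).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def get_with_fallback(key, cache_get, origin_get, budget_seconds=0.05):
    """Return (value, source), where source is "cache" or "origin".

    The cache is given `budget_seconds` to answer. If it is too slow,
    raises an error, or has no entry, the value is read from the origin.
    """
    future = _pool.submit(cache_get, key)
    try:
        value = future.result(timeout=budget_seconds)
        if value is not None:
            return value, "cache"
    except concurrent.futures.TimeoutError:
        pass  # The cache did not answer within the budget.
    except Exception:
        pass  # The cache lookup itself failed.
    return origin_get(key), "origin"
```

With a wrapper like this, a cache server that is out of memory costs each request at most the time budget plus one origin lookup, instead of blocking it for many seconds. Note that the slow lookup keeps running in its worker thread after the timeout, so a real deployment would also need a timeout on the cache client itself.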
We know that our service is essential to your business and we deeply apologize for the downtime. We are also thankful for your support during this incident, even though we let you down.
I wish you all the very best