Today we experienced periods of slow API response times from 14:47 CET (Central European Time) to 17:08 CET, and we chose to shut down the API from 15:57 CET to 16:09 CET.
The slow response times were caused by a new search feature, launched a couple of days ago, that had gradually created a bottleneck in our platform. The increase in response time was initially so gradual that none of our monitoring services detected it, but today the response times started to grow exponentially and created a domino effect that slowed all our services.
Our product team quickly identified the problem and tried to contain it. But after roughly half an hour we decided that the fastest way to resolve the issue was to temporarily reject new incoming API requests until the system had cooled down.
The cooldown took a couple of minutes, after which we reopened the API to incoming requests. Over the next hour we gradually re-enabled all other (less noticeable) sub-services in a controlled manner while closely monitoring the system.
Posted Aug 28, 2017 - 18:19 CEST