Client Latency
Incident Report for Avochato
Postmortem

What happened

A large spike in network requests, combined with a backlog of automated usage, led the Avochato platform to queue HTTP requests for longer than usual. The callbacks generated by this spike created a large backlog of work for our servers, which caused page load times to spike and delayed the sending of messages.

Subsequently, the load balancer for our platform ran out of available connections for HTTP requests as websocket upgrade requests piled up, because users were refreshing their browsers during the period of degraded performance. This created a self-reinforcing feedback loop: requests took longer to process and live-update connections took longer to establish, so live updates for inboxes and conversations remained intermittent and some HTTP requests were dropped.
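One common way to dampen this kind of reconnect storm is for clients to wait a randomized, exponentially growing interval before retrying a dropped live-update connection. The sketch below is illustrative only and is not a description of the fix Avochato implemented. The function names and the `base` and `cap` values are hypothetical.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized wait in seconds before retry number `attempt` (0-based).

    Uses "full jitter": the wait is drawn uniformly between 0 and an
    exponentially growing ceiling, so clients that disconnect at the same
    moment do not all reconnect at the same moment.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def reconnect_schedule(attempts, base=1.0, cap=60.0, seed=None):
    """Return the list of waits for the first `attempts` retries."""
    rng = random.Random(seed)  # a seed is only useful for reproducible tests
    return [reconnect_delay(n, base, cap, rng) for n in range(attempts)]


if __name__ == "__main__":
    for n, delay in enumerate(reconnect_schedule(6, base=1.0, cap=30.0)):
        print(f"retry {n}: wait up to {min(30.0, 2 ** n):.0f}s, chose {delay:.2f}s")
```

Spreading retries out in this way limits how many connection attempts reach the load balancer at once, at the cost of some clients waiting longer to regain live updates.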

Action items

Specific bottlenecks in our platform infrastructure’s ability to broker websockets have been identified, and fixes for them have been implemented.

Some additional updates to our asynchronous architecture are being planned and prioritized to prevent a similar incident in the future.

Posted Oct 29, 2020 - 15:47 PDT

Resolved
This incident has been resolved.
Posted Oct 28, 2020 - 16:09 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 28, 2020 - 11:25 PDT
Update
Our team has taken steps to mitigate platform latency. Performance has improved but the issue is not yet resolved.

We are continuing to monitor performance.
Posted Oct 28, 2020 - 11:17 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 28, 2020 - 10:02 PDT
Investigating
We are currently investigating this issue.
Posted Oct 28, 2020 - 09:20 PDT
This incident affected: avochato.com.