Client Latency

Incident Report for Avochato

Postmortem

What happened

A large spike in network requests, combined with a backlog of automated usage, caused the Avochato platform to queue HTTP requests for longer than usual. The callbacks generated by this spike created a large backlog of work for our servers, which caused page load times to spike and delayed the processing and sending of messages.

Subsequently, the load balancer for our platform ran out of available connections for HTTP requests as websocket escalations piled up, driven by users refreshing their browsers during the period of degraded performance. This created a self-reinforcing feedback loop: requests and live-update connections took longer to process, so live updates for inboxes and conversations continued to be intermittent and HTTP requests were dropped.
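
The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of Avochato's actual infrastructure: the function name, the parameters, and all of the numbers are hypothetical. It assumes a load balancer with a fixed number of connection slots, and it assumes that every dropped request leads one user to refresh on the next tick, with each refresh costing `refresh_cost` connections (for example, a page load plus a websocket connection).

```python
def simulate_refresh_loop(capacity, baseline, spike, refresh_cost, ticks):
    """Toy model of a refresh storm against a fixed pool of connections.

    All parameters are hypothetical and chosen only for illustration:
      capacity     -- connection slots available on the load balancer
      baseline     -- connections needed per tick under normal usage
      spike        -- extra connections requested on the first tick only
      refresh_cost -- connections opened by one browser refresh
      ticks        -- number of time steps to simulate
    Returns a list of (demand, dropped) tuples, one per tick.
    """
    history = []
    pending_refreshes = 0
    for tick in range(ticks):
        # Demand is normal usage, the one-off spike, and the connections
        # opened by users who refreshed after the previous tick.
        demand = baseline + pending_refreshes * refresh_cost
        if tick == 0:
            demand += spike
        dropped = max(0, demand - capacity)
        history.append((demand, dropped))
        # Assumption: each dropped request triggers one refresh next tick.
        pending_refreshes = dropped
    return history


if __name__ == "__main__":
    # Each refresh costs two connections, so drops grow every tick.
    print(simulate_refresh_loop(100, 95, 20, 2, 4))
    # Each refresh costs one connection, so the system recovers.
    print(simulate_refresh_loop(100, 90, 20, 1, 4))
```

In this model a single spike is enough to keep the system saturated when each refresh opens more connections than the dropped request it replaces, while the same spike drains away when a refresh is no more expensive than the original request.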

Action items

Specific bottlenecks in our platform infrastructure’s ability to broker websockets have been identified and addressed.

Some additional updates to our asynchronous architecture are being planned and prioritized to prevent a similar incident in the future.

Posted Oct 29, 2020 - 15:47 PDT

Resolved

This incident has been resolved.
Posted Oct 28, 2020 - 16:09 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 28, 2020 - 11:25 PDT

Update

Our team has taken steps to mitigate platform latency, which have improved but not resolved performance.

We are continuing to monitor performance.
Posted Oct 28, 2020 - 11:17 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Oct 28, 2020 - 10:02 PDT

Investigating

We are currently investigating this issue.
Posted Oct 28, 2020 - 09:20 PDT
This incident affected: avochato.com.