Platform Latency
Incident Report for Avochato
Postmortem

What happened

A high volume of concurrent outbound messages caused our production write database to run out of connections. As a result, most queued processes took an extremely long time to finish, and page loads timed out for many users who tried to access the platform during the impact period.
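For readers unfamiliar with this failure mode, the following is a minimal sketch of connection exhaustion, not our production code. `BoundedPool`, `PoolExhausted`, and the pool size are hypothetical names and values chosen for illustration; integers stand in for real database connections.

```python
import queue
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """Hypothetical fixed-size pool; integers stand in for real connections."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for conn_id in range(size):
            self._free.put(conn_id)

    @contextmanager
    def connection(self, timeout):
        # Wait up to `timeout` seconds for a free connection.
        try:
            conn = self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("no free connection") from None
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller failed.
            self._free.put(conn)


def demo():
    pool = BoundedPool(size=2)
    # Two callers hold every connection; a third caller has to wait.
    with pool.connection(timeout=0.1), pool.connection(timeout=0.1):
        try:
            with pool.connection(timeout=0.1):
                return "acquired"
        except PoolExhausted:
            return "exhausted"
```

In this sketch the third caller gives up after its timeout. When callers instead wait indefinitely, the waiting shows up as slow queued work and page loads that time out.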

Impact

Pending messages, inbound messages, and broadcasts during this period may have remained queued but were not dropped. Inbound calls to Avochato numbers during this period often failed to connect or to be forwarded properly. Once the incident was resolved, inbound messages and queued work were retried automatically and, in most identifiable cases, were received properly.

Resolution

Our database automatically failed over to a read replica and was able to resume serving requests. However, we are investigating ways to make this failover happen sooner, so that any future period of inaccessibility is shorter.

Our engineers have identified the root cause, which relates to how message callback methods are prioritized. We have patched our production application servers with both a fix for the root cause and new safeguards to prevent excess resource consumption during periods of extreme load.
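As an illustration of what a safeguard against excess resource consumption can look like, here is a minimal sketch that caps how many callbacks run at once. It is an assumption-laden example, not the patch we deployed: `CallbackLimiter`, the cap of 3, and the worker count are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CallbackLimiter:
    """Hypothetical guard that caps how many callbacks run concurrently."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed, for monitoring

    def run(self, callback, *args):
        # Block until a slot is free, so a burst cannot exceed the cap.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return callback(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_callback(message_id):
    time.sleep(0.01)  # stand-in for work that holds a database connection
    return message_id


def process_burst(limiter, message_ids, workers=10):
    # Ten worker threads compete, but the limiter admits only a few at a time.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda m: limiter.run(fake_callback, m), message_ids))
```

With a cap below the database's connection limit, a burst of outbound messages queues inside the application instead of exhausting the database.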

We are evaluating solutions to make our infrastructure more resilient while continuing to offer a best-in-class live inbox experience for customers of all sizes.

As a team, we have committed to aggressively monitoring our platform’s health and proactively deploying fixes for bottlenecks detected in our current application.

We appreciate the trust you place in our platform for communicating with those who matter most to you, and we thank you for your patience during this busy time of the year.

Thank you for choosing Avochato,

Christopher Neale, CTO and Co-founder

Posted Nov 25, 2020 - 12:58 PST

Resolved
This incident has been resolved.
Posted Nov 24, 2020 - 18:16 PST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 24, 2020 - 16:01 PST
Identified
We are working to deploy an update to resolve issues impacting clients.
Posted Nov 24, 2020 - 15:14 PST
Investigating
We are currently investigating this issue.
Posted Nov 24, 2020 - 14:41 PST
This incident affected: avochato.com, API, and Mobile.