A high volume of concurrent outbound messages caused our production write database to run out of connections. As a result, most queued processes took an extremely long time to finish, and page loads timed out for many users who tried to access the platform during the impact period.
Pending messages, inbound messages, and broadcasts during this period may have remained queued but were not dropped. Inbound calls initiated to Avochato numbers during this period were often unable to connect or be forwarded properly. Upon resolution, inbound messages and queued work retried themselves and in most identifiable cases were received properly.
Our database automatically failed over to a read replica and was able to resume serving requests. However, we are investigating ways for this failover to happen sooner, to prevent longer periods of inaccessibility.
Our engineers have identified the root cause, which relates to how message callback methods are prioritized. We have patched our production application servers with both a fix for the root cause and new safeguards to prevent excess resource consumption during periods of extreme load.
We are evaluating solutions to make our infrastructure more resilient while continuing to offer a best-in-class live inbox experience for customers of all sizes.
As a team, we have committed to aggressively monitoring our platform’s health and proactively deploying fixes for bottlenecks detected in our current application.
We appreciate the trust you place in our platform for communicating with those who matter most to you, and we thank you for your patience during this busy time of the year.
Thank you for choosing Avochato,
Christopher Neale, CTO and Co-founder