On February 20 at roughly 3:00 a.m. UTC, Ditto experienced an unexpected server issue that made all services unavailable for all customers. We understand this downtime may have impacted productivity for your business, and for that, we apologize. This is not the level of service we strive to deliver, and we want to do better.
The outage occurred in our Ditto cloud service when a log rotation utility failed to adequately archive past logs. As a result, the hard drives ran out of space, which prevented the servers from serving requests to our users. We have adjusted our log rotation utility and added further disk-space notifications to prevent this problem from occurring again.
The outage lasted approximately 45 minutes. We were not aware of the issue initially, but once we were made aware, we resolved the problem within 15 minutes. While we have many redundancies in place, our systems still failed to notify us of the issue within an acceptable period of time. That lack of notification extended the outage significantly.
Going forward, we have put measures in place to prevent situations like this:
1. We’ve assessed and reconfigured our notification systems to alert engineers of downtime and critical issues faster and more reliably.
2. We are evaluating and reconfiguring back-end monitoring metrics to help avoid the issues that caused this outage.
We are striving to be proactive, as this incident caused us to look deeper at how more serious issues could impact customers. We appreciate your trust while we continue to grow Ditto and learn from mistakes like this.
Again, we are sorry for the inconvenience, and we will strive to do better. If you have any questions, please don’t hesitate to contact us.
Kind regards,
Andrew Gould
Squirrels CEO