May 15 Outage Post-Mortem
Last night, we had an incident in our data-processing pipeline that resulted in some data loss. At around 11:30 PM Pacific Time, an on-call engineer was paged because a server had become unresponsive and unreachable. While such failures are quite rare, our infrastructure is robust enough to tolerate some servers going down…