This Week's Data Processing Lag

This week at Skylight HQ (technically it’s in the Cloud™️ somewhere), we were hit with a pretty big surge of requests on our collector (Cyber Monday perhaps?).

Fortunately, our backend is architected to handle a very large volume of data from the agents with minimal latency, so this kind of surge would not affect the performance of your apps in any way. (In addition, the agent sends data in the background from a process separate from your app/server.)

However, once we receive the data, we need to aggregate it in our data processing pipeline before it shows up in your Skylight dashboard. The surge in requests saturated our processing capacity, delaying performance data by up to an hour (47 minutes at peak, to be exact) for a portion of our customers earlier this week. (For simplicity, our status page shows the maximum delay across all shards.)
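
To make that status page caveat concrete, here is a minimal sketch of how a single "maximum across shards" number can look worse than what most customers experienced. The shard names and all delay values except the 47-minute peak are made up for illustration, and the function names are hypothetical, not our actual code.

```python
# Hypothetical per-shard processing delays in minutes.
# Only the 47-minute peak comes from this incident; the rest are illustrative.
shard_delays = {"shard-a": 0, "shard-b": 2, "shard-c": 47, "shard-d": 1}


def reported_delay(delays):
    """Return the single number a status page would show: the worst shard's delay."""
    return max(delays.values(), default=0)


def affected_shards(delays, threshold_minutes=5):
    """Return the names of shards whose delay exceeds the threshold, sorted."""
    return sorted(name for name, d in delays.items() if d > threshold_minutes)


print(reported_delay(shard_delays))   # the headline number
print(affected_shards(shard_delays))  # the shards actually lagging
```

In this made-up example the headline number is driven by one shard, while customers on the other shards see little or no lag.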

To avoid causing even more complications and interruptions, we scheduled a deploy of additional capacity (sub-sharding the saturated parts of the pipeline) for Wednesday evening. Thankfully, that went very smoothly. Everything is snappy again after the deploy, and we should have capacity to spare for a while.
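
For the curious, here is a rough sketch of what sub-sharding can look like in general. It is an illustration under assumptions, not a description of our pipeline: the routing scheme, the `route` and `stable_hash` helpers, and the shard counts are all hypothetical.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to an integer that is stable across processes and restarts."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def route(key: str, num_shards: int, sub_shards: dict) -> tuple:
    """Pick a (shard, sub_shard) pair for a key.

    sub_shards maps a saturated shard's index to the number of sub-shards
    it has been split into; shards missing from the map have one sub-shard.
    """
    h = stable_hash(key)
    shard = h % num_shards
    # Use the remaining hash bits so the sub-shard choice is
    # independent of the top-level shard choice.
    sub = (h // num_shards) % sub_shards.get(shard, 1)
    return shard, sub


# Hypothetical setup: 4 shards, with shard 2 split four ways.
print(route("app-123", 4, {2: 4}))
```

The useful property of a scheme like this is that splitting one shard leaves every key's top-level shard unchanged, so only the saturated part of the pipeline has to be touched.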

