Using Skylight on Skylight

We recently rolled out a new billing system that relies heavily on Stripe's Billing APIs. To avoid inconsistencies, we treat Stripe as our single source of truth wherever possible. This means that we frequently have to reach out to Stripe to get up-to-date information.

We realized early on that constantly reaching out to Stripe wasn't going to be good for performance, so we implemented a caching solution and figured that things were good to go. However, when we looked at the Skylight UI recently (yes, we actually use our own tool!) we noticed that some endpoints were much slower than we expected.

A list of our endpoints, with a couple that take over 2 seconds at the top

When we dove into the OrganizationsController#show endpoint, things didn't look great:

An event sequence with some cache hits, but also a number of different slow calls to api.stripe.com

We could see that we had caching in place, but we were still having to call out to Stripe multiple times. Clearly, something was not working as we had expected.

There are a number of different places in our app where we could be calling out to Stripe. To get some additional insight, we added custom instrumentation around these points to learn where the calls were coming from and which objects we were trying to fetch.

An event sequence with a number of different slow calls to api.stripe.com wrapped in additional instrumentation
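To give a rough idea of what that kind of instrumentation can look like, here is a minimal sketch. It is not our actual code: the `StripeInstrumentation` module, the `app.stripe.fetch` category, and the stand-in lookup are all hypothetical names chosen for illustration. It assumes the Skylight gem's `Skylight.instrument` block API and falls back to simply running the block when Skylight isn't loaded.

```ruby
# Hypothetical helper (not from our codebase): wraps a Stripe lookup in a
# named span so each call shows up as its own entry in the event sequence.
module StripeInstrumentation
  def self.fetch(kind, id)
    title = "Stripe fetch: #{kind}"

    if defined?(Skylight)
      # Category name is an assumption; custom categories start with "app.".
      Skylight.instrument(category: "app.stripe.fetch", title: title, description: id.to_s) do
        yield
      end
    else
      # Skylight isn't loaded (e.g. in a plain script), so just run the block.
      yield
    end
  end
end

# Usage: wrap whatever code performs the lookup.
subscription = StripeInstrumentation.fetch(:subscription, "sub_123") do
  { id: "sub_123", status: "active" } # stand-in for a real Stripe API call
end
```

Giving each wrapper a descriptive title is what makes it possible to tell, at a glance, which kind of object a slow call was trying to fetch.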

Now we had a better idea of what we were trying to fetch. In some of these places, we realized that we were actually bypassing the cache. This was easy to fix: we just needed to add some additional caching.

But that couldn't explain all of these cases! We were certain that some of these places shouldn't be calling out to Stripe after the cache was primed, yet here they were. After some further investigation, we realized that we weren't caching nil values. This meant that any time we got a nil from Stripe (something that we did expect), we would treat it as a cache miss and fetch from Stripe again on the next request, instead of reusing the nil value.
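The general shape of this pitfall can be sketched with a toy in-memory cache. This is an illustration only, not our real caching layer: `MemoryCache`, its method names, and the cache keys are hypothetical, and the lambda stands in for a Stripe lookup that legitimately returns nil.

```ruby
# Toy in-memory cache, for illustration only.
class MemoryCache
  def initialize
    @store = {}
  end

  # Pitfall: a cached nil is falsy, so `||=` runs the block on every call.
  def fetch_skipping_nil(key)
    @store[key] ||= yield
  end

  # Checks whether the key is present, so a cached nil counts as a hit.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

calls = 0
lookup = -> { calls += 1; nil } # stand-in for a Stripe lookup that returns nil

cache = MemoryCache.new

3.times { cache.fetch_skipping_nil("customer:1:discount") { lookup.call } }
naive_calls = calls # the lookup ran on every one of the three reads

calls = 0
3.times { cache.fetch("customer:2:discount") { lookup.call } }
nil_aware_calls = calls # the lookup ran only on the first read
```

The difference is whether the cache asks "is there a truthy value?" or "is there an entry for this key?". Only the second question lets a known nil short-circuit the external call.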

After adding additional caching and making sure we were caching nil values, we checked to see how we were doing:

An event sequence without any external API calls and multiple cache hits.

Much better! (Those database queries could use some optimization, but perhaps we'll come back to those in another blog post. 😉)

Without Skylight, it would have taken much longer to realize how slow these endpoints had become. Maybe at some point, we would have logged into the app and realized that it felt a little slow. Perhaps we would have added more caching, but even then, we might have missed a few spots. We probably would have missed the issue of nil values not getting cached, just assuming that the remaining slowness was inherent in the setup.

The insights provided by Skylight enabled us to catch a problem before anyone complained, and better yet, helped show us what we needed to do to fix it.

If Skylight sounds useful to you, or if you have some endpoints like these that you'd like to investigate further, sign up today and get a free 30-day trial. Or, refer a friend and you'll both get $50 in credit!