<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Marc-Andre Giroux</title>
    <description>The latest articles on Forem by Marc-Andre Giroux (@xuorig).</description>
    <link>https://forem.com/xuorig</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F142139%2Fc8061f8b-7035-4e7d-aaee-dd5d30e66c70.jpeg</url>
      <title>Forem: Marc-Andre Giroux</title>
      <link>https://forem.com/xuorig</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xuorig"/>
    <language>en</language>
    <item>
      <title>How Should we Version GraphQL APIs?</title>
      <dc:creator>Marc-Andre Giroux</dc:creator>
      <pubDate>Wed, 06 Nov 2019 15:34:06 +0000</pubDate>
      <link>https://forem.com/xuorig/how-should-we-version-graphql-apis-2co3</link>
      <guid>https://forem.com/xuorig/how-should-we-version-graphql-apis-2co3</guid>
      <description>&lt;p&gt;How do you version GraphQL APIs? The most common answer you’ll get these days is “you don’t”. If you’re like me when I first read that, you might be a little anxious about maintaining a version-less API or a bit skeptical of the approach. In fact, not only is this a common answer, but it’s listed as a feature of GraphQL itself on graphql.org&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yWqDDvXG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://productionreadygraphql.com/content/images/2019/11/Screen-Shot-2019-11-05-at-5.21.24-PM.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yWqDDvXG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://productionreadygraphql.com/content/images/2019/11/Screen-Shot-2019-11-05-at-5.21.24-PM.png" alt="GraphQL Org Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When reading &lt;a href="https://graphql.org"&gt;graphql.org&lt;/a&gt;, this may look like a feature specific to GraphQL, but in fact, evolving APIs without versioning is something that has long been recommended for REST and HTTP APIs as well. Roy Fielding himself famously said at a conference that the best practice for versioning REST APIs is not to do it! Let’s explore API versioning and try to see which evolution approach makes the most sense for GraphQL APIs.&lt;/p&gt;

&lt;h1&gt;
  
  
  API Versioning is Never Fun
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://apisyouwonthate.com/blog/api-versioning-has-no-right-way"&gt;In API Versioning Has No “Right Way”, Phil Sturgeon does a very useful tour of API versioning techniques we’ve been seeing in the wild&lt;/a&gt;. The conclusion Phil ends up is one I share. No versioning approach is bullet proof, and most of them are quite 💩 for clients.&lt;/p&gt;

&lt;p&gt;Probably the most popular versioning approach out there is global URL versioning. This is the approach you see when APIs have &lt;code&gt;/api/v1/user&lt;/code&gt; and &lt;code&gt;/api/v2/user&lt;/code&gt;, for example. While this may appear to be the simplest approach to get started with, it comes with a lot of annoying problems. A now-&lt;a href="https://stripe.com/blog/api-versioning"&gt;classic versioning article from Brandur Leach on Stripe's versioning approach explains this beautifully&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This can work, but has the major downside of changes between versions being so big and so impactful for users that it’s almost as painful as re-integrating from scratch. It’s also not a clear win because there will be a class of users that are unwilling or unable to upgrade and get trapped on old API versions. Providers then have to make the difficult choice between retiring API versions and by extension cutting those users off, or maintaining the old versions forever at considerable cost.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead, Stripe’s approach is to use smaller, date-based change sets that map to server-side “transforms”, hoping for smaller sets of changes while also incurring less cost in the server-side implementation. Instead of expressing these versions in the URL, a header is used to switch between them. They reserve the right to version in the URL, but it would only be used for really major changes.&lt;/p&gt;
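&lt;p&gt;A rough sketch of what those date-based transforms could look like (the version date, field names, and mechanics here are hypothetical, not Stripe’s actual implementation):&lt;/p&gt;

```python
from datetime import date

# Hypothetical transform: convert today's response shape back to the
# shape that clients pinned before 2019-09-01 expect.
def rename_full_name(resp):
    resp = dict(resp)
    resp["name"] = resp.pop("full_name")
    return resp

# One transform per version date.
TRANSFORMS = {date(2019, 9, 1): rename_full_name}

def apply_transforms(response, client_version):
    """Walk back through every version newer than the client's pinned one."""
    for version in sorted(TRANSFORMS, reverse=True):
        if version > client_version:
            response = TRANSFORMS[version](response)
    return response

# A client pinned (e.g. via a version header) to 2019-08-01 gets the old
# field name back; a client pinned after the change gets the new shape.
apply_transforms({"full_name": "Ada"}, date(2019, 8, 1))
```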

&lt;p&gt;This header-based / dynamic approach has become fairly popular, and I think it provides a better experience for both clients and API providers.&lt;/p&gt;

&lt;p&gt;When using resource-based HTTP APIs, another approach to evolving resources is content negotiation. For example, &lt;a href="https://developer.github.com/v3/media/#request-specific-version"&gt;the GitHub API allows clients to specify a custom media type that includes a version number&lt;/a&gt;. This is a pretty good way to express versioning for the representation of a resource, but not necessarily for other types of changes. The other downside is that clients aren’t forced to pass that media type, and if it goes away, they are usually reverted to the latest version, almost certainly breaking them. For GraphQL, since we aren’t dealing with different HTTP resources, this is a no-go.&lt;/p&gt;

&lt;h1&gt;
  
  
  Versioning GraphQL is Possible
&lt;/h1&gt;

&lt;p&gt;Nothing is stopping anyone from versioning a GraphQL API. The spirit of Lee Byron won’t come to haunt you if that happens (actually, I can’t promise that 👻). In fact, &lt;a href="https://help.shopify.com/en/api/versioning#the-api-version-release-schedule"&gt;Shopify recently adopted a URL versioning approach to evolve their GraphQL API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dwUdCVvG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://productionreadygraphql.com/content/images/2019/11/image.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dwUdCVvG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://productionreadygraphql.com/content/images/2019/11/image.png" alt="Shopify's Versioning Schedule"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shopify’s approach uses the URL for global versioning. With GraphQL, this is less annoying because it doesn’t create a full new hierarchy of resources under the new identifiers. It uses finer-grained, 3-month-long versions, which helps with the typical global versioning problems.&lt;/p&gt;

&lt;p&gt;While versioning often gives a sense of security to both providers and clients, it doesn’t generally last forever. Unless an infinite number of versions is supported, which causes unbounded complexity on the server side, clients eventually need to evolve. In the Shopify example, this happens 9 months after a stable version is released. After those 9 months, clients need to either upgrade to the next viable version, which contains a smaller set of changes, or upgrade to the newest version, which probably includes a lot more changes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Continuous Evolution
&lt;/h1&gt;

&lt;p&gt;The alternative to versioning, as mentioned at the beginning of this post, is to simply not do it. The process of maintaining a single version and constantly evolving it in place, rather than cutting new versions, is often called continuous evolution. One of my favorite ways to describe the philosophy comes, again, from Phil Sturgeon:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“API evolution is the concept of striving to maintain the “I” in API, the request/response body, query parameters, general functionality, etc., only breaking them when you absolutely, absolutely, have to. It’s the idea that API developers bending over backwards to maintain a contract, no matter how annoying that might be, is often more financially and logistically viable than dumping the workload onto a wide array of clients.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Commitment to Contracts
&lt;/h1&gt;

&lt;p&gt;Ironically, I remember hearing that &lt;a href="https://twitter.com/tobi"&gt;Tobi&lt;/a&gt; was absolutely against API versions back when I worked at Shopify. At the time I was shocked to hear that, and really skeptical. The funny thing is that now I totally understand where Tobi was coming from, and I hold similar views.&lt;/p&gt;

&lt;p&gt;A big part of the philosophy behind continuous evolution is a strong commitment to contracts: doing our absolute best to evolve the API in a backward-compatible way. This may sound like an unachievable dream to some of you, but I’ve noticed a lot of API providers will settle for versioning rather than trying to find creative ways to maintain their interface.&lt;/p&gt;

&lt;p&gt;Additive changes are almost always backward compatible, and a lot of the time, breaking changes can be avoided if additive changes are used wisely. The main downside to an additive approach to evolution is usually naming real estate and API “bloat”. The naming issue can usually be mitigated by being very specific with names in the first place. The bloat, especially in GraphQL, is most probably less of an issue than the cost of versioning to clients. But that’s of course a tradeoff for API providers to decide on.&lt;/p&gt;

&lt;h1&gt;
  
  
  Unavoidable Breaking Changes
&lt;/h1&gt;

&lt;p&gt;At this point you’re probably yelling at your screen, thinking back to moments when you totally had to make breaking changes to your interface. And it’s true! Not all changes can be made through addition, and there will always be moments where a breaking change needs to be made. In fact, here are a few good examples of unavoidable breaking changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security changes: you realize a field has been leaking private data, or that you should never have exposed a certain set of fields.&lt;/li&gt;
&lt;li&gt;Performance issues linked to the API design: an unpaginated list that can potentially return millions of records, causing timeouts and breaking clients.&lt;/li&gt;
&lt;li&gt;Authentication changes: an API provider deciding to deprecate “basic auth” altogether, forcing API clients to move towards JWTs.&lt;/li&gt;
&lt;li&gt;A non-null field that actually can be null at runtime: I would not wish this one on my worst enemy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these four example cases, there is often no way an additive change can be made to address the situation. The API must be modified, or fields must be removed. With evolution, we rely on deprecation notices, a period that lets clients move away from the deprecated fields, and a final sunset that makes the breaking change.&lt;/p&gt;
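&lt;p&gt;In GraphQL schemas, that deprecation period is expressed with the built-in &lt;code&gt;@deprecated&lt;/code&gt; directive, which most tooling already surfaces to clients. A small illustrative example (the field names and sunset date are made up):&lt;/p&gt;

```graphql
type User {
  # The old field stays functional during the migration period,
  # but tooling will surface the deprecation to clients.
  name: String @deprecated(reason: "Use `displayName` instead. Sunset: 2020-03-01.")
  displayName: String
}
```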

&lt;p&gt;Versioning seems like it would solve breaking-change issues, but if you look at the examples we listed, none of them would be easy even if we had a great versioning strategy in place. In fact, we would need to make breaking changes in all affected versions, making use of deprecations and a period that lets clients move away, before finally making the breaking change. Notice something funny? Yep, versioning requires the same amount of work (possibly more, depending on the number of versions) as a single continuously evolving version.&lt;/p&gt;

&lt;p&gt;No matter what strategy you pick, it’s how you go through that deprecation period that will determine how good your API evolution is. This is sometimes referred to as “change management”.&lt;/p&gt;

&lt;h1&gt;
  
  
  Change Management
&lt;/h1&gt;

&lt;p&gt;As API maintainers, no matter what evolution / versioning process we decide to standardize on, one thing is for sure: we have to get good at change management. Rarely will throwing a deprecated directive on a field solve all your problems. We need a bit more than this!&lt;/p&gt;

&lt;h2&gt;
  
  
  Communicate 💌
&lt;/h2&gt;

&lt;p&gt;Communicate early and often. Reaching all users of the use case you are deprecating is hard. Often, a combination of all mediums you can think of works best:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An API upcoming change log.&lt;/li&gt;
&lt;li&gt;Your API’s Twitter Account.&lt;/li&gt;
&lt;li&gt;A notice on your developers doc website.&lt;/li&gt;
&lt;li&gt;A blog post announcing the upcoming changes.&lt;/li&gt;
&lt;li&gt;A notice on the API client dashboard if you have one.&lt;/li&gt;
&lt;li&gt;Direct Email.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your communications should be clear about what is going away, why that’s the case, and what the new way of doing things is. Provide examples of old vs. new if you can. Empathy goes a long way, and if this does not happen too often, serious integrators won’t hold a grudge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Track 🔎
&lt;/h2&gt;

&lt;p&gt;The best communication is often targeted and close to individual integrators. But for that to happen, you first need to know which of your integrators could be affected. For GraphQL APIs, this is both easier and harder than with endpoint-based HTTP APIs. Let’s start with the good news.&lt;/p&gt;

&lt;p&gt;As mentioned earlier in this article, GraphQL allows us to track our API usage in a very fine-grained way, which is awesome. The bad news is that doing so requires a bit more work than simply looking at HTTP logs or tracking endpoints.&lt;/p&gt;

&lt;p&gt;If you want a plug-and-play solution, Apollo’s Platform product allows you to do just that. If not, you’ll have to build your own. Generally, you’ll want to parse incoming GraphQL query strings, map them to the parts of the schema they use, and store the fields &amp;amp; arguments being used, as well as client details (app id, email, etc.), in your datastore of choice. You’ll then be able to query for things like “Which clients have queried &lt;code&gt;User.name&lt;/code&gt; in the past 24 hours?”, which will help you extract a list of integrators to contact. Once you have contacted them, you can use the same data to see if usage is going down over time. Once it’s at a level acceptable to you, you can make the change. This is an absolutely amazing capability that serious GraphQL platforms should consider using.&lt;/p&gt;
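&lt;p&gt;As a rough illustration of the storage side, here is a sketch using SQLite (the schema, client ids, and field coordinates like &lt;code&gt;User.name&lt;/code&gt; are assumptions; the parsing step that extracts those coordinates from query strings is left out):&lt;/p&gt;

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical sketch: record which schema fields each client uses, then
# query the data to find integrators affected by a deprecation. Assumes
# an earlier step already turned each incoming query into coordinates
# like "User.name".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE field_usage (client_id TEXT, field TEXT, seen_at TEXT)")

def record_usage(client_id, fields):
    now = datetime.utcnow().isoformat()
    db.executemany(
        "INSERT INTO field_usage VALUES (?, ?, ?)",
        [(client_id, f, now) for f in fields],
    )

def clients_using(field, hours=24):
    # ISO-8601 timestamps compare correctly as strings.
    cutoff = (datetime.utcnow() - timedelta(hours=hours)).isoformat()
    rows = db.execute(
        "SELECT DISTINCT client_id FROM field_usage WHERE field = ? AND seen_at > ?",
        (field, cutoff),
    )
    return sorted(r[0] for r in rows)

record_usage("app-123", ["User.name", "User.email"])
clients_using("User.name")
```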

&lt;h2&gt;
  
  
  Last Resort 🚨
&lt;/h2&gt;

&lt;p&gt;Sometimes you’ve announced the change everywhere, reached out to individual integrators via email, and still there’s usage. There will always be a point where a tradeoff needs to be made between how perfect the transition needs to be, and evolution not being blocked.&lt;/p&gt;

&lt;p&gt;A technique I really like to use to “wake up” integrators that haven’t seen or haven’t acted on the deprecation notice is called “API brownouts”. With brownouts, you temporarily make the breaking change, in the hope that a monitoring system, logs, or a human notices that something is broken for those last integrators.&lt;/p&gt;
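&lt;p&gt;The scheduling itself can stay very simple. A minimal sketch, assuming fixed daily windows (the times, the function names, and the error message are made up):&lt;/p&gt;

```python
from datetime import datetime, time

# Hypothetical brownout windows: the deprecated field errors out daily
# between these times, then goes back to working normally.
BROWNOUT_WINDOWS = [(time(14, 0), time(14, 30))]

def in_brownout(now):
    t = now.time()
    return any(end >= t >= start for start, end in BROWNOUT_WINDOWS)

def resolve_deprecated_field(now, real_value):
    if in_brownout(now):
        # Fail loudly so monitoring or logs catch it for the last integrators.
        raise RuntimeError("Field is deprecated, see the migration guide")
    return real_value
```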

&lt;p&gt;Hopefully the error your API is returning includes some kind of information on how to fix it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Deprecated: Field `name` does not exist on type `User` anymore. Upgrade as soon as possible. See: https://my.api.com/blog/deprecation-of-name-on-user"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Take a look at the data you’re gathering after some brownout sessions and see if the usage is dropping!&lt;/p&gt;

&lt;h1&gt;
  
  
  So… Should we Version?
&lt;/h1&gt;

&lt;p&gt;That decision ultimately boils down to your own set of tradeoffs, what your clients are expecting, and what kind of expectations you want to set as an API provider. However, more and more, I’m starting to think that versioning usually ends up causing more trouble than anything else, since a lot of the time there comes a point where changes need to be made, just like in a continuous evolution approach.&lt;/p&gt;

&lt;p&gt;GraphQL helps us do continuous evolution in a few ways that make it quite a bit easier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It has first-class deprecation support on fields, and most tooling already knows how to use it.&lt;/li&gt;
&lt;li&gt;Additive changes come with no overhead on existing and new clients.&lt;/li&gt;
&lt;li&gt;Usage tracking can be done down to single fields.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These three things make GraphQL a really good candidate for continuous evolution, which is why I think we see it being recommended so heavily. Another thing to keep in mind is that if you opt for a continuous evolution approach first and then decide you absolutely need versioning, that’s possible. The opposite is much harder.&lt;/p&gt;

&lt;p&gt;It’s important to mention that I think continuous evolution can definitely be done badly. It is a big responsibility and must not be abused. That’s why additive changes must be the absolute priority before breaking changes.&lt;/p&gt;

&lt;p&gt;Finally, the best way to avoid all these kinds of problems is often at the root: API design. Use a design-first approach with a focus on evolvable design from the get-go. When changes have to be made, we “just” have to be very good at change management and hope for the best!&lt;/p&gt;

&lt;p&gt;If you want to read more of what’s been said on the subject, here are a few more versioning articles I loved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://phil.tech/api/2018/05/02/api-evolution-for-rest-http-apis/"&gt;https://phil.tech/api/2018/05/02/api-evolution-for-rest-http-apis/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apisyouwonthate.com/blog/api-versioning-has-no-right-way"&gt;https://apisyouwonthate.com/blog/api-versioning-has-no-right-way&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mnot.net/blog/2011/10/25/web_api_versioning_smackdown"&gt;https://www.mnot.net/blog/2011/10/25/web_api_versioning_smackdown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.mnot.net/blog/2012/12/04/api-evolution.html"&gt;https://www.mnot.net/blog/2012/12/04/api-evolution.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://productionreadygraphql.com"&gt;https://productionreadygraphql.com&lt;/a&gt; on November 6, 2019.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>graphql</category>
      <category>api</category>
      <category>versioning</category>
    </item>
    <item>
      <title>Caching &amp; GraphQL: The Elephant in the Room</title>
      <dc:creator>Marc-Andre Giroux</dc:creator>
      <pubDate>Tue, 25 Jun 2019 18:18:40 +0000</pubDate>
      <link>https://forem.com/xuorig/caching-graphql-the-elephant-in-the-room-137b</link>
      <guid>https://forem.com/xuorig/caching-graphql-the-elephant-in-the-room-137b</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This post is a preview of a book I'm working on about building great GraphQL servers. If you're interested, you can check it out here: &lt;a href="https://book.graphqlschemadesign.com"&gt;https://book.graphqlschemadesign.com&lt;/a&gt; 💚&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you've followed the discussions around whether GraphQL is a good idea or not, you might have heard things like "GraphQL breaks caching" or "GraphQL is not cacheable". If not, I guarantee you'll hear similar things when you start showing interest in, or implementing, GraphQL. This is something I see companies starting to use GraphQL be scared of, and for which they have never heard a clear answer. Before we dive into the world of caching and GraphQL, it might be a good idea to address these common concerns, and where they originate from.&lt;/p&gt;

&lt;p&gt;Comments like "GraphQL breaks caching" lack the nuance required to actually have a proper discussion about caching and GraphQL. What kind of caching? Client-side? Server-side? HTTP caching? Application-side caching? To have a proper discussion, and to end up with a better understanding of GraphQL's limitations in terms of caching, we must be more nuanced.&lt;/p&gt;

&lt;h3&gt;
  
  
  GraphQL breaks server-side caching?
&lt;/h3&gt;

&lt;p&gt;This is a common thing to see thrown around when talking about GraphQL. The first thing to understand is that "server-side caching" is already vague. At this point, we know that GraphQL can actually be a thin layer over our existing servers, and that in no way does GraphQL prevent us from caching on the server side, specifically what we could call &lt;strong&gt;Application Caching&lt;/strong&gt;. We will dive deeper into some concepts that can be applied at the application level later on in this chapter.&lt;/p&gt;

&lt;p&gt;If you're familiar with popular GraphQL clients, you know one of their major features is a denormalized cache that allows client-side applications to avoid refetching data they already possess, using it to optimistically update a UI, and to keep a consistent version of the world across components.&lt;/p&gt;

&lt;p&gt;If we can actually cache things both at the server and client layers, why are we hearing so much about GraphQL "breaking", or making caching really hard? This is where it becomes more nuanced.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP Caching
&lt;/h3&gt;

&lt;p&gt;While certain API styles like REST make great use of the powerful HTTP semantics, GraphQL does not really, at least by default. Since GraphQL is transport agnostic, most server implementations out there use HTTP as a "dumb pipe", rather than using it to its full potential. This causes issues around certain things, like HTTP caching. There are multiple parts to HTTP caching that are important to understand before we go further.&lt;/p&gt;

&lt;p&gt;First, there are many different cache entities that can be involved in HTTP caching. &lt;strong&gt;Client-side caches&lt;/strong&gt;, such as browser caches, use HTTP caching to avoid refetching data that is still fresh. &lt;strong&gt;Gateway caches&lt;/strong&gt; are usually deployed in front of a server, to prevent requests from hitting the server when the information is still up to date at the cache level.&lt;/p&gt;

&lt;p&gt;There are two concepts that are particularly important to understand when it comes to HTTP caching: &lt;strong&gt;freshness&lt;/strong&gt; and &lt;strong&gt;validation&lt;/strong&gt;. Freshness lets the server transmit, through the &lt;code&gt;Cache-Control&lt;/code&gt; and &lt;code&gt;Expires&lt;/code&gt; HTTP headers, the time for which a resource should be considered fresh. For example, a server returning this &lt;code&gt;Cache-Control&lt;/code&gt; header is telling clients not to bother fetching this resource again for at least one hour (3600 seconds):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache-Control: max-age=3600
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This is especially great for data that doesn't change often, such as browser assets. Whenever the age of the resource we fetched is greater than this &lt;code&gt;max-age&lt;/code&gt;, the client will emit a request instead of using the value in its cache. However, that doesn't mean the resource actually changed on the server. This is where &lt;strong&gt;validation&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;Validation is a way for clients to avoid refetching data when they're not sure if it is still fresh or not. There are two common HTTP headers to achieve this. The first one is &lt;code&gt;Last-Modified&lt;/code&gt;. When the server has a &lt;code&gt;Last-Modified&lt;/code&gt; value for a resource, a client can send an &lt;code&gt;If-Modified-Since&lt;/code&gt; header to avoid downloading the data if it hasn't changed since the last time the client downloaded it.&lt;/p&gt;

&lt;p&gt;The other common way of validating caches is using &lt;code&gt;ETag&lt;/code&gt;. &lt;code&gt;ETags&lt;/code&gt; are server-generated identifiers for representations that change when the representation does. This lets the client track which "version" of the representation it has, and avoid re-downloading a representation for which the &lt;code&gt;ETag&lt;/code&gt; is the same as the one the client already has.&lt;/p&gt;

&lt;p&gt;Together, freshness and validation are a powerful way to control client and gateway caches.  To get a deeper understanding of HTTP caching, I highly recommend &lt;a href="https://www.mnot.net/cache_docs"&gt;this article&lt;/a&gt;, by the great Mark Nottingham.&lt;/p&gt;

&lt;h3&gt;
  
  
  GraphQL &amp;amp; HTTP Caching
&lt;/h3&gt;

&lt;p&gt;When we dig deeper into the issues with GraphQL &amp;amp; caching, we discover that some of these issues are purely related to &lt;strong&gt;HTTP caching&lt;/strong&gt;. It is an important distinction to make, since &lt;strong&gt;server-side caching&lt;/strong&gt; could just as well mean an HTTP gateway cache, or application-side caching on the server.&lt;/p&gt;

&lt;p&gt;One of the first things that influences how HTTP caching works with GraphQL is the HTTP verb used to send GraphQL queries. There is a lot of misinformation out there that has led some people to believe that using &lt;code&gt;POST&lt;/code&gt; on a GraphQL endpoint is the only way to make it work. The reasoning goes: HTTP caches will not cache &lt;code&gt;POST&lt;/code&gt; requests, which means GraphQL is simply not cacheable at the HTTP level. However, &lt;code&gt;GET&lt;/code&gt; is indeed a valid way to query a GraphQL server over HTTP. This means that caches could indeed cache GraphQL responses.&lt;/p&gt;
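&lt;p&gt;Concretely, sending a query over &lt;code&gt;GET&lt;/code&gt; just means URL-encoding it into a query parameter, so that intermediary caches can key on the full URL (the endpoint below is made up):&lt;/p&gt;

```python
from urllib.parse import urlencode

query = "{ user(id: 1) { name } }"
# The whole document travels in the URL, making the request cacheable
# by any HTTP cache that keys on method plus URL.
url = "https://api.example.com/graphql?" + urlencode({"query": query})
```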

&lt;p&gt;The only issue with &lt;code&gt;GET&lt;/code&gt; is the size of the query string. For example, almost every browser has a different limit for it. If this becomes an issue, &lt;a href="https://blog.apollographql.com/persisted-graphql-queries-with-apollo-client-119fd7e6bba5"&gt;persisted queries&lt;/a&gt; become very useful. We'll cover those later on, but they let you store query strings on the server instead of the client, meaning a client could execute queries like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /graphql/my_query
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There's one last blocker. Since most GraphQL implementations don't use much of HTTP's semantics, most GraphQL servers will currently let you use &lt;code&gt;GET&lt;/code&gt; along with a &lt;code&gt;mutation&lt;/code&gt; operation. &lt;strong&gt;This will not play well with caches&lt;/strong&gt;. One way to address this issue is to design your server to reject mutations sent using &lt;code&gt;GET&lt;/code&gt;, and require mutation operations to be run over &lt;code&gt;POST&lt;/code&gt; only.&lt;/p&gt;
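&lt;p&gt;A naive sketch of that guard (a real server would inspect the parsed document's operation type rather than the raw string, but the idea is the same):&lt;/p&gt;

```python
def check_operation(method, query):
    # Naive: detect mutations by the leading keyword. A production server
    # should parse the query and look at the operation definition instead.
    is_mutation = query.lstrip().startswith("mutation")
    if method == "GET" and is_mutation:
        return (405, "Mutations must be sent over POST")
    return (200, None)
```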

&lt;p&gt;At this point, we've got all the basic elements to have HTTP caching and GraphQL working together. In fact, as we talked about earlier in the book, if we see GraphQL queries as a way to dynamically create a server-side, client-specific representation, each query is in fact something that could be cached. Can we apply HTTP concepts to GraphQL queries? Let's start with freshness. With freshness, what we would want is for the server to be able to tell a client how long a query can be considered fresh, and when to request the data again. The unfortunate thing here is that HTTP semantics operate on whole responses/representations, and don't care about or understand GraphQL queries, meaning we don't have a way to do per-field freshness, for example. Still, nothing stops us from giving freshness to a whole query: we could say that a GraphQL query's &lt;code&gt;max-age&lt;/code&gt; is equal to that of the field in the query with the lowest &lt;code&gt;max-age&lt;/code&gt;.&lt;/p&gt;
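&lt;p&gt;That "lowest max-age wins" rule is trivial to sketch, assuming we annotated fields with freshness hints (the field names and values here are invented):&lt;/p&gt;

```python
# Hypothetical per-field freshness annotations, in seconds.
FIELD_MAX_AGE = {"User.name": 3600, "User.status": 60}

def query_max_age(fields_used, default=0):
    # The whole response is only as fresh as its most volatile field.
    return min(FIELD_MAX_AGE.get(f, default) for f in fields_used)

# A query touching both fields gets: Cache-Control: max-age=60
query_max_age(["User.name", "User.status"])
```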

&lt;p&gt;Validation is similar. While we can't use HTTP to revalidate only parts of the query, we could set &lt;code&gt;Last-Modified&lt;/code&gt; to the most recent &lt;code&gt;Last-Modified&lt;/code&gt; value among the fields, and we could also generate an &lt;code&gt;ETag&lt;/code&gt; based on a combination of all data loaded within the query.&lt;/p&gt;
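&lt;p&gt;Generating such a whole-query validator is easy to sketch: hash all the resolved data, so that any field changing changes the &lt;code&gt;ETag&lt;/code&gt; (the hashing scheme below is an assumption, not a standard):&lt;/p&gt;

```python
import hashlib
import json

def etag_for(result):
    # Canonicalize the resolved data, then hash it: same data, same tag.
    payload = json.dumps(result, sort_keys=True).encode()
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'
```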

&lt;p&gt;While these are &lt;strong&gt;possible&lt;/strong&gt;, they're not ideal. Since GraphQL queries can span multiple entities that change independently, yet need to be cached as a single representation, the number of invalidations would be quite high. A single field being invalidated would invalidate the entire query, even if the rest of it was still fresh.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customizability vs Optimizability, Again
&lt;/h3&gt;

&lt;p&gt;Remember the continuum of customizability we covered earlier in this book? Well, it turns out this also affects how "cacheable" GraphQL really is. The invalidation issue we discussed above is not very specific to GraphQL. In fact, it is specific to highly customizable APIs.&lt;/p&gt;

&lt;p&gt;Take for example a typical HTTP endpoint for a web API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /user/1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This particular endpoint accepts no particular query parameters and simply returns the user associated with this URI. Especially as a public API, this endpoint is highly cacheable across all API clients. Now imagine a more customizable version of this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /user/1?partial=complete

GET /user/1?partial=compact
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This API uses a &lt;code&gt;partial&lt;/code&gt; query parameter to change the level of detail of the response. An even more customizable API, just as we saw in the introduction, could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /user/1?fields=name,friends
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The more versions of an HTTP endpoint we have, the more we "dilute" the cache. That means someone requesting &lt;code&gt;fields=name&lt;/code&gt; only can't reuse the cached response of someone who requested &lt;code&gt;fields=name,friends&lt;/code&gt;. We've got the same issue with GraphQL: remove a field, change anything in a query in fact, and we lose the benefit of all cached queries for a superset or subset of the data.&lt;/p&gt;

&lt;p&gt;As you can see, however, this is not something specific to GraphQL at all, and it can be found in any API over HTTP that opts for more customizability. Hopefully, that tradeoff was deliberate and the cache invalidation issues were worth it in the long run. Instead of "GraphQL is not cacheable", how about "highly customizable APIs benefit less from HTTP caching"?&lt;/p&gt;

&lt;h3&gt;
  
  
  How Important is HTTP Caching to you?
&lt;/h3&gt;

&lt;p&gt;There's no doubt HTTP caching is a wonderful mechanism for data that doesn't change often and can be shared across multiple users, especially when talking about gateway caches. For authenticated web APIs, the eternal debate is about how useful HTTP caching really is. It is a debate I won't solve here, but one that is still interesting to discuss.&lt;/p&gt;

&lt;p&gt;An interesting fact is that shared caches actually &lt;strong&gt;should not&lt;/strong&gt; cache any request with an &lt;code&gt;Authorization&lt;/code&gt; header. If your API is authenticated, the "GraphQL breaks shared caches" argument simply does not apply.&lt;/p&gt;

&lt;p&gt;Private caches, such as browser caches and client-side caches, could still gain a lot from HTTP caching. As we saw, it is &lt;strong&gt;not out of the question&lt;/strong&gt; with GraphQL; it is simply not as powerful as with highly optimized, one-size-fits-all APIs, because of how often a query can be invalidated and how little can be shared.&lt;/p&gt;

&lt;p&gt;Another thing to keep in mind is that a lot of web APIs actually can't serve stale data for very long, and freshness headers then become less useful.&lt;/p&gt;

&lt;p&gt;Validators such as &lt;code&gt;ETag&lt;/code&gt; and &lt;code&gt;Last-Modified&lt;/code&gt; usually require the server to retrieve all the necessary data and run business logic before they can be computed. That is usually the major part of the work, the savings being mainly on serialization and on bandwidth, since no data needs to be transmitted. If bandwidth or serialization is an issue, again, nothing stops you from implementing &lt;code&gt;ETag&lt;/code&gt; or &lt;code&gt;Last-Modified&lt;/code&gt; generation for a GraphQL query.&lt;/p&gt;

&lt;p&gt;GraphQL definitely made tradeoffs that make it much more suited to authenticated APIs and real-time data that changes often than to serving long-lived data as a public API. If your use case is the latter, and it is the only thing your API does, considering an API architecture that uses HTTP in a more meaningful way could be a better choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ways Forward
&lt;/h3&gt;

&lt;p&gt;HTTP caching could benefit GraphQL in good ways. The lack of a &lt;strong&gt;GraphQL over HTTP&lt;/strong&gt; specification is something that makes things a bit harder. Mutations over &lt;code&gt;GET&lt;/code&gt; are an example of something that could be solved by such a specification. However, there are many other ways to cache GraphQL, be it at the client level, the whole-response level, the individual resolver level, etc. In this chapter, we will mainly cover GraphQL-specific approaches, since these are the most used tools at the moment and can be more powerful in the long run, as they understand GraphQL semantics.&lt;/p&gt;

</description>
      <category>graphql</category>
      <category>api</category>
      <category>caching</category>
    </item>
  </channel>
</rss>
