Making Changes in a Distributed System


You have a change to make in a distributed system. How many pull requests will it take?

An Easily-distributed App

A distributed system has two or more independent deployables. In this most common of scenarios, we have a frontend (FE) client and a backend (BE) server. It's just so easy to start. We push this architecture all the time.

The two apps are not independent, however. They are inter-dependent at runtime. The FE and BE work in coordination to do something. Let's say they want to set a date on an object. There's a contract between the two so that they can talk to each other.

Today, if you want to set the date, you POST /api/object from the FE and send a request body of: { "updatedAt": "2024-04-04T04:04:04Z" }. The BE requires this request body, reads it and updates the value in a database or something. Ok, that's the current state of things.
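As a rough sketch of that starting contract, here is what the two sides might look like. The names handle_post_object, build_request_body and OBJECT_STORE are made up for illustration, and the in-memory dict stands in for the "database or something".

```python
from datetime import datetime

# Hypothetical stand-in for the BE's storage.
OBJECT_STORE = {}


def handle_post_object(body):
    """Sketch of the BE side of POST /api/object under the current contract."""
    raw = body.get("updatedAt")
    if raw is None:
        # The BE requires this field, so a missing value is a client error.
        return 400, {"error": "updatedAt is required"}
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    OBJECT_STORE["updatedAt"] = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return 200, {"ok": True}


def build_request_body(when):
    """Sketch of the FE side: build the body the BE expects."""
    return {"updatedAt": when.strftime("%Y-%m-%dT%H:%M:%SZ")}
```

The field name "updatedAt" appears on both sides. That shared string is the contract.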

Steps to Change a Distributed System

Let's say we want to make a change. How does that work in a distributed system?

Let's say our change is that the name of the field changes. Now we want it called statusDate instead of updatedAt. Simple enough. The discrete deployments required to make this happen will be:

  1. BE - add support for optional statusDate alongside updatedAt. Deprecate updatedAt. Backwards compatible. We don't want to break the client.
  2. FE - adjust to send both updatedAt and statusDate. Forwards compatible.
  3. BE - remove support for updatedAt. Make statusDate required. This is cleanup.
  4. FE - remove sending updatedAt. This is cleanup.
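The four steps above can be sketched as successive versions of each side. All function names here are hypothetical, and letting statusDate win when both fields are present is an assumption. The steps only say that both are supported.

```python
def be_step1(body):
    """Step 1: accept either name. statusDate wins if both are sent (assumed)."""
    value = body.get("statusDate", body.get("updatedAt"))
    return (200, value) if value is not None else (400, None)


def be_step3(body):
    """Step 3: cleanup. statusDate is required; updatedAt is ignored."""
    value = body.get("statusDate")
    return (200, value) if value is not None else (400, None)


def fe_original(ts):
    """The FE before any change: sends only the old name."""
    return {"updatedAt": ts}


def fe_step2(ts):
    """Step 2: send both names so old and new BE versions are satisfied."""
    return {"updatedAt": ts, "statusDate": ts}


def fe_step4(ts):
    """Step 4: cleanup. Send only the new name."""
    return {"statusDate": ts}
```

In this sketch, be_step3(fe_original(ts)) returns a 400. That is the breakage the ordering exists to prevent: the BE cannot drop updatedAt until every running FE sends statusDate.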

Each step is discrete because the two apps are independently deployed. Both are running simultaneously, and a new version of either the FE or BE must be compatible with the n+1 and n-1 versions of the other at all times. Thus, the dance: 4 pull requests, 4 reviews, 4 builds, 4 deploy events.
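One way to reason about that compatibility rule is to write each version's contract down as data and check every pairing that is live during the rollout. This is a toy model with made-up version labels (fe0, be1, and so on). It assumes the BE ignores fields it does not know about and is satisfied when at least one field it understands is present.

```python
# Fields each FE version sends.
FE_SENDS = {
    "fe0": {"updatedAt"},                 # before any change
    "fe2": {"updatedAt", "statusDate"},   # after step 2
    "fe4": {"statusDate"},                # after step 4
}

# Fields each BE version can read the date from (it needs at least one).
BE_NEEDS_ONE_OF = {
    "be0": {"updatedAt"},                 # before any change
    "be1": {"updatedAt", "statusDate"},   # after step 1
    "be3": {"statusDate"},                # after step 3
}


def compatible(fe, be):
    """True if the FE version sends at least one field the BE version reads."""
    return bool(FE_SENDS[fe] & BE_NEEDS_ONE_OF[be])


def rollout_is_safe(pairs):
    """True if every (FE, BE) pair that is live at some point is compatible."""
    return all(compatible(fe, be) for fe, be in pairs)


# The pair running in production after each deploy, in order.
FOUR_STEP_ROLLOUT = [
    ("fe0", "be0"), ("fe0", "be1"), ("fe2", "be1"), ("fe2", "be3"), ("fe4", "be3"),
]

# Skipping step 2 leaves the original FE talking to the cleaned-up BE.
SKIPPED_STEP_2 = [("fe0", "be0"), ("fe0", "be1"), ("fe0", "be3")]
```

Here rollout_is_safe(FOUR_STEP_ROLLOUT) holds, while rollout_is_safe(SKIPPED_STEP_2) does not, because fe0 and be3 share no field.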

Steps to Change a Single Deployable

If the client and server are a part of a single deployed runtime, through a single codebase or linked libraries, there are fewer discrete steps to make a change, even as simple as a field name change. This is because the "client" and "server" portions can be deployed at the exact same time.

In the same scenario for changing updatedAt to statusDate, we could have 1 pull request, 1 review, 1 build and 1 deploy event. For 1 field name change, that feels proportionate.
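A minimal sketch of the single-deployable case, assuming the "client" and "server" portions are plain functions in one codebase. TrackedObject, set_status_date and on_save_clicked are illustrative names, not from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TrackedObject:
    # Renamed from updated_at in the same change as everything below.
    status_date: Optional[datetime] = None


def set_status_date(obj, status_date):
    """The "server" portion: stores the date on the object."""
    obj.status_date = status_date
    return obj


def on_save_clicked(obj, now):
    """The "client" portion: calls the server portion directly, in-process."""
    return set_status_date(obj, status_date=now)
```

The caller and the callee are renamed in the same commit and ship in the same build, so no deployed version ever has one side using the old name and the other using the new one.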

The Distributed Cost

The distributed nature of the system can buy you some things and cost you others. The costs shown here include:

  1. More paper-pushing management of codebases.
  2. More thinking about contracts and maintaining compatibility.
  3. Breakage possibilities increase if contract thinking is mistaken or if steps are skipped for speed or convenience.