Making Changes in a Distributed System
You have a change to make in a distributed system. How many pull requests will it take?
An Easily Distributed App
A distributed system has two or more independent deployables. In the most common scenario, we have a frontend (FE) client and a backend (BE) server. It's so easy to start this way, and we push this architecture all the time.
The two apps are not independent, however. They are interdependent at runtime: the FE and BE coordinate to get something done. Let's say they want to set a date on an object. The two share a contract that defines how they talk to each other.
Today, to set the date, the FE sends POST /api/object with a request body of { "updatedAt": "2024-04-04T04:04:04Z" }. The BE requires this request body, reads it, and updates the value in a database or something. OK, that's the current state of things.
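A minimal TypeScript sketch of that contract, assuming a JSON body and a hand-rolled validator. The names UpdateObjectRequest, buildRequestBody and parseRequestBody are hypothetical; the point is only that both sides agree on one required field, updatedAt. Note that Date.toISOString() includes milliseconds ("2024-04-04T04:04:04.000Z"), which is still a valid ISO 8601 timestamp.

```typescript
// Hypothetical shape of the current contract for POST /api/object.
interface UpdateObjectRequest {
  updatedAt: string; // ISO 8601 timestamp, e.g. "2024-04-04T04:04:04Z"
}

// FE side: build the request body the BE currently expects.
function buildRequestBody(date: Date): UpdateObjectRequest {
  return { updatedAt: date.toISOString() };
}

// BE side: updatedAt is required; a body without it is rejected.
function parseRequestBody(body: unknown): UpdateObjectRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const value = (body as Record<string, unknown>)["updatedAt"];
  if (typeof value !== "string" || isNaN(Date.parse(value))) {
    throw new Error("updatedAt is required and must be an ISO 8601 string");
  }
  return { updatedAt: value };
}

const requestBody = buildRequestBody(new Date("2024-04-04T04:04:04Z"));
console.log(JSON.stringify(requestBody)); // {"updatedAt":"2024-04-04T04:04:04.000Z"}
```

Any change to the field name touches both halves of this sketch, which is exactly what the rest of the post is about.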
Steps to Change a Distributed System
Let's say we want to make a change. How does that work in a distributed system?
Let's say our change is a field rename: we now want the field called statusDate instead of updatedAt. Simple enough. The discrete deployments required to make this happen are:
1. BE - add support for an optional statusDate alongside updatedAt. Deprecate updatedAt. Backwards compatible: we don't want to break the client.
2. FE - adjust to send both updatedAt and statusDate. Forwards compatible.
3. BE - remove support for updatedAt. Make statusDate required. To clean up.
4. FE - stop sending updatedAt. To clean up.
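Steps 1 and 2 might look roughly like this in TypeScript. TransitionalRequest, readDate and buildTransitionalBody are hypothetical names, and preferring statusDate when both fields arrive is an assumption; the post doesn't say which value should win if they ever differ.

```typescript
// Step 1 (BE): statusDate is new and optional; updatedAt is deprecated but still accepted.
interface TransitionalRequest {
  statusDate?: string;
  /** @deprecated Use statusDate instead. */
  updatedAt?: string;
}

// BE: picks the date to store. Preferring statusDate is an assumption.
function readDate(body: TransitionalRequest): string {
  const value = body.statusDate ?? body.updatedAt;
  if (value === undefined) {
    throw new Error("either statusDate or updatedAt is required");
  }
  return value;
}

// Step 2 (FE): send the same value under both names, so the request works
// against the step 1 BE now and the step 3 BE (statusDate required) later.
function buildTransitionalBody(date: Date): TransitionalRequest {
  const iso = date.toISOString();
  return { statusDate: iso, updatedAt: iso };
}

console.log(readDate(buildTransitionalBody(new Date("2024-04-04T04:04:04Z")))); // 2024-04-04T04:04:04.000Z
```

Steps 3 and 4 then delete the updatedAt branches from each side.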
Each step is discrete because the two apps are independently deployed. Both run simultaneously, so every new version of the FE or BE must stay compatible with the n-1 and n+1 versions of the other at all times. Thus the dance: 4 pull requests, 4 reviews, 4 builds, 4 deploy events.
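To see why the order matters, here is a small hypothetical model that checks every FE/BE pairing that can be live during each deploy of the four-step rename. It assumes each BE version only checks that its required field is present and ignores extra fields; real services may validate more strictly. The version labels v0 to v2 and findBreakage are made up for illustration.

```typescript
type Version = "v0" | "v1" | "v2";
type Payload = { updatedAt?: string; statusDate?: string };
type Frontend = (iso: string) => Payload;
type Backend = (body: Payload) => boolean; // true if the request is accepted

const FE: Record<Version, Frontend> = {
  v0: (iso) => ({ updatedAt: iso }),                  // original
  v1: (iso) => ({ updatedAt: iso, statusDate: iso }), // step 2: send both
  v2: (iso) => ({ statusDate: iso }),                 // step 4: cleanup
};

const BE: Record<Version, Backend> = {
  v0: (b) => b.updatedAt !== undefined,                               // original
  v1: (b) => b.statusDate !== undefined || b.updatedAt !== undefined, // step 1
  v2: (b) => b.statusDate !== undefined,                              // step 3: cleanup
};

type Deploy = { app: "FE" | "BE"; to: Version };

// Walks a deploy sequence. While an app is being deployed, its old and new
// versions run side by side, so both must work with the other app's current
// version. Returns a description of the first failure, or null if none.
function findBreakage(start: { fe: Version; be: Version }, steps: Deploy[]): string | null {
  let { fe, be } = start;
  for (const step of steps) {
    const pairs: [Version, Version][] =
      step.app === "FE" ? [[fe, be], [step.to, be]] : [[fe, be], [fe, step.to]];
    for (const [f, b] of pairs) {
      if (!BE[b](FE[f]("2024-04-04T04:04:04Z"))) {
        return `FE ${f} -> BE ${b} rejected while deploying ${step.app} ${step.to}`;
      }
    }
    if (step.app === "FE") fe = step.to;
    else be = step.to;
  }
  return null;
}

const fourSteps: Deploy[] = [
  { app: "BE", to: "v1" },
  { app: "FE", to: "v1" },
  { app: "BE", to: "v2" },
  { app: "FE", to: "v2" },
];
console.log(findBreakage({ fe: "v0", be: "v0" }, fourSteps)); // null

// Skipping straight to the cleaned-up BE breaks the FE that is still running.
console.log(findBreakage({ fe: "v0", be: "v0" }, [{ app: "BE", to: "v2" }]));
// prints: FE v0 -> BE v2 rejected while deploying BE v2
```

Under these assumptions, every intermediate state of the four-step sequence is compatible, while the shortcut fails on the very first request from the old FE.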
Steps to Change a Single Deployable
If the client and server are part of a single deployed runtime, whether through a single codebase or linked libraries, there are fewer discrete steps to make a change, even one as simple as a field rename. This is because the "client" and "server" portions are deployed at exactly the same time.
In the same scenario of changing updatedAt to statusDate, we could have 1 pull request, 1 review, 1 build, and 1 deploy event. For 1 field name change, that feels proportionate.
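A minimal sketch of the single-deployable case, assuming the client and server halves live in one TypeScript codebase and share one type. Renaming the field in StatusUpdate makes the compiler flag every stale use on both sides, so the whole rename fits in one change. StatusUpdate, handleUpdate, setStatusDate and the in-memory db object are hypothetical stand-ins, not a real API.

```typescript
// One codebase: the client and server halves share a single type definition.
interface StatusUpdate {
  statusDate: string; // renamed from updatedAt in the same change
}

// "Server" half: stores the date. A plain object stands in for the database.
const db: Record<string, string> = {};
function handleUpdate(id: string, req: StatusUpdate): void {
  db[id] = req.statusDate;
}

// "Client" half: calls the handler directly for simplicity. Even with a
// network hop between the halves, both would still ship in the same deploy.
function setStatusDate(id: string, date: Date): void {
  handleUpdate(id, { statusDate: date.toISOString() });
}

setStatusDate("object-1", new Date("2024-04-04T04:04:04Z"));
console.log(db["object-1"]); // 2024-04-04T04:04:04.000Z
```

There is no window where an old client talks to a new server, so no transitional optional field is needed.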
The Distributed Cost
The distributed nature of the system buys you some things and costs you others. The costs shown here include:
More paper-pushing management of codebases.
More thinking about contracts and maintaining compatibility.
More chances of breakage if the contract thinking is mistaken or if steps are skipped for speed or convenience.