Documentation Isn't Enough

Documentation isn't enough to learn a new codebase.

I just read a very thought-provoking article by Baldur Bjarnason, called Theory-building and why employee churn is lethal to software companies. It is thought-provoking because I sense that its essential point is true, and it's uncomfortable if it is.

The main point seems to be that programmers are tied to the code they write -- and not easily trained in existing code. But I want to be able to write code that is easy to inherit, and I want to grok what has already been written in code I inherit. Anyway, that's another post. In this post, I want to focus on a sub-point about the effectiveness of docs as a learning tool for the programmers who inherit code:

Documentation only works up to a point because it can both get out of sync with what the code is doing and because documenting the internals of a complex piece of software is a rare skill. A skill that most developers don’t possess. Most internal documentation only begins to make sense to a developer after they’ve developed an internal mental model of how it all hangs together. Most code documentation becomes useful after you have built the theory in your mind, not before. It operates as a mnemonic for what you already know, not as a tool for learning.

Yesterday I removed code from a codebase. It was a feature that had been deprecated. The removals were spread over 80+ files. As a part of the removal, I deleted a function that seemed to only be used in the code path of the feature I was removing.

Thankfully, I was informed otherwise in the merge request (feedback paraphrased):

...Is this function really only required for the thing you're removing? ...My only thought is that this might later blow up in a weird way, because it doesn't look like it checks for the important data before firing off an async request.

This was from someone with deeper knowledge about the subsystem I was adjusting.

And another programmer was tapped to make sure that we understood the need. He chimed in with his knowledge:

In this specific case, the goal is to prevent the user from advancing to a later screen that would not be able to succeed if that important data hasn't been written back to the service. At the time this function was added, it was this feature that needed the data, and all other code paths could gracefully degrade if none were present. But now that we have these other features, there's a new dependency on having the data there, and this will never work if the data is null. So, I would recommend keeping this function in place. You could maybe move it further back in the code path, but if we don't have it in place at all, then it's possible that a user could get ahead of an asynchronous background event and eventually get a 500 from this service when the data is null.

I went back and looked at the doc on the function I had removed (paraphrased):

Decides whether the data necessary to proceed with the process are available.

If not, this function returns False to force the client to wait and retry.

This function exists to ensure that the asynchronous process has completed successfully before proceeding forward with the process.
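In other words, the doc describes a readiness guard. Here's a minimal sketch of what such a function might look like, assuming a Python codebase (the doc's "returns False" hints at that); the function name and the field it checks are invented for illustration, since the post doesn't show the real code:

```python
# Hypothetical sketch of a readiness guard; all names here are
# invented, not taken from the codebase in the post.

def is_ready_to_proceed(record: dict) -> bool:
    """Return False until the asynchronous process has written the
    important data back, forcing the client to wait and retry."""
    # The background job is expected to populate "important_data"
    # once it completes successfully.
    return record.get("important_data") is not None
```

A caller on the critical path would check this guard and ask the client to wait and retry whenever it returns False, rather than firing off the request that 500s when the data is null.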

And only then did I appreciate the documentation. One could argue it shouldn't have taken me so long. :) I reinstated the code and avoided introducing a bug (or at least this one ;).

The day before I read the article, I experienced exactly this: I read the doc but I didn't understand it -- its significance, really -- until the mental model grew in me through a conversation grounded in the concrete change being made at that moment. That mental model was passed on by people who were there when the code was created. They knew why it was there. They detected a problem and were able to share what they sensed: a bit of history, applied to the current context.

That's not to say there is nothing to learn from code docs. This same codebase has some helpful counterexamples. But I think their usefulness in providing sufficient context for a fresh programmer is overestimated. In that sense, I agree with the article. Now, to go ponder the other implications therein.