Chesterton's Fence

Imagine yourself walking down a country lane, lush green grass around you, no farm animals anywhere, when suddenly you see a fence right in the middle of the path. You think, now, that’s a bit silly, that fence is blocking the path, somebody should have this fence removed. And by thinking that you’d fall right into the predicament known as Chesterton’s Fence. That is, you see something that you instinctively feel does not belong and you want to remove it. And perhaps that is exactly what needs to be done, but not before you ask a very important question, “why”? Why is the fence here? What function does it serve? Who put it there? What were they trying to achieve?

In any complex system, and most of the systems we work with these days are complex, problems often arise as a result of relationships and interactions between components. Our systems contain many components, some with special optimizations, some acting as local stabilizers, that might appear inefficient and unintuitive. Other components, or parts of the system, seem to serve no apparent purpose at all.

Any given component is usually self-contained and can be understood, reasoned about, modified and improved by one person or a small team. Complexity usually surfaces when these well-defined components start forming relationships and dependencies. A change in one place can cause a cascade of effects across the system. Sometimes these effects are immediate, sometimes they’re separated by several layers of indirection or appear as latent failures. For instance, removing a rate limiter in one component could adversely affect another that is not directly connected. Or optimizing the processing speed of another component might degrade the entire system during peak sales season, because the database was not scaled to accommodate the increased load.
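The rate limiter case can be sketched with a toy model. The function name, parameters and numbers below are all made up for illustration: an upstream component sends requests each tick, a downstream database can absorb a fixed number per tick, and an optional rate limiter (the fence) caps what gets through.

```python
def simulate_backlog(arrivals_per_tick, db_capacity_per_tick, ticks, rate_limit=None):
    """Return the downstream backlog after each tick (hypothetical toy model)."""
    backlog = 0
    history = []
    for _ in range(ticks):
        # The rate limiter, when present, caps how much work is admitted per tick.
        if rate_limit is None:
            admitted = arrivals_per_tick
        else:
            admitted = min(arrivals_per_tick, rate_limit)
        # Whatever the database cannot absorb in this tick piles up.
        backlog = max(0, backlog + admitted - db_capacity_per_tick)
        history.append(backlog)
    return history


# Assumed numbers: the upstream was "optimized" to emit 150 requests per tick,
# while the database still handles only 100.
with_fence = simulate_backlog(150, 100, ticks=5, rate_limit=100)
without_fence = simulate_backlog(150, 100, ticks=5)

print(with_fence)     # [0, 0, 0, 0, 0]
print(without_fence)  # [50, 100, 150, 200, 250]
```

In this sketch the limiter looks like pure overhead when viewed from the upstream component alone. Its purpose only becomes visible once the downstream capacity is part of the picture.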

And here’s where the fence metaphor reveals something insightful. If I don’t understand why “this” thing is here, what are the chances that removing or modifying it will break or adversely affect the system? I would say they’re pretty high.

So what should we do? The same thing we did when encountering the fence: ask questions first.

If I remove this thing here, how will that affect the system as a whole?

If I optimize this locally, will I cause a bottleneck somewhere else?

Does this odd-looking code here handle a rare edge case that only happens once a month? Will removing it plunge the system into a metastable failure?

It might so happen that this useless-looking fence was put here to prevent a runaway dynamic. Things will appear fine for a while, until a trigger fires and we get into a state that might require a cold restart of the entire system.
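One common shape of such a runaway dynamic is a retry storm. The sketch below is a deliberately simplified model, not a description of any particular system: the function, the retry factor and the retry budget (standing in for the fence) are all assumptions chosen for illustration. A one-tick spike pushes load over capacity, failed requests are retried, and the retries keep the system overloaded long after the spike is gone.

```python
def simulate_load(ticks, base_load, capacity, spike_tick, spike_size,
                  retries_per_failure, retry_budget=None):
    """Return the offered load per tick in a toy retry-storm model (hypothetical)."""
    loads = []
    pending_retries = 0
    for tick in range(ticks):
        load = base_load + pending_retries
        if tick == spike_tick:
            load += spike_size  # the one-off trigger
        loads.append(load)
        # Requests beyond capacity fail and come back as retries next tick.
        failed = max(0, load - capacity)
        pending_retries = failed * retries_per_failure
        if retry_budget is not None:
            # The "fence": an odd-looking cap on how many retries are allowed.
            pending_retries = min(pending_retries, retry_budget)
    return loads


# Assumed numbers: normal load 80 against capacity 100, a single spike of 100.
with_fence = simulate_load(6, 80, 100, spike_tick=2, spike_size=100,
                           retries_per_failure=2, retry_budget=10)
without_fence = simulate_load(6, 80, 100, spike_tick=2, spike_size=100,
                              retries_per_failure=2)

print(with_fence)     # [80, 80, 180, 90, 80, 80]
print(without_fence)  # [80, 80, 180, 240, 360, 600]
```

With the budget in place the system returns to normal two ticks after the spike. Without it, load keeps climbing even though the original trigger lasted a single tick, which is the kind of state that can only be cleared by shedding load or restarting.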

That fence is more than just an oddity; it serves as a useful metaphor for working with technology and software systems. Even though the Chesterton’s Fence heuristic predates it, it is very much related to the Systems Thinking approach, something that I consider essential whenever we deal with systems of any complexity.

There is another dimension to this mode of thinking and asking questions, and that is respecting the wisdom of the elders. The engineers who walked this road before us. The ones that built the original system and its components. Working systems, especially ones that have been around for a while, tend to be built by smart people, just as smart as us, under a set of constraints. Deadlines, budget, available technology, best practices, regulatory rules, organizational structure, all of these affect what gets built at the time.

This connects to Gall’s Law: every working complex system evolved from a simple system that worked. One doesn’t need to look far. Quite a few prominent services, such as YouTube, Facebook or Uber, started as relatively simple Python- or PHP-based applications running on a LAMP stack. Over time, as the scale requirements changed, friction points emerged, and engineers learned the pain points, the architecture evolved and relevant components were modified or removed. A lot of questions were asked that resulted in well-targeted actions.

Conversely, examples abound of complex systems designed from scratch that failed, such as IBM’s OS/360 (see The Mythical Man-Month), the Netscape browser’s total rewrite, or countless public sector IT projects. This is known as the Second-System Effect or what I also call a Reverse Chesterton’s Fence. That is, having the hubris to try to build something complex from the get-go, bypassing the iterative learning stages. The systems we build are complex, because the world in which they operate is complex. Emergent phenomena, changing landscape, new ways of doing things, all these introduce friction. And in overcoming that friction we build fences. If we pretend that we can account for all friction upfront, we risk building a complex system doomed to fail.

Senior engineers often fall prey to Chesterton’s Fence combined with the Cargo Cult architecture anti-pattern. Imagine you just joined a new company after having implemented a successful migration from a monolith to microservices in your previous role. Upon encountering a monolith in the new place you waste no time in proclaiming that the monolith application needs to be converted into microservices. Without understanding the reasons and history behind the current architecture and the actual problems with the system, we risk carrying all of those problems over to the redesigned system.

If we ignore the judgment of those who came before us we risk creating new problems. So the next time you see a fence in the middle of a path, stop and think “how curious, I wonder why it is there”. You might just learn something and avoid disturbing the balance in the system.

Most importantly, none of this means you shouldn’t change things or that there aren’t better ways. It just means that we should respect the past, our elders, and approach what we find with curiosity and humility. For that is what we certainly wish the ones that come after us will do when encountering our creations.

References

Chesterton's Fence — Wikipedia. "Chesterton's Fence." https://en.wikipedia.org/wiki/Chesterton's_fence
Gall's Law — Wikipedia. "John Gall (author)." https://en.wikipedia.org/wiki/John_Gall_(author)
Second-System Effect and IBM OS/360 — Brooks, Frederick P. Jr. (1975). The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley. https://en.wikipedia.org/wiki/Second-system_effect
Netscape Navigator Rewrite — Spolsky, Joel. "Things You Should Never Do, Part I." https://www.joelonsoftware.com/articles/fog0000000069.html
Cargo Cult Architecture — McConnell, Steve. "Cargo Cult Software Engineering." https://stevemcconnell.com/articles/cargo-cult-software-engineering/
YouTube Early Architecture — High Scalability. "YouTube Architecture." (2008) http://highscalability.com/youtube-architecture/
Facebook Early Architecture — InfoQ. "Facebook: Science and the Social Graph." Aditya Agarwal, Director of Engineering at Facebook. QCon SF 2008. https://www.infoq.com/presentations/Facebook-Software-Stack/
Uber Early Architecture — Uber Engineering. "Why Uber Engineering Switched from Postgres to MySQL." (2016) https://www.uber.com/en-HK/blog/postgres-to-mysql-migration