September 28, 2008

Coupling & Cohesion, Part II

The concepts of coupling and cohesion apply at all levels of programming from writing a single method all the way up to planning the architecture of Amazon.com. As you build each piece (an individual line of code, a method, an object, or an entire remote service), you have to make sure it has strong cohesion and loose coupling. Does the component do exactly one thing which is easy to describe and conceptualize? Does it have relatively few, easy-to-understand connections to the other components around it?

Consider what it means to have strong cohesion for a single line of code. To have good cohesion, it would need to produce a single clear outcome. On the other hand, a line of code with poor cohesion will tend to have multiple side effects, or calculate many values at once:


int balance = priorBalances[balanceIndex++] -
    withdrawals[withdrawalIndex++];

float gravitation = UNIVERSAL_G * (bodyA.mass * KG_PER_LB) *
    (bodyB.mass * KG_PER_LB) /
    ((bodyA.position.x - bodyB.position.x) *
     (bodyA.position.x - bodyB.position.x) +
     (bodyA.position.y - bodyB.position.y) *
     (bodyA.position.y - bodyB.position.y));


In both of these cases, the code is doing multiple things at once, and in order to understand what is going on, you have to mentally pull it apart, understand each piece, and then integrate them back together. Both of these lines can easily be re-written as several lines which each demonstrate much better cohesion:


// Read both values first, then advance the indexes,
// which matches what the post-increments did in the original line.
int balance = priorBalances[balanceIndex] - withdrawals[withdrawalIndex];
balanceIndex++;
withdrawalIndex++;

// Convert each mass once, then build up the squared distance piece by piece.
float massA = bodyA.mass * KG_PER_LB;
float massB = bodyB.mass * KG_PER_LB;
float xRadiusPart = bodyA.position.x - bodyB.position.x;
float yRadiusPart = bodyA.position.y - bodyB.position.y;
float radiusSquared = xRadiusPart * xRadiusPart + yRadiusPart * yRadiusPart;
float gravitation = UNIVERSAL_G * massA * massB / radiusSquared;


Each of the re-written examples has statements which are simpler, easier to understand, and clearly accomplish a single result.
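
The same idea carries up one level, to methods. Here is a minimal sketch in Java, assuming a hypothetical Body class that stores its mass in pounds and its position in meters. The names Body, Gravity, massKg, and distanceSquared are made up for illustration, and the constants are the usual approximate values. Each method produces exactly one clearly named result:

```java
// Hypothetical types for illustration; only the formula comes from the example above.
final class Body {
    final double massLb; // mass in pounds (assumed, since the example converts with KG_PER_LB)
    final double x;      // position in meters (assumed)
    final double y;

    Body(double massLb, double x, double y) {
        this.massLb = massLb;
        this.x = x;
        this.y = y;
    }
}

final class Gravity {
    static final double UNIVERSAL_G = 6.674e-11; // N*m^2/kg^2, approximate
    static final double KG_PER_LB = 0.45359237;

    // One job: convert a body's mass to kilograms.
    static double massKg(Body body) {
        return body.massLb * KG_PER_LB;
    }

    // One job: squared distance between two bodies.
    static double distanceSquared(Body a, Body b) {
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    // One job: Newton's law of gravitation, built from the pieces above.
    static double gravitation(Body a, Body b) {
        return UNIVERSAL_G * massKg(a) * massKg(b) / distanceSquared(a, b);
    }
}
```

A caller just writes Gravity.gravitation(bodyA, bodyB) and never has to think about unit conversion or distance arithmetic.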

At the other end of the size spectrum, consider what strong cohesion means for a single service in a massively distributed system (e.g. Amazon.com). For a long while, there was a single, central piece of software, called Obidos, which was responsible for everything from presenting HTML to calculating the cost of an order, to contacting the UPS server to find out where a package was. This ultimately resulted in a single program which constantly broke down, was impossible to understand fully, and nearly impossible to actually compile. The crux of the problem is that Obidos tried to do too much, and wound up with terrible cohesion. There was no way anyone could get their head around the essential functions it performed without dropping all kinds of important information.

That was many years ago, and since then, Amazon has considerably improved its situation. As an example, there is now a single service whose sole purpose is to compose the totals and subtotals for an order. It communicates with other services which each compute individual charges (e.g. shipping charges, tax, etc.), and all it does is put them together in the right order. This new service is much easier to understand, far easier to describe, and much, much easier to work with on a daily basis.
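
To give a feel for the shape of such a service, here is a rough sketch in Java. It is only an illustration, not Amazon's actual code: ChargeCalculator, Order, and OrderTotalComposer are hypothetical names, and a plain interface stands in for the remote services that compute each charge.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical order: just the subtotal of the items, for illustration.
final class Order {
    final BigDecimal itemSubtotal;

    Order(BigDecimal itemSubtotal) {
        this.itemSubtotal = itemSubtotal;
    }
}

// Stand-in for a remote service that computes one kind of charge
// (shipping, tax, and so on).
interface ChargeCalculator {
    BigDecimal chargeFor(Order order);
}

// The composer does exactly one thing: add up the charges, in order.
// It knows nothing about how any individual charge is computed.
final class OrderTotalComposer {
    private final List<ChargeCalculator> calculators;

    OrderTotalComposer(List<ChargeCalculator> calculators) {
        this.calculators = calculators;
    }

    BigDecimal total(Order order) {
        BigDecimal total = order.itemSubtotal;
        for (ChargeCalculator calculator : calculators) {
            total = total.add(calculator.chargeFor(order));
        }
        return total;
    }
}
```

The whole class can be described in one sentence ("it adds up the charges for an order"), which is a decent sign that its cohesion is strong.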

Next time, I'll talk about how coupling applies across all levels, too.

September 21, 2008

Coupling & Cohesion

In my previous post, I discussed how the mind is naturally limited in the number of things it can consider at once, and how we create abstractions to increase the range of our thinking. By using abstractions, we can hide the details of how something works, thereby allowing ourselves to handle more information and still only have to keep in mind a small number of discrete items. This concept is called Unit Economy.

A consequence of this limitation is that we naturally design complex systems by breaking them down into simpler pieces. If any one piece is still too complex to build, then we break that piece down even further. The act of breaking a system into pieces serves the same function in engineering that creating abstractions does in thinking. Both allow us to ignore the details of how a part of the system works, and just keep in mind the overall notion of what it does.

In order for this decomposition to work, however, we must follow two principles: Coupling and Cohesion.

Coupling is the extent to which two components are interconnected. This connection can be defined in terms of actual connections in the final design. But, for our purposes, consider it in terms of how much one has to know about one component in order to understand the function of the other. The crucial point is that Coupling describes the mental load required to understand the relationship between the two components.

To take some examples in the physical world, consider a toaster and a gas stove. A toaster is loosely coupled to the rest of the kitchen. It has a single plug, which is an industry standard, and which is shared by nearly every other electrical appliance in the kitchen. On the other hand, a gas stove is tightly coupled to the rest of the kitchen. It requires a gas main, a vent to be installed above it, an exhaust pipe, and it must be mounted flush with the rest of the cabinetry. When installing a toaster, you simply have to find a flat surface near a plug. When installing a gas stove, you need to understand quite a bit about the structure of the whole kitchen. The mental effort required to understand how a toaster is connected to the rest of the kitchen is far less than that required for the stove.

Cohesion is the extent to which all the parts of a component serve a unified purpose. For the purposes of computing mental load, we measure this by how easily we can come up with a single sentence which describes the essence of what the component does, and by whether each part of the component is needed to accomplish that task. The crucial point in terms of Unit Economy is that we are able to come up with a simple abstraction for the component which allows us to ignore the details of how the component works.

For some examples of strong and weak cohesion, consider a television set and a Swiss Army knife. In the television set, the description of what it does is pretty simple: "A television set converts a TV signal into a visible picture". On the other hand, describing a Swiss Army knife isn't nearly so simple. Attempting to come up with a similar statement gets pretty awkward: "A Swiss Army knife is a multi-function device which provides the ability to conveniently store and reveal tools to: cut things in a variety of ways, drive screws of various kinds, etc." When considering building some kind of system with these things, it's much easier to keep in mind a simple definition (like the TV set) than a rambling, complex one (like the Swiss Army knife).

So, what does this have to do with programming? More on that next time.

September 13, 2008

Crow Epistemology

Our brain is an amazing organ, capable of truly astounding feats of abstraction and generalization, particularly when compared to a computer. On the other hand, it measures up pretty poorly when it comes down to managing a lot of information at once. Ayn Rand, a 20th-century philosopher, described this phenomenon as Crow Epistemology with the following story (paraphrased):

Imagine there are a bunch of crows sitting in the tree-tops at the edge of a forest. A pair of hunters pass them on their way into the forest, and all the crows get really quiet. One hunter comes out alone, but the crows stay quiet because they know there's still another one in the forest. A little while later, the second hunter comes out, and as soon as he's out of sight, the crows relax and start cawing again.

Now, imagine that a group of 20 hunters goes into the forest. The crows get all quiet, just like before. After a while, 15 hunters come out again. As soon as they're out of sight, the crows start up with their cawing again. The crows could keep track of two hunters, but 20 was just too many for them to track by sight.

Of course, humans have the same problem. To address this, we create abstractions (like numbers) which allow us to group things together and keep track of them as a single unit. In our example, a boy sitting at the edge of the forest could simply count the hunters, and just remember one thing (the number 20). That way, he could easily know whether all the hunters had left the woods or not.

It turns out, programming is a lot harder than counting, and to do it, we need to keep all kinds of information stuffed in our heads. Naturally, the rules are no different, so we, as programmers, use lots of abstractions to keep everything straight. No one could possibly think about all the electrical signals racing around in a computer while they were designing a game. Even when designing a simple game, we need to break the program up into pieces, and complete each piece one at a time, and then stitch the parts together.

This process of using higher and higher abstractions to manage complexity is known as Unit Economy.  By grouping complex things together into a single unit, we can forget about how it works, and just remember what it does.  You don't have to remember how a transistor works to understand what it does, just as you don't need to remember how to implement a hash table to understand what it does.
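
As a small illustration of the hash table point, here is a sketch in Java. The Tally class and its count method are hypothetical names of my own choosing. The code is written purely against what a map does (associate a key with a value), using the standard java.util.Map interface, and never looks at how any particular map is implemented:

```java
import java.util.Map;

// Hypothetical helper for illustration.
final class Tally {
    // Counts how often each word appears, storing the counts in the given map.
    // Only the Map abstraction is used; buckets, hashing, and trees are
    // details this method never has to think about.
    static Map<String, Integer> count(String[] words, Map<String, Integer> counts) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
        return counts;
    }
}
```

You can hand count a java.util.HashMap or a java.util.TreeMap and get the same tallies either way, which is exactly the point: the "how" is hidden behind the "what".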

The concept of Unit Economy is behind everything we do in our daily work as programmers.  Not too surprisingly, abusing a reader's Unit Economy is the foremost way to make your code unreadable.  More to come on this topic next time.

Introduction

I got interested in writing a blog through observing the daily complaint by programmers everywhere that there's a lot of pretty bad code out there. Now, there are probably a huge number of reasons for that, but one I personally keep running across is that the author of the code doesn't seem to know how to write code to be read by another human being.

The focus of this blog will be on what we can do in writing our code to help future readers of our code (including ourselves) understand what it is supposed to do. Like any discussion of coding style, a good deal of it will be based upon my own opinion of what makes something clearer. Wherever possible, though, I'd like to justify my own opinions by referencing how the mind works, and providing objective reasons for doing things one way over another.

Hopefully, I'll be able to share some of the things I've picked up in my own work, and learn something in the process.  Whether you agree or disagree, I'd love to hear your take on any of the topics I address!