October 26, 2008

Fail Early, Fail Loudly

One easy step to help simplify your programs is to follow the adage "fail early and fail loudly". By failing early, I mean that your code should actively look for problems and stop as soon as something wrong is encountered. By failing loudly, I mean that your code should raise the alarm in a way that makes it obvious to other parts of the system (and people reading the code) that something unusual has just occurred.

Naturally, this doesn't mean that your code should explode whenever the slightest thing is wrong. On the contrary, it means that each object/package/component in your system should demonstrate good cohesion by refusing to operate in a situation where it doesn't have enough context to do so. Rather than pressing on, each component should fail when it encounters something it isn't supposed to be able to handle, and delegate responsibility for handling that situation to the caller. The caller can then decide whether it has sufficient context to handle the situation itself, or whether to pass the responsibility along to another component.
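Here is a minimal sketch of that division of responsibility. PortParser and ServerConfig are hypothetical names, and the 1 to 65535 range is an assumed rule. The parser refuses input it cannot interpret and throws. The caller knows a sensible default exists, so it has the context to recover.

```java
/** Hypothetical component: turns text into a TCP port number, or refuses. */
final class PortParser {
    static int parse(String text) {
        if (text == null || text.trim().isEmpty()) {
            throw new IllegalArgumentException("Port is missing");
        }
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Port is not a number: " + text, e);
        }
        // Assumed valid range for this illustration.
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return port;
    }
}

/** Hypothetical caller: it knows a default exists, so it can decide how to recover. */
final class ServerConfig {
    static final int DEFAULT_PORT = 8080;

    static int portFrom(String configuredValue) {
        try {
            return PortParser.parse(configuredValue);
        } catch (IllegalArgumentException e) {
            // The parser failed loudly; this caller chooses to fall back.
            System.err.println("Using default port: " + e.getMessage());
            return DEFAULT_PORT;
        }
    }
}
```

The parser never guesses a default, because it has no way of knowing whether one is appropriate. That decision lives with the caller.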

Failing early is a benefit because you can then build whole sections of your program which don't have to consider some particular error case. For example, consider building a web service which accepts objects encoded in XML as input. If all the necessary error checking is performed up front (e.g. the XML is validated against its DTD, all elements the API requires are confirmed to be present, and various values are confirmed to be within acceptable ranges), the remainder of the web service call can be written on the assumption that the request is perfectly valid. The Short Circuit statements I discussed recently are another variation on failing early. Both reduce the mental load of reading (and writing) the code which follows them.
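Here is a minimal Java sketch of the idea, leaving the XML parsing itself out. It assumes the request has already been parsed into raw fields. A hypothetical TransferRequest factory then checks every required element and range before any business logic runs. The field names and the $1,000,000 limit are illustrative assumptions.

```java
/** Hypothetical, already-parsed web service request. All checks happen in of(). */
final class TransferRequest {
    final String accountId;
    final long amountCents;

    private TransferRequest(String accountId, long amountCents) {
        this.accountId = accountId;
        this.amountCents = amountCents;
    }

    /** Fails early: rejects a missing element or out-of-range value immediately. */
    static TransferRequest of(String accountId, Long amountCents) {
        if (accountId == null || accountId.isEmpty()) {
            throw new IllegalArgumentException("accountId is required");
        }
        if (amountCents == null) {
            throw new IllegalArgumentException("amountCents is required");
        }
        // Assumed business limit: $1,000,000 expressed in cents.
        if (amountCents <= 0 || amountCents > 100_000_000L) {
            throw new IllegalArgumentException("amountCents out of range: " + amountCents);
        }
        return new TransferRequest(accountId, amountCents);
    }
}

final class TransferService {
    /** Code past the validation point may assume the request is well-formed. */
    static String describe(TransferRequest request) {
        // No null or range checks are needed here.
        return "Transfer " + request.amountCents + " cents from " + request.accountId;
    }
}
```

Because the constructor is private, the only way to obtain a TransferRequest is through the checks in of(). Everything downstream can therefore take validity for granted.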

Failing loudly is a benefit because it reduces the mental effort required to follow up on errors which occur in other parts of the system. If a component fails loudly, a caller must ensure that it responds correctly to error conditions in order to be correct itself. The best example of this is a checked exception. In Java, any method which can throw a checked exception must explicitly declare it in its signature. Calling methods are forced, by the compiler, either to catch the exception or to declare that they throw the same exception themselves. This radically reduces the mental effort required to follow up on the error, and provides a built-in mechanism to remind you when you've forgotten to do so.
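Here is a small sketch of a checked exception at work. The names InsufficientFundsException, Account, and Teller are hypothetical. The throws clause on withdraw is part of its signature, so every caller must either catch the exception or declare it in turn.

```java
/** Hypothetical checked exception: callers must handle it or declare it. */
class InsufficientFundsException extends Exception {
    InsufficientFundsException(String message) {
        super(message);
    }
}

final class Account {
    private long balanceCents;

    Account(long balanceCents) {
        this.balanceCents = balanceCents;
    }

    long balanceCents() {
        return balanceCents;
    }

    /** Fails loudly: the compiler makes every caller acknowledge this failure mode. */
    void withdraw(long amountCents) throws InsufficientFundsException {
        if (amountCents > balanceCents) {
            throw new InsufficientFundsException(
                "Balance " + balanceCents + " is less than " + amountCents);
        }
        balanceCents -= amountCents;
    }
}

final class Teller {
    /** Option 1: catch the exception because this caller knows how to respond. */
    static boolean tryWithdraw(Account account, long amountCents) {
        try {
            account.withdraw(amountCents);
            return true;
        } catch (InsufficientFundsException e) {
            return false;
        }
    }

    /** Option 2: declare it and pass the responsibility further up. */
    static void withdrawOrPropagate(Account account, long amountCents)
            throws InsufficientFundsException {
        account.withdraw(amountCents);
    }
}
```

Suppose you deleted both the try/catch in tryWithdraw and the throws clause in withdrawOrPropagate. The code would no longer compile, and javac would report an "unreported exception ... must be caught or declared to be thrown" error. That error is the built-in reminder described above.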

(The fact that some APIs abuse checked exceptions horrendously is a topic for another time.)

There are a number of cases where these two ideas apply, but they are especially important at the points where data enters a program's runtime environment. Some examples include: reading from a database, accepting user input from the mouse or keyboard, accepting an incoming TCP/HTTP connection, or simply reading data from a file. In all of these cases, doing all of your error checking as soon as the data has entered the runtime environment allows subsequent code to assume that the data is correct.

Failing early and failing loudly can also be seen as corollaries of Cohesion and Coupling. If a component is dedicated to a single purpose, it will not attempt to handle conditions which fall outside that purpose. Instead, it will fail as soon as it is asked to cope with such a situation. For a component to offer loose coupling, it must reduce the amount of knowledge needed to interact with it; one part of this is making it very clear what to expect when the component runs into problems. Both of these concepts lead back to preserving the Unit Economy of the reader (and author!) of the code.

Questions, feedback, suggestions? I'd love to hear it.

October 19, 2008

Class Names

Coming up with good names is one of the most challenging aspects of Object-Oriented programming. So much of the mental context required to understand a program can be greatly simplified by a well-chosen metaphor, or obfuscated completely by one which doesn't quite fit. A poorly named class can muddle an entire API by confusing which pieces of functionality belong to what class, or creating the wrong mental model in the mind of the reader. So, what are some good rules to follow when naming classes?

Think of a definition statement

In order to ensure good Cohesion, think of what purpose the class serves. If you are able to come up with a single, short, clear statement of what the class should be, it means the candidate class likely has good cohesion. If not, then you may need to merge the class into another, or split it into multiple classes. Once you have a definition statement, choose the shortest noun phrase which captures the essence of it; this is your class's name. The benefit of this approach is not only to come up with a good name; it also provides excellent documentation for the class. In addition, as you and others work with the class in the future, you can refer to the definition to decide whether some new functionality belongs in that class or in another.

As an example, consider the definitional statement for the java.io.LineNumberReader class in the Java Standard Library: "A buffered character-input stream that keeps track of line numbers." This definition draws a very clear boundary around the function of this class, and very neatly boils down to the name of the class itself.
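Here is a small usage sketch of java.io.LineNumberReader showing how closely the class's behavior matches its name. It wraps any Reader (a StringReader here, standing in for a file), buffers it, and counts lines as they are read. The helper LineFinder.firstLineContaining is a hypothetical illustration, not part of the standard library.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

final class LineFinder {
    /** Returns the 1-based line number of the first line containing needle, or -1. */
    static int firstLineContaining(String text, String needle) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    // readLine() has already advanced the counter past this line.
                    return reader.getLineNumber();
                }
            }
        }
        return -1;
    }
}
```

The three things the definition promises are all visible here: buffering (readLine), character input (it wraps a Reader), and line tracking (getLineNumber).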

Reuse vocabulary

Keeping in mind the principle of Unit Economy, reuse vocabulary to reduce the number of concepts in the system. When choosing class names, consider what other classes in the system are related, serve a similar function, or fit into the application's architecture in the same way. Subclasses of the same parent class often should repeat the parent class's name with a new adjective to differentiate them from other subclasses (consider the whole java.io package). Classes which play a part in a standard design pattern often should include the name of the part they play (Apple's Cocoa API uses Model, View, Controller frequently in class names). Classes responsible for validating user input may all contain the word "Validator" to spell out their role explicitly. Reusing this common vocabulary conveys a wealth of information to the future reader by explicitly telling him how the class fits into the system as a whole.
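Here is a sketch of what that shared vocabulary might look like in Java. The Validator interface and both implementations are hypothetical, and their checks are deliberately simplistic. A reader who has seen one ...Validator class already knows how the others fit into the system.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared role: anything named ...Validator checks one kind of input. */
interface Validator<T> {
    /** Returns a list of problems; an empty list means the value is valid. */
    List<String> validate(T value);
}

final class EmailAddressValidator implements Validator<String> {
    @Override
    public List<String> validate(String value) {
        List<String> problems = new ArrayList<>();
        // Deliberately simplistic check, for illustration only.
        if (value == null || !value.contains("@")) {
            problems.add("Email address must contain '@'");
        }
        return problems;
    }
}

final class AgeValidator implements Validator<Integer> {
    @Override
    public List<String> validate(Integer value) {
        List<String> problems = new ArrayList<>();
        // Assumed plausible range for a human age.
        if (value == null || value < 0 || value > 150) {
            problems.add("Age must be between 0 and 150");
        }
        return problems;
    }
}
```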

Stick with the end user's language

Nearly all programming is done with some end user in mind, and that end user invariably has a whole set of vocabulary around the problem your program is trying to solve. Take advantage of that built-in set of concepts for naming your classes. If you're writing a program to deal with a user's digital image, name the class "Photo". If you need a collection of them with some meta-data, name the class "Album". Take advantage of the real-world metaphors which already exist to reduce the effort required to understand your code.
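Here is a small sketch of how those names might look in code. The fields and methods are assumptions made for illustration. A user who talks about adding a photo to an album should recognize album.add(photo) almost word for word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A single digital image, named the way the user would name it. */
final class Photo {
    private final String title;

    Photo(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

/** A named collection of photos, mirroring a physical photo album. */
final class Album {
    private final String name;
    private final List<Photo> photos = new ArrayList<>();

    Album(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    void add(Photo photo) {
        photos.add(photo);
    }

    /** Read-only view, so callers go through add() to change the album. */
    List<Photo> photos() {
        return Collections.unmodifiableList(photos);
    }
}
```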

Sticking to the user's own language becomes especially important when dealing with more technical areas where the end user's vocabulary may be somewhat foreign to you as a programmer (e.g. finance, mechanical engineering, or medicine). In this case, the existing terminology generally has an exact definition in the user's own problem space, and by sticking with that language, you carry all that meaning over into the program itself. This makes the mental job of translating between what the user says and what the program does much easier in everything from initial requirements, to bug reports, to enhancement requests. By understanding the user's problem domain, one can gain a lot of information about how the program works.

Naturally, there are a ton more suggestions, guidelines, and principles to apply when naming classes. What are some of your favorites that I haven't included?

October 12, 2008

Short Circuit Statements

As promised, it's finally time to move away from theory into a more specific topic about coding. My goal is to make most posts in the future more like this one, now that the most important concepts have been laid out. I'll refer back to them often, so if you haven't read the prior posts, you'll probably want to do that before going further.

What is a short-circuit statement? In this case, I'm not talking about the language feature related to boolean comparisons, but instead I'm talking about statements which cause a method to return as soon as some conclusion has definitely been reached. Here's an example from a very simple compareTo method in Java:

public int compareTo(Foobar that) {
    if (that == null) return -1;
    if (this._value < that._value) return -1;
    if (this._value > that._value) return +1;
    return 0;
}


In this example, each line (except the last one) would qualify as a short circuit statement; that is, they all return as soon as a definite answer is determined. If we weren't using short circuit statements, then the code may look like this:

public int compareTo(Foobar that) {
    int result = 0;  // single exit point: every branch assigns result
    if (that == null) {
        result = -1;
    } else {
        if (this._value != that._value) {
            if (this._value < that._value) {
                result = -1;
            } else {
                result = +1;
            }
        }
    }
    return result;
}


For something this simple, there isn't a huge difference in the complexity between the two functions, but it still demonstrates the point. Many people ardently recommend always having a single return statement in any function, and would strongly advocate using the second example over the first. However, I would argue that the first is superior because it better respects the Unit Economy of the reader.

Short circuit statements support Unit Economy because they allow a reader to take certain facts for granted for the remainder of a method. In the first example, after reading the first line of the method, the reader knows that they never have to worry about the "that" parameter being null for the rest of the method. In the second example, the reader has to keep in mind whether they are still reading code nested within the first if statement. Every short circuit statement removes one item from the set of things one must consider while reading the remainder of the method.

Naturally, this example is pretty simplistic, and it's a stretch to claim that either method is more complicated than the other. However, consider if this weren't a simple example. If this were a formula to compute an amortization table for a home mortgage, then the first few lines may look like this:

public AmortizationSchedule computeSchedule(
        int principal, float rate, int term) {
    if (principal <= 0) throw new IllegalArgumentException(
        "Principal must be greater than zero");
    if (rate < 0.0) throw new IllegalArgumentException(
        "Rate cannot be less than zero");
    if (term <= 0) throw new IllegalArgumentException(
        "Term must be greater than zero");

    // Here are the 20 or so lines to compute the schedule...
}


In this case, there may be a substantial amount of code following this brief preamble, and none of it has to consider what may happen if these invariants are broken. This greatly simplifies the act of writing the code, the logic of the code itself, and the process of maintaining the code later on.
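For the curious, here is one way the remaining lines might be sketched. It rests on assumptions the method above doesn't state: rate is an annual interest rate expressed as a fraction (0.06 for 6%), term is a number of monthly payments, and principal is in whole currency units. It uses the standard fixed-payment formula M = P * r / (1 - (1 + r)^-n) with monthly rate r. The hypothetical Payment class and List<Payment> return type stand in for AmortizationSchedule. Note that the loop never re-checks its inputs; the guard clauses already did.

```java
import java.util.ArrayList;
import java.util.List;

/** One row of a hypothetical amortization schedule. */
final class Payment {
    final int number;
    final double interest;
    final double principal;
    final double remainingBalance;

    Payment(int number, double interest, double principal, double remainingBalance) {
        this.number = number;
        this.interest = interest;
        this.principal = principal;
        this.remainingBalance = remainingBalance;
    }
}

final class Amortization {
    static List<Payment> computeSchedule(int principal, float rate, int term) {
        // Short-circuit guards: everything below may assume valid inputs.
        if (principal <= 0) throw new IllegalArgumentException(
            "Principal must be greater than zero");
        if (rate < 0.0) throw new IllegalArgumentException(
            "Rate cannot be less than zero");
        if (term <= 0) throw new IllegalArgumentException(
            "Term must be greater than zero");

        double monthlyRate = rate / 12.0;  // assumption: rate is annual
        // A zero rate would divide by zero in the formula, so split evenly instead.
        double payment = (monthlyRate == 0.0)
            ? (double) principal / term
            : principal * monthlyRate / (1 - Math.pow(1 + monthlyRate, -term));

        List<Payment> schedule = new ArrayList<>();
        double balance = principal;
        for (int n = 1; n <= term; n++) {
            double interest = balance * monthlyRate;
            double towardPrincipal = payment - interest;
            balance -= towardPrincipal;
            schedule.add(new Payment(n, interest, towardPrincipal, balance));
        }
        return schedule;
    }
}
```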

Thoughts? Comments? I'd love to hear them.

October 05, 2008

Coupling & Cohesion, Part III

Last time, I described how Cohesion applies to everything from writing a single line of code all the way up to designing a remote service. Now, let's consider the same thing for Coupling.

Recall that Coupling is the mental load required to understand how a particular component relates to another component. If we take a line of code as a single component, then what defines how it is connected to the lines around it? For a start: the local variables it uses, methods it calls, conditional statements it is part of, the method it is contained in, and exceptions it catches or throws. The more of these things a single line of code involves, the more coupled it is to the rest of the system.

As an example, consider a line of code which uses a few local variables to call a method and store the result. This could be more or less coupled depending upon a number of factors. How many local variables are needed? Are any of the variables static or global variables? Is the method call private to the class, a public method on another class, or a static method defined somewhere? Is the result being stored in a local variable, an instance variable, or a static/global variable? Depending upon the answers to these questions, that one line may be more or less coupled to the other lines around it.

The implication of having Coupling which is too tight for a single line of code is that you have to understand a lot of other lines in order to understand that one. If it uses global variables, then you have to also understand what other code modifies the state of those variables. If it uses many local variables, then you have to understand the code which sets their values. If it calls a method on another object, then you have to understand what impact that method call will have. All of these things increase the amount of information you need to keep in mind to understand that line of code.
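To make this concrete, here are two hypothetical versions of the same one-line calculation. In the first, the line reads a mutable static field, so understanding it means hunting down every other piece of code that assigns that field. In the second, everything the line depends on arrives as a parameter, so it can be understood on its own. The class names and the tax example are illustrative assumptions.

```java
/** Tightly coupled: the result depends on global state that any code may change. */
final class GlobalTaxCalculator {
    static double taxRate = 0.08; // mutable global: who else writes to this?

    static double totalWithTax(double subtotal) {
        return subtotal * (1 + taxRate);
    }
}

/** Loosely coupled: everything the line needs is visible in its parameters. */
final class LocalTaxCalculator {
    static double totalWithTax(double subtotal, double taxRate) {
        return subtotal * (1 + taxRate);
    }
}
```

Calling GlobalTaxCalculator.totalWithTax(100.0) twice can give different answers if some unrelated code assigns taxRate in between. That is exactly the extra context a reader has to carry.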

Now, consider what Coupling would mean for a remote service which is part of a large distributed system (e.g. Amazon.com). The connections such a service has are defined by the API it offers, the other services it consumes, and how their APIs are defined. For the service's own API, consider the following: Does the API respond to many service calls or just a few? Do the service calls require a lot of structured data to be passed in? How easy is it for a caller to obtain all the necessary information? How much is the service's internal implementation attached to the API it presents? How common is the communication protocol clients must implement? For the other services it consumes, consider: How many other services does it use? How are their APIs defined (considering the questions above)? Just as with a single line of code, the answers to these questions will define how tightly coupled a service is to the rest of the system around it.

Having Coupling which is too tight for a remote service causes trouble, too. Changes to downstream systems may force the service itself to be updated. Any change to the API may require upstream services to change as well. It may be impossible to change the service's implementation if it is too tightly coupled to its own API. Finally, it may be difficult to break the service into separate services as it grows in scope. It can be a costly and painful mistake down the road to allow too much coupling between services in a distributed environment.
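One way to keep a service's implementation from being attached to its API is to make callers depend only on a narrow interface. This sketch uses hypothetical names (InventoryService, InMemoryInventory, StorefrontPage). A real remote service would sit behind a network protocol rather than a Java interface, but the shape of the dependency is the same. The interface exposes one small call taking a single identifier, and the storage behind it can be replaced without any caller changing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical narrow API: one call, one simple input, one simple output. */
interface InventoryService {
    int stockLevel(String sku);
}

/** One possible implementation; callers never see the Map behind it. */
final class InMemoryInventory implements InventoryService {
    private final Map<String, Integer> stock = new HashMap<>();

    void restock(String sku, int quantity) {
        stock.merge(sku, quantity, Integer::sum);
    }

    @Override
    public int stockLevel(String sku) {
        return stock.getOrDefault(sku, 0);
    }
}

/** A consumer coupled only to the interface, not to the implementation. */
final class StorefrontPage {
    private final InventoryService inventory;

    StorefrontPage(InventoryService inventory) {
        this.inventory = inventory;
    }

    String availability(String sku) {
        return inventory.stockLevel(sku) > 0 ? "In stock" : "Out of stock";
    }
}
```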

Okay... enough theory! Next time, on to a more specific subject: Short Circuit statements.