October 26, 2008

Fail Early, Fail Loudly

One easy step to help simplify your programs is to follow the adage "fail early and fail loudly". By failing early, I mean that your code should actively look for problems and stop as soon as something wrong is encountered. By failing loudly, I mean that your code should raise the alarm in a way that makes it obvious to other parts of the system (and people reading the code) that something unusual has just occurred.

Naturally, this doesn't mean that your code should explode whenever the slightest thing is wrong. On the contrary, it means that each object/package/component in your system should demonstrate good cohesion by refusing to operate in a situation it shouldn't have enough context to do so. Instead, each component should fail when encountering something that it isn't supposed to be able to handle, and delegate responsibility for handling that situation to the caller. The caller can then decide whether it has sufficient context to handle the situation, or to pass along the responsibility to another component.

Failing early is a benefit because you can then build whole sections of your program which don't have to consider some particular error case. For example, consider building a web service which accepts objects encoded in XML as input. If all the necessary error checking is performed up front (e.g. the XML is validated against the DTD, required elements needed in the API are found to be present, and various values are confirmed to be within acceptable ranges), the remainder of the web service call can be written to assume that the request was perfectly valid. The notion of Short Circuit statements I discussed recently is another variation on failing early. All of them reduce the mental load of reading (and writing) the code which follows them.

Failing loudly is a benefit because it reduces the mental effort required to follow up on errors which occur in other parts of the system. If a component fails loudly, a caller must ensure that they correctly respond to error conditions in order to be correct themselves. The best example of this is a checked exception. In Java, any method which can possibly throw a checked exception must explicitly declare it in its signature. Calling methods are forced - by the compiler - to either to catch the exception or declare that they throw the same exception themselves. This completely radically reduces the mental effort required to follow up on the error, and provides a built-in mechanism to remind you when you've forgotten to do so.

(The fact that some API's abuse checked exceptions horrendously is a topic for another time.)

There are a number of cases where these two ideas apply, but they are especially true around where data enters a program's runtime environment. Some examples include: reading from a database, accepting user input from the mouse/keyboard, accepting an incoming TCP/HTTP connection, or simply reading data from a file. In all of these cases, doing all of your error checking as soon as the data has entered the runtime environment will allow subsequent code to assume all the data was correct.

Failing early and failing loudly can also be seen as corollaries of Cohesion and Coupling. If a component is well-integrated to a single purpose, it will not attempt to handle conditions which are outside of that purpose. Instead, it will fail as soon as it is asked to cope with such a situation. In order for a component to offer loose coupling, it must reduce the amount of knowledge needed to interact with it. One aspect of this is to make it very clear what to expect in case the component has problems. Both of these concepts lead back to preserving the Unit Economy of the reader (and author!) of the code.

Questions, feedback, suggestions? I'd love to hear it.

2 comments:

  1. Paragraph four is missing a "to" in "required mental effort [to] follow up."

    Great post! A point that might be interesting to expand on is how to know when you have the required context to handle the exception and when you don't. Any good examples to demonstrate this distinction?

    ReplyDelete
  2. Oops! Thanks David! I'll keep that in mind for a future post.

    ReplyDelete