November 16, 2008

The Right Size for a Method

One occasionally hears of some programming zealot who swears up and down that methods should be kept to 'n' lines or less. The actual value may vary, but the arbitrary nature of the claim remains. However, there is a kernel of truth there, and it has to do with preserving Unit Economy.

We all know that the longer a method is, the more we have to keep in our minds to understand it. There are likely to be more local variables, more conditional statements, more exceptions caught and thrown, and more side-effects of all those lines of code. Furthermore, the problem grows faster and faster as there are more lines of code since they all potentially can have an impact on one another. Keeping methods short has a disproportionately large benefit.

Of course, claiming that there's some absolute "correct" number is clearly nonsensical. The same number of lines in C, Lisp, Java, Assembler, or Ruby will accomplish radically different things, and even how one legitimately counts lines will change dramatically. What does not change, though, is the need for the reader (and author) of the code to understand it as a whole. To this end, one should strive to keep the number of discrete tasks a method accomplishes to within the range of what people generally can remember at once: between one and six.

Each task within a method may have several lines of code of its own; how many tends to vary widely. Consider the process of reading a series of rows from a database. There may be a task to establish a database connection, another to create the query, another to read the values, and perhaps one more to close everything down. Each of these may be composed of anywhere from one to many lines of code.

Tasks may even have subtasks. Consider the example of building a login dialog. At some point, there is likely to be some code which creates a variety of controls, and places them on the screen (e.g. an image control for the company logo, a text field to capture the user name, etc). In the method which does this, one may consider the process of creating the components a single task which has a number of subtasks: one for each component.

In both cases, the important consideration is how organizing the method into tasks and subtasks helps preserve Unit Economy. By creating tasks which have strong Cohesion (i.e. you can name what that group of code does) and loose Coupling (i.e. you can actually separate that group of lines from the others), you give the reader ready-made abstractions within your method. In the first example, the reader can readily tell that there's a section for setting up the connection, and be able to mentally file that away as one unit without the need to remember each line of code in it. In the latter example, the reader can categorize the series of UI element creation subtasks as a single "build all the UI components" task, and again be able to abstract the entire thing away under a single unit. Even if there are a dozen or more individual components, it still can be considered a single task, that is, a single mental unit.

This ability to create abstractions within a single method is why there is no absolute "right" size for a method. Since grouping like things into tasks and subtasks preserves the reader's (and author's) Unit Economy, it is quite possible to have a method which is very long in absolute terms, and still quite comprehensible. It also implies that a fairly short method can be "too long" if it fails to provide this kind of mental structure. The proper length will always be determined by the amount of units (tasks) which one has to keep in mind, and the complexity of how those tasks are interrelated.

No comments:

Post a Comment