Friday, October 28, 2011

A few of my favourite programming heuristics

There are many aphorisms and heuristics that can helpfully guide programming. Here are four that mean quite concrete things to me, and that have been characteristic of easy to understand but practical and performant implementations during my career.

  1. Clear object life cycle
  2. Analysis and action are separate concerns
  3. Single direction of flow
  4. Hide your technologies

(1) and (2) are particular takes on the single responsibility principle, focusing on the system doing one thing at any point in time. (3) has that in it as well: do one thing and then move on to the next; design so that you're not coming back with feedback and then branching in response to intermediate states. (4) addresses the laziness that accompanies excitement about the shiny new stuff: it's new, it's great, what could possibly be wrong with letting it touch everything? (Languages and frameworks are the most dangerous shiny stuff because they are hard to contain.)

Know the life cycles of your objects and make them clear. Consider an object system that involves building up a web of objects and then sending it requests. It's really useful to draw a sharp line between the phase of building up the web of objects and the phase of triggering operations on it. Keep them separate. Where possible, try not to mutate the web of objects once it is built, so the system is less likely to wander into invalid arrangements. Keep mutation separate from queries about its present state or asking it to perform its work. Explicitly representing object life cycle stages, the transitions between them, and the causes of those transitions is one of the most powerful types of self-documenting coding style I've ever encountered.
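To make the two phases concrete, here is a minimal sketch in Python. The names `RouterBuilder` and `Router` are invented for illustration and come from no particular framework; the point is only that building and operating are different objects with different jobs.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Router:
    """Operating phase: an immutable web of routes that only answers requests."""
    routes: Tuple[Tuple[str, str], ...]

    def handle(self, path: str) -> str:
        for prefix, handler_name in self.routes:
            if path.startswith(prefix):
                return handler_name
        return "not-found"


class RouterBuilder:
    """Building phase: collects routes, then hands over a finished Router."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def add(self, prefix: str, handler_name: str) -> "RouterBuilder":
        self._routes[prefix] = handler_name
        return self

    def build(self) -> Router:
        if not self._routes:
            raise ValueError("a router needs at least one route")
        # Longest prefix first, so the most specific route wins.
        ordered = sorted(self._routes.items(), key=lambda kv: len(kv[0]), reverse=True)
        return Router(routes=tuple(ordered))


router = RouterBuilder().add("/", "home").add("/users", "users").build()
print(router.handle("/users/42"))
```

Once `build()` has returned, there is no way to add a route, so a request can never observe a half-built web; an attempt to assign to `router.routes` raises an error because the dataclass is frozen.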

Getters and setters are the enemy of the well managed object life cycle. The classic (anti-)pattern of creating an object and then using setters to tweak it into a valid state means you created an object in an invalid state. You really don't want your objects to have stages in their life cycle when they are invalid; don't allow object tweaking: ban setters. Getters mean other objects can react to what ought to be this object's internal representation, making it hard to change that representation. In messaging terms, freely accessible getters are one step away from public attributes: they are broadcast messages, with each one exponentially increasing the combinations of state information available in your program. Don't feed runaway complexity: hide state, ban getters.
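As a rough illustration of an object with no setters and no getters, consider the sketch below. `Account` and its methods are hypothetical names chosen for the example; the constructor refuses to produce an invalid object, and callers tell the object what to do rather than reading its fields and deciding for it.

```python
class Account:
    """Valid from construction onwards; state changes only through behaviour."""

    def __init__(self, owner: str, opening_balance_cents: int) -> None:
        if not owner:
            raise ValueError("owner is required")
        if opening_balance_cents < 0:
            raise ValueError("opening balance cannot be negative")
        self._owner = owner
        self._balance_cents = opening_balance_cents

    def withdraw(self, amount_cents: int) -> bool:
        """Tell the account what to do; it decides using its own private state."""
        if amount_cents <= 0 or amount_cents > self._balance_cents:
            return False
        self._balance_cents -= amount_cents
        return True

    def statement_line(self) -> str:
        """Hand out a rendered result rather than the raw representation."""
        return f"{self._owner}: {self._balance_cents / 100:.2f}"


account = Account("Ada", 1000)
account.withdraw(250)
print(account.statement_line())
```

Because nothing outside the class reads `_balance_cents`, the representation could change (to a decimal type, say) without touching any caller.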

Many objectives require analysing a situation and acting, but analysis and action are very different beasts and it's best to keep them separate; keep two clear phases that carry you to your goal. Separation of analysis and action is my reaction to the nasty anti-pattern of getting a little information, doing a little work, getting a little more information, branching off somewhere, doing a bit more work, getting some more information. That way of programming means you need to know the history of the processing that got you to the interesting point you're trying to understand. My preference is to take time to figure out the current state, then act on that knowledge. Planning complex mutations or the construction of complex new objects and object collections this way is excellent; like the builder pattern, gather all the information, then act on it; similarly the interpreter pattern, build up a set of actions, then execute those actions.
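A small sketch of the two phases, assuming a made-up task of bringing one dictionary into line with another. `Action`, `analyse` and `act` are illustrative names only. The first function inspects and returns a complete plan; the second carries the plan out without asking any further questions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update" or "delete"
    key: str
    value: str = ""


def analyse(current: Dict[str, str], desired: Dict[str, str]) -> List[Action]:
    """Phase one: look, but do not touch. Returns the full plan."""
    plan: List[Action] = []
    for key in sorted(desired):
        if key not in current:
            plan.append(Action("create", key, desired[key]))
        elif current[key] != desired[key]:
            plan.append(Action("update", key, desired[key]))
    for key in sorted(current):
        if key not in desired:
            plan.append(Action("delete", key))
    return plan


def act(store: Dict[str, str], plan: List[Action]) -> None:
    """Phase two: carry out the plan; no inspection, no branching on new facts."""
    for action in plan:
        if action.kind == "delete":
            del store[action.key]
        else:
            store[action.key] = action.value


current = {"a": "1", "b": "2"}
desired = {"b": "3", "c": "4"}
plan = analyse(current, desired)
act(current, plan)
print(current == desired)
```

The plan is plain data, so it can be logged, reviewed or tested before anything is changed, which is much of what makes this style easy to debug.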

Single direction of flow, or pipelining, is more of the same really. Avoid going up and down the stack repeatedly, and avoid too much back and forth in complex dialogues. Tell the downstream process or function everything it needs, including how to handle errors and where to send results; fire your message; and turn your back on it. This requires you to represent different states, and the possible reactions to those states, more explicitly. Frankly, it's easier to trace and debug: there is less of the nonsense of being in very different states from one line to the next, and less need to know the accumulated side effects of the lines of code in front of you. Further, if you start with a model like this it's much easier to scale via message passing systems, a real win if you're actually planning to be successful.
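One way to sketch this in Python is to hand each stage its destinations up front, so nothing is returned up the stack for the caller to inspect. The stage names (`parse_stage`, `double_stage`) and the use of plain lists as sinks are assumptions made for the example; in a larger system the sinks might be queues or message channels.

```python
from typing import Callable, List


def parse_stage(
    raw: str,
    on_result: Callable[[int], None],
    on_error: Callable[[str], None],
) -> None:
    """Parse the input and push it onwards; the caller never sees a return value."""
    try:
        value = int(raw)
    except ValueError:
        on_error(f"not a number: {raw!r}")
        return
    on_result(value)


def double_stage(value: int, on_result: Callable[[int], None]) -> None:
    on_result(value * 2)


results: List[int] = []
errors: List[str] = []


def submit(raw: str) -> None:
    """Wire the pipeline for one message, fire it, and turn away."""
    parse_stage(
        raw,
        on_result=lambda v: double_stage(v, on_result=results.append),
        on_error=errors.append,
    )


for raw in ["1", "x", "21"]:
    submit(raw)

print(results, errors)
```

The submitting loop contains no branching on outcomes; success and failure each have a single, explicit destination.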

You use many complex technologies; keep each one well contained. Hiding your technologies is about keeping different kinds of complexity separate. Don't get trapped in a framework, and don't let someone else's technology dictate how you organise and present your code. You have complex business logic; represent it clearly. It is almost never the case that your choice of technology is the same as your business logic, so don't let them blend together. Don't design objects that expose the way they are being persisted. (The fact that they are being persisted is clearly a requirement and should be represented, but not specifically how.)
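The persistence point can be sketched as follows. `CustomerStore`, the two implementations and `rename_customer` are hypothetical names for illustration; `sqlite3` simply stands in for whatever technology is actually in use. The business rule sees only that customers can be saved and found, never how.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class CustomerStore(Protocol):
    """What the business logic needs: persistence, with no hint of how."""

    def save(self, customer_id: str, name: str) -> None: ...
    def find_name(self, customer_id: str) -> Optional[str]: ...


class SqliteCustomerStore:
    """All SQL lives here; nothing outside this class knows about sqlite3."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, customer_id: str, name: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                (customer_id, name),
            )

    def find_name(self, customer_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryCustomerStore:
    """A second implementation, handy for tests; same interface, no database."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def save(self, customer_id: str, name: str) -> None:
        self._names[customer_id] = name

    def find_name(self, customer_id: str) -> Optional[str]:
        return self._names.get(customer_id)


def rename_customer(store: CustomerStore, customer_id: str, new_name: str) -> bool:
    """Business rule: only existing customers can be renamed, and not to a blank name."""
    if not new_name.strip() or store.find_name(customer_id) is None:
        return False
    store.save(customer_id, new_name.strip())
    return True
```

Swapping the database for something else means writing one new class; `rename_customer` and everything built on it stay as they are.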

We all have different experiences and abilities, and they shape our personal measures of quality and our personal rules for achieving quality. It's good to know the values you hold and what in your history led to you holding those values. These four principles deliver their primary value by making large bodies of code easier to understand, and particularly easier to debug and extend. They work by making temporal interactions simpler, making it clear what has been done, what is known, and what remains to be done. I've spent too many hours puzzling over code, particularly stressful hours debugging code or trying to feel safe putting in place what ought to be a simple fix. The principles are shaped by both success and failure at working on some very large bodies of code, and by working on the same bodies of code for long periods, as I tend to keep my jobs for several years.