Saturday, May 14, 2011

The Right of Veto

I'm trying to get my head around the question, "Why do good developers write bad code, and what can we do to fix it?" It's provoking dozens of ideas. This one came to me yesterday:

All developers should have the right of veto on any code in development. Developers should be taking the time to look at the rest of the team's work, to ask the questions, "Do I understand this?", "Could I maintain this?", and "Do I know a better way to do this?". If there's a problem in understanding, or a better way to do it, then exercise the veto.

Exercising the veto means that code doesn't get committed, or should be reverted, or must be improved.

Exercising the veto means you must now get involved in making it better.

Pairs of developers can't pass their own code, they need to get someone else to consider their code and possibly veto it.

No one is so senior as to be above the veto.

If there is disagreement between the code author(s) and the developer exercising the veto then don't argue, bring in more of the team. Perhaps the resolution will be to share knowledge better rather than the veto. But ideally you want to write code that any other team member will readily accept.


As I was thinking of this and talking about it with my pair, I realised we were writing code that I felt I would want to veto, so the next idea is that every developer or pair announces the state of their code. I plan to stick a big green or red button on top of my monitor; the red button means I veto my own work, and green means I think this is okay.

Self veto (the red button) is a visible invitation to help: work with me, pair with me, take me aside and talk about design or the domain or our existing code base. Open to veto (the green button) is a request: please look at my code, judge it, share in the ownership of it, take responsibility for the whole team's output.

One of the threads that led to this idea came from circumstances that have led to us pairing less. My response was to ask what else we could do to improve code quality: how do we conduct worthwhile code reviews?

I'm heading toward the idea that even pair programmed code could benefit from code review, but I want to make it as lightweight as possible. And the real point of code review, the thing that makes it meaningful, is the possibility of rejecting the work. Otherwise it's no more than a poor tool for sharing knowledge.

That realisation, that code review is most valuable when it can be acted on, tied in with a quote I read: "If you can’t say “No” at work, then your “Yes” is meaningless." The simplest "no" in code review is the veto on commit.


When working in sprints or iterations there's often a step where the developers commit to doing a set of tasks. But I've never really been in a position where I thought I could say no. The best has normally been to say that it will take too long, or to negotiate details, but at the end of the day we assume that the right things are being asked for and we try to do them.

It's not an ideal commitment if I can't say no. Can my commitment be real consent if I can't say no? Certainly there have been times when I thought the requirements were unwise, and I've been demoralised doing what I thought to be the wrong thing. Could we create a team, or a work flow, or a process, that really empowered developers to say, "No, we cannot commit to doing that", and be honoured for it?

Wednesday, May 11, 2011

XP is more than Pair Programming and TDD

There are 12 practices in XP and 5 values. The values are open to interpretation but the practices are pretty clear. Many people magpie practices, and while the practices are often useful in isolation, there is a synergy between them which can be lost.

Coding Standards is one of the practices. To me it encompasses more than the simple things captured by FindBugs, Checkstyle, and other source code analysis tools. It goes beyond naming standards, line lengths, function and class size, and even sophisticated measures like fan out, cyclomatic complexity, and whatever other supposed panaceas.

Coding Standards should bring us toward principles like: any developer should be able to understand the code others have written; our implementations should correspond to the way we talk about our system; a developer should be able to figure out requirements from the written code; it should be easy to represent or articulate what we have implemented so that we can meaningfully reason about it and reliably build on it.

It's frequently not so. TDD and Pair Programming often produce code that only makes sense to the developers writing it while they are actively working on it; sometimes developers don't understand what they're doing even as they satisfy tests. Tests frequently capture behaviour that is incidental to requirements. Developers hunker down and satisfy requirements expressed in tests, but don't look up beyond that narrow focus.

Failing to attend to Coding Standards is failing to do XP, or at least failing at the whole cloth, and undoubtedly exposes development to a greater risk of producing poor quality results.

There's nothing wrong with building up a personal or local process by picking elements from different development systems, but sometimes it feels like those personal processes are about making life easier by rejecting or neglecting the things that require active practice to achieve substantial quality.

Coding Standards, don't leave home without them.

Monday, May 9, 2011

Implicit and Explicit

Each step of language design gives us tools that make more behaviour implicit, but some of the things languages make implicit are really requirements that are better made explicit. So another thread in language design is to make it possible to be explicit about the things that matter.

The three basics of structured programming are sequence, selection, and repetition. I'm going to use them to discuss how languages make things implicit or explicit.

Selection

The basic tool of selection is the if statement which connects different states to different code paths:
if (condition) then
    fnA()
else
    fnB()

The condition inspects available state and in some states selects fnA and in others fnB. This is an implicit relationship. It is a coincidence of placing these statements in this way. If we had to make the choice again in some other piece of code we could easily get it wrong: we could construct the condition badly, or forget we had to make a choice at all and just call one of the functions, or perhaps not even have access to the state required to make the decision.

The use of polymorphism in typical OO languages makes the connection between state and function explicit.
obj.fn()

The relevant state is represented in obj and how it was instantiated. There are in principle different types of obj, but for each one there is a correct version of fn and that is the one that will be called. It is explicit in the fact of obj being of a particular type that a particular fn will be called.
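As a small sketch in JavaScript (with hypothetical shape types, not from the original), the type of the object fixes which function runs, so there is no condition to construct badly:

```javascript
// Each type carries its own correct version of area();
// selecting it is explicit in the object's type, not in an if.
function Circle(r) { this.r = r; }
Circle.prototype.area = function () { return Math.PI * this.r * this.r; };

function Square(s) { this.s = s; }
Square.prototype.area = function () { return this.s * this.s; };

// No condition to write, and no way to call the wrong version:
// the decision was made at instantiation and travels with the object.
var shape = new Square(2);
var a = shape.area();
```

Adding a new shape means adding a new type, not finding and fixing every condition that chooses between the existing ones.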

Repetition

The basic tool of repetition is the loop:
while (condition) do
    fn()
done

But we do many different things with these loops, we might for example iterate over a collection of numbers and sum them. Or we might make a copy of that collection while translating each value in some way. Many languages provide constructs that make the meanings of such loops explicit while taking away all the accidental machinery of loop conditions and so forth.
collection.reduce(0, lambda a, x: a + x)

or
collection.map(lambda x: fn(x))

Given constructs like these there's no possibility of missing elements because of the condition being wrong, and by drawing on the well established vocabulary of list processing the meaning is made clear.
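JavaScript (as of ES5) offers the same explicit forms on arrays; note that its reduce takes the initial value as the second argument, the reverse of the pseudocode above:

```javascript
var numbers = [1, 2, 3, 4];

// Sum the collection: no loop condition to get wrong.
var sum = numbers.reduce(function (a, x) { return a + x; }, 0);

// Copy the collection while translating each value.
var doubled = numbers.map(function (x) { return x * 2; });
```

The names reduce and map carry the meaning; the machinery of indexes and termination conditions disappears entirely.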

Sequence

This is the tricky one; it's an implicit behaviour to which we are so accustomed that we rarely notice we've relied on it. Many bugs happen through accidental dependency on implicit sequence. Consider:
def fn()
    stat_a
    stat_b
end

stat_1
fn()
stat_2

We all know that stat_2 is the statement that will be executed after stat_b. But what else could it be?

In a parallel environment it could be that statements at the same level in a block are executed in parallel, perhaps interleaved, but with no guaranteed order. Simple as that, and very important given the rise of multicore devices.

In such an environment we would need to make sequence explicit. Consider a construct that forces consecutive execution in a parallel environment:
stat_a ::: stat_b

So stat_a will be followed by stat_b. You might also find things like this in functional programming languages in the form of monads.

Another form of explicit sequence is the continuation: a mechanism for representing where the next line to be executed is to be found, and a matching mechanism to actually go there. These mechanisms often look like a type of function call, but one for which there is no notion of return, which results in no need for a stack. I admit continuations are one of the weirder twists in computing languages, and it can be hard to know when to use them, but I've found they become useful in patterns where you manage the calls yourself, perhaps for deferred execution, or to trap exceptions.

There are other interesting constructs that can help express important, required, sequential operation. Resource management constructs, for example, execute a block of code and then guarantee that after the block something particular will be called: the automatic try-with-resources construct in Java 7, or yielding to a block in Ruby.
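In a language without such a construct you can sketch the same guarantee with a higher-order function. This is a hypothetical helper in JavaScript, in the spirit of Java 7's try-with-resources or a Ruby block:

```javascript
// Opens a resource, runs body with it, and guarantees cleanup:
// the required sequence open -> body -> close is explicit in the shape
// of the function, not left to the caller's discipline.
function withResource(open, close, body) {
  var resource = open();
  try {
    return body(resource);
  } finally {
    close(resource); // runs even if body throws
  }
}
```

Swapping in a different open/close pair changes the resource, but the guarantee of the sequence stays in one place.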

Don't look at me like that...

It might seem that all the simple forms that I'm calling implicit are spelling out the steps more clearly and must surely be explicit. The trick is to think in terms of required behaviours: if you have a required behaviour, how easy would it be to get it wrong with the simple forms? A statement shifted to the wrong place, a sign wrong in a condition, and generally nothing in the simple form to tell you what the right form would have been, or what the intention of the construct was. The meaning is implicit, not explicit.

Conversely, use the explicit forms when sequence, repetition, or selection are truly requirements and you will be announcing your intentions very clearly. You'll also be making it easier for libraries and languages to do clever optimisations to make use of parallel processing, or for safe and efficient use of memory and other resources.

Tools and techniques that work against us

I'm a keen fan of both agile methods and modern software tools but I've become more and more aware of how they can also work against us.

In the Java world we have Eclipse, a fantastically productive IDE, but it has some features that most developers rely on that work powerfully and inexorably against good design. First, it defaults to hiding your import statements. Second, the auto-complete looks outside the currently available scope for possible matches. Third, it makes adding import statements nearly automatic.

The net result of Eclipse's help with importing is to circumvent Java's quite deep and powerful package structure. You are exposed at all times to all the classes in your application. People love quickly importing any convenient class, so programs lose structure; developers depend on knowing class names rather than judging what they should use from things that have been considered and made available deliberately. Everything is connected to everything, forming a black hole of complex interconnections that sucks in vast amounts of developer effort.

Slightly less obvious is the ease of adding public methods (indeed Eclipse will offer to make methods public for you) and, by default, decorating them with doc comments. Momentary conveniences expand interfaces, creating more ways to couple different parts of your system together, and accreting functions in interfaces with no sense of those things being right for the API. A momentary convenience creates something that looks deep and intentional but isn't. Of course it's made worse by those lazy agile practitioners who jump at a functional solution now and either neglect or are ignorant of the whole application's quality and direction.

Between the power of tools to give easy access to anything in the system, and the deliberate small-scale attention focusing of some Agile practices, we have a powerful combination which we can use to quickly dig deep holes and traps.

Sunday, May 8, 2011

Continuation Passing Style in JavaScript

One of the frequently unquestioned assumptions in modern programming languages is "what comes next". We just assume it's the line after the current line, or the first line of the function just called, or the line that follows where the function was called from.

Continuations are objects that say "go to this point in the code, and there is no return"; with them, what comes next becomes explicit. They are not goto statements: they are objects that can be passed around and represented in structures. A common example is using continuations to model exceptions, but with no requirement that a continuation be somewhere "above" in the stack.

A continuation syntactically looks a bit like a function but with no going back there is no need for a stack frame. Function call without a new stack frame is a key concept in tail call recursion optimisation. Languages that support one often support the other.

But approximating the style can be useful even in a language which has no explicit support for them, for example JavaScript.

A JavaScript Recursive Descent Parser in Continuation Passing Style

To illustrate, I've transformed this straightforward, hand-crafted JavaScript recursive descent expression parser into continuation passing style, and along the way also into tail call recursive form. Both versions translate a token stream into a tree (useful for fully bracketed rendering, or evaluation with correct operator precedence).

The transformation means I cannot rely on the return values of functions. Instead I can only go forward, and I go forward by calling another function. So the original version calls xpn, expects tokens to be mutated, passes the return value to term_tail along with the mutated tokens, and then returns that result.
function term(tokens) {
    return term_tail(xpn(tokens), tokens);
}

But translated it simply says to continue with xpn, then term_tail, and finally whatever c might be.
function term(a, tokens, c) {
    continue_with([null, tokens], continuation(xpn, term_tail, c));
}

And I start the whole process by saying, at the end, continue with done, which prints the results. What will end up happening is that the recursive tree will become a pipeline.
continue_with([null, [10, '+', 11, '*', 12]], continuation(expr, done))

You can see I have a special function to prepare my continuations, and another to call them; we're deep in higher order programming territory here... But they're not deep magic: mostly they handle packing and unpacking arguments.
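A minimal sketch of what those two helpers might look like (a hypothetical reconstruction, not the post's actual code, which also uses setTimeout and handles more argument shapes):

```javascript
// continuation(f, g, ..., c) builds a continuation: a function taking the
// packed [accumulator, tokens] pair that calls f with the pair unpacked,
// plus a continuation for the rest of the chain. The final element of the
// chain is itself a continuation, so it receives the packed pair.
function continuation(fn) {
  var rest = Array.prototype.slice.call(arguments, 1);
  return function (args) {
    var next = rest.length > 1 ? continuation.apply(null, rest) : rest[0];
    fn(args[0], args[1], next);
  };
}

// continue_with packs the arguments and invokes the continuation.
// Replacing this direct call with a setTimeout yields to the UI instead.
function continue_with(args, c) {
  c(args);
}
```

The important point is that neither helper ever returns a useful value; everything flows forward through the chain.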

Because there is no coming back from the continuation, all results have to accumulate forward in the parameters, and the continuation calls have to be the last thing done in the function; that bit is the translation to Tail Call Recursive form. Where the return of one function would have been used by another function call, that second function becomes a parameter to the first:
fn2(fn1(123))    // simple style
fn1(123, fn2)    // continuation style

The last bit of cleverness is that you can insert other continuations in front of the input continuations. In effect, this algorithm works by taking the first function and the last function and squeezing all the other function calls in between.
function xpn(a, tokens, c) {
    continue_with([null, tokens], continuation(factor, xpn_tail, c));
}

So xpn is supposed to continue with c but has squeezed factor and xpn_tail in front.

Why Bother in General?

In a good functional programming language this style works without building up anything on the stack. And it's working without actively consuming tokens, allowing immutable data structures for the tokens. That means it can be very memory efficient, and also produces algorithms that are more suitable for parallelisation.

I also have a philosophical preference for being explicit about sequence when sequence is a real requirement, and practicing this style gives me another tool for being explicit about things that are often just assumed, (more on this soon).

As a general bonus tail call recursive functions can be trivially translated into loops which might be important if function call overhead is a problem for a given algorithm.
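That translation is mechanical. A sketch in JavaScript, using a hypothetical sum function rather than the parser:

```javascript
// Tail recursive: the recursive call is the very last thing done,
// so nothing on the stack is needed after it.
function sumRec(xs, acc) {
  if (xs.length === 0) return acc;
  return sumRec(xs.slice(1), acc + xs[0]);
}

// The same function as a loop: the tail call becomes reassignment
// of the parameters, and the stack never grows.
function sumLoop(xs, acc) {
  while (xs.length !== 0) {
    acc = acc + xs[0];
    xs = xs.slice(1);
  }
  return acc;
}
```

Each loop iteration does exactly what one recursive call did; the accumulator parameter is what makes the rewrite possible.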

Why Bother in JavaScript?

The naive version in JavaScript, just modelling the continuation call with a simple function call, is probably a bad idea. It ends up taking all the branches of the recursion and gluing them together in one long pipeline; a recipe for running out of stack space. But instead of simple function calls I've used setTimeout, so this potentially time consuming operation is now neatly broken up, allowing the UI to remain responsive.
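Another way to break up the naive chain, not what the post uses but worth sketching for comparison, is a trampoline: each step returns the next step instead of calling it, and a small loop drives them, so the stack never grows (though unlike setTimeout it stays synchronous and doesn't yield to the UI):

```javascript
// Drives a chain of thunks without growing the stack.
function trampoline(thunk) {
  while (typeof thunk === 'function') {
    thunk = thunk(); // each step returns the next step, or a final value
  }
  return thunk;
}

// A deliberately deep chain that would overflow as plain recursion:
// instead of making the tail call, wrap it in a thunk and return it.
function countdown(n) {
  if (n === 0) return 'done';
  return function () { return countdown(n - 1); };
}

trampoline(countdown(100000)); // completes without exhausting the stack
```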

Because I'm managing the continuation calls explicitly, I could trap all exceptions if that was important, just by changing the functions I use to build and call the continuations. Trapping exceptions this way is a great trick for concurrent programming because you can pass the exception handling continuations from thread to thread even though you're dealing with separate stacks (you'd have to provide exception handlers as callbacks or additional continuations, a pattern I personally prefer anyway).

And as mentioned, I could translate all this to a loop, and do away with the function call overhead.