Sander van der Burg's blog: March 2014

Tuesday, March 25, 2014

Structured asynchronous programming (Asynchronous programming with JavaScript part 3)

A while ago, I explained that JavaScript execution environments, such as a web browser or Node.js, do not support multitasking. Such environments have a single event loop and when JavaScript code is being executed, nothing else can be done. As a result, it might (temporarily or indefinitely) block the browser or prevent a server from handling incoming connections.

In order to execute multiple tasks concurrently, typically events are generated (such as ticks or timeouts), the execution of the program is stopped so that the event loop can process events, and eventually execution is resumed by invoking the callback function attached to an event. This model works as long as implementers properly "cooperate".
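To make the cooperative model a bit more concrete, here is a minimal sketch for Node.js (the order array and the 50 millisecond busy loop are choices made for this illustration only). It suggests that a callback attached to a timeout event only gets invoked after the currently executing code has finished, even when the timeout expired long before:

```javascript
var order = [];

// Attach a callback to a timeout event that is due immediately
setTimeout(function() {
    order.push("timeout callback");
    console.log(order.join(" -> "));
}, 0);

// Keep the event loop busy for roughly 50 milliseconds
var start = Date.now();
while(Date.now() - start < 50) {
    // busy waiting: no event can be processed in the meantime
}

order.push("synchronous code finished");
// prints: synchronous code finished -> timeout callback
```

The busy loop does not "cooperate": as long as it runs, the timeout callback has to wait.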

One of its undesired side effects is that code is much harder to structure due to the extensive use of callback functions. Many solutions have been developed to cope with this. In my previous blog posts I have covered the async library and promises as possible solutions.

However, after reading a few articles on the web, some discussion, and some thinking, I came to the observation that asynchronous programming, that is: programming in environments in which executions have to be voluntarily interrupted and resumed between statements and -- as a consequence -- cannot immediately deliver their results within the same code block, is an entirely different programming world.

To me, one of the most challenging parts of programming (regardless of what languages and tools are being used) is being able to decompose and translate problems into units that can be programmed using concepts of a programming language.

In an asynchronous programming world, you have to unlearn most of the concepts that are common in the synchronous programming world (to which JavaScript essentially belongs in my opinion) and replace them with different ones.

Are callbacks our new generation's "GOTO statement"?

When I think about unlearning programming language concepts, a classic (and very famous) example that comes to mind is the "GOTO statement". In fact, a few other programmers using JavaScript claim that the usage of callbacks in JavaScript (and other programming languages as well) is our new generation's "GOTO statement".

Edsger Dijkstra said in his famous essay titled: "A case against the GO TO statement" (published as "Go To Statement Considered Harmful" in the March 1968 issue of the "Communications of the ACM") the following about it:

I became convinced that the go to statement should be abolished from all "higher level" programming languages (i.e. everything except -perhaps- plain machine code)

As a consequence, nearly every modern programming language used these days lacks the GOTO statement, and people generally consider using it a bad practice. But I have the impression that most of us seem to have forgotten why.

To re-explain Dijkstra's essay a bit in my own words: it was mainly about getting programs correctly implemented by construction. He briefly refers to three mental aids programmers can use (which he explains in more detail in his manuscript titled: "Notes on Structured Programming") namely: enumeration, mathematical induction, and abstraction:

The first mental aid: enumeration, is useful to determine the correctness of a code block executing sequential and conditional (e.g. if-then-else or switch) statements.

Basically, it is about stepping through each statement sequentially and reasoning for each step whether some invariant holds. You can address each step independently with what he describes as "a single textual index".

The second mental aid: mathematical induction, comes in handy when working with (recursive) procedures and loops (e.g. while and doWhile loops).

In his manuscript, he shows that the validity of a particular invariant can be proved by looking at the basis (first step of an iteration) first and then generalizing the proof to all successive steps.

For these kinds of proofs, a single textual index no longer suffices to address each step. However, using an additional dynamic index that represents each successive procedure call or iteration step still allows one to uniquely address them. The previous index and this second (dynamic) index constitute something that he calls "an independent coordinate system".

Finally, abstraction (i.e. encapsulating common operations into a procedure) is useful in many ways to me. One of the things Dijkstra said about this is that somebody basically just has to think about "what it does", disregarding "how it works".

The advantage of "an independent coordinate system" is that it characterizes the progress of the process, and the value of a variable can be interpreted only with respect to that progress. According to Dijkstra, using the "GOTO statement" makes it quite hard (though not impossible) to define a meaningful set of such coordinates, making it harder to reason about correctness and easier to turn your program into a mess.

So what are these coordinates really about, you may wonder? Initially, they sounded a bit abstract to me, but after some thinking, I have noticed that the execution/error traces presented by commonly used programming languages these days (e.g. when capturing an exception or using a debugger) use a coordinate system like that IMHO.

These traces have coordinates with two dimensions -- the first dimension is the name of the text file and the corresponding line number that we are currently at (assuming that each line contains a single statement). The second dimension is the stack of function invocations, each showing their corresponding location in the corresponding text files. It also makes sense to me that adding the effects of GOTOs (even when marking each of them with an individual number) to such traces is not helpful, because there could be so many of them that these traces become unreadable.
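As a small illustration of such a trace, the following sketch captures one in Node.js. It relies on the stack property of Error objects, which is not part of the language standard but is provided by V8; the function names inner and outer are made up for this example:

```javascript
function inner() {
    // Capturing an Error records where we are and how we got here
    return new Error("snapshot").stack;
}

function outer() {
    return inner();
}

var trace = outer();

// Each line of the trace names a function invocation (the dynamic
// dimension) together with a file name and line number (the textual
// dimension). The innermost invocation is listed first.
console.log(trace);
```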

However, when using structured programming concepts (as described in his manuscript), such as the sequential decomposition, alteration (e.g. if-then-else and switch), and repetition (e.g. while-do and repeat-until), the first two mental aids can be effectively used to prove validity, mainly because the structure of the program at runtime stays quite close to its static representation.

JavaScript language constructs

Like many other conventional programming languages that are in use these days, the JavaScript programming language supports structured programming language concepts, as well as a couple of other concepts, such as functional programming and object oriented programming through prototypes. Moreover, JavaScript lacks the goto statement.

JavaScript was originally "designed" to work in a synchronous world, which makes me wonder: what are the effects of using JavaScript's language concepts in an asynchronous world? And are the implications of these effects similar to the effects of using GOTO statements?

Function definitions

The most basic thing one can do in a language such as JavaScript is executing statements, such as variable assignments or function invocations. This is already something that changes when moving from a synchronous world to an asynchronous world. For example, take the following trivial synchronous function definition that simply prints some text on the console:

function printOnConsole(value) {
    console.log(value);
}

When moving to an asynchronous world, we may want to interrupt the execution of the function (yes I know it is not a very meaningful example for this particular case, but anyway):

function printOnConsole(value, callback) {
    process.nextTick(function() {
        console.log(value);
        callback();
    });
}

Because we generate a tick event first when calling the function and then stop the execution, the function returns immediately without doing its work. The callback, that is invoked later, will do it instead.

As a consequence, we do not know when the execution is finished by merely looking at when a function returns. Instead, a callback function (provided as a function parameter) can be used, that gets invoked once the work has been done. This is the reason why JavaScript functions in an asynchronous world use callbacks.

As a sidenote: I have seen some people claiming that merely changing the function interface to have a callback, makes their code asynchronous. This is absolutely not true. Code becomes asynchronous if it interrupts and resumes its execution. The callback interface is simply a consequence of providing an equivalent for the return statement that has lost its relevance in an asynchronous world.
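The difference can be made visible with a small experiment. In the following sketch (printSync, printAsync and the events array are hypothetical names used for illustration only), both functions have a callback interface, but only the second one interrupts its execution:

```javascript
var events = [];

// Has a callback interface, but is still synchronous:
// the callback runs before the function returns
function printSync(value, callback) {
    events.push("print " + value);
    callback();
}

// Asynchronous: the execution is interrupted and resumed later
function printAsync(value, callback) {
    process.nextTick(function() {
        events.push("print " + value);
        callback();
    });
}

printSync("a", function() {
    events.push("callback a");
});
events.push("after call a");

printAsync("b", function() {
    events.push("callback b");
    console.log(events.join(", "));
});
events.push("after call b");

// prints: print a, callback a, after call a, after call b, print b, callback b
```

For printSync() the work is done by the time the function returns; for printAsync() the function returns first and the work happens afterwards.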

The same thing holds for functions that return values, such as the following that translates one numerical digit into a word:

function generateWord(digit) {
    var words = [ "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine" ];
    return words[digit];
}

In an asynchronous world, we have to use a callback to pass its result to the caller:

function generateWord(digit, callback) {
    var words;
    process.nextTick(function() {
        words = [ "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine" ];
        callback(words[digit]);
    });
}

Sequential decomposition

The fact that function interfaces have become different and function invocations have to be done differently, affects all other programming language concepts in JavaScript.

Let's take the simplest structured programming concept: the sequence. Consider the following synchronous code fragment executing a collection of statements in sequential order:

var a = 1;
var b = a + 1;
var number = generateWord(b);
printOnConsole(number); // two

To me it looks straightforward to use enumerative reasoning to conclude that the output shown in the console will be "two".

As explained earlier, in an asynchronous world, we have to pass callback functions as parameters to know when they return. As a consequence, each successive statement has to be executed within the corresponding callback. If we do this in a dumb way, we probably end up writing:

var a = 1;
var b = a + 1;

generateWord(b, function(result) {
    var number = result;
    printOnConsole(number, function() {
        
    }); // two
});

As can be observed in the above code fragment, we end up one indentation level deeper every time we invoke a function, turning the code fragment into pyramid code.

Pyramid code is nasty in many ways. For example, it affects maintenance, because it has become harder to change the order of two statements. It has also become hard to add a statement, say, in the beginning of the code block, because it requires us to refactor all the successive statements. It also becomes a bit harder to read the code because of the nesting and indentation.

However, it also makes me wonder whether pyramid code is a "new GOTO". I would say no, because I think we still have not lost our ability to address statements through a "single textual index" and the ability to use enumerative reasoning.

We could also say that the fact that we invoke callback functions for each function invocation introduces the second dynamic index, but on the other hand, we know that a given callback is only called by the same caller, so we can discard that second index because of that.

My conclusion is that we still have enumerative reasoning abilities when implementing a sequence. However, the overhead of each enumeration step is (in my opinion) bigger because we have to take the indentation and callback nesting into account.

Fortunately, I can create an abstraction to clean up this pyramid code:

function runStatement(stmts, index, callback, result) {
    if(index >= stmts.length) {
        if(typeof callback == "function")
            callback(result);
    } else {
        stmts[index](function(result) {
            runStatement(stmts, index + 1, callback, result);
        }, result);
    }
}

function sequence(stmts, callback) {
    runStatement(stmts, 0, callback, undefined);
}

The above function, sequence(), takes an array of functions, each requiring a callback as parameter. Each function represents a statement. Moreover, since the abstraction is an asynchronous function itself, we also have to use a callback parameter to notify the caller when it has finished. In the usage examples, the slasp. prefix refers to the object through which these abstractions are exposed. I can refactor the earlier asynchronous code fragment into the following:

var a;
var b;
var number;

slasp.sequence([
    function(callback) {
        a = 1;
        callback();
    },

    function(callback) {
        b = a + 1;
        callback();
    },
    
    function(callback) {
        generateWord(b, callback);
    },
    
    function(callback, result) {
        number = result;
        printOnConsole(number, callback); // two
    }
]);

By using the sequence() function, we have eliminated all pyramid code, because we can indent the statements on the same level. Moreover, we can also maintain it better, because we do not have to fix the indentation and callback nesting each time we insert or move a statement.
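To see how results travel from one statement to the next, the following self-contained sketch repeats the sequence() definition from above and records every step. The trace array and the parameter names previous and word are made up for this illustration:

```javascript
function runStatement(stmts, index, callback, result) {
    if(index >= stmts.length) {
        if(typeof callback == "function")
            callback(result);
    } else {
        stmts[index](function(result) {
            runStatement(stmts, index + 1, callback, result);
        }, result);
    }
}

function sequence(stmts, callback) {
    runStatement(stmts, 0, callback, undefined);
}

function generateWord(digit, callback) {
    process.nextTick(function() {
        var words = [ "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine" ];
        callback(words[digit]);
    });
}

var trace = [];

sequence([
    function(callback) {
        trace.push("step 1");
        callback(2); // becomes the second parameter of the next step
    },

    function(callback, previous) {
        trace.push("step 2 received " + previous);
        generateWord(previous, callback);
    },

    function(callback, word) {
        trace.push("step 3 received " + word);
        callback(word);
    }
], function(result) {
    trace.push("finished with " + result);
    console.log(trace.join("\n"));
});

// prints:
// step 1
// step 2 received 2
// step 3 received two
// finished with two
```

Whatever a statement passes to its callback ends up as the second parameter of the next statement, and the value passed by the last statement is handed to the final callback.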

Alteration

The usage of alteration constructs is also slightly different in an asynchronous world. Consider the following example that basically checks whether some variable contains my first name and lets the user know whether this is the case or not:

function checkMe(name) {
    return (name == "Sander");
}
    
var name = "Sander";
    
if(checkMe(name)) {
    printOnConsole("It's me!");
    printOnConsole("Isn't it awesome?");
} else {
    printOnConsole("It's someone else!");
}

(As you may have noticed, I intentionally captured the conditional expression in a function; it will soon become clear why.)

Again, I think that it will be straightforward to use enumerative reasoning to conclude that the output will be:

It's me!
Isn't it awesome?

When moving to an asynchronous world (which changes the signature of checkMe() to have a callback), things become a bit more complicated:

function checkMe(name, callback) {
    process.nextTick(function() {
        callback(name == "Sander");
    });
}

var name = "Sander";

checkMe(name, function(result) {
    if(result) {
        printOnConsole("It's me!", function() {
            printOnConsole("Isn't it awesome?", function() {
                // nothing left to do
            });
        });
    } else {
        printOnConsole("It's someone else!", function() {
            // nothing left to do
        });
    }
});

We can no longer evaluate the conditional expression within the if-clause. Instead, we have to evaluate it earlier, then use the callback to retrieve the result and use that to evaluate the if conditional expression.

Although it is a bit inconvenient not being able to directly evaluate a conditional expression, again I still do not think this affects the ability to use enumeration, for similar reasons as with the sequential decomposition. The above code fragment basically just adds an additional sequential step, nothing more. So in my opinion, we still have not encountered a new GOTO.

Fortunately, I can also create an abstraction for the above pattern:

function when(conditionFun, thenFun, elseFun, callback) {
    sequence([
        function(callback) {
            conditionFun(callback);
        },
        
        function(callback, result) {
            if(result) {
                thenFun(callback);
            } else {
                if(typeof elseFun == "function")
                    elseFun(callback);
                else
                    callback();
            }
        }
    ], callback);
}

and use this function to express the if-statement as follows:

slasp.when(function(callback) {
    checkMe(name, callback);
}, function(callback) {
    slasp.sequence([
        function(callback) {
            printOnConsole("It's me!", callback);
        },
        
        function(callback) {
            printOnConsole("Isn't it awesome?", callback);
        }
    ], callback);
}, function(callback) {
    printOnConsole("It's someone else!", callback);
});

Now I can embed a conditional expression in my artificial when statement.

The same thing applies to the other alteration construct in JavaScript: the switch statement -- you also cannot evaluate a conditional expression directly if it invokes an asynchronous function. However, I can also make an abstraction (which I have called circuit) to cope with that.
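The circuit abstraction is not shown in this post. The following is only a rough sketch of how such a function might look, based on the way it is used in the summary table at the end; the actual implementation may well differ, and determineColor is a hypothetical example function:

```javascript
// Hypothetical sketch of a circuit() abstraction (an assumption, not the real code)
function circuit(conditionFun, caseFun, callback) {
    // Evaluate the expression first ...
    conditionFun(function(result) {
        // ... and hand its result to a function containing the switch
        caseFun(result, function() {
            if(typeof callback == "function")
                callback();
        });
    });
}

function printOnConsole(value, callback) {
    process.nextTick(function() {
        console.log(value);
        callback();
    });
}

// Made-up asynchronous function that delivers the value to switch on
function determineColor(callback) {
    process.nextTick(function() {
        callback("red");
    });
}

circuit(determineColor, function(result, callback) {
    switch(result) {
        case "red":
            printOnConsole("Stop!", callback);
            break;
        case "green":
            printOnConsole("Go!", callback);
            break;
        default:
            printOnConsole("Unknown color: " + result, callback);
    }
}, function() {
    console.log("Done");
});
```

The switch statement itself stays synchronous; only the evaluation of the expression and the statements inside the cases are asynchronous.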

Repetition

How are the repetition constructs (e.g. while and do-while) affected in an asynchronous world? Consider the following example implementing a while loop:

function checkTreshold(approx) {
    return (approx.toString().substring(0, 7) != "3.14159");
}

var approx = 0;
var denominator = 1;
var sign = 1;

while(checkTreshold(approx)) {
    approx += 4 * sign / denominator;
    printOnConsole("Current approximation is: "+approx);
        
    denominator += 2;
    sign *= -1;
}

The synchronous code fragment shown above implements the Gregory-Leibniz formula to approximate pi up to 5 decimal places. To reason about its correctness, we have to use both enumeration and mathematical induction. First, we reason that the first two components of the series are correct, then we can use induction to reason that each successive component of the series is correct, e.g. they have an alternating sign and a denominator that increases by 2 for each successive step.
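For reference, the series that the loop accumulates can be written as follows. The k-th term (counting from zero) corresponds to sign = (-1)^k and denominator = 2k + 1 in the code:

```latex
\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^k}{2k + 1}
    = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right)
```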

If we move to an asynchronous world, we have a couple of problems, beyond those that are described earlier. First, repetition blocks the event loop for an unknown amount of time so we must interrupt it. Second, if we interrupt a loop, we cannot resume it with a callback. Therefore, we must write our asynchronous equivalent of the previous code as follows:

function checkTreshold(approx, callback) {
    process.nextTick(function() {
        callback(approx.toString().substring(0, 7) != "3.14159");
    });
}

var approx = 0;
var denominator = 1;
var sign = 1;

(function iteration(callback) {
    checkTreshold(approx, function(result) {
        if(result) {
            approx += 4 * sign / denominator;
            printOnConsole("Current approximation is: "+approx, function() {
                denominator += 2;
                sign *= -1;
                setImmediate(function() {
                    iteration(callback);
                });
            });
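        } else if(typeof callback == "function") {
            callback(); // notify the caller that the repetition has finished
        }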
    });
})();

In the above code fragment, I have refactored the code into a recursive algorithm. Moreover, for each iteration step, I use setImmediate() to generate an event (I cannot use process.nextTick() in Node.js because it skips processing certain kinds of events) and I suspend the execution. The corresponding callback starts the next iteration step.
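The difference between the two can be observed with a small experiment. The sketch below (tickLoop and the order array are names made up for this illustration) schedules one setImmediate() callback and then a short chain of callbacks that keep rescheduling themselves with process.nextTick(). In Node.js, the whole chain runs before the setImmediate() callback gets its turn, which suggests why an unbounded process.nextTick() loop could keep other events from being processed:

```javascript
var order = [];

// An event that is waiting to be processed by the event loop
setImmediate(function() {
    order.push("immediate");
    console.log(order.join(", "));
});

// A "loop" that reschedules itself with process.nextTick()
function tickLoop(remaining) {
    if(remaining > 0) {
        process.nextTick(function() {
            order.push("tick " + remaining);
            tickLoop(remaining - 1);
        });
    }
}

tickLoop(3);

// prints: tick 3, tick 2, tick 1, immediate
```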

So is this implication the new GOTO? I would still say no! Even though we were forced to discard the while construct and use recursion instead, we can still use mathematical induction to reason about its correctness, although certain statements are wrapped in callbacks that make things a bit uglier and harder to maintain.

Luckily, I can also capture the above pattern in an abstraction:

function whilst(conditionFun, statementFun, callback) {
    when(conditionFun, function() {
        sequence([
            statementFun,
            
            function() {
                setImmediate(function() {
                    whilst(conditionFun, statementFun, callback);
                });
            }
        ]);
    }, function() {
        // The condition no longer holds: notify the caller
        if(typeof callback == "function")
            callback();
    });
}

The above function (called: whilst) takes three functions as parameters: the first parameter takes a function returning (through a callback) a boolean that represents the conditional expression, the second parameter takes a function that has to be executed for each iteration, and the third parameter is a callback that gets invoked if the repetition has finished.

Using the whilst() function, I can rewrite the earlier example as follows:

var approx = 0;
var denominator = 1;
var sign = 1;

slasp.whilst(function(callback) {
    checkTreshold(approx, callback);
}, function(callback) {
    slasp.sequence([
        function(callback) {
            approx += 4 * sign / denominator;
            callback();
        },
        
        function(callback) {
            printOnConsole("Current approximation is: "+approx, callback);
        },
        
        function(callback) {
            denominator += 2;
            callback();
        },
        
        function(callback) {
            sign *= -1;
            callback();
        }
    ], callback);
});

The same thing that we have encountered also holds for the other repetition constructs in JavaScript. doWhile is almost the same, but we have to evaluate the conditional expression at the end of each iteration step. We can refactor a for and for-in loop as a while loop, thus the same applies to these constructs as well. For all these constructs I have developed corresponding asynchronous abstractions: doWhilst, from and fromEach.
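As an impression of what one of these abstractions could look like, here is a rough sketch of doWhilst written with plain callbacks and the non-error callback convention used so far. It is an assumption made for illustration, not necessarily the real implementation; only the parameter order is taken from the summary table at the end of this post:

```javascript
// Hypothetical sketch of doWhilst() (an assumption, not the real code)
function doWhilst(statementFun, conditionFun, callback) {
    // Run the body first ...
    statementFun(function() {
        // ... then evaluate the condition at the end of the iteration step
        conditionFun(function(result) {
            if(result) {
                // Suspend the execution before starting the next step
                setImmediate(function() {
                    doWhilst(statementFun, conditionFun, callback);
                });
            } else if(typeof callback == "function") {
                callback();
            }
        });
    });
}

var count = 0;

doWhilst(function(callback) {
    count++;
    callback();
}, function(callback) {
    callback(count < 3);
}, function() {
    console.log("Body was executed " + count + " times");
});

// prints: Body was executed 3 times
```

Because the condition is checked after the body, the body runs at least once, even when the condition is false from the start.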

Exceptions

With all the work done so far, I could already conclude that moving from a synchronous to an asynchronous world (using callbacks) results in a couple of nasty issues, but these issues are definitely not the new GOTO. However, a common extension to structured programming is the use of exceptions, which JavaScript also supports.

What if we expand our earlier example with the generateWord() function to throw an exception if a parameter is given that is not a single non-negative digit?

function generateWord(num) {
    if(num < 0 || num > 9) {
        throw "Cannot convert "+num+" into a word";
    } else {
        var words = [ "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine" ];
        return words[num];
    }
}

try {
    var word = generateWord(1);
    printOnConsole("We have a: "+word);
    word = generateWord(10);
    printOnConsole("We have a: "+word);
} catch(err) {
    printOnConsole("Some exception occurred: "+err);
} finally {
    printOnConsole("Bye bye!");
}

The above code also captures a possible exception and always prints "Bye bye!" on the console regardless of the outcome.

The problem with exceptions in an asynchronous world is basically the same as with the return statement. We cannot just catch an exception because it may not have been thrown yet. So instead of throwing and catching exceptions, we must simulate them. This is commonly done in Node.js by introducing another callback parameter called err (that is the first parameter of the callback) that is not null if some error has been thrown.

Changing the above function definition to throw errors using this callback parameter is straightforward:

function generateWord(num, callback) {
    var words;
    process.nextTick(function() {
        if(num < 0 || num > 9) {
            callback("Cannot convert "+num+" into a word");
        } else {
            words = [ "zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine" ];
            callback(null, words[num]);
        }
    });
}

However, simulating the effects of a throw and of the catch and finally clauses is not straightforward. I am not going too much into the details (and it's probably best to just briefly skim over the next code fragment), but this is basically what I ended up writing (which is still partially incomplete):

generateWord(1, function(err, result) {
    if(err) {
        printOnConsole("Some exception occurred: "+err, function(err) {
            if(err) {
                // ...
            } else {
                printOnConsole("Bye bye!");
            }
        });
    } else {
        var word = result;
        printOnConsole("We have a: "+word, function(err) {
            if(err) {
                printOnConsole("Some exception occurred: "+err, function(err) {
                    if(err) {
                        // ...
                    } else {
                        printOnConsole("Bye bye!");
                    }
                });
            } else {
                generateWord(10, function(err, result) {
                    if(err) {
                        printOnConsole("Some exception occurred: "+err, function(err) {
                            if(err) {
                                // ...
                            } else {
                                printOnConsole("Bye bye!");
                            }
                        });
                    } else {
                        word = result;
                        printOnConsole("We have a: "+word, function(err) {
                            if(err) {
                                printOnConsole("Some exception occurred: "+err, function(err) {
                                    if(err) {
                                        // ...
                                    } else {
                                        printOnConsole("Bye bye!");
                                    }
                                });
                            } else {
                                // ...
                            }
                        });
                    }
                });
            }
        });
    }
});

As you may notice, now the code clearly blows up and you also see lots of repetition because of the fact that we need to simulate the effects of the throw and finally clauses.

To create an abstraction to cope with exceptions, we must adapt all the abstraction functions that I have shown previously to evaluate the err callback parameters. If the err parameter is set to something, we must stop the execution and propagate the err parameter to its callback.
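For the sequence() function, such an adaptation could look roughly like the sketch below. It is an illustration of the idea under the error-first callback convention, not necessarily the exact implementation; the executed array is only there to show which statements have run:

```javascript
function runStatement(stmts, index, callback, result) {
    if(index >= stmts.length) {
        if(typeof callback == "function")
            callback(null, result); // no error: pass on the last result
    } else {
        stmts[index](function(err, result) {
            if(err) {
                // Stop the execution and propagate the error
                if(typeof callback == "function")
                    callback(err);
            } else {
                runStatement(stmts, index + 1, callback, result);
            }
        }, result);
    }
}

function sequence(stmts, callback) {
    runStatement(stmts, 0, callback, undefined);
}

var executed = [];

sequence([
    function(callback) {
        executed.push("first");
        callback(null, 1);
    },

    function(callback, result) {
        executed.push("second");
        callback("Something went wrong"); // simulates a throw
    },

    function(callback) {
        executed.push("third"); // never reached
        callback(null);
    }
], function(err, result) {
    console.log("Error: " + err); // Error: Something went wrong
    console.log(executed.join(", ")); // first, second
});
```

The third statement is skipped, just like the statements following a throw in a synchronous code block.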

Moreover, I can also define a function abstraction named: attempt, to simulate a try-catch-finally block:

function attempt(statementFun, captureFun, lastlyFun) {
    statementFun(function(err) {
        if(err) {
            if(typeof lastlyFun != "function")
                lastlyFun = function() {};
                    
            captureFun(err, lastlyFun);
        } else {
            if(typeof lastlyFun == "function")
                lastlyFun();
        }
    });
}

and I can rewrite the mess shown earlier as follows:

var word;

slasp.attempt(function(callback) {
    slasp.sequence([
        function(callback) {
            generateWord(1, callback);
        },
        
        function(callback, result) {
            word = result;
            printOnConsole("We have a: "+word, callback);
        },
        
        function(callback) {
            generateWord(10, callback);
        },
        
        function(callback, result) {
            word = result;
            printOnConsole("We have a: "+word, callback);
        }
        
    ], callback);
}, function(err, callback) {
    printOnConsole("Some exception occurred: "+err, callback);
}, function() {
    printOnConsole("Bye bye!");
});

Objects

Another extension in JavaScript is the ability to construct objects having prototypes. In JavaScript, constructors as well as object methods are functions. I think the same applies to these kinds of functions as to regular ones -- they cannot return values immediately because they may not have finished their execution yet.

Consider the following example:

function Rectangle(width, height) {
    this.width = width;
    this.height = height;
}

Rectangle.prototype.calculateArea = function() {
    return this.width * this.height;
};

var r = new Rectangle(2, 2);

printOnConsole("Area is: "+r.calculateArea());

The above code fragment simulates a Rectangle class, constructs a rectangle having a width and height of 2, and calculates and displays its area.

When moving to an asynchronous world, we have to take into account all things we did previously. I ended up writing:

function Rectangle(self, width, height, callback) {
    process.nextTick(function() {
        self.width = width;
        self.height = height;
        callback(null);
    });
}

Rectangle.prototype.calculateArea = function(callback) {
    var self = this;
    process.nextTick(function() {
        callback(null, self.width * self.height);
    });
};

function RectangleCons(width, height, callback) {
    function F() {};
    F.prototype = Rectangle.prototype;
    var self = new F();
    Rectangle(self, width, height, function(err) {
        if(err)
            callback(err);
        else
            callback(null, self);
    });
}

RectangleCons(2, 2, function(err, result) {
    var r = result;
    r.calculateArea(function(err, result) {
        printOnConsole("Area is: "+result);
    });
});

As can be observed, all functions -- except for the constructor -- have an interface including a callback.

The reason that I had to do something different for the constructor is that functions that are called in conjunction with new cannot propagate this back to the caller without including weird internal properties. Therefore, I had to create a "constructor wrapper" (named: RectangleCons) that first constructs an empty object with the right prototype. After the empty object has been constructed, I invoke the real constructor doing the initialisation work.

Furthermore, the this keyword only works properly within the scope of the constructor function. Therefore, I had to use a helper variable called self to make the properties of this available in the scope of the callbacks.
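The need for such a helper variable can be checked with a small experiment. In the sketch below (Counter and incrementLater are hypothetical names used for illustration only), the callback compares its own this with the object on which the method was invoked:

```javascript
function Counter() {
    this.count = 0;
}

Counter.prototype.incrementLater = function(callback) {
    var self = this; // remember the object the method was invoked on

    process.nextTick(function() {
        // Inside this callback, `this` no longer refers to the Counter object
        var thisIsCounter = (this === self);
        self.count++;
        callback(null, thisIsCounter);
    });
};

var counter = new Counter();
var observed;

counter.incrementLater(function(err, thisIsCounter) {
    observed = thisIsCounter;
    console.log("this refers to the counter: " + thisIsCounter); // false
    console.log("count: " + counter.count); // 1
});
```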

Writing a "wrapper constructor" is something we ideally do not want to write ourselves. Therefore, I created an abstraction for this:

function novel() {
    var args = Array.prototype.slice.call(arguments, 0);
    
    var constructorFun = args.shift();
    function F() {};
    F.prototype = constructorFun.prototype;
    F.prototype.constructor = constructorFun;
    
    var self = new F();
    args.unshift(self);
    
    var callback = args[args.length - 1];
    args[args.length - 1] = function(err, result) {
        if(err)
            callback(err);
        else
            callback(null, self);
    };
    
    constructorFun.apply(null, args);
}

And using this abstraction, I can rewrite the code as follows:

function Rectangle(self, width, height, callback) {
    process.nextTick(function() {
        self.width = width;
        self.height = height;
        callback(null);
    });
}

Rectangle.prototype.calculateArea = function(callback) {
    var self = this;
    process.nextTick(function() {
        callback(null, self.width * self.height);
    });
};

slasp.novel(Rectangle, 2, 2, function(err, result) {
    var r = result;
    r.calculateArea(function(err, result) {
        printOnConsole("Area is: "+result);
    });
});

When using novel() instead of new, we can conveniently construct objects asynchronously.

As a sidenote: if you want to use simulated class inheritance, you can still use my inherit() function that takes two constructor functions as parameters described in an earlier blog post. They should also work with "asynchronous" constructors.

Discussion

In this blog post, I have shown that in an asynchronous world, functions have to be defined and used differently. As a consequence, most of JavaScript's language constructs are either unusable or have to be used in a different way. So basically, we have to forget about most common concepts that we normally intend to use in a synchronous world, and learn different ones.

The following table summarizes the synchronous programming language concepts and their asynchronous counterparts for which I have directly and indirectly derived patterns or abstractions:

Concept	Synchronous	Asynchronous
Function interface	function f(a) { ... }	function f(a, callback) { ... }
Return statement	return val;	callback(null, val);
Sequence	a; b; ...	slasp.sequence([ function(callback) { a(callback); }, function(callback) { b(callback); } ... ]);
if-then-else	if(condFun()) thenFun(); else elseFun();	slasp.when(condFun, thenFun, elseFun);
switch	switch(condFun()) { case "a": funA(); break; case "b": funB(); break; ... }	slasp.circuit(condFun, function(result, callback) { switch(result) { case "a": funA(callback); break; case "b": funB(callback); break; ... } });
Recursion	function fun() { fun(); }	function fun(callback) { setImmediate(function() { fun(callback); }); }
while	while(condFun()) { stmtFun(); }	slasp.whilst(condFun, stmtFun);
doWhile	do { stmtFun(); } while(condFun());	slasp.doWhilst(stmtFun, condFun);
for	for(startFun(); condFun(); stepFun() ) { stmtFun(); }	slasp.from(startFun, condFun, stepFun, stmtFun);
for-in	for(var a in arrFun()) { stmtFun(); }	slasp.fromEach(arrFun, function(a, callback) { stmtFun(callback); });
throw	throw err;	callback(err);
try-catch-finally	try { funA(); } catch(err) { funErr(); } finally { funFinally(); }	slasp.attempt(funA, function(err, callback) { funErr(callback); }, funFinally);
constructor	function Cons(a) { this.a = a; }	function Cons(self, a, callback) { self.a = a; callback(null); }
new	new Cons(a);	slasp.novel(Cons, a, callback);
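To make a few rows of the table concrete (function interface, return, throw and recursion), here is a minimal sketch in plain Node.js that does not use slasp at all. The function names sumToSync and sumTo are made up for this illustration.

```javascript
// Synchronous version: delivers its result with return, signals errors with throw.
function sumToSync(n) {
    if (n < 0) {
        throw new Error("n must not be negative");
    }
    return n === 0 ? 0 : n + sumToSync(n - 1);
}

// Asynchronous counterpart (hypothetical example, not part of slasp):
// - an extra callback parameter replaces the return value
// - callback(null, val) replaces return val
// - callback(err) replaces throw err
// - setImmediate() lets the event loop run between recursion steps
function sumTo(n, callback) {
    setImmediate(function() {
        if (n < 0) {
            callback(new Error("n must not be negative"));
        } else if (n === 0) {
            callback(null, 0);
        } else {
            sumTo(n - 1, function(err, partial) {
                if (err) {
                    callback(err);
                } else {
                    callback(null, n + partial);
                }
            });
        }
    });
}

sumTo(4, function(err, result) {
    if (err) {
        console.error(err.message);
    } else {
        console.log("Sum is: " + result); // prints "Sum is: 10"
    }
});
```

Note that the asynchronous version always invokes its callback from a later event loop iteration, even in the base case, so callers never have to guess whether the callback fires before or after sumTo() returns.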

To answer the question whether callbacks are the new GOTO: my conclusion is that they are not. Although they have drawbacks, such as making code harder to read, maintain and adapt, they do not affect our ability to use enumeration or mathematical induction.

However, if we start using exceptions, then things become way more difficult. Then developing abstractions is unavoidable, but this has nothing to do with callbacks. Simulating exception behaviour in general makes things complicated, which is fueled by the nasty side effects of callbacks.

Another funny observation is that it has become quite common to use JavaScript for asynchronous programming. Since it has been developed for synchronous programming, most of its constructs are useless in that setting. Fortunately, we can cope with that by implementing useful abstractions ourselves (or through third party libraries), but it would be better IMHO if a programming language had all the relevant facilities that are suitable for the domain in which it is going to be used.

Conclusion

In this blog post, I have explained that moving from a synchronous to an asynchronous world requires forgetting certain programming language concepts and using different asynchronous equivalents.

I have made a JavaScript library out of the abstractions in this blog post (yep, that is yet another abstraction library!), because I think they might come in handy at some point. It is named slasp (SugarLess Asynchronous Structured Programming), because it implements abstractions that are close to the bare bones of JavaScript. It provides no sugar, such as borrowing abstractions from functional programming languages and so on, which most other libraries do.

The library can be obtained from my GitHub page and through NPM and used under the terms and conditions of the MIT license.

Sunday, March 16, 2014

Implementing consistent layouts for websites

Recently, I have wiped the dust off an old dormant project and I have decided to put it on GitHub, since I have found some use for it again. It is a personal project I started a long time ago.

Background

I got the inspiration for this project while working on my bachelor thesis project internship at IBM in 2005. I was developing an application usage analyzer system which included a web front-end implementing their intranet layout. I observed that it was a bit tedious to get it implemented properly. Moreover, I noticed that I had to repeat the same patterns over and over again for each page.

I saw some "tricks" that other people did to cope with these issues, but I considered all of them workarounds -- they were basically a bunch of includes in combination with a bit of iteration to make it work, but looked overly complicated and had all kinds of issues.

Some time before my internship, I learned about the Model-view-controller architectural pattern and I was looking into applying this pattern to the web front-end I was developing.

After some searching on the web using the MVC and Java Enterprise Edition (which was the underlying technology used to implement the system) keywords, I stumbled upon the following JavaWorld article titled: 'Understanding JavaServer Pages Model 2 architecture'. Although the article was specifically about the Model 2 architecture, I considered the Model 1 variant -- also described in the same article -- good enough for what I needed.

I observed that every page of an intranet application looks quite similar to others. For example, they had the same kinds of sections, same style, same colors etc. The only major differences were the selected menu item in the menu section and the contents (such as text) that is being displayed.

I created a model of the intranet layout that basically encodes the structure of the menu section that is being displayed on all pages of the web application. Each item in the menu redirects the user to the same page which -- based on the selected menu option -- displays different contents and a different "active" link. To cite my bachelor's thesis (which was written in Dutch):

De menu instantie bevat dus de structuur van het menu en de JSP zorgt ervoor dat het menu in de juiste opmaak wordt weergegeven. Deze aanpak is gebaseerd is op het Model 1 [model1] architectuur:

which I could translate into something like:

Hence, the menu instance contains the structure of the menu and the JSP is responsible for properly displaying the menu structure. This approach is based on the Model 1 [model1] architecture.

(As a sidenote: The website I am referring to calls "JSP Model 1" an architecture, which I blindly adopted in my thesis. These days, MVC is not something I would call an architecture, but rather an architectural pattern!)

I was quite satisfied with my implementation of the web front-end and some of my coworkers liked the fact that I was capable of implementing the intranet layout completely on my own and that I was able to create and modify pages so easily.

Creating a library

After my internship, I was not too satisfied with the web development work I did prior to it. I had developed several websites and web applications that I still maintained, but all of them were implemented in an ad-hoc way -- one web application had a specific aspect implemented in a better way than others. Moreover, I kept reimplementing similar patterns over and over again including layout elements. I also did not reuse code effectively apart from a bit of copying and pasting.

From that moment on, I wanted everything that I had to develop to have the same (and the best possible) quality and to reuse as much code as possible so that every project would benefit from it.

I started a new library project from scratch. In fact, there were two library projects for two different programming languages. Initially I started implementing a Java Servlet/JSP version, since I became familiar with it during my internships at IBM and I considered it to be good and interesting technology to use.

However, all my past projects were implemented in PHP and most of the web applications I maintained were hosted at shared webhosting providers only supporting PHP. As a result, I also developed a PHP version, which became the version that I actually used most of the time.

I could not use any code from my internship. Apart from the fact that it was IBM's property, it was also too specific for IBM intranet layouts. Moreover, I needed something that was even more general and more flexible so that I could encode all the layouts that I had implemented myself in the past. However, I kept the idea of the Model-1 and Model-2 architectural patterns that I discovered in mind.

Moreover, I also studied some usability heuristics (provided by the Nielsen-Norman Group) which I tried to implement in the library:

Visibility of system status. I tried supporting this aspect, by ensuring that the selected links in the menu section were explicitly marked as such so that users always know where they are in the navigation structure.
The "Consistency and standards" aspect was supported by the fact that every page has the same kinds of sections with the same purposes. For example, the menu sections have the same behavior as well as the result of clicking on a link.
I tried to support "Error prevention" by automatically hiding menu links that were not accessible.

I kept evolving and improving the libraries until early 2009. The last thing I did with it was implementing my own personal homepage, which is still up and running today.

Usage

So how can these libraries be used? First, a model has to be created which captures common layout properties and the sub pages of which the application consists. In PHP, a simple application model could be defined as follows:

<?php
$application = new Application(
    /* Title */
    "Simple test website",

    /* CSS stylesheets */
    array("default.css"),

    /* Sections */
    array(
        "header" => new StaticSection("header.inc.php"),
        "menu" => new MenuSection(0),
        "submenu" => new MenuSection(1),
        "contents" => new ContentsSection(true)
    ),

    /* Pages */
    new StaticContentPage("Home", new Contents("home.inc.php"), array(
        "page1" => new StaticContentPage("Page 1", new Contents("page1.inc.php"), array(
            "page11" => new StaticContentPage("Subpage 1.1",
                new Contents("page1/subpage11.inc.php")),
            "page12" => new StaticContentPage("Subpage 1.2",
                new Contents("page1/subpage12.inc.php")),
            "page13" => new StaticContentPage("Subpage 1.3",
                new Contents("page1/subpage13.inc.php")))),
            ...
    )))
);

The above code fragment specifies the following:

The title of the entire web application is: "Simple test website", which will be visible in the title bar of the browser window for every sub page.
Every sub page of the application uses a common stylesheet: default.css
Every sub page has the same kinds of sections:
- The header section always displays the same (static) content, whose code resides in a separate PHP include (header.inc.php)
- The menu section displays a menu navigation section displaying links reachable from the entry page.
- The submenu section displays a menu navigation section displaying links reachable from the pages in the previous menu section.
- The contents section displays the actual dynamic contents (usually text) that makes the page unique based on the link that has been selected in one of the menu sections.
The remainder of the code defines the sub pages of which the web application consists. Sub pages are organised in a tree-like structure. The first object is the entry page, which has zero or more sub pages. Each sub page may have sub pages of its own, and so on.

Every sub page provides its own contents to be displayed in the contents section that has been defined earlier. Moreover, the menu sections automatically display links to the sub pages that are reachable from the page that is currently being displayed.
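The way a menu section could derive its links from the page tree can be sketched as follows. This is an illustration in JavaScript, not the library's PHP code: the object layout, the menuLinks() function and the "page2" entry are assumptions made for the example. Level 0 corresponds to MenuSection(0) and level 1 to MenuSection(1).

```javascript
// Hypothetical page tree loosely mirroring the model above
// (not the library's actual data structure).
var entryPage = {
    title: "Home",
    subPages: {
        page1: {
            title: "Page 1",
            subPages: {
                page11: { title: "Subpage 1.1", subPages: {} },
                page12: { title: "Subpage 1.2", subPages: {} }
            }
        },
        page2: { title: "Page 2", subPages: {} }
    }
};

// Returns the links that a menu section of the given level would display.
// `path` is the list of sub page ids leading from the entry page to the
// page that is currently shown, e.g. ["page1", "page12"].
function menuLinks(entryPage, path, level) {
    var page = entryPage;

    // Walk down to the page whose sub pages belong to this menu level
    for (var i = 0; i < level; i++) {
        if (i >= path.length ||
            !Object.prototype.hasOwnProperty.call(page.subPages, path[i])) {
            return []; // nothing selected at this depth: the menu stays empty
        }
        page = page.subPages[path[i]];
    }

    return Object.keys(page.subPages).map(function(id) {
        return {
            url: "/" + path.slice(0, level).concat(id).join("/"),
            title: page.subPages[id].title,
            active: path[level] === id // marks the selected link
        };
    });
}

console.log(menuLinks(entryPage, ["page1", "page12"], 0));
console.log(menuLinks(entryPage, ["page1", "page12"], 1));
```

For the path ["page1", "page12"], the level 0 menu lists "Page 1" (active) and "Page 2", while the level 1 menu lists the sub pages of "Page 1" with "Subpage 1.2" marked as active.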

By calling the following view function with the application model as a parameter, we can display any of its sub pages:

displayRequestedPage($application);
?>

The above function generates a basic HTML page. The title of the page is composed of the application's title and the selected page title. Moreover, the sections are translated to div elements having an id attribute set to their corresponding array key. Each of these divs contains the contents of the include operations. The sub page selection is done by taking the last few path components of the URL that come after the script component.
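The sub page selection described above boils down to a walk through the page tree. The following JavaScript sketch illustrates the idea under a few assumptions: the function names, the object layout and the title format ("page title - application title") are invented for this example and are not taken from the library.

```javascript
// Hypothetical page tree (not the library's actual data structure)
var entryPage = {
    title: "Home",
    subPages: {
        page1: {
            title: "Page 1",
            subPages: {
                page12: { title: "Subpage 1.2", subPages: {} }
            }
        }
    }
};

// Returns the path components that come after the script component, e.g.
// "/index.php/page1/page12" with script "/index.php" yields ["page1", "page12"].
function pathComponents(urlPath, scriptPath) {
    if (urlPath.indexOf(scriptPath) !== 0) {
        return [];
    }
    return urlPath.slice(scriptPath.length).split("/").filter(function(component) {
        return component !== "";
    });
}

// Follows the components through the tree; returns null for an unknown page.
function lookupPage(entryPage, components) {
    var page = entryPage;
    for (var i = 0; i < components.length; i++) {
        if (!Object.prototype.hasOwnProperty.call(page.subPages, components[i])) {
            return null;
        }
        page = page.subPages[components[i]];
    }
    return page;
}

// Composes a page title; the exact format is an assumption.
function composeTitle(applicationTitle, page) {
    return page.title + " - " + applicationTitle;
}

var components = pathComponents("/index.php/page1/page12", "/index.php");
var page = lookupPage(entryPage, components);
console.log(composeTitle("Simple test website", page));
```

An empty list of components selects the entry page itself, and a null result is the point at which an error page would be shown.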

If I create a "fancy" stylesheet, a bit of basic artwork and some actual contents for each include, something like this could appear on your screen:

Although the HTML generated by displayRequestedPage() is usually sufficient, I could also implement a custom one if I want to do more advanced stuff. I decomposed most of its aspects into sub functions that can be easily invoked from a custom function that does something different.

I have also created a Java version of the same concepts, which predates the PHP version. In the Java version, the model would look like this:

package test;

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import io.github.svanderburg.layout.model.*;
import io.github.svanderburg.layout.model.page.*;
import io.github.svanderburg.layout.model.page.content.*;
import io.github.svanderburg.layout.model.section.*;

public class IndexServlet extends io.github.svanderburg.layout.view.IndexServlet
{
    private static final long serialVersionUID = 6641153504105482668L;

    private static final Application application = new Application(
        /* Title */
        "Test website",

        /* CSS stylesheets */
        new String[] { "default.css" },

        /* Pages */
        new StaticContentPage("Home", new Contents("home.jsp"))
            .addSubPage("page1", new StaticContentPage("Page 1", new Contents("page1.jsp"))
                .addSubPage("subpage11", new StaticContentPage("Subpage 1.1",
                    new Contents("page1/subpage11.jsp")))
                .addSubPage("subpage12", new StaticContentPage("Subpage 1.2",
                    new Contents("page1/subpage12.jsp")))
                .addSubPage("subpage13", new StaticContentPage("Subpage 1.3",
                    new Contents("page1/subpage13.jsp"))))
        ...
    )
    /* Sections */
    .addSection("header", new StaticSection("header.jsp"))
    .addSection("menu", new MenuSection(0))
    .addSection("submenu", new MenuSection(1))
    .addSection("contents", new ContentsSection(true));

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException
    {
        dispatchLayoutView(application, req, resp);
    }

    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException
    {
        dispatchLayoutView(application, req, resp);
    }
}

As may be observed, since Java is a statically typed language, more code is needed to express the same thing. Furthermore, Java has no associative arrays in its language, so I decided to use fluent interfaces instead.

Moreover, the model is also embedded in a Java Servlet, that dispatches the requests to a JSP page (WEB-INF/index.jsp) that represents the view. This JSP page could be implemented as follows:

<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8" import="io.github.svanderburg.layout.model.*"
    import="io.github.svanderburg.layout.model.page.*,test.*"%>
<%
Application app = (Application)request.getAttribute("app");
Page currentPage = (Page)request.getAttribute("currentPage");
%>
<%@ taglib uri="http://svanderburg.github.io" prefix="layout" %>
<layout:index app="<%= app %>" currentPage="<%= currentPage %>" />

The above page takes the application model and the current page (determined by the URL used to call it) as request attributes. It invokes the index taglib (instead of a function in PHP) to compose an HTML page from it. Moreover, I have also encoded sub parts of the index page as reusable taglibs.

Other features

Besides the simple usage scenario shown earlier, the libraries support a collection of other interesting features, such as:

Multiple content section support
Per-page style and script includes
Error pages
Security handling
Controller sections to handle GET or POST parameters. In Java, you can invoke Java Servlets to do this, making the new library technically compliant with the JSP Model-2 architectural pattern.
Using path components as parameters
Internationalised sub pages

Conclusion

In this blog post, I have described an old dormant project that I revived and released. I always had the intention to release it as free/open-source software in the past, but never actually did it until now.

These days, some people do not really consider me a "web guy". I was very active in this domain a long time ago, but I (sort of) put that interest into the background, although I am still very much involved with web application development today (in addition to software deployment techniques and several other interests).

This interesting oatmeal comic clearly illustrates one of the major reasons why I have put my web technology interests into the background. This talk about web technology from Zed Shaw has an overlap with my other major reason.

Today, I am not so interested anymore in making web sites for people or in turning this library into a killer feature, but I don't mind sharing code. The only thing I care about at this moment is to use it to please myself.

Availability

The Java (java-sblayout) as well as the PHP (php-sblayout) versions of the libraries can be obtained from my GitHub page and used under the terms and conditions of the Apache Software License version 2.0.