Monday, December 30, 2013

Third annual blog reflection

Today, it is three years ago that I started this blog, so I thought this would be a good opportunity to reflect on last year's writings.

Obtaining my PhD degree


I think the most memorable thing that happened for me this year is the fact that I finally obtained my PhD degree. From that moment on, I can finally call myself Dr. Sander, or actually (if I take my other degree into account): Dr. Ir. Sander (or in fact: Dr. Ir. Ing. Sander if I take all of them into account, but I believe the last degree has been superseded by the middle one, although I'm not sure :-) ).

Anyway, after obtaining my PhD degree I don't feel that much different, apart from feeling relieved that it's done. It took quite some effort to finish my PhD dissertation and complete all the preparations for the defense. Besides my thesis, I also had to defend my propositions, most of which were not supposed to be directly related to my research subject.

Programming in JavaScript


Since I switched jobs, I have also been doing a lot of JavaScript programming. Every programming language and runtime environment has its weird/obscure problems and challenges, but in my opinion JavaScript is a very special case.

As a former teaching assistant for the concepts of programming languages course, I remain interested in discovering important lessons that prevent me from turning code into a mess and help me deal with the challenges that a particular programming language gives. So far, I have investigated object-oriented programming through prototypes, and two perspectives on dealing with the asynchronous programming problems that come with most JavaScript environments.

Besides programming challenges, I also have to perform deployment tasks for JavaScript programs. People who happen to know me know that I prefer Nix and Nix-related solutions, so I developed NiJS: an internal DSL for Nix that makes my life a bit easier when doing that.

Continuous integration and testing


Another technical aspect I have been working on is setting up a continuous integration facility by using Hydra: the Nix-based continuous integration server. I wrote a couple of blog posts describing its features, how to set it up and how to secure it.

I also made a couple of improvements to the Android and iOS Nix build functions, so that I can use Hydra to continuously build mobile apps.

Nix/NixOS development


Besides Hydra, I have been involved with various other parts of the Nix project as well. One of the more interesting things I did was developing a Nix function that can be used to compose FHS-compatible chroot environments. This function is particularly useful for running binary-only software in NixOS that cannot be patched, such as Steam.

I also wrote two blog posts to explain the user environment and development environment concepts.

Fun programming


Besides my PhD defense and all other activities, there was a bit of room to do some fun programming as well. I have improved my Amiga video emulation library (part of my IFF file format experiments project) a bit by implementing support for super hires resolutions and redefining its architecture.

Moreover, I have updated all the packages to use the new version of this library.

Research


After obtaining my PhD degree, I'm basically relieved of my research/publication duties. However, there is one research-related thing that caught my attention two months ago.

The journal paper titled 'Disnix: A toolset for distributed deployment' that got accepted in April last year is finally going to be published in Volume 79 of 'Science of Computer Programming'. Unbelievable!

Although it sounds like good news that another paper of mine gets published, the thing that disturbs me is that the publication process took an insanely long time! I wrote the first version of this paper for the WASDeTT-3 workshop, which was held in October 2010. So technically, that means I started doing the work for the paper several months prior to that.

In February 2011, I adapted/extended the workshop paper and submitted the first journal draft. Now, in January 2014, it finally gets published, which means that it took almost three years to get published (if you take the workshop into account as well, it's actually closer to 3.5 years!).

In some academic assessment standards, journal papers have more value than conference papers. Although this journal paper should increase my value as a researcher, it's actually crazy if you think too much about it. The first reason is that I wrote the first version of this paper before I started this blog. Meanwhile, I have written 54 blog articles and two tech reports, published two papers at conferences, and finished my PhD dissertation.

The other reason is that peer reviewing and publishing should help the authors and the research discipline in general. To me, this does not look like much help. Meanwhile, in the current development version of Disnix, some aspects of its architecture have evolved considerably compared to what is described in the paper, so the paper is of little use to anyone else in the research community anymore.

The only value the paper still provides is in the general ideas and the way Disnix manifests itself externally.

Although the paper is not completely valueless, and I'm happy it gets published, it also feels weird that I no longer depend on it.

Blog posts


As with my previous annual reflections, I will also publish the top 10 of my most frequently read blog posts:

  1. On Nix and GNU Guix. This is a critical blog post that also ended up first in last year's top 10. I think this blog post will remain at the top position for the time being, since it attracted an insane number of visitors.
  2. An alternative explanation of the Nix package manager. My alternative explanation of Nix, which I wrote to clarify things. It was also second in last year's top 10.
  3. Setting up a Hydra build cluster for continuous integration and testing (part 1). Apparently, Hydra and some general principles about continuous integration have attracted quite some visitors. However, the follow-up blog posts I wrote about Hydra don't seem to be that interesting to outsiders.
  4. Using Nix while doing development. I wrote this blog post two days ago, and it attracted quite some visitors. I have noticed that setting up development environments is an attractive feature for Nix users.
  5. Second computer. This is an old blog post about my good ol' Amiga. It was also in all previous top 10s and I think it will remain like that for the time being. The Amiga rocks!
  6. An evaluation and comparison of GoboLinux. Another blog article that has remained popular from the beginning. It's still a pity that GoboLinux has not been updated and sticks to its 014.01 release, which dates from 2008.
  7. Composing FHS-compatible chroot environments with Nix (or deploying Steam in NixOS). This is something I have developed to be able to run Steam in NixOS. It seems to have attracted quite some users, which does not come as a surprise. NixOS users want to play Half-Life!
  8. Software deployment complexity. An old blog post about software deployment complexity in general. Still remains popular.
  9. Deploying iOS applications with the Nix package manager. A blog post that I wrote last year describing how we can use the Nix package manager to build apps for the iPhone/iPad. For a long time the Android variant of this blog post was more popular, but recently this blog article surpassed it. I have no clue why.
  10. Porting software to AmigaOS (unconventional style). People still seem to like one of my craziest experiments.

Conclusion


I already have three more blog posts in draft/planning stages and more ideas that I'd like to explore, so expect more to come. The remaining thing I'd like to say is:

HAPPY NEW YEAR!!!

Saturday, December 28, 2013

Using Nix while doing development

I have noticed that while doing development work, many outsiders find the way I work quite odd and consider it inefficient.

The first reason is (probably) that I like the command-line for many tasks and frequently use an unconventional text editor, which some people don't understand. Second, I also often use Nix (and related Nix utilities) during development.

In this blog post, I'd like to elaborate a bit about the second aspect and discuss why it is actually quite useful to do this.

Obtaining and installing the dependencies of a software project


In all software projects I have been involved with so far, the first thing I typically had to do was install all the required dependencies. Examples of such dependencies are a compiler/interpreter + runtime environment for a specific programming language (such as GCC, the Java Development Kit, Perl, Python, Node.js etc.), a development IDE (such as Eclipse), and many library packages.

I have experienced that this step is often quite time consuming, since many dependencies have to be downloaded and installed. Moreover, I have frequently encountered situations in which a bunch of special settings have to be configured to make everything work right, which can be quite cumbersome and tedious.

Many developers simply download and install everything on their machine in an ad-hoc manner and manually perform all the steps they have to do.

Automating the deployment of dependencies of a software project


What I often do instead is immediately use Nix to automate the process of installing all the software project's dependencies and configuring all their settings.

To fully automate deployment with Nix, all dependencies have to be provided as Nix packages. First, I investigate whether everything I need is in the Nix packages collection. Fortunately, the Nix packages collection provides packages for many kinds of programming languages and associated libraries. If anything is missing, I package it myself. Although some things are very obscure to package (such as the Android SDK), most things can be packaged in a straightforward manner.
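
For instance, a minimal package expression typically has the following shape (a sketch only; the package name, URL and hash are hypothetical placeholders):

{stdenv, fetchurl}:

stdenv.mkDerivation {
  name = "mylibrary-1.0";  # hypothetical package name
  src = fetchurl {
    url = "http://example.com/mylibrary-1.0.tar.gz";  # hypothetical URL
    sha256 = "...";  # the real output hash must be filled in here
  };
  # For GNU Autotools based packages, the default unpack, configure,
  # build and install phases usually suffice.
}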

After packaging all the missing dependencies, I must install them and generate an environment in which all the dependencies can be found. For example, in order to let Java projects find their library dependencies, we must set the CLASSPATH environment variable to refer to directories or JAR files containing the required compiled classes, PERL5LIB to let Perl find its Perl modules, NODE_PATH to let Node.js find its CommonJS modules etc.
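
The result roughly boils down to a set of shell variable assignments such as the following (the store paths shown are hypothetical illustrations):

$ export CLASSPATH=/nix/store/...-mylibrary-1.0/share/java/mylibrary.jar
$ export PERL5LIB=/nix/store/...-perl-packages/lib/perl5/site_perl
$ export NODE_PATH=/nix/store/...-node-packages/lib/node_modules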

The Nix packages collection has a nice utility function called myEnvFun that can be used to automate this. This function can be used to compose development environments that automatically set nearly all of the required development settings. The following example Nix expression (disnixenv.nix) can be used to deploy a development environment for Disnix:

with import <nixpkgs> {};

myEnvFun {
  name = "disnix";
  buildInputs = [
    pkgconfig dbus dbus_glib libxml2 libxslt getopt
    autoconf automake libtool dysnomia
  ];
}

The development environment can be deployed by installing it with the Nix package manager:

$ nix-env -f disnixenv.nix -i env-disnix

The above command-line invocation deploys all of Disnix's dependencies and generates a shell script (named: load-env-disnix) that launches a shell session in an environment in which all build settings are configured. We can enter this development environment by running:

$ load-env-disnix
env-disnix loaded

disnix:[sander@nixos:~]$

As can be seen in the code fragment above, we have entered an environment in which all development dependencies are present and configured. For example, the following command-line instruction should work, since libxml2 is declared as a development dependency:

$ xmllint --version
xmllint: using libxml version 20901

We can also unpack the Disnix source tarball and run the following build instructions, which should work without any trouble:

$ ./bootstrap
$ ./configure
$ make

Besides deploying a development environment, we can also discard it if we don't need it anymore by running:

$ nix-env -e env-disnix

Automating common deployment tasks


After deploying a development environment, I usually do all kinds of development tasks until the project reaches a more stable state. Once this is the case, I immediately automate most of its deployment operations in a Nix expression (often called: release.nix).

This Nix expression is typically an attribute set following the convention of Hydra (the Nix-based continuous integration server), in which each attribute represents a deployment job. Jobs that I typically implement are listed below, followed by a sketch of such an expression:

  • Source package. This is a job that helps me to easily redistribute the source code of a project to others. For GNU Autotools based projects, this typically involves the make dist command-line instruction. Besides GNU Autotools, I also generate source packages for other kinds of projects. For example, in one of my Node.js projects (such as NiJS), I produce a TGZ file that can be deployed with the NPM package manager.
  • Binary package jobs that actually build the software project for every target system that I want to support, such as i686-linux, x86_64-linux, and x86_64-darwin.
  • A job that automatically compiles the program manual from e.g. DocBook, if this is applicable.
  • A program documentation catalog job. Many programming languages allow developers to write code-level documentation in comments from which a documentation catalog can be generated, e.g. through javadoc, doxygen or JSDuck. I also create a job that takes care of doing that.
  • Unit tests. If my project has a unit test suite, I also create a job executing those, since running the test suite requires most of the development dependencies to be deployed as well.
  • System integration tests. If I have any system integration tests that can be run on Linux, I try to implement a job using the NixOS test driver to run those. The NixOS test driver also automatically deploys all the required environmental dependencies in a VM, such as a DBMS, web server etc., in such a way that you can run them as an unprivileged user without affecting the host system's configuration.
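
The overall shape of such a release.nix expression is sketched below. This is a schematic example only: the project name is hypothetical, the buildInputs depend on the project, and it assumes the releaseTools helpers from the Nix packages collection:

{ nixpkgs ? <nixpkgs> }:

let
  pkgs = import nixpkgs {};
in
rec {
  # Source package job: bootstraps the project and runs 'make dist'
  tarball = pkgs.releaseTools.sourceTarball {
    name = "myproject-tarball";  # hypothetical project name
    version = "0.1";
    src = ./.;
    buildInputs = [ pkgs.pkgconfig pkgs.libxml2 ];
  };

  # Binary package job: unpacks the source tarball and does a regular build
  build = pkgs.releaseTools.nixBuild {
    name = "myproject";
    src = tarball;
    buildInputs = [ pkgs.pkgconfig pkgs.libxml2 ];
  };
}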

After writing a release Nix expression, I can use the Nix package manager to perform all the deployment tasks for my software project. For example, I can run the following job that comes with Disnix to build a source tarball:

$ nix-build release.nix -A tarball

Another advantage is that I can use the same expression in combination with Hydra, so that I can continuously build and test the project.

Spawning development environments from Nix jobs


Apart from being able to deploy a software project and its dependencies, another nice advantage of having its deployment automated through Nix is that I can use the same Nix jobs to reproduce its build environment. By replacing nix-build with nix-shell, all the dependencies are deployed, but instead of building the package itself, a shell session is started in which all the dependencies are configured:

$ nix-shell release.nix -A tarball

The above command-line invocation produces a similar build environment to the one shown earlier with myEnvFun, by simply reusing the Nix expression of an existing job.

Discussion


So far I have explained what I typically do with the Nix package manager during development. So why is this actually useful?

  • Besides deploying systems into production environments, setting up a development environment is a similar problem with similar complexities. For me, it is logical to solve these kinds of problems the same way as I do in production environments.
  • Packaging all software project dependencies in Nix ensures reliable and reproducible deployment in production environments as well, since Nix ensures dependency completeness and removes as many side effects as possible. With Nix, we have strong guarantees that a system deployed through Nix in production behaves exactly the same in development and vice versa.
  • We can also share development environments, which is useful for a variety of reasons. For example, we can reproduce the same development environment on a second machine, or give it to another developer to prevent them from spending a lot of time setting up their development environment.
  • Supporting continuous integration and testing with Hydra comes (almost) for free.
  • With relatively little effort you can also enable distributed deployment of your software project with Disnix and/or NixOps.

Although I have encountered people telling me that it consumes too much time, you also get a lot in return. Moreover, in most common scenarios, I don't think the effort of automating the deployment of a development environment is that big once you have some experience with it.

As a concluding remark, I know that things like Agile software development and continuous delivery of new versions are something that (nearly) everybody wants these days. To implement continuous delivery/deployment, the first step is to automate the deployment of a software project from the very beginning. That is exactly what I'm doing by executing the steps described in this blog post.

People may also want to read an old blog post that contains recommendations to improve software deployment processes.

Thursday, December 12, 2013

Asynchronous programming with JavaScript (part 2)

A while ago, I wrote a blog post about my asynchronous programming experiences/frustrations with JavaScript. The main message of that blog post is that if we want to run stuff concurrently, we have to implement cooperative multitasking by generating events (such as timeouts or ticks) and then stopping execution, so that the events get processed and invoke callbacks that resume execution.
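
For example, the following minimal sketch shows this style of cooperative multitasking: each step generates a tick event and stops, and the registered callback resumes the computation once the event has been processed:

function countDown(n) {
    if(n == 0) {
        console.log("done");
    } else {
        console.log(n);
        /* generate a tick event and stop; the callback
           resumes the computation later */
        process.nextTick(function() {
            countDown(n - 1);
        });
    }
}

countDown(3);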

As a consequence of using callbacks, it has become much harder to structure code properly. Without proper software abstractions, we may end up writing pyramid code that is hard to write, read, adapt and maintain. One of the solutions to alleviate such problems is to use software abstractions implemented by libraries such as async.

Recently, I've been experimenting with several JavaScript libraries/frameworks (such as AngularJS, Selenium JavaScript webdriver, and jQuery) and I went to a Titanium SDK (a cross-platform JavaScript framework for developing apps) developer meetup. I have noticed that yet another software abstraction has become quite popular in these worlds.

Promises


The software abstraction I am referring to is called "promises". To get some understanding of what they are supposed to mean, I will cite the Promises/A proposal, which is generally regarded as its official specification:

A promise represents the eventual value returned from the single completion of an operation.

To me this description looks quite abstract. Fortunately, the proposal also outlines a few practical bits. It turns out that promises are supposed to be encoded as objects having the following characteristics:

  • A promise can be in one of the following three states: unfulfilled, fulfilled, and failed. A promise in an unfulfilled state can either transition to a fulfilled state or a failed state. The opposite transitions are not possible -- once a promise is fulfilled or failed, it should remain in that state.
  • A promise object has a then member referring to a function with the following signature:
    .then(onFulfilled, onError, onProgress);
    
  • The then function parameters are all optional and refer to functions with the following purposes:

    • onFulfilled: A function that gets invoked once the promise reaches its fulfilled state.
    • onError: A function that gets invoked once the promise reaches its failed state.
    • onProgress: A function that may get invoked while the promise is still unfulfilled. Promises are actually not required to support this function at all.

    If any of these parameters is not a function, it is ignored.
  • Each then function is supposed to return another promise (which also has a then property), allowing users to "chain" multiple then function invocations together. This is how promises allow someone to structure code more properly and prevent people from writing pyramid code, because chaining can be done at the same indentation level.

Semantic issues


Although some of the promise characteristics listed above may look useful, it has turned out that the Promises/A proposal has a number of weaknesses as well.

First of all, it is just a proposal containing some random notes and not an official specification. Furthermore, it has been written in an informal way and it is incomplete and underspecified. For example, it leaves a few semantic issues open, such as: what should happen if a promise enters a failed state when a particular then function invocation in a chain does not implement an onError callback? As a consequence, this has led to several implementations having different and incompatible behaviour.

To address some of these issues, a second specification has been developed, called the Promises/A+ specification. This specification is not intended to completely replace the Promises/A proposal; instead it concentrates mostly on the semantics of the then function (not including the onProgress parameter). Moreover, this specification is written in a much more formal and concise way, in my opinion.

For example, the Promises/A+ specification adds the following details:

  • It specifies the ability to invoke a promise's then function multiple times.
  • Propagation of results and errors in a then function invocation chain, if any of their callback function parameters are not set.

When using Promises/A+ compliant promise objects, error propagation works in a similar way to a synchronous try { } catch { } block, allowing one to execute a group of instructions and use a separate block at the end to catch any exceptions that may be thrown while executing them.
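
To illustrate this analogy, consider the following runnable sketch (using e.g. the RSVP library that is discussed later in this post). The second step rejects, so the subsequent fulfillment callback is skipped and control ends up in the final error callback, just like an exception propagating to a catch block:

var Promise = require('rsvp').Promise;

function step1() {
    return new Promise(function(resolve, reject) {
        resolve(1);
    });
}

function step2(value) {
    return new Promise(function(resolve, reject) {
        reject("something went wrong in step 2");
    });
}

step1()
.then(function(result) {
    return step2(result);
})
.then(function(result) {
    console.log("never reached: " + result);
})
.then(undefined, function(err) {
    /* comparable to a catch block: a failure in any
       earlier step propagates to this callback */
    console.log("An error occurred: " + err);
});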

An example: Using the Selenium webdriver


To demonstrate how promises can be used, I have developed an example case with the JavaScript Selenium webdriver, a web browser automation framework used frequently for testing web applications. This package uses promises to encode query operations.

Our test case works on a web page containing a form that allows someone to enter their first and last name. By default, the input fields contain my first and last name.


The corresponding HTML code looks as follows:

<!DOCTYPE html>

<html>
    <head><title>Selenium test</title></head>
    
    <body>
        <h1>Selenium test</h1>
        
        <p>Please enter your names:</p>
        
        <form action="test.html" method="post">
            <p>
                <label>First name</label>
                <input type="text" name="firstname" value="Sander">
            </p>
            <p>
                <label>Last name</label>
                <input type="text" name="lastname" value="van der Burg">
            </p>
            <button type="submit" id="submit">Submit</button>
        </form>
    </body>
</html>

To automatically retrieve the values of both input fields and to click on the submit button, I have written the following JavaScript code:

var webdriver = require('selenium-webdriver');
var remote = require('selenium-webdriver/remote');
var webdrvr = require('webdrvr');

/* Create Selenium server */
var server = new remote.SeleniumServer(webdrvr.selenium.path, {
    args: webdrvr.args
});

/* Start the server */
server.start();

/* Set up a Chrome driver */
var driver = new webdriver.Builder()
    .usingServer(server.address())
    .withCapabilities(webdriver.Capabilities.chrome())
    .build();

/* Execute some tests */

driver.get('file://'+process.cwd()+'/test.html')
.then(function() {
    return driver.findElement(webdriver.By.name('firstname'));
})
.then(function(firstNameInput) {
    return firstNameInput.getAttribute('value');
})
.then(function(firstName) {
    console.log("First name: "+firstName);
    return driver.findElement(webdriver.By.name('lastname'));
})
.then(function(lastNameInput) {
    return lastNameInput.getAttribute('value');
})
.then(function(lastName) {
    console.log("Last name: "+lastName);
    return driver.findElement(webdriver.By.id('submit'));
})
.then(function(submitButton) {
    return submitButton.click();
}, function(err) {
    console.log("Some error occured: "+err);
});

/* Stop the server */
server.stop();

As may be observed from the code fragment above, besides setting up a test driver that uses Google Chrome, all the test operations are encoded as a chain of then function invocations, each returning a promise:
  • First, we instruct the test driver to get our example HTML page containing the form by creating a promise.
  • After the page has been opened, we create and return a promise that queries the HTML input element that is used for filling in a first name.
  • Then we retrieve the value of the first name input field, by creating and returning a promise that fetches the value attribute of the HTML input element.
  • We repeat the previous two steps in a similar way to fetch the last name as well.
  • Finally, we create and return promises that query the submit button and click on it.
  • In the last then function invocation, we define an onError callback that catches any errors that might occur when executing any of the previous steps.

As can be observed, we do not have to write any nested callback blocks that result in horrible pyramid code.

Implementing promises


The Selenium example shows us how asynchronous test code can be better structured using promises. The next thing I was thinking about is: how can I use promises to fix structuring issues in my own code?

It turns out that there are many ways to do this. In fact, the Promises/A and Promises/A+ specifications are merely interface specifications and leave it up to developers to properly implement them. For example, it can be done completely manually, but there are also several libraries available providing abstractions for that, such as Q, RSVP, and when.
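
For illustration, a heavily simplified hand-rolled promise might look as follows. This is a sketch only and deliberately not Promises/A+ compliant: among other things, a proper then implementation must return a new promise rather than the same object, and callbacks registered after resolution should still fire:

function SimplePromise() {
    this.state = "unfulfilled";
    this.callbacks = [];
}

SimplePromise.prototype.then = function(onFulfilled, onError) {
    this.callbacks.push({ onFulfilled: onFulfilled, onError: onError });
    return this; /* a compliant implementation returns a new promise */
};

SimplePromise.prototype.resolve = function(value) {
    if(this.state != "unfulfilled") return; /* state transitions are one-way */
    this.state = "fulfilled";
    this.callbacks.forEach(function(callback) {
        if(typeof callback.onFulfilled == "function")
            callback.onFulfilled(value);
    });
};

SimplePromise.prototype.reject = function(err) {
    if(this.state != "unfulfilled") return;
    this.state = "failed";
    this.callbacks.forEach(function(callback) {
        if(typeof callback.onError == "function")
            callback.onError(err);
    });
};

/* usage */
var p = new SimplePromise();
p.then(function(value) { console.log("fulfilled: " + value); });
p.resolve("hello");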

As an experiment, I have restructured the following pyramid example code fragment from my previous blog post (which I originally restructured with async) into a better structured code fragment with promises:

var fs = require('fs');
var path = require('path');

fs.mkdir("out", 0755, function(err) {
    if(err) throw err;
    
    fs.mkdir(path.join("out, "test"), 0755, function(err) {
        if (err) throw err;        
        var filename = path.join("out", "test", "hello.txt");

        fs.writeFile(filename, "Hello world!", function(err) {
            if(err) throw err;
                    
            fs.readFile(filename, function(err, data) {
                if(err) throw err;
                
                if(data == "Hello world!")
                    process.stderr.write("File is correct!\n");
                else
                    process.stderr.write("File is incorrect!\n");
            });
        });
    });
});

I have picked the RSVP library to implement promises, since it claims to be lightweight and fully Promises/A+ compliant. The restructured code looks as follows:

var fs = require('fs');
var path = require('path');
var Promise = require('rsvp').Promise;

/* Promise object definitions */

var mkdir = function(dirname) {
    return new Promise(function(resolve, reject) {
        fs.mkdir(dirname, 0755, function(err) {
            if(err) reject(err);
            else resolve();
        });
    });
};

var writeHelloTxt = function(filename) {
    return new Promise(function(resolve, reject) {
        fs.writeFile(filename, "Hello world!", function(err) {
            if(err) reject(err);
            else resolve();
        });
    });
};

var readHelloTxt = function(filename) {
    return new Promise(function(resolve, reject) {
        fs.readFile(filename, function(err, data) {
            if(err) reject(err);
            else resolve(data);
        });
    });
};

/* Promise execution chain */

var filename = path.join("out", "test", "hello.txt");

mkdir(path.join("out"))
.then(function() {
    return mkdir(path.join("out", "test"));
})
.then(function() {
    return writeHelloTxt(filename);
})
.then(function() {
    return readHelloTxt(filename);
})
.then(function(data) {
    if(data == "Hello world!")
        process.stderr.write("File is correct!\n");
    else
        process.stderr.write("File is incorrect!\n");
}, function(err) {
    console.log("An error occured: "+err);
});

The revised code fragment above is structured as follows:

  • The overall structure consists of two parts: first we define a number of promises, and then we execute them by composing a then function invocation chain.
  • I have encapsulated most of the operations from the previous code fragment into reusable promises: one makes a directory, one writes a text file, and one reads the text file.
  • A promise is created by instantiating the rsvp.Promise prototype. The constructor always takes a function as a parameter, which in turn takes resolve and reject as parameters. resolve is a callback that can be used to notify a caller that the promise has reached its fulfilled state. The reject parameter notifies a caller that it has reached its failed state. Both of these functions can take a value as a parameter, which is passed to a then function invocation.
  • To be able to parametrize these promises, I have encapsulated them in functions.
  • In the remainder of the code, I instantiate the promises and execute them in a then function invocation chain. As can be observed, we have no pyramid code.
  • However, I observed that our refactored code has a big drawback as well -- it has grown significantly in size, which can be considered a deterioration compared to the pyramid code fragment.

Discussion


So far, I have given an explanation of what promises are supposed to mean, how they can be used and how they can be created. We have seen that code structuring issues are "solved" by the fact that then function invocations can be chained.

In my previous blog post, I explored the async library as a means to solve the same kind of issues, which makes me wonder which approach is best. To me, the approach of chaining then function invocations looks quite similar to async's waterfall pattern. However, I find the promise approach slightly better, because it uses return values instead of callbacks, preventing us from accidentally reaching a limbo state if we forget to call them.

However, I think async's waterfall pattern has a lower usage barrier, because we do not have to capture all the steps we want to perform in promise objects, resulting in less code overhead.
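
For comparison, a sketch of the same mkdir/write/read sequence expressed with async's waterfall pattern (assuming the async library from my previous blog post) could look like this, without the promise boilerplate:

var fs = require('fs');
var path = require('path');
var async = require('async');

var filename = path.join("out", "test", "hello.txt");

async.waterfall([
    function(callback) {
        fs.mkdir("out", 0755, callback);
    },
    function(callback) {
        fs.mkdir(path.join("out", "test"), 0755, callback);
    },
    function(callback) {
        fs.writeFile(filename, "Hello world!", callback);
    },
    function(callback) {
        fs.readFile(filename, callback);
    }
], function(err, data) {
    if(err)
        console.log("An error occurred: " + err);
    else if(data == "Hello world!")
        process.stderr.write("File is correct!\n");
    else
        process.stderr.write("File is incorrect!\n");
});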

I think that using promises to structure code works best for things that can be encoded as reusable objects, such as systems implementing the Model-View-Controller (MVC) pattern that, for example, pass reusable objects from the model to the controller and view. In situations in which we just have to execute a collection of ad-hoc operations (such as those in my example), it simply produces a lot of extra overhead.

Finally, async's waterfall pattern, as well as chaining then function invocations, works well for things that consist of a fixed number of steps that have to be executed sequentially. However, it may also be desirable to execute steps in parallel, or to use asynchronous for and while loops. The promises specifications do not provide any solutions to cope with these kinds of code structuring issues.

Some other thoughts


After doing these experiments and writing this blog post, I still have some remaining thoughts:

  • I have noticed that promises are considered "the next great paradigm" by some people. This actually makes me wonder what the "previous great paradigm" was. Moreover, I think promises are only useful in certain cases, but definitely not for all of JavaScript's asynchronous programming problems.
  • Another interesting observation I made is not related to JavaScript or asynchronous programming in general. I have observed that writing specifications, whose purpose is to specify what kind of requirements a solution should fulfill, is a skill that quite a few people doing software engineering lack. It's not really surprising to me that the Promises/A proposal leads to so many issues. Fortunately, the Promises/A+ authors demonstrate how to properly write a specification (plus they are aware of the fact that specifications always have limitations and issues).
  • I'm still surprised by how many solutions are being developed for a problem that is non-existent in other programming environments. Actually, I consider these software abstractions dealing with asynchronous programming "workarounds", not solutions. Because none of them is ideal, there are many of them (the Joyent Node wiki currently lists 79 libraries, and I know there are more), and I expect many more to come.
  • Personally, I consider the best solution to be a runtime environment and/or programming language that has facilities to cope with concurrency. It is really surprising to me that nobody is working on getting that done. UPDATE: There is a bit of light at the end of the tunnel, according to Zef's blog post titled "Callback-Free Harmonious Node.js" that I just read.
UPDATE: I have written a follow-up blog post in which I take a look at asynchronous programming in a more fundamental way. I also derived my own abstraction library from my observations.