Monday, June 20, 2016

Using Disnix as a remote package deployer

Recently, I was asked whether it is possible to use Disnix as a tool for remote package deployment.

As described in a number of earlier blog posts, Disnix's primary purpose is not remote (or distributed) package management, but deploying systems that can be decomposed into services to networks of machines. To deploy these kinds of systems, Disnix executes all required deployment activities, including building services from source code, distributing them to target machines in the network and activating or deactivating them.

However, a service deployment process is basically a superset of an "ordinary" package deployment process. In this blog post, I will describe how we can do remote package deployment by instructing Disnix to only use a relevant subset of features.

Specifying packages as services


In the Nix packages collection, it is a common habit to write each package specification as a function in which the parameters denote the (local) build and runtime dependencies (something that Disnix's manual refers to as intra-dependencies) that the package needs. The remainder of the function describes how to build the package from source code and its provided dependencies.

Disnix has adopted this habit and extended this convention to services. The main difference between Nix package expressions and Disnix service expressions is that the latter also take inter-dependencies into account that refer to run-time dependencies on services that may have been deployed to other machines in the network. For services that have no inter-dependencies, a Disnix expression is identical to an ordinary package expression.

This means that, for example, an expression for a package such as the Midnight Commander is also a valid Disnix service with no inter-dependencies:

{ stdenv, fetchurl, pkgconfig, glib, gpm, file, e2fsprogs
, libX11, libICE, perl, zip, unzip, gettext, slang
}:

stdenv.mkDerivation {
  name = "mc-4.8.12";
  
  src = fetchurl {
    url = http://www.midnight-commander.org/downloads/mc-4.8.12.tar.bz2;
    sha256 = "15lkwcis0labshq9k8c2fqdwv8az2c87qpdqwp5p31s8gb1gqm0h";
  };
  
  buildInputs = [ pkgconfig perl glib gpm slang zip unzip file gettext
      libX11 libICE e2fsprogs ];

  meta = {
    description = "File Manager and User Shell for the GNU Project";
    homepage = http://www.midnight-commander.org;
    license = "GPLv2+";
    maintainers = [ stdenv.lib.maintainers.sander ];
  };
}

Composing packages locally


Package and service expressions are functions that do not specify the versions or variants of the dependencies that should be used. To allow services to be deployed, we must compose them by providing the desired versions or variants of the dependencies as function parameters.

As with ordinary Nix packages, Disnix has also adopted this convention for services. In addition, we have to compose a Disnix service twice -- first its intra-dependencies and later its inter-dependencies.

Intra-dependency composition in Disnix is done in a similar way as in the Nix packages collection:

{pkgs, system}:

let
  callPackage = pkgs.lib.callPackageWith (pkgs // self);

  self = {
    pkgconfig = callPackage ./pkgs/pkgconfig { };
  
    gpm = callPackage ./pkgs/gpm { };
  
    mc = callPackage ./pkgs/mc { };
  };
in
self

The above expression (custom-packages.nix) composes the Midnight Commander package by providing its intra-dependencies as function parameters. The third attribute (mc) invokes a function named: callPackage { } that imports the previous package expression and automatically provides arguments whose names match the parameters of that function.

The callPackage { } function first consults the self attribute set (which also composes some of the Midnight Commander's dependencies, such as gpm and pkgconfig) and then falls back to the packages in the Nixpkgs repository.
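
To illustrate what this means for the mc attribute: the callPackage invocation behaves roughly as if we had written the following by hand (a sketch; the real implementation inspects the argument names of the imported function):

mc = import ./pkgs/mc {
  inherit (self) pkgconfig gpm;               # taken from this composition expression
  inherit (pkgs) stdenv fetchurl glib file    # everything else comes from Nixpkgs
    e2fsprogs libX11 libICE perl zip unzip
    gettext slang;
};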

Writing a minimal services model


Previously, we have shown how to build packages from source code and their dependencies, and how to compose packages locally. For the deployment of services, more information is needed. For example, we need to compose their inter-dependencies so that services know how to reach them.

Furthermore, Disnix's end objective is to get a running service-oriented system, so it carries out extra deployment activities for services to accomplish this, such as activation and deactivation. The latter two steps are executed by a Dysnomia plugin that is selected by annotating a service with a type attribute.

For package deployment, specifying these extra attributes and executing these remaining activities are in principle not required. Nonetheless, we still need to provide a minimal services model so that Disnix knows which units can be deployed.

Exposing the Midnight Commander package as a service can be done as follows:

{pkgs, system, distribution, invDistribution}:

let
  customPkgs = import ./custom-packages.nix {
    inherit pkgs system;
  };
in
{
  mc = {
    name = "mc";
    pkg = customPkgs.mc;
    type = "package";
  };
}

In the above expression, we import our intra-dependency composition expression (custom-packages.nix), and we use the pkg sub attribute to refer to the intra-dependency composition of the Midnight Commander. We annotate the Midnight Commander service with a package type to instruct Disnix that no additional deployment steps need to be performed beyond the installation of the package, such as activation or deactivation.

Since the above pattern is common to all packages, we can also automatically generate services for any package in the composition expression:

{pkgs, system, distribution, invDistribution}:

let
  customPkgs = import ./custom-packages.nix {
    inherit pkgs system;
  };
in
pkgs.lib.mapAttrs (name: pkg: {
  inherit name pkg;
  type = "package";
}) customPkgs

The above services model exposes all packages in our composition expression as services.
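
For the composition expression shown earlier, the generated services model is roughly equivalent to writing the following attribute set by hand (a sketch):

{
  pkgconfig = {
    name = "pkgconfig";
    pkg = customPkgs.pkgconfig;
    type = "package";
  };

  gpm = {
    name = "gpm";
    pkg = customPkgs.gpm;
    type = "package";
  };

  mc = {
    name = "mc";
    pkg = customPkgs.mc;
    type = "package";
  };
}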

Configuring the remote machine's search paths


With the services models shown in the previous section, we have all the ingredients available to deploy packages with Disnix. To allow users on the remote machines to conveniently access their packages, we must add Disnix's Nix profile to their PATH on the remote machines:

$ export PATH=/nix/var/nix/profiles/disnix/default/bin:$PATH

When using NixOS, this variable can be extended by adding the following line to /etc/nixos/configuration.nix:

environment.variables.PATH = [ "/nix/var/nix/profiles/disnix/default/bin" ];
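
In context, the relevant fragment of such a configuration.nix may look roughly as follows (a sketch; the surrounding options stand for whatever the machine's configuration already contains):

{config, pkgs, ...}:

{
  # ... the machine's existing configuration options ...

  # Expose the packages in Disnix's default profile to every user's PATH
  environment.variables.PATH = [ "/nix/var/nix/profiles/disnix/default/bin" ];
}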

Deploying packages with Disnix


In addition to a services model, Disnix needs an infrastructure and distribution model to deploy packages. For example, we can define an infrastructure model that may look as follows:

{
  test1.properties.hostname = "test1";
  test2 = {
    properties.hostname = "test2";
    system = "x86_64-darwin";
  };
}

The above infrastructure model describes two machines with hostnames test1 and test2. Furthermore, machine test2 has a specific system architecture: x86_64-darwin, which corresponds to a 64-bit Intel-based Mac OS X machine.

We can distribute packages to these two machines with the following distribution model:

{infrastructure}:

{
  gpm = [ infrastructure.test1 ];
  pkgconfig = [ infrastructure.test2 ];
  mc = [ infrastructure.test1 infrastructure.test2 ];
}

In the above distribution model, we distribute package gpm to machine test1, pkgconfig to machine test2 and mc to both machines.

When running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix executes all activities to get the packages in the distribution model deployed to the machines, such as building them from source code (including their dependencies) and distributing their dependency closures to the target machines.

Because machine test2 may have a different system architecture than the coordinator machine responsible for carrying out the deployment, Disnix can use Nix's delegation mechanism to forward a build to a machine that is capable of doing it.
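
Delegation relies on Nix's ordinary distributed builds facility and is configured at the Nix level rather than in Disnix itself. On a NixOS-based coordinator machine, this roughly boils down to the following settings (a sketch; the host name, SSH user, key location and job limit are assumptions):

nix.distributedBuilds = true;
nix.buildMachines = [
  { hostName = "macosx-builder";   # hypothetical machine capable of x86_64-darwin builds
    system = "x86_64-darwin";
    sshUser = "nix";
    sshKey = "/root/.ssh/id_buildfarm";
    maxJobs = 2;
  }
];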

Alternatively, packages can also be built on the target machines through Disnix:

$ disnix-env --build-on-targets \
  -s services.nix -i infrastructure.nix -d distribution.nix

After the above command-line instructions have succeeded, we should be able to start the Midnight Commander on any of the target machines by running:

$ mc

Deploying any package from the Nixpkgs repository


Besides deploying a custom set of packages, it is also possible to use Disnix to remotely deploy any package in the Nixpkgs repository, but doing so is a bit tricky.

The main challenge lies in the fact that the Nix packages set is a nested set of attributes, whereas Disnix expects services to be addressed in one attribute set only. Fortunately, the Nix expression language and Disnix models are flexible enough to implement a solution. For example, we can define a distribution model as follows:

{infrastructure}:

{
  mc = [ infrastructure.test1 ];
  git = [ infrastructure.test1 ];
  wget = [ infrastructure.test1 ];
  "xlibs.libX11" = [ infrastructure.test1 ];
}

Note that we use a dotted notation: xlibs.libX11 as an attribute name to refer to libX11, which can only be referenced as a sub attribute in Nixpkgs.

We can write a services model that uses the attribute names in the distribution model to refer to the corresponding package in Nixpkgs:

{pkgs, system, distribution, invDistribution}:

pkgs.lib.mapAttrs (name: targets:
  let
    attrPath = pkgs.lib.splitString "." name;
  in
  { inherit name;
    pkg = pkgs.lib.attrByPath attrPath
      (throw "package: ${name} cannot be referenced in the package set")
      pkgs;
    type = "package";
  }
) distribution

With the above services model we can deploy any Nix package to any remote machine with Disnix.
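
To illustrate what happens for the dotted attribute name, the following fragment (a sketch that assumes pkgs is in scope, as in the services model above) shows the two library functions at work:

let
  attrPath = pkgs.lib.splitString "." "xlibs.libX11";
  # attrPath evaluates to: [ "xlibs" "libX11" ]
in
pkgs.lib.attrByPath attrPath (throw "not found") pkgs
# ...which selects the nested attribute: pkgs.xlibs.libX11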

Multi-user package management


Besides single-user installations, Nix also supports multi-user installations in which every user has a private Nix profile with its own set of packages. With Disnix we can also manage multiple profiles. For example, by adding the --profile parameter, we can deploy another Nix profile that contains a set of packages for the user: sander:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix \
  --profile sander

The user: sander can access his own set of packages by setting the PATH environment variable as follows:

$ export PATH=/nix/var/nix/profiles/disnix/sander/bin:$PATH

Conclusion


Although Disnix has not been strictly designed for this purpose, I have described in this blog post how Disnix can be used as a remote package deployer by using a relevant subset of Disnix features.

Moreover, I now consider the underlying Disnix primitives to be mature enough. As such, I am announcing the release of Disnix 0.6!

Acknowledgements


I gained the inspiration for writing this blog post from discussions with Matthias Beyer on the #nixos IRC channel.

Saturday, June 11, 2016

Deploying containers with Disnix as primitives for multi-layered service deployments

As explained in an earlier blog post, Disnix is a service deployment tool that can only be used after a collection of machines has been predeployed, providing a number of container services, such as a service manager (e.g. systemd), a DBMS (e.g. MySQL) or an application server (e.g. Apache Tomcat).

To deploy these machines, we need an external solution. Some solutions are:

  • Manual installations, requiring somebody to obtain a few machines, manually install operating systems (e.g. a Linux distribution), and finally install all required software packages, such as Nix, Dysnomia, Disnix and any additional container services. Manually configuring a machine is typically tedious, time consuming and error prone.
  • NixOps. NixOps is capable of automatically instantiating networks of virtual machines in the cloud (such as Amazon EC2) and deploying entire NixOS system configurations to them. These NixOS configurations can be used to automatically deploy Dysnomia, Disnix and any container service that we need. A drawback is that NixOps is NixOS-based and not really useful if you want to deploy services to machines running different kinds of operating systems.
  • disnixos-deploy-network. In a Disnix-context, services are basically undefined units of deployment, and we can also automatically deploy entire NixOS configurations to target machines as services. A major drawback of this approach is that we require predeployed machines running Disnix first.

Although there are several ways to manage the underlying infrastructure of services, they are basically all or nothing solutions with regards to automation -- we either have to manually deploy entire machine configurations ourselves or we are stuck with a NixOS-based solution that completely automates it.

In some scenarios (e.g. when it is desired to deploy services to non-Linux operating systems), the initial deployment phase becomes quite tedious. For example, it took me quite a bit of effort to set up the heterogeneous network deployment demo I have given at NixCon2015.

In this blog post, I will describe an approach that serves as an in-between solution -- since services in a Disnix-context can be (almost) any kind of deployment unit, we can also use Disnix to deploy container configurations as services. These container services can also be deployed to non-NixOS systems, which means that we can alleviate the effort of setting up the initial configuration of the target systems to which Disnix deploys services.

Deploying containers as services with Disnix


As with services, containers in a Disnix-context could take any form. For example, in addition to MySQL databases (that we can deploy as services with Disnix), we can also deploy the corresponding container: the MySQL DBMS server, as a Disnix service:

{ stdenv, mysql, dysnomia
, name ? "mysql-database"
, mysqlUsername ? "root", mysqlPassword ? "secret"
, user ? "mysql-database", group ? "mysql-database"
, dataDir ? "/var/db/mysql", pidDir ? "/run/mysqld"
}:

stdenv.mkDerivation {
  inherit name;
  
  buildCommand = ''
    mkdir -p $out/bin
      
    # Create wrapper script
    cat > $out/bin/wrapper <<EOF
    #! ${stdenv.shell} -e
      
    case "\$1" in
        activate)
            # Create group, user and the initial database if it does not exist
            # ...

            # Run the MySQL server
            ${mysql}/bin/mysqld_safe --user=${user} --datadir=${dataDir} --basedir=${mysql} --pid-file=${pidDir}/mysqld.pid &
            
            # Change root password
            # ...
            ;;
        deactivate)
            ${mysql}/bin/mysqladmin -u ${mysqlUsername} --password="${mysqlPassword}" shutdown
            
            # Delete the user and group
            # ...
            ;;
    esac
    EOF
    
    chmod +x $out/bin/wrapper

    # Add Dysnomia container configuration file for the MySQL DBMS
    mkdir -p $out/etc/dysnomia/containers

    cat > $out/etc/dysnomia/containers/${name} <<EOF
    mysqlUsername="${mysqlUsername}"
    mysqlPassword="${mysqlPassword}"
    EOF
    
    # Copy the Dysnomia module that manages MySQL databases
    mkdir -p $out/etc/dysnomia/modules
    cp ${dysnomia}/libexec/dysnomia/mysql-database $out/etc/dysnomia/modules
  '';
}

The above code fragment is a simplified Disnix expression that can be used to deploy a MySQL server. The expression produces a wrapper script, which carries out a set of deployment activities invoked by Disnix:

  • On activation, the wrapper script starts the MySQL server by spawning the mysqld_safe daemon process in background mode. Before starting the daemon, it also initializes some of the server's state, such as creating the user account under which the daemon runs and setting up the system database if it does not exist (these steps are left out of the example for simplicity).
  • On deactivation it shuts down the MySQL server and removes some of the attached state, such as the user accounts.

Besides composing a wrapper script, we must allow Dysnomia (and Disnix) to deploy databases as Disnix services to the MySQL server that we have just deployed:

  • We generate a Dysnomia container configuration file with the MySQL server settings to allow a database (that gets deployed as a service) to know what credentials it should use to connect to the server.
  • We bundle a Dysnomia plugin module that implements the deployment activities for MySQL databases, such as activation and deactivation. Because Dysnomia offers this plugin as part of its software distribution, we make a copy of it, but we could also compose our own plugin from scratch.

With the earlier shown Disnix expression, we can define the MySQL server as a service in a Disnix services model:

mysql-database = {
  name = "mysql-database";
  pkg = customPkgs.mysql-database;
  dependsOn = {};
  type = "wrapper";
};

and distribute it to a target machine in the network by adding an entry to the distribution model:

mysql-database = [ infrastructure.test2 ];

Configuring Disnix and Dysnomia


Once we have deployed containers as Disnix services, Disnix (and Dysnomia) must know about their availability so that we can deploy services to these recently deployed containers.

Each time Disnix has successfully deployed a configuration, it generates Nix profiles on the target machines in which the contents of all services can be accessed from a single location. This means that we can simply extend Dysnomia's module and container search paths:

export DYSNOMIA_MODULES_PATH=$DYSNOMIA_MODULES_PATH:/nix/var/nix/profiles/disnix/containers/etc/dysnomia/modules
export DYSNOMIA_CONTAINERS_PATH=$DYSNOMIA_CONTAINERS_PATH:/nix/var/nix/profiles/disnix/containers/etc/dysnomia/containers

with the paths to the Disnix profiles that have containers deployed.
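
After extending the search paths on a target machine (for example, on machine test2 in the scenario shown below), the containers deployed by Disnix should show up next to the preconfigured ones when querying Dysnomia. The expected output is roughly the following (a sketch):

$ dysnomia-containers --query-containers
mysql-database
process
tomcat-webapplication
wrapper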

A simple example scenario


I have modified the Java variant of the ridiculous Disnix StaffTracker example to support a deployment scenario with containers as Disnix services.

First, we need to start with a collection of machines having a very basic configuration without any additional containers. The StaffTracker package contains a bare network configuration that we can deploy with NixOps, as follows:

$ nixops create ./network-bare.nix ./network-virtualbox.nix -d vbox
$ nixops deploy -d vbox

By configuring the following environment variables, we can connect Disnix to the machines in the network that we have just deployed with NixOps:

$ export NIXOPS_DEPLOYMENT=vbox
$ export DISNIX_CLIENT_INTERFACE=disnix-nixops-client

We can write a very simple bootstrap infrastructure model (infrastructure-bootstrap.nix), to dynamically capture the configuration of the target machines:

{
  test1.properties.hostname = "test1";
  test2.properties.hostname = "test2";
}

Running the following command:

$ disnix-capture-infra infrastructure-bootstrap.nix > infrastructure-bare.nix

yields an infrastructure model (infrastructure-bare.nix) that may have the following structure:

{
  "test1" = {
    properties = {
      "hostname" = "test1";
      "system" = "x86_64-linux";
    };
    containers = {
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
  "test2" = {
    properties = {
      "hostname" = "test2";
      "system" = "x86_64-linux";
    };
    containers = {
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed in the captured infrastructure model shown above, we have a very minimal configuration only hosting the process and wrapper containers, which integrate with the host system's service manager, such as systemd.

We can deploy a Disnix configuration having Apache Tomcat and the MySQL DBMS as services, by running:

$ disnix-env -s services-containers.nix \
  -i infrastructure-bare.nix \
  -d distribution-containers.nix \
  --profile containers

Note that we have provided an extra parameter to Disnix: --profile to isolate the containers from the default deployment environment. If the above command succeeds, we have a deployment architecture that looks as follows:


Both machines have Apache Tomcat deployed as a service and machine test2 also runs a MySQL server.

When capturing the target machines' configurations again:

$ disnix-capture-infra infrastructure-bare.nix > infrastructure-containers.nix

we will receive an infrastructure model (infrastructure-containers.nix) that may have the following structure:

{
  "test1" = {
    properties = {
      "hostname" = "test1";
      "system" = "x86_64-linux";
    };
    containers = {
      tomcat-webapplication = {
        "tomcatPort" = "8080";
      };
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
  "test2" = {
    properties = {
      "hostname" = "test2";
      "system" = "x86_64-linux";
    };
    containers = {
      mysql-database = {
        "mysqlUsername" = "root";
        "mysqlPassword" = "secret";
        "mysqlPort" = "3306";
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
      };
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed in the above infrastructure model, both machines provide a tomcat-webapplication container exposing the TCP port number that the Apache Tomcat server has been bound to. Machine test2 exposes the mysql-database container with its connectivity settings.

We can now deploy the StaffTracker system (that consists of multiple MySQL databases and Apache Tomcat web applications) by running:

$ disnix-env -s services.nix \
  -i infrastructure-containers.nix \
  -d distribution.nix \
  --profile services

Note that I use a different --profile parameter to tell Disnix that the StaffTracker components belong to a different environment than the containers. If I were to use --profile containers again, Disnix would undeploy the previously shown containers environment with the MySQL DBMS and Apache Tomcat and deploy the databases and web applications instead, which would lead to a failure.

If the above command succeeds, we have the following deployment architecture:


The result is that we have all the service components of the StaffTracker example deployed to containers that are also deployed by Disnix.

An advanced example scenario: multi-containers


We can go one step further than the example shown in the previous section. In the first example, we deploy no more than one instance of each container to a machine in the network -- this is quite common, as it rarely happens that you want to run two MySQL or Apache Tomcat servers on a single machine. Most Linux distributions (including NixOS) do not support deploying multiple instances of system services out of the box.

However, with a few relatively simple modifications to the Disnix expressions of the MySQL DBMS and Apache Tomcat services, it becomes possible to allow multiple instances to co-exist on the same machine. What we basically have to do is identify the conflicting runtime resources, make them configurable, and change their values in such a way that they no longer conflict.

{ stdenv, mysql, dysnomia
, name ? "mysql-database"
, mysqlUsername ? "root", mysqlPassword ? "secret"
, user ? "mysql-database", group ? "mysql-database"
, dataDir ? "/var/db/mysql", pidDir ? "/run/mysqld"
, port ? 3306
}:

stdenv.mkDerivation {
  inherit name;
  
  buildCommand = ''
    mkdir -p $out/bin
    
    # Create wrapper script
    cat > $out/bin/wrapper <<EOF
    #! ${stdenv.shell} -e
       
    case "\$1" in
        activate)
            # Create group, user and the initial database if it does not exist
            # ...

            # Run the MySQL server
            ${mysql}/bin/mysqld_safe --port=${toString port} --user=${user} --datadir=${dataDir} --basedir=${mysql} --pid-file=${pidDir}/mysqld.pid --socket=${pidDir}/mysqld.sock &
            
            # Change root password
            # ...
            ;;
        deactivate)
            ${mysql}/bin/mysqladmin --socket=${pidDir}/mysqld.sock -u ${mysqlUsername} --password="${mysqlPassword}" shutdown
            
            # Delete the user and group
            # ...
            ;;
    esac
    EOF
    
    chmod +x $out/bin/wrapper
  
    # Add Dysnomia container configuration file for the MySQL DBMS
    mkdir -p $out/etc/dysnomia/containers
    
    cat > $out/etc/dysnomia/containers/${name} <<EOF
    mysqlUsername="${mysqlUsername}"
    mysqlPassword="${mysqlPassword}"
    mysqlPort=${toString port}
    mysqlSocket=${pidDir}/mysqld.sock
    EOF
    
    # Copy the Dysnomia module that manages MySQL databases
    mkdir -p $out/etc/dysnomia/modules
    cp ${dysnomia}/libexec/dysnomia/mysql-database $out/etc/dysnomia/modules
  '';
}

For example, I have revised the MySQL server Disnix expression with additional parameters that change the TCP port the service binds to, the UNIX domain socket that is used by the administration utilities and the filesystem location where the databases are stored. Moreover, these additional configuration properties are also exposed by the Dysnomia container configuration file.
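
In the intra-dependency composition expression, the two MySQL variants can then be instantiated with non-conflicting values for these parameters, along the following lines (a sketch; the relative path to the expression and the data directory values are assumptions):

mysql-production = callPackage ../services/mysql-database {
  name = "mysql-production";
  dataDir = "/var/db/mysql-production";
  pidDir = "/run/mysqld-production";
};

mysql-test = callPackage ../services/mysql-database {
  name = "mysql-test";
  port = 3307;
  dataDir = "/var/db/mysql-test";
  pidDir = "/run/mysqld-test";
};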

These additional parameters make it possible to define multiple variants of container services in the services model:

{distribution, invDistribution, system, pkgs}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit system pkgs;
  };
in
rec {
  mysql-production = {
    name = "mysql-production";
    pkg = customPkgs.mysql-production;
    dependsOn = {};
    type = "wrapper";
  };
  
  mysql-test = {
    name = "mysql-test";
    pkg = customPkgs.mysql-test;
    dependsOn = {};
    type = "wrapper";
  };
  
  tomcat-production = {
    name = "tomcat-production";
    pkg = customPkgs.tomcat-production;
    dependsOn = {};
    type = "wrapper";
  };
  
  tomcat-test = {
    name = "tomcat-test";
    pkg = customPkgs.tomcat-test;
    dependsOn = {};
    type = "wrapper";
  };
}

I can, for example, map the two MySQL DBMS instances and the two Apache Tomcat servers to the same machines in the distribution model:

{infrastructure}:

{
  mysql-production = [ infrastructure.test1 ];
  mysql-test = [ infrastructure.test1 ];
  tomcat-production = [ infrastructure.test2 ];
  tomcat-test = [ infrastructure.test2 ];
}

Deploying the above configuration:

$ disnix-env -s services-multicontainers.nix \
  -i infrastructure-bare.nix \
  -d distribution-multicontainers.nix \
  --profile containers

yields the following deployment architecture:


As can be observed, we have two instances of the same container hosted on the same machine. When capturing the configuration:

$ disnix-capture-infra infrastructure-bare.nix > infrastructure-multicontainers.nix

we will receive a Nix expression that may look as follows:

{
  "test1" = {
    properties = {
      "hostname" = "test1";
      "system" = "x86_64-linux";
    };
    containers = {
      mysql-production = {
        "mysqlUsername" = "root";
        "mysqlPassword" = "secret";
        "mysqlPort" = "3306";
        "mysqlSocket" = "/run/mysqld-production/mysqld.sock";
      };
      mysql-test = {
        "mysqlUsername" = "root";
        "mysqlPassword" = "secret";
        "mysqlPort" = "3307";
        "mysqlSocket" = "/run/mysqld-test/mysqld.sock";
      };
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
  "test2" = {
    properties = {
      "hostname" = "test2";
      "system" = "x86_64-linux";
    };
    containers = {
      tomcat-production = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat-production";
      };
      tomcat-test = {
        "tomcatPort" = "8081";
        "catalinaBaseDir" = "/var/tomcat-test";
      };
      process = {
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}

In the above expression, there are two instances of MySQL and Apache Tomcat deployed to the same machine. These containers have their resources configured in such a way that they do not conflict. For example, the two MySQL instances bind to different TCP ports (3306 and 3307) and different UNIX domain sockets (/run/mysqld-production/mysqld.sock and /run/mysqld-test/mysqld.sock).

After deploying the containers, we can also deploy the StaffTracker components (databases and web applications) to them. As described in my previous blog post, we can use an alternative (and more verbose) notation in the distribution model to directly map services to containers:

{infrastructure}:

{
  GeolocationService = {
    targets = [
      { target = infrastructure.test2; container = "tomcat-test"; }
    ];
  };
  RoomService = {
    targets = [
      { target = infrastructure.test2; container = "tomcat-production"; }
    ];
  };
  StaffService = {
    targets = [
      { target = infrastructure.test2; container = "tomcat-test"; }
    ];
  };
  StaffTracker = {
    targets = [
      { target = infrastructure.test2; container = "tomcat-production"; }
    ];
  };
  ZipcodeService = {
    targets = [
      { target = infrastructure.test2; container = "tomcat-test"; }
    ];
  };
  rooms = {
    targets = [
      { target = infrastructure.test1; container = "mysql-production"; }
    ];
  };
  staff = {
    targets = [
      { target = infrastructure.test1; container = "mysql-test"; }
    ];
  };
  zipcodes = {
    targets = [
      { target = infrastructure.test1; container = "mysql-production"; }
    ];
  };
}

As may be observed in the distribution model above, we deploy databases and web applications to both container instances hosted on the same machine.

We can deploy the services of which the StaffTracker consists as follows:

$ disnix-env -s services.nix \
  -i infrastructure-multicontainers.nix \
  -d distribution-advanced.nix \
  --profile services

and the result is the following deployment architecture:


As may be observed in the picture above, we now have a running StaffTracker system that uses two MySQL and two Apache Tomcat servers on one machine. Isn't it awesome? :-)

Conclusion


In this blog post, I have demonstrated an approach in which we deploy containers as services with Disnix. Containers serve as potential deployment targets for other Disnix services.

Previously, we only had NixOS-based solutions to manage the configuration of containers, which made using Disnix on platforms other than NixOS painful, as the containers had to be deployed manually. The approach described in this blog post serves as an in-between solution.

In theory, the process in which we deploy containers as services first followed by the "actual" services, could be generalized and extended into a layered service deployment model, with a new tool automating the process and declarative specifications capturing the properties of the layers.

However, I have decided not to implement this new model any time soon for practical reasons -- in nearly all of my experiences with service deployment, I have almost never encountered the need to have more than two layers supported. The only exception I can think of is the deployment of Axis2 web services to an Axis2 container -- the Axis2 container is a Java web application that must be deployed to Apache Tomcat first, which in turn requires the presence of the Apache Tomcat server.

Availability


I have integrated the two container deployment examples into the Java variant of the StaffTracker example.

The new concepts described in this blog post are part of the development version of Disnix and will become available in the next release.

Thursday, May 19, 2016

Mapping services to containers with Disnix and a new notational convention

In the last couple of months, I have made a number of major changes to the internals of Disnix. As described in a couple of older blog posts, deployment with Disnix is driven by three models each capturing a specific concern:

  • The services model specifies the available distributable components, how to construct them (from source code, intra-dependencies and inter-dependencies), and their types so that they can be properly activated or deactivated on the target machines.
  • The infrastructure model specifies the available target machines and their relevant deployment properties.
  • The distribution model maps services in the service model to target machines in the infrastructure model.

By running the following command-line instruction with the three models as parameters:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix executes all required activities to get the system deployed, including building, distributing, activating and deactivating services.

I have always described the final step, the activation phase, as deactivating obsolete and activating new services on the target machines. However, this is an oversimplification of what really happens.

In reality, Disnix does more than just carrying out an activation step on a target machine -- to get a service activated or deactivated, Disnix invokes Dysnomia, which modifies the state of a so-called container hosting a collection of components. As with components, the definition of a container in Dysnomia is deliberately left abstract and can represent anything, such as a Java Servlet container (e.g. Apache Tomcat), a DBMS (e.g. MySQL) or the operating system's service manager (e.g. systemd).

So far, these details were always hidden in Disnix and the container mapping was an implicit operation, which I never really liked. Furthermore, there are situations in which you may want to have more control over this mapping.

In this blog post, I will describe my recent modifications and a new notational convention that can be used to treat containers as first-class citizens.

A modified infrastructure model formalism


Previously, a Disnix infrastructure model had the following structure:

{
  test1 = {
    hostname = "test1.example.org";
    tomcatPort = 8080;
    system = "i686-linux";
  };
  
  test2 = {
    hostname = "test2.example.org";
    tomcatPort = 8080;
    mysqlPort = 3306;
    mysqlUsername = "root";
    mysqlPassword = "admin";
    system = "x86_64-linux";
    numOfCores = 1;
    targetProperty = "hostname";
    clientInterface = "disnix-ssh-client";
  }; 
}

The above Nix expression is an attribute set in which each key corresponds to a target machine in the network and each value is an attribute set containing arbitrary machine properties.

These properties are used for a variety of deployment activities. Disnix made no hard distinction between them -- some properties have a special meaning, but most of them could be freely chosen, yet this does not become clear from the model.

In the new notational convention, the target machine properties have been categorized:

{
  test1 = {
    properties = {
      hostname = "test1.example.org";
    };
    
    containers = {
      tomcat-webapplication = {
        tomcatPort = 8080;
      };
    };
    
    system = "i686-linux";
  };
  
  test2 = {
    properties = {
      hostname = "test2.example.org";
    };
    
    containers = {
      tomcat-webapplication = {
        tomcatPort = 8080;
      };
      
      mysql-database = {
        mysqlPort = 3306;
        mysqlUsername = "root";
        mysqlPassword = "admin";
      };
    };
    
    system = "x86_64-linux";
    numOfCores = 1;
    targetProperty = "hostname";
    clientInterface = "disnix-ssh-client";
  }; 
}

The above expression has a more structured notation:

  • The properties attribute refers to arbitrary machine-level properties that are used at build-time and to connect from the coordinator to the target machine.
  • The containers attribute set defines the available container services on a target machine and their relevant deployment properties. The container properties are used at build-time and activation time. At activation time, they are passed as parameters to the Dysnomia module that activates a service in the corresponding container.
  • The remainder of the target attributes are optional system properties. For example, targetProperty defines which attribute in properties contains the address to connect to the target machine. clientInterface refers to the executable that establishes a remote connection, system defines the system architecture of the target machine (so that services will be correctly built for it), and numOfCores defines how many concurrent activation operations can be executed on the target machine.

As may have become obvious, in the new notation it becomes clear what container services the target machine provides, whereas in the old notation they were hidden.

An alternative distribution model notation


I have also introduced an alternative notation for mappings in the distribution model. A traditional Disnix distribution model typically looks as follows:

{infrastructure}:

{
  ...
  StaffService = [ infrastructure.test2 ];
  StaffTracker = [ infrastructure.test1 infrastructure.test2 ];
}

In the above expression, each attribute name refers to a service in the service model and each value to a list of machines in the infrastructure model.

As explained earlier, besides deploying a service to a machine, a service also gets deployed to a container hosted on the machine, which is not reflected in the distribution model.

When using the above notation, Disnix executes a so-called auto-mapping strategy to containers. It simply takes the type attribute from the services model (which determines the Dysnomia module that carries out the activation and deactivation steps):

StaffTracker = {
  name = "StaffTracker";
  pkg = customPkgs.StaffTracker;
  dependsOn = {
    inherit GeolocationService RoomService;
    inherit StaffService ZipcodeService;
  };
  type = "tomcat-webapplication";
};

and deploys the service to the container with the same name as the type. For example, all services of type: tomcat-webapplication will be deployed to a container named: tomcat-webapplication (and Disnix uses the Dysnomia module named: tomcat-webapplication to activate or deactivate them).

In most cases auto-mapping suffices -- we typically only run one container service on a machine, e.g. one MySQL DBMS, one Apache Tomcat application server. That is why the traditional notation remains the default in Disnix.

However, sometimes it may also be desired to have more control over the container mappings. The new Disnix also supports an alternative and more verbose notation. For example, the following mapping of the StaffTracker service is equivalent to the traditional mapping shown in the previous distribution model:

StaffTracker = {
  targets = [ { target = infrastructure.test1; } ];
};

We can use the alternative notation to control the container mapping, for example:

{infrastructure}:

{
  ...

  StaffService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-production";
      }
    ];
  };
  StaffTracker = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-test";
      }
    ];
  };
};

By adding the container attribute to a mapping, we can override the auto mapping strategy and specify the name of the container that we want to deploy to. This alternative notation allows us to deploy to a container whose name does not match the type or to manage networks of machines having multiple instances of the same container deployed.

For example, in the above distribution model, both services are Apache Tomcat web applications. We map StaffService to a container called: tomcat-production and StaffTracker to a container called: tomcat-test. Both containers are hosted on the same machine: test1.

A modified formalism to refer to inter-dependency parameters


As a consequence of modifying the infrastructure and distribution model notations, the way we refer to inter-dependency parameters in Disnix expressions has also slightly changed:

{stdenv, StaffService}:
{staff}:

let
  contextXML = ''
    <Context>
      <Resource name="jdbc/StaffDB" auth="Container"
        type="javax.sql.DataSource"
        maxActivate="100" maxIdle="30" maxWait="10000"
        username="${staff.target.container.mysqlUsername}"
        password="${staff.target.container.mysqlPassword}"
        driverClassName="com.mysql.jdbc.Driver"
        url="jdbc:mysql://${staff.target.properties.hostname}:${toString (staff.target.container.mysqlPort)}/${staff.name}?autoReconnect=true" />
    </Context>
  '';
in
stdenv.mkDerivation {
  name = "StaffService";
  buildCommand = ''
    mkdir -p $out/conf/Catalina
    cat > $out/conf/Catalina/StaffService.xml <<EOF
    ${contextXML}
    EOF
    ln -sf ${StaffService}/webapps $out/webapps
  '';
}

The above example is a Disnix expression that configures the StaffService service. The StaffService connects to a remote MySQL database (named: staff) which is provided as an inter-dependency parameter. The Disnix expression uses the properties of the inter-dependency parameter to compose a so-called context XML file, which Apache Tomcat uses to establish a (remote) JDBC connection so that the web service can connect to the database.

Previously, each inter-dependency parameter provided a targets sub attribute referring to targets in the infrastructure model to which the inter-dependency has been mapped in the distribution model. Because it is quite common to map to a single target only, there is also a target sub attribute that refers to the first element for convenience.

In the new Disnix, the targets now refer to container mappings instead of machine mappings and implement a new formalism to reflect this:

  • The properties sub attribute refers to the machine level properties in the infrastructure model
  • The container sub attribute refers to the container properties to which the inter-dependency has been deployed.

As can be observed in the expression shown above, both sub attributes are used in the above expression to allow the service to connect to the remote MySQL database.
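
For the infrastructure model shown earlier, and assuming that the staff database has been distributed to the mysql-database container on machine test2, the staff.target parameter roughly corresponds to the following attribute set (a sketch that only shows the attributes used above):

{
  properties = {
    hostname = "test2.example.org";
  };

  container = {
    mysqlPort = 3306;
    mysqlUsername = "root";
    mysqlPassword = "admin";
  };
}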

Visualizing containers


Besides modifying the notational conventions and the underlying deployment mechanisms, I have also modified disnix-visualize to display containers. The following picture shows an example:



In the above picture, the light grey boxes denote machines, the dark grey boxes denote containers, the ovals services and the arrows inter-dependency relationships. In my opinion, these new visualizations are much more intuitive -- I still remember that in an old blog post that summarizes my PhD thesis I used a hand-drawn diagram to illustrate why deployments of service-oriented systems were complicated. In this diagram I already showed containers, yet in the visualizations generated by disnix-visualize they were missing. Now finally, this mismatch has been removed from the tooling.

(As a sidenote: it is still possible to generate the classic non-containerized visualizations by providing the: --no-containers command-line option).

Capturing the infrastructure model from the machines' Dysnomia container configuration files


The new notational conventions also make it possible to more easily implement yet another use case. As explained in an earlier blog post, when it is desired to deploy services with Disnix, we first need predeployed machines that have Nix, Dysnomia and Disnix installed, as well as a number of container services (such as MySQL and Apache Tomcat).

After deploying the machines, we must hand-write an infrastructure model reflecting their properties. Hand writing infrastructure models is sometimes tedious and error prone. In my previous blog post, I have shown that it is possible to automatically generate Dysnomia container configuration files from NixOS configurations that capture properties of entire machine configurations.

We can now also do the opposite: generate an expression from a machine's Dysnomia container configuration files and compose an infrastructure model from it. This takes away the majority of the burden of hand-writing infrastructure models.

For example, we can write a Dysnomia-enabled NixOS configuration:

{config, pkgs, ...}:

{
  services = {
    openssh.enable = true;
    
    mysql = {
      enable = true;
      package = pkgs.mysql;
      rootPassword = ../configurations/mysqlpw;
    };
    
    tomcat = {
      enable = true;
      commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
      catalinaOpts = "-Xms64m -Xmx256m";
    };
  };
  
  dysnomia = {
    enable = true;
    enableAuthentication = true;
    properties = {
      hostname = config.networking.hostName;
      mem = "$(grep 'MemTotal:' /proc/meminfo | sed -e 's/kB//' -e 's/MemTotal://' -e 's/ //g')";
    };
  };
}

The above NixOS configuration deploys two container services: MySQL and Apache Tomcat. Furthermore, it defines some non-functional machine-level properties, such as the hostname and the amount of RAM (mem) the machine has (which is composed dynamically by consulting the kernel's /proc filesystem).

As shown in the previous blog post, when deploying the above configuration with:

$ nixos-rebuild switch

The Dysnomia NixOS module automatically composes the /etc/dysnomia/properties and /etc/dysnomia/containers configuration files. When running the following command:

$ dysnomia-containers --generate-expr
{
  properties = {
    "hostname" = "test1";
    "mem" = "1023096";
    "supportedTypes" = [
      "mysql-database"
      "process"
      "tomcat-webapplication"
    ];
    "system" = "x86_64-linux";
  };
  containers = {
    mysql-database = {
      "mysqlPassword" = "admin";
      "mysqlPort" = "3306";
      "mysqlUsername" = "root";
    };
    tomcat-webapplication = {
      "tomcatPort" = "8080";
    };
  };
}

Dysnomia generates a Nix expression of the general properties and container configuration files.
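
The underlying container configuration files are plain files with key-value pairs. For example, the MySQL container configuration from which the above expression was derived may look roughly as follows (a sketch):

$ cat /etc/dysnomia/containers/mysql-database
mysqlUsername="root"
mysqlPassword="admin"
mysqlPort="3306"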

We can do the same operation in a network of machines by running the disnix-capture-infra tool. First, we need to write a very minimal infrastructure model that only captures the connectivity attributes:

{
  test1.properties.hostname = "test1";
  test2.properties.hostname = "test2";
}

When running:

$ disnix-capture-infra infrastructure-basic.nix
{
  test1 = {
    properties = {
      "hostname" = "test1";
      "mem" = "1023096";
      "supportedTypes" = [
        "mysql-database"
        "process"
        "tomcat-webapplication"
      ];
      "system" = "x86_64-linux";
    };
    containers = {
      mysql-database = {
        "mysqlPassword" = "admin";
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
      };
    };
  };
  test2 = ...
}

Disnix captures the configurations of all machines in the basic infrastructure model and returns an augmented infrastructure model containing all its properties.

(As a sidenote: disnix-capture-infra is not the only infrastructure model generator I have developed. In the self-adaptive deployment framework built on top of Disnix, I have developed an Avahi-based discovery service that can also generate infrastructure models. It is also more powerful (but quite hacky and immature) because it dynamically discovers the machines in the network, so it does not require a basic infrastructure model to be written. Moreover, it automatically responds to events when a machine's configuration changes.

I have modified the Avahi-based discovery tool to use Dysnomia's expression generator as well.

Also, the DisnixOS toolset can generate infrastructure models from networked NixOS configurations).

Discussion


In this blog post, I have described the result of a number of major internal changes to Disnix that make the containers concept a first class citizen. Fortunately, from an external perspective the changes are minor, but still backwards incompatible -- we must follow a new convention for the infrastructure model and refer to the target properties of inter-dependency parameters in a slightly different way.

In return you will get:

  • A more intuitive notation. As explained, we do not only deploy to a machine, but also to a container hosted on the machine. Now the deployment models and corresponding visualizations reflect this concept.
  • More control and power. We can deploy to multiple containers of the same type on the same machine, e.g. we can have two MySQL DBMSes on the same machine.
  • More correctness. Previously, when activating or deactivating a service, all infrastructure properties were propagated as parameters to the corresponding Dysnomia module. Why does the mysql-database module need to know about a postgresql-database and vice versa? Now Dysnomia modules only get to know what they need to know.
  • Discovery. We can generate the infrastructure model from Dysnomia container configuration hosted on the target machines with relative ease.

A major caveat is that deployment planning (implemented in the Dynamic Disnix framework) can also potentially be extended from machine-level to container-level.

At the moment, I have not made these modifications yet. This means that Dynamic Disnix can still generate distribution models, but only on machine level. As a consequence, Dynamic Disnix only allows a user to refer to a target's machine-level properties (i.e. the properties attribute in the infrastructure model) for deployment planning purposes, and not to any container-specific properties.

Container-level deployment planning is also something I intend to support at some point in the future.

Availability


The new notational conventions and containers concepts are part of the development version of Disnix and will become available in the next release. Moreover, I have modified the Disnix examples to use the new notations.

Tuesday, April 19, 2016

Managing the state of mutable components in NixOS configurations with Dysnomia


In an old blog post (and research paper) from a couple of years ago, I have described a prototype version of Dysnomia -- a toolset that can be used to deploy so-called "mutable components". In the middle of last year, I have integrated the majority of its concepts into the mainstream version of Dysnomia, because I had found some practical use for it.

So far, I have only used Dysnomia in conjunction with Disnix -- Disnix executes all activities required to deploy a service-oriented system, such as:

  • Building services and their intra-dependencies from source code. By default, Disnix performs the builds on the coordinator machine, but can also optionally delegate them to target machines in the network.
  • Distributing services and their intra-dependency closures to the appropriate target machines in the network.
  • Activating newly deployed services, and deactivating obsolete services.
  • Optionally snapshotting, transferring and restoring the state of services (or a subset of services) that have moved from a target machine to another.

For carrying out the building and distribution activities, Disnix invokes the Nix package manager as it provides a number of powerful features that makes deployment of packages more reliable and reproducible.

However, not all activities required to deploy service-oriented systems are supported by Nix and this is where Dysnomia comes in handy -- one of Dysnomia's objectives is to uniformly activate and deactivate mutable components in containers by modifying the latter's state. The other objective is to uniformly support snapshotting and restoring the state of mutable components deployed in a container.

The definitions of mutable components and containers are deliberately left abstract in a Dysnomia context. Basically, they can represent anything, such as:

  • A MySQL database schema component and a MySQL DBMS container.
  • An Java web application component (WAR file) and an Apache Tomcat container.
  • A UNIX process component and a systemd container.
  • Even NixOS configurations can be considered mutable components.

To support many kinds of component and container flavours, Dysnomia has been designed as a plugin system -- each Dysnomia module has a standardized interface (basically a process taking two standard command line parameters) and implements a set of standard deployment activities (e.g. activate, deactivate, snapshot and restore) for each type of container.
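
To give an impression of what such a plugin looks like, the following shell script is a minimal skeleton of a Dysnomia module (a sketch; the modules shipped with Dysnomia are more elaborate). The first parameter denotes the activity to execute and the second the path to the mutable component, while the properties of the corresponding container are passed to the module process as well:

#!/bin/bash -e

# Minimal sketch of a Dysnomia module:
# $1 contains the activity to execute, $2 the path to the mutable component

case "$1" in
    activate)
        echo "Activating component: $2" >&2
        # ... activation steps for this container type go here ...
        ;;
    deactivate)
        echo "Deactivating component: $2" >&2
        # ... deactivation steps go here ...
        ;;
    snapshot)
        # Optionally capture the component's state
        ;;
    restore)
        # Optionally restore a previously taken snapshot
        ;;
esac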

Despite the fact that Dysnomia has originally been designed for use with Disnix (the package was historically known as Disnix activation scripts), it can also be used as a standalone tool or in combination with other deployment solutions. (As a sidenote: the reason I picked the name Dysnomia is that, like Nix, it is the name of a moon of a Trans-Neptunian object).

Similar to Disnix, when deploying NixOS configurations, all activities to deploy the static parts of a system are carried out by the Nix package manager.

However, in the final step (the activation step) a big generated shell script is executed that is responsible for deploying the dynamic parts of a system, such as updating the GRUB bootloader, reloading systemd units, creating folders that store variable data (e.g. /var), creating user accounts and so on.

In some cases, it may also be desired to deploy mutable components as part of a NixOS system configuration:

  • Some systems are monolithic and cannot be decomposed into services (i.e. distributable units of deployment).
  • Some NixOS modules have scripts to initialize the state of a system service on first startup, such as a database, but do it in their own ad-hoc way, e.g. there is no real formalism behind it.
  • You may also want to use Dysnomia's (primitive) snapshotting facilities for backup purposes.

Recently I did some interesting experiments with Dysnomia on NixOS-level. In this blog post, I will show how Dysnomia can be used in conjunction with NixOS.

Deploying NixOS configurations


As described in earlier blog posts, in NixOS, deployment is driven by a single NixOS configuration file (/etc/nixos/configuration.nix), such as:

{pkgs, ...}:

{
  boot.loader.grub = {
    enable = true;
    device = "/dev/sda";
  };

  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4";  
  };

  services = {
    openssh.enable = true;
    
    mysql = {
      enable = true;
      package = pkgs.mysql;
      rootPassword = ../configurations/mysqlpw;
    };
  };
}

The above configuration file states that we want to deploy a system using the GRUB bootloader, having a single root partition, running OpenSSH and MySQL as system services. The configuration can be deployed with a single-command line instruction:

$ nixos-rebuild switch

When running the above command-line instruction, the Nix package manager deploys all required packages and configuration files. After all packages have been successfully deployed, the activation script gets executed. As a result, we have a system running OpenSSH and MySQL.

By modifying the above configuration and adding another service after MySQL:

...

mysql = {
  enable = true;
  package = pkgs.mysql;
  rootPassword = ../configurations/mysqlpw;
};

tomcat = {
  enable = true;
  commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
  catalinaOpts = "-Xms64m -Xmx256m";
};

...

and running the same command-line instruction again:

$ nixos-rebuild switch

The NixOS configuration gets upgraded to also run Apache Tomcat as a system service in addition to MySQL and OpenSSH. When upgrading, Nix only builds or downloads the packages that have not been deployed before making the upgrade process much more efficient than rebuilding it from scratch.

Managing collections of mutable components


Similar to NixOS configurations (that represent entire system configurations), we need to manage the deployment of mutable components belonging to a system configuration as a whole. I have developed a new tool called: dysnomia-containers for this purpose.

The following command-line instruction queries all available containers on a system that serve as potential deployment targets:

$ dysnomia-containers --query-containers
mysql-database
process
tomcat-webapplication
wrapper

What the above command-line instruction does is search all folders in the DYSNOMIA_CONTAINERS_PATH environment variable (which defaults to: /etc/dysnomia/containers) for container configuration files and display their names, such as mysql-database, corresponding to a MySQL DBMS server, and process and wrapper, which are virtual containers integrating with the host system's service manager, such as systemd.

We can also query the available mutable components that we can deploy to the above listed containers:

$ dysnomia-containers --query-available-components
mysql-database/rooms
mysql-database/staff
mysql-database/zipcodes
tomcat-webapplication/GeolocationService
tomcat-webapplication/RoomService
tomcat-webapplication/StaffService
tomcat-webapplication/StaffTracker
tomcat-webapplication/ZipcodeService

The above command-line instruction displays all the available mutable component configurations that reside in directories provided by the DYSNOMIA_COMPONENTS_PATH environment variable, such as three MySQL databases and five Apache Tomcat web applications.

We can deploy all the available mutable components to the available containers, by running:

$ dysnomia-containers --deploy
Activating component: rooms in container: mysql-database
Activating component: staff in container: mysql-database
Activating component: zipcodes in container: mysql-database
Activating component: GeolocationService in container: tomcat-webapplication
Activating component: RoomService in container: tomcat-webapplication
Activating component: StaffService in container: tomcat-webapplication
Activating component: StaffTracker in container: tomcat-webapplication
Activating component: ZipcodeService in container: tomcat-webapplication

Besides displaying the available mutable components and deploying them, we can also query which ones have been deployed already:

$ dysnomia-containers --query-activated-components
mysql-database/rooms
mysql-database/staff
mysql-database/zipcodes
tomcat-webapplication/GeolocationService
tomcat-webapplication/RoomService
tomcat-webapplication/StaffService
tomcat-webapplication/StaffTracker
tomcat-webapplication/ZipcodeService

The dysnomia-containers tool uses the sets of available and activated components to make an upgrade more efficient -- when deploying a new system configuration, it deactivates the components that are activated but no longer available, and activates the available components that have not been activated yet. The components that are present in both the old and the new configuration remain untouched.
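
Conceptually, this boils down to comparing two sets of component names. The following shell sketch is not the actual implementation, but illustrates how the two sets could be derived with the query operations shown earlier:

$ dysnomia-containers --query-activated-components | sort > activated
$ dysnomia-containers --query-available-components | sort > available
$ comm -23 activated available # activated, but no longer available: deactivate these
$ comm -13 activated available # available, but not yet activated: activate these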

For example, if we run dysnomia-containers --deploy again, nothing gets deployed or undeployed, because the configuration has remained identical.

We can also take snapshots of all activated mutable components (for example, for backup purposes):

$ dysnomia-containers --snapshot

After running the above command, the Dysnomia snapshot utility may show you the following output:

$ dysnomia-snapshots --query-all
mysql-database/rooms/faede34f3bf658884020a31ca98f16503da9a90bf3313cc96adc5c2358c0b054
mysql-database/staff/e9af7042064c33379ba9fe9272f61986b5a85de63c57732f067695e499a3a18f
mysql-database/zipcodes/637faa3e79ec6c2db71ac4023e86f29890e54233ea6592680fd88481725d44a3

As may be noticed, a snapshot has been taken of each of the three MySQL databases. (For the Apache Tomcat web applications, no snapshots have been taken, because state management for these kinds of components is not supported).

We can also restore the state from the snapshots that we have just taken:

$ dysnomia-containers --restore

The above command restores the state of all three databases.

Finally, as with services deployed by Disnix, deactivating a mutable component does not imply that its state is removed automatically. Instead, the state is marked as garbage and must be removed explicitly by running:

$ dysnomia-containers --collect-garbage

NixOS integration


To make the previously shown deployment activities work, we need configuration files for all containers and mutable components, and we must put them into locations that are reachable through the DYSNOMIA_CONTAINERS_PATH and DYSNOMIA_COMPONENTS_PATH environment variables.

Obviously, they can be written by hand (as demonstrated in my previous blog post about Dysnomia), but this is not always practical to do at the system level. Moreover, there is some repetition involved, since a NixOS configuration and the container configuration files capture common properties.

I have developed a Dysnomia NixOS module to automate Dysnomia's configuration through NixOS. It can be enabled by adding the following property to a NixOS configuration file:

dysnomia.enable = true;

We can specify container properties in a NixOS configuration file as follows:

dysnomia.containers = {
  mysql-database = {
    mysqlUsername = "root";
    mysqlPassword = "secret";
    mysqlPort = 3306;
  };
  tomcat-webapplication = {
    tomcatPort = 8080;
  };
  ...
};

For each attribute in the dysnomia.containers set, the Dysnomia module generates a container configuration file with the same name, and composes its contents from the corresponding sub attribute set by translating it into a text file with key=value pairs.
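
For example, based on the mysql-database properties shown above, the generated container configuration file could look roughly like this (a sketch derived from the example, not literal output of the module):

mysqlUsername=root
mysqlPassword=secret
mysqlPort=3306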

Most of the dysnomia.containers properties can be automatically generated by the Dysnomia NixOS module as well, since most of them have already been specified elsewhere in a NixOS configuration. For example, by enabling MySQL in a Dysnomia-enabled NixOS configuration:

services.mysql = {
  enable = true;
  package = pkgs.mysql;
  rootPassword = ../configurations/mysqlpw;
};

The Dysnomia module automatically generates the corresponding container properties as shown previously. The Dysnomia NixOS module integrates with all NixOS features for which Dysnomia provides a plugin.

In addition to containers, we can also specify the available mutable components as part of a NixOS configuration:

dysnomia.components = {
  mysql-database = {
    rooms = pkgs.writeTextFile {
      name = "rooms";
      text = ''
        create table room
        ( Room     VARCHAR(10)    NOT NULL,
          Zipcode  VARCHAR(6)     NOT NULL,
          PRIMARY KEY(Room)
        );
      '';
    };
    staff = ...
    zipcodes = ...
  };

  tomcat-webapplication = {
    ...
  };
};

As can be observed in the above example, the dysnomia.components attribute set captures the available mutable components per container. For the mysql-database container, we have defined three databases: rooms, staff and zipcodes. Each attribute refers to a Nix build function that produces an SQL file representing the initial state of the database on first activation (typically a schema).

Besides MySQL databases, we can use the tomcat-webapplication attribute to automatically deploy Java web applications to the Apache Tomcat servlet container. The value of each mutable component refers to the result of a Nix build function that produces a Java web application archive (WAR file).

The Dysnomia module automatically composes a directory with symlinks referring to the generated mutable component configurations reachable through the DYSNOMIA_COMPONENTS_PATH environment variable.

Distributed infrastructure state management


In addition to deploying mutable components belonging to a single NixOS configuration, I have mapped the NixOS-level Dysnomia deployment concepts to networks of NixOS machines by extending the DisnixOS toolset (the Disnix extension integrating Disnix' service deployment concepts with NixOS' infrastructure deployment).

It may not have been stated explicitly in any of my previous blog posts, but DisnixOS can also be used to deploy a network of NixOS configurations to target machines in a network. For example, we can compose a networked NixOS configuration that includes the machine configuration shown previously:

{
  test1 = import ./configurations/mysql-tomcat.nix;
  test2 = import ./configurations/empty.nix;
}

The above configuration file is an attribute set defining two machine configurations. The first attribute (test1) refers to our previous NixOS configuration running MySQL and Apache Tomcat as system services.

We can deploy the networked configuration with the following command-line instruction:

$ disnixos-deploy-network network.nix

As a side note: although DisnixOS can deploy networks of NixOS configurations, NixOps does a better job at accomplishing this. Moreover, DisnixOS only supports deployment of NixOS configurations to bare-metal servers and cannot instantiate any VMs in the cloud.

Another thing DisnixOS does differently compared to NixOps is that it invokes Dysnomia to activate or deactivate NixOS configurations -- the corresponding Dysnomia plugin executes the big monolithic NixOS activation script for the activation step and runs nixos-rebuild --rollback switch for the deactivation step.

I have extended Dysnomia's nixos-configuration plugin with state management operations. Snapshotting the state of a NixOS configuration simply means running:

$ dysnomia-containers --snapshot

Likewise, restoring the state of a NixOS configuration can be done with:

$ dysnomia-containers --restore

And removing obsolete state with:

$ dysnomia-containers --collect-garbage

When using Disnix to manage state, we may have mutable components deployed as part of a system configuration and mutable components deployed as services in the same environment. To prevent the snapshots of the services from conflicting with the ones belonging to a machine's system configuration, we set the DYSNOMIA_STATEDIR environment variable to: /var/state/dysnomia-nixos for system-level state management and to: /var/state/dysnomia for service-level state management, keeping them apart.
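
For example, a system-level snapshot operation could be carried out as follows (a sketch, assuming the variable is not already set by the surrounding tooling):

$ export DYSNOMIA_STATEDIR=/var/state/dysnomia-nixos
$ dysnomia-containers --snapshot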

With these additional operations, we can capture the state of all mutable components that are part of the system configurations in a network:

$ disnixos-snapshot-network network.nix

This yields a snapshot of the test1 machine stored in the Dysnomia snapshot store on the coordinator machine:

$ dysnomia-snapshots --query-latest
nixos-configuration/nixos-system-test1-16.03pre-git/4c4751f10648dfbbf8e25c924391e80913c8a6a600f7b481d73cd88ff3d32730

When inspecting the contents of the NixOS system configuration snapshot, we will observe:

$ cd /var/state/dysnomia/snapshots/$(dysnomia-snapshots --query-latest)
$ find -maxdepth 3 -mindepth 3 -type d
./mysql-database/rooms/faede34f3bf658884020a31ca98f16503da9a90bf3313cc96adc5c2358c0b054
./mysql-database/staff/e9af7042064c33379ba9fe9272f61986b5a85de63c57732f067695e499a3a18f
./mysql-database/zipcodes/637faa3e79ec6c2db71ac4023e86f29890e54233ea6592680fd88481725d44a3

The NixOS system configuration snapshot consists of the snapshots of all mutable components belonging to that system configuration.

Similar to restoring the state of individual mutable components, we can restore the state of all mutable components that are part of a system configuration in a network of machines:

$ disnixos-restore-network network.nix

And remove their obsolete state, by running:

$ disnixos-delete-network-state network.nix

TL;DR: Discussion


In this blog post, I have described an extension to Dysnomia that makes it possible to manage the state of mutable components belonging to a system configuration, and a NixOS module making it possible to automatically configure Dysnomia from a NixOS configuration file.

This new extension makes it possible to deploy mutable components belonging to systems that cannot be divided into distributable deployment units (or services in a Disnix-context), such as monolithic system configurations.

To summarize: if you want to manage the state of mutable components in a NixOS configuration, you need to provide a number of additional configuration settings. First, we must enable Dysnomia:

dysnomia.enable = true;

Then enable a number of container services, such as MySQL:

services.mysql.enable = true;

(As explained earlier, the Dysnomia module will automatically generate its corresponding container properties).

Finally, we can specify a number of available mutable components that can be deployed automatically, such as a MySQL database:

dysnomia.components = {
  mysql-database = {
    rooms = pkgs.writeTextFile {
      name = "rooms";
      text = ''
        create table room
        ( Room     VARCHAR(10)    NOT NULL,
          Zipcode  VARCHAR(6)     NOT NULL,
          PRIMARY KEY(Room)
        );
      '';
    };
  };
};

After deploying a Dysnomia-enabled NixOS system configuration through:

$ nixos-rebuild switch

we can deploy the mutable components belonging to it by running:

$ dysnomia-containers --deploy

Unfortunately, managing mutable components at the system level also has a huge drawback, in particular in distributed environments. Snapshots of entire system configurations are typically too coarse -- whenever the state of any of the mutable components changes, a new system-level composite snapshot is generated that is composed of the snapshots of all mutable components.

Typically, these snapshots contain redundant data that is not shared among snapshot generations (although there are potential solutions to cope with this, I have not implemented any optimizations yet). As explained in my previous Dysnomia-related blog posts, snapshotting individual components (such as large databases) can already be quite expensive, and these costs may become significantly larger at the system level.

Likewise, restoring state at the system level implies that the state of all mutable components will be restored. This is typically undesired, as it may be too destructive and time consuming. Moreover, moving state from one machine to another when a mutable component gets migrated is also much more expensive.

For more control and more efficient deployment of mutable components, it would typically be better to develop a Disnix service-model so that they can be managed individually.

Because of these drawbacks, I am not prominently advertising DisnixOS' distributed state management features. Moreover, I also did not attempt to integrate these features into NixOps, for the same reasons.

References


The dysnomia-containers tool as well as the distributed infrastructure management facilities have been integrated into the development versions of Dysnomia and DisnixOS, and will become part of the next Disnix release.

I have also added a sub example to the Java version of the Disnix staff tracker example to demonstrate how these features can be used.

As a final note, the Dysnomia NixOS module has not yet been integrated in NixOS. Instead, the module must be imported from a Dysnomia Git clone, by adding the following line to a NixOS configuration file:

imports = [ /home/sander/dysnomia/dysnomia-module.nix ];

Thursday, March 17, 2016

The NixOS project and deploying systems declaratively


Last weekend I was in Wrocław, Poland to attend wroc_love.rb, a conference tailored towards (but not restricted to) Ruby-related applications. The reason I went there is that I was invited to give a talk about NixOS.

As I had never visited Poland nor attended a Ruby-related conference before, I did not really know what to expect, but it turned out to be a nice experience. The city, venue and people were all quite interesting, and I liked it very much.

In my talk I basically had two objectives: providing a brief introduction to NixOS and diving into one of its underlying visions: declarative deployment. From my perspective, the former aspect is not particularly new as I have given talks about the NixOS project many times (for example, I also crafted three explanation recipes).

Something that I have not done before is diving into the latter aspect. In this blog post, I'd like to elaborate on it, discuss why it is appealing, and to what extent certain tools achieve it.

On being declarative


I have used the word declarative in many of my articles. What is it supposed to mean?

I have found a nice presentation online that elaborates on four kinds of sentences in linguistics. One of the categories covered in the slides is declarative sentences, which (according to the presentation) can be defined as:

A declarative sentence makes a statement. It is punctuated by a period.

As an example, the presentation shows:

The dog in the neighbor's yard is barking.

Another class of sentences that the presentation describes is imperative sentences, which it defines as:

An imperative sentence is a command or polite request. It ends in a period or exclamation mark.

The following xkcd comic shows an example:


(Besides these two categories of sentences described earlier, the presentation also covers interrogative sentences and exclamatory sentences, but I won't go into detail on that).

On being declarative in programming


In linguistics, the distinction between declarative and imperative sentences is IMO mostly clear -- declarative sentences state facts and imperative sentences are commands or requests.

A similar distinction exists in programming as well. For example, on Wikipedia I found the following definition for declarative programming (the Wikipedia article cites the article: "Practical Advantages of Declarative Programming" written by J.W. Lloyd, which I unfortunately could not find anywhere online):

In computer science, declarative programming is a programming paradigm -- a style of building the structure and elements of computer programs -- that expresses the logic of a computation without describing its control flow.

Imperative programming is sometimes seen as the opposite of declarative programming, but not everybody agrees. I found an interesting discussion blog post written by William Cook that elaborates on their differences.

His understanding of the declarative and imperative definitions are:

Declarative: describing "what" is to be computed rather than "how" to compute the result/behavior

Imperative: a description of a computation that involves implicit effects, usually mutable state and input/output.

Moreover, he says the following:

I agree with those who say that "declarative" is a spectrum. For example, some people say that Haskell is a declarative language, but in my view Haskell programs are very much about *how* to compute a result.

I also agree with William Cook's opinion that declarative is a spectrum -- contrary to linguistics, it is hard to draw a hard line between what and how in programming. Some programming languages that are considered imperative, e.g. C, modify mutable state such as variables:

int a = 5;
a += 3;

But even if we modify the code to work without mutable state, it still remains more of a "how" description than a "what" description IMO:

int sum(int a, int b)
{
    return a + b;
}

int result = sum(5, 3);

Two prominent languages that are more about what than how are HTML and CSS. Both technologies empower the web. For example, in HTML I can express the structure of a page:

<!DOCTYPE html>

<html>
    <head>
        <title>Test</title>
        <link rel="stylesheet" href="style.css" type="text/css">
    </head>
    <body>
        <div id="outer">
            <div id="inner">
                <p>HTML and CSS are declarative and so cool!</p>
            </div>
        </div>
    </body>
</html>

In the above code fragment, I define two nested divisions in which a paragraph of text is displayed.

In CSS, I can specify the style of these page elements:

#outer {
    margin-left: auto;
    margin-right: auto;
    width: 20%;
    border-style: solid;
}

#inner {
    width: 500px;
}

In the above example, we state that the outer div should be centered, have a width of 20% of the page, and a solid border should be drawn around it. The inner div has a width of 500 pixels.

This approach can be considered declarative, because you do not have to specify how to render the page and the style of its elements (e.g. the text, the border). Instead, the browser's layout engine figures this out. Besides taking care of rendering, this approach has a number of additional benefits as well, such as:

  • Because it does not matter (much) how a page is rendered, we can fully utilize a system's resources (e.g. a GPU) to render a page in a faster and more fancy way, and optionally degrade a page's appearance if a system's resources are limited.
  • We can also interpret the page in many ways. For example, we can pass the text in the paragraphs to a text-to-speech engine, for people who are visually impaired.

Despite these potential advantages, HTML and CSS are not perfect at all. If you actually check how the example gets rendered in your browser, you will observe one of CSS's many odd traits, but I am not going to reveal what it is. :-)

Moreover, despite being more declarative (than code written in an imperative programming language such as C) even HTML and CSS can sometimes be considered a "how" specification. For example, you may want to render a photo gallery on your web page. There is nothing in HTML and CSS that allows you to concisely express that. Instead, you need to decompose it into "lower level" page elements, such as paragraphs, hyperlinks, forms and images.

So IMO, being declarative depends on what your goal is -- in some contexts you can exactly express what you want, but in others you can only express things that are in service of something else.

On being declarative in deployment


In addition to development, you eventually have to deploy a system (typically to a production environment) to make it available to end users. To deploy a system you must carry out a number of activities, such as:

  • Building (if a compiled language is used, such as Java).
  • Packaging (e.g. into a JAR file).
  • Distributing (transferring artifacts to the production machines).
  • Activating (e.g. a Java web application in a Servlet container).
  • In case of an upgrade: deactivating obsolete components.

Deployment is often much more complicated than most people expect. Some things that make it complicated are:

  • Many kinds of steps need to be executed, in particular when the technology used is diverse. Without any automation, it becomes extra complicated and time consuming.
  • Deployment in production typically must be done on a large scale. In development, a web application/web service typically serves only one user (the developer), while in production it may need to serve thousands or millions of users. In order to serve many users, you need to manage a cluster of machines having complex constraints in terms of system resources and connectivity.
  • There are non-functional requirements that must be met. For example, while upgrading you want to minimize a system's downtime as much as possible. You probably also want to roll back to a previous version if an upgrade goes wrong. Accomplishing these properties is often much more complicated than expected (sometimes even impossible!).

As with linguistics and programming, I see a similar distinction in deployment as well -- carrying out the activities listed above is simply the means to accomplish deployment.

What I want (if I need to deploy) is that the system on my development machine becomes available in production, while meeting certain quality attributes of the system being deployed (e.g. that it can serve thousands of users) and of the deployment process itself (e.g. that I can easily roll back in case of an error).

Mainstream solutions: convergent deployment


There are a variety of configuration management tools claiming to support declarative deployment. The most well-known category of tools implements convergent deployment; examples are: CFEngine, Puppet, Chef and Ansible.

For example, Chef is driven by declarative deployment specifications (implemented in a Ruby DSL) that may look as follows (I took this example from a Chef tutorial):

...

wordpress_latest = Chef::Config[:file_cache_path] + "/wordpress-latest.tar.gz"

remote_file wordpress_latest do
  source "http://wordpress.org/latest.tar.gz"
  mode "0644"
end

directory node["phpapp"]["path"] do
  owner "root"
  group "root"
  mode "0755"
  action :create
  recursive true
end

execute "untar-wordpress" do
  cwd node['phpapp']['path']
  command "tar --strip-components 1 -xzf " + wordpress_latest
  creates node['phpapp']['path'] + "/wp-settings.php"
end

The objective of the example shown above is deploying a Wordpress web application. The specification defines a tarball that must be fetched from the Wordpress web site, a directory that must be created in which the web application is hosted, and the extraction of the tarball into that directory.

The specification can be considered declarative, because you do not have to describe the exact steps that need to be executed. Instead, the specification captures the intended outcome of a set of changes and the deployment system converges to the outcome. For example, for the directory that needs to be created, it first checks if it already exists. If so, it will not be created again. It also checks whether it can be created, before attempting to do it.
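
To give an impression of what converging means for the directory resource shown above, the following shell sketch captures roughly the same idea (an illustration only, not how Chef actually implements it; $PHPAPP_PATH stands in for the node["phpapp"]["path"] setting):

# Only create the directory (with the requested owner and permissions)
# if it does not exist yet; otherwise leave it untouched.
if [ ! -d "$PHPAPP_PATH" ]
then
    mkdir -p "$PHPAPP_PATH"
    chown root:root "$PHPAPP_PATH"
    chmod 0755 "$PHPAPP_PATH"
fi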

Converging, instead of directly executing steps, provides additional safety mechanisms and makes deployment processes more efficient as duplicate work is avoided as much as possible.

There are also a number of drawbacks -- it is not guaranteed (in case of an upgrade) that the system can converge to a new set of outcomes. Moreover, while upgrading a system we may observe downtime (e.g. when a new version of Wordpress is being unpacked). Also, rolling back to a previous configuration cannot be done instantly.

Finally, convergent deployment specifications do not guarantee reproducible deployment. For example, the above code does not capture the configuration process of a web server and a PHP extension module, which are required dependencies for running Wordpress. If we apply the changes to a machine where these components are missing, the changes may still apply, but yield a non-working configuration.

The NixOS approach


NixOS also supports declarative deployment, but in a different way. The following code fragment is an example of a NixOS configuration:

{pkgs, ...}:

{
  boot.loader.grub.device = "/dev/sda";

  fileSystems = [ { mountPoint = "/"; device = "/dev/sda2"; } ];
  swapDevices = [ { device = "/dev/sda1"; } ];
  
  services = {
    openssh.enable = true;
    
    xserver = {
      enable = true;
      desktopManager.kde4.enable = true;
    };
  };
  
  environment.systemPackages = [ pkgs.mc pkgs.firefox ];
}

In a NixOS configuration you describe what components constitute a system, rather than the outcome of changes:

  • The GRUB bootloader should be installed in the MBR of: /dev/sda.
  • The /dev/sda2 partition should be mounted as a root partition, /dev/sda1 should be mounted as a swap partition.
  • We want Mozilla Firefox and Midnight Commander as end user packages.
  • We want to use the KDE 4.x desktop.
  • We want to run OpenSSH as a system service.

The entire machine configuration can be deployed by running a single command-line instruction:

$ nixos-rebuild switch

NixOS executes all required deployment steps to deploy the machine configuration -- it downloads or builds all required packages from source code (including all their dependencies), generates the required configuration files and finally (if all the previous steps have succeeded) activates the new configuration, starting the new system services (and deactivating the system services that have become obsolete).

Besides executing the required deployment activities, NixOS has a number of important quality attributes as well:

  • Reliability. Nix (the underlying package manager) ensures that all dependencies are present. It stores new versions of packages next to old versions, without overwriting them. As a result, you can always switch back to older versions if needed, as illustrated by the example after this list.
  • Reproducibility. Undeclared dependencies do not influence builds -- if a build works on one machine, then it works on others as well.
  • Efficiency. Nix only deploys packages and configuration files that are needed.
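
For example, switching back to the previous system configuration boils down to a single command-line instruction (shown here purely as an illustration of the reliability property):

$ nixos-rebuild --rollback switch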

NixOS is a Linux distribution, but the NixOS project provides other tools bringing the same (or similar) deployment properties to other areas. Nix works at the package level (and also works on other systems besides NixOS, such as conventional Linux distributions and Mac OS X), NixOps deploys networks of NixOS machines, and Disnix deploys (micro)services in networks of machines.

The Nix way of deploying is typically my preferred approach, but these tools also have their limits -- to benefit from the quality properties they provide, everything must be deployed with Nix (and as a consequence: specified in Nix expressions). You cannot take an existing system (deployed by other means) first and change it later, something that you can actually do with convergent deployment tools, such as Chef.

Moreover, Nix (and its sub projects) only manage the static parts of a system such as packages and configuration files (which are made immutable by Nix by making them read-only), but not any state, such as databases.

For managing state, external solutions must be used. For example, I developed a tool called Dysnomia with semantics similar to Nix, but it is not always a good solution, especially for big chunks of state.

How declarative are these deployment solutions?


I have heard some people claiming that the convergent deployment models are not declarative at all, and the Nix deployment models are actually declarative because they do not specify imperative changes.

Again, I think it depends on how you look at it -- basically, the Nix tools solve problems in a technical domain from declarative specifications, e.g. Nix deploys packages, NixOS entire machine configurations, NixOps networks of machines etc., but typically you would do these kinds of things to accomplish something else, so in a sense you could still consider these approaches a "how" rather than a "what".

I have also developed domain-specific deployment tools on top of the tools that are part of the Nix project, allowing me to concisely express what I want in a specific domain:

WebDSL


WebDSL is a domain-specific language for developing web applications with a rich data model, supporting features such as domain modelling, user interfaces and access control. The WebDSL compiler produces Java web applications.

In order to deploy a WebDSL application in a production environment, all kinds of complicated tasks need to be carried out -- we must install a MySQL server and an Apache Tomcat server, deploy the web application to the Tomcat server, tune specific settings, and install a reverse proxy that does caching, etc.

You typically do not want to express such things in a deployment model. I have developed a tool called webdsldeploy allowing someone to express only the deployment properties that matter for WebDSL applications on a high level. Underneath, the tool consults NixOps (formerly known as Charon) to compose system configurations hosting the components required to run the WebDSL application.

Conference compass


Conference Compass sells services to conference organizers. The most visible part of their service is the apps for conference attendees, providing features such as displaying the conference program, the list of speakers and floor maps of the venue.

Each customer basically gets "their own app" -- an app for a specific customer has their preferred colors, artwork, content etc. We use a single code base to produce these specialized apps.

To produce such specialized apps, we do not want to specify things such as how to build an app for Android through Nix, an app for iOS through Nix, and how to produce debug and release versions etc. These are basically just technical details.

Instead, we have developed our own custom tool that is driven by a specification that concisely expresses what customizations we want (e.g. artwork) and produces the artefacts we want accordingly.

We use a similar approach for our backends -- each app connects to its own dedicated backend allowing users to configure the content displayed in the app. The configurator can also be used to dynamically update the content that is displayed in the apps. For big customers, we offer an additional service in which we develop programs that automatically import data from their information systems.

For the deployment of these backend instances, we do not want to express things such as machines, database services, and the deployment of NPM and Python packages.

Instead, we use a domain-specific tool that is driven by a model that concisely expresses what configurators we want and which third party integrations they provide. The tool is responsible for instantiating virtual machines in the cloud and deploying the services to it.

Conclusion


In this blog post I have elaborated on being declarative in deployment and discussed to what extent certain tools achieve it. As with declarative programming, being declarative in deployment is a spectrum.

References


Some aspects discussed in this blog post are covered in my PhD thesis:
  • I did a more elaborate comparison of infrastructure deployment solutions in Chapter 6. I also cover convergent deployment and used CFEngine as an example.
  • I have covered webdsldeploy in Chapter 11, including some background information about WebDSL and its deployment aspects.
  • The overall objective of my PhD thesis is constructing deployment tools for specific domains. Most of the chapters cover the ingredients to do so, but Chapter 3 explains a reference architecture for deployment tools, having similar (or comparable) properties to tools in the Nix project.

For convenience, I have also embedded the slides of my presentation into this web page: