Wednesday, March 21, 2012

Deployment of mutable components

It's been quiet here for a while. Lately, I've been very busy with all kinds of research-related things (including working on my thesis), so I thought it was time for a new research-related episode on this blog. Yesterday, a paper of mine got accepted.

Purely functional deployment


As you may know, the Nix package manager is also known as the purely functional deployment model (which happens to be the title of Eelco Dolstra's PhD thesis), because it borrows concepts from purely functional programming languages, such as Haskell. In Nix, packages are derived from purely functional build functions, also called derivations:

  • The result of a derivation depends exclusively on its specified inputs. If a dependency is not specified, it cannot be found and thus the build fails.
  • In a Nix build environment, many side effects are removed. For example, most environment variables are cleared or set to non-existing values, such as HOME=/homeless-shelter. Also certain commands cannot be used because they have side effects, such as date or hostname.
  • Various common build tools (e.g. GCC and binutils) have been patched to ignore impure directories, such as /usr/lib and /usr/include, which could otherwise allow a build to succeed by accident despite an incomplete dependency specification.
  • Nix stores every component in a so-called Nix store using a special filename, such as /nix/store/a92pq...-firefox-9.0.1, in which the first part, a92pq..., is a hash code derived from all build-time dependencies. This filesystem organisation prevents dependencies from being found implicitly: they can only be found when they are explicitly specified in a build process, e.g. through configuration flags or environment variables.
  • To improve purity, builds can be optionally performed in a chroot() environment.
  • Nix components that have been built successfully are made immutable by removing their write permission bits, analogous to objects in purely functional languages, which are immutable.
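To make the idea of hash-based store paths a bit more concrete, here is a minimal Python sketch. It is not how Nix computes its hashes (Nix hashes and encodes things differently); the function name store_path, the input names and the version numbers are all made up for illustration. It only shows the principle that the path prefix changes whenever any declared input changes.

```python
import hashlib


def store_path(name, inputs, store_dir="/nix/store"):
    """Build a store-style path whose prefix is a hash over all declared inputs.

    Illustrative only: Nix computes and encodes its hashes differently.
    `inputs` maps a (hypothetical) input name to a string identifying it.
    """
    digest = hashlib.sha256()
    digest.update(name.encode("utf-8"))
    # Sort the keys so the result does not depend on dictionary ordering.
    for key in sorted(inputs):
        digest.update(f"\n{key}={inputs[key]}".encode("utf-8"))
    return f"{store_dir}/{digest.hexdigest()[:32]}-{name}"


# Two variants of the same package that differ in a single (made-up) input.
firefox_a = store_path("firefox-9.0.1", {"gcc": "4.6.3", "gtk": "2.24.8"})
firefox_b = store_path("firefox-9.0.1", {"gcc": "4.6.3", "gtk": "2.24.10"})
print(firefox_a)
print(firefox_b)  # a different prefix, so both variants can coexist
```

Because the two paths differ, both variants can be stored next to each other without overwriting one another.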

As explained in many earlier blog posts, although this approach is unconventional, it has a number of distinct advantages, such as the fact that multiple versions or variants of components can be safely stored next to each other, upgrades are always safe, dependencies are always complete and so on.

Mutable components


Although Nix has several powerful features, one thing it lacks is support for deploying what I call mutable components, such as databases. Apart from the fact that the state of mutable components changes continuously (which cannot be supported in Nix, because components are made immutable), they also have several other important properties:

  • Mutable components are typically hosted inside a container, such as a DBMS or application server. To deploy a mutable component, filesystem operations are not always enough; the container must also be notified.
  • The state of mutable components is very often stored in an unisolated directory and in a format close to the underlying implementation of the container. In order to transfer a mutable component from one machine to another, copying state files may (in many cases) not work, because they are incompatible with the target architecture or temporarily inconsistent, because of locks and/or unfinished write operations.

Because of these issues, many containers include tools that allow you to capture state consistently and in a portable format, e.g. mysqldump. In the paper, I call the actual files representing state the physical state, whereas the dump captured by a tool such as mysqldump is the logical representation of state.

Dysnomia


To support automated deployment of mutable components using concepts similar to Nix, I have developed a tool called Dysnomia, which originates from the Disnix activation scripts package.

Dysnomia provides a generic interface for mutable components. Basically, it's an extended version of the activation scripts that implements a snapshot and a restore operation, which capture and restore the state of a mutable component (respectively) by invoking tools that can consistently capture state in a portable manner. For certain types, it also supports incremental snapshots.
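As a rough illustration of what such a generic interface could look like, here is a minimal Python sketch. It is not Dysnomia's actual implementation: the class names, method signatures and the toy key-value container are assumptions made purely for this example. The point is that every component type offers the same operations, and that snapshot returns a portable, logical dump instead of raw state files.

```python
import json
from abc import ABC, abstractmethod


class MutableComponentType(ABC):
    """Hypothetical interface: one implementation per component type."""

    @abstractmethod
    def activate(self, container, name):
        """Create the component inside the container if it is missing."""

    @abstractmethod
    def deactivate(self, container, name):
        """Take the component out of the container."""

    @abstractmethod
    def snapshot(self, container, name):
        """Return a portable (logical) dump of the component's state."""

    @abstractmethod
    def restore(self, container, name, dump):
        """Replace the component's state with the contents of a dump."""


class KeyValueStoreType(MutableComponentType):
    """Toy type: a container is a dict mapping component names to dicts."""

    def activate(self, container, name):
        container.setdefault(name, {})

    def deactivate(self, container, name):
        # Toy behaviour: simply drop the component from the container.
        container.pop(name, None)

    def snapshot(self, container, name):
        # JSON with sorted keys stands in for a consistent, portable dump.
        return json.dumps(container[name], sort_keys=True)

    def restore(self, container, name, dump):
        container[name] = json.loads(dump)


# Move the state of one component from a source to a target container.
kv = KeyValueStoreType()
source, target = {}, {}
kv.activate(source, "orders")
source["orders"]["42"] = "paid"
dump = kv.snapshot(source, "orders")
kv.activate(target, "orders")
kv.restore(target, "orders", dump)
print(target)
```

A real type would invoke the container's own dump and restore tools at the points where the toy class touches a dictionary.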

Apart from a generic interface to support deployment of mutable components, we also have to identify the logical representation of mutable components in a unique manner, similar to hash codes for immutable Nix components. This makes it possible to perform efficient upgrades, because components with the same name do not have to be deployed again.

In Dysnomia, we address components using the following naming convention:

/dysnomia/<type>/<container identifier>/<component name>/<generation number>

The last component, the generation number, is derived from a container property, such as a MySQL binlog sequence number or a Subversion revision number. It may also be possible to use a timestamp, although this has some practical issues.
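The naming convention can be sketched in a few lines of Python. The helper names snapshot_name and needs_transfer, as well as the example type, container and component names, are hypothetical and only serve to show how a unique name makes it cheap to decide whether a snapshot has to be transferred again.

```python
def snapshot_name(component_type, container_id, component_name, generation):
    """Compose /dysnomia/<type>/<container identifier>/<component name>/<generation number>."""
    parts = [component_type, container_id, component_name, str(generation)]
    for part in parts:
        # Every element becomes one path segment, so it may not be empty or contain '/'.
        if not part or "/" in part:
            raise ValueError(f"invalid path element: {part!r}")
    return "/dysnomia/" + "/".join(parts)


def needs_transfer(name, names_on_target):
    """A snapshot only has to be transferred if the target lacks that exact name."""
    return name not in names_on_target


# Hypothetical example: the target still has generation 1203 of the database.
current = snapshot_name("mysql-database", "db-server-1", "orders", 1204)
on_target = {snapshot_name("mysql-database", "db-server-1", "orders", 1203)}
print(current)
print(needs_transfer(current, on_target))
```

If the generation number has not changed since the previous deployment, the name is identical and the transfer can be skipped.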

Extending Disnix


In a few days' time, I have created a very hacky version of Disnix using Dysnomia. I have adapted the deployment process of Disnix and constructed two new variants:

  • The simple version. In the transition phase, we capture all state of mutable components, transfer the state and restore the state. During this phase, access may be blocked, and the transfer can be very expensive for large datasets.
  • The optimised version. In this version, we transfer the state outside the transition phase, so that the system is still accessible. After transferring the state, we enter the transition phase and transfer an incremental dump of the mutable components, to take into account the changes that have been made in the meantime. In this version, the blocking time window is reduced to a minimum, reducing the costs of blocking as much as possible.
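The difference between the two variants can be illustrated with a small Python simulation. This is a toy model and not the Disnix implementation: state is represented as an append-only list of records, the generation number is simply the length of that list, and the function names are invented for this example. Each function returns the state that ends up on the target and the number of records that had to be moved while access was blocked.

```python
def simple_upgrade(source_log):
    """Simple variant: move everything inside the transition phase.

    Returns (target_log, records_moved_while_blocked).
    """
    # Access is blocked here, so no writes can arrive during the transfer.
    target_log = list(source_log)
    return target_log, len(source_log)


def optimised_upgrade(source_log, writes_during_transfer):
    """Optimised variant: full dump first, incremental dump while blocked.

    Returns (target_log, records_moved_while_blocked).
    """
    source_log = list(source_log)  # work on a copy; the caller's list is untouched

    # Step 1 (system still accessible): take a full snapshot and transfer it.
    generation = len(source_log)
    target_log = list(source_log)

    # Meanwhile, users keep writing to the source.
    source_log.extend(writes_during_transfer)

    # Step 2 (transition phase, access blocked): transfer only what changed
    # since the generation captured in step 1.
    delta = source_log[generation:]
    target_log.extend(delta)
    return target_log, len(delta)


log = ["insert a", "insert b", "insert c"]
print(simple_upgrade(log))
print(optimised_upgrade(log, ["insert d"]))
```

In this toy run, the simple variant moves all three records while blocked, whereas the optimised variant only moves the single record that was written during the first transfer.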

Discussion


After reading all this, you may think that the problem of dealing with state in conjunction with Nix has been solved and the world has become a much better place. Unfortunately, there are a number of drawbacks, which may be significant:

  • The filesystem is used as an intermediate layer, which may be very expensive for very large databases. Database replication tools can solve these problems much more efficiently.
  • It is not part of Disnix yet. I'm still thinking of integrating it, but it radically changes the behaviour of Disnix. Perhaps it may become an optional feature in the future.
  • Also, the costs of managing state may be higher than the benefits that it gives.
  • The number of supported mutable components is still limited, and not all types of mutable components can be efficiently snapshotted.

Where does the name come from?


One aspect not covered in the paper is where the name Dysnomia comes from, and it probably sounds very strange to you. Unfortunately, space in the paper was too limited to explain it :-)

This is how I came up with it:

  • Nix is derived from the Dutch word for nothing: 'Niks' (because if you don't specify dependencies, Nix cannot find anything) combined with UNIX, hence: Nix. Coincidentally, Nix is also a mythological figure as well as a name for a moon of the dwarf planet Pluto which is part of the Kuiper belt.
  • Disnix and NixOS also use the original naming convention. For example, Disnix is DIStributed NIX.
  • Hydra, our continuous integration server, and Charon, our cloud deployment tool, are named after the other moons of Pluto.
  • Apparently, Pluto has a fourth moon, which was recently discovered, but it does not have a name yet. Furthermore, I don't think this hacky tool is worthy of bearing that new name.
  • I took another dwarf planet from the Kuiper belt: Eris, which happens to be the largest dwarf planet discovered so far. Apparently, it has a moon named Dysnomia. So that explains where the name comes from. Furthermore, I also think it's better to take a moon of a different dwarf planet, because Dysnomia could also be used without Nix (and related applications).


Conclusion


Although I have a very experimental extension supporting deployment of mutable components, it has a number of drawbacks. When I started thinking about this tool, I thought it was a great idea. However, after implementing it and doing some experiments, I think that in some cases the costs are actually higher than the benefits. Nonetheless, I still think it's important to have a solution for this problem, as well as some knowledge about it, which could lead to interesting discussions.

More details can be found in the paper titled: 'A Generic Approach for Deploying and Upgrading Mutable Software Components' which can be found on my publications page. As usual, my slides can be found on my talks page, once I have prepared them.

I'm going to present this paper at the HotSWUp workshop in Zurich (which is co-located with ICSE 2012). In my last blog post, I was a bit disappointed about my ICSE rejection, but it turns out that I'm going to be there anyway, just in case you want to meet up :-)

To prove once more that I have nothing against ICSE and that I actually like it, here is some shameless promotion: