Thursday, July 31, 2014

Managing private Nix packages outside the Nixpkgs tree

In a couple of older blog posts, I have explained the basic concepts of the Nix package manager as well as how to write package "build recipes" (better known as Nix expressions) for it.

Although Nix expressions may look unconventional, the basic idea behind specifying packages in the Nix world is simple: you define a function that describes how to build a package from source code and its dependencies, and you invoke the function with the desired variants of the dependencies as parameters to build it. In Nixpkgs, a collection of more than 2500 (mostly free and open source) packages that can be deployed with Nix, all packages are basically specified like this.

However, there might still be some practical issues. In some cases, you may just want to experiment with Nix or package private software not meant for distribution. In such cases, you typically want to store the corresponding Nix expressions outside the Nixpkgs tree.

Although the Nix manual describes how things are packaged in Nixpkgs, it does not (clearly) describe how to define and compose packages while keeping them separate from Nixpkgs.

Since it is not officially documented anywhere and I'm getting (too) many questions about this from beginners, I have decided to write something about it.

Specifying a single private package


In situations in which I want to quickly try or test one simple package, I typically write a Nix expression that looks as follows:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "mc-4.8.12";
  
  src = fetchurl {
    url = http://www.midnight-commander.org/downloads/mc-4.8.12.tar.bz2;
    sha256 = "15lkwcis0labshq9k8c2fqdwv8az2c87qpdqwp5p31s8gb1gqm0h";
  };
  
  buildInputs = [ pkgconfig perl glib gpm slang zip unzip file gettext
      xlibs.libX11 xlibs.libICE e2fsprogs ];

  meta = {
    description = "File Manager and User Shell for the GNU Project";
    homepage = http://www.midnight-commander.org;
    license = "GPLv2+";
    maintainers = [ stdenv.lib.maintainers.sander ];
  };
}

The above expression is a Nix expression that builds Midnight Commander, one of my favorite UNIX utilities (in particular the editor that comes with it :-) ).

In the above Nix expression, there is no distinction between a function definition and invocation. Instead, I directly invoke stdenv.mkDerivation {} to build Midnight Commander from source and its dependencies. I obtain the dependencies from Nixpkgs by importing the composition attribute set into the lexical scope of the expression through with import <nixpkgs> {};.

I can put the above file (named: mc.nix) in a folder outside the Nixpkgs tree, such as my home directory, and build it as follows:

$ nix-build mc.nix
/nix/store/svm98wmbf01dswlfcvvxfqqzckbhp5n5-mc-4.8.12

Or install it in my profile by running:

$ nix-env -f mc.nix -i mc

The dependencies (that are provided by Nixpkgs) can be found thanks to the NIX_PATH environment variable that contains a setting for nixpkgs. On NixOS, this environment variable has already been set. On other Linux distributions or non-NixOS installations, this variable must be manually configured to contain the location of Nixpkgs. An example could be:

$ export NIX_PATH=nixpkgs=/home/sander/nixpkgs

The above setting specifies that a copy of Nixpkgs resides in my home directory.

Maintaining a collection of private packages


It may also happen that you want to package a few of the dependencies of a private package while keeping them out of Nixpkgs, or that you simply want to maintain a collection of private packages. In such cases, I define each package as a function, which is no different from the way it is done in Nixpkgs and described in the Nix manual:

{ stdenv, fetchurl, pkgconfig, glib, gpm, file, e2fsprogs
, libX11, libICE, perl, zip, unzip, gettext, slang
}:

stdenv.mkDerivation rec {
  name = "mc-4.8.12";
  
  src = fetchurl {
    url = http://www.midnight-commander.org/downloads/mc-4.8.12.tar.bz2;
    sha256 = "15lkwcis0labshq9k8c2fqdwv8az2c87qpdqwp5p31s8gb1gqm0h";
  };
  
  buildInputs = [ pkgconfig perl glib gpm slang zip unzip file gettext
      libX11 libICE e2fsprogs ];

  meta = {
    description = "File Manager and User Shell for the GNU Project";
    homepage = http://www.midnight-commander.org;
    license = "GPLv2+";
    maintainers = [ stdenv.lib.maintainers.sander ];
  };
}

However, to compose the package (i.e. calling the function with the arguments that are used as dependencies), I have to create a private composition expression instead of adapting pkgs/top-level/all-packages.nix in Nixpkgs.

A private composition expression could be defined as follows:

{ system ? builtins.currentSystem }:

let
  pkgs = import <nixpkgs> { inherit system; };
in
rec {
  pkgconfig = import ./pkgs/pkgconfig {
    inherit (pkgs) stdenv fetchurl automake;
  };
  
  gpm = import ./pkgs/gpm {
    inherit (pkgs) stdenv fetchurl flex bison ncurses;
  };
  
  mc = import ./pkgs/mc {
    # Use custom pkgconfig and gpm packages as dependencies
    inherit pkgconfig gpm;
    # The remaining dependencies come from Nixpkgs
    inherit (pkgs) stdenv fetchurl glib file perl;
    inherit (pkgs) zip unzip gettext slang e2fsprogs;
    inherit (pkgs.xlibs) libX11 libICE;
  };
}

The above file (named: custom-packages.nix) invokes the earlier Midnight Commander expression (defining a function) with its required parameters.

Two of its dependencies, namely pkgconfig and gpm, are composed in the same expression and are also stored outside the Nixpkgs tree. The remaining dependencies of Midnight Commander are provided by Nixpkgs.

To make the above example complete, the directory structure of the set of Nix expressions is supposed to look as follows:

pkgs/
  pkgconfig/
    default.nix
    requires-private.patch 
    setup-hook.sh
  gpm/
    default.nix
  mc/
    default.nix
custom-packages.nix

The expressions for gpm and pkgconfig can be copied from Nixpkgs, by running ($nixpkgs should be replaced by the path to Nixpkgs on your system):

cp -a $nixpkgs/pkgs/development/tools/misc/pkgconfig pkgs
cp -a $nixpkgs/pkgs/servers/gpm pkgs
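After copying, each of these expressions is a function, just like the mc expression shown earlier. For example, pkgs/gpm/default.nix roughly has the following shape -- the version, URL and hash below are placeholders, the real values come from the copied Nixpkgs expression:

```nix
{ stdenv, fetchurl, flex, bison, ncurses }:

stdenv.mkDerivation rec {
  # Hypothetical version; use the value from the copied expression
  name = "gpm-1.0.0";

  src = fetchurl {
    url = "http://example.org/gpm-1.0.0.tar.bz2"; # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000"; # placeholder
  };

  buildInputs = [ flex bison ncurses ];
}
```

The function parameters correspond exactly to what custom-packages.nix passes in through inherit (pkgs) stdenv fetchurl flex bison ncurses;.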

Using the above composition expression file (custom-packages.nix) and the other Nix expressions it refers to, and by running the following command-line instruction:

$ nix-build custom-packages.nix -A mc
/nix/store/svm98wmbf01dswlfcvvxfqqzckbhp5n5-mc-4.8.12

I can build the package using my private composition of packages. Furthermore, I can also install it into my Nix profile by running:

$ nix-env -f custom-packages.nix -iA mc

Because the composition expression is also a function taking system as a parameter (which defaults to the same system architecture as the host system), I can also build Midnight Commander for a different system architecture, such as a 32-bit Intel Linux system:

$ nix-build custom-packages.nix -A mc --argstr system i686-linux

Simplifying the private composition expression


The private composition expression shown earlier passes all required function arguments to each package definition, which requires writing the function arguments twice: first to define them and later to provide them.

In 95% of the cases, the function parameters are typically packages defined in the same composition attribute set having the same attribute names as the function parameters.

In Nixpkgs, there is a utility function named callPackage {} that simplifies things considerably -- it automatically passes all requirements to the function by taking the attributes with the same name from the composition expression. So there is no need to write: inherit gpm ...; anymore.

We can also define our own private callPackage {} function that does this for our private composition expression:

{ system ? builtins.currentSystem }:

let
  pkgs = import <nixpkgs> { inherit system; };
  
  callPackage = pkgs.lib.callPackageWith (pkgs // pkgs.xlibs // self);
  
  self = {
    pkgconfig = callPackage ./pkgs/pkgconfig { };
  
    gpm = callPackage ./pkgs/gpm { };
  
    mc = callPackage ./pkgs/mc { };
  };
in
self

The above expression is a simplified version of our earlier composition expression (named: custom-packages.nix) that uses callPackage {} to automatically pass all required dependencies to the functions that build a package.

callPackage itself is composed from the pkgs.lib.callPackageWith function. The first parameter (pkgs // pkgs.xlibs // self) defines the auto-arguments: since the // operator is right-biased, automatic function arguments are taken from self (our private composition) first, then from the xlibs sub attribute set of Nixpkgs, and finally from the main composition attribute set of Nixpkgs.
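Because the // operator is right-biased, attributes on the right override identically named attributes on the left. A minimal illustration (with made-up attribute values):

```nix
let
  base = { gpm = "gpm from Nixpkgs"; glib = "glib from Nixpkgs"; };
  self = { gpm = "private gpm"; };
in
  # The private gpm overrides the Nixpkgs one; glib is still taken from base
  base // self  # => { glib = "glib from Nixpkgs"; gpm = "private gpm"; }
```

This is why our private pkgconfig and gpm packages win over their Nixpkgs counterparts when callPackage resolves function arguments.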

With the above expression, we accomplish exactly the same thing as in the previous expression, but with fewer lines of code. We can also build the Midnight Commander exactly the same way as we did earlier:

$ nix-build custom-packages.nix -A mc
/nix/store/svm98wmbf01dswlfcvvxfqqzckbhp5n5-mc-4.8.12

Conclusion


In this blog post, I have described how I typically maintain a single package or a collection of packages outside the Nixpkgs tree. More information on how to package things in Nix can be found in the Nix manual and the Nixpkgs manual.

Tuesday, July 22, 2014

Backing up Nix (and Hydra) builds

One of the worst things that may happen to any computer user is that filesystems get corrupted or that storage mediums, such as hard drives, break down. As a consequence, valuable data might get lost.

Likewise, this could happen to machines storing Nix package builds, such as a Hydra continuous build machine that exposes builds through its web interface to end users.

Reproducible deployment


One of the key features of the Nix package manager and its related sub projects is reproducible deployment -- using Nix expressions (which are basically recipes that describe how components are built from source code and its dependencies), we can construct all static components of which a system consists (such as software packages and configuration files).

Moreover, Nix ensures that all dependencies are present and correct, and removes many side effects while performing a build. As a result, producing the same configuration with the same set of expressions on a different machine should yield a (nearly) bit-identical configuration.

So if we keep a backup of the Nix expressions stored elsewhere, such as a remote Git repository, we should (in theory) have enough materials to reproduce a previously deployed system configuration.

However, there are still a few inconveniences if you actually have to do this:

  • It takes time to rebuild and redownload everything. Some packages and system configurations might consist of hundreds or thousands of components, taking many hours to complete.
  • The source tarballs may not be available from their original download locations anymore. I have encountered these situations quite a few times when I was trying to reproduce very old configurations. Some suppliers may decide to remove old releases after a while, or to move them to different remote locations, which requires me to search for them and to adapt very old Nix expressions, which I preferably don't want to do.
  • We also have to restore state which cannot be done by the Nix package manager. For example, if the Hydra database gets lost, we have to configure all projects, jobsets, user accounts and releases from scratch again, which is tedious and time consuming.

Getting the dependencies of packages


To alleviate the first two inconveniences, we must also backup the actual Nix packages belonging to a configuration including all their dependencies.

Since all packages deployed by the Nix package manager reside in a single Nix store folder (typically /nix/store), which may also contain junk and irrelevant files, we have to somehow select the packages that we consider relevant.

Binary deployments


In Nix, there are various ways to query specific dependencies of a package. When running the following query on the Nix store path of a build result, such as Disnix, we can fetch all its runtime dependencies:

$ nix-store --query --requisites /nix/store/sh8025fhmz1wq27663bakmq915a2pf79-disnix-0.3pre1234
/nix/store/31kl46d8l4271f64q074bzi313hjmdmv-linux-headers-3.7.1
/nix/store/94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19
...
/nix/store/hjbzw7s8wbvrf7mjjfkm1ah6fhnmyhzw-libxml2-2.9.1
/nix/store/hk8wdzs9s52iw9gnxbi1n9npdnvvibma-libxslt-1.1.28
/nix/store/kjlv4klmrarn87ffc5sjslcjfs75ci7a-getopt-1.1.4
/nix/store/sh8025fhmz1wq27663bakmq915a2pf79-disnix-0.3pre1234

What the above command does is list the transitive Nix store path references that a package contains. In the above example, these paths correspond to the runtime dependencies of Disnix: they are referenced from bash scripts, as well as from the RPATH fields of the ELF binaries, and the executables cannot run properly if any of them is missing.

According to the nix-store manual page, the above closure refers to a binary deployment of a package, since it contains everything required to run it.

Source deployments


We can also run the same query on a store derivation file. While evaluating Nix expressions to build packages -- including their build-time dependencies -- a store derivation file is generated each time the derivation { } function is invoked.

Every Nix expression that builds something indirectly calls this function. The purpose of a derivation is composing environments in which builds are executed.

For example, if we run the previous query on a store derivation file:

$ nix-store --query --requisites /nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv
...
/nix/store/4bj56z61q6qk69657bi0iqlmia7np5vc-bootstrap-tools.cpio.bz2.drv
...
/nix/store/4hlq4yvvszqjrwsc18awdvb0ppbcv920-libxml2-2.9.1.tar.gz.drv
/nix/store/g32zn0z6cz824vbj20k00qvj7i4arqy4-setup-hook.sh
/nix/store/n3l0x63zazksbdyp11s3yqa2kdng8ipb-libxml2-2.9.1.drv
/nix/store/nqc9vd5kmgihpp93pqlb245j71yghih4-libxslt-1.1.28.tar.gz.drv
/nix/store/zmkc3jcma77gy94ndza2f1y1rw670dzh-libxslt-1.1.28.drv
...
/nix/store/614h56k0dy8wjkncp0mdk5w69qp08mdp-disnix-tarball-0.3pre1234.drv
/nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv

Then all transitive references to the store derivation files are shown, which correspond to all build-time dependencies of Disnix. According to the nix-store manual page, the above closure refers to a source deployment of a package, since the store derivations are low-level specifications that allow someone to build a package from source, including all its build-time dependencies.

Cached deployments


The previous query only returns the store derivation files. These files still need to be realised in order to get a build, which may take some time. We can also query all store derivation files and their corresponding build outputs, by running:

$ nix-store --query --requisites --include-outputs \
    /nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv
...
/nix/store/zmkc3jcma77gy94ndza2f1y1rw670dzh-libxslt-1.1.28.drv
...
/nix/store/hk8wdzs9s52iw9gnxbi1n9npdnvvibma-libxslt-1.1.28
...
/nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv

The above command only includes the realised store paths that have been built before. By adding the --force-realise parameter to the previous command-line instruction, we can force all outputs of the derivations to be built.

According to the nix-store manual page, the above closure refers to a cached deployment of a package.

Backing up Nix components


Besides querying the relevant Nix store components that we intend to backup, we also have to store them elsewhere. In most cases, we cannot simply copy the Nix store paths to another location and copy them back into the Nix store at some later point:

  • Some backup locations may use more primitive filesystems than those commonly used on Linux (and other UNIX-like systems). For example, we require filesystem features such as symlinks and read, write and execute permission bits, which FAT filesystems, for example, do not support.
  • We also require the necessary meta-information to allow a path to be imported into the Nix store, such as the set of references to other paths.

For these reasons, it is recommended to use nix-store --export, which serializes a collection of Nix store paths into a single file, including their meta-information. For example, the following command-line instruction serializes a cached deployment closure of Disnix:

$ nix-store --export $(nix-store --query --requisites --include-outputs \
    /nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv) > disnix-cached.closure

The resulting closure file (disnix-cached.closure) can easily be stored on many kinds of mediums, such as an external harddrive using a FAT32 filesystem. We can import the closure file into another Nix store by running:

$ nix-store --import < disnix-cached.closure

The above command imports Disnix including all its dependencies into the Nix store. If any dependencies are already in the Nix store, then they are skipped. If any dependency appears to be missing, it returns an error. All these properties can be verified because the serialization contains all the required meta-information.

Storing backups of a collection of Nix components efficiently


In principle, the export and import nix-store operations should be sufficient to make reliable backups of any Nix package. However, the approach I described has two drawbacks:

  • For each package, we serialize the entire closure of dependencies. Although this approach is reliable, it is also inefficient if we want to backup multiple packages at the same time. Typically, many packages share the same common set of dependencies. As a consequence, each backup contains many redundant packages wasting a lot of precious disk space.
  • If we change a package's source code, such as that of Disnix, and rebuild it, we have to re-export the entire closure again, while many of its dependencies remain the same. This makes the backup process considerably longer than necessary.

To fix these inefficiencies, we need an approach that stores serializations of each Nix store path individually, so that we can check which paths have been backed up already and which still need to be serialized. Although we could implement such an approach ourselves, there is already a Nix utility that does something similar, namely: nix-push.

Normally, this command is used to optimize the build times of source builds by making binary substitutes available that can be downloaded instead, but it turns out to be quite practical for making backups as well.

If I run the following instruction on a collection of Nix store paths:

$ nix-push --dest /home/sander/cache /nix/store/4h4mb7lb5c0g390bd33k658dgzahkjn7-disnix-0.3pre1234

then a binary cache is created in the /home/sander/cache directory from the closure of the Disnix package. The resulting binary cache has the following structure:

$ ls /home/sander/cache
03qpb8b4j4kc1w3fvwg9f8igc4skfsgj9rqb3maql9pi0nh6aj47.nar.xz
053yi53qigf113xsw7n0lg6fsvd2j1mapl6byiaf9vy80a821irk.nar.xz
05vfk68jlgj9yqd9nh1kak4rig379s09.narinfo
06sx7428fasd5bpcq5jlczx258xhfkaqqk84dx2i0z7di53j1sfa.nar.xz
...
11wcp606w07yh8afgnidqvpd1q3vyha7ns6dhgdi2354j84xysy9.nar.xz
...
4h4mb7lb5c0g390bd33k658dgzahkjn7.narinfo
...

For each Nix store path of the closure, an xz compressed NAR file is generated (it is also possible to use bzip2 or no compression) that contains a serialization of an individual Nix store path (without meta-information) and a narinfo file that contains its corresponding meta-information. The prefix of the NAR file corresponds to its output hash while the prefix of the narinfo file corresponds to the hash component of the Nix store path. The latter file contains a reference to the former NAR file.
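To give an impression of the meta-information involved, a narinfo file looks roughly as follows. The store path, NAR file name and deriver below are taken from the earlier examples, but the sizes, the NarHash and the reference list are fabricated placeholders for illustration only:

```
StorePath: /nix/store/4h4mb7lb5c0g390bd33k658dgzahkjn7-disnix-0.3pre1234
URL: 03qpb8b4j4kc1w3fvwg9f8igc4skfsgj9rqb3maql9pi0nh6aj47.nar.xz
Compression: xz
FileHash: sha256:03qpb8b4j4kc1w3fvwg9f8igc4skfsgj9rqb3maql9pi0nh6aj47
FileSize: 3162496
NarHash: sha256:0f5kkb0h3kjavyyhvj3m13vpdrxn0g4kd7vr7i07xxmn4cvhdgkx
NarSize: 5406208
References: 94n64qy99ja0vgbkf675nyk39g9b978n-glibc-2.19 hk8wdzs9s52iw9gnxbi1n9npdnvvibma-libxslt-1.1.28
Deriver: 3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234.drv
```

The References field is what allows a substituted path to be imported with its dependency set intact, analogous to the meta-information in an exported closure file.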

If, for example, we change Disnix and run the same nix-push command again, then only the paths that have not been serialized before are processed, while the existing ones remain untouched, saving disk space and backup time.

We can also run nix-push on a store derivation file. If a store derivation file is provided, a binary cache is generated from the cached deployment closure.

Restoring a package from a binary cache can be done as follows:

$ nix-store --option binary-caches file:///home/sander/cache \
    --realise /nix/store/3icf7dxf3inky441ps1dl22aijhimbxl-disnix-0.3pre1234

Simply realizing a Nix store path while providing the location to the binary cache as a parameter causes it to download the substitute into the Nix store, including all its dependencies.

Creating releases on Hydra for backup purposes


How can this approach be applied to Hydra builds? Since Hydra stores many generations of builds (unless they are garbage collected), I typically make a selection of the ones that I consider important enough by adding them to a release.

Releases on Hydra are created as follows. First, you have to be logged in and you must select a project from the project overview page, such as Disnix:


Clicking on a project will redirect you to a page that shows you the corresponding jobsets. By unfolding the actions tab, you can create a release for that particular project:


Then a screen will be opened that allows you to define a release name and description:


After the release has been created, you can add builds to it. Builds can be added by opening the jobs page and selecting build results, such as build.x86_64-linux:


After clicking on a job, we can add it to a release by unfolding the 'Actions' tab and selecting 'Add to release':


The following dialog allows us to add the build to our recently created disnix-0.3 release:


When we open the 'Releases' tab of the project page and we select the disnix-0.3 release, we can see that the build has been added:


Manually adding individual builds is a bit tedious if you have many of them. Hydra has the ability to add all jobs of an evaluation to a release in one click. The only prerequisite is that each build must tell Hydra (through a file that resides in $out/nix-support/hydra-release-name of the build result) to which release it should belong.
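A build can declare its release membership by writing the release name to that file from its Nix expression. A hypothetical sketch (the surrounding attributes are elided):

```nix
stdenv.mkDerivation {
  name = "disnix-0.3pre1234";
  # ... src, buildInputs, and other attributes go here ...

  # Tell Hydra that this build belongs to the disnix-0.3 release:
  postInstall = ''
    mkdir -p $out/nix-support
    echo "disnix-0.3" > $out/nix-support/hydra-release-name
  '';
}
```

Any build phase that runs after installation would do; postInstall is merely a convenient hook for it.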

For me, adapting builds is a bit inconvenient, and I also don't need the ability to add builds to arbitrary releases. Instead, I have created a script that adds all builds of an evaluation to a single pre-created release, which does not require me to adapt anything.

For example running:

$ hydra-release-eval config.json 3 "disnix-0.3" "Disnix 0.3"

automatically creates a release with name disnix-0.3 and description "Disnix 0.3", and adds all the successful builds of evaluation 3 to it.

Exporting Hydra releases


To backup Hydra releases, I have created a Perl script that takes a JSON configuration file as parameter that looks as follows:

{
  "dbiConnection": "dbi:Pg:dbname=hydra;host=localhost;user=hydra;",
  
  "outDir": "/home/sander/hydrabackup",
  
  "releases": [
    {
      "project": "Disnix",
      "name": "disnix-0.3",
      "method": "binary"
    }
  ]
}

The configuration file defines an object with three members:

  • dbiConnection contains the Perl DBI connection string that connects to Hydra's PostgreSQL database instance.
  • outDir refers to a path in which the binary cache and other backup files will be stored. This path could refer to (for example) the mount point of another partition or network drive.
  • releases is an array of objects defining which releases must be exported. The method field determines the deployment type of the closure that needs to be serialized, which can be either a binary or cache deployment.

By running the following command, I can backup the releases:

$ hydra-backup config.json

The above command creates two folders: /home/sander/hydrabackup/cache contains the binary cache generated by nix-push from the corresponding store derivation files or outputs of each job, and /home/sander/hydrabackup/releases contains text files with the actual paths belonging to the closures of each release.

The backup approach (using a binary cache) also allows me to update the releases and to efficiently make new backups. For example, by changing the disnix-0.3 release and running the same command again, only new paths are being exported.

One of the things that may happen after updating releases is that some NAR and narinfo files become obsolete. I have also created a script that takes care of removing them automatically. What it basically does is compare the release's closure files with the contents of the binary cache and remove the files that are not defined in any of the closure files. It can be invoked as follows:

$ hydra-collect-backup-garbage config.json

Restoring Hydra releases on a different machine can be done by copying the /home/sander/hydrabackup folder to a different machine and by running:

$ hydra-restore config.json

Backing up the Hydra database


In addition to releases, we may want to keep the Hydra database so that we don't have to reconfigure all projects, jobsets, releases and user accounts after a crash. A dump of the database can be created by running:

$ pg_dump hydra | xz > /home/sander/hydrabackup/hydra-20140722.pgsql.xz

And we can restore it by running the following command:

$ xzcat /home/sander/hydrabackup/hydra-20140722.pgsql.xz | psql hydra

Conclusion


In this blog post, I have described an approach that allows someone to fully back up Nix (and Hydra) builds. Although it may feel great to have the ability to do so, it also comes with a price -- closures consume a lot of disk space, since every closure contains all transitive dependencies that are required to run or build it. In some upgrade scenarios, none of the dependencies can be shared, which is quite costly.

In many cases it would be more beneficial to only backup the Nix expressions and Hydra database, and redo the builds with the latest versions of the dependencies, unless there is really a good reason to exactly reproduce an older configuration.

Furthermore, I am not the only person who has investigated Hydra backups. The Hydra distribution includes a backup script named hydra-s3-backup-collect-garbage that automatically stores relevant artifacts in an Amazon S3 bucket. However, I have no clue how to use it and what its capabilities are. Moreover, I am an old fashioned guy who still wants to store backups on physical mediums rather than in the cloud. :)

The scripts described in this blog post can be obtained from my Github page. If other people consider any of these scripts useful, I might reserve some time to investigate whether they can be included in the Hydra distribution package.

UPDATE: I just implemented a fix for Hydra that automatically composes a release name out of the project name and evaluation id if no release name has been defined in any of the builds. The fix has been merged into the main Hydra repository. This makes the hydra-release-eval script obsolete.