Friday, February 25, 2011

Free (and Open-Source) software

Last Monday, Richard Stallman, the founder of the Free Software Foundation and the GNU project, gave a guest lecture at Delft University of Technology, which I attended. The talk was titled: 'Copyright versus Community'. It was not really about Free Software specifically; instead, it tried to answer the question of whether the ideas of Free Software extend to other kinds of works.


One of the interesting things he explained in his talk is how copyright has developed over the centuries, since its introduction in England in the 1500s. It started as a form of censorship directed against Protestants. Later, the first form of "real" copyright applied only to publishers and lasted only 14 years. People were still free to copy printed works if they wished (although this was rarely practical). Over time, the power of copyright was extended. Nowadays, copyright lasts for something like 75 years, and individuals, not just publishers, can be severely punished for even the most trivial kinds of copyright infringement.

Another thing he pointed out is that governments aren't very ethical either; they often support stronger copyright laws and write the annoying wishes of publishing companies into law, instead of listening to people's desires and needs. He pointed out that it's in the nature of humans to share things with each other, and that restricting and taking control is against humanity. Because software plays an increasingly important role in our daily lives, the importance of Free Software, which does not take freedom away from people, also increases. (Maybe somebody is going to threaten me after reading this! :P )

I liked the talk, but I found the question round afterwards quite annoying. There were some good questions, but also some lame ones, to which Richard Stallman usually responded by saying: "I don't understand the question", before elaborating a bit more. As a software developer and researcher I also talk quite frequently about Free (and Open-Source) Software, and I often see some common "misconceptions" about what it is, what it is not, and what the implications are. So in this blog post I'd like to discuss some of these common misconceptions and try to clarify them, because I have to stress these points to people a lot.

Meaning


The first class of misconceptions I usually encounter concerns the meaning of some frequently used terms. When I use the term "Free Software", people often think it's about gratis software (free in price). When I use the term "Open Source", people think it's about software for which the source code can be obtained. Neither of these common interpretations represents what the terms are truly about.

Free Software (and Open Source) is not gratis per se. This is probably misunderstood because the word 'free' is ambiguous in English: it refers to freedom, not price. For example, the translation of free software in Dutch is 'vrije software' and in French 'logiciel libre'. Moreover, it's perfectly legal to sell free software for whatever price you want, as described here. The point is that this rarely happens in practice nowadays, because one cannot restrict other people from redistributing (and selling) it, and software gets rapidly and easily redistributed through the internet, mostly for free.

Furthermore, Open-Source (and Free Software) is not just about distributing source code. You must also be allowed to study the source code, make modifications and redistribute the work (with or without changes). For example, there is software available that includes source code but does not allow you to distribute modified versions. This category of software is neither Free nor Open-Source, although the source code may be available.

Intermixing


A lot of people intermix the terms "Free Software" and "Open Source" as if they were the same thing. Although almost all Free Software is Open Source software and vice versa, they are two different definitions and represent different philosophies.

Free Software is not just about distributing source code with some additional rights. It's about four freedoms, for which having the source code and the ability to study and change it is an essential precondition. The Open-Source definition is a more pragmatic view of sharing code in an open manner: you need to have the source code and you should be able to study it and make/distribute changes, although an author may impose some restrictions to preserve integrity. There is also another definition of free software, described in the Debian Free Software Guidelines (mostly used by Debian developers), from which the Open-Source definition is derived.

This page explains some differences between Free Software and Open Source software from the FSF's point of view: http://www.gnu.org/philosophy/free-software-for-freedom.html. It's also important to point out that Free Software and Open Source people are not enemies, although they don't share exactly the same point of view. They work together quite frequently, and because in practice nearly all Free Software is Open Source software and vice versa, usually both parties get what they want.

Anti-commercial


A lot of people think that Free Software (and Open Source) is the opposite of commercial software. Similarly, during Richard Stallman's talk some people thought that all commercial activities were bad/unethical. This is a big mistake and not what free software, or freedom in general, is about.

Although most Free and Open-Source software is non-commercial, commercial Free and Open-Source software exists as well. Commercial activity is just a way for companies to earn money. Free and Open-Source licenses don't restrict that, nor is it considered a bad thing, as long as it does not restrict people's freedom.

When I explain this to people, I often get the question: "But why do companies release software under Free and Open Source licenses then? That's bad for them, right?". Usually such companies do not make money from Free and Open Source software by selling copies or licenses, but in different ways, such as selling support, offering services based around their technology, or dual licensing (distributing the same software under a proprietary and a free software license simultaneously).

Some companies are quite successful even though their software is Free and Open-Source and can be used by practically anyone. In some cases, releasing software under Free and Open-Source licenses gives companies advantages: they can easily build communities around their software and accept external contributions, so that the whole community benefits and their commercial services improve.

The opposite of Free (and Open-Source) software is proprietary software. Proprietary software is available both commercially and non-commercially, and imposes restrictions on users' freedom to some degree.

Dangerous


When I try to convince certain people in certain companies to use a particular Free and Open Source product, they often remind me that I should be very careful, because it may be dangerous. Usually these people are referring to the implications of copyleft, which are not always fully understood.

I will first explain a bit what copyleft is about. Copyleft is an instrument in a license which guarantees that the four freedoms defined in the Free Software definition are preserved in a work to some degree. The most famous copyleft license is the GNU General Public License (GPL). Basically, this license states that a complete derived work must retain all freedoms. The GPL is not always suitable for all types of software, such as libraries. Therefore, the GNU Lesser General Public License (LGPL) has been developed with a weaker copyleft, which basically restricts the boundaries of the copyleft to the library itself instead of the whole work. Non-copylefted Free and Open-Source software also exists; such licenses allow you to incorporate the software into proprietary products, make modifications, and keep them secret.

Whether a license is a copyleft license or not, you are always free to make modifications and to run the software for whatever goal you want. It also does not restrict you from selling the software for whatever price you want. The only obligation it imposes is that you have to give users some degree of freedom. If you're not distributing the software itself, you can keep all your custom modifications secret, copylefted or not. These conditions only become applicable when you distribute the software.

In theory, a modified piece of GPL software can be offered as a service without giving the changes away, because you're not distributing the software itself to end users. To guarantee freedom to users of such services, the GNU Affero General Public License (AGPL) has been developed, which states that users who interact with the software over a network also have the right to obtain the source code and enjoy the same freedoms. Although the AGPL exists, most free software products don't use it.

To be clear: using Free Software is never dangerous in the sense that you have to give something away. You only have obligations if you (re)distribute derived works or give external users access to the software.

Giving away


Sometimes I ask people why they don't publish a piece of software under a Free and Open-Source license. Frequently, I get an answer like: "Well, then I give up all my rights and I give everything I have away!". This is also a misconception I want to clarify.

When releasing software under a license, you're still the copyright holder. As the copyright holder you have the ability to change the license, even when the software is released under a Free and Open-Source license (unless you have received external contributions and the contributors haven't transferred their copyright to you). Moreover, copyleft may give you some protection, such as the right to obtain the source code of derived works. Releasing software under a free license doesn't place it in the public domain, of course.

Conclusion


Hopefully, this blog post has clarified some of these misconceptions. As a Free and Open Source software developer myself, I think it's important to point them out.

Wednesday, February 16, 2011

Disnix: A toolset for distributed deployment

On February 14th, I released Disnix 0.2. It seems that I picked a nice date for it, just like the release date of Disnix 0.1, which was released on April 1st, 2010 (and that's no joke). Since I haven't written any blog post about Disnix yet, I'll give some info here.

Disnix is a distributed deployment extension for the Nix package manager. Whereas Nix manages packages and dependencies residing on the same system in the Nix store (which we call intra-dependencies later on), Disnix manages distributable components (or services). Services have intra-dependencies on components residing on the same system, but also dependencies on other services, which may be located on different machines in the network (which we call inter-dependencies later on). Disnix extends the Nix approach and offers various features to deploy service-oriented systems, including the management of inter-dependencies.


The figure above shows how Disnix works in a nutshell. In the center the disnix-env command-line tool is shown, which performs the complete deployment process of a service-oriented system. On the left, various models are shown. The services model captures all the services (distributable components) of which a system consists, their types and their inter-dependencies. This model includes a reference to all-packages.nix, a Nix expression capturing intra-dependency compositions. The infrastructure model captures the available machines in the network and their relevant properties/capabilities. The distribution model maps services defined in the services model to machines defined in the infrastructure model. On the right, a network of machines is shown, which all have the Disnix service installed, providing remote access to deployment operations.

By writing instances of the models mentioned earlier and by running:

$ disnix-env -s services.nix -i infrastructure.nix \
  -d distribution.nix

the system is built from source code, including all required intra-dependencies. Then the services and their intra-dependencies are efficiently transferred to the target machines in the network. Finally, the services are activated in the right order, derived from the inter-dependency graph.

By adapting the models and running disnix-env again, an upgrade is performed instead of a full installation. In this case only components which have changed are rebuilt and transferred to the target machines in the network. Moreover, only obsolete services are deactivated and new services are activated.

Similar to writing ordinary Nix expressions for each package, you also write Disnix expressions for each service describing how it can be built from source and its dependencies.

{stdenv, StaffService}:
{staff}:

let
  jdbcURL = "jdbc:mysql://"+
    staff.target.hostname+":"+
    toString (staff.target.mysqlPort)+"/"+
    staff.name+"?autoReconnect=true";
  contextXML = ''
    <Context>
      <Resource name="jdbc/StaffDB" auth="Container"
            type="javax.sql.DataSource"
            maxActive="100" maxIdle="30" maxWait="10000"
            username="${staff.target.mysqlUsername}"
            password="${staff.target.mysqlPassword}"
            driverClassName="com.mysql.jdbc.Driver"
            url="${jdbcURL}" />
    </Context>
  '';
in
stdenv.mkDerivation {
  name = "StaffService";
  buildCommand = ''
    ensureDir $out/conf/Catalina
    cat > $out/conf/Catalina/StaffService.xml <<EOF
    ${contextXML}
    EOF
    ln -sf ${StaffService}/webapps $out/webapps
  '';
}

The code fragment above shows a Disnix expression for the StaffTracker example included in the Disnix repository. The main difference between this expression and an ordinary Nix expression is that it has two function headers, which take the intra-dependencies and inter-dependencies respectively, used to configure the component. The inter-dependency arguments are used in this expression to generate a so-called context XML file, which Apache Tomcat uses to configure resources such as JDBC connections, containing the URL, port number and authentication credentials for a MySQL database residing on a different machine in the network. For other types of components, different configuration files have to be created.
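As a hypothetical sketch (not taken literally from the StaffTracker sources), a Disnix expression for a service of the mysql-database type could simply deliver an SQL schema file, which the activation machinery on the target machine uses to create the actual database. The table definition below is purely illustrative:

```nix
{stdenv}:  # intra-dependencies
{}:        # inter-dependencies (a database typically has none)

stdenv.mkDerivation {
  name = "staff";
  buildCommand = ''
    # Deliver a schema file; the mysql-database activation type
    # on the target machine creates the database from it.
    ensureDir $out/mysql-databases
    cat > $out/mysql-databases/staff.sql <<EOF
    create table staff
    ( STAFF_ID   varchar(10) not null,
      FIRSTNAME  varchar(30) not null,
      LASTNAME   varchar(30) not null,
      primary key(STAFF_ID)
    );
    EOF
  '';
}
```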

Moreover, every Disnix expression must also be composed. It is first composed locally, by calling the function with the right intra-dependency arguments, in a similar way as ordinary Nix expressions. Later, the same function is called with the right inter-dependency arguments as well.
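To illustrate, the local intra-dependency composition in all-packages.nix might look roughly as follows. This is a hypothetical sketch; the file paths and argument names are made up to follow the StaffTracker naming:

```nix
{system, pkgs}:

rec {
  # Each service's Disnix expression is called with its
  # intra-dependency arguments only. The result is a function
  # that still expects the inter-dependency arguments, which
  # Disnix supplies later based on the services model.
  staff = import ../services/staff {
    inherit (pkgs) stdenv;
  };

  StaffService = import ../services/StaffService {
    inherit (pkgs) stdenv;
  };

  StaffServiceWrapper = import ../services/StaffServiceWrapper {
    inherit (pkgs) stdenv;
    inherit StaffService;
  };
}
```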

{distribution, system, pkgs}:

let customPkgs = import ../top-level/all-packages.nix {
  inherit system pkgs;
};
in
rec {
### Databases
  staff = {
    name = "staff";
    pkg = customPkgs.staff;
    dependsOn = {};
    type = "mysql-database";
  };
  ...

### Web services
  StaffService = {
    name = "StaffService";
    pkg = customPkgs.StaffServiceWrapper;
    dependsOn = {
      inherit staff;
    };
    type = "tomcat-webapplication";
  };
  ...

### Web applications

  StaffTracker = {
    name = "StaffTracker";
    pkg = customPkgs.StaffTracker;
    dependsOn = {
      inherit GeolocationService RoomService;
      inherit StaffService ZipcodeService;
    };
    type = "tomcat-webapplication";
  };
  ...
}

The above expression shows the services model, which captures the distributable components of which a system consists. Basically, this model is a function taking three arguments: the distribution model (shown later), a collection of Nixpkgs, and a system identifier indicating the architecture of a target host. The function returns an attribute set in which each attribute represents a service. For each service various properties are defined, such as a name, a pkg attribute referring to a function which creates the intra-dependency composition of the service (defined in an external file not shown here), dependsOn composing the inter-dependencies, and a type, which is used for the activation and deactivation of the service.

{
  test1 = {
    hostname = "test1.example.org";
    tomcatPort = 8080;
    system = "i686-linux";
  };
  
  test2 = {
    hostname = "test2.example.org";
    tomcatPort = 8080;
    mysqlPort = 3306;
    mysqlUsername = "root";
    mysqlPassword = "secret";
    system = "i686-linux";
  }; 
}

The expression shown above is the infrastructure model, which captures the machines in the network and their relevant properties/capabilities. This expression is an attribute set in which each attribute represents a machine in the network. Some properties are mandatory, such as the hostname, indicating how the Disnix service can be reached. The system property denotes the system architecture, so that services are built for that particular platform. Other properties can be freely chosen and are used for the activation/deactivation of components.

{infrastructure}:

{
  GeolocationService = [ infrastructure.test1 ];
  RoomService = [ infrastructure.test2 ];
  StaffService = [ infrastructure.test1 ];
  StaffTracker = [ infrastructure.test1 infrastructure.test2 ];
  ZipcodeService = [ infrastructure.test1 ];
  rooms = [ infrastructure.test2 ];
  staff = [ infrastructure.test2 ];
  zipcodes = [ infrastructure.test2 ];
}

The final expression shown above is the distribution model, mapping services to machines in the network. This expression is a function taking the infrastructure model as its parameter. The body is an attribute set in which every attribute, representing a service, refers to a list of machines in the network. It also allows you to map a service to multiple machines, e.g. for load balancing.

The models shown earlier are used by Disnix to perform the complete deployment process of a service-oriented system, i.e. building, transferring and activating services. Because Disnix uses the purely functional properties of Nix, this process is reliable and efficient. If a system is upgraded, no components are removed or overwritten, since everything is stored in isolation in the Nix store, so while upgrading we can still keep the current system intact. Only during the transition phase, in which services are deactivated and activated, is the system inconsistent, but Disnix keeps this time window as small as possible. Moreover, a proxy can be used during this phase to queue connections, which makes the upgrade process truly atomic.


Although Disnix supports the deployment of a service-oriented system, some additional extensions have been developed to make deployment more convenient:

  • DisnixWebService. By default Disnix uses an SSH connection to connect to remote machines in the network. This extension provides a SOAP interface and the disnix-soap-client tool to perform deployment through the SOAP protocol.
  • DisnixOS. Disnix manages the services of which a system is composed, but not the system configurations of the underlying infrastructure. This extension provides additional infrastructure management features to Disnix based on the techniques described in the blog post titled: Using NixOS for declarative deployment and testing. By using this extension you can automatically deploy a network of NixOS configurations next to the services through Disnix. Moreover, you can also use this extension to generate a network of virtual machines and automatically deploy the system in the virtual network. A screenshot is shown above, which runs the StaffTracker example in a network of three virtual machines.
  • Dynamic Disnix. Disnix requires developers or system administrators to manually write an infrastructure model and a distribution model. In a network in which events occur, such as a machine crashing or a new machine with new system resources being added, this introduces a large degree of inflexibility. The Dynamic Disnix toolset offers a discovery service, which dynamically discovers the machines in the network and their relevant properties/capabilities. Moreover, it also includes a distribution model generator, which uses a custom-defined policy and a collection of distribution algorithms to dynamically distribute services to machines, based on non-functional properties defined in the services and infrastructure models.
    The Dynamic Disnix extension is still under heavy development and not released as part of Disnix 0.2. It will become part of the next Disnix release.

Disnix, the extensions and some examples can be obtained from the Disnix web page: http://nixos.org/disnix. Disnix is also described in several academic papers. The paper 'Disnix: A toolset for distributed deployment' describes the architecture of the Disnix toolset. This paper is, however, somewhat outdated, as there are some minor changes in the current implementation. The paper 'Automated Deployment of a Heterogeneous Service-Oriented System' describes the 0.1 implementation, which we used for a case study at Philips Research. The publications and presentation slides can be obtained from the publications and talks sections of my homepage. Moreover, there are some earlier publications about Disnix available as well. In a next blog post, I will explain more about the development process and design choices of Disnix.

Tuesday, February 8, 2011

Using NixOS for declarative deployment and testing


Last weekend, I visited FOSDEM, the Free and Open-source Software Developers' European Meeting, held at the Université Libre de Bruxelles (ULB), for the third time. This is a very big event organized every year for free software and open-source people to meet each other. All the major free and open-source projects that you can think of are well represented there.


It's always quite impressive to see the Janson auditorium completely filled with thousands of free and open-source people during the keynote presentations. I took a picture during the keynote of Eben Moglen of the Software Freedom Law Center (shown above) to give an impression of how massive this event is.

This time I also gave a presentation about NixOS, in the CrossDistro devroom, because I think it's a good idea to promote our ideas and concepts to a bigger audience beyond academia. In my presentation I explained the complexity of deployment in various scenarios (single installations, distributed environments, virtual machines), the general idea and concepts of NixOS, and some applications we have developed over the last couple of months to deal with the complexity of deployment.

One of our recently developed applications is distributed NixOS deployment. So instead of writing a single NixOS configuration, you can also write a network of NixOS configurations, for instance:

{
  storage = 
    {pkgs, config, ...}:
    {
      services.portmap.enable = true;
      services.nfsKernel.server.enable = true;
      services.nfsKernel.server.exports = ''
        /repos 192.168.1.0/255.255.255.0(rw,no_root_squash)
      '';
      services.nfsKernel.server.createMountPoints = true;
    };

  postgresql =
    {config, pkgs, ...}:
    {
      services.openssh.enable = true;
      services.postgresql.enable = true;
      services.postgresql.enableTCPIP = true;
      services.postgresql.authentication = ''
        # Generated file; do not edit!
        local all all                trust
        host  all all 127.0.0.1/32   trust
        host  all all ::1/128        trust
        host  all all 192.168.1.0/24 trust
      '';
    };

  webserver = 
    {config, pkgs, ...}:
    {
      fileSystems = pkgs.lib.mkOverride 50  
        [ { mountPoint = "/repos";
            device = "storage:/repos";
            fsType = "nfs";
            options = "bootwait"; } 
        ];
      
      services.portmap.enable = true;
      services.nfsKernel.client.enable = true;
      services.httpd.enable = true;
      services.httpd.adminAddr = "root@localhost";
      services.httpd.extraSubservices =
        [ { serviceType = "trac"; } ];
      environment.systemPackages =
        [ pkgs.pythonPackages.trac pkgs.subversion ];
    };
      
  client = 
    {config, pkgs, ...}:
    {
      require = [ ./common/x11.nix ];
      services.xserver.desktopManager.kde4.enable = true;
    };
}

The network expression shown above represents a network of machines describing a Trac environment, a web-based management tool for software projects. A Trac environment can (of course) be deployed on a single system, but also on multiple systems. For example, the Subversion server storing source code may be deployed on a different machine than the PostgreSQL database storing tickets and bug reports. In the network expression shown above, we have defined four machines, representing a Subversion server, a PostgreSQL server, a web server and a client machine running the KDE Plasma desktop.

A network of NixOS machines can be automatically deployed by writing a network expression and by typing:

$ nixos-deploy-network network.nix

The nixos-deploy-network tool first builds the NixOS configurations for all the machines. Then it efficiently transfers the system configurations and all their dependencies to the right machines in the network. Because of the purely functional properties of Nix, this phase will not harm the existing configurations, because all files are stored safely next to each other in the Nix store and no files are overwritten or automatically removed.

After all system configurations and dependencies have been transferred to the target machines, the system configurations are activated. In this phase system services are stopped and started on each machine in the network, which may bring some downtime to the complete system, but this time window is as small as possible. In case of a failure, a rollback is performed, which activates the previous configuration again. This can be done easily in NixOS, since older configurations are still available in the Nix store, unless they have been garbage collected.

Another recently developed application is virtualization. By running the following command:

$ nixos-build-vms; ./result/bin/nixos-run-vms


a network of virtual machines is generated and automatically launched, closely resembling the configurations defined in the network model. This allows users to experiment with a specific configuration without having to deploy a collection of physical machines. Another notable feature is that virtual networks are cheap to instantiate: we don't have to create disk images, but instead mount the Nix store of the host machine through SMB/CIFS. We can safely do this because of the purely functional concept of the Nix store. An impression of a virtual network running Trac is shown above.

We have also developed a NixOS test driver, which can be used to perform automated distributed test cases in a network of virtual machines.

testScript =
''
  startAll;
      
  $postgresql->waitForJob("postgresql");
  $postgresql->mustSucceed("createdb trac");
      
  $webserver->mustSucceed("mkdir -p /repos/trac");
  $webserver->mustSucceed("svnadmin create /repos/trac");
      
  $webserver->waitForFile("/var/trac");      
  $webserver->mustSucceed("mkdir -p /var/trac/projects/test");
  $webserver->mustSucceed("trac-admin /var/trac/projects/test ".
    "initenv Test postgres://root\@postgresql/trac svn ".
    "/repos/trac");
      
  $client->waitForX;
  $client->execute("konqueror http://webserver/projects/test &");
  $client->waitForWindow(qr/Test.*Konqueror/);
  $client->sleep(30); # loading takes a long time
      
  $client->screenshot("screen");
'';

The code fragment above shows an example of a test suite for the Trac environment. This test suite creates a Trac database on the PostgreSQL server and a Subversion repository on the web server, then it defines a Trac project using the trac-admin tool and launches a web browser to take a screenshot of the entry page. The test suite is performed by the test driver in a non-interactive manner.

We have applied these distributed deployment and testing techniques to various use cases. We used the nixos-deploy-network tool to deploy our complete Hydra build environment for continuous integration and testing of many software components. We have also implemented various test cases for NixOS, various GNU projects and other free software projects using the NixOS test driver.

I am quite happy to see how well the ideas described in my presentation were received at FOSDEM. It was the first time for me to present there and I didn't really know what to expect. My talk attracted quite a number of people, and I received many positive reactions as well as a lot of good questions and suggestions. I have to admit that these questions were far better than the ones I usually receive at academic conferences.

The slides of my FOSDEM presentation (titled: 'Using NixOS for declarative deployment and testing') can be obtained from the talks page of my homepage. The distributed testing techniques are also described in our ISSRE 2010 paper titled: 'Automating System Tests Using Declarative Virtual Machines'. The technical report titled: 'Declarative Testing and Deployment of Distributed Systems' describes an earlier implementation of our declarative deployment and testing techniques. Both papers can be downloaded from my publications page.

These techniques are part of NixOS. The NixOS test driver can also be used on any Linux system running the Nix package manager and KVM, which allows you to keep using your favourite Linux distribution if you don't want to switch to NixOS.