Versioning in the world of containers

Versioning is the act of assigning a label to a software package (anything from a single file to a complex application). You are surely used to the Maven version scheme, and may have heard about Semantic Versioning (SemVer for friends). But anything that labels software units in order can be understood as versioning: timestamps, commit ids, or just numbers or strings.

One may question the need for versioning. If you only ever need the latest “state” of your product, you are right: you don’t need versioning at all. Sadly, you will be missing some interesting capabilities:

  • A history of the changes that led to the current state
  • The ability to roll back to a previous state
  • The possibility of having multiple states of the same unit at the same time

Sometimes those capabilities are mandatory, and then versioning becomes a need.

You probably realised I used several words to refer to the same idea: unit, package, component, product… They all mean the same here: something resulting from a development process that can be stored as one (or many) files. In fact, when many files are joined into a single file, the result is called a ‘package’, and this is the most common name you will find here.

The act of assigning a version to a package is, indeed, called versioning. And the combination of a package and a version is called an ‘artifact’.

Now that the names are clear, I can drop here what is, for me, one of the most important and complicated questions every software developer should keep in mind: how many (and which) artifacts will compose my application?

Let me try to describe the most common cases:

Simplest case

The simplest case is when the application is just a package of code. Maybe it is compiled code, or maybe it is plain code to be interpreted. No matter the language or paradigm, the simplest case only needs a single version. And every time the code changes, and those changes have enough mass, a new version is generated.

A version here may be a snapshot of a code repository, a zipped package, a commit id… whatever state holder can be identified and retrieved.
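Just to make it tangible, here are a few interchangeable ways of pinning a version onto a plain code package (the version numbers and file names below are made up):

$ git tag -a 1.4.0 -m "Release 1.4.0"     # label the current commit with a readable version
$ git rev-parse HEAD                      # or use the commit id itself as the version
$ zip -r my-app-1.4.0.zip src/            # or freeze the tree into a versioned, retrievable package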

But most of the time just code is not enough. Code usually depends on third-party libraries (more code). And guess how those dependencies (artifacts) are identified? Versions. So now we have not a single version (the code’s), but a collection of versions (code plus dependencies).

Identifying our application by a list of versions is, to say the least, unmanageable. So all the underlying versions are grouped into a single package, which is itself versioned. A single artifact to rule them all.

Configuration

Our application will probably have to run in different places, with slightly different behaviour: localized text resources, different database credentials, a special logging format… the options are limitless. So we usually define the so-called configuration, somehow consumed by the code, that parameterizes those behavioural choices.

Having a single configuration item is pretty common. Just specify the current configuration, and if anything changes, the configuration will adapt. See where we are going? Configuration evolves, and versioning the configuration artifact records that evolution.

So we have the code version, the dependency versions, and now a single configuration version… Single? In fact, it is not uncommon to have multiple configuration artifacts. One example I faced recently was splitting configuration into “public” and “secret”. Public configuration can be seen by developers or users, but secrets can only be seen by the application itself and by privileged people with special roles. Those two configuration elements had completely separate life-cycles, separate artifacts and, hence, separate versions.

Picture 1: Code, Dependencies and Configuration artifacts*.

*Although the “Needs” relation is shown as a horizontal left-to-right arrow, it does not mean that every artifact on the left needs every artifact on the right. The arrow only intends to show a one-way relation: Code artifacts may depend on Dependencies and/or Configuration, and Dependencies may depend on Configuration. But Dependencies will never need Code, and Configuration will never need Dependencies or Code.

Our list of versions keeps growing: a code version, many dependency versions, and many configuration versions!

Caring about tests

One requirement for creating quality software is testing it before (or while) releasing it. Test suites are common, if not mandatory, in almost every scenario.

Test suites can be seen as an independent component, where the tested component is just another dependency. This dependency may be strictly internal (test and tested code live in the same package) or external, making the test component a package separate from the tested component.

When this is the case (a separate test package) and the tested component evolves, the test component should also evolve. And, as we have already seen, an evolving package should be managed by a versioning scheme.

A test configuration package may also exist. It may allow tests to run under different conditions, and to adapt easily to changes in the tested components. To be honest, I have never seen such test configuration packages, nor found a situation where they would pay off. But they are worth keeping in mind.

Picture 2: Test artifacts, dependencies and configuration

And then, containers

So now we have up to five different artifact types (each with its own version or versions): code, dependencies, configurations, tests and test configurations. But we still have to introduce containers (after all, that is what this post is about).

The moment you start working with containers, you start reading and hearing about something called “Infrastructure as Code” (IaC). In a few words, you provide, on one side, a description of the infrastructure your code will be executed on; and, on the other side, the instructions to place your code onto that infrastructure and make it run. Both descriptions and instructions are provided in a machine-readable format (code files).

Again, IaC should be stored and shared, and its changes should be tracked… You know where I am heading: an IaC artifact is added to the list!
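As a minimal sketch (the image name and paths are just an assumption, not a recipe), those “instructions to place your code into the infrastructure” can be as small as a Dockerfile:

$ cat Dockerfile
# Which runtime the code needs, and how to place the code onto it
FROM openjdk:8-jre
COPY target/app.jar /data/app.jar
CMD ["java", "-jar", "/data/app.jar"]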

Finally, one last twist: can you imagine a situation where you could have slightly different infrastructures for the same application? There are many, but one example is having different environments for the different stages of the software development life-cycle. You may have a development environment, with limited resources but easily manageable by developers; a test environment, where resources are monitored and automated test suites are executed; and a production environment, with plenty of resources and strict security constraints. We have already seen this concept of having something that allows the execution (deployment, in the IaC case) to behave differently based on some decisions: it is called Configuration.
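For instance (a sketch with hypothetical file and image names), the very same artifact can be deployed against each environment just by feeding it a different, separately versioned, configuration file:

# Same image, same instructions; only the environment-specific configuration changes
$ docker run --env-file dev.env  my-app:1.4.0
$ docker run --env-file prod.env my-app:1.4.0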

Picture 3: IaC artifacts.

So, with the introduction of the IaC configuration artifact, we have completed the list of artifacts an application may be composed of: code, dependencies, configurations, tests, test configurations, IaC and IaC configurations. And that’s a lot!

Version orchestration

When you orchestrate the deployment of an application via a complex, fully-featured automated system, it is not a big deal to have as many artifacts as needed. But that is not usually the case, even less at the beginning of a new project or software development.

Sometimes, instead of managing five or seven different versions, it is easier to reduce the number of versions to be handled, or even the number of artifacts. This simplification can be achieved in two ways:

  1. Packaging one component inside another, creating a single multi-purpose artifact with a single version. This is useful when both components evolve in sync, and hence their versions are almost always the same.
  2. Versioning several elements at the same time, even if some of them did not change. This allows several related artifacts to be identified by a single version id, making them easier to locate and combine.

Indeed, those techniques can be applied to different sets of artifacts, or alternated depending on needs or technical debt. We can describe several scenarios, depending on how we combine the artifacts:

Single artifact

The simplest case we will probably find is mixing all components into a single artifact: code, configurations, tests, etc. The only version that really matters is the code version, which becomes the de facto only version.

This approach should only be taken for software that does not rely on IaC, or when most elements are simple and stable and the variable elements tend to change together. We want to avoid having to generate a new version of everything for a change in a single component.

Application Version Descriptor

At the other end of the spectrum, we have the case where all components evolve freely.

In this case, we need a new software element that somehow lists the versions of all the artifacts needed for the software to run. This component may go by many names: Build Configuration, Dependency Management… but all of them tend to focus on one of the artifacts (code, IaC…). So I prefer the name Application Version Descriptor (AVD).

The AVD should be readable by an automated process, so we can start thinking about automating the combination of the different artifacts.
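A minimal sketch of what such a descriptor could look like (the file name, format and fields are purely hypothetical; any machine-readable format would do):

$ cat application-version-descriptor.yml
application: my-service
artifacts:
  code: 2.4.0
  configuration: 1.7.2
  secrets: 1.1.0
  tests: 2.4.0
  iac: 0.9.5
  iac-configuration: 0.3.1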

Hybrid approach

Extremes tend not to be the best solution in most cases. It is usually better to pick the best of both worlds and create a hybrid approach, based on both our needs and our capabilities. My experience has shown that the following mixtures tend to be useful:

  • Packaging the Code and Test components into the same artifact, as they tend to evolve simultaneously.
  • Packaging Configuration and Secrets together as well, but only if the development team faces no access restrictions on the latter.
  • Packaging IaC and IaC configuration artifacts (if any) into the same artifact, since they also tend to change together.

Final thoughts

At the beginning of any project, many artifact components seem superfluous. But as the software gets more and more complicated, splitting components (and hence problems) into smaller pieces really helps to tackle them, especially in an automated fashion.

Try to keep in mind the scope of evolution of each component. This will help you identify the artifacts, isolate them, and create better integration flows.

In the next post, we will see how we can use this clear artifact structure to build the foundations of a Continuous Integration / Continuous Deployment pipeline, and how we can create an artifact hierarchy that supports that pipeline.

I hope you enjoyed the reading. Don’t hesitate to drop questions.

Keep on learning!


Docker Insides

Objectives

You’ll learn how Docker implements containers, and how Linux features are used to do so.

Prerequisites

You need basic knowledge of Docker and container technology. You also need a basic understanding of Linux, mainly running commands in a terminal, and some knowledge of services and processes.

The Docker architecture

Understanding how containers are implemented requires, indeed, understanding what a container is. I love Mark Church’s description of containers¹:

Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.

In my own words, a container provides all the pieces of software needed to execute another piece of software: the operating system, the file system, libraries, dependencies, the networking system, etc. Not the hardware itself, but all software dependencies in the widest sense.

Docker is a solution for running containers. But “running containers” is a very short description of the actual objective: running software in a repeatable, isolated and secure fashion, while keeping it simple and available.

To do so, the Docker architecture splits those requirements across three components, each focusing on one need: the Docker Client, the Docker Host and the Docker Registry.

The first thing you should know is that, when you run `docker run` or `docker ps`, you are not actually running the Docker Engine. You are just reaching the so-called Docker CLI (the Docker client, or command-line interface). The Docker CLI is a thin client that mainly parses the command-line parameters and then uses a REST API² to send commands to, and get results from, the Docker Host. Both the CLI and the REST API are designed with simplicity in mind, so they can be easily used by any user (either a human or another program).
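A quick way to see this split is to issue the same query through the CLI and directly against the REST API (assuming the default Unix socket and a recent API version):

# Through the thin client...
$ docker ps
# ...and the same information straight from the Docker Host's REST API
$ curl --unix-socket /var/run/docker.sock http://localhost/v1.40/containers/json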

The Docker Host is the component that actually creates and runs the containers. Responding to the REST API, it is responsible for providing everything containers need to run, as well as ensuring containers have the required (and only the allowed) resources available. The Docker Host is a complex component, including parts like the Docker Daemon (the process actually doing the container work), an internal image registry, etc.

Containers need images: the software to be run. Images are stored in a Docker Registry. While the Docker Host contains an internal registry (where it keeps the images for the containers in its scope), external registries may also be available. Those registries, public or private, allow Docker Hosts to easily locate and retrieve images. Probably the best-known public Docker Registry is Docker Hub (https://hub.docker.com/), which hosts both public and private repositories.
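Pulling an image works the same way whichever registry hosts it; only the image reference changes (the private registry host below is hypothetical):

# From the default public registry (Docker Hub)
$ docker pull ubuntu:18.04
# From a private registry, addressed by host and port
$ docker pull registry.example.com:5000/team/app:1.4.0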

Communication between Docker Hosts and Docker Registries is also standardized via an HTTP API. If you are interested in the details, the description and specs can be found on the Docker docs website (https://docs.docker.com/registry/spec/api/).

Docker Architecture

This three-component architecture makes Docker easy to use, enabling containers to be run safely almost everywhere.

Docker Containers are Linux

It is no secret that the most complex part of this architecture is, indeed, securely running containers. To do so, Docker relies deeply on Linux kernel features.

Before going deep into those features, you need to understand what “running a container” means. You should realize that, as with any other piece of software, “running” means creating a process that the operating system (Linux, here) manages and schedules on a processor. So by running a container we actually mean creating a process that exhibits the container’s behaviour.

The difference between a process and a container is, mainly, isolation. Processes can communicate with each other, share resources, etc. But containers are isolated: they run independently of each other, with their own resources and their own limits. To achieve this isolation, Docker uses different strategies depending on the resources to be isolated.
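You can observe that isolation directly: inside a container, a process listing only shows the container's own processes, while the host sees them as ordinary (namespaced) processes. A small demonstration with the busybox image (output trimmed):

$ docker run -d --name demo busybox sleep 600
# Inside the container, the sleep command is PID 1 and nothing else is visible
$ docker exec demo ps
PID   USER     TIME  COMMAND
    1 root      0:00 sleep 600
    6 root      0:00 ps
# From the host, the very same sleep is just another process in the tree
$ ps -ef | grep 'sleep 600'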

The first and simplest strategy is the creation of namespaces. Namespaces are like prefixes, or groups of resource identifiers. All resources attached to a container are related to the same namespace, and a container can only reach resources related to its namespace. Different kinds of resources have different namespaces:

  • All processes in a container are grouped into the pid namespace.
  • All network interfaces are grouped into the net namespace.
  • IPC resources (pipes, shared memory between processes, etc.) share the same ipc namespace.
  • Filesystem mount points in a container refer to the same mnt namespace.
  • The uts namespace allows containers to isolate UTS resources, like the hostname or domain name.

In fact, those namespaces are not exclusive to Docker or containers, but a core feature of Linux. If you are really interested in knowing more about Linux namespaces, take a look at the Wikipedia page (https://en.wikipedia.org/wiki/Linux_namespaces) and review its references and external links.
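You can actually see a container's namespaces from the host by looking at the /proc entry of its main process (a sketch, reusing the `demo` container from above):

# Find the host PID of the container's main process...
$ PID=$(docker inspect --format '{{.State.Pid}}' demo)
# ...and list the namespaces (ipc, mnt, net, pid, uts, ...) it is attached to
$ ls -l /proc/$PID/ns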

Another strategy (optionally) used by Docker to prevent resources from being used without authorization is SELinux. SELinux (Security-Enhanced Linux) is a kernel module, introduced in Linux 2.6.0, that allows the definition of mandatory access control policies. In short, SELinux allows creating “labels” or “policies” and restricting the actions those policies allow (a whitelist). Once assigned to resources, those labels let the kernel restrict how the resources are used.
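On a host where SELinux is enforcing and the Docker daemon runs with --selinux-enabled (typical on Fedora or RHEL), you can see those labels on the container processes; the exact type name may vary between distributions, so take this as a sketch:

# Container processes carry their own SELinux type (usually container_t)
$ ps -eZ | grep container_t
# Labelling can be disabled for a single container, weakening its isolation
$ docker run --security-opt label=disable -it ubuntu:18.04 bash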

While SELinux and namespaces address the isolation problem, there is another resource-related issue containers must face: usage limits. Docker should be able to prevent containers from using too much of a limited resource (memory, CPU time, etc.) and thereby disrupting the correct behaviour of other containers.

The kernel feature Docker uses for resource limiting is cgroups. Cgroups (control groups) allow the creation of restriction groups. Those groups define the amount of CPU or memory that can be used, and can be assigned to processes. Docker assigns container processes to specially created cgroups, ensuring containers do not exceed their limits. If a container process breaches those limits, it will be OOM-killed³.
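For example, the memory and CPU limits given to `docker run` end up as plain values inside the container's cgroup (the paths below assume cgroup v1 and the cgroupfs driver; they differ under systemd or cgroup v2):

# Ask Docker for a 256 MB memory cap and half a CPU
$ docker run -d --name limited --memory=256m --cpus=0.5 nginx
# The cap shows up as a cgroup setting (268435456 bytes = 256 MB)
$ cat /sys/fs/cgroup/memory/docker/$(docker inspect --format '{{.Id}}' limited)/memory.limit_in_bytes
268435456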

By using those strategies, Docker prevents “containerized” processes from reaching unauthorized resources or consuming too much of them. There are many other strategies (e.g. kernel capability restrictions, or seccomp) that are nicely introduced in other documents, like this one (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/container_security_guide/docker_selinux_security_policy).

Fully understanding Docker and Linux containers is not an easy task. This was just a quick introduction, and I encourage interested readers to follow the links and dig deeper into the strategies, technologies and features that make containers an exceptional technology.

Keep learning!

Bonus:

For those really interested in investigating Docker processes and resource utilization, I do recommend the systemd tools. `systemd-cgls` can show you the cgroups associated with specific container processes, while `systemd-cgtop` will display resource utilization:

$ systemd-cgls
...
├─docker
│ └─50993dd8689e561467739465a5491f0cf42a671783d4ad24870278118ee88149
│   ├─2399 /usr/bin/python3 -u /sbin/my_init -- /start_java.sh -c -jar /data/app.jar server /data/app-config.yml
│   ├─2426 /usr/bin/runsvdir -P /etc/service
│   ├─2431 /bin/bash /start_java.sh -c -jar /data/app.jar server /data/app-config.yml
│   └─2443 java -XX:NativeMemoryTracking=summary ...
...

Footnotes and references:

¹ Docker Reference Architecture: Designing Scalable, Portable Docker Container Networks: https://success.docker.com/article/networking

² Details about the REST API can be found in Docker Engine API docs: https://docs.docker.com/engine/api/latest/.

³ Out Of Memory. To be precise, this will only happen when the container’s processes ask for more memory than the container is granted. Other resources, like CPU or network, will not kill the container; their overuse is simply throttled.

Smoothing type functions and value functions

Maybe it is a bit late (8 months since the last post!), but I re-read this post yesterday, and something came to mind.

One of the roots of this post is depicting the similarity between value functions and type functions. While a value function defines a relation between input and output values, a type function defines a relation between input and output types. Smooth as silk!

But the ways we express value functions and type functions don’t match that well. Let’s review the signatures for defining them in Ceylon:
– Named value function

Float addFloats(Float x, Float y) => x+y;

– Named type function:

alias Pair<Value> given Value satisfies Object => [Value,Value]; 

– Anonymous value function

(Float x, Float y) => x+y

– Anonymous type function

 <Value> given Value satisfies Object => [Value,Value] 

Looking for regularity

If the original post presents the homogeneity between type and value functions, I would like to propose homogeneity in the way they can be written.

I can see two sources of irregularity:

Providing output types:

As far as I can see, “providing the output type” can be understood as “providing a description of the output”.

Named value functions are the only ones that allow providing the output type (although you can delegate it to the compiler). So I can’t really see a reason why I can’t write anonymous value functions like “Float(Float x, Float y) => x+y”.

What’s more, describing the output is not just providing a type, but providing the description (or the restrictions) the output must satisfy. So one should be able to ‘downcast’ the result to something like “Number(Float x, Float y) => x+y”, the same way we do when writing function headers.

For type functions, providing the output type isn’t that straightforward. What is the type of the ‘type value’ on the right of the fat arrow? Digging a bit into the Ceylon class hierarchy, we can see that Tuples satisfy Sequence, so in our example we can use Sequence<Object>.

A clever reader may ask: “Why Sequence<Object>? Why not Sequence<Value>?” Running the analogy with value functions shows the reason: would you write “Sequence<x> gimmeASequenceWith(Float x)”? No! ‘x’ is a value, not a type! But you would use the restriction on the input to obtain the restriction on the output, as in “Sequence<Float> gimmeASequenceWith(Float x)”.

Given that, we can rewrite the previous function definitions:
– Named value function: Allow optionally ‘downcasting’

Number addFloats(Float x, Float y) => x+y;

– Named type function: We can get rid of ‘alias’ keyword

Sequence<Object> Pair<Value> given Value satisfies Object => [Value,Value];

– Anonymous value function: Allow providing output type and optionally ‘downcasting’

Number(Float x, Float y) => x+y

– Anonymous type function: Allow providing output type

Sequence<Object><Value> given Value satisfies Object => [Value,Value] 

Note to self: this last definition clearly looks like a curried function! It can be understood as a type function that, given a type <Value>, returns another type function that, given a type <Object>, returns a Sequence type!

Locality for input restrictions:

One thing that really disturbs me when writing code is locality. A definition should be close to what it defines, usage should be close to the definition, etc. Without locality, you have to go up and down the code, looking for the meta-information.

In value functions, the definition of each parameter is right beside the parameter itself. In the value function example, the definition of parameter ‘x’ is right beside it, in the form of the ‘Float’ type. Well done!

But in type functions this is not the case: the restrictions on the input types have to be provided ‘later’, with a given clause! This also means verbosely repeating the name of the input type! What would you think of the following value function definition?

    Number addFloats(x,y) given x satisfies Float & y satisfies Float => x+y;

The meaning is the same, but the definition looks too verbose. So, what if we apply the concept of locality to the type parameters of type functions?

    Sequence<Object> Pair<Object Value> => [Value,Value];

At first sight, having two types inside the “<>” may seem confusing. But when you realize that “Value” is not a type but a type value, things start to make a bit more sense. Maybe using a ‘variable-like’ name makes things clearer:

    Sequence<Object> Pair<Object X> => [X,X];

So getting back to the sample signatures, we can rewrite them again:
– Named value function:

Number addFloats(Float x, Float y) => x+y;

– Named type function:

Sequence<Object> Pair<Object Value> => [Value,Value];

– Anonymous value function:

Number(Float x, Float y) => x+y

– Anonymous type function:

Sequence<Object><Object Value> => [Value,Value] 

Summarizing:

So, if we accept all the proposed changes (output types everywhere and locality for input restrictions), functions can be parsed with the following pseudo-grammar:
For value functions:

 [OutputType|'function'] FunctionName? '(' [InputType InputName]* ')' => ...

And for type functions:

 [OutputType|'alias'] FunctionName? '<' [InputType InputName]* '>' => ...

Really regular, isn’t it?

Complementary Types in Ceylon

So this is my first post ever…

Maybe you are expecting me to drop some lines about myself, why I started a blog, and so on… Sorry, this is not the entry you are looking for. I’ll do that later. I have something on my mind that needs to come out first.

A week ago I spent some time at JBCN Conf with some mates. There I finally met Gavin King, the creator of Hibernate and JBoss Seam, currently working on the Ceylon language. I have recently been getting into Ceylon, and I really feel it is the platonic ideal Java should chase. Ceylon not only has a lot of interesting features, but is also backed by a powerful company (JBoss), so I definitely want it in my knowledge base.

During Gavin’s talk about the language, an idea came to my mind. I dropped the question to Gavin, and he responded, but I am not quite satisfied with the response.

And that’s what this entry is about.

Union and intersection types:

For those who do not know Ceylon yet, it has an (almost) unique feature that rocket-boosts its expressiveness: the so-called union types and intersection types. Let’s see an example:

void printTree(Leaf|Branch root) {
    switch (root)
    case (is Leaf) { print(root.item); }
    case (is Branch) {
        printTree(root.left);
        printTree(root.right);
    }
}

In this example, printTree is a function that receives a single tree node (root) whose type is Leaf|Branch (read “Leaf or Branch”). This is a union type. You can interpret it as “root’s class is, or inherits from, the Leaf class, or is, or inherits from, the Branch class”. This is really useful for constraining the type of an instance at both compile time and run time.

Alternatively, you may find intersection types. As you may guess, they are the symmetric construct. Let’s see another example:

interface Food {
	shared formal void eat();
}
interface Drink {
	shared formal void drink();
}
class Guinness() satisfies Food&Drink {
	shared actual void drink () {}
	shared actual void eat() {}
}
void intersections()
{
	Food&Drink specialStuff = Guinness();
	specialStuff.drink();
	specialStuff.eat();
}

Here you can see that ‘specialStuff’, a Guinness, satisfies both interfaces: Food and Drink. Again, we can read the type of specialStuff as “any class that is, or inherits from, Food, and is, or inherits from, Drink”.

Complementary types:

OK, now that we have introduced union and intersection types, let’s go for the question I dropped at JBCN Conf:

As Gavin agreed, Ceylon’s type system is based on set theory. As I recall from my student years, there were three main operations on sets: union, intersection and complement.

So, if Ceylon’s type system is based on set theory, and it has union types and intersection types… why not have complementary types?

Let me present complementary types with the same example I tried at JBCN Conf: the coalesce function.

{Element&Object*} coalesce({Element*} elements) => { for (e in elements) if (exists e) e };

In mortals’ language, coalesce’s contract (ignoring the implementation) describes a function that takes a sequence of Element(s) (with no constraint on their type) and returns a sequence of Element&Object. In Ceylon, null is the only instance of the Null class, and that class does NOT inherit from Object, so the Element&Object type does not allow nulls. So we have a contract for a function that accepts a sequence of elements of any type and returns a sequence of elements of a type that does not accept nulls.

Top of Ceylon’s type system. Thanks to Renato Athaydes (http://renatoathaydes.github.io/)

This sounds good… but it only works if EVERY possible object is either an Object or a Null. Glancing at Ceylon’s type hierarchy, we can see that, by default, this is true (Object and Null are the only children of Anything, the root of the type hierarchy). But, as far as I know, there is nothing forbidding the creation of a new class that inherits directly from Anything! In that case, any instance of that new class in the input sequence (in other words, any instance not extending Object) will also be removed from the result. That is not the expected behaviour.

But with complementary types (let’s denote them with a ‘bang’ prefix), we can rewrite the coalesce function as follows:

{Element&!Null*} coalesce({Element*} elements) => { for (e in elements) if (exists e) e };

This naturally describes the expected behaviour: taking a sequence of Elements of any type and returning a sequence of those same elements, retaining only the ones whose class is outside the Null hierarchy (that is, those that are not null).

This is not only more descriptive of coalesce’s contract, but also avoids the ‘unknown hierarchy’ problem: you don’t need to know the whole effective type hierarchy to define the contract; you only need to know what you want, and what you don’t want.

What is this good for?

We have seen a single use of complementary types. But a single case is not worth the effort of adding the feature to the language.

We can surely find other situations where complementary types may be useful for defining function contracts; let the community find them. But where I find them really useful is not for the developer, but for the type checker.

There is one construct widely used in Ceylon (and many other languages): type casting. Taking the example directly from Ceylon’s language documentation:

void switchOnEnumTypes(Foo|Bar|Baz var) {
    switch(var)
     // Type for var is Foo|Bar|Baz
    case (is Foo) {
        // Type for var is Foo
        print("FOO");
    }
    case (is Bar) {
        // Type for var is Bar
        print("BAR");
    }
    case (is Baz) {
        // Type for var is Baz
        print("BAZ");
    }
}

This is pretty straightforward when the Foo, Bar and Baz classes are disjoint. But if Foo, Bar and Baz are interfaces, things may become complicated (mixin inheritance is good for many things, but can make others hard).

But by introducing the concept of complementary types, we can write down the type checker’s reasoning as follows:

void switchOnEnumTypes(Foo|Bar|Baz var) {
    switch(var)
     // Type for var is Foo|Bar|Baz
    case (is Foo) {
        // Type for var is Foo
        print("FOO");
    }
    // Type for var is Foo|Bar|Baz & !Foo -> Bar|Baz
    case (is Bar) {
        // Type for var is Bar
        print("BAR");
    }
    // Type for var is Bar|Baz & !Bar -> Baz
    case (is Baz) {
        // Type for var is Baz
        print("BAZ");
    }
}

Other constructs may get some use out of complementary types. The first that comes to my mind is the if (is …) else construct, but probably anything that has a type check and an else at the end.

Being evil

I will not say complementary types are innocuous. As Ceylon is a very regular language, and every type expression denotes a real type, one may write the following:

!Integer notAnInt= ... ;
print(notAnInt.hash); // This should raise a compiler error
!Integer&Object neitherAnInt = ... ;
print(neitherAnInt.hash); // This should be ok. 

The types of notAnInt and neitherAnInt are clearly defined. But such fuzzy variable definitions may lead to edge cases that should be studied in detail. Most of them can probably be solved by sticking strictly to set theory, but I won’t be so bold as to say all of them can.

Open discussion

My intention with this post is not to say that Ceylon is incomplete without complementary types, nor even that they should be implemented in the type system.

The idea is to open a discussion about them. Are complementary types useful? Do they improve the language’s expressiveness? Would trying to push them into the language create undecidable cases?

Let the community decide (and maybe language leaders have something to say).