Versioning is the act of assigning a label to a software package (whatever from a single file to a complex application). You surely are used to maven version scheme, and maybe heard about Semantic Versioning (SemVer for friends). But anything labelling software units in order can be understood as versioning: Timestamps, Commit id trees, or just numbers or strings.
One may question the need for versioning. If you only need the latest “state” for your product, you are right, you don’t need versioning at all. Sadly, you will be missing a lot of interesting capabilities:
- History of changes until the current state was reached
- Ability to roll back to a previous state
- Having multiple states simultaneously for the same unit
Sometimes those capabilities are mandatory, hence versioning is a need.
You probably realised I used several words to refer the same idea: unit, package, component, product… Those all mean the same here: something resulting from a development process, that can be stored as one (or many) files. In fact, when many files are joined into a single file, the result is named a ‘package’, and this is the most common name you will find here.
The fact of assigning a version to a package is named, indeed, versioning. And the union of a package and a version is called ‘artifact’.
Once we have all names clear, I can drop here what for me is one of the most important and complicated questions every software developer should have in mind: how many (and what) artifacts will compose my application?
Let me try to describe the most common cases:
The simplest case is when the application is just a package of code. Maybe is compiled code, or maybe is plain code to be interpreted. No matter what language or paradigm, the simplest case will only need a simple version. And every time code is changed, and those changes have mass enough, a new version is generated.
A version here may be a snapshot of a code repository, a zipped package, a commit id… whatever status holder that can be identified and retrieved.
But most of the time just code is not enough. Code usually depend on third-party libraries (more code). And guess how these dependencies (artifacts) are identified? Versions. So now we have not a single version (for the code), but a collection of versions (code plus dependencies).
Identifying our application with a list of versions is, at least, unmanageable. So all underlying versions are grouped into a single package, that will be itself versioned. A single artifact to rule them all.
Probably, our application should run in different places, with slightly different behaviour: localized text resources, different database credentials, special logging format… options are limitless. So we use to define the so-called configuration, somehow consumed by the code, that parameterize those behavioural choices.
Having a single configuration item is pretty common. Just specify the current configuration, and if anything changes, the configuration will adapt. See where are we going? Configuration evolves, and versioning of configuration artifact records this evolution.
So we have the code version, dependencies versions, and now a single configuration version… Single? In fact, is not uncommon have multiple configuration artifacts. One example I faced recently was splitting configuration between “public” and “secret”. Public configuration can be seen by developers or users. But secrets can only be seen by the application itself, and privileged people with special roles. Those two configuration elements had a completely separated life-cycle, separated artifacts, and, hence, separated versions.
*Despite “Needs” relation is shown as a horizontal left-to-right arrow, it does not mean that all artifacts on the left need all artifacts on the right. The arrow only intents to show a no-back relation: Code artifacts may depend on Dependencies and/or Configuration, as well as Dependencies may depend on Configuration. But Dependencies will never need Code, and Configuration will never need Dependencies nor Code.
Our list of versions keeps growing: a code version, many dependencies versions, and many configurations!
One requirement for creating quality software elements is testing them before (or during) releasing. Test suites are common, if not mandatory, in almost every scenario.
Tests suites can be seen as an independent component, where the tested component is just another dependency. This dependency may be strictly internal (both test and tested are in the same package) or external, making the test component a package separated from the tested component.
When this is the case (a test package), and tested component evolves, the test component should also evolve. And, as we already saw, an evolving package should be managed by a versioning scheme.
A test configuration package may also exist. It may allow to run tests in different conditions, and allow tests to adapt to changing tested components with ease. To be true, I never saw this test configuration packages, nor ever found a situation where they can be of profit. But worth having them in mind.
And then, containers
So now, we have up to five different artifact types (each one with their own version or versions): code, dependencies, configurations, tests and test configurations. But we still have to introduce containers (after all, that is what this post is about).
The moment you start working on containers, you start reading and hearing about something called “Infrastructure as Code” (IaC). In a few words, you should provide, on one side, a description about the infrastructure your code will be executed onto; and in the other side, the instructions to place your code into this infrastructure and make it run. Those descriptions and instructions are provided in a machine-readable format (code files).
Again, we can find that IaC, should be stored, shared, and changes should be tracked… You know where I am heading to: An IaC artifact is added to the list!
Finally, one last twist: can you imagine a situation where you could have slightly different infrastructures for the same application? There are many, but an example is having different environments for different stages for the software development life-cycle. You may have a development environment, with limited resources, but easily manageable by developers; A test environment, where resources are monitored and automated test suites are executed; and a production environment, with plenty of resources, and strict security constraints. We already saw this concept of having something that allows the execution (deployment, in IaC case) to behave differently based on some decisions: it is called Configuration.
So, with the introduction of the IaC configuration artifact, we have completed the list of artifacts an application may be composed of: code, dependencies, configurations, tests, test configurations, IaC and IaC configurations. And that’s a lot!
When you orchestrate the deployment of an application via a complex fully-featured automated system, it is not a big deal having as many artifacts as needed. But that’s not usually the case, even more at the beginning of a new project or software development.
Sometimes, instead of managing five or seven different versions, it is easier to reduce the number of versions to be handled or even reduce the number of artifacts. This simplification can be done by two approaches:
- Packaging one instrument inside another, creating a single multi-purpose artifact, with a single version. This is useful when both components evolve in synchrony, and hence their versions are almost the same.
- Versioning many elements at the same time, even if some of them didn’t change. This allows identifying several related artifacts with a single version id, facilitating their location and combination.
Indeed, those techniques can be applied to different sets of artifacts, or can be alternated depending on needs or technical debt. We can describe several scenarios, depending on how we combine artifacts:
The simplest case we will probably find is mixing all components in a single artifact. Code, configurations, test, etc… The only version that really matters is the Code version, that becomes a de-facto only version.
This approach should only be taken in software elements not relying on IaC, or when most elements are simple and stable, and the variable elements tend to vary simultaneously. We should avoid generating new versions for changes in only one component.
Application Version Descriptor
On the other side of the spectrum, we have the case when all components evolve freely.
In this case, we need a new software element, that somehow lists all the versions for all the artefacts needed for the software to run. This component may have many names: Build Configuration, Dependency Management… But all of them tend to be focused on one of the artifacts (code, IaC…). So I prefer to use the name Application Version Descriptor (AVP).
The AVP should be somehow readable by an automatic process so we can start thinking about automating the combination for the different artifacts.
Indeed, extremes tend not to be the best solution in most of the cases. It is usually better to pick the best of both worlds and create a hybrid approach, based on both our needs and our capacities. My experiences have shown me that the following mixtures tend to be useful:
- Packaging Code and Test components into the same artifact, as they tend to evolve simultaneously.
- Also packaging together Configuration and Secrets, but only if there are no access limitation onto the latest for the development team.
- IaC and IaC configuration artifacts (if any) also tend to change together, so it is worth packaging them together into the same artifact.
At the beginning of any project, many artifact components seem superfluous. But as the software gets more and more complicated, splitting components (and hence problems) into smaller pieces is really helpful to face problems, mainly in automated fashions.
Try to keep in mind the scope of evolution on each of the components. This will help to identify the artifacts, isolate them, and create better integration flows.
In next post, we will see how we can use this clear artifact structure to build the foundations for a Continuous Integration / Continuous Deployment pipeline, and how we can create an artifact hierarchy that supports that pipeline.
Hope you enjoyed the reading. Don’t hesitate dropping questions.
Keep on learning!