Integration strategies

Wednesday, April 02, 2008 Pablo Santos 1 Comments

Previously we discussed the future of continuous integration, according to Duvall’s award-winning book, and possible alternatives.

Today I’ll be focusing precisely on this topic: different alternatives to handle the integration phase.

Some will prefer to stick with the main-line development style, while others will gravitate toward a more controlled (and maybe less agile, if we go religious) approach.

Beyond the selected branching strategy, there will also be an integration strategy.

Let’s start with main-line development. This is probably the best-known and most widespread technique in version control. How does it work? Simple: all check-ins go to the main branch, as you can see in the following figure:

Of course it has its own advantages and disadvantages.

Main-line pros

  • It is simple to set up and easy to use.
  • Every version control system out there is able to handle it (at least main-line is supported by everyone).

Main-line cons

  • Traditional use leads to continuous project instability.
  • People are updated to the latest version after each commit, so they can easily be infected by the “shooting a moving target” disease.
  • To prevent the previous problems, developers are forced to check in only when their code is fully tested. This can leave code outside version control for long periods. Developers also start using version control only as a delivery mechanism rather than a development tool: they can’t check in every five minutes just to create checkpoints unless they’re totally sure the code won’t break the build. You then lose the ability to use version control to find out why your code was working before the last minor change you made.

Continuous Integration standard practices try to solve all the above problems with two principles:

  • Commit as often as possible, even several times a day. (Please note: several times a day is still less often than you’d check in if you had your own branch, where you can submit even intermediate non-working code just to help you during development.)
  • Always make sure you don’t break the build. You’ll need a strong test suite to check that your changes, together with the previously submitted ones, don’t break anything. If you follow test-driven development, you’re in a good position to achieve this goal.

What you get in return is a project that evolves very fast, which is great. The problem I usually find is that teams struggle to keep the build from breaking, and they normally prefer to trade fast evolution for stability. This is not always true, of course.

Some alternatives

The alternative I’ll be talking about is the branch per task pattern. Some aliases are: Activity Branching, Task Branching, Side Branching or Transient Branching. It is probably my favorite pattern because of its flexibility, ease of use and associated benefits. It also lays the basis for new trends in version control like stream management.

What does branch per task look like? Take a look at the following figure:

Note you have a main-line, which has been labeled as “BL00” and then there’s a task branch there named task001.

There are three important considerations here already:

  • Each task starts from a well-known baseline: you remove the “shooting a moving target” problem from day one.
  • The mainline will only receive stable changes, so you really enforce the “never break the build” principle.
  • You create an explicit link between branches and the tasks in your favorite project management system (think about Jira, DevTrack, VersionOne, OnTime, Mantis, Bugzilla or just your internal mechanism!). There’s something really important here: a developer works on a branch which he knows is related to a given task, because it is named after the task! So developers are totally aware of the exact planning item they’re working on, which means both project managers (or scrum masters, or whoever you have to report to) and developers finally speak the same language!
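The three considerations above can be sketched as a tiny in-memory model. This is only an illustration, not how any particular version control system represents branches: the `TaskBranch` class, the `start_task` helper and the zero-padded naming scheme are assumptions made for the example, while `task001` and `BL00` are the names used in the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskBranch:
    """Hypothetical model of a task branch (not a real VCS API)."""
    task_id: int                      # id of the item in the issue tracker
    baseline: str                     # label on main the branch starts from
    checkins: List[str] = field(default_factory=list)

    @property
    def name(self) -> str:
        # The branch is named after the task, so the link is explicit.
        return f"task{self.task_id:03d}"


def start_task(task_id: int, baseline: str) -> TaskBranch:
    # Every task starts from a well-known baseline, never from a moving tip.
    return TaskBranch(task_id=task_id, baseline=baseline)


branch = start_task(1, "BL00")
# Intermediate, even non-working, check-ins stay on the task branch.
branch.checkins.append("checkpoint: half-done parser")
print(branch.name, "starts from", branch.baseline)
```

Because main is never touched while the task is in progress, checkpoints like the one above cannot break the build for anyone else.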

In fact you’re getting rid of the main-line style problems. The drawback here is that it is a bit more complex to understand (just a little bit!) and not every version control out there supports it (that’s why we developed Plastic!).

Soon you’ll be creating more branches to implement more changes, as you can see in the next image:

But the point here is not only when or how you create your branches, but when, how and who integrates them back into the main-line.

I’ll be talking about two different approaches.

Running mini-bigbangs

You’ve decided to go for branch per task, and your colleagues have been creating branches for several days. When should you integrate them all back into the main branch?

I’d normally say: wait no longer than a week. In fact, if you let the time between integrations grow beyond a week, you’ll normally hit one of the biggest version control problems: big-bang integration!

Big bang integration is a very well-known and documented problem, one of the “roots of all evil” in software development, but it is still out there waiting for new victims.

How does it work? You’re working on a, let’s say, 6 months project. Then you plan 2 milestones and split your team in sub-teams. They work on separate features and they’ll be integrating their work together one week before each milestone. Sounds familiar? Well, I hope it doesn’t because it is a great recipe for disaster!

If you follow this approach, your first “integration” will probably last much longer than a week, and needless to say the second one won’t be any better... You’ll get a pile of software which has never worked together before, and you need to be sure it works... in a week! Crazy!

That’s why it is such a good idea to reduce the time between integrations. I’d say the time between integrations has to be inversely proportional to the amount of work your team can perform. I mean, if you’re running a small group of 5 developers, maybe it is OK to run one integration a week, but as soon as the team gets bigger, you may have to run them more frequently, even more than once a day!

The whole point here is avoiding integration problems. It is exactly the same rule introduced by continuous integration: if something is error prone... do it often to reduce the risk!

And remember that the integration problem shouldn’t be the actual merging of files and directories. If it is, switch to another version control system! The real problem is that even when the code compiles correctly, it can break a lot of tests or hide an unexpected number of critical bugs. We once worked with a company that was running weekly integrations with CVS. They needed several hours just to merge the code together.

Then they switched to Plastic, and they’re now using the “spare” time to run a whole test suite. That’s the point of integration: being sure your code works as expected, not worrying about how to merge it.

That said, let’s take a look at what a short merge iteration looks like:

I named them “mini-bigbangs” because they’re actually big bang integrations: you take a number of separately developed code changes and merge them back together. The key is that they’re short enough that the real “big-bang” diseases don’t show up.

Why still use this approach, then? Why not go directly to pure continuous integration on the main-line? Well, having your own branch for each task still sounds like a very good idea: it gives you a great place to make changes, prevents mainline corruption and brings all the other advantages of branch per task.

You’ll then continue working and your development will look like the following figure:

Until you run another integration and:

Is it clearer now? Remember your test suite is also a cornerstone of this integration approach. You must enforce that a subset of the whole test suite is run upon task completion (you can set up a build server to do that: polling for newly finished tasks, then downloading, compiling and unit testing them once they’re marked as finished).
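One polling pass of such a build server could be sketched as follows. This is a minimal illustration under stated assumptions: the `tasks` mapping, the `"finished"` status string, and the `run_smoke_tests` callback are hypothetical stand-ins for whatever your task tracker and build tooling actually provide.

```python
from typing import Callable, Dict, List


def check_finished_tasks(
    tasks: Dict[str, str],
    run_smoke_tests: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """One polling pass: test every branch whose task is marked finished."""
    result: Dict[str, List[str]] = {"ready": [], "rejected": []}
    for branch, status in sorted(tasks.items()):
        if status != "finished":
            continue  # still in progress, nothing to check yet
        # In a real setup this step would download, compile and unit test
        # the branch; here it is delegated to the injected callback.
        bucket = "ready" if run_smoke_tests(branch) else "rejected"
        result[bucket].append(branch)
    return result


# Example with a fake test runner in which task003 fails its tests.
tasks = {"task001": "finished", "task002": "open", "task003": "finished"}
report = check_finished_tasks(tasks, lambda branch: branch != "task003")
print(report)
```

Only the branches listed under `"ready"` would be handed to the integrator; the rejected ones go back to their developers before they can slow the integration down.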

There are some warning lights to watch here. I’ve run this kind of integration very successfully on different projects, but there can come a time when, for some reason, integrations become a real pain. In my case it happened because of the test suite: it grew so large that checking each task upon integration (something you have to do!) took too long. Integrations then grew longer and longer.

Please note that the real problem isn’t in version control but in testing: maybe the tests were too fragile or took too long, and they have to be fixed somehow. But anyway, let’s try to figure out some alternatives.

The first one was running staggered integrations: the developer running the “mini-big-bang” decided to group tasks together, integrate them into different intermediate integration branches, test them in parallel, and then merge them all together into the main branch.

Remember the whole point here is not to speed up merging, which is already very fast, but to run as many tests as possible in parallel while merging branches.
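A staggered integration can be sketched like this. It is a simplified model, not a description of any tool: branches are plain strings, `run_full_suite` is a hypothetical callback standing in for a long test run on one intermediate integration branch, and threads stand in for parallel build machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def group_tasks(tasks: List[str], group_size: int) -> List[List[str]]:
    # Each group becomes one intermediate integration branch.
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


def staggered_integration(
    tasks: List[str],
    group_size: int,
    run_full_suite: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    groups = group_tasks(tasks, group_size)
    # Test all intermediate branches at the same time.
    with ThreadPoolExecutor(max_workers=len(groups) or 1) as pool:
        results = list(pool.map(run_full_suite, groups))
    merged_to_main: List[str] = []
    held_back: List[str] = []
    for group, passed in zip(groups, results):
        (merged_to_main if passed else held_back).extend(group)
    return merged_to_main, held_back


tasks = [f"task{i:03d}" for i in range(1, 6)]
# Fake suite: any group containing task003 fails.
merged, held = staggered_integration(tasks, 2, lambda g: "task003" not in g)
print("merged:", merged)
print("held back:", held)
```

In this sketch a failing group is simply held back; in practice the integrator would also re-test main after the passing groups are merged together, since they have never been tested as a whole.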

Merging branch per task and CI

If you have a very large test suite and need to keep integrations from lasting too long, or you want to avoid the integrator’s role (in the purest agile approach), or you are after an even faster release cycle, you can combine the branch per task technique with continuous integration.

With this approach each developer merges his own branch back into the main-line. Remember he must first (as shown on the second branch) merge down from the main branch to take the latest changes, then run the tests (using a CI server, for instance), and only when they all pass merge the changes up to main (which will be just a “copy merge”).
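The merge down, test, merge up sequence can be sketched with branches modeled as sets of change ids. This is an assumption made purely for illustration (real merges work on file contents, not sets), and `tests_pass` is a hypothetical callback standing in for the CI server.

```python
from typing import Callable, Set


def integrate_task(
    main: Set[str],
    branch: Set[str],
    tests_pass: Callable[[Set[str]], bool],
) -> Set[str]:
    """Return the new content of main after trying to integrate a branch."""
    # 1. Merge down: bring the latest main changes into the task branch.
    merged = main | branch
    # 2. Run the tests on the task branch, never on main.
    if not tests_pass(merged):
        return main  # main stays untouched
    # 3. Merge up: the branch already contains everything in main,
    #    so this is just a "copy merge".
    return merged


main = {"c1", "c2"}               # c2 arrived after the task started
task = {"c1", "t1"}               # task branch started from baseline c1
no_bad_change = lambda changes: "bad" not in changes

print(sorted(integrate_task(main, task, no_bad_change)))
print(sorted(integrate_task(main, {"c1", "bad"}, no_bad_change)))
```

The second call shows the point of merging down first: a failing branch is caught before it reaches main, so main never sees the broken change.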

Then a build can be triggered on the main branch, even a big, long one, and if everything goes right, a new release can be created.

Continuous integration with build branches

There are other options available. Suppose each developer merges back into a specific “build branch”, as the following figure shows:

The build branch can be used to run integrations during the work day, and then an automated nightly build process can run overnight. If everything works as expected, the build branch with all the changes made during the last day can be automatically merged back into the main branch, and a new release created and used as the baseline for development the next day.
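The nightly promotion step might look like the sketch below. It is a simplified model under stated assumptions: branches are lists of task names, `nightly_build_ok` is a hypothetical callback for the automated nightly build, and the `next_label` parameter (for example `BL01`) is an assumed naming scheme for the new baseline.

```python
from typing import Callable, List, Optional, Tuple


def nightly_promotion(
    main: List[str],
    build_branch: List[str],
    next_label: str,
    nightly_build_ok: Callable[[List[str]], bool],
) -> Tuple[List[str], Optional[str]]:
    """Merge the build branch into main only if the nightly build passes.

    Returns the content of main and the new baseline label, or None
    when nothing was promoted.
    """
    candidate = main + build_branch
    if build_branch and nightly_build_ok(candidate):
        return candidate, next_label  # new release, tomorrow's baseline
    return main, None                 # keep working from the old baseline


main = ["task001"]
todays_work = ["task002", "task003"]

promoted = nightly_promotion(main, todays_work, "BL01", lambda c: True)
failed = nightly_promotion(main, todays_work, "BL01", lambda c: False)
print(promoted)
print(failed)
```

When the nightly build fails, main and its baseline are left as they were, so the next day’s tasks still start from a known good point.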

Wrapping up

As you can see, there are several alternatives for running integrations, from the simplest to the most sophisticated. I’d always follow the KISS principle and try to keep things as simple as possible. Of course this is not always possible, and then you have to figure out which alternative is best for your team.


Pablo Santos
I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improve their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.

1 comment:

  1. Hi Pablo! Your link to my 1998 branching strategies/patterns paper is broken. The correct URL is http://www.bradapp.net/acme/branching/
