Live to merge, merge to live...
This blog post was originally published back in 2008 at DDJ. Since the DDJ "Guru blogs" moved to a new location, some of the images have been broken, so I've decided to publish it again here.
As a professional programmer you’re familiar with a variety of programming languages, you know by heart the basics and the not so basics of data structures and algorithms. You are an expert working in your favorite IDE. You master software patterns and you’re aware of the newest trends in agile methods. But there’s a useful tool in the programmer’s toolbox which is normally more feared than used: the merge tool! This article will explain, step by step, the very basics of merging and will explore the different merge types, their uses and advantages.
Fear of merging
So, you write code, don’t you? And there’s a big chance that you don’t develop code in isolation, right? The code you write for your projects is normally scattered across a number of files. And, according to the Pareto Principle, 20% of the files in your project will receive 80% of the changes. You can try to trace a bad design smell here, you can try to refactor your code from top to bottom day and night, but, unfortunately, that’s just reality: if you and your team work on a project, there’s a huge chance you’ll end up editing the same files at the same time.
For a number of projects out there this is a big problem. I’ve found a number of project managers and software designers trying to avoid concurrent modification by wrestling with their project plans and software designs. Wouldn't it be better if they put that effort into making better software and finishing it on time?
But there’s an ancestral fear behind this behavior: the arcane fear of merging. “Hey, if you and I modify the same file... we’ll have to reconcile all our changes!!” And of course, they assume it will be a painful and error-prone process. “So, let's schedule our changes so that only one of us touches the file at a time.” Ouch!
Long ago software development was about lone eagles working alone. Now software development is about new languages, new tools, but at the end of the day it’s all about collaboration. And, avoiding collaboration doesn’t look like the smartest way of getting the best out of a team.
Then, why are we initially so scared of merging our file changes? I find two key reasons:
Merging explained: automated conflicts
Do you know how a merge tool works? Let’s take a look at a very simple example. I won’t dive into the obscure algorithm details but just make a 1000ft flyby.
Suppose we have a piece of code like the one in the next figure. Then you and I start making changes to the file at the same time. I make a couple of changes at the beginning of the file, and you add a new method below.
Merging our changes manually is possible; it is a small file so it will just take a few minutes to do. It will require both of us to carefully look at our changes, but we’ll make it.
Of course the picture changes if we’ve changed not one but 15 files in total, and up to 8 have been modified in parallel. Also, what would happen if, instead of only the two of us, 5 other people were working on the code at the same time? Yes, the process is doable, but it is time consuming, error prone and... boring!! I bet you have better things to do than manually combine files.
But, what would a merge tool be able to do in our previous example?
The tool will find an automatic conflict. Look carefully: our changes don’t collide, so what we would do manually is just copy and paste my changes into the right part of your file, or vice versa. It is very simple, but doing it manually is error prone. This is exactly what a merge tool will do: just put the two sets of changes together, with no possible collision or conflict.
Normally, during a merge, the tool won’t even bother asking you to look into such a conflict; it is so simple it can solve it by itself. Of course, almost all the tools out in the market will allow you to set a mode in which all conflicts are reviewed by the user. It will just propose the changes, but you will be the one actually making the decision. Do you feel safer now? Ok, I bet after a week of manually reviewing trivial conflicts you’ll switch to automated mode.
The next figure shows the results of the first merge and how a merge tool will combine the changes together to create the result (remember to click on the images to make them bigger).
You’re in trouble: manual conflicts
A developer’s life can be exciting and full of challenges, but it is not easy. So, eventually you’ll face a situation like the one depicted in the next figure:
Yes, now we’ve modified exactly the same code in one of our changes to the file, which makes things much more complicated.
Now you can say “the tool can’t know the right solution!”.
And you’re right. But, as I told you, the merge tool is not a wizard’s device; it is just a programmer’s tool. So, use it correctly and it will make your life much easier.
The 4th figure shows a merge tool in action letting you decide what to do with your manual merge conflict.
Under these circumstances the tool will always prompt the user. It will still save you precious time because it directly focuses you on the problem, but you’ll have to make the decision yourself.
So, the merge tool will resolve automated conflicts without even asking you, unless you want it to ask (and honestly, letting it decide is the right choice), and it will ask for your help whenever it finds a manual conflict, which is basically a code fragment changed by two developers at the same time.
The rule of thumb is very easy and will help you trust the tool because there’s no complex code analysis behind it. It just looks into the lines of code: if only one contributor changed the fragment, it is an automatic conflict, otherwise, it is not trivial and the tool will ask.
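To make the rule of thumb a bit more concrete, here is a minimal sketch in Python of a line-based 3-way merge. It is not how any particular merge tool is implemented: the names `unchanged_lines` and `merge3` are made up for this illustration, the line matching is delegated to Python’s standard `difflib` module, and real tools use more refined diff algorithms. The lines that neither of us touched act as anchors, and each fragment between two anchors is resolved with the rule above.

```python
import difflib


def unchanged_lines(base, other):
    """Map each base line index to its index in `other`, for lines difflib matches."""
    mapping = {}
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset] = block.b + offset
    return mapping


def merge3(base, mine, yours):
    """Hypothetical line-based 3-way merge (illustration only).

    Returns (merged, conflicts). Each conflict is a tuple
    (position in merged, base fragment, my fragment, your fragment)
    that is left for the user to decide.
    """
    in_mine = unchanged_lines(base, mine)
    in_yours = unchanged_lines(base, yours)

    # Anchors: base lines that survive in both contributors, plus an end sentinel.
    anchors = [(i, in_mine[i], in_yours[i])
               for i in range(len(base)) if i in in_mine and i in in_yours]
    anchors.append((len(base), len(mine), len(yours)))

    merged, conflicts = [], []
    b = m = y = 0  # cursors into base, mine and yours
    for bi, mi, yi in anchors:
        base_frag, my_frag, your_frag = base[b:bi], mine[m:mi], yours[y:yi]
        if my_frag == base_frag:
            merged.extend(your_frag)   # I did not touch it: take yours
        elif your_frag == base_frag:
            merged.extend(my_frag)     # only I changed it: automatic
        else:
            # Both of us changed it (even identically): manual conflict.
            conflicts.append((len(merged), base_frag, my_frag, your_frag))
        if bi < len(base):
            merged.append(base[bi])    # the anchor line itself
        b, m, y = bi + 1, mi + 1, yi + 1
    return merged, conflicts


base = ["class Calc {", "  int add(int a, int b) { return a + b; }", "}"]
mine = ["public class Calc {", "  int add(int a, int b) { return a + b; }", "}"]
yours = ["class Calc {", "  int add(int a, int b) { return a + b; }",
         "  int sub(int a, int b) { return a - b; }", "}"]

merged, conflicts = merge3(base, mine, yours)
print("\n".join(merged))
print("manual conflicts:", len(conflicts))
```

In this sample I change the first line and you add a method further down, so both changes land in the result and nothing is left for the user. If we had both edited the same line, the fragment would show up in `conflicts` instead of being merged.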
2-way and 3-way merging
What’s all this fuss about 2-way and 3-way merge tools? What are they all about? Ok, that’s what I’ll be explaining in the next few paragraphs. It basically depends on the number of file versions you consider for your merge operations.
So far, what you’ve seen is a 3-way merge in action:
The result file is the one created after the changes are combined, the one at the bottom of the previous Figure (some tools prefer to hide the base file and just show the result one).
I didn’t explain 2-way merge yet, but it’s not very complicated: it doesn’t consider the base file (also known as the common ancestor) for the merge.
Is it better? Simply put: no, it isn’t. A 3-way merge knows what you’ve added to or removed from a file, while a 2-way merge can’t because it doesn’t know what the file looked like at the beginning.
But, still, I’ve found developers who seem to be more used to 2-way merge tools. Let’s try to figure out why.
Let’s go back to the original Java file, make a couple of very simple changes, and try to merge them with a two-way merge tool. Check the results on the next image.
Do you see the problem? Basically, at each difference the 2-way merge tool won’t be able to decide whether it is modified or removed code, so it will always have to ask you!
This may be good for the paranoid but, believe me, if you have to manage a good number of merges, you’ll end up wasting your time.
The same two conflicts would be automatically solved by a 3-way merge tool.
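The difference can be sketched in a few lines of Python. The two functions below are hypothetical helpers written for this post, not the API of any real merge tool, and each one decides what to do with a single fragment of code.

```python
def two_way(mine, yours):
    """Without a base, all the tool can see is that the fragments differ."""
    return "nothing to do" if mine == yours else "ask the user"


def three_way(base, mine, yours):
    """With the base, the tool can tell who actually changed the fragment."""
    if mine == base:
        return "take yours"    # I did not change it (maybe you did)
    if yours == base:
        return "take mine"     # only I changed it
    return "ask the user"      # both of us changed it


line = ["    log(total);"]

# Each case is (base, mine, yours).
removed = (line, [], line)   # I removed a line that you left alone
added = ([], [], line)       # you added a line that was never in the base

for base, mine, yours in (removed, added):
    print(two_way(mine, yours), "|", three_way(base, mine, yours))
```

Notice that both cases hand exactly the same pair of fragments to `two_way`, which is why it has to ask every time. Only the base tells a removal on my side apart from an addition on yours.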
Of course, there’s a remark here: you can only use 3-way merge with a version control tool handling your code. Otherwise you won’t have access to the base file unless you have a very good memory or a crazy naming convention to keep your old files...
Merge tracking, what’s in it for me?
If you’ve never heard of merge tracking... well, welcome to a whole new world. You’ve probably heard about it since the release of Subversion 1.5, which finally introduced merge tracking. It still has some caveats but it’s evolving in the right direction. Many other systems out there have had merge tracking for a long time, and most have had it as part of the core product since their inception.
Anyway, what does it mean?
Merge tracking is deeply related to version control tools. You can run a file merge in isolation, but with merge tracking you rapidly enter the field of SCM (whether you want to translate it as Software Configuration Management or Source Code Management is just your choice).
Merge tracking is also deeply related to branching, but I’ll try to postpone the topic as much as possible.
Have a look at the next image. It represents the merge we’ve been running in the previous examples. We have the original file and then your changes and mine drawn as some sort of tree or graph.
After I merge your changes with mine, a merge link is created telling the system I’ve merged your changes with mine. Also, I would like to highlight that during this change we modified exactly the same lines of code, so it will be a manual merge... Remember it because I’ll use it below.
What’s the benefit? First: you know what you’ve done since your version control system takes care of this information. If you don’t have it, it’ll be harder to figure out what happened.
But let’s make another set of changes to our sample files. You can check what our tree looks like after the changes in the next figure.
You make a new modification and I make another one, and once the two of us are finished, I decide to merge your changes back with mine again.
How does merge tracking help here? First of all, you remember I mentioned above that our first merge was not automatic, don’t you? So, what’s the benefit of merge tracking? It will merge only the changes made after the last merge, and you won’t have to solve the same manual conflict again. It greatly simplifies merging because it lets you focus on what’s new, and you won’t have to merge all the old stuff.
How does the merge tool know what to merge? In any 3-way merge you’ll need a base (or original file) and two contributors. In the sample highlighted in the previous figure the version control tool, with the help of merge tracking, will first try to find the base file for the two changes we have (two revisions after all).
And how does the system locate it? It will use your tree of versions and try to locate the closest revision that is a parent of both of the revisions you’re trying to merge. The algorithm is known as nearest common ancestor, and it is about finding the closest shared ancestor of a couple of nodes in a directed graph.
In our sample the next figure shows the base or common ancestor for the two revisions we’re trying to merge.
Look carefully, if the merge link wasn’t there, the parent would be the original revision, and then the merge tool would have to ask you again about the previously solved conflict (the code would be the same at the two revisions but different from the base).
The merge arrow solves the problem allowing the underlying system to correctly identify which one is the new base for the merge.
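Here is a minimal sketch of that lookup in Python. It assumes the version tree is stored as a dictionary mapping each revision to its parents, with a merge link recorded as one extra parent. The revision names and both functions (`ancestors` and `merge_bases`) are invented for the example, and a real version control system will use a more efficient algorithm.

```python
def ancestors(parents, rev):
    """All revisions reachable from `rev` through parent links, `rev` included."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != other and c in ancestors(parents, other)
                       for other in common)}


# Hypothetical tree: two branches, one merge of yours into mine, then new changes.
tree = {
    "original": [],
    "mine1": ["original"],
    "yours1": ["original"],
    "merged": ["mine1", "yours1"],  # the second parent is the merge link
    "mine2": ["merged"],
    "yours2": ["yours1"],
}
no_link = dict(tree, merged=["mine1"])  # same tree, merge link forgotten

print(merge_bases(tree, "mine2", "yours2"))     # {'yours1'}
print(merge_bases(no_link, "mine2", "yours2"))  # {'original'}
```

With the merge link in place the base is your revision that was already merged, so only your newer change is considered. Without the link the base falls back to the original revision and the old conflict comes back. The function returns a set because, in more tangled trees, more than one revision can qualify as the nearest common ancestor.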
Branching
I don’t know whether you realized or not but... we’ve been using branching!
Look back at Figure 7 (two figures back). There’s a set of changes named your set of changes and a set of changes named my set of changes, right?
Well, they’re actually two different branches, and a branch is nothing more than a set of revisions.
They allow you to have parallel sets of changes, which is great when you’re doing development.
It can’t be easier.
I feel like a myth buster today, but as you can see, branching, one of the concerns for a number of developers, is not an issue at all.
Unfortunately branch management is a nightmare with some old fashioned version control tools (think about CVS or SourceSafe, for instance, and even SVN until merge tracking becomes mainstream and stable), and that’s the reason behind all this fear...
Wrapping up...
So far we’ve introduced all the basics (and not so basics) of merging. As you have noticed it is not a difficult task at all once you correctly understand the steps and contributors involved.
Merging is one of the daily tools of a professional developer, but it is still unknown to a large number of users. Mastering the process will make them more productive and will allow projects to evolve faster.
Does Plastic SCM support speculative merging? What I mean by this is a preview of merging, that is, checking whether a branch would merge cleanly/automagically with another branch/trunk without actually going through the merge and its consequences.
@Gregory: sure, from the CLI you can run cm merge and it will tell you the result, and from the GUI you always have a preview before merging.
Then you can also do cm merge --merge to actually force the merge...
Let me know if it helps.
@pablo: yes, that answers the question. Very cool. In our SCM+CI system, we are still constantly running into integration merge conflicts because our CI pipeline sometimes gets very deep: 20-30 turnins. It takes half a day for one integration, which is typically 8 wide. So you can imagine it takes a while for a turnin to make it into trunk. I am thinking of a method of performing speculative merging against our pipeline before sending to our CI system.