Linus on branching...
A few months ago, Linus Torvalds shared some interesting thoughts and concerns regarding the Git branching patterns being used in Kernel development.
Since learning what Torvalds has to say is always enlightening, I wanted to delve into the points he mentioned, because they align pretty closely with the techniques we recommend with Plastic SCM. Obviously, the points apply to Git, Plastic SCM, and any other SCM with good branching support, too.
Linus’ words
This is what Linus wrote (I only extracted a fragment, for the complete text, go to http://lkml.org/lkml/2010/9/28/362).
The real problem is that maintainers often pick random - and not at all stable - points for their development to begin with. They just pick some random "this is where Linus -git tree is today", and do their development on top of that. THAT is the problem - they are unaware that there's some nasty bug in that version.
Shooting a moving target
What is Linus talking about? Actually, it’s one of the very well-known issues with trunk development that I described here in the section titled “don’t shoot moving targets!”.
This problem hits “mainline/trunk pattern” followers hard, since they keep updating to “what’s more recent”, probably due to fear of merging. Continuous integration delays the problem, but doesn’t solve it. But this problem can also hit those following a “feature branch” pattern, unless they follow all the rules.
The problem they’re facing is shown in the following picture:
The kernel team uses branches, but what if they’re using the “master” as an integration branch and they’re branching off commits that are not tagged? (As the picture shows).
The problem is that the master branch can be used as an integration point and hence get commits that leave the branch in an intermediate, unstable state. (This is especially true when using fast forward merges on Git, which I don’t really like precisely due to this issue. But I guess Linus must be doing real merges because he DOES know how to use Git. :P) Then you branch from an unstable commit and… you end up in trouble!
Baselines are key
We had exactly the same issue with some teams using Plastic because they failed to understand the importance of baselines. Once your code is stable, tag it, label it, and create branches ONLY from this well-known point!
That’s basically the rule of thumb: create baselines frequently (as many as you can) which of course must be fully tested. (That’s the time-consuming part, since integration is pretty fast nowadays with modern SCMs) and then CREATE BRANCHES ONLY FROM STABLE BASELINES, as the figure shows:
This way if something fails on your task branch (feature branch) you know... it’s your fault!!, because all tests were fine on the baseline -- not just the fast integration ones (like the ones you can run on checkin with continuous integration tools) but also the slow ones that you use to validate a release.
Very easy rule of thumb, great savings!
Or you do what we do: keep trunk pristine - no development should EVER occur in your trunk and it should always be possible to push a stable release from it.
ReplyDeleteYes, it takes discipline, but IMHO it's easier to maintain than getting people to all branch from the same revision.
The problem seems to be caused by not documenting the assumptions about the branches carefully enough (as in whether a branch's commits can be broken or not). I guess nothing prevents you from having a breakable master as long as that is documented somewhere. Unless there is some clearly labeled "production" branch most people will probably assume that the master should be in working condition at all times. So either, a) Linus doesn't use branches correctly, b) the kernel developers haven't documented the fact that the master may be broken or c) none of the above, ie something related to maintaining a project of the Linux Kernel's size which I personally don't have much practical experience of.
ReplyDeleteInsightful post. Will keep this in mind.
ReplyDeleteAn aside: what did you use to create your illustrations?
@Phil: I do agree, the problem is that sometimes is good to have intermediate checkins when you're merging branches back to trunk. Starting from a tagged cset or label is a good practice and it doesn't take that much to get used to it.
ReplyDelete@Anonymous: believe it or not... I just use Visio! :P
ReplyDeletemaster <- next <- feature
ReplyDeleteAssume master is always fully tested and 'stable', branch 'feature' from there, and integrate on 'next'. That's an interesting approach I've seen even in the linux kernel itself.
If you have an automated test suite, you can have it tag the mainline every time it passes; then you can keep stable from getting too far out-of-date from the head, which is the biggest downside to this scheme.
ReplyDeleteThe branch development works fine, but the integration is the part that bites. A follow up article on the best practices for integration would be a good follow-up to this article.
ReplyDeleteThis is actually exactly how development of OpenOffice.org works (with one additional step): http://wiki.services.openoffice.org/wiki/Mercurial/Cws
ReplyDelete"child workspace"/cws is just a feature branch. Once a feature branch is ready it gets merged into the "DEV300_next" branch. After a set of feature branches have been merged and survived basic testing, "DEV300_next" gets tagged as a stable milestone and is pushed to "DEV300" (the master). As:
- no development is done on the master, - branches are only based on the master - and the push to the master is an atomic transaction
the master is reasonably save.
There is just one exception to the "no development on the master" rule and that is buildbreakers (might happen when those where triggered by raceconditions). If something like that sneaks into the master and there is a trivial patch for it the changeset gets pushed to _both_ DEV300_next and DEV300.
@Pablo, the special version that adds gratuitous exclamation marks to the labels.
ReplyDelete@Anonymous: yes, this one. I had to pay extra $300 to get the auto-exclamation feature! :P
ReplyDeleteThe advice here is worth thoughtful consideration. But it really oversimplifies reality -- for many shops -- by neglecting the fact that even those branches from a common stable point all need to be merged back into trunk at some later time. And they're not all going to be merged at the same time. Someone has to to be first, and the longer-lived branches are faced with a choice: whether and when to merge from trunk to get other contributions. As soon as they do that once, the idea of passing around simple, obvious-to-inspection diffs is seen to be a fallacy. Merging from a stable revision is always a good idea, but not necessarily for the reasons claimed here.
ReplyDeleteI *do* have experience with several codebases comparable in size to the linux kernel, with about 40 genuinely active (and a few hyper-active) developers. Always merge from a known good label. But don;t expect that to make life easy. There is no substitute for thinking about every single branch and merge. Good tools and good practices just make it easier to do the right thing. But everyone has to be thinking, all the time.
ReplyDelete@Anonymous: "whether and when to merge from trunk to get other contributions" True, but in my experience branches tend to be much more independent than we initially think, so merging between branches happen but not that often.
ReplyDeleteThat's why I like pure branch per task so much, because branches are so short lived that you almost never branch between task branches.
And even if you do, ok, then of course you can be hit by others bugs, but you still have the integration for review and you are less concerned about starting from a broken point even under these circumstances.
@Anonymous: "I *do* have experience with..."
ReplyDeleteAbsolutely. Tools are just... tools. Then you still need to use them correctly.
What we try to do with plastic is to offer the most powerful possible SCM... but still, if you don't do things in a meaningful way... you're toasted!
Thanks very much for this post. It well explains a simple solution to a fundamental problem.
ReplyDeleteThe link to the original email from Linus is broken. Here's the Internet Archive's version:
ReplyDeletehttps://web.archive.org/web/20101203061221/http://lkml.org/lkml/2010/9/28/362