How we do trunk based development with Plastic SCM

Monday, April 03, 2017

For years, we praised the benefits of branch per task: a new branch for each task in your issue tracker, lasting no more than 2 days (and ideally just a few hours), with one single developer per branch. This is how we worked, and we really preferred it to working directly on mainline, even with Continuous Integration (CI) in place.

But we started singing that song more than 10 years ago, and many things have changed since. What's new today? Does branch per task still work? Does it blend at all with new practices such as DevOps and Trunk Based Development?

These are the questions we are going to answer in a series of blog posts, in which we explain how we use Plastic ourselves in "eat your own dog food" fashion.

Enter the DevOps buzz

Paul Hammant (of trunk-based development fame) shared a new title with me a few months ago: "The DevOps Handbook". I loved it. It is short enough, intense enough, and I loved the edition. I strongly recommend you grab a copy and read it.

DevOps is all about changing the minds of both developers and IT. It is about seeing the whole picture. It is, again, about the definition of done. Done means deployed to production (and eventually released to customers), period.

It is important because otherwise we developers tend to think we are done when the task is marked as finished in the issue tracker, or maybe when it is merged, but we forget about actually putting it in the hands of the users, which is the only real motivation for the change.

IT folks tend to prioritize stability over change speed (and stability is good), which means changes marked as "done" don't actually meet users for a while... which is not good.

Hence, DevOps is all about the practices that enable shorter deploy cycles. The faster, the better. And it seems that Google, Facebook and other top influencers in the world are doing it several times a day, on huge codebases. So, there is no reason that smaller shops cannot achieve the same pace.

Our previous working cycle

For years, our working cycle was as follows:

Task workflow

It all starts with a task. Everything we do is a task in our issue tracker (think JIRA, but we use our own). Then for each task there is a new branch. A developer uses this branch to complete their changes. One task, one branch: that's why it is called branch per task. Then the branch gets peer reviewed and validated (more on this later), and when there is a pile of branches/tasks ready, they get merged by the buildmaster.

Yes, we relied on "controlled integration": one developer playing the buildmaster role, in charge of doing merges to main.

The good: Someone in charge, a last layer of review after the "review + validation" process.

The bad: It does not scale, although for a small team this is not a problem.

The problem we faced: Releases became an event. Yes, they did, but it was not because of merging (I mean, we live to merge, merge to live... sort of), but because of tests. "Please wait because we need this task in the next version" became a common source of frustration for the buildmaster, and of delay for the next version.

What really worked:

  • One task - one branch. The magic formula. It works.
  • Code review each task.
  • Short exploratory test for each task.
  • Automated testing. It is one of the 3 core pillars for me: automated testing + issue tracking + version control.

New cycle: trunk + task branches + automation

What changed? In short, there is a bot doing the merges now, which means more automation and more frequent versions.

(We named the bot HAL, which matches our tradition of evil-named robots - the one that controls test machines is called Ultron).

The following picture shows the new cycle, which is pretty similar to the previous one, but it highlights the important role played by automation now.

New cycle

You finish your task, it gets reviewed, it gets validated, and then the bot merges it with main and runs the test suite. If everything goes fine, the bot checks in the merge to main and moves on to another task. Every checkin to main is a new potential version.

The key benefit: finished tasks hit main faster, which is exactly what we wanted.

Once we have a few checkins ready in main, the bot launches the release process; basically, it runs the rest of the test suite and we do some manual checks too. If everything goes fine, the changeset is labelled and we have a new version ready to be published.

We could actually run the entire test suite on each checkin, but it would take too long (especially with the manual checks). That's why we set a number of tasks to be merged before picking a candidate and trying to turn it into a new external version.

Every checkin in main, though, can be used in production internally, so we get even faster feedback by using it ourselves as soon as possible.

Before this, twice a week (ideally every day, but that wasn't the real average) our buildmaster took the list of finished tasks, made sure they passed tests individually, merged them, and then ran a full release test suite. As I said earlier, it became an event.

Now, thanks to automation, we are doing this more often, even several times a day. It means the lead times were reduced. And we still have lots of room for improvement.

Things that changed in our way of working

  • Stable release: Tasks start from stable releases (versions or builds would be more accurate, but internally we often still say "release" instead). Before, we only took labelled changesets from main. Now, the latest changeset on main is good, whatever it is.
  • Stack of finished tasks: It was the entry point for a new release. Now, there is a new merge whenever a task is finished. We don't wait until there are enough; as soon as something is ready, it goes to main.
  • Releases are not events anymore: This is the most important game changer for us. The "please wait till this is ready" is over. New versions happen all the time, and we can decide when we want to publish them.
  • Build numbers grow faster: Previously, we only assigned build numbers (BL740, BL741, etc.) to labelled changesets. Intermediate changesets on main could be created during the release process but they were not actual builds, just intermediate steps. Now, every single checkin on main is a good build that can even be put into production internally (and even published if it is selected to pass the rest of the tests). Build numbers now fly. We went from BL800 to BL900 in about a month, something that would have taken months before.

Focus on automation

Now look at the picture below, which shows the same stages a task follows, this time as a simple flow instead of a cycle:

New cycle in pipeline mode
  • Task selection: this is not depicted as such, but our CI system selects which branches have to be merged. It checks whether the branch's status attribute is set to "resolved". Then, it also checks whether the associated task in the issue tracker is marked as validated.
  • HAL takes care of the merge (it fails if the merge is not automatic, but fortunately we rely on SemanticMerge for this :P). This is one of the pieces we have now automated.
  • Task tests: fast test suite. We already had automation for this. In fact, we used to call it HAL-CI.
  • Checkin if everything goes right. Again, something we didn't previously do this way.
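
The stages above can be sketched roughly as follows. This is only an illustration under stated assumptions, not HAL's actual code: `TaskBranch`, `select_ready` and `process_branch` are hypothetical names, and the merge, test and checkin steps are passed in as plain callables rather than calls to any real Plastic SCM or CI API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TaskBranch:
    """Hypothetical record for a task branch (not a Plastic SCM type)."""
    name: str
    branch_status: str       # value of the "status" attribute on the branch
    tracker_validated: bool  # True if the issue tracker marks the task as validated


def select_ready(branches: Iterable[TaskBranch]) -> List[TaskBranch]:
    """Task selection: keep branches that are resolved AND validated."""
    return [
        b for b in branches
        if b.branch_status == "resolved" and b.tracker_validated
    ]


def process_branch(
    branch: TaskBranch,
    try_merge: Callable[[TaskBranch], bool],
    run_fast_tests: Callable[[TaskBranch], bool],
    checkin: Callable[[TaskBranch], None],
) -> str:
    """Run one branch through merge, fast tests and checkin; return the outcome."""
    if not try_merge(branch):
        return "merge-failed"   # the merge could not be done automatically
    if not run_fast_tests(branch):
        return "tests-failed"   # nothing is checked in to main
    checkin(branch)
    return "checked-in"
```

Passing the three steps as callables keeps the sketch independent of any particular tooling: a real bot would plug in whatever performs the merge, launches the fast suite and checks in the result.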

Then, once a few checkins are ready, the last available one is picked as a "release candidate" and:

  • Remaining tests pass. We already had this automated, but it was triggered manually before.
  • Labelling if everything goes right. Automated now.
  • Upload the final installers to the website. It was triggered manually before.
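
As a rough sketch of this release step, assuming checkins are identified by integers ordered from oldest to newest and that a configurable threshold decides when "a few checkins" are ready: the names `pick_release_candidate` and `release`, the threshold parameter and the injected callables are all hypothetical, not part of our tooling or of any Plastic SCM API.

```python
from typing import Callable, List, Optional


def pick_release_candidate(pending_checkins: List[int], threshold: int) -> Optional[int]:
    """Return the last available checkin once enough have accumulated, else None.

    `pending_checkins` is assumed to be ordered from oldest to newest.
    """
    if len(pending_checkins) < threshold:
        return None
    return pending_checkins[-1]


def release(
    candidate: int,
    run_remaining_tests: Callable[[int], bool],
    label: Callable[[int], None],
    upload_installers: Callable[[int], None],
) -> bool:
    """Run the remaining tests; label and upload only if they pass."""
    if not run_remaining_tests(candidate):
        return False             # candidate rejected: no label, no upload
    label(candidate)
    upload_installers(candidate)
    return True
```

Note that publishing is deliberately absent from the sketch, since that step stays manual.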

Once we have a new version, we install it locally, we all use it for a few hours, and some designated team members do a more thorough check. Support double-checks that the new features or bug fixes requested by customers are available as promised, and so on. Release notes are double-checked at this stage too.

Publish is the final step, triggered manually. Once the button is hit (a command right now), the new version is visible on our website, in the downloads area.

Unlike many continuous delivery teams nowadays, which are basically full-on-cloud, Plastic SCM also means on-premises servers and desktop applications. For us, so far, delivering means publishing at this point, although the deploy is not real until customers install and use the new version. As I said, there is room for improvement for us. Automatic upgrade detection on the customer side would greatly help close the circle here.

Wrapping up

Automation and Continuous Integration are not new. They have been around for two decades already in their current form. We had lots of automation here and there, but missed the last mile. It is a small thing in terms of the work to do once you have automated test suites managing dozens of virtual machines, both on-premises and in the cloud. But this small thing is a game changer. Suddenly, you have an engine capable of moving things forward by itself, picking finished work and creating releases. All the time, non-stop, provided the team feeds it with a constant pace of correctly finished tasks.

I know I left many questions unanswered, but you get the big picture. This is a great new work cycle for us and we like how it blends with branch per task.

Stay tuned because there will be more details coming.

1 comment:

  1. That's quite cool!!

    We implement Continuous Delivery using Git (Git flow) on TFS, our CI system (Jenkins in our case) and some automation code written in PowerShell.

    We divide tasks and assign them at the beginning of every sprint in our planning and refinement meetings. The changes are developed in separate branches and at the end a pull request is created in TFS and assigned to the reviewers, who, in turn, can complete the pull request if everything is okay. The CI server polls for changes every 5 minutes, performs a build and deploys into our DEV environment, calculates the next tag in Git and sends notifications via e-mail at the end. Then our QA team tests the changes in the DEV environment.

    Thus we ensure a stable deliverable product any time.

    We started this path more than 2 years ago when apparently lots of projects in the company didn't know about Continuous Delivery. One year later other teams in our project decided to adopt it as well. Several months ago we showed the process to our customer and he liked it a lot.

    Now other projects are beginning to use the same workflow and we are building some PowerShell scripting stuff flexible for different projects and needs.
