mergebot: the story of our DevOps initiative

Wednesday, September 05, 2018 Pablo Santos DevOps, mergebot

DevOps is now a pervasive concept in the software development arena. And, beyond the hype, we really believe that having a continuous flow of tasks (bugs, features, anything) correctly reviewed, tested, merged and deployed is key for modern software creation (as of 2018).

But, while Plastic, as a version control system, plays a key role in the DevOps picture, we historically tended to remain "process agnostic". "Let teams find their path", we thought.

Now, we believe Plastic has to guide teams in adopting best practices. We must make things easier and avoid leaving tough decisions as an "exercise to the reader". (This doesn't mean Plastic won't let you freely implement your own practices if you are an expert, but we'll provide guidance to newcomers.)

This is the story of how we decided to automate the "integrator role" with mergebots, how we found a way to close the automation last mile, and how we made it way simpler for teams to integrate their Jenkins/Bamboos/TeamCities with Plastic, Slack, email and their favorite issue trackers.

This is about how Plastic will become proactive in terms of driving the workflow, pushing tasks forward and actively coordinating with CI instead of being just a source of information. Somehow, we decided to become accountable for it :-)

It is also about how we recommend our own formula of success for development: trunk-based development + short lived task branches + mergebots. And, how we coded it inside Plastic for teams to use it.

From involved to committed

It was more than a year ago when we deeply realized that Plastic, as a version control system, had to play a key role in the whole DevOps picture.

I mean, we knew that teams were already using Plastic to implement their DevOps strategies. Our support team was always in close collaboration with many users to integrate with Jenkins, Bamboo, TeamCity and implement solid flows for task and release branches.

But, they had to glue some code here and there, to configure many steps, and to deal with the different ways in which the CI systems handle branches.

So, we thought there should be a better way.

And that's why we started this initiative: a way to help teams using Plastic to implement a continuous flow of short tasks that get code reviewed, tested, merged and deployed. This was always our battle-tested recipe for continuous delivery, our foundation to build consistent and scalable DevOps. But, we never made it particularly easy for users to implement it.

The funny thing is that we always had a strong opinion on how to implement successful continuous delivery; the missing piece was to make ourselves accountable for helping teams achieve it. Actually, moving from involved to committed :-)

Rewind a second: what is DevOps?

I bet many of you don't need me to explain what DevOps is all about, but since it is a wide concept, I think it is good to explain what we understand by DevOps and how we think Plastic fits into the whole picture.

DevOps is all about breaking silos and delivering fast to production.

Just to give you some shocking data: leading organizations (the Amazons, Googles and Netflixes of the world) deploy hundreds of thousands of times a day (100k+!). (They achieve this not only with DevOps but also with microservices and other patterns to have really tiny pieces of deployable software).

DevOps is about super-fast release cycles but always with the business in mind. Read a book about modern marketing and growth and you'll immediately learn about running weekly experiments to reach more customers, retain them better, check what they like… and all these techniques take deploying fast to production for granted.

We wrote a blogpost about some DevOps basics here. Focus just on the concepts; the implementation will now be much easier with the new mergebot approach :-)

The task branch approach and how it fits with DevOps

What we learned from talking with hundreds of teams is that they want to implement a workflow where every single task goes through the following:

Tight control of tasks with an issue tracker: Every task the developers will work on is entered in an issue tracker. (It can be a Jira, Polarion, anything). By task, I mean a bug, new feature, refactor, performance improvement, anything.
A new branch for each task (a.k.a. task branches): Every task in the issue tracker will have an associated branch to develop it. Task branches are where developers perform their work, where the code is created. A task, and hence a task branch, is assigned to a single developer. And task branches, like tasks, are extremely short lived. They can be 1 or 2 hours long. The limit is 16 hours in pure agile fashion. There are times when more than one developer will work on a single task branch, but that is the exception rather than the rule.
Every task is code reviewed: Team members collaborate on this. Code review plays many roles: it ensures code quality, enforces a commonly accepted style and best design practices, helps train new members, etc. There was a time when code reviews were often neglected but now we are well past this point.
Merge only if tests pass: If a task is successfully code reviewed, then it is tested and merged to main if tests pass. Then, a new version is ready to be deployed.
Deploy: The freshly created version is now ready to be deployed to production.

We define the previous cycle as a combination of task branches + trunk-based development. It is definitely not the only workflow we have in mind. We have a wide range of customers with different needs, but this cycle is a good simplification of what many teams out there want to achieve.

mergebot: a step beyond pull requests

Working in branches, doing pull requests for code review, and merging them with a single click was definitely not new.

So, that's not what we wanted to achieve.

We wanted to go a few steps further.

We created a mergebot: a piece of software that decides when a task branch is ready to be merged, without human intervention. It is based on rules, configurable, predictable and trustable. And, it is fully automated. No more "merge now" button. Merges happen when tasks are ready.
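To make "based on rules, configurable, predictable" a bit more concrete, here is a minimal sketch of how such a decision could look. It is not the actual mergebot code: the `TaskBranch` and `MergeRules` types, their fields and the default values are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBranch:
    """Hypothetical snapshot of a task branch as a mergebot might see it."""
    name: str
    issue_status: str      # status of the linked task in the issue tracker
    review_approved: bool  # outcome of the code review


@dataclass(frozen=True)
class MergeRules:
    """Hypothetical, configurable rules; names and defaults are illustrative."""
    branch_prefix: str = "task"
    required_issue_status: str = "resolved"
    require_review: bool = True


def is_ready_to_merge(branch: TaskBranch, rules: MergeRules) -> bool:
    """Return True only when every configured rule holds for the branch."""
    if not branch.name.startswith(rules.branch_prefix):
        return False  # not a task branch, so the bot ignores it
    if branch.issue_status != rules.required_issue_status:
        return False  # the task is not in the status the team agreed on
    if rules.require_review and not branch.review_approved:
        return False  # review is mandatory and has not been approved
    return True


rules = MergeRules()
ready = TaskBranch("task1213", "resolved", True)
pending = TaskBranch("task1214", "open", True)
print(is_ready_to_merge(ready, rules), is_ready_to_merge(pending, rules))
```

The point of keeping the rules in a plain data object is that the same input always gives the same answer, which is what makes the bot predictable.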

mergebots are powered by our merge stack: we are the only ones doing xmerge and semanticmerge and even automatic merges of code fragments moved across files. This changes everything. It is not just having a script triggering merges; it is ensuring that 20-30% of the full branch merges that would otherwise require manual intervention can be automated with this stack. (Important: I say 20-30% of full branch merges. A branch can contain hundreds of files, and if even one of them can't be merged automatically, the entire branch can't. This 20-30% takes that into account. Counted per file, the number of files needing manual intervention can decrease by as much as 60-80%).
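The gap between per-branch and per-file numbers follows from the "one file blocks the whole branch" rule. The toy sketch below shows only that arithmetic; the data is made up for the illustration and is not a measurement of any real repository.

```python
def branch_merges_automatically(file_results: list[bool]) -> bool:
    """A branch merges automatically only if every one of its files does."""
    return all(file_results)


def manual_rates(branches: list[list[bool]]) -> tuple[float, float]:
    """Return (share of branches, share of files) that need manual work."""
    manual_branches = sum(
        1 for files in branches if not branch_merges_automatically(files)
    )
    all_files = [ok for files in branches for ok in files]
    manual_files = sum(1 for ok in all_files if not ok)
    return manual_branches / len(branches), manual_files / len(all_files)


# Made-up data: True means the file was merged automatically.
branches = [
    [True] * 50,
    [True] * 199 + [False],  # a single conflicting file blocks 200 files
    [True] * 10,
    [True] * 40,
]
branch_rate, file_rate = manual_rates(branches)
print(f"branches needing manual merge: {branch_rate:.0%}")
print(f"files needing manual merge: {file_rate:.2%}")
```

With this toy data one file out of 300 is enough to stop one branch out of four, which is why every extra file the merge stack resolves automatically matters more than its per-file share suggests.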

If you add together automating the "trigger the merges" process and this reduction in manual conflict resolution, your team dramatically reduces unwanted context switches. You get an extra productivity boost.

How mergebots were born

Mergebots are automatic integrators.

We evolved the concept of mergebots on top of the concept of the "integrator": we used to like having a human running all merges, so developers got rid of this burden, while someone was accountable for the operation.

But having integrators, whether dedicated or part-time, didn't scale.

Releases became an event, in the bad sense. "Hey JM, please, do the integration tomorrow because these 2 important tasks still need review and we really need them in the next release". Sounds familiar? You end up delaying the release until tomorrow. Creating new releases was an event. But, if the step was fully automated (this is what DevOps is all about) then a release (or version, your call) would be created today, and a new one tomorrow or in a few hours with the 2 tasks you were concerned about. No need to wait, no delays, just a continuous flow.

We like the concept of "bot" because our vision is that a mergebot is a team member that you hire to run operations that were previously run by a human. Then, the human (as was our case here) is freed to focus on continuous improvement of the full cycle (in our case it was moving tests to the cloud to speed up the entire process).

Focus on helping teams: no more exercise to the reader

Most of what I described above was fully achievable with Plastic *before this initiative*. In fact, a number of our customers already implemented similar workflows.

That's why we wrote a series of blogposts describing how to achieve it with different CI systems:

But, the challenge was that you had to connect the dots yourself. Want something to trigger the builds only if the Jira associated to the task branch is set to a given value? You can code it as part of your Jenkins build script, of course. But *you have to code it*.

We want to help teams by providing an off-the-shelf solution based on task branches that get automatically merged by a "mergebot".

We used to be very "process agnostic". Now, we'll be much more prescriptive because that's basically what most of us appreciate from products: not only to provide the functionality but also the guidance to help us succeed with the problem at hand.

DevOps makes task branches shine

Needless to say, I feel comfortable with task branches. You work on your task branch, you can checkin as often as you need, you can tell a story with each checkin to make reviewers more productive (more on this later, since this is one of the keys of the entire initiative).

And, well, we found out that task branches blend really well with DevOps. Task branches are not feature branches, they are much shorter and hence they can live perfectly well even with strategies like trunk based (not to be confused with mainline, I think I should write a blogpost about it too).

We also discovered that teams struggling to use Plastic normally find themselves forcing Plastic to work in Subversion, Git or Perforce modes. They really try to map their previous knowledge and bend Plastic to it.

Then, a frequent pattern is that the team talks to one of our product experts; we explain task branches to them, and they suddenly change their minds, and everything begins to flow.

Of course, this is not true in 100% of all cases, especially with game studios forced to do file locking for artists and where branching (most of the time) seems to be a no-go (again, not set in stone, since we have super successful game studios doing task branches even for art. Something for another blogpost).

The takeaway point for us here is: if task branches are what we recommend, why don't we simply help teams adopt them from day one? This entire initiative is fully aligned with this goal.

Shortcomings with CI-driven approaches

As I mentioned, over the past few months, we delved deeper into how to implement DevOps with Jenkins, TeamCity and Bamboo combined with Plastic.

This is mostly how CI systems evolved:

Most CI systems were created back when everyone worked in SVN mode: mainline. A single branch and everyone did direct checkins to it. No branches. Branches were even considered evil back then (mostly due to Subversion limitations).
Then, thanks to the rise and wide adoption of Git, all major CI systems started supporting branches. First, it was at most a "few branches only" approach which didn't match Plastic's needs.
Nowadays, fortunately, they all evolved to support thousands of task branches if needed (of course, I mean thousands of branches across time, not all together the same day!).

This is good news, but we found some shortcomings.

Some CI systems use the following approach when merging branches:

They will merge the branch "task1213" to main, then checkin (or commit if you are coming from Git) and then run the tests. It means the code is checked in before being tested, potentially causing broken builds.

Other CI systems use a slightly different approach: they will test in A, then merge to main, checkin, and test again in B:

We don't like this option because:

The test in A is not needed and slows down the build. It runs without taking the latest changes on main into account.
The checkin happens before the test in B, so the build can still break: if the tests in B fail, main is broken again.

Our preferred way to merge and test task branches

The option we really like is this one: the branch is merged to main, then tested, and the merge is only checked in if tests pass.

This means you avoid broken builds at all costs, and your main branch will be always clean and ready to release.

There are several options to achieve this behavior. In the diagram above, I'm showing how the merge is storing the result on a shelve, which is something we implemented a few months ago (merge-to creating shelve/workspace-less merges). This is the approach we are using with the DevOps/mergebot initiative.

An alternative, more traditional option, would be to use the workspace where the merge was done to build the code and launch the tests, and only checkin if tests pass.

Note

We implemented this second approach with Atlassian Bamboo successfully and documented it in a series of blogposts:

Merge to a shelve as a way to decouple from CI-driven merges

I'm going to get more technical now ;-)

We created the "merge-to a shelve" because we want to push the result of the merge to the CI system to run the tests. This way, we decouple the merge from the testing. In the series of blogposts mentioned above, Bamboo triggers the Plastic merge, keeping the result of the merge in a workspace, building and running tests from there.

With this new "shelve-based approach", we move the complexity out of the CI system. It doesn't need to run merges anymore; all it needs to do is to download a given shelve (which is like a regular changeset for the CI) and test it.

This way, we remove all the merge logic, the downsides described in the previous section, and we provide a consistent behavior independent of the CI system the team is using.
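As a rough sketch of that division of labor, the function below drives one branch through the shelve-based flow. The three callbacks (`merge_to_shelve`, `run_ci_build`, `checkin_shelve`) are assumptions standing in for the Plastic server and the CI system; they are not real Plastic or CI API names.

```python
from typing import Callable, Optional


def process_branch(
    branch: str,
    merge_to_shelve: Callable[[str], Optional[str]],
    run_ci_build: Callable[[str], bool],
    checkin_shelve: Callable[[str], None],
) -> str:
    """Merge to a shelve, let the CI test it, checkin only on success."""
    shelve = merge_to_shelve(branch)  # None means manual conflicts remain
    if shelve is None:
        return "merge_conflict"
    if not run_ci_build(shelve):      # the CI only downloads and tests it
        return "tests_failed"
    checkin_shelve(shelve)            # main changes only at this point
    return "merged"


checked_in: list[str] = []
result = process_branch(
    "task1213",
    merge_to_shelve=lambda branch: f"shelve-for-{branch}",
    run_ci_build=lambda shelve: True,
    checkin_shelve=checked_in.append,
)
print(result, checked_in)
```

Note that the CI callback never sees a branch name or a merge; it only receives the identifier of something to test, which is what keeps the behavior the same across CI systems.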

The following diagram shows how Plastic (the mergebot) interacts with the CI system as described above:

Plugs: the connectors with CI, issue trackers and notifiers

The following image is the grand picture of all the components taking part in our mergebot initiative. And they are just 4 :-).

We have mergebots, the logic driving the merge process, your workflow converted into code, basically. Then, we have "plugs" or connectors to external systems. Plugs are just glue code that lets a mergebot seamlessly connect to Jira, Bamboo, Polarion, Slack, or any other CI, Issue Tracker or Notification system.

Let me show you an example:

The "Jira plug" is just a standardized API that lets the mergebot obtain the status of the task associated to branch "task1213" without having to deal with the Jira specific API.

This way, the mergebot could be configured to work with Polarion, Trac or any other, and its logic wouldn't change.
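A minimal sketch of that idea, assuming a one-method interface: the `IssueTrackerPlug` protocol, the two fake plugs and the status names are invented for this example and do not reflect the real plug API or the real Jira and Trac APIs.

```python
from typing import Protocol


class IssueTrackerPlug(Protocol):
    """Hypothetical common interface every issue tracker plug would expose."""

    def get_task_status(self, task_id: str) -> str: ...


class FakeJiraPlug:
    """Stand-in for a Jira plug; statuses are already in the common form."""

    def __init__(self, statuses: dict[str, str]) -> None:
        self._statuses = statuses

    def get_task_status(self, task_id: str) -> str:
        return self._statuses.get(task_id, "unknown")


class FakeTracPlug:
    """Stand-in for a Trac plug; it maps tracker-specific names itself."""

    _TO_COMMON = {"closed": "resolved", "new": "open"}

    def __init__(self, statuses: dict[str, str]) -> None:
        self._statuses = statuses

    def get_task_status(self, task_id: str) -> str:
        raw = self._statuses.get(task_id, "")
        return self._TO_COMMON.get(raw, "unknown")


def branch_is_resolved(branch: str, tracker: IssueTrackerPlug,
                       prefix: str = "task") -> bool:
    """Mergebot-side logic: identical whichever tracker is plugged in."""
    task_id = branch[len(prefix):] if branch.startswith(prefix) else branch
    return tracker.get_task_status(task_id) == "resolved"


print(branch_is_resolved("task1213", FakeJiraPlug({"1213": "resolved"})))
print(branch_is_resolved("task1213", FakeTracPlug({"1213": "closed"})))
```

The translation of tracker-specific details lives inside each plug, so `branch_is_resolved` never has to change when the team switches trackers.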

I'm just explaining regular plugins here, nothing new under the sun.

We designed 3 types of plugs:

Continuous Integration Plugs: We provide support for Bamboo, Jenkins and TeamCity. Our goal is to keep adding more.
Notifier Plugs: To send messages. We include Slack and email, but Skype could be an option (although their API is awful), or sending whatsapps if a branch can't be merged. Again, the beauty is that it is very simple to add more.
Issue Tracker Plugs: To integrate with the Jiras, Tracs, Trellos, etc.

As I have said many times, we developed a set of plugs for the most widely used systems, but you'll have the option to add your own. A typical scenario is that a team has their own internal issue tracker (which happens very often). Well, writing a plug to connect it to Plastic is straightforward.

From a more technical point of view, a plug is just a program that connects to the Plastic server using a websocket, then waits for requests. Each type of plug will accept a different set of requests (an API).

Here are some typical actions so that you can see what mergebots can do with plugs:

Check the status of a given task.
Send a message.
Launch a build.
Monitor the status of a build.
Change the status of a task in the issue tracker if the build fails/succeeds.
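To give an idea of what "waits for requests" can mean in practice, here is a sketch of the request-handling core of a CI plug. The JSON message shape, the action names and the in-memory `BUILDS` store are assumptions for this illustration, not the real protocol, and the websocket transport itself is left out.

```python
import json

# In-memory stand-in for a real CI server (illustration only).
BUILDS: dict[str, str] = {}


def launch_build(params: dict) -> dict:
    """Pretend to start a build for the given shelve and remember it."""
    build_id = f"build-{params['shelve']}"
    BUILDS[build_id] = "running"
    return {"build_id": build_id}


def get_build_status(params: dict) -> dict:
    """Report the status of a previously launched build."""
    return {"status": BUILDS.get(params["build_id"], "unknown")}


# The set of actions this plug type accepts: its small API.
HANDLERS = {"launch_build": launch_build, "get_build_status": get_build_status}


def handle_request(raw_message: str) -> str:
    """Decode one request, dispatch it, and encode the response."""
    request = json.loads(raw_message)
    handler = HANDLERS.get(request.get("action"))
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown action"})
    result = handler(request.get("params", {}))
    return json.dumps({"ok": True, "result": result})


print(handle_request('{"action": "launch_build", "params": {"shelve": "sh42"}}'))
print(handle_request('{"action": "get_build_status", "params": {"build_id": "build-sh42"}}'))
```

In a real plug, `handle_request` would be called for each message received on the websocket and its return value sent back on the same connection.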

Custom mergebots and plugs

I already mentioned it for plugs, but it applies to mergebots too. Our goal is to create a gallery of plugs and mergebots, with source code, so anyone can use them as starting point for their own custom work.

Why custom plugs can be useful is simple: integrate with a homegrown system you use, or support one we still officially don't.

What about mergebots? Well, a mergebot is the logic that drives your daily workflow, so while you can stick to one of the standard ones we'll be publishing, chances are you'll need some variations. Things like: once a new release is created, you want a Tweet to be sent automatically to announce it, or you want to automatically create the release notes getting a given Jira field from each task, etc.
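As an example of such a variation, this sketch builds release notes from one field of each merged task. The task dictionaries, the `release_note` field name and the version string are hypothetical; a real bot would obtain them through its issue tracker plug.

```python
def build_release_notes(version: str, tasks: list[dict],
                        field: str = "release_note") -> str:
    """Collect one line per task that has a non-empty release note field."""
    lines = [f"Release {version}"]
    for task in tasks:
        note = task.get(field)
        if note:  # tasks without a note are skipped
            lines.append(f"- [{task['id']}] {note}")
    return "\n".join(lines)


merged_tasks = [
    {"id": "task1213", "release_note": "Fix crash on merge"},
    {"id": "task1214", "release_note": ""},
    {"id": "task1215", "release_note": "Faster checkin"},
]
notes = build_release_notes("1.0.0", merged_tasks)
print(notes)
```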

Since a one-size-fits-all won't help all teams out there, we thought creating custom ones would be the solution.

Mergebots and plugs are just simple programs, written in any language, that connect to a websocket and respond to certain messages. Mergebots can use the new REST API to do things like running merges, listing branches and many more.

We have our own internal bot for Códice (driving plastic, semantic and gmaster releases) and our goal is to publish it too as reference. It is written in C#, but since the only connection point is a websocket, it is perfectly possible to write your own in Node, Java, Perl or anything else you feel more comfortable with.

Future work: improved code reviews

As part of the overall "DevOps Initiative", we are redoing our code review system. While we are about to release the mergebot part, it will take a little bit longer to have the new code review system ready.

But, I think it is worth mentioning because we consider code reviews to be a central part of the entire process. We have been sticking to "every single branch gets code reviewed before being merged" for 6 years now. I can write an entire blogpost on how it helps train new team members, share a common design view, find issues, keep the code clean and many others.

I also want to share our experiences and best practices doing checkins that can help with code reviews. Yes, same as "we write code meant to be read" now we "checkin code meant to be reviewed". It is a game changer, so I really want to share these experiences. And, since we are tool makers after all, we are going to capture all these good practices in a blogpost.

Stay tuned because there is much coming in the code review area.

Closing

This has been a long blogpost, but I had so many things to say that I just kept writing and writing.

First, I told you the story of why we changed our minds from a product perspective and decided to be more prescriptive, or simply provide more guidance if you prefer.

Then, I described in detail what our "DevOps initiative" is all about. We are changing the way in which Plastic communicates with CI systems to implement better continuous delivery. We are creating "bots" (not AIs yet, but maybe soon they will be :P) that act as non-human integrators, automating lots of previously manual work.

Now, if you are already doing continuous integration/deployment with Plastic, we'd like to hear from you, so please reach out to us for more details or ask us any questions that you might have.

Pablo Santos

I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improve their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.
