SCM: Continuous vs. Controlled Integration
Agile methodologies. What do you know about them? If you run a development environment with hundreds of developers, this may not be for you, but if you run a tight-knit group, then agile programming is the way to go. Agile programming means frequently starting new iterations of the development cycle. That means every so often – maybe every two weeks – you’ll start a new cycle of planning, development, testing, and, most importantly, release.
This methodology rose at the start of the new century and has introduced a new vision and spirit in software development. Concepts like refactoring, pair programming, and collective code ownership meant nothing twenty years ago. Now, though, those buzzwords are everywhere, and if you haven’t yet, you’d better sit up and pay attention.
Agile methodologies have not only influenced the way software is analyzed, developed, and written; they have also changed the way it’s assembled, or integrated.
Today, just about everyone in the software industry has at least heard about continuous integration tools and techniques.
This article will analyze the pros and cons of continuous integration and discuss whether there might still be opportunities for even more agile processes on the horizon.
Free-ride Software Development
Agile software development brings many relevant features to the scene. Different people would stress different features in different situations, but here, I’ll highlight a subset of agile programming features that I’ll call the free-ride methodology. Imagine a dirt biker catching some serious air. Now THAT’S a free ride! And that’s the spirit of agile software development that we’re trying to capture. Free-ride programmers …
- Enforce change. Whether you’re refactoring code or participating in collective code ownership, the message is clear: you need to be adaptable, you need to be able to change anything – and everything – in order to create better code and better serve your clients.
- Create a real team. Free-ride methods put people first, as opposed to traditional project management and software engineering techniques which fiercely look towards reducing staff dependency. Our method encourages interpersonal communication, with a team being people actively working together towards a common goal – not just a bunch of geeks sitting in the same room.
- Have fun. By the way, this won’t ever fit into every kind of organization. If you need to compete in the global market, though, getting the best out of talented individuals is imperative. Achieving such a goal involves motivating the team members, and you have to admit, a fun work environment really helps motivate people.
Not all organizations and not all projects will benefit from using agile techniques or even be able to adopt them. Projects with hundreds of developers and high personnel rotation aren’t normally good candidates for these programming practices. In fact, the standard way to achieve agility in environments like these is to split larger teams into smaller ones. That is not always possible, and when it is not, a tall hierarchy is required, which is incompatible with agile techniques.
Even in environments where full-on agile programming methods won’t work, there are still ways of introducing more agile working methods, such as those I listed above. Those same techniques will benefit almost all development teams and help them to overcome many of the problems that stem from some widespread software configuration management (SCM) practices.
SCM’s Role in Agile Development
What is the role of software configuration management in agile processes? SCM used to be perceived just as a commodity, as a service to be used by developers. However, SCM can play a key role in creating the right environment to achieve the desired agile development goals. The problem is that not every version control or SCM tool allows you to reach those goals. Most of them fail, forcing developers to follow the process that’s best suited to the tool’s capabilities – not the developers’.
Agile programming is all about changing code in a safer way, adapting to requirements faster, and listening to the customer better. Some of the most widespread agile SCM practices fail at this: they give developers the freedom to make changes without putting in place the mechanisms needed to ensure maximum codebase stability.
From Big Bang to Frequent Integration
As I mentioned before, continuous integration is one of the core practices in agile programming methods. Continuous integration is the extreme response to big bang integration (working in a silo for a long time and then putting all the pieces together at the end) which has been the root cause behind a huge number of failed and delayed projects.
Figure 1 shows a typical development cycle in which integration is done at the end of the project. With only one line of development going on, there’s absolutely no problem.
Figure 1. A regular development process
The problems start flowing freely, however, when you use big bang integration in a real-world situation, like the one depicted in Figure 2. The integration is delayed until the end of the project. When the integrators try to make all the code and components work together, it becomes a real nightmare. The problem is caused not only by the code itself, which needs to be reworked, but also by the fact that personnel are not used to running integrations, because they’re done so rarely.
Figure 2. Big bang integration, big trouble at the end
So this is where frequent integration enters the scene: What if your team integrates their changes on a regular basis? Then, instead of having big trouble at the end of the project, the team will have more frequent, but smaller troubles, reducing the risk and making it much more manageable. Figure 3 depicts a frequent integration process.
Figure 3. Frequent integration
Now the question is: How frequently should I run the integration process? Should I run it once a month, once a week, or twice a day?
Agile programming methods clearly enforce frequent build and release cycles, but many development groups have ended up implementing what has been called non-stop integration. What does that mean? Instead of running integrations frequently, developers integrate all the time. A developer makes a change, checks all the code in, and the build system runs all the available test suites. If the build gets broken (it doesn’t compile correctly or some of the tests fail), the developer receives a warning saying that he or she has to fix the problem. So, in fact, integrations are now continuous because they occur all the time.
The key difference between continuous integration and the evil code-and-fix cycle is the presence of a well-defined test suite, plus a firm commitment from developers to run it all the time (or the build software can enforce it).
But is continuous integration the solution to all version control headaches? Or does it introduce more problems?
In a perfect world, the test suite would be almost perfect, so if it runs correctly, no problem would ever occur. In reality, though, test suites are far from complete, and it’s not a stretch to imagine a problem introduced by a developer reaching the main code line immediately without being correctly checked. Once it is detected it will be fixed, but in the meantime many other developers could have been affected. Figure 4 illustrates a bug-spreading scenario.
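The non-stop loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names CheckIn, run_build, and integrate_non_stop are hypothetical, the compiles flag stands in for a real compile step, and the single toy test stands in for a real test suite.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckIn:
    author: str
    change: str
    compiles: bool = True  # stand-in for a real compile step (assumption)


def run_build(check_in: CheckIn, tests: List[Callable[[CheckIn], bool]]) -> List[str]:
    """Return the list of build problems; an empty list means a green build."""
    if not check_in.compiles:
        return ["does not compile"]
    return [f"failed: {test.__name__}" for test in tests if not test(check_in)]


def integrate_non_stop(check_ins, tests):
    """Each check-in reaches the mainline first; the build then reports back."""
    mainline: List[str] = []
    warnings: List[Tuple[str, List[str]]] = []
    for check_in in check_ins:
        mainline.append(check_in.change)  # the change is now on the mainline
        problems = run_build(check_in, tests)
        if problems:
            warnings.append((check_in.author, problems))  # warn the author
    return mainline, warnings


def change_is_described(check_in: CheckIn) -> bool:
    """A toy test; a real suite would exercise the built code."""
    return bool(check_in.change.strip())


mainline, warnings = integrate_non_stop(
    [CheckIn("ana", "add login form"),
     CheckIn("bob", "rework parser", compiles=False)],
    [change_is_described],
)
print(mainline)   # both changes are on the mainline
print(warnings)   # only the second author gets a warning
```

Notice that, in this sketch, the check-in is part of the mainline before the build result is known; the warning arrives afterwards.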
Figure 4. Bug spreading and mainline instability, aftermath of continuous integration
Imagine the following situation: A developer finishes a given task and wants somebody from testing to check whether it’s correct or not. To deliver the code, she will check it into the version control system, trigger the build scripts, and then notify her colleague to get the code and check whether everything is correct. The only reason to submit the code at that point was to make it available in a managed way. If the code has a problem or doesn’t implement the feature correctly, the mainline will already be infected by the mistake. Because all the team members are basically doing the same thing, in a short period there will be a lot of code built upon the unstable baseline.
Figure 5 shows a set of tasks being directly integrated into the mainline, as would happen with the continuous integration working pattern. There is only one way for developers to deliver code: merging it into the mainline. In the figure, after tasks 1098, 1099, 1100, and 1104 have been delivered, what happens if task 1098 turns out to be defective? The answer is that it has to be fixed. But what if we need to release the code right away to a client or even just the testing group, and we already know some changes introduced by 1098 are wrong?
Most likely, features introduced by tasks 1099, 1100, and 1104 are totally independent from 1098 (task independency happens more often after the initial phase of a project during which tasks tend to be extremely dependent on each other due to the project’s infancy) and they could have been properly delivered if another working pattern had been used.
Figure 5. A task introducing a problem and all the rest building on top of it
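To make the dependency concrete, here is a minimal Python sketch. It assumes the mainline can be modeled as a simple ordered list of delivered tasks; the function name tasks_built_on is hypothetical, and the task numbers are the ones used in the discussion of Figure 5.

```python
def tasks_built_on(mainline, defective_task):
    """Return every task delivered to the mainline after the defective one.

    On a single line of development, each of these tasks sits on top of
    the defective change, whether or not it logically depends on it.
    """
    position = mainline.index(defective_task)
    return mainline[position + 1:]


mainline = [1098, 1099, 1100, 1104]
affected = tasks_built_on(mainline, 1098)
print(affected)  # [1099, 1100, 1104]
```

Any release cut from this mainline includes 1098, so the three later tasks cannot be shipped without it unless the history is reworked.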
Table 1. Continuous integration drawbacks
By the way, I don’t mean to say that continuous integration isn’t controlled. It’s just that when I refer to controlled integration as opposed to continuous, I want to highlight that the former occurs frequently, but not all the time, and it is normally run only when a certain milestone is reached (the milestone could perfectly well be a weekly or daily planned integration).
There is also another difference between continuous and controlled integration, and it refers to the roles involved in the process. In a regular continuous integration scenario, all developers perform integrations and solve merge conflicts in code, which is perfectly acceptable on small, well-trained teams. Even agile teams can be affected by personnel rotation, though, or just new members joining, and it’s usually not a good idea to have a brand new developer mixing code he doesn’t yet understand.
In controlled integration a new role shows up: the integrator. The integrator is usually a seasoned team member who is familiar with the bulk of the code, the version control system, and the build and release process. The most important feature of the integrator is not that she knows all the code, which is not even necessary, but that she takes responsibility for the integration process. The integrator’s primary goal is creating a new stable baseline to serve as the base for development during the next iteration.
Figure 6 shows a scenario in which well-defined integration points are not present. In such an environment, you’re likely to experience mainline instability, and developers are likely to get stuck in a code-and-fix rut since they’ll be starting their iterations from unstable baselines.
Figure 6. Development cycle with no defined integration points
A controlled integration process will introduce a set of well-known starting points that developers will use to work against. So, as Figure 7 shows, now developers will always start working against a well-known and stable baseline. This way, for instance, between BL004 and BL005, everyone will start their new tasks with BL004 code, so no unnecessary dependencies or unstable developer workspaces will affect the development process.
Figure 7. Well-defined baselines in a controlled integration process
As a side effect of controlled integration, task-oriented development can be introduced. Now, each task a developer works on will be handled independently by the version control system, implementing full parallel development and giving the team more maneuverability during release creation.
Figure 8 highlights the differences between task-oriented parallel development and serialized development processes. When the version control system supports task-oriented patterns, a look at the branching hierarchy shows that development was, indeed, parallel. With serial development, that parallelism can only be imagined, not traced.
Figure 8. Differences between parallel and serial development
Parallel development and branching patterns
This brings us to the crux of real SCM-powered parallel development: branches. Unfortunately, many developers view branches as a necessary evil. This is mostly because many version control tools systematically discourage branch usage, and the reason behind it is not that branches are evil, but that most of the available tools are extremely bad at dealing with them.
In fact, when most of us think about branches, we consider them just from the project management perspective. You create a branch to split a project, to start maintenance, or to support different code variants. You use just a few branches for very specific tasks.
I’m here to tell you that branches can also be used in a more tactical way to isolate changes and create detailed traceability. Branching patterns like branch per task or branch per developer (also known as private branches or workspace branches) open a new world of possibilities in the integration process. Commits no longer have to mean delivering code; they can simply be used as checkpoints (we call them changesets), creating a real safety net around developers and boosting both productivity and the freedom to change, which are totally aligned with agile programming goals.
Achieving task independency through branching
Using mainline development on a single branch, as is usually encouraged by continuous integration practitioners, ends up, as mentioned before, in situations like the one depicted in Figure 9, where all the tasks are inexorably linked.
Figure 9. Task dependency forced by construction
What if each task were developed on its own branch? Then developers get not one, but two added services from the version control system. First, they get to create intermediate checkpoints (changesets) when they need to, instead of being forced to wait until the code is finished. Second, they also gain the ability to decide what goes into a given release or not, while still retaining changes under source control.
Figure 10 shows a scenario where the developer switches from one branch to another, thereby avoiding unnecessary task dependency.
Figure 10. Task independency achieved by branching patterns
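The two services mentioned above, intermediate checkpoints and control over what goes into a release, can be sketched as follows. This is a toy model, not the behavior of any particular version control tool: TaskBranch, checkpoint, and assemble_release are hypothetical names, and the baseline name BL004 is borrowed from Figure 7.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TaskBranch:
    task_id: int
    base: str                      # the baseline the branch started from
    changesets: List[str] = field(default_factory=list)
    finished: bool = False

    def checkpoint(self, note: str) -> None:
        """Record an intermediate changeset without delivering anything."""
        self.changesets.append(note)


def assemble_release(branches: List[TaskBranch], excluded: Set[int]) -> List[int]:
    """Pick the finished tasks for a release, leaving out the excluded ones."""
    return [b.task_id for b in branches
            if b.finished and b.task_id not in excluded]


branches = [TaskBranch(task_id, base="BL004") for task_id in (1098, 1099, 1100, 1104)]
branches[0].checkpoint("first attempt")      # work in progress stays versioned
branches[0].checkpoint("second attempt")
for branch in branches:
    branch.finished = True

release = assemble_release(branches, excluded={1098})
print(release)  # [1099, 1100, 1104]
```

The excluded task keeps its two changesets under source control; it is simply left out of this release.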
Table 2. Controlled integration and branching benefits
Controlled integration cycle
So far I have introduced the concept of controlled integration, but how often does it happen in practice? The answer is simple: Once a day, once a week, or at most, every couple of weeks. It really depends on the work volume. When the stack of finished tasks reaches a certain quota, the integrator integrates them; a new release is created, tested, and then marked as the baseline for the next iteration. Figure 11 shows the full cycle. Notice that testing (unit testing, automated GUI testing, manual checks, etc.) plays a key role in the process. If there were no tests, the cycle wouldn’t make any sense.
Figure 11. Controlled integration cycle
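One pass of this cycle can be written down as a short Python sketch. Everything in it is an assumption made for illustration: the function names, the quota value, and the baseline naming scheme (two letters followed by a zero-padded number, modeled on the BL004 and BL005 names used with Figure 7).

```python
from typing import Callable, List, Tuple


def next_baseline_name(name: str) -> str:
    """Turn 'BL004' into 'BL005' (assumed naming scheme)."""
    prefix, number = name[:2], int(name[2:])
    return f"{prefix}{number + 1:03d}"


def integration_cycle(
    finished_tasks: List[int],
    quota: int,
    current_baseline: str,
    run_tests: Callable[[List[int]], bool],
) -> Tuple[str, List[int]]:
    """Run one controlled integration pass.

    Returns the baseline to use for the next iteration and the tasks
    that made it into the new release (empty if nothing was released).
    """
    if len(finished_tasks) < quota:
        return current_baseline, []          # not enough work stacked up yet
    release = list(finished_tasks)           # the integrator merges the tasks
    if not run_tests(release):
        return current_baseline, []          # no tests passed, no new baseline
    return next_baseline_name(current_baseline), release


def all_green(tasks):
    """Stand-in for the real test suite."""
    return True


print(integration_cycle([1099, 1100], 3, "BL004", all_green))
print(integration_cycle([1099, 1100, 1104], 3, "BL004", all_green))
```

The first call leaves the baseline untouched because the quota has not been reached; the second one produces a new baseline containing the three tasks.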
Are there any drawbacks to using the controlled integration cycle? No method is perfect. The following drawbacks are worth mentioning:
- If no build and test server is in place (something quite common when continuous integration is present), developers run the test suites on their own workstations. This is normally time-consuming and can have an impact on productivity. If automated GUI testing is used, developers’ workstations will be blocked until the tests finish.
- Results are not published. With an integration and build server, there will usually be a way to publish the test results, but when tests are run on developers’ workstations, publishing them can be more difficult.
Getting the best of both worlds: controlled + continuous tools
What about having the build and test tools normally used in continuous integration mixed with the controlled best practices? This way we would still get the best out of the branching patterns, the added control introduced into the process, and benefit from the build technology spread by the continuous practitioners. Figure 12 shows a mixed process: each time a developer finishes a task, the integration server will trigger a build, getting the code from the associated branch. All the available tests get run and then results are published and made available to the whole team. Now developers can continue working while the tests are run, and they can get feedback after the whole test suite is run.
Figure 12. Mixing controlled and continuous techniques
When a regular controlled integration is performed, the integrator runs a subset of the complete test suite (smoke tests) for each integrated branch. This practice makes it possible to reject an offending task if it breaks the build or doesn’t pass the tests. The integrator is the one responsible for merging the code, running the tests, labeling the results, packing, and so on. Figure 13 illustrates the process. The problem is that the integration itself can be very time-consuming. Normally, if the right tool is used and it provides good merge support, the merge process is extremely fast, but running all the tests again and again is CPU-intensive.
Figure 13. Centralized controlled integration
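A rough Python sketch of this centralized pass follows. It models code as a plain list of strings and the smoke tests as a single function; both are simplifying assumptions, and integrate_with_smoke_tests is a hypothetical name rather than a feature of any tool.

```python
from typing import Callable, List, Tuple


def integrate_with_smoke_tests(
    baseline: List[str],
    branches: List[Tuple[int, List[str]]],
    smoke_test: Callable[[List[str]], bool],
):
    """Merge task branches one by one, rejecting any that fail the smoke tests.

    Returns the integrated code plus the accepted and rejected task ids.
    """
    integrated = list(baseline)
    accepted, rejected = [], []
    for task_id, changes in branches:
        candidate = integrated + changes     # merge the branch on top
        if smoke_test(candidate):            # one smoke-test run per branch
            integrated = candidate
            accepted.append(task_id)
        else:
            rejected.append(task_id)         # the offending task is left out
    return integrated, accepted, rejected


def no_known_bug(code: List[str]) -> bool:
    """Toy smoke test: fails if the code contains the marker 'bug'."""
    return "bug" not in code


result = integrate_with_smoke_tests(
    ["core"],
    [(1098, ["bug"]), (1099, ["feature-a"]), (1100, ["feature-b"])],
    no_known_bug,
)
print(result)
```

The smoke tests run once per branch, which is where the cost of this approach comes from as the number of finished tasks grows.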
Are there any other options to solve the problem? Consider the proposal depicted by Figure 14. Developers integrate their branches against the mainline from their development branches, and then the integration server triggers the build and test cycle. When a number of tasks have been integrated, the integrator will check the mainline stability and decide to create a new baseline. Notice that this proposal is quite close to continuous integration but it has the following differences:
- Developers still count on their own versioned sandboxes.
- All tasks start from a well-known baseline point, which is known to be stable, so bug spreading is still avoided.
Figure 14. Developers integrate against the mainline; integrators are in charge of the baselines
Figure 15 introduces a new variation on the same theme which mixes controlled and continuous integration together. Now developers continue integrating their changes when they finish a task, but this time they do it against an integration branch. The integrator will be in charge of promoting the changes to the mainline when needed, also creating new baselines. Now the mainline is kept clean and just contains correct and finished code.
Figure 15. Controlled integration + integration branch
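The integration-branch variation can be sketched like this. The class and method names are hypothetical, and the stability check is passed in as a plain function because how stability is judged is up to the team.

```python
from typing import Callable, List


class IntegrationFlow:
    """Toy model: developers deliver to an integration branch,
    and only the integrator promotes its content to the mainline."""

    def __init__(self, baseline: str) -> None:
        self.integration_branch: List[int] = []
        self.mainline: List[int] = []
        self.baselines: List[str] = [baseline]

    def deliver(self, task_id: int) -> None:
        """A developer integrates a finished task into the integration branch."""
        self.integration_branch.append(task_id)

    def promote(self, is_stable: Callable[[List[int]], bool], new_baseline: str) -> bool:
        """The integrator promotes the integration branch if it is stable."""
        if not is_stable(self.integration_branch):
            return False                     # the mainline stays untouched
        self.mainline = list(self.integration_branch)
        self.baselines.append(new_baseline)
        return True


flow = IntegrationFlow("BL004")
flow.deliver(1099)
flow.deliver(1100)
mainline_before = list(flow.mainline)        # still empty: nothing promoted yet
promoted = flow.promote(lambda tasks: True, "BL005")
print(mainline_before, flow.mainline, flow.baselines)
```

Until the integrator promotes the changes, deliveries never touch the mainline, which is what keeps it clean.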
SCM tools and practices can play an important role both in the transition to agile practices and in enhancing existing ones. Both small and large teams can benefit from better isolation, task independency, and better release assembly.
Isolating tasks and changes in branches introduces an added layer of security and traceability, increasing the freedom to make changes and improving both stability and productivity.
The right choice will heavily depend on the company’s situation, but deploying version control systems that are agile and handle branches well will give the group the freedom to choose the right pattern for the right stage of the project’s lifecycle.