Plastic SCM development cycle - key practices described

Thursday, June 15, 2017

So far, we have shared how we do DevOps and Trunk Based Development using Plastic SCM and a set of tools around it. But I wanted to keep the blogposts short, so I intentionally left out lots of notes and explanations.

In this fifth blogpost of the series, I will cover why automated testing is so important for us, plus two other practices we adopted long ago: code reviewing and exploratory testing each task.

Automated testing

We have relied on automated testing since day one (which dates back to August 2005). Thanks to automated testing, we can refactor and clean up code, which is key in a code base that has already survived its first decade.

Every new version goes through 50+ hours of automated testing, although it takes 2-3 hours to finish thanks to virtual machines and running tests in parallel.
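As a rough sanity check on those numbers: squeezing 50 hours of tests into 2-3 hours requires at least 17 to 25 runners working in parallel. The Python sketch below computes that lower bound and shows one simple way to spread tests over runners (longest test first, always onto the least-loaded runner). The function names and the greedy strategy are illustrative assumptions, not a description of the actual build infrastructure.

```python
import heapq
import math


def min_runners(total_hours, wall_clock_hours):
    """Lower bound on the parallel runners needed to fit the wall-clock budget."""
    return math.ceil(total_hours / wall_clock_hours)


def schedule(durations, runners):
    """Greedy longest-first assignment; returns the resulting wall-clock time."""
    loads = [0.0] * runners          # current load of each runner
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)      # least-loaded runner so far
        heapq.heappush(loads, lightest + duration)
    return max(loads)
```

With a 3-hour budget, min_runners(50, 3) gives 17; with a 2-hour budget, min_runners(50, 2) gives 25. The greedy assignment is not always optimal, but it is simple and usually close.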

That said, some things went right and others went wrong, and we can share both.

These are the types of tests we run:

  • Unit tests.
  • Integration tests (we call them smoke).
  • GUI tests.

Unit tests

Fast, reliable (unless you really write some messy ones) and capable of testing everything. Yes, if you think you really need an integration test because "it can't be tested with unit tests", I bet you are wrong. I was.

Now we write more unit tests than ever, and fewer and fewer GUI and smoke tests. Some are always needed, but the fewer, the better.
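To illustrate the point, here is a minimal sketch of logic that looks like it needs a running server but can be unit tested against an in-memory fake. It is written in Python for brevity (the real tests are C#), and every name in it (FakeRepository, latest_hash, changed_files) is hypothetical, not part of Plastic SCM.

```python
class FakeRepository:
    """In-memory stand-in for a server connection (hypothetical interface)."""

    def __init__(self, revisions):
        self._revisions = dict(revisions)

    def latest_hash(self, path):
        # Returns None for paths the repository has never seen.
        return self._revisions.get(path)


def changed_files(local_hashes, repository):
    """Return the paths whose local hash differs from the repository's latest."""
    return sorted(
        path
        for path, local_hash in local_hashes.items()
        if repository.latest_hash(path) != local_hash
    )


def test_changed_files_detects_modified_and_new():
    repo = FakeRepository({"a.cs": "h1", "b.cs": "h2"})
    local = {"a.cs": "h1", "b.cs": "CHANGED", "c.cs": "h3"}
    assert changed_files(local, repo) == ["b.cs", "c.cs"]
```

The test runs in milliseconds because nothing is launched: the decision logic only depends on the small interface the fake implements.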

Integration tests - what we call smoke internally

We call them smoke tests, but they are really a sort of integration test.

We extended NUnit long ago and created PNUnit (parallel NUnit) and we use it to sync the launch of client and server even on different machines.

Tests are written in C# and automate the command line.
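The general idea can be sketched as follows: two roles wait on a shared barrier so they start in sync, and the client role drives a command line and checks its output. This is a Python illustration using only the standard library, not PNUnit code, and it runs both roles as threads on one machine rather than on different machines. The helper names run_cli and run_synced are assumptions made up for this example.

```python
import subprocess
import sys
import threading


def run_cli(args):
    """Run a command line and return (exit code, trimmed stdout)."""
    done = subprocess.run(args, capture_output=True, text=True, timeout=30)
    return done.returncode, done.stdout.strip()


def run_synced(server_step, client_step, timeout=10):
    """Start both roles and release them together from a shared barrier."""
    barrier = threading.Barrier(2)
    results = {}

    def role(name, step):
        barrier.wait(timeout)    # neither side proceeds until both are ready
        results[name] = step()

    threads = [
        threading.Thread(target=role, args=("server", server_step)),
        threading.Thread(target=role, args=("client", client_step)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout)
    return results
```

For example, using the Python interpreter itself as a stand-in command (not a real Plastic SCM command), run_synced(lambda: "listening", lambda: run_cli([sys.executable, "-c", "print('pong')"])) returns the server result "listening" and the client result (0, "pong").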

Our mistake: Each test takes between 4 and 15 seconds to run. Too slow. We didn't realize it at the beginning and created tons of them. Many are still in service.

Test slowness ends up creating the "new versions are an event" problem I mentioned in the first blogpost.

At least smokes are stable and false positives are not common.

GUI tests

We also have a bunch of them. They can detect UI issues and help avoid regressions.

In our case, they need to start the server, wait for it to be ready, and so on. And well, they are the root of all evil.

I mean, unless you write them really, really well, they end up being fragile (the worst thing that can happen to an automated test suite).

We rewrote many GUI tests over the years. This is how the whole thing evolved:

  • First, we started with the "record and rerun" technique (offered by many vendors) which proved to be a nightmare (auto-generated code ends up being a mess).
  • Later, we moved to writing the tests as code (this happened about a decade ago). It may look more difficult at first (not really, although you do need programmers for that, and we are programmers anyway, so not a big deal for us), but it ends up paying off: the tests are more maintainable, and you can easily refactor and adapt them when the UI changes (and our UI has changed over the years).

But, even with coded GUI tests, we were hit by long tests, and by the unavoidable issue of "control not found" that comes with all UI Automation packages I know of (even if you use the native UI automation provided by Windows).

We also hit another big issue: we needed GUI tests on Linux and OS X, and the tools we knew of didn't support that. Yes, nowadays C#/.NET is well-known for its desire to go beyond Windows, but for years, even with Mono, cross-platform GUI development wasn't smooth.

That's why we ended up creating our own library for cross-platform .NET GUI testing. It deserves a series of blogposts itself, so all I will share today is some basics and a few screencasts so you can see it running:

  • The test code is loaded by the running app. There is no external driver process (or testing process, if you prefer). This could obviously have an impact because you are modifying the application under test, but we considered it a minor issue compared to the benefits.
  • A new thread launches the tests and the test code automates actions on the GUI, so you see how it goes on "autopilot" and starts doing things.
  • We do not rely on "UI Automation" for testing, due to all our previous experiences. Instead, every single dialog, window, etc. exposes a specific ITesteableXXX interface with methods like ClickCheckin(), RefreshView(), etc. They are supposed to be low level: we don't write something like "SwitchToADifferentBranch()" but rather a sequence of actions, like a UI Automation test would. Of course, we need to be careful with each test we write to avoid creating nonsensical test code. The RefreshView() method does the same as the Refresh button, firing the button's OnClick event or equivalent. By the way, it is NOT the same as real UI automation, because sometimes you are not really hitting the UI code. The only risk, though, is that the actual GUI code is not wired to the event code we invoke, which hopefully has never happened to us.
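The three points above can be sketched roughly like this. It is a toy Python model, not the actual library (which is .NET): CheckinDialogTestable plays the role of an ITesteableXXX interface, and the dialog, its status strings and run_in_app_test are all hypothetical names invented for the example.

```python
import threading
from abc import ABC, abstractmethod


class CheckinDialogTestable(ABC):
    """Hypothetical counterpart of an ITesteableXXX interface: low-level actions only."""

    @abstractmethod
    def enter_comment(self, text): ...

    @abstractmethod
    def click_checkin(self): ...

    @abstractmethod
    def get_status(self): ...


class CheckinDialog(CheckinDialogTestable):
    """Toy dialog; _on_checkin_clicked is what a real button click would fire."""

    def __init__(self):
        self._comment = ""
        self._status = "idle"

    def _on_checkin_clicked(self):
        if self._comment:
            self._status = "checked in: " + self._comment
        else:
            self._status = "error: empty comment"

    # Testable interface: each method routes to the same handler the GUI uses.
    def enter_comment(self, text):
        self._comment = text

    def click_checkin(self):
        self._on_checkin_clicked()

    def get_status(self):
        return self._status


def run_in_app_test(dialog, test):
    """Run test code on its own thread inside the app process; return failures."""
    failures = []

    def body():
        try:
            test(dialog)
        except AssertionError as error:
            failures.append(error)

    worker = threading.Thread(target=body)
    worker.start()
    worker.join()
    return failures


def checkin_test(dialog):
    dialog.click_checkin()
    assert dialog.get_status() == "error: empty comment"
    dialog.enter_comment("fix typo")
    dialog.click_checkin()
    assert dialog.get_status() == "checked in: fix typo"
```

Note that the test only speaks in terms of low-level actions (type a comment, click the button, read the status), which keeps it close to what a user would do even though no UI automation layer is involved. A real GUI toolkit would also require marshalling these calls onto the UI thread, which this toy model leaves out.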

Take a look at the 2 following screencasts to see Plastic being tested by this system:

  • Running on Linux:
  • Running on Mac:

Code review each task

Eons ago we tried to do formal reviews. You know, the entire thing with moderator, a short meeting, a scribe, and the author explaining. We only did it like twice. Too heavy. Once more, it became something more like an event, and it ended up being always postponed.

So, we changed our minds and started the "no task is merged if somebody else doesn't review it" rule about 6 years ago.

We suffered an initial slowdown for a couple of weeks, but then we all got used to it.

Code reviews are great to train newcomers on "how things get done here", to keep some coherence in the design and to keep the code simple, or at least try to.

Reviews became mandatory for difficult changes too, although we were somehow already doing that before.

Note: What we commonly accept as code review today is more a walkthrough. You can find out more about the topic in my all-time favorite, "Code Complete".

Explore test each task

We call it "validation", although it is too big a name, one that can invoke the demons of formal validation or something, so I'd better not step there.

This is how it works: someone gets the feature or bug fix and makes sure it does what it should.

It is important to highlight that "validations" are not manual tests. We are supposed to automate those.

The validation is just trying to figure out if what we did makes sense for the user, if it is usable, if it really fixes the bug (even if a unit test or GUI test was added).

It is some sort of short Exploratory Test that helps ensure we produce a usable and consistent product. A validation can be as short as 5 minutes (switch to the branch, build the code, run it, set up an example, etc.) or as long as 1-2 hours if the scenarios to test are very complex, but the usual is just 10-20 minutes each at most.

By the way, we used to run lots of longer (1-4h) exploratory testing sessions following what "Explore It!" describes. We generated the reports for each session, captured bugs, followed up with new tasks, etc. We still do it when we have some new big features, but the actual daily practice is the short validations.

We even hired an external team back around 2012 to help us do weekly Exploratory Tests, but it didn't work as expected. It was a big shock for us. I mean, they were not finding the kind of issues we expected, and we struggled to find out why. We thought we were doing something wrong. The problem was that the testing professionals were not used to most of the development tasks involved in the daily use of Plastic SCM. At the end of the day, in my opinion, you need testers who resemble the kind of users your software will have. And Plastic SCM users are… well, developers. They know how to use Eclipse, Visual Studio and the command line, and they branch, merge, diff and write actual code: the sort of things the external testers we hired were not used to. That's why we failed.

Nowadays, daily validations and code reviews are part of our day to day in development. Of course they are time consuming, but we consider them worth every minute. We try to organize our days to avoid interruptions, so many of us do the pending reviews and validations early in the morning, and then the rest of the day can be spent on development.

Conclusion

Automated testing is a big effort in the development of Plastic SCM and SemanticMerge. As such, we invest time and effort trying to improve it, to make it faster and more reliable. Over the years we made some mistakes and also enjoyed some successes, so I thought it would be valuable for many teams to learn from both.

Then there are code reviews and validations, the manual steps in the pipeline besides development. We spend about 15-20% of our time as developers on them, but they have greatly helped raise the quality bar of our work. Hope you enjoyed!
