Versioning your development databases

Monday, October 22, 2007 Pablo Santos best practices 4 Comments

One of the questions that usually arises after a new customer starts using Plastic is ok, but what should I do with my data? Most development groups developing business applications use some sort of database backend (SQL Server, MySQL, Firebird, Oracle, whatever) and normally some sort of shared sandbox like the following:

So, when they decide they have enough problems breaking each others code, being locked by someone else’s changes, or continuously “trying to shoot a moving target”, they decide to implement some sort of version control system (or better version control system). Then they move to “stage two”, which can be depicted like:

Ok, but, the same troubles they were having with code are still present dealing with the centralized database: I make a change on the schema and then your “under development copy” stop working because it can’t handle my modification: you’re stopped until you change it (breaking mine again), or fix your code (losing your precious time)... Weren’t you supposed to work in isolation?

The answer is wonderfully explained at the book refactoring databases by Scott W. Ambler: get a private sandbox which includes the database too. If you manage to do so you’ll get rid of the database versioning hell.

Ok, but, if you’re used to directly make changes in the development database using the specific server GUI, how would you keep track of changes?

My recommendation here (and Ambler’s one, if you need to trust a real expert) is: keep your schema in DDL format! Yes, maybe it sounds too arcane for you, used to only delve into the rich SQL server user interfaces, but it will really help. Suppose your database is defined into a single (ok, or even more than one, it doesn’t really matter) file, then anyone making changes will register them in Plastic SCM, creating new versions of the files. Then merging those changes (add a new field, create a key, drop a table, whatever) will be just a matter of merging regular text files, and Plastic excels here!

Ok, but... how do I manage to set up the developer’s workspaces then? Well, my recommendation would be:

Is your system capable of creating a full database from a DDL script? If so, here’s your answer: the developer will compile the system on his workstation, run it and a wonderful new blank database will be created.

If your system is not supposed to be autonomous then you can execute the script yourself.

Well - I hear you saying –but then I’d only test with a minimal set of data... this will lead to performance problems. Yeah! You’re right! So, don’t get rid of your “old” central development server. Use it to run integration tests on top of real sets of data, highlighting possible performance bottlenecks. The new environment will be a combination of isolation and shared resources: the right balance between them will help you working better, a wrong one will, obviously, lead to problems.

Still– you complain – weren’t we supposed to move towards a more agile and decentralized development method? Now we need some way to update this central server, before it was done by each of the developers. Yes, but “before” your team were loosing time trying to figure out what was happening each time (or at least often) a change was made on the database. Now you just have to update this server when a new release (an internal one, not necessarily a customer milestone at all) is created. And you can use the same system your software uses in production to upgrade databases:

Do you upgrade manually? Ok, it doesn’t seem like a good idea, but you’ll have to do the same here. Now you get bored of running the same scripts again and again and create some sort of automation...

Then you get to step two: you’ve some sort of mechanism in place to upgrade from a version “A” to a version “B” without human intervention. Your software can have this feature built in (my favorite), or you can have your own “upgrade app”.

Ok, I’ll do that but, how will my software know the source and target versions? – you ask.

This question is greatly covered in the book I mentioned, and to be honest I was surprised to see there one of the things I’ve been doing for years, in different projects: you create a table inside your schema (you can name it “databaseinfo” for instance) which will hold versioning information. Then your system will be able to determine which the source database version is.

What about the destination one? Well, here my choice is: hardcode it in your software. Whether you do it embedding some sort of resource or with a simple constant, it doesn’t really matter, but it is very useful having your binaries know which is the database version they need to work.

You might be thinking now “yes, but creating a database upgrade system is not an easy task”. Of course it isn’t. If you need a really flexible one, take a look at www.red-gate.com. They have a powerful set of tools to cover all your needs.

But sometimes you don’t really need a wide-range solution, you just need to add some fields, perform some calculations or create some indexes from one version to the next: you can create your own upgrade system, which will be maintained together with the software, and which will be able to perform these very specific operations when it detects the source version is not the same as the destination one. Your upgrade system will have to check which one is the “starting point” and then apply all the rules until it hits the “desired destination version”.

So, you have a beautiful solution to move towards fully managed (in terms of version controlled) database development.

Ok, maybe some of you are actually disappointed because you expected some better integration, something you could just install and forget about all your data issues... I’m afraid there are no silver bullets and the solution will require some work on your side... The good thing is that in exchange you’ll get a better system, easier to modify and evolve...

Pablo Santos

I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improving their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.

4 comments:

AnonymousOctober 23, 2007 at 9:16 AM
Hi,

You may also probably want to check out how SwisSQL DBChangeManager helps you with Collaborative Database development/versioning :

http://www.swissql.com/products/database-compare-synchronize-tool/collaborative-development-solutions.html

Sathish

PS: I work for SwisSQL DBChangeManager
AnonymousOctober 25, 2007 at 5:24 AM
Back to the Red-Gate tools comment (I do not work for Red-Gate). I've been using their tools to build a DB versioning system and my developers can now treat DB objects just like files--check in, check out (visual diff is in the works). Behind the scenes I actually do use DDL definitons to version the entire thing.
Great article, right on!
AnonymousOctober 28, 2007 at 3:27 PM
Refactoring tools for other DBs e.g. Oracle??
Pablo SantosOctober 29, 2007 at 10:59 PM
Doesn't redgate work for Oracle?

Branched Code

Thoughts on version control, software development, branching and merging from the Plastic dev team

Who we are

Versioning your development databases

Monday, October 22, 2007 Pablo Santos best practices 4 Comments

Pablo Santos

Pablo Santos

4 comments:

Popular Posts

Labels

Who we are

Versioning your development databases Monday, October 22, 2007 Pablo Santos best practices 4 Comments

Pablo Santos

Pablo Santos

4 comments:

Popular Posts

Labels

Versioning your development databases

Monday, October 22, 2007 Pablo Santos best practices 4 Comments