The story of Jet: Plastic's super-fast repo storage
I'm going to tell you the story behind Jet, our super-fast repo storage. Many of you have asked me whether it is a good replacement for the SQL-based storage we support (it is!) and how we came up with a custom format. Well, here is the full story.
It was a Wednesday in late August and I was on vacation. I exchanged some Slack messages with the team back at the office during the morning. But things got worse after lunch, so we jumped onto a call.
"There is definitely something broken. It seems it only happens after a week of very intense work, but performance in the database data layer definitely degrades at some point" – was part of the conversation on the phone.
"We are going to upgrade to the newest connector, because it seems like it is leaking something. We really need to move past this and have our own thing".
Having "our own thing" was something we talked about for years. And it was finally the time.
How it all started: Hibernate
I remember doing a "big" checkin (big for the time) during Christmas 2005. It was only 4 months after we founded the company, and I left my laptop running during dinner on Dec 24th. I sneaked out of dinner to check how it was going. The "huge" quake code was being checked in, and it was already 3 hours in.
It actually took 11 hours to complete.
11 hours!
I mean, today we don't even test an add/checkin cycle with quake because it finishes too fast to give stable timings. It is about two thousand files or so. It takes no time at all!
But, back then, it was taking hours.
Of course, 11 hours was not the Christmas present I wanted. Still, I felt OK watching the baby Plastic server dumping logs on the console at light speed; at least it was able to keep working for hours.
Well, the mistake was using NHibernate. Not that it is a bad tool; it simply didn't suit our needs.
We rewrote the "datalayer" after that and called it datalayersql, a name that still persists 13 years later. Check your server binaries and you'll find datalayersql.dll there.
We used plain good-old SQL. It looked so low level at the time that it felt like a mistake. How dared we delve into SQL in the days of ORMs? Well, we did, and it paid off.
To SQL or not to SQL
We thought that building on top of a stable relational database was a good idea. We could easily demonstrate it was rock solid, ACID and all that. We didn't want to reinvent the wheel. We were too busy writing merge algorithms, security code, GUIs, installers, and all the rest.
And it paid off. I mean, we saved a lot of work and we used it to convince our first customers. Plastic might have been new, but it was standing on the shoulders of giants.
Of course, the initial "object oriented ORM approach" made us pay a price. It took quite a while to move away from the "object oriented style" toward a simpler and faster, function-oriented one.
Going embedded
The Plastic server had to be easy to install. So, relying on a pre-installed SQL Server or MySQL was not a good idea for evaluation setups. We needed an embedded database.
Having a Delphi (Borland) background, Firebird seemed to be a good choice. And it was the default datalayer for Plastic for a few years. We still get visits on a blogpost we wrote on how to speed up writes.
It was not that easy on Linux, because the default Firebird embedded didn't work with Mono at the time, but we worked around it with a full Firebird server as a package dependency.
As you can see, we strongly relied on the idea of having a super stable storage and not trying to reinvent the wheel.
Adding support for more and more databases
You must not write abstractions before you have at least three cases. We learned that while developing support for Firebird, MySQL and SQL Server. But once those first three backends were in place, adding more was super simple.
Early Plastic (I think it was version 2) supported a range of relational databases to store all data and metadata: Firebird embedded, Firebird server, SQL Server, SQL Server Compact Edition (embedded), Postgres, MySQL, SQLite and even Oracle. We even tried DB2 at some point.
Once we discovered SQLite, we soon made it the default. It was super-fast but not great for concurrent users. It was perfect for evaluations and for hosting your own local repositories, no matter the size. I used to have 20GB of SQLite databases on my laptop storing a bunch of Plastic repos. Later, SQLite + WAL made it good enough even for teams of up to 10-15 members.
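For anyone curious, switching SQLite to WAL is a one-line pragma. Here is a minimal sketch, assuming the Microsoft.Data.Sqlite package and a made-up database file name; it is not Plastic's actual data layer code:

```csharp
// Minimal sketch: turn on SQLite's write-ahead log so readers stop
// blocking writers. The package (Microsoft.Data.Sqlite) and the file
// name are assumptions for the example, not Plastic's real code.
using System;
using Microsoft.Data.Sqlite;

class EnableWal
{
    static void Main()
    {
        using var connection = new SqliteConnection("Data Source=repo.db");
        connection.Open();

        using var pragma = connection.CreateCommand();
        pragma.CommandText = "PRAGMA journal_mode=WAL;";

        // Prints "wal" on success; the setting is persistent, so it only
        // needs to be applied once per database file.
        Console.WriteLine(pragma.ExecuteScalar());
    }
}
```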
Custom optimization
By the way, to make a standard relational database fast enough for a version control system capable of matching Git's speed (and Perforce's before that), we needed some tweaks.
Naïve inserts were not enough, as we soon learned. Our optimization toolbelt filled up with a variety of tricks: bulk copies, temporary tables for lookups, disabling indexes during huge checkins, dropping unneeded indexes for faster inserts, creating the right ones to speed up queries, avoiding loops of selects to reduce database roundtrips, and finally introducing caches to avoid hitting the db at all when possible.
Not all these optimizations are general to all databases, so we had custom code here and there to make it faster with the principal databases we recommend: SQLite, MySQL and SQL Server.
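To give a flavor of the bulk-copy trick, here is a sketch using SqlBulkCopy against SQL Server. The table and column setup is invented for illustration; it is not our real schema:

```csharp
// Sketch of "one streamed bulk copy instead of a million INSERT round
// trips". The table name is hypothetical; Plastic's real schema differs.
using System.Data;
using Microsoft.Data.SqlClient;

static class BulkCopyExample
{
    public static void BulkInsertRevisions(string connectionString, DataTable revisions)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulk = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.Revisions", // hypothetical table
            BatchSize = 10_000                      // push rows in large batches
        };

        // A single streamed operation; the server ingests the rows without
        // per-row statement parsing or network round trips.
        bulk.WriteToServer(revisions);
    }
}
```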
And then, the Cloud
Around 2012 we started working on a Cloud service. Hosting repos in the cloud seemed to be a good idea after all ;-)
We didn't simply host our Plastic server on Azure or Amazon. We went beyond that and developed a true multi-tenant, cloud-native service with a configurable number of roles attending requests and a load balancer distributing clients. Goodbye to simple lists for caches; hello Redis, Memcached and friends.
And, hello database latency.
We always recommended installing the SQL Server or MySQL on the same machine where the Plastic server ran, so it could benefit from in-memory data transfer instead of using the network. And queries were generally fine in zero-latency scenarios. But many suffered in the new environment, with database clusters shared among the roles.
By that time, we had already denormalized tree storage. I mean, in a version control system you have to store directory trees, with all the file entries and their corresponding directories. This type of structure is not very relational-database-friendly. You can optimize reads to load all directory children in a single query, but recursive tree reading is not standard across databases. So, long before going to Cloud, we were already storing trees as blobs inside the db (one db read per directory in the tree), as we did with file data itself. With Cloud, we moved blobs outside the database and into actual cloud blob storage (Azure in our case, so Azure blobs).
To handle latency, we optimized the data structures so we could read just the parts of a blob containing the exact pieces of the tree we needed to load.
Blobs were now highly optimized structures designed for fast reads: pointer-based deserialization to keep CPU usage to a minimum, loop unwinding and all that.
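To make the idea concrete, here is an illustration-only sketch of a tree blob with an offset table up front, so a reader can jump straight to one child entry without deserializing its siblings. The layout, names and types are assumptions for the example; the real Cloud/Jet format is not shown here:

```csharp
// Illustration only: a directory "tree blob" laid out as
// [child count][offset table][entries...] so a single child can be read
// by seeking, without parsing the whole blob. Names, types and layout
// are assumptions for the example, not the real Jet format.
using System.Collections.Generic;
using System.IO;
using System.Text;

record ChildEntry(string Name, long RevisionId);

static class TreeBlob
{
    public static byte[] Write(IReadOnlyList<ChildEntry> children)
    {
        using var payload = new MemoryStream();
        using var payloadWriter = new BinaryWriter(payload, Encoding.UTF8);
        var offsets = new long[children.Count];

        for (int i = 0; i < children.Count; i++)
        {
            offsets[i] = payload.Position;          // entry offset, relative to payload start
            payloadWriter.Write(children[i].Name);  // length-prefixed UTF-8 string
            payloadWriter.Write(children[i].RevisionId);
        }
        payloadWriter.Flush();

        using var blob = new MemoryStream();
        using var writer = new BinaryWriter(blob, Encoding.UTF8);
        writer.Write(children.Count);               // header: child count
        foreach (long offset in offsets)
            writer.Write(offset);                   // header: offset table
        payload.WriteTo(blob);                      // then the entries themselves
        writer.Flush();
        return blob.ToArray();
    }

    // Read one child without touching its siblings.
    public static ChildEntry ReadAt(byte[] blob, int index)
    {
        using var reader = new BinaryReader(new MemoryStream(blob), Encoding.UTF8);
        int count = reader.ReadInt32();

        reader.BaseStream.Position = sizeof(int) + (long)index * sizeof(long);
        long entryOffset = reader.ReadInt64();

        long headerSize = sizeof(int) + (long)count * sizeof(long);
        reader.BaseStream.Position = headerSize + entryOffset;
        return new ChildEntry(reader.ReadString(), reader.ReadInt64());
    }
}
```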
How long does it take to write 1 million objects to disk?
We did our best with databases, but inserting 1 million objects still required a bunch of calls to the db. Yes, we were able to bulk copy, so it was almost a single call. But there were still many layers involved.
A checkin of 1 million files took a while.
I don't mean it was slow. The days of 11-hour quake checkins were long gone. We were able to consistently beat Git in most operations, and to scale quite well too.
But the question was: how fast can you write 1 million objects to a simple binary file? Someone wrote a small console program, created 1 million "revisions" with random metadata, opened a file, and wrote them to it. Milliseconds. It took milliseconds.
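It was something along these lines; field names and sizes are invented for the example, not the original program:

```csharp
// Sketch of the kind of throwaway benchmark described above: write one
// million fake "revision" records straight to a binary file and time it.
// Field names and sizes are invented; this is not the original program.
using System;
using System.Diagnostics;
using System.IO;

class WriteBenchmark
{
    static void Main()
    {
        var random = new Random(42);
        var stopwatch = Stopwatch.StartNew();

        using var file = new FileStream("revisions.bin", FileMode.Create);
        using var writer = new BinaryWriter(file);

        for (int i = 0; i < 1_000_000; i++)
        {
            writer.Write(i);                       // revision id
            writer.Write(random.Next());           // some random metadata
            writer.Write((long)random.Next());     // e.g. a size field
            writer.Write(DateTime.UtcNow.Ticks);   // a timestamp
        }

        stopwatch.Stop();
        Console.WriteLine($"Wrote 1M records in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```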
We knew that; I mean, it was obvious. And those milliseconds shrank even further once we optimized serialization with a few more pointers here and there. 1 million objects took nothing to write.
"Where have binary files been our entire Plastic life?" – we shouted.
Reinventing the wheel
We had optimized tree storage code for the Cloud. Why not use it for on-premise?
We felt bad. University professors taught us to use relational databases. I mean, we felt like we were committing heresy. How could we do better than the super clever people behind the 7 different databases we supported?
Yeah, Git, Mercurial and all the others used their own custom formats. But we were already faster while using a safe, trustworthy, rock-solid, industrial-strength alternative. And we could tell our customers: "hey, no worries, your repos are safe on the same database servers your IT folks know how to run".
Now we wanted to go back, bypass the hard-learned principles, and design an ad-hoc storage for the sake of speed.
Customer validation – the fun of the custom
"We run DB2 here. If we go ahead with Plastic, you need to support it" – a potential customer said in a meeting.
"But we do support 7 other databases" – I replied.
"But we are a DB2 shop here".
"Well, but you said we are in a short list against Git, right?" – I asked.
"Yes, you are."
"But, Git doesn't support DB2" – I said.
"Yes, correct. They use their own format, so IT doesn't have to administer the db".
"So, if we use a database, you just trust the one that is your standard in-house. But, if a product uses its own ad-hoc thing, then you don't care, because it simply doesn't use a non-approved database".
"Correct" – they said.
So, basically, having 7 different databases was not always playing in our favor. Sometimes it turned against us and hit us hard.
The relational database way forced us to keep adding more and more backends to match what customers had. But if we had our own thing, IT could simply back it up as plain files, so they didn't care.
Back to the summer day of the failing database
I started this story telling you how one server dealing with about 4TB of repos was having a bad time. Actually, performance was good overall, but there were peaks of slow response, and somehow it deteriorated over time. The problem turned out to be in the database connector code we were using. But the database itself was struggling too. It is not that 4TB is outside database limits nowadays, but it slowed Plastic down from time to time.
We made a decision that day – it was not going to happen again. It could be a connector, or it could be our code, but not being in control of the full stack wasn't acceptable anymore. I know, I know, it sounds dangerous and control-freakish. But we didn't want to explain to a customer that the server was fine but their precious database was underperforming for whatever reason.
That day "filesystem datalayer" was resurrected.
Fast as a jet fighter
For years we had toyed with the idea of a "filesystem datalayer". Somewhere in our doc repo I have a scanned notebook page with a few drawings of how it should work. It dates back to 2008.
We felt as if we were committing heresy, as I said above. So it took us some time to shake off the bad feeling and just focus on a different solution.
We started by porting the Cloud tree code to on-premises; then the rest of the former tables were turned into ad-hoc data structures.
Memory-mapped files were extensively used to speed up reads.
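As a rough illustration (the file name, offsets and record layout are made up, not Jet's actual format), the read pattern looks like this:

```csharp
// Rough illustration of the memory-mapped read pattern: map the file and
// read fixed-size records by offset; the OS pages data in on demand and
// repeated reads are served from memory. File name and layout are
// assumptions for the example, not Jet's actual format.
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedReader
{
    public static long ReadRevisionIdAt(string path, long offset)
    {
        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, mapName: null, capacity: 0,
            MemoryMappedFileAccess.Read);
        using var accessor = mmf.CreateViewAccessor(
            0, 0, MemoryMappedFileAccess.Read);

        return accessor.ReadInt64(offset);
    }
}
```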
Jet was born. We wanted a cool name for it, and "datalayerfs" didn't make the cut. Jet. Yep, Blackbird was the name we wanted, but Jet was shorter and clearer in meaning.
Finally released
It was an early Plastic 6.0 version that finally introduced Jet. We had used it internally for months, and we loved it. Of course, we ended up coding a super simple transaction system to ensure bulk writes were always consistent, some recovery code we hadn't anticipated initially, and the ability for the Plastic server to enter "readonly mode" so that backups could happen while the server was running (memory-mapped access forces you to lock some files).
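Jet's actual transaction and recovery code isn't shown here, but to illustrate the general "bulk writes are all-or-nothing" idea, one classic pattern is to write to a temporary file, flush it, and atomically swap it in:

```csharp
// Illustration of a classic all-or-nothing write pattern (not Jet's real
// transaction code): write to a temp file, flush to disk, then swap it in
// so readers never see a half-written file.
using System.IO;

static class AtomicWrite
{
    public static void WriteAllBytesAtomically(string path, byte[] data)
    {
        string temp = path + ".tmp";

        using (var stream = new FileStream(temp, FileMode.Create, FileAccess.Write))
        {
            stream.Write(data, 0, data.Length);
            stream.Flush(flushToDisk: true); // force the bytes to stable storage
        }

        if (File.Exists(path))
            File.Replace(temp, path, path + ".bak"); // atomic swap, keeps a backup
        else
            File.Move(temp, path);
    }
}
```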
One of the nice findings was that a humble Banana Pi (a Raspberry Pi-like board with SATA storage) was faster doing checkins than our main server. A tiny board with not-that-fast storage ran circles around its SQLite counterpart! We were excited. Not long after, we decided to replace SQLite with Jet as the new default.
And only a few months later (it was early 2017 already), we migrated some of our major customers to Jet. Huge TB-sized repos moved to Jet in almost no time (well, a few hours at worst). We implemented some progressive migration tools so we could be ready on migration day with only the changesets from the previous day left to do. It went seamlessly. Game studios, corporations, small teams, all happy with Jet.
It doesn't mean we abandoned SQL-based servers. Plastic still runs on many of them. I'd say there are still more big servers on SQL than on Jet. But the biggest were moved, all new customers are on Jet, and many ask to migrate every month (which can be done seamlessly from the webadmin interface, unless your repos are really big, in which case we offer to help you plan the entire move).
Once upon a time...
And that is the story of Jet. This is how we decided to go ahead and implement a faster, ad-hoc storage for Plastic repos.
We had a big problem – ensuring consistent and predictable access to repos, no matter what the size. We also wanted to eventually reduce code in the data access layer.
On the way, we dealt with some of our fears: reinventing the wheel and proving whether something better than seasoned SQL databases was possible. Of course, we cheated along the way: relational databases are generic storage. They can do anything. Jet is all about Plastic repos, just that. Nothing generic, nothing special, nothing complex. Just simple structures written to disk in a way that lets them be read as fast as possible.
And we reduced repo size by an average of 30% along the way :)