Monday, 31 July 2017

Deployment Through the Ages

Vanessa Love just posted this intriguing little snippet on Twitter:

And I got halfway through sticking some notes into the Google doc, and then thought actually this might make a fun blog post. So here’s how deployment has evolved over the 14 years since I first took over the hallowed mantle of [email protected].

2003: Beyond Compare (maybe?)

The whole site was classic ASP – no compilation, no build process, all connection credentials and other settings were managed as application variables in the global.asa file. On a good day, I’d get code running on my workstation, test it in our main target browsers, and deploy it using a visual folder comparison tool. It might have been Beyond Compare; it might have been something else. I honestly can’t remember and the whole thing is lost in the mists of time. But that was basically the process – you’d have the production codebase on one half of your screen and your localhost codebase on the other half, and you’d cherry-pick the bits that needed to be copied across.

Of course, when something went wrong in production, I’d end up reversing the process – edit code directly on live (via UNC share), normally with the phone wedged against my shoulder and a user on the other end; fix the bug, verify the user was happy, and then do a file sync in the other direction to get everything from production back onto localhost. Talk about a tight feedback loop – sometimes I’d do half-a-dozen “deployments” in one phone call. It was a simpler time, dear reader. Rollback plan was to hammer Ctrl-Z until it’s working again; disaster recovery was tape backups of the complete source tree and database every night, and the occasional copy’n’paste backup of wwwroot before doing something ambitious.

Incidentally, I still use Beyond Compare almost daily – I have it configured as my merge tool for fixing Git merge conflicts. It’s fantastic.

2005: Subversion

Once we hired a second developer (hey Dan!) the Beyond Compare approach didn’t really work so well any more, so we set up a Subversion server. You’d get stuff running on localhost, test it, maybe share an http://www.spotlight.com.dylan-pc/ link (hooray for local wildcard DNS) so other people could see it, and when they were happy, you’d do an svn commit, log into the production web server (yep, the production web server – just the one!) and do an svn update. That would pull down the latest code, update everything in-place. There was still the occasional urgent production bugfix. One of my worst habits was that I’d fix something on production and then forget to svn commit the changes, so the next time someone did a deployment (hey Dan!) they’d inadvertently reintroduce whatever bug had just been fixed and we’d get upset people phoning up asking why it was broken AGAIN.

2006: FinalBuilder

This is where we start doing things with ASP.NET in a big way. I still dream about OnItemDataBound sometimes… and wake up screaming, covered in sweat. Fun times. The code has all long since been deleted but I fear the memories will haunt me to my grave.

Anyway. By this point we already had the Subversion server, so we had a look around for something that would check out and compile .NET code, and went with FinalBuilder. It had a GUI for authoring build pipelines and processes, some very neat features, and could deploy .NET applications to IIS servers. This was pretty sophisticated for 2006. 

2008: test server and msdeploy

After one too many botched FinalBuilder deployments, we decided that a dedicated test environment and a better deployment process might be a good idea. Microsoft had just released a preview of a new deployment tool called MSDeploy, and it was awesome. We set up a ‘staging environment’ – it was a spare Dell PowerEdge server that lived under my desk, and I’d know when somebody accidentally wrote an infinite loop because I’d hear the fans spin up. We’d commit changes to Subversion, FinalBuilder would build and deploy them onto the test server, we’d give everything a bit of a kicking in IE8 and Firefox (no Google Chrome until September 2008, remember!) and then – and this was magic back in 2008 – you’d use msdeploy.exe to replicate the entire test server into production! Compared to the tedious and error-prone checking of IIS settings, application pools and so on, this was brilliant. Plus we’d use msdeploy to replicate the live server onto new developers’ workstations, which was a really fast, easy way to get them a local snapshot of a working live system. For the bits that still ran interpreted code, anyway.

2011: TeamCity All The Things!

By now we had separate dev, staging and production environments, and msdeploy just wasn’t cutting it any more. We needed something that can actually build different deployments for each environments – connection strings, credentials, and so on. And there’s now support in Visual Studio for doing XML configuration transforms, so you create a different config file for every environment, check those into revision control, and get different builds for each environment. I can’t remember exactly why we abandoned FinalBuilder for TeamCity, but it was definitely a good idea – TeamCity has been the backbone of our build process ever since, and it’s a fantastically powerful piece of kit.

2012: Subversion to GitHub

At this point, we’d grown from me, on my own doing webmaster stuff, to a team of about six developers. Even Subversion is starting to creak a bit, especially when you’re trying to merge long-lived branches and getting dozens of merge conflicts, so we start moving stuff across to GitHub. It takes a while – I’m talking months – for the whole team to stop thinking of Git as ‘unnecessarily complicated Subversion’ and really grok the workflow, but we got there in the end.

Our deployment process at this point was to commit to the Git master branch, and wait for TeamCity to build the development version of the package. This would get built and deployed. Once it was tested, you’d use TeamCity to build and deploy the staging version – and if that went OK, you’d build and deploy production. Like very step on this journey, it was better than anything we’d had before, but had some obvious drawbacks. Like the fact we had several hundred separate TeamCity jobs and no consistent way of managing them all.

2013: Octopus Deploy and Klondike

When we started migrating from TeamCity 6 to TeamCity 7, it became rapidly apparent that our “build everything several times” process… well, it sucked. It was high-maintenance, used loads of storage space and unnecessary CPU cycles, and we needed a better system.

Enter Octopus Deploy, whose killer feature for us was the ability to compile a .NET web application or project into a deployment NuGet package (an “octopack”), and then apply configuration settings during deployment. We could build a single package, and then use Octopus to deploy and configure it to dev, staging and live. This was an absolute game-changer for us. We set up TeamCity to do continuous integration, so that every commit to a master branch would trigger a package build… and before long, our biggest problem was that we had so many packages in TeamCity that the built-in NuGet server started creaking.

This started life as an experimental build of themotleyfool/NuGet.Lucene – which we actually deployed onto a server we called “Klondike” (because klondike > gold rush > get nuggets fast!) – and it worked rather nicely. Incidentally, that NuGet.Lucene library is now the engine behind themotleyfool/Klondike, a full-spec NuGet hosting application – and I believe our internal hostname was actually the inspiration for their project name. That was a lot of fun for the 18 months or so that Klondike existed but we were still running the old NuGet.Lucene codebase on a server called ‘klondike’. It’s OK, we’ve now upgraded it and everything’s lovely.

It was also in 2013 that we started exploring the idea of automatic semantic versioning – I wrote a post in October 2013 explaining how we hacked together an early version of this. Here’s another post from January 2017 explaining how it’s evolved. We’re still working on it. Versioning is hard.

And now?

So right now, our build process works something like this.

  1. Grab the ticket you’re working on – we use Pivotal Tracker to manage our backlogs
  2. Create a new GitHub branch, with a name like 12345678_fix_the_microfleems – where 12345678 is the ticket ID number
  3. Fix the microfleems.
  4. Push your changes to your branch, and open a pull request. TeamCity will have picked up the pull request, checked out the merge head and built a deployable pre-release package (on some projects, versioning for this is completely automated)
  5. Use Octopus Deploy to deploy the prerelease package onto the dev environment. This is where you get to tweak and troubleshoot your deployment steps.
  6. Once you’re happy, mark the ticket as ‘finished’. This means it’s ready for code review. One of the other developers will pick it up, read the code, make sure it runs locally and deploys to the dev environment, and then mark it as ‘delivered’.
  7. Once it’s delivered, one of our testers will pick it up, test it on the dev environment, run it past any business stakeholders or users, and make sure we’ve done the right thing and done it right.
  8. Finally, the ticket is accepted. The pull request is merged, the branch is deleted. TeamCity builds a release package. We use Octopus to deploy that to staging, check everything looks good, and then promote it to production.

And what’s on our wishlist?

  • Better production-grade smoke testing. Zero-footprint tests we can run that will validate common user journeys and scenarios as part of every deployment – and which potentially also run as part of routine monitoring, and can even be used as the basis for load testing.
  • Automated release notes. Close the loop, link the Octopus deployments back to the Pivotal tickets, so that when we do a production deployment, we can create release notes based on the ticket titles, we can email the person who requested the ticket saying that it’s now gone live, that kind of thing.
  • Deployments on the dashboards. We want to see every deployment as an event on the same dashboards that monitor network, CPU, memory, user sessions – so if you deploy a change that radically affects system resources, it’s immediately obvious there might be a correlation.
  • Full-on continuous deployment. Merge the PR and let the machines do the rest.

So there you go – fourteen years worth of continuous deployments. Of course, alongside all this, we’ve moved from unpacking Dell PowerEdge servers and installing Windows 2003 on them to running Chef scripts that spin up virtual machines in AWS and shut them down again when nobody’s using them – but hey, that’s another story.

No comments: