Tuesday, 14 February 2017

Naming Things is Hard: Spotlight Edition

Like most specialist industries, software is rife with mainstream English words that we’ve taken and misappropriated to mean something completely different. Show business is no different. The software team here at Spotlight sits smack-bang in the intersection between these two specialist fields, and so when we’re talking to our customers and product owners about the systems we build, it’s very important to understand the difference between typecasting and type casting, and exactly what sort of actor model we’re talking about. We therefore present this delightful “double glossary” of everyday terms that you’ll hear here at Spotlight Towers. Because as we all know, there’s only two hard problems in software: cache invalidation, naming things, and off-by-one errors.

Actor

Software: A mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.

Showbiz: A person whose profession is acting on the stage, in films, or on television.

Agent

Software: A software agent is a computer program that acts for a user or other program in a relationship of agency

Showbiz: A person who finds jobs for actors, authors, film directors, musicians, models, professional athletes, writers, screenwriters, broadcast journalists, and other people in various entertainment or broadcast businesses.

Callback

Software: Any executable code that is passed as an argument to other code, which is expected to call back (execute) the argument at a given time.

Showbiz: A follow-up interview or audition

Casting

Software: Explicitly converting a variable from one type to another

Showbiz: Employing actors to play parts in a film, play or other production. Also the act of doing same.

Client

Software: The opposite of a server

Showbiz: An actor, specifically in the context of the actor’s relationship with their agent or manager Internally at Spotlight we have both internal and external clients/customers

Client Profile

Software: A subset of the .NET framework intended to run on mobile and low-powered devices

Showbiz: An actor’s professional CV, as it appears on their agent's’ website or in various kinds of casting software and directories

Double

Software: Primitive data type representing a floating-point number

Showbiz: A performer who appears in place of another performer, i.e., as in a stunt.

Mirror

Software: A copy of a system that updates from the original in near to real time, often a database or file storage system

Showbiz: An optical device that helps a performer check they’ve applied their makeup correctly

Principal

Software: Used in database mirroring to refer to the primary instance of the database

Showbiz: A performer with lines.

Production

Software: The live infrastructure and code environment

Showbiz: A film, TV or stage show, such as a professional actor might list on their acting CV.

REST

Software: Representational State Transfer - an architectural style used when building hypermedia APIs

Showbiz: What actors do between jobs.

Script

Software: A computer program written in a scripting language

Showbiz: The written dialogue and directions for a play, film or show

“Sequel”

Software: Standard pronunciation of SQL, referring to either the database query language. Also commonly refers to Microsoft’s SQL Server database product.

Showbiz: A published, broadcast, or recorded work that continues the story or develops the theme of an earlier one.

Server

Software: The opposite of a client

Showbiz: Someone working as waiting staff in a restaurant. Who is quite possibly an actor moonlighting as a server to pay the bills between acting jobs.

Spotlight

Software: The native macOS search application

Showbiz: Our company – www.spotlight.com,  “The Home of Casting” – and the directories and services we have created since 1927.

Staging

Software: A replica of a production hosting environment used to test new features and deployments.

Showbiz: The method of presenting a play or dramatic performance; also used to refer to the stage structure itself in theatre and live performance.

Thursday, 26 January 2017

Semantic Versioning with Powershell, TeamCity and GitHub

Here at Spotlight Towers, we’ve been using TeamCity as our main build server since version 6; it’s a fantastic tool and we love it dearly. It got even better a few years back when we paired it with the marvellous Octopus Deploy; TeamCity builds the code and creates a set of deployable packages known as Octopacks; Octopus deploys the packages, and everything works quite nicely. Well, almost everything. One of the few problems that TeamCity + Octopus doesn’t magically solve for us is versioning. In this post, we’re going to look at how we use Git and TeamCity to manage versioning for our individual packages.

If this sounds like your sort of thing, why not come and work for us? That’s right – Spotlight is hiring! We're looking for developers, testers and a new UX/Web designer - check out jobs.spotlight.com and get in touch if you’re interested.

First, let’s establish some principles

  • We are going to respect the semantic versioning convention of MAJOR.MINOR.PATCH, as described at semver.org.
  • Major and minor versions will be incremented manually. We trust developers to know whether their latest commit should be a new major or minor release according to semantic versioning principles.
  • Building the same codebase from the same branch twice should produce the same semantic version number.
  • Packages created from the master branch are release packages.
  • Packages created from a merge head of an open pull request are pre-release packages.
  • Pre-release packages will use the version number that would be assigned if that branch was accepted for release at build time.

Now, here’s the part where we’re going to deviate from the semantic versioning specification, because our packages actually use a four-part version number. We want to include a build number in our package versions, but the official semver extension for doing this – MAJOR.MINOR.PATCH+BUILD - won’t work with NuGet, so we’re going to use a four-part version number MAJOR.MINOR.PATCH.BUILD. Pre-release packages will be appended with a suffix describing which branch they were built from – MAJOR.MINOR.PATCH.BUILD-BRANCH.

OK, here’s an illustrated example that demonstrates what we’re trying to achieve. Master branch is green. Two developers are working on feature branches – blue and red in this example. To create our pre-release builds, we’re using a little-documented but incredibly useful feature of GitHub known as ‘merge heads’. The idea is that if you have an open pull request, the merge head will give you a snapshot of the codebase that would be created by merging the open pull request into master – so you’re not just testing your new feature in isolation, you’re actually building and testing your new feature plus the current state of the master branch. There is one caveat to this, which I’ll explain below.

So, we’ve got TeamCity set up to build and publish packages every time there’s a commit to master or to the merge head of an open pull request, and we’re also occasionally triggering manual builds just to make sure everything’s hanging together properly. Here’s what happens: 

semantic merging 500px

That line there that’s highlighted in yellow is a gotcha. At this point in our workflow, we’ve merged PR1 into our master branch, but because we haven’t pushed anything to the blue branch since this happened, the blue merge head is out of date. PR2 does NOT reflect the latest changes to master, and if we trigger a build manually, we’ll end up with a package that doesn’t actually reflect the latest state of the codebase. The workaround is pretty simple; if you’re creating pre-release builds from merge heads, never run these builds manually; make sure you always trigger the build by pushing a change to the branch.

Now let’s look at how can we get TeamCity to automatically calculate those semantic version numbers whenever a build is triggered. We’ll start with the major and minor version. We’re going to track these by creating a version.txt file in the root of the project codebase, which just contains the major and minor version numbers. If a developer decides that their feature branch represents a new major or minor version, it’s their responsibility to edit version.txt as part of implementing the feature. This also means that prerelease packages built from that branch will reflect the new version number whilst master branches will continue to use the old version until the branch gets merged, which I think is rather elegant.

For the patch version, we’re going to assume that every commit or merge to the master branch represents a new patch version, according to the following algorithm

  • If the current version.txt represents a NEW major/minor version, the patch number is zero
  • Otherwise, the patch number is the patch number of the latest release, incremented by the number of commits to the master branch since that release.

So – how do we know how many commits there have been since the last release? First, each time we build a release branch, we’re going to use Git tags to tag the repository with the version number we’ve just built. TeamCity will do this for you automatically using a build feature called “VCS labeling”:

image

Assuming every release has a corresponding tag, now we need to find the most recent release number, which we can do from the Git command line.

git fetch –tags
git tag –sort=v:refname

Git tags aren’t retrieved by default, so we need to explicitly fetch them before listing them. Then we list all the tags, specifying sort=v:refname which causes tag names to be treated as semantic versions when sorting. (Remember that semver sorting isn’t alphanumeric – in alphanumerics, v9 is higher than v12). Once we’ve got the latest tag, we need to count the number of revisions since that tag was created, which we can do using this syntax:

git rev-list v1.2.3..HEAD –count

To use this in our TeamCity build, we'll need to output the various different formats of that version so that TeamCity can use them. We want to do three things here:

  • Label the VCS root with the three-part semantic version number v1.2.3
  • Update the AssemblyInfo.cs files with the four-part version number 1.2.3.456 – note that we can’t put any prerelease suffix in the AssemblyInfo version.
  • Pass the full version – 1.2.3.456-pr789 – to Octopack when creating our deployable packages with Octopus.

I've wrapped the whole thing up in a Powershell script which runs as part of the TeamCity build process, which is on GitHub:

To use it in your project, add versions.ps1 to the root of your project repo; create a text file called version.txt which contains your major.minor version, and then add a TeamCity build step at the beginning of your build process that looks like this:

image

Finally, it’s worth mentioning that to use command-line git from Powershell, I had to set up TeamCity to use an SSH VCS root rather than HTTP, and install the appropriate SSH keys on the TeamCity build agent. I don't know whether this is a genuine requirement or a quirk of our configuration; your mileage may vary. And I still find Powershell infuriatingly idiosyncratic, but hey - you probably knew that already. :)

Happy versioning! And like I said, if this sort of thing sounds like something you’d like to work on, awesome - we’re hiring! Check out jobs.spotlight.com for more details and get in touch if you’re interested.

Monday, 9 January 2017

Sharing a Dropbox folder using Bootcamp and exFAT

My main laptop for the last few years has been a 15” Retina MacBook Pro. As I’m primarily a .NET developer, I use Bootcamp to run Windows 10, Visual Studio and SQL Server, but I also boot into macOS almost daily for things like audio recording and video editing using Logic Pro and Adobe Premiere. I’m also a huge fan of Dropbox – I use Evernote for text documents (notes, talk ideas and lyrics), and Dropbox for just about everything else.

My MacBook Pro has a 500Gb SSD, which sounds like a lot, but once you’ve got two operating systems it’s surprising how quickly you start running out of space – and one of my biggest frustrations was having two separate Dropbox folders. I have about 40Gb of stuff in Dropbox, which means a 40Gb Dropbox folder on my NTFS Windows partition, and another 4Gb Dropbox folder on the ExtFS macOS partition. And they store exactly the same filesso much so that Dropbox actually became my default sync mechanism for getting files from Windows into macOS and vice versa.

Being a huge nerd, I thought it would be fun to kick off 2017 by completely repaving my laptop, and I wondered if there was some way to store my Dropbox folder, and other big things like my music library, on a shared partition that both operating systems could use. It took a bit of tweaking but I think I’ve managed to get it working – here’s how. The secret sauce that made it all work is a relatively new file system called exFAT. The killer feature of exFAT here is that both Windows 10 and macOS can read and write it natively, but it doesn’t have any of the file size or volume size restrictions of older interoperable formats like FAT32.

Caveat: this is all completely unsupported. If you try any of this and you lose all your files, I’m not going to help you – and neither are Apple, Microsoft or Dropbox. If you’re happy using batch scripts to toggle file attributes and manually partitioning your OS drive, then this lot might help – but messing around with disk partitions and unsupported filesystems is a good way to end up with an unbootable laptop and a lot of bad mojo. So before we go any further,  back up everything. After all, Dropbox is a file sync tool – if some bizarre combination of macOS and exFAT results in it removing a load of files, the last thing I want is for Dropbox to conveniently remove the same files from all my other devices. I know it’s easy enough to revert to an older version of anything that’s in Dropbox, but I figured having a copy of it all on an external HD probably couldn’t hurt.

Next, work out how you want to partition your disk. In my case, I wanted to go for a ~150Gb partition for each operating system, and a 200Gb shared partition that both of them could see. Your mileage may vary, and – as I did – you might find you can’t get exactly what you wanted, but I managed to get pretty close. Now do a completely clean reinstall of macOS. Give it the entire drive and let it do its thing. Once you’re done, use the Boot Camp Assistant to set up and install Windows. When Boot Camp Assistant asks how you’d like to partition your drive, make the Windows partition equivalent to the total size of your desired Windows partition plus your shared partition – so in this case, I temporarily had a 150Gb macOS partition and a 350Gb Windows partition.

Screen Shot 2017-01-07 at 13.06.39

Once you’ve partitioned the disk, install Windows as usual. You should end up with a regular Boot Camp system, that you can boot into macOS or Windows by holding down the alt key during system startup or by using the Boot Camp options in Windows. Now here’s the fun part. Once you’ve installed Windows, but before you install any other apps or files, you’re going to use the Windows disk utility to shrink that Windows partition down to 150Gb and then create a shared exFAT partition in the remaining space. Right-click the BOOTCAMP (C:) partition, click Shrink Volume...

windows_disk_iutility

There’s some limit of the NTFS filesystem that means “you cannot shrink a volume beyond the point where any unmovable files are located”. Now I don’t know how much variance there is between Windows installations, but I couldn’t shrink my C: drive any smaller than 163Gb, despite the fact that the fresh Windows install only took up about 20Gb. For what I needed, this was close enough – if not, I’d have been looking for something that would let me defrag and reorganize the NTFS partition. Or reading up on how an SSD gets fragmented in the first place – I thought they were just big boxes of non-volatile solid-state memory?

Anyway. Next step was to format the new partition as exFAT, called it Shared, and then reboot into macOS and make sure it was visible. I poked around a bit, created some text documents and other files, rebooted back into Windows… After an hour or so this all seemed to be working quite nicely. Next step was to see how Dropbox coped with it. Rather than throw it at my 40Gb Dropbox account, I set up a new Dropbox account, installed Dropbox on both macOS and Windows and signed into to both of them using the new account.

This is where things got a bit weird. Here’s my best hypothesis as to what’s happening.

  1. macOS is storing some additional metadata for every file. On a regular ExtFS filesystem, the metadata is stored somewhere in the filesystem itself. On an exFAT partition, it’s stored in a tiny hidden file (apparently known as an AppleDouble) next to the original. Alongside foo.txt there’s now a 4k file called ._foo.txt. I don’t think this is a Dropbox thing, it just appears to be caused by running macOS on a non-ExtFS drive.
  2. The Dropbox agent on macOS sees these extra files, thinks that they’re important, and uploads them to Dropbox.
  3. The dot-underscore ._ files are then synced across all your other Dropbox devices – I can see them in the web interface as well as on my Windows partition.

If I wasn’t using Dropbox, the ._ files would still be there, but macOS marks them as hidden and Windows appears to respect this – doing a dir /a:h in one of my exFAT folders will show them but by default they don’t appear in Windows Explorer. I figured this wasn’t a big deal – I could live with a bunch of hidden 4K files – so I removed the temporary Dropbox account and installed the real one. This may have been a bit overly optimistic. Dropbox happily synced 40Gb of files from Windows, but when I rebooted into macOS, it created the ._ files for all of them… and then when I rebooted into Windows again, it started trying to sync all the hidden ._ files. After twelve hours of this, I concluded that something wasn’t working quite as I’d hoped, so I modified the plan slightly.

I realized that I don’t actually need the Dropbox agent running on both OSes – just having the shared Dropbox folder is enough – so I uninstalled the Dropbox agent from macOS, and things suddenly got a lot more straightforward. The dot-underscore files are still there. On the exFAT shared partition they’re hidden by default, but Dropbox helpfully syncs all the ._ files into all my other Windows systems and so I’ve created  a batch script in my Dropbox folder that finds any of these and sets them to hidden so they don’t show up in Windows Explorer. It’s a one-liner:

for /r %1 in (._*) do attrib +h "%1"

The last thing I wanted to try before doing it for real was to see how VMWare Fusion coped with the shared partition. As well as dual-booting between macOS and Windows, I’ll also occasionally use VMWare to run the Windows partition as a VM inside macOS, which works pretty well. Here’s what I discovered:

  • As soon as you boot the Windows VM, both the BOOTCAMP partion and the Shared partition disappear from macOS.
  • The BOOTCAMP partition is mounted in VMWare as drive C: (obviously), and the Shared partition appears in the VM as drive D:
  • The Dropbox agent will immediately pick up and sync any files created from macOS
  • When I shut down the VM, the BOOTCAMP and Shared partitions both show up in macOS again and everything’s back to normal.

So, here’s the final setup I ended up with. Windows 10 is running the Dropbox agent, and syncing files with my Dropbox cloud storage as intended. The Dropbox folder is on the shared exFAT partition, so macOS can read and write my local Dropbox folder, but macOS doesn’t sync anything. If I add files to Dropbox when running macOS, when I reboot into Windows again, the sync agent will pick up these new files and sync them to the cloud. Finally, if I’m in macOS and find I need a file that’s been added to Dropbox from another device and hasn’t synced yet, I can either reboot into Windows and sync it, fire up the VMWare machine for a few moments, or just download it via the web interface.

I've been running it for a couple of weeks, including a week of wrangling PowerPoint slides and shared files at NDC London, and it's working really quite nicely. And it's really rather nice to have nearly 200Gb of free disk space after installing both operating systems, all my apps and utilities, and 40Gb worth of Dropbox files.

Tuesday, 3 January 2017

My Life with the Microsoft Natural Keyboard

A moment ago I was catching up on Twitter and I saw this:

Now Scott has written some great posts about keyboards – and mice, and workstation ergonomics in general – over the years, so I immediately clicked the link to see what was so exciting… and wow. This is the new Microsoft Surface Ergonomic Keyboard, and I, too, am now sitting here trying not to buy it. Actually, I’m waiting for a UK layout, and then I suspect that trying not to buy it will rapidly become unmanageable. I may even fail to not buy two of them so I have one at work and one at home.

Top view of keyboard

Now, there’s a couple of things about this keyboard that are interesting only because Microsoft have consistently got them right, and then messed them up, and then got them right again, and then messed them up again - over many, many years. Like an even-numbered Star Trek movie or an odd-numbered version of CorelDraw, this latest one is a good one – and as somebody who’s used every incarnation of the Microsoft Natural keyboard, this seems like a nice opportunity to take a little wander down memory lane.

OK, cue the Wayne’s World dream-sequence time-travel effects… and here’s a keyboard. A 102-key IBM PS/2 keyboard, which was the standard layout for PC keyboards for about a decade, back in the days when keyboards were beige and the cool kids knew all sorts of tricks that would let you load HIMEM.SYS and your mouse driver and still have enough main memory left to run Wing Commander 2.

IBM Model M.png(IBM PS/2 keyboard by Raymangold22 via Wikipedia | CC0 | Link)

Around the time of Windows 95, Microsoft proposed the subtle addition of a two new keys – a Windows key (actually two of them) in those handy gaps next to the Ctrl keys, and a key that I’ve just this second learned is called the menu key, which is basically a right-click key. Other than shortening the space bar slightly, these new keys didn’t really move things around much, which was great, because that hard-earned muscle memory that let you hit triple-key combinations without taking your eyes off the screen worked just fine.

Oh, and they also started making them in colours that weren’t beige, which was a big improvement if you actually had to share your living space with them.

image

It’s around this time that I left school and got my first IT job, which meant I was typing for a good chunk of every day, and within a year I started getting unpleasant pain in my wrists and forearms. During a chance conversation at work, someone mentioned that a former employee there had had the same problem and switched to using an ergonomic keyboard – which was still knocking around in a cupboard somewhere. I tried it out and was instantly smitten.

This was the original Microsoft Natural keyboard, released in 1994.

imagePhoto by DeanW77 via Wikipedia – own work, CC BY-SA 4.0, Link

I absolutely loved it. I literally wore it out – I used it until some of the keys no longer worked, and then went shopping for a replacement… and discovered you couldn’t get the original Natural keyboard anymore; it had been replaced by the Natural Keyboard Elite, released in 1998.

https://www.engadget.com/products/microsoft/natural-keyboard/elite/ 

Looks close enough, right? Except if you look closely, you’ll see that instead of the familiar inverted-T cursor shape and two rows of navigation keys, the cursor keys are in a sort of weird diamond formation and the navigation keys are in two columns.

This was horrible. All those years of muscle memory suddenly gone – every time you’d try to hit Ctrl-PgUp or Shift-End you’d get it wrong. And when you’re in the zone, that’s a horrible, jarring experience – every time it happens it interrupts your flow, wrenches you back to reality and makes you want to throw the damn thing across the room. Imagine driving a car where the brake pedal is above the accelerator instead of alongside it – it was a truly unpleasant experience.

Fortunately, it didn’t take long for them to realise this one was a bit of a mistake. A year later in 1999, they came out with the Natural Keyboard Pro, and it was fantastic.

image
By ----PCStuff via Wikipedia, CC BY-SA 2.5, Link

All the keys were in the right places, and it included a two-port USB hub which was great for plugging in your mouse. It added a bunch of “multimedia” buttons (which I never really used) and a dedicated button for launching the Windows Calculator (which actually proved to be surprisingly useful.) I loved this keyboard dearly. You can guess what happened next… yep, I used it until it wore out, went shopping for a new one, and… yeah. No more Natural Keyboard Pro. Instead, we had this delightful triumph of form over function:


photo via Engadget

The Microsoft Natural Multimedia Keyboard, released in 2004. OK, it’s got the classic inverted-T cursor keys (yay, but - look at that! Not only are the navigation keys all wrong, but the Insert key isn’t even there; we’ve got a massive double-height Delete key instead. If memory serves, Insert was relegated to some sort of funky combo involving the PrtSc key. Oh, and this was also the keyboard where you had to keep F-Lock switched on all the time because the function keys were actually some sort of weird keyboard shortcuts that nobody ever used ever. You know. Like a dedicated key for “Send”, because Ctrl-Enter is just too complicated.

Again, it didn’t take long for them to come out with something better… and boy, did they ever get it right with the next one. The Microsoft Natural Ergonomic Keyboard 4000.


photo via microsoft.com

This was my main keyboard for most of the last decade. I loved this keyboard so much I actually stockpiled it – one at work, one at home, and a few spares standing by in case they wore out. And wear out they did, one by one, until late last year I decided it was time to replenish the stockpile… which is when my long-suffering colleagues half-jokingly asked if, maybe, I’d be prepared to try something quieter. See, the 4000 is lovely, but it’s noisy. The keys have plenty of travel, with a nice satisfying thump at the bottom of each keystroke, and the keyboard’s casing – which is big – makes a rather effective sounding-box. Until the bottom falls out of the London real estate market and we can justify private offices for our developers, FogCreek style, the open plan office is an unfortunate fact of life for most of us… and so, in the spirit of workplace harmony, I ordered a Microsoft Sculpt keyboard.

image
photo via Microsoft

And you know what? It’s pretty close to perfect. It’s comfortable. It’s quiet. It takes up about 60% of the desk space that the old Ergonomic 4000 did. The numeric keypad is actually separate, which I’m completely undecided about… when you actually want to type on it, it’s annoying not having it fixed in place, but being able to pick it up and use it like an old-school calculator is actually surprisingly useful. Except – yep – the navigation key layout is all screwed up. Again. I hit Insert by mistake all the time on this thing, and frequently land on the left cursor when I’m reaching for Ctrl.

But with the release of the Surface keyboard at the top of the post, it looks like, yet again, there’s a truly great keyboard hot on the heels of the not-so-great one. I’m just really curious as to why they keep bringing out models that use non-standard keyboard layouts.

image
photo via Microsoft

When it hits the market here in the UK, I’ll pick one up and let you know how I get on. Now, if they’d only release one that was completely black – with no key markings – then we’d really be onto something. I wonder if you can spray-paint it...

Monday, 14 November 2016

IdentityServer, OpenID Connect and Microsoft CRM Portals

As readers of this blog will know, here at Spotlight we’re in the process of moving nine decades’ worth of legacy business process onto Microsoft Dynamics CRM, aka CRM Online, which I gather is now called Dynamics 365 (because hey, it’s not like naming things was hard enough already, right?)

We’re also investigating a couple of options for building customer-facing systems that integrate with Dynamics. Until last year, there were really three options for this – a product called Adxstudio, a free Microsoft component called the CRM Portal Accelerator, or rolling your own solution using the CRM SDK. Around this time last year, Microsoft quietly retired the Portal Accelerator component and acquired Adxstudio, and since then, they’ve been in the process of assimilating it into the Dynamics product family – which has meant it’s been something of a moving target, both in terms of the supported features and in terms of the quality of documentation and examples.

I’ve previously blogged about one way to integrate Adxstudio with your existing authentication system, but that approach relied completely on running Adxstudio on-premise so you could run your own code as part of the request lifecycle – and as you may have noticed, there’s a bit of a trend in IT at the moment away from running your own servers and towards using hosted managed services, so that patching and backups are somebody else’s problem. Since Microsoft acquired Adxstudio, there’s been a lot of churn around what’s supported and what’s not – I’m guessing that behind the scenes they’re going through the Adxstudio codebase feature-by-feature and making sure it lines up with their plans for the Dynamics 365 platform, but that’s just guesswork on my part.

One of the main integration points I’ve been waiting for is the ability for a Microsoft-hosted Portal solution to use a third-party OpenID Connect endpoint to authenticate users, and it appears in the latest update this is finally supported – albeit with a couple of bumps along the way. Here’s what I’ve had to do to get a proof-of-concept up and running.

Setting up Dynamics CRM Portals

First, you’ll need to set up a Dynamics Portal trial. You can get a 30-day hosted trial of Dynamics CRM Online by signing up here – this actually gives you a full Office 365 organization including things like hosted Active Directory, as well as the Dynamics CRM Online instance we’re using in this example. Next, you’ll need to ask nicely for a trial of the portal add-on – which you can do by filling out the form at crmmanagedtrials.dynamics.com.

Setting up IdentityServer and configuring an ngrok tunnel

Whilst you’re waiting for the nice Microsoft people to send you your trial license, get up and running with IdentityServer. For this prototype, I’m using the MVC Authentication example from the IdentityServer3.Samples project – clone it to your workstation, open the MVC Authentication solution, hit F5, verify you can get up and running on localhost.

Next – in order for Dynamics CRM Online to talk to your IdentityServer instance, you’ll need to make your IdentityServer endpoints visible to the internet. You could do this by deploying your IdentityServer sample to Azure or AWS, but for experiments like this, I like to use a tool called ngrok, which will create temporary, secure tunnels from the internet to your workstation. Download ngrok, unzip it somewhere sensible.

Pick a tunnel name. I’m using authdemo in this example but any valid DNS host name will do. Next, create a local IIS application pointing to the EmbeddedMvc folder in your samples directory, and set the host name to <your tunnel name>.ngrok.io

image

Now run ngrok.exe to create a tunnel from the internet to your new IIS application:

C:\tools\ngrok> ngrok.exe http –subdomain=authdemo 80

ngrok by @inconshreveable

Session Status        online
Version               2.1.18
Region                United States (us)
Web Interface         http://127.0.0.1:4040
Forwarding            http://authdemo.ngrok.io -> localhost:80
Forwarding            https://authdemo.ngrok.io -> localhost:80

Connections           ttl     opn     rt1     rt5     p50     p90
                      0       0       0.00    0.00    0.00    0.00

All being

If that’s worked, you should be able to fire up a browser, go to http://authdemo.ngrok.io/ – replacing ‘authdemo’ with your own tunnel name - and see the IdentityServer3 sample landing page:

image

Configuring IdentityServer

Right. Next thing we need to do is to make a couple of changes to the IdentityServer configuration, so that it’ll run happily on authdemo.ngrok.io instead of on localhost

First, enable logging. Just do it. Use the package manager console to install the Serilog.Sinks.Trace package. Then add this to the top of your Configuration() method inside Startup:

Log.Logger = new LoggerConfiguration()
                .MinimumLevel.Debug()
                .WriteTo.Trace()
                .CreateLogger();

and add this to your web.config, specifying a path that’s writable by the application pool:

<system.diagnostics>
  <trace autoflush="true"
         indentsize="4">
    <listeners>
      <add name="myListener"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="C:\logfiles\identityserver.log" />
      <remove name="Default" />
    </listeners>
  </trace>
</system.diagnostics>

Next, do a global search and replace, replacing any occurrence of localhost:44319 with authdemo.ngrok.io – again, substituting your own tunnel name as required.

Next, add a new client to the static EmbeddedMvc.IdentityServer.Clients class the IdentityServer sample project – changing the highlighted values to your own client ID, client secret, and portal instance URL:


new Client {
    ClientName = "Dynamics CRM Online",
    ClientId = "crm",
    Flow = Flows.Hybrid,
    ClientSecrets = new List() { new Secret("secret01".Sha256()) },
    RedirectUris = new List { 
"https://my-portal-instance.microsoftcrmportals.com"
}, PostLogoutRedirectUris = new List {
"https://my-portal-instance.microsoftcrmportals.com"
}, AllowedScopes = new List { "openid" } },

 

Adding IdentityServer as an endpoint in CRM Portals

Finally, you need to add your new IdentityServer as an identity provider. CRM Portals uses the Dynamics CRM platform for all its configuration and data storage, so to add new settings you’ll need to log into your Dynamics CRM Online instance, go into Portals > Site Settings, and add the following values:

Name

Value

Website

Authentication/OpenIdConnect/AuthDemo/Authority

http://authdemo.ngrok.io/identity/

Customer Self-Service

Authentication/OpenIdConnect/AuthDemo/Caption

IdentityServer OpenID Connect Demo

Customer Self-Service

Authentication/OpenIdConnect/AuthDemo/ClientId

crm

Customer Self-Service

Authentication/OpenIdConnect/AuthDemo/ClientSecret

secret01

Customer Self-Service

Authentication/OpenIdConnect/AuthDemo/MetadataAddress

http://authdemo.ngrok.io/identity/.well-known/openid-configuration

Customer Self-Service

Finally, it looks like you’ll need to restart the portal instance to get it to pick up the updated values – which you can do by logging into the Office 365 Admin Center, Admin Centers, CRM, Applications, Portal Add-On, clicking ‘MANAGE’, and pressing the nice big RESTART button on the Portal Actions page:

image

And – assuming everything lines up exactly right – you should now see an additional login button on your CRM Portals instance:

image

Clicking on it will bounce you across to your ngrok-tunnelled IdentityServer MVC app running on localhost:

imageLog in as bob / secret, and you’ll get the OpenID permissions check:

image

…and when you hit ‘Yes, Allow’, you’ll be redirected back to the CRM Portals instance, which will create a new CRM Contact linked to your OpenID Connect identity, and log you in to the portal.

Conclusions

Of course, in the real world there’s a lot more to it than this – there is a huge difference between a proof of concept like this and a production system. These sorts of user journeys form such a key part of delivering great user experience, and integrating multiple systems into your login and authentication/authorization journeys only makes this harder. But it did work, and it wasn’t actually all that complicated to get it up and running. It’s also interesting to see how something like OpenID Connect can be used to integrate a powerful open-source solution like IdentityServer with a heavyweight hosted platform service like CRM Portals.

Whether we end up adopting a hosted solution like CRM Portals – as opposed to just building our own apps that connect to CRM via the SDK or the new OData API – remains to be seen, but it’s nice to see solutions from two radically different sources playing nicely together thanks to the joy of open protocols like OpenID Connect. Long may it continue.

Monday, 31 October 2016

The Laws of Distributed Systems

I’ve spent a lot of time over the last year reading, thinking, and speaking at conferences about distributed systems, organisational structures, and the eponymous laws of software development. Over the course of many conversations and countless blog posts and articles, something has crystallised from thinking about three laws in particular, which – if it’s right - could have substantial implications for all of us as software developers, and for the people who use the systems we build.

TL;DR: if we keep having meetings, the internet will stop working.

(There – that got your attention, didn’t it?)

Moore’s Law

So, let’s recap. Gordon Moore was the co-founder of Intel and Fairchild Semiconductor. Back in 1965, Moore wrote a paper predicting that the number of components per integrated circuit would double every year. In 1975, he revised his forecast to doubling every two years. His predictions have proved accurate for several decades, and will probably continue to do so until the 2020s, but what’s really interesting is what’s happened since 2000.

Here’s the average total transistor count of CPUs against their year of introduction since 1965. Plotted on a a logarithmic axis, it’s pretty close to a straight line.

image_thumb1
[Data source: Wikipedia] 

Now here’s the same graph, but showing transistors per core.

image_thumb3
[Data source: Wikipedia]

See how around 2005 the two series suddenly diverge sharply? That’s because in the early 2000s, we began hitting the physical limits of how many transistors could be integrated into a single CPU. Somewhere around the 4Ghz mark, we hit a wall in terms of raw clock speed, and so the semiconductor industry hit upon the bright idea of multicore CPUs – basically putting more than one CPU into the same physical package.

In the same time frame, we’ve seen an industry-wide shift away from monolithic powerhouse servers towards distributed systems. Modern web apps – which are really just big multiuser systems – run across clusters of dozens or hundreds of ephemeral worker nodes; a radical contrast to the timesharing mainframe systems of the 1970s and 1980s.

The amount of computing power available at a particular price point is still increasing exponentially, but we’re no longer scaling up, we’re scaling out. And the reason we’re scaling is that the load on our systems – our websites, APIs and servers – is also increasing. More people are getting online, people are using more devices and connected services, and those devices are delivering increasingly rich user experiences – which means more data, more power and more bandwidth. Here’s Business Insider’s analysis and forecast of the number of connected devices from 2015 to 2020:

unnamed

To cope with this ever-increasing level of expectation, we need to build systems that will scale out to cope with demand. Our code needs to parallelize. We need to decompose our problems into small, autonomous units of work that can be distributed across as many cores or nodes as we have available, and combine the outputs of those operations to deliver the results our users are expecting.

Amdahl’s Law

This brings us to Amdahl’s Law. Gene Amdahl started out designing mainframe systems for IBM. He was the chief architect of the IBM System/360, and he first presented his eponymous law back in 1967. Amdahl’s Law controls the theoretical performance improvements we can expect by parallelizing some given workload.

Amdahl’s Law is actually Slatency(s) = 1/((1-p) + (p/s)) – but the gist of it is nicely explained by thinking about Christmas dinner. Or Thanksgiving, if that’s your thing. If you’ve got one person with one cooker working alone, it’ll take a good 20 hours to prepare all the trimmings for a Christmas dinner. By adding more people and more cookers, you can parallelise this and so complete it faster – but you reach a point where everything is done and everyone’s stood around waiting for Jeff to finish roasting the turkey, because you can’t roast a turkey in under four hours, no matter how many chefs and ovens you’ve got. 

If you need to parallelize like this, eliminate the turkey. Have steaks instead – because you can cook steaks in parallel. If you suddenly get another 20 guests showing up half an hour before lunch, no problem – you don’t need to wait four more hours to roast another turkey; just get 20 more chefs to cook 20 more steaks and you’ll still be done on time. And because you’re using cloud infrastructure, you can spin up more chefs and griddles instantly to cope with the increased demand.

See, by designing systems to eliminate those non-parallelisable workloads, we create systems that scale smoothly with the available resources. The beauty of that is that, like all the best solutions in software, it turns “how fast is our website” into a pure business decision. You want faster pages? Pay for more servers. No need to rewrite your algorithms; just throw more power at it.

Conway’s Law

Finally, there’s Conway’s Law. First published in 1964, Conway’s Law is the observation that ‘any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.’ As with Moore’s Law, it’s an observation that has proved remarkably prescient over the intervening decades. I’ve spoken at length about how Conway’s Law has affected the teams and projects I’ve worked on personally, but there’s also some interesting examples in the software industry at large. The relationship between the Linux kernel and the various distributions based on it has interesting parallels with Linus Torvalds’ role in the Linux ecosystem. Id Software’s genre-defining Doom and Quake games – tight, cohesive, focused engines, created by a bunch of coders camped out in a beach house with soda, pizza and no distractions, with less tightly-coupled elements like music and level design handled by less close-knit development efforts.  High-profile open source projects like Chromium – the underlying rendering engine, the browser itself, and the ecosystem of plugins and extensions closely reflecting the tight-knit Webkit project, the loosely-organised contributors and pull requests that shape the development of the browser application, and the community of plugin and extension developers who don’t engage with the project directly but rely on published contracts and protocols just as their plugins and extensions do.

And then there’s the really obvious examples, like this one:

image[92] 

Putting it all together…

OK, so let’s look at what happens when we interpret those three laws together. Moore’s Law has informed half a century of user expectations about technology. More people do more stuff on more devices, and they expect those experiences to keep on getting better, faster and more responsive. As we’ve seen above, the increase in raw computing power that’s going to deliver those improvements isn’t about clock speed any more, it’s about parallelism. Amdahl’s Law tells us whether systems will benefit from that parallelism or not – and that systems based around blocking, non-parallelisable, long-running operations will benefit the least from the next decade of computing innovation. And Conway’s Law says that if we don’t want our systems to contain these kinds of blocking, non-parallelisable operations, then we should be looking to eliminate them from our organisations.

Which brings us to the crux of the thing: what’s the organizational equivalent of a long-running non-parallelizable operation?

How about sitting around reading Hacker News because the person who’s asked you to build a “Summary Dashboard” hasn’t told you where to find the data, or what the dashboard should look like, and they’re out of the office right now, they didn’t leave any notes, and you can’t do anything until they get back?

How about a two-hour project update meeting where a series of people sit around telling each other things they could have emailed, or written down in a ticket or on a wiki page?

How about sitting on a train for an hour to get to the office in Canary Wharf where you’re expected to be at 09:00 every day, despite the fact that your source code is hosted in the US, your data centre is in Ireland, your issue tracking system is hosted in Frankfurt and your customers are online 24/7 all over the world?

One of the underlying principles of the agile manifesto is that ‘the most efficient and effective method of conveying information to and within a development team is face-to-face conversation’. I think that’s correct, but I think it might be optimising for the wrong metric. Sure, a conversation is a high-bandwidth, high-interaction discussion medium, and I find face-to-face great for bouncing ideas around and solving problems - but conversations are ephemeral. They’re not captured anywhere, nobody outside the conversation knows what was said, and there’s always the risk the people you’re talking to assure you they get it when they actually haven’t understood a single word you said. Perhaps we should be optimising our communication patterns for discoverability instead of raw bandwidth; trading a little temporary velocity for some long-term efficiency.

This isn’t just about cancelling a couple of meetings and letting people work from home on Fridays. It’s about changing the way we think about collaboration, so that the interaction patterns we want to see in our systems can emerge organically from the interaction patterns used by the people who created them. It’s about taking established architectural patterns and practises used in asynchronous distributed systems, and working out if we can apply those patterns to our teams and our projects. What if you applied event sourcing to your project backlogs, so you don’t keep having to ask people about the context behind a particular decision? Maybe you’re even doing this already – I know a lot of open-source projects that do an excellent job of capturing this history as part of their open issues and tasks so anybody who wants to pick up a particular ticket can see the complete history, the discussion, the arguments and hopefully the eventual consensus. What if you treat your documentation - wiki pages, GitHub pages, READMEs - like the query stores in a CQRS system? Rapid retrieval, read-only, optimised for consumption, and updated as necessary when processing commands (i.e. making changes) that affect the underlying systems that are being documented?

What I find remarkable is that Moore, Amdahl and Conway all published their eponymous laws almost exactly fifty years ago – Conway’s Law was published in 1964, Moore was 1965, Amdahl was 1967. Their observations hail from a decade of astonishing engineering achievements – Apollo, the Boeing 747, the Lockheed SR-71, the geodesic dome – in an era when computers were still highly specialist devices. Sure, you could argue that people working on timesharing systems in the 1960s couldn’t possibly have foreseen the long-term social implications of distributed systems engineering – but remember, this is the generation that landed on the moon using slide rules and No. 2 pencils. Do you really want to bet on them being wrong?

Tuesday, 25 October 2016

The Mystery of the Chinese Junk

You know it’s going to be one of those days when, just as you’re about to put on your headphones and get into ‘the zone’, you overhear somebody saying the fateful words ‘ok, then maybe we’ll need to get Dylan to look at it.’

See, amongst the many hats I wear in the course of a given week, there’s one that’s probably labelled ‘dungeon master’ I’m the one who remembers where all the bodies are buried, because – for all sorts of reasons that made very good sense at the time – I probably helped bury most of them. And on this particular day, the source of so much excitement was our venerable Microsoft Dynamics CRM v4 server. It started out with a sort of general grumbling on the support channel about CRM4 being slow… but by the time it was handed over to me to look into, it was beautifully summarised as ‘dude… there’s Chinese in the Windows event log’

And, sure enough, there is – complete with the lovely Courier typeface that Windows Event Viewer kicks into when you get errors so weird that good old Microsoft Sans Serif can’t even display them:

image

Now, whilst it’s been a while since I’ve done any serious work on our old CRM system, I’m pretty sure it’s not supposed to do that – so we start investigating. Working theory #1: some sort of vulnerability has resulted in attackers injecting Chinese characters into our database – whilst CRM4 is generally pretty well insulated from any public-facing code, there’s one or two places where signup forms would generate CRM Leads, that sort of thing. So we start grepping the entire database for one of the Chinese strings we’ve found in the event log.

Whilst this is going on – and trust me, it takes a while – I decide to share my excitement via the wonder of social media. This turns out to be a Really Good idea, because... well, here's what happened...

"The incoming tabular data stream TDS RPC protocol stream is incorrect. Parameter ("䐀攀氀攀琀椀漀渀匀琀愀琀攀..." Oh. It's gonna be one of THOSE days.

— Dylan Beattie (@dylanbeattie) October 18, 2016

@dwm @dylanbeattie The low bits are all null, so this probably UTF-16LE being mistaken for UTF-16BE (or vice versa...?). pic.twitter.com/D0IO4px9YE

— Fake Unicode ⁰ ⁧ (@FakeUnicode) October 18, 2016

You see in @FakeUnicode's screenshot there, the words ‘DeletionState’ appear quite clearly at the bottom of the message?

Whilst this is going on, our database search comes back reporting that there’s no mysterious Chinese characters in any of our CRM database tables. Which is good, since it means we probably haven’t been compromised. So, next step is to work through that Unicode lead, see if that gets us anywhere. Because .NET has a built-in encoding for big-endian Unicode, this is pretty simple:

var source = "䐀攀氀攀琀椀漀渀匀琀愀琀攀";
var bytes = Encoding.BigEndianUnicode.GetBytes(source);
var result = Encoding.Unicode.GetString(bytes);
Console.WriteLine(result);

Turns out – just as in FakeUnicode’s screenshot – that’s the text “DeletionState” with the byte order flipped. We grabbed a few examples of the ‘Chinese’ text from the event log, ran it through this – sure enough, in every single case it’s a valid CRM database query that’s somehow been flipped into wrong-endian Unicode. At this point we start suspecting some sort of latent bug – this is old software, running on an old operating system,talking to an old database server, and sure enough, a bit of googling turns up a couple of  likely-looking issues, most of which are addressed in various updates to SQL Server 2008. We take a VM snapshot in case everything goes horribly wrong, and one of the Ops gang volunteers to work late to get the server patched.

Next morning, turns out the server hasn’t been patched – because every single download of the relevant service pack has been corrupted. At which point all bets are off, because chances are the problem is actually network-related – which also explains where the ‘Chinese’ is coming from.

OK, let’s capture a stream of bytes from somewhere. Like, say, from the TDS data stream used by the MSCRMAsyncService

image

What does that say? If you think you know the answer, you’re wrong. Pop off and read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – done? Awesome. NOW what do you think it says?

See, we have no idea. It’s a stream of bytes. Without some indication of how we’re supposed to interpret those bytes, it’s meaningless. OK, I’ll give you a clue – it’s UTF-16. Now can you tell what it says? No, you can’t – because (1) you don’t know whether it’s big-endian or little-endian, and (2) you don’t know where it started.

If we assume it’s big-endian, then the first byte pair – 00 48 – would encode the character ‘H’, the second byte pair – 00 65 – would encode ‘e’, and so on. If we assume it’s little-endian, then the first byte pair – 00 48 – encodes the character 䠀 – and suddenly the mysterious Chinese characters in the event log start to make sense.

image

Of course, the data stream between the MSCRMAsyncService and the SQL server hasn’t actually flipped from little-endian UTF-16 to big-endian – what’s happened is that the network connection between them is dropping bytes. And if you drop a single byte – or any odd number of bytes – from a little-endian Unicode stream, you get a sort of off-by-one error right along the rest of the data stream, resulting in all sorts of weirdness – including Chinese in the event logs.

Turns out there was a problem with the virtual network interface on the SQL Server box – which was causing poor performance, timeouts, bizarre query syntax errors, Chinese in the event logs, and corrupted service pack downloads. Fortunately the databases themselves were intact, so we offlined them, cloned the virtual disk they were sitting on, attached that to a different server and brought them back online.

Every once in a while, you get a weird problem like this. I’ve seen maybe half-a-dozen problems in my entire career that made absolutely no sense until they turned out to be a faulty network connection, at which point generally you not only solve the problem, but explain a whole load of other weirdness that you hadn’t got round to investigating yet. The only thing more fun than dodgy networks is dodgy memory – but that’s a post for another day.

Oh, and if you’re wondering about the title of this post, you clearly haven’t studied the classics.