Tuesday, 3 January 2017

My Life with the Microsoft Natural Keyboard

A moment ago I was catching up on Twitter and I saw this:

Now Scott has written some great posts about keyboards – and mice, and workstation ergonomics in general – over the years, so I immediately clicked the link to see what was so exciting… and wow. This is the new Microsoft Surface Ergonomic Keyboard, and I, too, am now sitting here trying not to buy it. Actually, I’m waiting for a UK layout, and then I suspect that trying not to buy it will rapidly become unmanageable. I may even fail to not buy two of them so I have one at work and one at home.

Top view of keyboard

Now, there’s a couple of things about this keyboard that are interesting only because Microsoft have consistently got them right, and then messed them up, and then got them right again, and then messed them up again - over many, many years. Like an even-numbered Star Trek movie or an odd-numbered version of CorelDraw, this latest one is a good one – and as somebody who’s used every incarnation of the Microsoft Natural keyboard, this seems like a nice opportunity to take a little wander down memory lane.

OK, cue the Wayne’s World dream-sequence time-travel effects… and here’s a keyboard. A 102-key IBM PS/2 keyboard, which was the standard layout for PC keyboards for about a decade, back in the days when keyboards were beige and the cool kids knew all sorts of tricks that would let you load HIMEM.SYS and your mouse driver and still have enough main memory left to run Wing Commander 2.

(IBM PS/2 keyboard by Raymangold22 via Wikipedia | CC0 | Link)

Around the time of Windows 95, Microsoft proposed the subtle addition of two new keys – a Windows key (actually two of them) in those handy gaps next to the Ctrl keys, and a key that I’ve just this second learned is called the menu key, which is basically a right-click key. Other than shortening the space bar slightly, these new keys didn’t really move things around much, which was great, because that hard-earned muscle memory that let you hit triple-key combinations without taking your eyes off the screen worked just fine.

Oh, and they also started making them in colours that weren’t beige, which was a big improvement if you actually had to share your living space with them.


It’s around this time that I left school and got my first IT job, which meant I was typing for a good chunk of every day, and within a year I started getting unpleasant pain in my wrists and forearms. During a chance conversation at work, someone mentioned that a former employee there had had the same problem and switched to using an ergonomic keyboard – which was still knocking around in a cupboard somewhere. I tried it out and was instantly smitten.

This was the original Microsoft Natural keyboard, released in 1994.

Photo by DeanW77 via Wikipedia – own work, CC BY-SA 4.0, Link

I absolutely loved it. I literally wore it out – I used it until some of the keys no longer worked, and then went shopping for a replacement… and discovered you couldn’t get the original Natural keyboard anymore; it had been replaced by the Natural Keyboard Elite, released in 1998.

https://www.engadget.com/products/microsoft/natural-keyboard/elite/ 

Looks close enough, right? Except if you look closely, you’ll see that instead of the familiar inverted-T cursor shape and two rows of navigation keys, the cursor keys are in a sort of weird diamond formation and the navigation keys are in two columns.

This was horrible. All those years of muscle memory suddenly gone – every time you’d try to hit Ctrl-PgUp or Shift-End you’d get it wrong. And when you’re in the zone, that’s a horrible, jarring experience – every time it happens it interrupts your flow, wrenches you back to reality and makes you want to throw the damn thing across the room. Imagine driving a car where the brake pedal is above the accelerator instead of alongside it – it was a truly unpleasant experience.

Fortunately, it didn’t take long for them to realise this one was a bit of a mistake. A year later in 1999, they came out with the Natural Keyboard Pro, and it was fantastic.

By ----PCStuff via Wikipedia, CC BY-SA 2.5, Link

All the keys were in the right places, and it included a two-port USB hub which was great for plugging in your mouse. It added a bunch of “multimedia” buttons (which I never really used) and a dedicated button for launching the Windows Calculator (which actually proved to be surprisingly useful.) I loved this keyboard dearly. You can guess what happened next… yep, I used it until it wore out, went shopping for a new one, and… yeah. No more Natural Keyboard Pro. Instead, we had this delightful triumph of form over function:


photo via Engadget

The Microsoft Natural Multimedia Keyboard, released in 2004. OK, it’s got the classic inverted-T cursor keys (yay!) – but look at that! Not only are the navigation keys all wrong, but the Insert key isn’t even there; we’ve got a massive double-height Delete key instead. If memory serves, Insert was relegated to some sort of funky combo involving the PrtSc key. Oh, and this was also the keyboard where you had to keep F-Lock switched on all the time because the function keys were actually some sort of weird keyboard shortcuts that nobody ever used ever. You know. Like a dedicated key for “Send”, because Ctrl-Enter is just too complicated.

Again, it didn’t take long for them to come out with something better… and boy, did they ever get it right with the next one. The Microsoft Natural Ergonomic Keyboard 4000.


photo via microsoft.com

This was my main keyboard for most of the last decade. I loved this keyboard so much I actually stockpiled it – one at work, one at home, and a few spares standing by in case they wore out. And wear out they did, one by one, until late last year I decided it was time to replenish the stockpile… which is when my long-suffering colleagues half-jokingly asked if, maybe, I’d be prepared to try something quieter. See, the 4000 is lovely, but it’s noisy. The keys have plenty of travel, with a nice satisfying thump at the bottom of each keystroke, and the keyboard’s casing – which is big – makes a rather effective sounding-box. Until the bottom falls out of the London real estate market and we can justify private offices for our developers, FogCreek style, the open plan office is an unfortunate fact of life for most of us… and so, in the spirit of workplace harmony, I ordered a Microsoft Sculpt keyboard.

photo via Microsoft

And you know what? It’s pretty close to perfect. It’s comfortable. It’s quiet. It takes up about 60% of the desk space that the old Ergonomic 4000 did. The numeric keypad is actually separate, which I’m completely undecided about… when you actually want to type on it, it’s annoying not having it fixed in place, but being able to pick it up and use it like an old-school calculator is actually surprisingly useful. Except – yep – the navigation key layout is all screwed up. Again. I hit Insert by mistake all the time on this thing, and frequently land on the left cursor when I’m reaching for Ctrl.

But with the release of the Surface keyboard at the top of the post, it looks like, yet again, there’s a truly great keyboard hot on the heels of the not-so-great one. I’m just really curious as to why they keep bringing out models that use non-standard keyboard layouts.

photo via Microsoft

When it hits the market here in the UK, I’ll pick one up and let you know how I get on. Now, if they’d only release one that was completely black – with no key markings – then we’d really be onto something. I wonder if you can spray-paint it...

Monday, 14 November 2016

IdentityServer, OpenID Connect and Microsoft CRM Portals

As readers of this blog will know, here at Spotlight we’re in the process of moving nine decades’ worth of legacy business process onto Microsoft Dynamics CRM, aka CRM Online, which I gather is now called Dynamics 365 (because hey, it’s not like naming things was hard enough already, right?)

We’re also investigating a couple of options for building customer-facing systems that integrate with Dynamics. Until last year, there were really three options for this – a product called Adxstudio, a free Microsoft component called the CRM Portal Accelerator, or rolling your own solution using the CRM SDK. Around this time last year, Microsoft quietly retired the Portal Accelerator component and acquired Adxstudio, and since then, they’ve been in the process of assimilating it into the Dynamics product family – which has meant it’s been something of a moving target, both in terms of the supported features and in terms of the quality of documentation and examples.

I’ve previously blogged about one way to integrate Adxstudio with your existing authentication system, but that approach relied completely on running Adxstudio on-premise so you could run your own code as part of the request lifecycle – and as you may have noticed, there’s a bit of a trend in IT at the moment away from running your own servers and towards using hosted managed services, so that patching and backups are somebody else’s problem. Since Microsoft acquired Adxstudio, there’s been a lot of churn around what’s supported and what’s not – I’m guessing that behind the scenes they’re going through the Adxstudio codebase feature-by-feature and making sure it lines up with their plans for the Dynamics 365 platform, but that’s just guesswork on my part.

One of the main integration points I’ve been waiting for is the ability for a Microsoft-hosted Portal solution to use a third-party OpenID Connect endpoint to authenticate users, and it appears in the latest update this is finally supported – albeit with a couple of bumps along the way. Here’s what I’ve had to do to get a proof-of-concept up and running.

Setting up Dynamics CRM Portals

First, you’ll need to set up a Dynamics Portal trial. You can get a 30-day hosted trial of Dynamics CRM Online by signing up here – this actually gives you a full Office 365 organization including things like hosted Active Directory, as well as the Dynamics CRM Online instance we’re using in this example. Next, you’ll need to ask nicely for a trial of the portal add-on – which you can do by filling out the form at crmmanagedtrials.dynamics.com.

Setting up IdentityServer and configuring an ngrok tunnel

Whilst you’re waiting for the nice Microsoft people to send you your trial license, get up and running with IdentityServer. For this prototype, I’m using the MVC Authentication example from the IdentityServer3.Samples project – clone it to your workstation, open the MVC Authentication solution, hit F5, verify you can get up and running on localhost.

Next – in order for Dynamics CRM Online to talk to your IdentityServer instance, you’ll need to make your IdentityServer endpoints visible to the internet. You could do this by deploying your IdentityServer sample to Azure or AWS, but for experiments like this, I like to use a tool called ngrok, which will create temporary, secure tunnels from the internet to your workstation. Download ngrok, unzip it somewhere sensible.

Pick a tunnel name. I’m using authdemo in this example but any valid DNS host name will do. Next, create a local IIS application pointing to the EmbeddedMvc folder in your samples directory, and set the host name to <your tunnel name>.ngrok.io


Now run ngrok.exe to create a tunnel from the internet to your new IIS application:

C:\tools\ngrok> ngrok.exe http -subdomain=authdemo 80

ngrok by @inconshreveable

Session Status        online
Version               2.1.18
Region                United States (us)
Web Interface         http://127.0.0.1:4040
Forwarding            http://authdemo.ngrok.io -> localhost:80
Forwarding            https://authdemo.ngrok.io -> localhost:80

Connections           ttl     opn     rt1     rt5     p50     p90
                      0       0       0.00    0.00    0.00    0.00

All being well, ngrok will report your tunnel as online.

If that’s worked, you should be able to fire up a browser, go to http://authdemo.ngrok.io/ – replacing ‘authdemo’ with your own tunnel name - and see the IdentityServer3 sample landing page:


Configuring IdentityServer

Right. The next thing we need to do is make a couple of changes to the IdentityServer configuration, so that it’ll run happily on authdemo.ngrok.io instead of on localhost.

First, enable logging. Just do it. Use the package manager console to install the Serilog.Sinks.Trace package. Then add this to the top of your Configuration() method inside Startup:

// requires: using Serilog;
Log.Logger = new LoggerConfiguration()
                .MinimumLevel.Debug()
                .WriteTo.Trace()
                .CreateLogger();

and add this to your web.config, specifying a path that’s writable by the application pool:

<system.diagnostics>
  <trace autoflush="true"
         indentsize="4">
    <listeners>
      <add name="myListener"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="C:\logfiles\identityserver.log" />
      <remove name="Default" />
    </listeners>
  </trace>
</system.diagnostics>

Next, do a global search and replace, replacing any occurrence of localhost:44319 with authdemo.ngrok.io – again, substituting your own tunnel name as required.

Next, add a new client to the static EmbeddedMvc.IdentityServer.Clients class in the IdentityServer sample project, substituting your own client ID, client secret, and portal instance URL:


new Client {
    ClientName = "Dynamics CRM Online",
    ClientId = "crm",
    Flow = Flows.Hybrid,
    ClientSecrets = new List<Secret> { new Secret("secret01".Sha256()) },
    RedirectUris = new List<string> {
        "https://my-portal-instance.microsoftcrmportals.com"
    },
    PostLogoutRedirectUris = new List<string> {
        "https://my-portal-instance.microsoftcrmportals.com"
    },
    AllowedScopes = new List<string> { "openid" }
},

 

Adding IdentityServer as an endpoint in CRM Portals

Next, you need to add your new IdentityServer as an identity provider. CRM Portals uses the Dynamics CRM platform for all its configuration and data storage, so to add new settings you’ll need to log into your Dynamics CRM Online instance, go into Portals > Site Settings, and add the following values:

| Name | Value | Website |
| --- | --- | --- |
| Authentication/OpenIdConnect/AuthDemo/Authority | http://authdemo.ngrok.io/identity/ | Customer Self-Service |
| Authentication/OpenIdConnect/AuthDemo/Caption | IdentityServer OpenID Connect Demo | Customer Self-Service |
| Authentication/OpenIdConnect/AuthDemo/ClientId | crm | Customer Self-Service |
| Authentication/OpenIdConnect/AuthDemo/ClientSecret | secret01 | Customer Self-Service |
| Authentication/OpenIdConnect/AuthDemo/MetadataAddress | http://authdemo.ngrok.io/identity/.well-known/openid-configuration | Customer Self-Service |

Finally, it looks like you’ll need to restart the portal instance to get it to pick up the updated values – which you can do by logging into the Office 365 Admin Center, Admin Centers, CRM, Applications, Portal Add-On, clicking ‘MANAGE’, and pressing the nice big RESTART button on the Portal Actions page:


And – assuming everything lines up exactly right – you should now see an additional login button on your CRM Portals instance:


Clicking on it will bounce you across to your ngrok-tunnelled IdentityServer MVC app running on localhost:

Log in as bob / secret, and you’ll get the OpenID permissions check:


…and when you hit ‘Yes, Allow’, you’ll be redirected back to the CRM Portals instance, which will create a new CRM Contact linked to your OpenID Connect identity, and log you in to the portal.

Conclusions

Of course, in the real world there’s a lot more to it than this – there is a huge difference between a proof of concept like this and a production system. These sorts of user journeys form such a key part of delivering great user experience, and integrating multiple systems into your login and authentication/authorization journeys only makes this harder. But it did work, and it wasn’t actually all that complicated to get it up and running. It’s also interesting to see how something like OpenID Connect can be used to integrate a powerful open-source solution like IdentityServer with a heavyweight hosted platform service like CRM Portals.

Whether we end up adopting a hosted solution like CRM Portals – as opposed to just building our own apps that connect to CRM via the SDK or the new OData API – remains to be seen, but it’s nice to see solutions from two radically different sources playing nicely together thanks to the joy of open protocols like OpenID Connect. Long may it continue.

Monday, 31 October 2016

The Laws of Distributed Systems

I’ve spent a lot of time over the last year reading, thinking, and speaking at conferences about distributed systems, organisational structures, and the eponymous laws of software development. Over the course of many conversations and countless blog posts and articles, something has crystallised from thinking about three laws in particular, which – if it’s right - could have substantial implications for all of us as software developers, and for the people who use the systems we build.

TL;DR: if we keep having meetings, the internet will stop working.

(There – that got your attention, didn’t it?)

Moore’s Law

So, let’s recap. Gordon Moore was the co-founder of Intel and Fairchild Semiconductor. Back in 1965, Moore wrote a paper predicting that the number of components per integrated circuit would double every year. In 1975, he revised his forecast to doubling every two years. His predictions have proved accurate for several decades, and will probably continue to do so until the 2020s, but what’s really interesting is what’s happened since 2000.

Here’s the average total transistor count of CPUs against their year of introduction since 1965. Plotted on a logarithmic axis, it’s pretty close to a straight line.

[Data source: Wikipedia] 

Now here’s the same graph, but showing transistors per core.

[Data source: Wikipedia]

See how around 2005 the two series suddenly diverge sharply? That’s because in the early 2000s, we began hitting the physical limits of how many transistors could be integrated into a single CPU. Somewhere around the 4GHz mark, we hit a wall in terms of raw clock speed, and so the semiconductor industry hit upon the bright idea of multicore CPUs – basically putting more than one CPU into the same physical package.

In the same time frame, we’ve seen an industry-wide shift away from monolithic powerhouse servers towards distributed systems. Modern web apps – which are really just big multiuser systems – run across clusters of dozens or hundreds of ephemeral worker nodes; a radical contrast to the timesharing mainframe systems of the 1970s and 1980s.

The amount of computing power available at a particular price point is still increasing exponentially, but we’re no longer scaling up, we’re scaling out. And the reason we’re scaling is that the load on our systems – our websites, APIs and servers – is also increasing. More people are getting online, people are using more devices and connected services, and those devices are delivering increasingly rich user experiences – which means more data, more power and more bandwidth. Here’s Business Insider’s analysis and forecast of the number of connected devices from 2015 to 2020:


To cope with this ever-increasing level of expectation, we need to build systems that will scale out to cope with demand. Our code needs to parallelize. We need to decompose our problems into small, autonomous units of work that can be distributed across as many cores or nodes as we have available, and combine the outputs of those operations to deliver the results our users are expecting.
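
As a trivial illustration of that decompose-and-combine shape – this is just a generic PLINQ sketch, not code from any system mentioned in this post:

using System;
using System.Linq;

class ScaleOutSketch
{
    static void Main()
    {
        // Decompose the problem into small, independent units of work...
        var workItems = Enumerable.Range(1, 1000000);

        // ...fan them out across however many cores are available...
        var total = workItems
            .AsParallel()
            .Select(n => (long)n * n)   // each unit of work is independent
            .Sum();                     // ...then combine the outputs

        Console.WriteLine(total);
    }
}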

Amdahl’s Law

This brings us to Amdahl’s Law. Gene Amdahl started out designing mainframe systems for IBM. He was the chief architect of the IBM System/360, and he first presented his eponymous law back in 1967. Amdahl’s Law describes the theoretical performance improvement we can expect from parallelizing a given workload.

Amdahl’s Law is actually S(s) = 1 / ((1 − p) + p/s), where p is the proportion of the workload that can be parallelised and s is the speedup of that parallel portion – but the gist of it is nicely explained by thinking about Christmas dinner. Or Thanksgiving, if that’s your thing. If you’ve got one person with one cooker working alone, it’ll take a good 20 hours to prepare all the trimmings for a Christmas dinner. By adding more people and more cookers, you can parallelise this and so complete it faster – but you reach a point where everything is done and everyone’s stood around waiting for Jeff to finish roasting the turkey, because you can’t roast a turkey in under four hours, no matter how many chefs and ovens you’ve got.
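
To put some numbers on that, here’s a minimal C# sketch of Amdahl’s Law – the 20-hour dinner and four-hour turkey are just the example figures from above, not measurements of anything real:

using System;

class AmdahlDemo
{
    // Amdahl's Law: overall speedup, given the parallelisable proportion p
    // of the workload and the speedup s applied to that portion.
    static double Speedup(double p, double s) => 1.0 / ((1.0 - p) + p / s);

    static void Main()
    {
        // Christmas dinner: 20 hours of work, of which the four-hour turkey
        // roast can't be parallelised, so p = 16/20 = 0.8.
        const double p = 0.8;
        foreach (var chefs in new[] { 1, 2, 4, 8, 16, 1000 })
        {
            Console.WriteLine($"{chefs,5} chefs: {Speedup(p, chefs):F2}x faster");
        }
        // However many chefs you add, the speedup never exceeds 1 / (1 - p) = 5x,
        // because somebody is always waiting for the turkey.
    }
}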

If you need to parallelize like this, eliminate the turkey. Have steaks instead – because you can cook steaks in parallel. If you suddenly get another 20 guests showing up half an hour before lunch, no problem – you don’t need to wait four more hours to roast another turkey; just get 20 more chefs to cook 20 more steaks and you’ll still be done on time. And because you’re using cloud infrastructure, you can spin up more chefs and griddles instantly to cope with the increased demand.

See, by designing systems to eliminate those non-parallelisable workloads, we create systems that scale smoothly with the available resources. The beauty of that is that, like all the best solutions in software, it turns “how fast is our website” into a pure business decision. You want faster pages? Pay for more servers. No need to rewrite your algorithms; just throw more power at it.

Conway’s Law

Finally, there’s Conway’s Law. First published in 1968, Conway’s Law is the observation that ‘any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.’ As with Moore’s Law, it’s an observation that has proved remarkably prescient over the intervening decades. I’ve spoken at length about how Conway’s Law has affected the teams and projects I’ve worked on personally, but there are also some interesting examples in the software industry at large. The relationship between the Linux kernel and the various distributions based on it has interesting parallels with Linus Torvalds’ role in the Linux ecosystem. Id Software’s genre-defining Doom and Quake games were tight, cohesive, focused engines, created by a bunch of coders camped out in a beach house with soda, pizza and no distractions, with less tightly-coupled elements like music and level design handled by less close-knit development efforts. High-profile open source projects like Chromium show the same pattern: the underlying rendering engine, the browser itself, and the ecosystem of plugins and extensions closely reflect the tight-knit WebKit project, the loosely-organised contributors and pull requests that shape the development of the browser application, and the community of plugin and extension developers who don’t engage with the project directly but rely on published contracts and protocols just as their plugins and extensions do.

And then there’s the really obvious examples, like this one:


Putting it all together…

OK, so let’s look at what happens when we interpret those three laws together. Moore’s Law has informed half a century of user expectations about technology. More people do more stuff on more devices, and they expect those experiences to keep on getting better, faster and more responsive. As we’ve seen above, the increase in raw computing power that’s going to deliver those improvements isn’t about clock speed any more, it’s about parallelism. Amdahl’s Law tells us whether systems will benefit from that parallelism or not – and that systems based around blocking, non-parallelisable, long-running operations will benefit the least from the next decade of computing innovation. And Conway’s Law says that if we don’t want our systems to contain these kinds of blocking, non-parallelisable operations, then we should be looking to eliminate them from our organisations.

Which brings us to the crux of the thing: what’s the organizational equivalent of a long-running non-parallelizable operation?

How about sitting around reading Hacker News because the person who’s asked you to build a “Summary Dashboard” hasn’t told you where to find the data, or what the dashboard should look like, and they’re out of the office right now, they didn’t leave any notes, and you can’t do anything until they get back?

How about a two-hour project update meeting where a series of people sit around telling each other things they could have emailed, or written down in a ticket or on a wiki page?

How about sitting on a train for an hour to get to the office in Canary Wharf where you’re expected to be at 09:00 every day, despite the fact that your source code is hosted in the US, your data centre is in Ireland, your issue tracking system is hosted in Frankfurt and your customers are online 24/7 all over the world?

One of the underlying principles of the agile manifesto is that ‘the most efficient and effective method of conveying information to and within a development team is face-to-face conversation’. I think that’s correct, but I think it might be optimising for the wrong metric. Sure, a conversation is a high-bandwidth, high-interaction discussion medium, and I find face-to-face great for bouncing ideas around and solving problems - but conversations are ephemeral. They’re not captured anywhere, nobody outside the conversation knows what was said, and there’s always the risk the people you’re talking to assure you they get it when they actually haven’t understood a single word you said. Perhaps we should be optimising our communication patterns for discoverability instead of raw bandwidth; trading a little temporary velocity for some long-term efficiency.

This isn’t just about cancelling a couple of meetings and letting people work from home on Fridays. It’s about changing the way we think about collaboration, so that the interaction patterns we want to see in our systems can emerge organically from the interaction patterns used by the people who created them. It’s about taking established architectural patterns and practices used in asynchronous distributed systems, and working out if we can apply those patterns to our teams and our projects. What if you applied event sourcing to your project backlogs, so you don’t keep having to ask people about the context behind a particular decision? Maybe you’re even doing this already – I know a lot of open-source projects that do an excellent job of capturing this history as part of their open issues and tasks so anybody who wants to pick up a particular ticket can see the complete history, the discussion, the arguments and hopefully the eventual consensus. What if you treat your documentation - wiki pages, GitHub pages, READMEs - like the query stores in a CQRS system? Rapid retrieval, read-only, optimised for consumption, and updated as necessary when processing commands (i.e. making changes) that affect the underlying systems that are being documented?

What I find remarkable is that Moore, Amdahl and Conway all published their eponymous laws almost exactly fifty years ago – Moore’s paper appeared in 1965, Amdahl presented his law in 1967, and Conway’s paper was published in 1968. Their observations hail from a decade of astonishing engineering achievements – Apollo, the Boeing 747, the Lockheed SR-71, the geodesic dome – in an era when computers were still highly specialist devices. Sure, you could argue that people working on timesharing systems in the 1960s couldn’t possibly have foreseen the long-term social implications of distributed systems engineering – but remember, this is the generation that landed on the moon using slide rules and No. 2 pencils. Do you really want to bet on them being wrong?

Tuesday, 25 October 2016

The Mystery of the Chinese Junk

You know it’s going to be one of those days when, just as you’re about to put on your headphones and get into ‘the zone’, you overhear somebody saying the fateful words ‘ok, then maybe we’ll need to get Dylan to look at it.’

See, amongst the many hats I wear in the course of a given week, there’s one that’s probably labelled ‘dungeon master’. I’m the one who remembers where all the bodies are buried, because – for all sorts of reasons that made very good sense at the time – I probably helped bury most of them. And on this particular day, the source of so much excitement was our venerable Microsoft Dynamics CRM v4 server. It started out with a sort of general grumbling on the support channel about CRM4 being slow… but by the time it was handed over to me to look into, it was beautifully summarised as ‘dude… there’s Chinese in the Windows event log’.

And, sure enough, there is – complete with the lovely Courier typeface that Windows Event Viewer kicks into when you get errors so weird that good old Microsoft Sans Serif can’t even display them:


Now, whilst it’s been a while since I’ve done any serious work on our old CRM system, I’m pretty sure it’s not supposed to do that – so we start investigating. Working theory #1: some sort of vulnerability has resulted in attackers injecting Chinese characters into our database – whilst CRM4 is generally pretty well insulated from any public-facing code, there’s one or two places where signup forms would generate CRM Leads, that sort of thing. So we start grepping the entire database for one of the Chinese strings we’ve found in the event log.

Whilst this is going on – and trust me, it takes a while – I decide to share my excitement via the wonder of social media. This turns out to be a Really Good idea, because... well, here's what happened...

"The incoming tabular data stream TDS RPC protocol stream is incorrect. Parameter ("䐀攀氀攀琀椀漀渀匀琀愀琀攀..." Oh. It's gonna be one of THOSE days.

— Dylan Beattie (@dylanbeattie) October 18, 2016

@dwm @dylanbeattie The low bits are all null, so this probably UTF-16LE being mistaken for UTF-16BE (or vice versa...?). pic.twitter.com/D0IO4px9YE

— Fake Unicode ⁰ ⁧ (@FakeUnicode) October 18, 2016

You see in @FakeUnicode's screenshot there, the words ‘DeletionState’ appear quite clearly at the bottom of the message?

Whilst this is going on, our database search comes back reporting that there’s no mysterious Chinese characters in any of our CRM database tables. Which is good, since it means we probably haven’t been compromised. So, next step is to work through that Unicode lead, see if that gets us anywhere. Because .NET has a built-in encoding for big-endian Unicode, this is pretty simple:

// requires: using System.Text;
var source = "䐀攀氀攀琀椀漀渀匀琀愀琀攀";
var bytes = Encoding.BigEndianUnicode.GetBytes(source);
var result = Encoding.Unicode.GetString(bytes);
Console.WriteLine(result); // prints "DeletionState"

Turns out – just as in FakeUnicode’s screenshot – that’s the text “DeletionState” with the byte order flipped. We grabbed a few examples of the ‘Chinese’ text from the event log, ran it through this – sure enough, in every single case it’s a valid CRM database query that’s somehow been flipped into wrong-endian Unicode. At this point we start suspecting some sort of latent bug – this is old software, running on an old operating system, talking to an old database server – and sure enough, a bit of googling turns up a couple of likely-looking issues, most of which are addressed in various updates to SQL Server 2008. We take a VM snapshot in case everything goes horribly wrong, and one of the Ops gang volunteers to work late to get the server patched.

Next morning, turns out the server hasn’t been patched – because every single download of the relevant service pack has been corrupted. At which point all bets are off, because chances are the problem is actually network-related – which also explains where the ‘Chinese’ is coming from.

OK, let’s capture a stream of bytes from somewhere. Like, say, from the TDS data stream used by the MSCRMAsyncService


What does that say? If you think you know the answer, you’re wrong. Pop off and read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – done? Awesome. NOW what do you think it says?

See, we have no idea. It’s a stream of bytes. Without some indication of how we’re supposed to interpret those bytes, it’s meaningless. OK, I’ll give you a clue – it’s UTF-16. Now can you tell what it says? No, you can’t – because (1) you don’t know whether it’s big-endian or little-endian, and (2) you don’t know where it started.

If we assume it’s big-endian, then the first byte pair – 00 48 – would encode the character ‘H’, the second byte pair – 00 65 – would encode ‘e’, and so on. If we assume it’s little-endian, then the first byte pair – 00 48 – encodes the character 䠀 – and suddenly the mysterious Chinese characters in the event log start to make sense.


Of course, the data stream between the MSCRMAsyncService and the SQL server hasn’t actually flipped from little-endian UTF-16 to big-endian – what’s happened is that the network connection between them is dropping bytes. And if you drop a single byte – or any odd number of bytes – from a little-endian Unicode stream, you get a sort of off-by-one error right along the rest of the data stream, resulting in all sorts of weirdness – including Chinese in the event logs.
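
Here’s a quick C# sketch of that off-by-one effect – the sample text is just an illustrative string, not the actual query from our logs:

using System;
using System.Linq;
using System.Text;

class DroppedByteDemo
{
    static void Main()
    {
        // Encode some ordinary text as little-endian UTF-16, which is how
        // it would appear on the wire in a TDS stream.
        var bytes = Encoding.Unicode.GetBytes("DeletionStateCode = 0");

        // Drop a single byte from the front of the stream. Every subsequent
        // byte pair now straddles two of the original characters...
        var corrupted = bytes.Skip(1).ToArray();

        // ...so decoding the rest as little-endian UTF-16 produces
        // CJK-looking garbage, just like the entries in the event log.
        Console.WriteLine(Encoding.Unicode.GetString(corrupted));
    }
}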

Turns out there was a problem with the virtual network interface on the SQL Server box – which was causing poor performance, timeouts, bizarre query syntax errors, Chinese in the event logs, and corrupted service pack downloads. Fortunately the databases themselves were intact, so we offlined them, cloned the virtual disk they were sitting on, attached that to a different server and brought them back online.

Every once in a while, you get a weird problem like this. I’ve seen maybe half-a-dozen problems in my entire career that made absolutely no sense until they turned out to be a faulty network connection, at which point generally you not only solve the problem, but explain a whole load of other weirdness that you hadn’t got round to investigating yet. The only thing more fun than dodgy networks is dodgy memory – but that’s a post for another day.

Oh, and if you’re wondering about the title of this post, you clearly haven’t studied the classics.

Thursday, 15 September 2016

Upcoming Conferences and User Group Talks

I’m appearing at some conferences and user groups in October and November, talking about ReST, the history of the web, and how we can apply the scientific method to software development. Though not all at the same time.

October 19th I'll be speaking at FullStack Bytes at SkillsMatter here in London - one of a series of monthly events around the same themes and ideas as the annual FullStack conference - "JavaScript, NodeJS and the Internet of Things". On October 25th I’m heading up to Telford to give my Real World REST talk at the Shropshire Devs User Group.

November 5th, I’m one of the keynote speakers at Tampere Goes Agile, a free one-day conference in Tampere, Finland all about agile software development. The theme of the conference this year is ‘experimentation’, and I’ll be talking about the scientific method, the history of experimentation and how we can apply scientific principles to our own software projects and agile processes. It’ll be my first time visiting Finland – new country, new conference, and a new topic – and I’m looking forward to it immensely.

I’ll be speaking at Oredev in Malmo, Sweden on the 8/9/10th of November, where as well as a couple of technical talks, Rob Conery has signed me up for a ‘head to head’ session with Jimmy Bogard, which should be entertaining. Thanks, Rob... :)

Finally, I’ll be rounding out 2016 with a couple of talks at BuildStuff - in Vilnius on Nov 16-18 and then in Kyiv on the 21-22. If you’ve ever been, you’ll know that BuildStuff is an excellent conference with great people and great content – and if you haven’t, get yourself a ticket, come along and see for yourself.

Wednesday, 27 July 2016

Affordances, Signifiers, and Cartographobia

One of the teams here is putting the finishing touches on a new online version of Spotlight Contacts, our venerable and much-loved industry guide that started life as a printed handbook way back in 1947. Along the way, we’ve learned some very interesting things about data, and how people perceive that their data is being used.


One of the features of the new online version is that every listing includes a location map – a little embedded Google Map showing the business’ location. When we rolled this feature out as part of a recent beta, we got some very unhappy advertisers asking us to please remove the map from their listing immediately. Now, most of these were freelancers who work from home – so you can understand their concerns. But what’s really interesting is that in most cases, they were quite happy for their full street address to stay on the page – it was just the map that they were worried about.

Of course, this immediately resulted in quite a lot of “what? they want to keep the address and remove the map? ha ha! that’s daft!” from developers – who, as you well know, are prone to occasional outbursts of apoplectic indignation when they have to let go of their abstractions and engage with reality for any length of time – but when you think about it, it actually makes quite a lot of sense.

See, street addresses are used for lots of things. They’re used on contracts and invoices, they’re used to post letters and deliver packages. Yes, you can also use somebody’s address to go and pay them a visit, but there are many, many reasons why you might need to know somebody’s address that have nothing to do with you turning up on their doorstep. In UX parlance, we’d say that the address affords all of these interactions – the presence of a street address enables us to post a letter, write a contract or plan a trip.

A map, on the other hand, only affords one kind of interaction; it tells you how to actually visit somewhere. But because of this, a map is also a signifier. It sends a message saying “come and visit us” – because if you weren’t actually planning to visit us, why would you need to know that Spotlight’s office at 7 Leicester Place is actually in between the cinema and the church, down one of the little alleys that run between Leicester Square and Chinatown? For posting a letter or writing a contract, you don’t care – the street address is enough. But by including a map, you’re sending a message that says “hey – stop round next time you’re in the neighbourhood”, and it’s easy to see why that’s not really something you want if you’re a freelancer working from your home.

It’s important to consider this distinction between affordances and signifiers when you’re designing your user interactions. Don’t just think about what your system can do – think about all the subtle and not-so-subtle messages that your UI is sending.

Here’s the classic Far Side cartoon “Midvale School for the Gifted”, which provides us with some great examples of affordances and signifiers. The fact you can pull the door is an affordance. The sign saying PULL is a signifier – but the handle is both. Looking at it gives you a clue – “hey, I could probably pull that!” – and when you do, voila, the door swings open. If you’ve ever found a door where you have to grasp the handle and push, then you’ve found a false affordance – a handle that’s sat there saying ‘pull me…’ and when you do, nothing happens. And, in software as in the Far Side, there’s going to be times when all the affordances and signifiers in the world are no match for your users’ astonishing capacity to ignore them all and persist in doing it wrong.

(Far Side © Gary Larson)

ASP.NET Authentication with Adxstudio

I’m looking into options for integrating our shiny new CRM system with our website, so we can provide all sorts of neat self-service capabilities and features. One of the applications I’m investigating is a thing called Adxstudio – now owned by Microsoft - which claims to “transform Dynamics CRM into powerful application platform with dozens of apps and starter portals.”


This is one of those situations where we really are dealing with ‘solved problems.’ Email campaigns. Customers updating their own contact details, potentially things like forums, helpdesk/ticketing systems – lots of things which are nice-to-have but really aren’t strategic differentiators, and so there’s a compelling argument to find an off-the-shelf solution and just plug it in. We already have a federated authentication system here at Spotlight – something we built a few years ago that provides basic identity and authentication capabilities on top of OAuth2. At the time we built it, OpenID Connect didn’t exist yet, so we’ve got a system that does basically the same thing but isn’t actually compatible with OpenID Connect – and consequently doesn’t work out-of-the-box with Adxstudio. So I’ve been poking around, trying to work out the best way to plug Adxstudio into our infrastructure so we can evaluate it as a solution.

One of the options on the table was to replace our existing authentication system with IdentityServer; another was to implement OpenID Connect support on top of our existing authentication system – both quite elegant solutions, but both of which involve quite a lot more work than is actually required for what we’re trying to do.

The core requirement here is:

  • We already have a CRM Contact record for every user of our system
  • We can look up a user’s CRM Contact GUID during authentication
  • We want to set up the Adxstudio MasterPortal demo so that our customers are seamlessly authenticated and can use Adxstudio features as though they had registered via the Adxstudio registration facility.

Now, one of the nice things about Adxstudio is that it’s built as OWIN middleware, and uses the ASP.NET Identity framework to handle authentication – so what we need to do is work out how to translate the CRM Contact GUID into an IPrincipal/IIdentity instance that we can assign to the HttpContext.Current.User property, and hope that Adxstudio then does the right thing once the HttpContext User is set correctly.

Adxstudio provides an implementation of ApplicationUserManager that’s already registered with the OWIN model, which accepts a CRM Contact GUID (as a string) and returns an instance of ApplicationUser that we can use to spin up a new ClaimsIdentity. So the simplest possible approach here is this snippet of code:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
  Guid userGuid;
  var cookie = Request.Cookies["crm_contact_guid"];
  if (cookie != null && Guid.TryParse(cookie.Value, out userGuid)) {
    var http = HttpContext.Current;
    var owin = http.GetOwinContext();
    var userManager = owin.Get<ApplicationUserManager>();
    var user = userManager.FindById(userGuid.ToString());
    var identity = user.GenerateUserIdentityAsync(userManager).Result;
    HttpContext.Current.User = new RolePrincipal(identity);
  }
}

Doing this with a Contact that’s been created via the Adxstudio registration thing works just fine – but trying to do it with a ‘vanilla’ contact blows up:

As you can see from that stack trace, deep down buried under several layers of Adxstudio and ASP.NET Identity code, something is trying to construct a System.Security.Claims.Claim() instance and it’s blowing up because we’re passing in a null value for something that’s not allowed to be null. Unfortunately for us, because we don’t have the source for the thing that’s actually blowing up, we can’t see what the actual parameter values are that are causing the exception… so it’s time for a bit of hunch-driven development. :)

I’d already noticed that when you install Adxstudio into your CRM system, it adds a bunch of custom attributes to the Contact entity in Dynamics CRM; here’s a dump of those attributes for a working contact:

| Attribute Key | Value |
| --- | --- |
| adx_changepasswordatnextlogon | False |
| adx_identity_emailaddress1confirmed | False |
| adx_identity_lockoutenabled | True |
| adx_identity_logonenabled | True |
| adx_identity_mobilephoneconfirmed | False |
| adx_identity_passwordhash | (omitted for security reasons) |
| adx_identity_securitystamp | ca49a664-0385-4eb4-90c0-6283c9e704ea |
| adx_identity_twofactorenabled | False |
| adx_identity_username | ali_baba |
| adx_lockedout | False |
| adx_logonenabled | False |
| adx_profilealert | False |
| adx_profileisanonymous | False |
| adx_profilemodifiedon | 2016-07-20 09:18:43 |

The ASP.NET Identity model is generally pretty flexible, but I have a hunch that the username and the security stamp are both required fields because they’re fundamental to the way authentication works. So, let’s try inserting some code into the AuthenticateRequest handler that will check these fields exist, and update them directly in CRM if they don’t:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
  Guid userGuid;
  var cookie = Request.Cookies["crm_contact_guid"];
  if (cookie != null && Guid.TryParse(cookie.Value, out userGuid)) {
    var http = HttpContext.Current;
    var owin = http.GetOwinContext();
    var userManager = owin.Get<ApplicationUserManager>();
    var user = userManager.FindById(userGuid.ToString());
    if (String.IsNullOrEmpty(user.UserName)
        || String.IsNullOrEmpty(user.SecurityStamp)) {
      // "Xrm" is the connection string name from web.config
      using (var crm = new OrganizationService("Xrm")) {
        var entity = crm.Retrieve("contact", userGuid, new ColumnSet(true));
        entity.Attributes["adx_identity_securitystamp"] =
          Guid.NewGuid().ToString();
        entity.Attributes["adx_identity_username"] =
          Guid.NewGuid().ToString().Substring(0, 8);
        crm.Update(entity);
      }
    }

    var identity = user.GenerateUserIdentityAsync(userManager).Result;
    HttpContext.Current.User = new RolePrincipal(identity);
  }
}

(For the sake of this demo, all we care about is making sure those values are no longer null. In reality, make sure you understand the significance of the username and security stamp fields in the identity model, and populate them with suitable values.)

OK, this now works sometimes – but only following an IISRESET. Turns out that Adxstudio is actually caching data from CRM locally, so although that new chunk of code is updating the Contact entity into a valid identity, the Adxstudio local cache doesn’t see those changes because it’s looking at an out-of-date copy of the Contact entity. So… time to configure some cache invalidation.

You can read about Adxstudio’s web notifications feature here. Adxstudio includes some code that will call a cache invalidation handler on your own site every time an entity is updated. Which works just fine IF CRM Online can see your Adxstudio portal site. And right now I’m running CRM Online as a 30-day trial and I’m running Adxstudio on localhost, and my workstation isn’t on the internet, so CRM Online can’t see it.

Time to fire up my favourite toolchain – Runscope and Ngrok. First, I’ve set up ngrok so that any requests to mytunnel.ngrok.io will be forwarded to my local machine – you’ll need a paid ngrok license to use custom tunnel names, but if you’re using the free version try this:

D:\tools\ngrok> ngrok http -host-header=adx.local 80


Now, as long as that NGrok process is running, you can hit that URL – http://ef736c25.ngrok.io/ – from anywhere on the internet, and it’ll be tunneled to localhost on port 80 and have the host-header rewritten to be adx.local. This neatly solves the problem of CRM Online not being able to connect to my local Adxstudio instance.

Next, just to give us a bit of insight into what’s going on, I’m going to set up a Runscope bucket for that. Remember – we need to route requests to /cache.axd on our local Adxstudio portal instance, via ngrok, so here’s how to get the Runscope URL you’ll need:


So, last step – you see that big URL in the middle?  We need to tell the Adxstudio Web Notifications plugin to notify that URL every time something changes. The option is under CRM > Settings > Web Notification URLs.

Note that the Adxstudio documentation refers to a Configuration screen accessible from the Solutions > Adxstudio Portals Base. It appears this screen doesn’t exist any more – I certainly couldn’t find any trace of it in my CRM Online instance – but it also appears it isn’t necessary, because as soon as I’d created an active Web Notification URL, things started happening.

So, now we have something that works – but it still fails on the first Portal request for a particular Contact, probably because the Adxstudio cache isn’t picking up those two new fields fast enough for the login to succeed. To work around this, I’ve put in a thread sleep and then an HTTP redirect, so the first time a user lands on the portal they’ll get a slight delay whilst we populate their Adxstudio attributes, and then they’ll get their personalised screen:

protected void Application_AuthenticateRequest(object sender, EventArgs e) {
  Guid userGuid;
  var cookie = Request.Cookies["crm_contact_guid"];
  if (cookie != null && Guid.TryParse(cookie.Value, out userGuid)) {
    var http = HttpContext.Current;
    var owin = http.GetOwinContext();
    var userManager = owin.Get<ApplicationUserManager>();
    var user = userManager.FindById(userGuid.ToString());
    if (String.IsNullOrEmpty(user.UserName)
        || String.IsNullOrEmpty(user.SecurityStamp)) {
      using (var crm = new OrganizationService("Xrm")) {
        var entity = crm.Retrieve("contact", userGuid, new ColumnSet(true));
        entity.Attributes["adx_identity_securitystamp"] =
          Guid.NewGuid().ToString();
        entity.Attributes["adx_identity_username"] =
          Guid.NewGuid().ToString().Substring(0, 8);
        crm.Update(entity);
        Thread.Sleep(TimeSpan.FromSeconds(5));
        // Redirect back to the same page, so that Adxstudio will
        // retrieve a fresh copy of the cached data.
        http.Response.Redirect(http.Request.RawUrl);
      }
    }
    var identity = user.GenerateUserIdentityAsync(userManager).Result;
    HttpContext.Current.User = new RolePrincipal(identity);
  }
}

And it works. The final step for me was to spin up a separate web app that lists all the Contacts in the CRM system, with a login handler that puts the CRM Contact GUID into a cookie and redirects the browser to http://adx.local/ – and it works. No registration, no login, and any user with a valid CRM Contact GUID can now log directly into the Adxstudio MasterPortal example.
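
That listing app isn’t shown here, but the login handler is only a few lines – here’s a rough sketch of the shape of it. The controller and action names are made up for this example, the cookie name matches the Application_AuthenticateRequest code above, and it assumes the listing app and the portal are arranged so the portal can see the cookie (fine for a localhost proof-of-concept, not something you’d ship):

public class LoginController : Controller {
  // Hypothetical login action: drop the CRM Contact GUID into the cookie
  // that Application_AuthenticateRequest reads, then bounce the browser
  // across to the Adxstudio portal.
  public ActionResult Impersonate(Guid contactId) {
    var cookie = new HttpCookie("crm_contact_guid", contactId.ToString()) {
      HttpOnly = true
    };
    Response.Cookies.Add(cookie);
    return Redirect("http://adx.local/");
  }
}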