Wednesday, 27 April 2016

Hintjens

I’m at BuildStuff Vilnius, in November 2015. It’s Thursday night. Mark Rendle and I are doing our comedy panel game quiz thing. We found out about ten minutes ago that we’re doing our show in a nightclub, with no wi-fi, hardly any microphones and… basically it’s a bit of a train crash. And we’re hustling for volunteers to help us make the train crash funny. Between Mark’s friends and mine, we rope in half-a-dozen people. One of them is this weird tall Belgian guy. He’s good; they’re all good. He gets it. He’s funny and engaging and genuinely interesting. We somehow walk away with our collective dignity intact, and people even tell us afterwards that they loved it. Mark and I swear to each other we’re never doing that particular show in a nightclub ever again.

We’re still in Vilnius, it’s Saturday, and I’ve had the day off. I’ve been sightseeing. I’m tired and hungry, and I don’t want to just head back to the hotel, so I find a café with wi-fi, I look up Vilnius on MetalTravelGuide.com, and I find this bar – Bix Baras. I go in, I chat to the staff. Their English, like their beer, is excellent; my Lithuanan barely covers “hello” and “thank you” – but I eat lunch and have a few beers, and then head back to the hotel. When I get there, the tall Belgian guy from the quiz is playing the piano – one of those wonderful grand pianos that adorn hotel bars the world over without anyone ever really playing them. I ask if I can join him; he moves over, I sit down, we play for a while – he’s doing most of the work, I’m just bouncing along on the white notes, picking out pentatonic minor melodies that fit with what he’s doing. It’s fun. It’s nice, and it feels somehow conspiratorial – for starters, we’re playing one of those pianos that you probably walk past every day of your life and assume that it’s for Other People to play, and not for you, and yet here we are.

I go upstairs, change my shirt, ask Seb if he fancies heading downtown for a beer. We need a night off from the whole conference crowd, but on a whim I ask the Belgian guy on the piano if he wants to join us. He says yes, and introduces himself as Pieter. We’ve officially met. We jump in a cab and head downtown.

Bix Baras has good beer, and great snacks, and we talk – myself and Seb and Pieter. We drink, we eat hard cheese and pigs’ ears and Lithuanian dark bread. And it’s remarkable, because there’s no small talk. We talk about ideas, and we share experiences. The conversation is disarmingly easy. I’m not used to this. Most people at conferences talk about tech – about .NET or NodeJS or Docker. We talk about life. We talk about what we do, and why we care. We talk about friendship, and failed relationships, and psychopaths, and adventures. We walk up the road to the pool hall where some of the other BuildStuff gang are having drinks. Pieter and I get talking about speaking. Within the hour, he’s challenged pretty much every idea I’ve ever had about speaking and giving talks, but it doesn’t feel adversarial – there’s something genuinely inspirational about it. We finish our drinks and wander back to the hotel, but the conversation resonates.

Sunday, we fly to Kyiv – a whole crowd of us. I’m walking next to Pieter on the tarmac as we head out to our plane, and he’s talking about how much he’s enjoying the experience – “For the first time ever I feel like I’m on the road with my gang” – and I know exactly what he means. The sense of camaraderie is wonderful – 30-odd hardcore geeks heading out to Ukraine together – yet it somehow didn’t really click until Pieter pointed it out.

In Kyiv, we hang out. We chat. We talk about code, about community, about psychology. I watch his talk about building open source communities. From where I’m sitting, he appears to give a 50-minute talk with no notes and no slides, and solve a Rubik’s cube while he’s doing it. He confides in me afterwards that the cube was a bit of a stunt – shuffle it a couple of turns, memorise them, play them backwards on stage – but that almost doesn’t matter; the talk is brilliant, the audience are involved and engaged, and I’m sat there wondering how much of my life I’ve spent making Powerpoint slides, and why…

Eventually, Pieter turns our conversation in that bar in Vilnius into a blog post – Ten Steps to Better Public Speaking – which is simultaneously gratifying and terrifying. Gratifying that he thinks our conversation is interesting enough to warrant an entire blog post. Terrifying, because when you’re name-checked in a post like that, the only thing you can really do is rise to the challenge, and that means I’m gonna need to REALLY work hard on… well, on every talk I ever give again.

Months pass. I think often of our conversation in that pool hall in Vilnius and the blog post that followed. One day, I email Pieter – “Hey, remember that chat in Vilnius? Do you fancy doing a joint talk at NDC Oslo?” He says yes, I write something up, I send it over, and start worrying about the fact I’ll be sharing a stage with the great Pieter Hintjens – and about the fact I’ve signed up to give a talk that’s gonna drag me out of my comfort zone in almost every way.

By chance, I’m in Brussels in March, en route to a long weekend in Leuven with my girlfriend. I email Pieter, we arrange to meet for lunch: we talk about ideas. He’s riffing on ideas – about opening an office in Brussels for people who need a place to hack; about using mesh networking to build “smart chairs” that tell the pavement café when they need replacing; about speaking and software and people and life. He talks about his father, about euthanasia, about family. We talk briefly about our joint talk and NDC, but not too much; after all, too much rehearsal would undermine the vulnerability. And we part with a hug, and a promise to see each other in Oslo.

I watch Pieter and @jesslynnrose joking on Twitter about gender-swapped TV shows. Pieter posts this: prescient, or just meditative? Then on March 26th, following a whole lot of the kind of fallout that just doesn’t fit into 140 characters, Pieter announces he’s leaving Twitter. I’m sad to see him go, but have no doubt I’ll have many more evenings hanging out and having my preconceptions challenged by this remarkable individual.

Then I get an email. The subject just says “NDC” It reads:

“Hi Dylan,

Seems my cancer has come back... still waiting for detailed prognosis and next steps. Looks pretty bad atm. In any case, no travel for me for the next months.

You're going to have to do the talk by yourself. Stick to the ten rules, watch my Serbian video a couple of times and you'll do fine. :)

Sorry about this.”

I don’t care about the talk. I’m worried about my friend – this sounds bad. I email him back. He replies. Time passes. He rejoins Twitter, because it’s a good way to connect with a lot of people who want to know what’s going on. And then he posts this:

We will try chemotherapy. It's palliative, there is no cure for this. So, time to start saying goodbye.

And then he posts “A Protocol for Dying”, and it’s pretty clear that this is it. One way or another, it won’t be long before Pieter’s not around any more. And people start talking, and posting, and tweeting… and before long, a common thread emerges. It seems you really didn’t need to spend very much time with Pieter for him to leave a lasting impression.

I spent five days with Pieter late last year, and had lunch with him once, a few months ago. I’ve never visited his house, never met his family, never collaborated with him – but the time I’ve spent with him and the conversations we’ve shared have been some of the most profoundly challenging and inspiring interactions I’ve had in a very long time. And it’s not just me. There are countless comments on Pieter’s most recent blog posts from people who met him once or twice – or not at all, in the case of the people who know Pieter through email and through his code – but whom nevertheless believe that knowing him has had a profound impact on their life.

I was in a restaurant earlier tonight, with my girlfriend, Clare, and some of my cow-orkers. We ended up talking about Pieter. Clare met Pieter briefly, for about five minutes, in Bruxelles-Midi railway station back in March. At the time, Clare was feeling completely freaked out at being in an unfamiliar country where she didn’t speak the language or know how things worked, and my meeting up with this weird guy who “looked really stern” didn’t help at all. Pieter warned us (a pair of hardcore Londoners) about the risks and dangers of hanging out in the station, and then helped Clare find her train to Leuven.

When I got that first email from Pieter, I told Clare. When I saw his Twitter post, and when he posted “A Protocol for Dying”, I told Clare – and she’d already read it. And then she said to me tonight “I want to email Pieter. I don’t know him, but I know what’s happening, and I just want him to know that I’ll remember him next time – probably every time? – that I go through Brussels, and I hope one day I’ll be a bit more badass – just like he is.”

So here’s to Pieter, and here’s hoping that long after he’s stopped coding and tweeting and blogging, he’ll still be inspiring all of us to open up, to embrace our vulnerability and “to be a bit more badass”.

Tuesday, 19 April 2016

Coming Soon To A City Near You…

Following on from "The Rest of REST" talk that I've given at several conferences over the last year, I'll be talking about real-world REST at several user groups over the next few months. In this talk, I'll be exploring some REST architectural patterns in more depth, and doing some hands-on demos showing how you can implement these patterns in your own applications.

For this demo, I'll be using NancyFX and the HAL+JSON hypermedia language – there's a beautiful synergy between the dynamic type model in C#, the dynamic model used in NancyFX and the way HAL exploits the dynamic features of JavaScript to extend JSON into a powerful hypermedia format, and I'll be showing how you can wire all these bits together to build flexible, RESTful HTTP APIs.

May 9th I'll be at Smart Devs User Group in Hereford.

May 18th I'll be at The Dev Bakery in Altrincham.

May 26th, I'll be at the Copenhagen .NET User Group in Denmark – and many thanks to the team over at Siteimprove for hosting and sponsoring the event.

Web Governance Tools | Siteimprove

As always - if your user group would be interested in a session on this or one of my of other talks, get in touch!

I'll also be appearing at these conferences throughout 2016:

8/9/10 June I’ll be at NDC { Oslo } in Norway, and I’ll be sticking around for PubConf at NDC { Oslo } on 11th June – the PubConf event in London here in January was an absolute blast, so if you’re going to be in Oslo for NDC, book yourself a hotel for the weekend, see some of the city, and come to PubConf.

22-24 June I’ll be at the Progressive.NET Tutorials at SkillsMatter here in London, running a deep-dive hands-on workshop which will probably be on async/await and asynchronous programming in C# – watch this space for further announcements as we finalise the programme.

8-9 July I’ll be at BuildStuff Odessa – following the successful and hugely enjoyable BuildStuff conference in Kyiv last year, I’m really excited to be going back to Ukraine for a weekend of code, games and sunshine by the sea.

13-15 July I’ll be at FullStack 2016 in London, the conference on “JavaScript, NodeJS and the Internet of Things” – I’ve been helping the team at SkillsMatter finalise the programme for this one, we’ve got some absolutely excellent speakers and sessions lined up, and it promises to be a great event.

3/4/5 August I’m going to be at NDC { Sydney }, joining an amazing programme of speakers to bring the NDC experience to the southern hemisphere.

See you on the road!

National Flag of Denmark  Norwegian Flag Image of National Flag Flag of the United Kingdom  Australian Flag

Friday, 15 April 2016

The Axosoft GitKraken Songwriting Battle!

A few weeks ago, I stumbled across GitKraken – a cross-platform GUI for Git that looks, well, lovely. ‘Cos git is a wonderful, amazing, powerful system, but it’s about time somebody added a little, well, beauty around the place.

image

So I’m using GitKraken daily now, and loving it, and then I find out that they’re having a contest. A song-writing contest. About revision control systems. Now, I love music. I play instruments. I write songs. I write parody songs about tech stuff. There is absolutely no way I can see a songwriting contest about revision control and not enter it… so I did. And here’s my entry. A completely original song, written and recorded for the contest, and called, simply, Git Kraken.

Friday, 8 April 2016

Exupérianism: Improving Things by Removing Things

Last night this popped up on Twitter:

Last year, as part of migrating our main web stack to AWS, we created a set of conventions for things like connection strings and API endpoint addresses across our various environments, and then updated all of our legacy systems to use these conventions instead of fragile per-environment configuration. This meant deleting quite a lot of code, and reviewing pull requests with a lot more red lines than green lines in them – I once reviewed a PR which removed 600 lines of code across fifteen different files, and added nothing. No new files, no new lines - no green at all – and yet that change made one of our most complicated applications completely environment-agnostic. It was absolutely delightful.

When I saw John's tweet, what instantly came to mind was a quote from Antoine de Saint-Exupéry:

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."

So how about we adopt the term "Exupérian" for any change which improves something by making it smaller or simpler? The commit that removes 600 lines of unnecessary configuration. The copy-edit that turns fifteen thousand words of unstructured waffle into ten thousand words of focused, elegant writing. Maybe even that one weekend you spent going through all the clutter in your garage and finally getting rid of your unwanted lamps and old VHS tapes.

Saint-Exupéry was talking about designing aircraft, but I think the principle is equally applicable to software, to writing, to music, to architecture – in fact, to just about any creative process. I was submitting papers to a couple of conferences last week, and discovered that Øredev has a 1,000-character limit for session descriptions. Turns out my session descriptions all end up around 2,000-3,000 characters, and editing those down to 1,000 characters is really hard. But - it made them better. You look at every single word, you think 'does it still work if I remove this?', and it's surprising how often the answer is 'yes'.

Go on, give it a try. Do something #exuperian today. Edit that email before you send it. Remove those two classes that you're sure aren't used any more but you're scared to delete in case they break something. Throw out the dead batteries and expired coupons from your desk drawer. Remove a pointless feature nobody really wants.

Maybe you even have an EU cookie banner you can get rid of? :)

Wednesday, 23 March 2016

How to *really* break the internet.

Yesterday someone broke the internet, and this time it had nothing to with Kim Kardashian. According to The Register and various other sources, developer Azer Koçulu removed over 250 of his own modules from NPM, the package management platform used by almost all open-source JavaScript projects. Among the projects he removed was left-pad, a tiny library that performs string padding. Turns out that literally thousand of open-source projects depend on left-pad – over 2 million downloads last month – and with left-pad removed from NPM, suddenly none of these projects would build any more. Oops.

It's worth remembering this is JavaScript, and one of the many things that makes JavaScript so entertaining is that it doesn't have a standard runtime library. In every other language I have ever used, string padding is either built-in, or it's part of a standard runtime that's available locally on every workstation and build server. But not in JavaScript – if you want to pad a string in JS, you either write your own function, you copy & paste one from StackOverflow, or you import a package to do it for you, and based on the fallout from yesterday it looks like a lot of people went for the package option.

A typical package repository.

Now, package management and dependency management is hard. Projects like NPM and RubyGems and NuGet appear to have made it a lot easier, but in most cases what they're actually doing is asking us to relinquish control of elements of our own projects in exchange for convenience. And they are convenient - as long as you're online, and everything's working, and everybody's being friendly, it's great – but I don't think we, as an industry, have done nearly enough to understand what happens if those assumptions turn out to be false.

Say you're about to get on a long-haul flight; just before they call your gate, you git clone a couple of projects onto your laptop and then board the plane. Would you be able to build the project once in flight? I suspect in many cases the answer is "no", because standard practice these days is to download all the required libraries as the first stage of the build process – and that means you can't build your project without being online.

Ok, that one's easy. This time round you shrug and watch the inflight movies instead, and next time you remember to do a full build before you go offline. Not a big deal. But what would happen if nuget.org was offline? Online services go dark for all sorts of reasons - infrastructure problems, legal action, DoS attacks. Most of the time, they come back, but not always – in 2014, Code Spaces was completely wiped out by an attacker who gained access to their AWS account. Does your build pipeline quietly rely on the fact that nuget.org isn't going to go away? And if it did, how long would it take to get things back up and running?

View post on imgur.com

But NuGet or NPM going down isn't even the worst case scenario. According to NuGet.org, the most download packages over the last six weeks are:

  1. NewtonSoft.Json (2.4M downloads)
  2. jQuery (1.2M downloads)
  3. Microsoft.AspNet.Mvc (993K downloads)
  4. Microsoft.AspNet.Razor (987K downloads)
  5. EntityFramework (974K downloads)

Now, let's imagine a Nefarious Super Villain breaks into James Newton-King's house one night and forces him at gunpoint to deploy a new point release of NewtonSoft.Json. A release that works perfectly, only every time you call JsonConvert.Serialize(), it sneakily posts a copy of the JSON output to an IP address in Russia. How long before you ended up running this malicious code on your own production systems? How long before you noticed something was amiss – assuming you ever noticed at all? OK, somebody would notice eventually – that's the beauty of open-source, after all – but what about closed-source libraries? If you're using EntityFramework, I'd wager good money that you're using a single set of database credentials that has read/write/delete access to all your data – and trusting that the code isn't going to do anything unpleasant.

EDIT: Demis Bellot pointed out on Twitter after I first posted this article that source code is just one of many attack vectors that exploit our faith in package repositories. Unless you're comparing checksums or building your own reference binaries, you're blindly trusting that the source you're reading on Github is the same source that was used to built the binaries that are now running on your production servers.

There's ways around some of this. At the very least, cache the packages used by your recent builds in case you need to run them again. Most package repositories run over HTTP and use stable URLs, so all you really need is a caching proxy between your build pipeline and your package repo servers. Here at Spotlight, we run a NuGet server on our LAN that hosts our own packages, and we've also set up TeamCity so that following a successful build, all the .nupkg files (including the dependencies used by the build) are published to our local NuGet server. We actually started doing this by mistake – we set up a build to publish our own .nupkg files to our server and then realised afterwards it had also picked up all the package dependencies – but it actually works pretty well, and it means that if nuget.org was to go dark for a while, we could still build & deploy software as long as we didn't update any package dependencies.

As for package dependencies as an attack vector for malicious code? That one's a lot harder. We often do some ad-hoc traffic inspection as part of our review process – fire up the application with SQL Monitor and Fiddler running, take a look at the network traffic, database activity, debug logs and so on just to make sure we haven't done something stupid – but it's interesting to think how this could be turned into something more rigorous.

But as with so much else, it boils down to a choice – trust people, or do everything yourself. One is a calculated risk, one is a guaranteed expense, and it's managing that balance between risk and cost that makes IT – and business – so endlessly fascinating.

Monday, 29 February 2016

The Mysterious Case of the Latin General: Collations in SQL Server

Computers deal in absolutes. True, false, yes, no. So when you ask a computer if two things are the same, it can’t get away with “well, sort of… it depends, really; why are you asking?”.

But we use computers to model reality, and reality is a horrible mess of edge cases, quirks and idiosyncracies. Today is Leap Year Day – February 29th. Which means you can do this:

var today = DateTime.Now.AddYears(1).AddYears(-1);
var yesterday = DateTime.Now.AddDays(-1);
if (today == yesterday) { Console.WriteLine("In your face, Einstein!"); }

And if you think that's messy, wait until you start looking at string comparisons. Which, by happy coincidence, was the thing on the top of my stack when I got into work this morning.

OK, pop quiz time! Are these statements true or false?

  1. "foo" == "foo"
  2. "FOO" == "foo"
  3. "café" == "cafe"
  4. "œnophile" == "oenophile"
  5. "strasse" = "straße"
  6. "SIGURÐAR" == "SIGURDAR"

Well? How did you do? See, the answer in almost every case is that lovely old chestnut "it depends". And when you're dealing with text columns in Microsoft SQL Server, the thing it depends on is called a collation. A collation is a set of rules that defines whether or not two strings should be considered to be equal, and what 'alphabetical order' means for ordering a given set of strings. For years, my team has run most of our databases systems with a collation called SQL_Latin1_General_CP1_CI_AI – and the important part there is the CI_AI at the end, which indicates that this is a case-insensitive (CI), accent-insensitive (AI) collation, which means that if you search our database for 'CAFE', you'll find records containing 'café', because the case-insensitive rule says that "C" and "c" are the same, and the accent-insensitive part says "ignore the accent – pretend that é is the same as e"

Now, in recent versions of SQL Server, Microsoft has introduced a new set of collations to choose from – which means that even though we've already decided we want a Latin alphabet, with case-insensitive and accent-insensitive string comparisons, we still have a choice of six:

SELECT * FROM fn_helpcollations() WHERE name LIKE '%Latin1_General%CI_AI'

Latin1_General_CI_AI

Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive

Latin1_General_100_CI_AI

Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive

SQL_Latin1_General_CP1_CI_AI

Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 54 on Code Page 1252 for non-Unicode Data

SQL_Latin1_General_CP1253_CI_AI

Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 124 on Code Page 1253 for non-Unicode Data

SQL_Latin1_General_CP437_CI_AI

Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 34 on Code Page 437 for non-Unicode Data

SQL_Latin1_General_CP850_CI_AI

Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 44 on Code Page 850 for non-Unicode Data

What I'm specifically interested in here – because it's my job as architect to decide this kind of thing - is why we might choose to use the new Latin1_General_CI_AI collation instead of the tried'n'tested SQL_Latin1_General_CP1_CI_AI collation. So I did a little Googling, and dug up this excellent article, which explains a bit about how these collations differ when it comes to Unicode character expansions. See, there are quite a lot of 'letters' which can be written as either one or two characters, depending on your cultural conventions (and what kind of keyboard you've got). Typesetting purists will refer to archæologists and œnophiles, whilst the rest of us have to settle for archaeologists and oenophiles because we don't know how to type "æ" and "œ". And then there's the German "sharp S", which is written as either 'ß' or 'ss' depending on who's writing it.

Now here's the real-world scenario where this matters. Let's say we've got a database of rock music clubs, and someone's added Roadrunner's Paradise in Berlin, and – because they're German, typing in German on a German keyboard – they're put the address in as "Saarbrücker Straße 24, 10405 Berlin". Now one of our British users is on holiday, and they're in Berlin, and they're trying to find out if there are any rock clubs on Saarbrücker Straße. Except because they've got an English guidebook and an English keyboard, they search for "saarbrucker strasse", because they don't know how to type the umlaut or the sharp-S 'ß' character.

Whether their search works or not can potentially depend on several things

  • The character type (varchar or nvarchar) of our StreetAddress column
  • The collation of our StreetAddress column
  • The character type specified in the query – SQL will assume string literals are ANSI strings unless you prefix them with N, so LIKE 'saarbrucker strasse' is different to LIKE N'saarbrucker strasse'
  • Whether or not we explicitly specify a collation in our WHERE clause.

OK, so we have two different column types (varchar vs nvarchar), we have two different collations (Latin1 vs SQL_Latin1), we have two different ways of specifying our search text (N'foo' vs 'foo'), and we have three different ways to specify collation in the query (explicit Latin1, explicit SQL_Latin1, or implicit) - meaning there are 2*2*2*3 = 24 different ways to store and query the data supporting this particular scenario.

So let's try them all. For SCIENCE. Code to create and run these examples is at https://gist.github.com/dylanbeattie/c9705dc5c4efd82f60b8, and when you run it, it works like this:

Column Type

Colum collation

Search text

Query collation

Did it work?

nvarchar

Latin1

'%saarbrucker strasse%'

(implicit)

Yes

nvarchar

Latin1

'%saarbrucker strasse%'

Latin1

Yes

nvarchar

Latin1

'%saarbrucker strasse%'

SQL_Latin1

Yes

nvarchar

Latin1

N'%saarbrucker strasse%'

(implicit)

Yes

nvarchar

Latin1

N'%saarbrucker strasse%'

Latin1

Yes

nvarchar

Latin1

N'%saarbrucker strasse%'

SQL_Latin1

Yes

nvarchar

SQL_Latin1

'%saarbrucker strasse%'

(implicit)

Yes

nvarchar

SQL_Latin1

'%saarbrucker strasse%'

Latin1

Yes

nvarchar

SQL_Latin1

'%saarbrucker strasse%'

SQL_Latin1

Yes

nvarchar

SQL_Latin1

N'%saarbrucker strasse%'

(implicit)

Yes

nvarchar

SQL_Latin1

N'%saarbrucker strasse%'

Latin1

Yes

nvarchar

SQL_Latin1

N'%saarbrucker strasse%'

SQL_Latin1

Yes

varchar

Latin1

'%saarbrucker strasse%'

(implicit)

Yes

varchar

Latin1

'%saarbrucker strasse%'

Latin1

Yes

varchar

Latin1

'%saarbrucker strasse%'

SQL_Latin1

No

varchar

Latin1

N'%saarbrucker strasse%'

(implicit)

Yes

varchar

Latin1

N'%saarbrucker strasse%'

Latin1

Yes

varchar

Latin1

N'%saarbrucker strasse%'

SQL_Latin1

Yes

varchar

SQL_Latin1

'%saarbrucker strasse%'

(implicit)

No

varchar

SQL_Latin1

'%saarbrucker strasse%'

Latin1

Yes

varchar

SQL_Latin1

'%saarbrucker strasse%'

SQL_Latin1

No

varchar

SQL_Latin1

N'%saarbrucker strasse%'

(implicit)

Yes

varchar

SQL_Latin1

N'%saarbrucker strasse%'

Latin1

Yes

varchar

SQL_Latin1

N'%saarbrucker strasse%'

SQL_Latin1

Yes

So there's the difference. If you're dealing with varchar columns (as opposed to nvarchars), and you specify your query text as an ANSI string (as opposed to a Unicode string), then you need to use the Latin1_General_CI_AI collation if you want "strasse" to match "straße"; in all other scenarios, 'ss' is equal to 'ß' (and apparently this is applies to all Latin collations regardless of accent sensitivity).

Of course, this works both ways – by using Latin1_General_CI_AI, you create a scenario where somebody searching for '%aße%' will find English words like 'assessment' and 'molasses', because as far as SQL Server is concerned, 'aße' and 'asse' are the same string. If that's a problem for your particular scenario, you'll need to do something like restrict your search inputs to the basic ASCII character set.

As with almost everything in software development, modelling the real world involves making compromises. It's our job to make sure we're aware of those compromises, and that we understand how they translate back into real-world behaviour, and it's up to us to explain them to users who think our software sucks because it doesn't know what day it is and it says there's no rock bars in Saarbrucker Strasse.

Happy Leap Year Day!

Tuesday, 23 February 2016

Better Hypermedia Through Obfuscation

Here’s a fun and completely twisted idea that came to me on the train this morning. One of the constraints of ReSTful architectures is that they use hypermedia as the engine of application state – in other words, clients shouldn’t maintain their own information about where they “are” and what resources and actions are available to them right now; they should rely on hypermedia embedded in the current resource representation they’ve retrieved from the server. A common example of this is pagination – navigating big lists of resources, using a representation something like this example:


  GET /people

  "_embedded": {
    "people": [ /* array of 10 people */ ]
  },
  "_links": {
    "first": { "href": "/people?page=0" },
    "last": { "href": "/people?page=12" },
    "next": { "href": "/people?page=1" },
    "self": { "href": "/people?page=0" }
  },
  "count": 10,
  "total": 115
}
Now, with an API like this, it’s all too easy for the client – or rather, the person building the client – to go “ok, page number is a zero-based integer; let’s cut a few corners here” and just program something like

for(var i = 0; i < 12; i++) {
    http.get(“https://api.foo.com/people?page=”+page);
}
Now, I’m a big fan of something we call the "pit of success" – the idea being that we "build platforms such that […] developers just fall into doing the 'right thing'", and more generally, the idea that the easiest way to achieve something is also the ‘correct’ way to achieve that thing. So what if we intentionally obfuscate our APIs so that hypermedia navigation becomes easier than building a for() loop? By, for example, requiring that our page number is written out longform, instead of numerically? And, just for fun, we’ll require that it’s in a language that isn’t in the regular ASCII character set. Like Ukrainian:
{
    "first": { "href": "/people?page=один" },
    "last": { "href": "/people?page=дванадцять" },
    "prev": { "href": "/people?page=два" },
    "next": { "href": "/people?page=чотири" },
    "self": { "href": "/people?page=три" }
}
Suddenly, your ‘shortcut’ isn’t a short cut any more. For starters, you’ll probably need to install a special keyboard in order to type those characters – not to mention suddenly your source code files will need to be in UTF8, and I’ll wager that somewhere in your build pipeline there’s a tool that can’t handle UTF8 source code. And you’ll need a routine which can translate integers into Ukrainian string representations… No, the easiest thing to do now is to retrieve the first resource, parse the next.href property, retrieve that resource, and so on until you hit a resource with no next link. Which, of course, is exactly how hypermedia is supposed to work in the first place.