Friday, 21 April 2017

There’s a problem with the phalange!

Yesterday I was throwing together a quick Entity Framework prototype to inspect and wrangle some data held in one of our legacy databases. I’m using the Entity Framework “Code first from Database” approach, where you use the tooling to generate your initial model for you but thereafter you modify it by hand. One of the tables I’m working with here is called LookupRanges, so when I generated a bunch of entities and DbSet<> mappings, I was a bit surprised when I ended up with a class called LookupRanx in my new model.

It took a minute or two to ascertain that yes, Entity Framework had mapped my LookupRanges (plural) table name onto a class called LookupRanx. But where on earth did that ‘Ranx’ come from? My hunch here is that somebody who worked on this pluralization code remembered that the English word phalanx has the plural form phalanges – and so implemented a rule that says ‘any word ending in –anges should be singularizaed to –anx. Out of curiousity, I dug out my huge ASCII file of English words, found all the words ending in *anges, and hacked up a quick SQL script to create tables named for all these words so I could run them through EF and see what class names were generated.

Well, it gets ‘changes’ and ‘phalanges’ right – and gets literally every other case wrong.

image

Now this, to me, is an outstanding example of one of the biggest problems in software development… smart people like working on things that are interesting, and will frequently spend time doing something that’s interesting instead of something that’s important.

Writing a library that can singularize and pluralize English words is fascinating. It’s a never-ending problem with dozens of rules and hundreds of edge cases, and you learn a lot of weird and cool esoteric facts about language and etymology whilst you’re doing it. But in this instance, something started out as a good idea (“hey – wouldn’t it be cool if the model generator would convert plural table names to singular class names?”), and got bogged down in edge cases (“is the plural of ‘tableau’ really ‘tableaux’?”) and – in this instance – ended up with a bizarre bug because one of those so-called edge cases actually ended up breaking the default – and entirely correct – behaviour.

First, phalanges is extremely unlikely to ever show up as the name of a table in an Entity Framework database model. I can think of dozens of real-world scenarios where you’d end up with a table name ending with –Ranges, –Exchanges or –Interchanges, but I’m honestly struggling to think of any remotely likely scenario where you have a Phalanges table in a SQL Server database. This is the kind of thing where the ticket or the user story probably just says ‘implement pluralization’, and then there’s no sub-prioritization or further analysis about just how much pluralization needs to be implemented. Second – it’s kind of a stupid edge case. We’re not trying to win points on University Challenge here, we’re building software. In contemporary English, phalange is an acceptable singular form, and phalanxes is an acceptable plural form. There’s no reason at all why they needed to implement support for this particular edge case. And third: if you really found yourself in a scenario where you had to map the Phalanges table to the Phalanx class, you can just rename it. It’s easy. Visual Studio has first-class support for this kind of refactoring.

And, as if that wasn’t confusing enough, there’s actually source code for an EnglishPluralizationService.cs on Microsoft’s GitHub repository. Somebody obviously had a lot of fun building this, tracking down all those bizarre little edge cases like seraph/seraphim, hippopotamus/hippopotami – but according to this implementation, the plural of phalanx is…  go on. Go and take a look.

Now, though, I’m going to eat an oranx and check my email. Using Microsoft Exchanx, of course.

1 comment:

trysometruth said...

I couldn't agree more that auto-pluralization is a pretty mystifying way to use up resource time when creating developer tools. But I would give a low percentage to the notion that the human who was pouring over pluralization edge cases was doing so because it seemed like a cool and interesting thing to do. To me, this smells like management -- some manager decreed that this was something that should be done, and the person who got tasked with this could think of a hundred things they'd rather be doing. But when one is looking askance at the best usage of time in a busy world filled with important To-Do's, you have to admit writing a blog article with deep investigation into this subject might not rise to the top of importance, and I have to admit commenting on such an article is indicative of how much I deeply wish to procrastinate on my currently assigned task.