Thursday, 27 November 2008

Lies, Damned Lies, and Statistics

One of our big customer contracts is up for renegotiation next month. This involves pulling a list of all the search & site activity that originated from that customer over the last year, and then negotiating based on whether usage is up or down. Over the last few years we've seen 10%-15% increases from this particular account, year-on-year, which is good. Yesterday morning I ran the stats report and got this:

Not good. In fact, very very worrying indeed. Whilst the marketing team went into crisis mode to work out what the hell we were going to do if this was real, I started double-checking to make sure this was genuine. It certainly looked genuine. The graph is horribly organic, the way the decline is gradual, occasional peaks and troughs, but with a very, very definite downward trend. In my experience, when software fails, it tends to fail in big straight lines - everything just stops working completely and stays there.

Turns out the stats were wrong - huge sigh of relief all round - but the reason why they were wrong is, I think, quite interesting. These statistics are calculated using some custom logging routines in our (legacy ASP) web code. When a user first hits the site, we create a record in the UserSession table in our database that stores their IP address, user agent string, user ID, and so on. There are some counter fields in that table that are incremented over the course of the session as the user accesses particular resources, so we can build up a fairly accurate picture of which resources get accessed heavily, by whom, and at what times throughout the day.

Well, it turns out our CreateUserSession() routine was failing if the browser's UserAgent string was more than 127 characters. Historically, this was never a problem, but at some point last year Microsoft started putting all sorts of information about .NET framework versions and plugins into the HTTP_USER_AGENT header sent by Internet Explorer (Scott Hanselman has a great post about this if you're interested). As various updates were pushed out to our users via Windows Update and corporate rollouts, the user agent strings were getting longer and longer, until one day they'd exceed 127 characters - and that particular PC would stop showing up in our logs. Whenever they'd roll out new hardware, we'd see the stats increase temporarily, until those new boxes were upgraded and the same thing happened. Hence the gradual decline and the fact that non-IE users were unaffected.

We would have noticed this a long time ago, of course - but the CreateUserSession() call was wrapped in a try/catch block that called a notification function when it caught an exception, and somewhere along the line, the notification mechanism for this particular system had been commented out. I'd love to blame someone else for this, but Subversion has a commit with my name on it sometime last year with the relevant line mysteriously commented out.
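Here's a minimal sketch of that failure mode, in Python rather than the legacy ASP the real site uses. Everything in it is made up for illustration (insert_session, notify, the two create_user_session_* functions); only the 127-character limit comes from the story above. The first version swallows the exception the way our code did; the second truncates the user agent to fit and reports anything that still goes wrong.

```python
MAX_USER_AGENT_LENGTH = 127  # the limit described in the post

sessions = []       # stands in for the UserSession table
notifications = []  # stands in for whatever alerting the real system uses


def insert_session(ip, user_agent):
    """Hypothetical insert that rejects over-long values, like a strict column."""
    if len(user_agent) > MAX_USER_AGENT_LENGTH:
        raise ValueError("user agent exceeds %d characters" % MAX_USER_AGENT_LENGTH)
    sessions.append({"ip": ip, "user_agent": user_agent})


def notify(message):
    """Hypothetical notification hook; here it just records the message."""
    notifications.append(message)


def create_user_session_silent(ip, user_agent):
    """The failure mode: the exception is caught and nobody is told."""
    try:
        insert_session(ip, user_agent)
    except ValueError:
        pass  # the notify(...) call was commented out, so the session vanishes


def create_user_session_safe(ip, user_agent):
    """Truncate to fit the column, and report anything that still fails."""
    try:
        insert_session(ip, user_agent[:MAX_USER_AGENT_LENGTH])
    except ValueError as error:
        notify("CreateUserSession failed: %s" % error)
```

With a 200-character user agent, the silent version records nothing and raises no alarm, which is exactly the sort of gradual, organic-looking decline we saw; the safe version records the session with a truncated string.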

I believe the kids are calling that an "epic fail". I believe they have a point.

Wednesday, 19 November 2008

HQL-lo World

I've been playing with Castle ActiveRecord for a project I'm working on, and hit a brick wall earlier tonight that left me completely stuck for a couple of hours... and turned out to be incredibly simple and obvious. Turns out I'd refactored one of my business objects - from Page to CmsPage - and hadn't noticed that in one particular place in the code, I was doing this:

var rootPages = new SimpleQuery<CmsPage>(@"from Page p where p.Parent is null");
return (rootPages.Execute());

The Execute() call there was throwing an ActiveRecordException that just said {"Could not perform ExecuteQuery for CmsPage"} - no InnerException, nothing showing up in SQL Profiler, nothing except a bunch of query strings that all looked fine to me:


Even enabling ActiveRecord logging (which was wonderfully easy, by the way) didn't help - I couldn't see anything obviously amiss in the NHibernate logs.

Turns out I'd not yet got my head around a fundamental concept of object-relational mapping, namely that you are querying your objects, not your database. The string literal in the SimpleQuery definition that looks a bit like LINQ is HQL - Hibernate Query Language. I'd used the [ActiveRecord(Table="Page")] attribute to map the renamed class to the underlying DB table, which is still called Page, and it just completely didn't occur to me that the HQL query needs to be changed to reflect the new class name. Change that query to

var rootPages = new SimpleQuery<CmsPage>(@"from CmsPage p where p.Parent is null");

and it works as intended. I fear this ORM stuff is going to take some getting used to...

Saturday, 8 November 2008

Usability Tip of the Day: Label your Form Elements, Dammit.

I see high-profile, expensive, shiny, corporate websites all the time that don’t label their form inputs. It’s easy. It’s accessible. And – in the case of checkboxes and radio buttons, where the form inputs themselves are tiny – it’s massively helpful, because in almost every modern browser, you can click the label instead of having to click the actual form element. It’s staggering that so-called professional web developers don’t label their form elements properly. Here’s how you do it:

<input type="radio" id="beerRadioButton" name="beverage" value="beer" />
<label for="beerRadioButton">Beer</label>
<input type="radio" id="wineRadioButton" name="beverage" value="wine" />
<label for="wineRadioButton">Wine</label>

(In ASP.NET WebForms, if you set the AssociatedControlID of an <asp:Label /> control, it’ll render an HTML label element with the correct for="" attribute; if you omit the AssociatedControlID attribute, it won’t even render as a label, just a span...)

Here's a form example without the labels wired up properly:

And here are the same radio buttons, with the labels wired up properly. See how in this example, clicking the labels will select their associated radio-buttons, but in the previous form, you have to actually click the radio-button itself?

Just do it, OK? It helps people using screen readers. It helps mobile browsers. And it helps people downloading Lego instructions after a couple of beers. Trust me.
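If you want to check a page for this automatically, something along these lines would do as a rough sketch. It uses Python's standard html.parser module; LabelChecker and unlabelled_inputs are names invented for this example, and it only looks for label elements with a for attribute, so inputs nested inside their label would be reported as unlabelled even though browsers treat them as labelled.

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collects ids of radio/checkbox inputs and the targets of <label for=...>."""

    def __init__(self):
        super().__init__()
        self.input_ids = []
        self.label_targets = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") in ("radio", "checkbox"):
            # An input with no id at all is recorded as None, and so is reported.
            self.input_ids.append(attributes.get("id"))
        elif tag == "label" and attributes.get("for"):
            self.label_targets.add(attributes["for"])


def unlabelled_inputs(html):
    """Return the ids of radio buttons and checkboxes that no label points at."""
    checker = LabelChecker()
    checker.feed(html)
    return [i for i in checker.input_ids if i not in checker.label_targets]
```

Feed it the markup for a form and it hands back the ids of any radio buttons or checkboxes that need a label.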

Working on ASP.NET MVC Beta and Preview code side-by-side

I have an app built against ASP.NET MVC Preview 3 that needs some tweaking, and I'm also working on a couple of projects using ASP.NET MVC Beta, so I'm in the slightly odd situation of trying to build and run preview 3 and beta projects side-by-side. (Yes, I will be updating this code to run against the beta version. I don't have time to do that this weekend, though, and I need some changes live before Monday afternoon.)

I've just checked out the preview 3 project to make some changes, and although it builds absolutely fine, I'm seeing the lovely Yellow Screen of Death when I try to run it:

Server Error in '/' Application.

Method not found: 'Void System.Web.Mvc.RouteCollectionExtensions.IgnoreRoute(System.Web.Routing.RouteCollection, System.String)'.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.MissingMethodException: Method not found: 'Void System.Web.Mvc.RouteCollectionExtensions.IgnoreRoute(System.Web.Routing.RouteCollection, System.String)'.

This is weird, because this code is deployed and running live on a box that doesn't have any versions of MVC installed; in theory, the project is entirely self-contained and XCOPY-deployable. First thing I tried was to shut down Visual Studio, uninstall ASP.NET MVC Beta, reinstall Preview 3, reload VS2008. That worked, so it's definitely the beta doing something strange. This project has hard-wired references to copies of the MVC assemblies in the \Dependencies folder of the solution, which are copied to the \bin folder during the build. It looks like the beta is installing something that's interfering with this process. Frustratingly, the installers also set up the MVC Web Application project type in Visual Studio, so although I can run the site without any versions of MVC installed, I can't open it in VS2008 because of the "project type is not supported" error.

Ok, first thing to realize is that, according to ScottGu's beta release blog post, the beta installs System.Web.Mvc, System.Web.Routing and System.Web.Abstractions to the GAC to allow them to be automatically updated. The preview versions of MVC would only install them to C:\Program Files\Microsoft ASP.NET\.

Given this particular chunk of web.config code:

 <compilation debug="true">
    <add assembly="System.Web.Mvc, Version=, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>

the runtime is going to use the first version of System.Web.Mvc matching the specified culture, version number and public key token. This is significant because the CLR checks the GAC first when resolving assembly references - and if it finds a matching assembly in the GAC, it won't look anywhere else. The ASP.NET MVC previews and beta release all use the same assembly version, culture and public keys, so the CLR has no way of distinguishing between the preview 3 version of System.Web.Mvc and the beta version of the same assembly. They're different DLLs with different file versions, but because the assembly version is the same, the CLR regards them as the same assembly.
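To make that concrete, here's a deliberately simplified model of the lookup, written in Python. It is not how the CLR is implemented, just the GAC-first behaviour described above boiled down to a dictionary lookup; Identity, resolve, the "X.Y.Z.W" version placeholder and the file names are all invented for the example.

```python
from collections import namedtuple

# An assembly's identity as the loader sees it. File version is NOT part of it.
Identity = namedtuple("Identity", "name version culture public_key_token")


def resolve(identity, gac, app_bin):
    """Return the file a GAC-first loader would pick for this identity."""
    if identity in gac:
        return gac[identity]      # a GAC match wins; nothing else is consulted
    return app_bin.get(identity)  # otherwise fall back to the app's bin folder


# "X.Y.Z.W" is a placeholder: both builds share the same assembly version.
mvc = Identity("System.Web.Mvc", "X.Y.Z.W", "neutral", "31bf3856ad364e35")

gac = {mvc: "beta build of System.Web.Mvc.dll"}
app_bin = {mvc: "preview 3 build of System.Web.Mvc.dll"}

with_beta_installed = resolve(mvc, gac, app_bin)  # picks the beta build
after_gac_removal = resolve(mvc, {}, app_bin)     # picks the preview 3 build
```

Because both builds map to the same key, the copy sitting in the application's bin folder is never even considered while the GAC holds a match.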

There are techniques you can use to override this behaviour, but, according to this thread on StackOverflow, these techniques only work if the assembly in the GAC has a different version to the assembly that's deployed with your application.

Ok - no problem, we'll just remove System.Web.Mvc from the GAC, by running gacutil.exe /u to uninstall it.

C:\Documents and Settings\dylan.beattie>gacutil /u system.web.mvc
Microsoft (R) .NET Global Assembly Cache Utility. Version 3.5.30729.1
Copyright (c) Microsoft Corporation. All rights reserved.

Assembly: system.web.mvc, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL
Unable to uninstall: assembly is required by one or more applications
Pending references:
SCHEME: <windows_installer> ID: <msi> DESCRIPTION : <windows installer>
Number of assemblies uninstalled = 0
Number of failures = 0

C:\Documents and Settings\dylan.beattie>

OK, that didn't work. Because we installed the ASP.NET MVC beta using Windows Installer, it's registered a dependency on System.Web.Mvc that means we can't uninstall it. So... registry hack time. This is the bit that might kill your PC, wife, cat, whatever. Editing the registry is dangerous and can cause all kinds of problems, so read this stuff first, and if it sounds like a good idea, proceed at your own risk.

Fire up regedit and navigate to HKEY_CLASSES_ROOT\Installer\Assemblies\Global, and you should find a key in there called


I deleted this key. I also got a bit carried away and deleted the key

System.Web.Mvc,,,31bf3856ad364e35,MSIL  from


as well... but I forgot to try gacutil /u first, so I don't know whether this second step is necessary or not. It seemed like a good idea, though, and doesn't appear to have broken anything, so you may or may not need to delete this second key as well.

Having removed those keys, I could run gacutil /u and remove System.Web.Mvc quite happily:

C:\>gacutil /u System.Web.Mvc
Microsoft (R) .NET Global Assembly Cache Utility. Version 3.5.30729.1
Copyright (c) Microsoft Corporation. All rights reserved.

Assembly: System.Web.Mvc, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL
Uninstalled: System.Web.Mvc, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL
Number of assemblies uninstalled = 1
Number of failures = 0


My preview 3 project now builds and runs quite happily against the System.Web.Mvc DLLs installed as part of the website, and the VS2008 MVC Project template still works just like it did before.

Monday, 3 November 2008

A Rant about RAID, with a Bad Metaphor about Eggs, and No Happy Ending.

I went in to work this morning and my main workstation had died over the weekend. Bluescreen on boot, no safe mode, nothing. Windows Update gone bad? We'll probably never know, given I don't think it's coming back any time soon... but, as with previous overnight machine suicides, it looks like a problem with SATA RAID - specifically, two WD Velociraptors in a RAID-1 (mirror) array controlled by an Intel ICH10R chipset on an Asus P5Q motherboard.

You know your whole eggs & baskets thing, right? SATA RAID is like carefully dividing your eggs into two really good baskets, then tying them together with six feet of wet spaghetti and hanging them off a ceiling fan.

Long story short, I lost a day, and counting. I had to split the mirror into individual drives, switch the BIOS back to IDE, which gave me a bootable OS but - seriously - no text. No captions, no icon labels, no button text, nothing. Just these weird, ghostly empty buttons. Running a repair off the WinXP x64 CD got my labels back, but somehow left Windows on drive D. Another half-hour of registry hacks to get it back to drive C: where it belongs, and I had a creaking but functional system - VS2008 and Outlook are working, but most of my beloved little apps are complaining that someone's moved their cheese. Reinstalling is probably inevitable, along with the deep, deep joy that is reinstalling Adobe Creative Suite when your last remaining "activation" is bound to a PC that now refuses to deactivate it. Even Adobe's support team don't understand activation. Best they could come up with was "yes, that means there's no activations on that system." Err, no, Mr. Adobe, there are. It was very clear on that point. Wouldn't let me run Photoshop without it, you see. "Oh... then you'd better just reformat, and when you reinstall, you'll need to phone us for an activation override". Thanks, guys. I feel the love.

Sorry, I digress. This whole experience is all the more frustrating because RAID mirrors are supposed to be a Good Thing. If you believe the theory, RAID-1 will let you keep on working in the event of a single drive failure. Well... In the last 5 years or so, I haven't had a single workstation die because of a failed hard drive, but I've lost count of the number of times an Intel SATA RAID controller has suddenly thrown a hissy-fit under Windows XP and taken the system down with it. Every time it starts with a bit of instability, ends up a week or two later with bluescreens on boot and general wailing and gnashing of teeth, and every time, running drive diagnostics on the physical disks shows them to be absolutely fine.

This is across four different Intel-chipset motherboards - two Abit, one Asus, and the one in a Dell Precision workstation - running both the ICH9R (P35) and ICH10R (P45) chipsets, and various matched pairs of WD Caviar, WD Raptor, WD Velociraptor and Seagate drives. One system is a normal Dell Precision workstation; the others are various home-built combinations, all thoroughly memtest86'ed and burned-in before being put into production doing anything important.

Am I doing something wrong here? I feel like I've invested enough of both my and my employer's time and money in "disaster-proofing" my working environment, and just ended up shooting myself in the foot. I'm beginning to think that having two identical workstations, with a completely non-RAID-related disk-mirroring strategy, is the only way to actually guarantee any sort of continuity - if something goes wrong, you just stick the spare disk in the spare PC and keep on coding. Or hey, just keep stuff backed up and whenever you lose a day or two to HD failure, tell yourself it's nothing compared to the 5-10 days you'd have lost if you'd done something sensible like using desktop RAID in the first place.

[Photo from bartmaguire via Flickr, used under Creative Commons license. Thanks Bart.]

Sunday, 2 November 2008

The Roadcraft of Programming

I was chatting with Jason "Argos" Hughes after the Skillsmatter event last week, and he said something I think is really quite brilliant, so I hope he doesn't mind if I quote him here and expand on his ideas a little.

We were discussing the merits of various different platforms and programming languages, and he said "knowing a language inside-out doesn't make you a better programmer, any more than knowing a lot about a particular car makes you a better driver".


That comment has been going round and round my head ever since, and I think that's one of the most insightful metaphors about programming languages that I've heard. Anyone who's owned a car will know that every make and model - and every individual example of a particular model - has its idiosyncrasies and quirks. I drive a slightly knackered Vauxhall Tigra. On this particular car, I know that I need to replace the cam-belt every 40,000 miles or Really Bad Things might happen. I know that I need to clean the gunk out of the frame around the back window otherwise it fills up with rainwater; I know where the little lever to adjust the seats is, and where all the various controls and switches are, and how to check the oil and change the headlamp bulbs. 

None of this makes me a good driver. In fact, it has absolutely nothing to do with my driving ability. Beyond a basic familiarity with a vehicle's controls and signals, the Highway Code has very little to say about the quirks and idiosyncrasies of particular cars. On the other hand, it has rather a lot to say about stopping distances, speed limits, lane discipline, the importance of maintaining awareness of your surroundings and communicating your intentions clearly to other road users. In other words, being a good driver boils down to discipline, restraint, awareness and communication - your choice of vehicle is largely irrelevant. Good drivers are good whatever they're driving, and the choice of car alone can't turn a poor driver into a good one.

I think there are strong parallels here with software development. Good coders are like good drivers; they'll work within the safe parameters of whatever technology they're using, exercise restraint and discipline in the application of that technology, and rely on awareness and communication to make sure that what they're doing doesn't create problems for other people.

Programming interviews can easily degenerate into a pop-quiz about the characteristics of a particular language or platform, but maybe we should be approaching them more like a driving test - even to the extent of letting the candidate demonstrate their problem-solving capabilities using whatever languages and tools they're comfortable with, and then discussing the results in terms of clarity, effective communication, restraint and awareness. Even though we're a .NET shop, I can see how a developer who can create elegant solutions in Ruby or Java and explain clearly what they've done might be a better .NET programmer than somebody who knows every quirk of C# and ASP.NET but can't demonstrate those core qualities of discipline, restraint, awareness and communication.

Huagati DBML Tools for Linq-to-SQL

The bridge in Central Park, NY. Nothing to do with DBML tools. Just looks pretty.

I haven't had a chance to use them yet, but the Huagati DBML/EDMX tools look interesting - a set of extensions to the DBML designer in Visual Studio 2008 that provide some additional functionality, including the much-needed ability to update your DBML to reflect changes in your database schema. It's a commercial package costing $119.95 per user, but a free trial license is available.

With Microsoft effectively abandoning Linq-to-SQL, it's good to see tools like this in the wild. Of course, it'd be really good to see Microsoft open-source Linq-to-SQL and let the community develop it as they see fit... but failing that, these tools can make things easier if you're maintaining an existing Linq-to-SQL system.