Saturday, 28 February 2009

Critical Mass

I’m pretty much a control-freak nerd with antisocial tendencies and an aversion to physical exercise. Critical Mass involves joining 200 total strangers on a bicycle ride around London with no plan, no fixed route and no organisers… so I really have no idea why I enjoy it so much. But I do.

It might be that whole snow-day thing, of witnessing a phenomenon that you couldn’t recreate no matter how hard you tried, so you really, really try to absorb and remember as much detail as you can. Or it might be the sheer joy of cycling up Whitehall and down Oxford Street without wondering if you’re going to die horribly every time you go near a bus. On the last Friday of every month, a whole crowd of cyclists appears under Waterloo Bridge by the BFI and rides around London. It’s not a protest, it’s not organised, there’s no destination or agenda; it’s just a bunch of people who love cycling and happen to be following one another. Thing is, when the group of cyclists reaches a certain size (the “critical mass” they’re named after), the normal rules of cycling in London are turned upside down, and the traffic has to get out of your way for a change.

The London rides have been happening since 1994, but I only found out about them last year. I’d been following the coverage of the legal dispute between the police and the cyclists about Critical Mass, which was eventually resolved in favour of the cyclists late last November. Two days later, I was in the café at Foyles bookstore on Charing Cross Road and saw the Critical Mass procession going past – two bikes with sound systems, a couple of unicyclists, a penny-farthing or two, and several hundred regular cyclists – and I just thought “yeah, I want to do that!” I went along to the next ride – which was Boxing Day, and absolutely freezing, but really good fun nonetheless – and tonight was my second ride: 15 miles or so, from Waterloo to King’s Cross and Euston, then along Hyde Park into Kensington and back through Knightsbridge.

Critical Mass London, February 2009

Every day, I try to ask myself “why will I remember today? What makes today stand out from all the others?” Well, today’s was cycling down Oxford Street, cyclists as far as the eye can see, shoppers applauding and cheering, AC/DC blasting out from the bike next to me… awesome. But don’t take my word for it – grab your bike and join us for the next one, wherever you are.

Friday, 27 February 2009

Blog woes…

Windows Live Writer on Windows 7 is a little… unstable. The “Save” button crashes it. So I was playing around with a couple of other blog clients today… two non-starters, and one that helpfully ignored the “post as draft” button. So ignore the post about nulls being null, ‘cos it’s not finished, and bear with me until I get this thing sorted. (Other than the Live Writer problems, Windows 7 is really looking quite nice. Like Vista but quicker and less… opinionated. I like it.)

Thursday, 19 February 2009

A Post About Nothing, or, How I Learned To Stop Worrying And Love Nullable<T>

Iain Holder has an interesting post about Nullable<bool> over on his blog, about the pitfalls of using nullable boolean values:

What does null mean? It's got to mean something. You're either pregnant or you're not. You can't have a third state. A light switch is either on or off. If a light switch doesn't exist then it's potentially very dangerous. Almost as dangerous as having a nullable bool.


Personally, I like null; I think it’s a wonderfully useful and often-misunderstood concept. Null is not dangerous per se, but used correctly, it can be a very effective warning sign.

Null means we don’t know – or alternatively, that question doesn’t make sense in this context.

To run with Iain’s example for a moment, let’s write a patient tracking system for a hospital; every time a patient arrives, we store their medical details in a database – blood type, next of kin, whether they’re pregnant, that kind of thing – so that the doctors can recommend appropriate treatment. All fields are required – no nulls allowed. It’s nice, simple, saves time, saves lives.

All very well, until an ambulance brings in an unconscious woman who might need X-rays. Is she pregnant? She might be – you don’t know; she came in alone, and she’s unconscious. The clock is ticking… an X-ray might save her life, but if she’s pregnant, it could harm her unborn child. If our patient system makes us choose true/false with no third option, we’re potentially making a very dangerous assumption either way. We need a third option, so when a doctor gets that patient’s chart, they see that we don’t know, and they can make sure they do a pregnancy test before sending the patient for X-rays or administering potentially dangerous medication.
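Here’s a minimal sketch of that third option in Python. It isn’t from any real hospital system (the function name and messages are made up for illustration); it just uses `Optional[bool]` so that `None` means “we don’t know”:

```python
from typing import Optional

def xray_precheck(pregnant: Optional[bool]) -> str:
    """Hypothetical rule: what must happen before sending a patient for X-rays."""
    if pregnant is None:
        # Unknown: say so on the chart instead of guessing True or False.
        return "pregnancy test required before X-ray"
    if pregnant:
        return "avoid X-ray; consider alternatives"
    return "X-ray OK"

# The unconscious patient who arrived alone: we genuinely don't know.
print(xray_precheck(None))  # pregnancy test required before X-ray
```

Note the order of the checks: writing `if not pregnant:` first would quietly lump “don’t know” in with “no”, which is exactly the dangerous assumption described above.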

I’m not arguing that NULL is the only solution to this problem. The problem is universal, and nullable fields are just one of many possible solutions. It happens to be a solution that’s natively supported in most databases and platforms, and I personally think the semantics of “null = don’t know” are rather nice. You may disagree – but you still need something to indicate when data in your model is potentially inaccurate or missing.

Yes, null has no place in a perfect model…

If your model (whether it’s an OO domain model or a relational database model) is complete, perfect, accurate and consistent, then you’re laughing. You will never have null values because your model maps perfectly onto the problem domain you’re working in, and you know every detail of every entity in that problem domain. You know every single customer’s date of birth; you have detailed records of the marketing preferences of every person in your database, and your model is so perfectly tuned to your business that there are no sparse tables, no outer joins…

…but there’s no such thing as a perfect model

A model is an abstraction, and abstractions always leak. The real world isn’t domain-driven, or relational, or object-oriented – these paradigms are just ways of slicing and storing information about the real world that help us solve a problem or do a job. A lot of the time we’re making decisions based on the value of the information, but sometimes we need to make decisions based on whether that information is present or not.

A slightly less contrived real-life example might help. In one of the systems I work on, we have a nullable bit field in our main customer database for e-mail opt-in preferences. We interpret the true/false/null values as follows:

True: This customer has agreed to receive marketing information via e-mail.
False: This customer has chosen not to receive marketing information via e-mail.
Null: We have no record of this customer’s preferences.

When we send out e-mail newsletters, we only include the customers whose field is actually true – that’s the point of opt-in marketing, right? Customers who have opted out (i.e. their value is false) we leave alone – they’ve said they don’t want to get hassled, so we don’t hassle them. Any customer can log in and change their preferences at any time, so if they change their mind, they can reactivate their newsletter whenever they like.

So what about null? Well, when a customer logs in, they get a personalised welcome page. If we see that their opt-in field is null, we add a message to this page saying “Hey, we have this e-mail newsletter you can subscribe to – could you take a moment to let us know whether you’re interested?” As soon as they’ve expressed a preference one way or the other, we stop showing them this message, so a customer should only see it the first few times they log in. We end up with better data, they don’t get unwanted e-mail, and everyone’s happy.
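The same true/false/null rules, sketched in Python for illustration. The `Customer` class and field names are hypothetical; the real system keeps this in a nullable bit column. The key design choice is that null is never treated as consent:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Customer:
    email: str
    # True = opted in, False = opted out, None = no recorded preference
    email_opt_in: Optional[bool] = None

def newsletter_recipients(customers: List[Customer]) -> List[str]:
    # Only customers who explicitly said yes; "don't know" is not a yes.
    return [c.email for c in customers if c.email_opt_in is True]

def should_prompt_for_preference(customer: Customer) -> bool:
    # Show the "would you like our newsletter?" message only while unknown.
    return customer.email_opt_in is None

customers = [
    Customer("ann@example.com", True),
    Customer("bob@example.com", False),
    Customer("cat@example.com"),  # never asked
]
print(newsletter_recipients(customers))  # ['ann@example.com']
print([c.email for c in customers if should_prompt_for_preference(c)])  # ['cat@example.com']
```

In SQL the recipient query gets this behaviour for free: a filter like `WHERE opt_in = 1` never matches NULL rows, because comparing NULL with anything yields unknown rather than true.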

The weird swirly picture there turned up in a Google Images search for “null” – I don’t know enough about magnetic topology to have the faintest idea what it is, but it’s quite beautiful, don’t you think? Image © Colin Beveridge

Specs, Bugs & Rock’n’Roll

Great Scott! I have probably seen Back to the Future over 100 times. It’s one of my favourite movies, and it’s also astonishing how a movie that I loved when I was seven years old has retained its appeal for over twenty years – it’s like it somehow evolved from a time-travel movie into a 50s/80s period time-travel movie without ever really looking dated in between. (That, or I’m just a huge nerd with an 80s obsession who never really grew up.)

Anyway, you know the scene where Marty’s playing at his parents’ high-school dance, and he’s about to leave, and the band ask if he’ll play one more – “c’mon, something that really cooks” – and Marty steps up to the mic, jokes about “this one’s an oldie… at least, it’s an oldie where I come from” – and then he plays a storming version of Johnny B. Goode, and invents rock’n’roll? Well, as a kid watching this movie over and over, that’s one of the things that never, ever made any sense to me. Time travel, fine – I mean, they have a nuclear DeLorean, right? – but the band jamming along, in perfect time, to a song they’d never, ever heard before? Inconceivable!

A couple of years later I took up the guitar, probably inspired in no small part by Marty McFly, and along the way I started picking up odd bits of music theory. I remember watching Back to the Future one day around this time, and noticing something I’d never noticed before… just before Marty starts his whole Johnny B. Goode riff, he says to the band “Ok, guys, this is a blues riff in B; watch me for the changes, and try to keep up, OK?” Right up until that point, I’d been teaching myself guitar, on my own, by playing along to tapes and stuff. I’d never been in a band and never really taken any formal music classes. I had some music theory books, and I knew vaguely what a “blues riff” was, but I’d never stopped to think why that particular chord sequence had a special name. See, when you’re playing guitar by yourself in your bedroom, you don’t need to communicate with other musicians, so a lot of things – sheet music, sight reading, key changes and music theory – just seem like a complete waste of time. Of course, to seasoned pros like the musicians in Marvin Berry’s Starlighters, those two simple bits of information – “blues riff” and “in B” – tell them exactly what’s about to happen – at least, in enough detail that they can join in at the right point, hit the right notes, and work the rest out as they go along.

What’s this got to do with software? Well, “blues riff” refers to a 12-bar blues chord progression, which is a pattern in both senses of the word. Trivially, it’s a repeating sequence of things; in this case, musical chords. It’s also a pattern in the software/architecture sense: a recurring ‘solution’ that’s applied in many different contexts. The details may vary wildly from one implementation to the next, but the underlying structure doesn’t. Whether it’s jazz, thrash, funk or fusion, you’ll find twelve-bar patterns cropping up all over the place in contemporary music.
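To see how much information “blues riff in B” packs into four words, here’s a small Python sketch. It assumes the plain textbook form of the progression (I I I I | IV IV I I | V IV I I); real songs vary it, for example with a quick change in bar 2 or a V chord in the last bar as a turnaround.

```python
# Chromatic scale, spelled with sharps for simplicity.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The pattern itself: each bar's chord as a semitone offset from the key.
# I = 0 (the key chord), IV = 5 semitones up, V = 7 semitones up.
TWELVE_BAR = [0, 0, 0, 0,   # I  I  I  I
              5, 5, 0, 0,   # IV IV I  I
              7, 5, 0, 0]   # V  IV I  I

def twelve_bar_blues(key: str) -> list:
    """Return the chord root for each of the 12 bars in the given key."""
    root = NOTES.index(key)
    return [NOTES[(root + offset) % 12] for offset in TWELVE_BAR]

print(twelve_bar_blues("B"))
# ['B', 'B', 'B', 'B', 'E', 'E', 'B', 'B', 'F#', 'E', 'B', 'B']
```

The pattern is written down once, and “in B” is the only parameter the band needs.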

This is why design patterns are important in software development. Knowing them doesn’t necessarily make you a better programmer. Just as someone can play a wonderful tune without knowing what all the notes are called, it’s quite possible to implement design patterns without being aware that’s what you’re doing. Years ago, I built some code that used an ‘active record’ design. At the time, I had never worked with design patterns. I had no idea that my “objects-based-on-tables” solution had a name, or that anyone else was doing the same thing. Without that shared vocabulary, though, collaborating on that code was painful. I had to explain it line by line: what it did, how it worked, how all the pieces fitted together. It’s time-consuming, it’s error-prone, and it’s probably really, really boring for the person listening to the explanation.
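For anyone who hasn’t met the pattern by name, here’s roughly what “objects based on tables” looks like as an Active Record. This is a minimal sketch using Python’s built-in `sqlite3` module and a hypothetical `customers` table, not the original code: each object wraps one row and knows how to load and save itself.

```python
import sqlite3

class Customer:
    """Active Record: each instance wraps one row and knows how to save itself."""

    def __init__(self, conn, name, email, customer_id=None):
        self.conn = conn
        self.customer_id = customer_id  # None until the row has been inserted
        self.name = name
        self.email = email

    def save(self):
        if self.customer_id is None:
            cur = self.conn.execute(
                "INSERT INTO customers (name, email) VALUES (?, ?)",
                (self.name, self.email))
            self.customer_id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE customers SET name = ?, email = ? WHERE id = ?",
                (self.name, self.email, self.customer_id))
        return self

    @classmethod
    def find(cls, conn, customer_id):
        row = conn.execute(
            "SELECT id, name, email FROM customers WHERE id = ?",
            (customer_id,)).fetchone()
        if row is None:
            return None
        return cls(conn, row[1], row[2], customer_id=row[0])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

ann = Customer(conn, "Ann", "ann@example.com").save()
ann.email = "ann@example.org"
ann.save()
print(Customer.find(conn, ann.customer_id).email)  # ann@example.org
```

Say “it’s an Active Record” to a pattern-literate developer and all of the above is understood before you’ve opened the file.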

Imagine Marty McFly turning to the band and saying “OK, guys, this song has this chord here (plays a B chord) for four bars, then two bars of this one (plays an E chord) – got that? – and then two bars of that first chord again…” and so on for about ten minutes; and the band are working it out and making notes as he goes along, so they can remember what goes where, and by the time he’s done, the crowd’s got bored and lost interest, Lorraine’s gone home with Biff, and George is out in the parking lot crying quietly into a copy of Amazing Stories.

On the other hand, you can learn the patterns. Not because they’ll make you a better coder, but because they’ll transform your ability to communicate with other pattern-literate developers. You’ll learn them, and you’ll practise them at home, on your own little projects, and then when Lead Architect Martin McFowler turns round at the start of your next project and says “OK, this is a domain model with data access via a repository, we’ll manage references via an identity map and use concrete table inheritance to map the subclasses”, you can sit down, work out how to apply those patterns to your particular project, and bam! – you’ve just invented rock’n’roll.

http://www.seeing-stars.com/Locations/BTTF/Dance-JohnnyBGoode(smaller).JPG

OCTV: Coming Soon to a Screen Near You?

Video surveillance is in the news again; the police insisted that an Islington pub landlord install CCTV cameras as a condition of granting his licence application – and the Information Commissioner subsequently pointed out that they’re not really allowed to do that, and now the usual suspects are weighing in on both sides of the same old privacy vs. security debate. I personally have had two contrasting experiences with video surveillance that make me wonder whether we’re getting this whole thing completely backwards and missing a great opportunity here.

Several years ago, a friend and I were in a pub in central London, and her phone vanished. One moment it was out on the table in front of us; a few minutes later, it was gone – nowhere to be found. We hadn’t left the table, but at some point either it got knocked onto the floor and someone picked it up, or some light-fingered individual just grabbed it whilst we were looking the other way. This happened in a well-lit bar, full of witnesses – and right underneath a CCTV camera. It didn’t do the slightest bit of good. Nobody saw anything – or if they did, they weren’t telling. The CCTV tapes would have shown exactly what happened, but the bar staff point-blank refused to let us see the footage, citing the usual data protection excuses.[1]

As a counter-example, when I commuted from Southampton to Winchester years ago, the drive home took anywhere from 20 minutes to 2 hours, depending on traffic – so I used to check out the BBC’s traffic webcams before leaving. Even a blurry 320x240 picture that’s five minutes old was good enough to see whether there was a tailback on the M3 or not – sure, it meant once in a while I’d sit at work for an extra hour or two waiting for the traffic to clear, but that was way better than spending that time sat in the traffic jam itself.

Personally, I don’t really object to video surveillance in public places. You’re in a public place; your behaviour and actions can directly affect the people around you. If you’re doing something that you’d rather wasn’t captured on videotape, then I’d probably rather you weren’t doing it in the same pub as me. The problem is that in most cases the only people who can review the resulting footage are the CCTV owners and the police – and whether you question their motives or not, they clearly have better things to do. The problem isn’t the TV bit of CCTV – it’s the CC. Closing that circuit creates an imbalance of power and feeds an “us and them” mentality that really doesn’t do anybody any favours.

So what if we turned the whole thing upside down? Instead of CCTV, let’s put up signs saying “This area is covered by open circuit television” – and websites where you, or anyone else, can see the footage. Suddenly, all that information is accessible to somebody who actually cares enough to look at it – i.e. you. Even if we’re just dealing with real-time feeds, you can pull up an OCTV feed and see whether any of your friends are in the pub yet, whether it’s raining in Camden right now, or how big the queue is outside the cinema.

What if we made the archived footage available? 24 hours’ worth would let you pull up the day’s coverage of the street outside your house, so you can prove that the phone company are lying to you about the engineer who they claim “waited outside for an hour”. You can see who – or what – caused that scratch on the side of your car. A week’s worth of archives could show you how many people are out & about on the streets in that neighbourhood where you’re considering buying a house. A year’s worth will show you how many sunny days they’ve had, how busy it gets when there’s a football game on up the road, and how often the council ‘forget’ to pick up the recycling.

The Metropolitan Police don’t have time to find CCTV evidence for every stolen car or domestic burglary that takes place… because there are too many crimes, too much footage, and not enough surveillance officers. So share it. A police officer can’t justify spending seven hours reviewing footage of a single car theft… but what if it was your car? Would you be prepared to put in those hours? Would your insurance company?

OK, it’s not quite that simple. It’ll make extramarital affairs, gambling problems and skiving off work a bit difficult… sorry about that. On a more serious note, it’ll bring things out into the open that some people would rather stayed hidden… your employer can see where you spend your evenings, your family can see where you really work. Technology’s never going to solve that kind of problem, though - unless by “solve it” you mean “drag it out into the open and mess up everybody’s lives until they learn how to deal with the truth”.

The technology’s not quite there yet, but I think it’s probably pretty close. Solar-powered webcams with built-in GPS and 16GB of online storage, IPv6 wi-fi connectivity, capturing an HD image every few seconds and uploading the results to some big store in that cloud thingy that all the cool kids are talking about… not cheap (yet), but by no means impossible – and technology just keeps getting cheaper. Our capacity to record and store information is increasing exponentially, and I really don’t think that’s going to stop. When that information ends up locked in vaults and government servers, it’s basically useless – the cost of retrieval and analysis is prohibitive, and short of serious investigations or legal proceedings, that data never comes out again. The internet has demonstrated time and time again that if you share your stuff – images, stories, information – then people will find remarkable, innovative, practical and beautiful things to do with it – things you’d never have imagined. To a government, company or police force, picking valuable data out of literally millions of digital images is an expensive and daunting prospect… but on the web, it’s the kind of thing people do all day, every day, for fun. So what is everyone so afraid of?

1. I discovered recently – far too late to be of any use – that we could have made a formal request under the Data Protection Act, in writing, and paid the £10 administration charge, and received a copy of any footage that we appeared in. Oh well. I know for next time.

Banksy photograph © jodi.martorell via Flickr, used under Creative Commons license.

Tuesday, 10 February 2009

SQL Server 2005 Performance Dashboard

One of those quick "this helped me, it might help you" posts. I've been trying to run SQL Server 2005 Performance Dashboard on one of our live servers, and hit a couple of snags along the way. 

Having installed the dashboard reports and run the setup.sql script included with the download, I tried to view the report on one of our databases and got the following error:

Error: Index (zero based) must be greater than or equal to zero and less than the size of the argument list

Whoa. Bizarre - and rather unhelpful - error message. Clearly this Performance Dashboard thing hasn't been tested at all... and then it occurred to me that I was using SQL2008 client tools to talk to a SQL2005 server, but that the actual catalog (database) was still running in SQL2000 compatibility mode.


Trying exactly the same thing (right-click the database, Reports, performance_dashboard_main) using SQL Server 2005 Management Studio (instead of 2008 - same server, different version of the client tools) produces a far more helpful error message:

Error: Unable to display report because the database has a compatibility level of 80. To view this report, you need to use the Database Properties dialog to change the compatibility level to SQL Server 2005 (90).

So – over to the test server, bring up a test instance of the database, and change the compatibility level to 90. The database and apps still appear to be running fine, but trying to run the performance dashboard now produces the following error:

Error: Difference of two datetime columns caused overflow at runtime.

This time Google had the answer (thanks David) although it's not immediately clear what the solution is referring to. What you need to do is open the Setup.SQL script (which is installed along with the performance dashboard - you'll find it at C:\Program Files\Microsoft SQL Server\90\Tools\PerformanceDashboard\). Find line 276, which says:

sum(convert(bigint, datediff(ms, login_time, getdate()))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time, 

and replace this line with

sum(convert(bigint, CAST ( DATEDIFF ( minute, login_time, getdate()) AS BIGINT)*60000 + DATEDIFF ( millisecond, DATEADD ( minute, DATEDIFF ( minute, login_time, getdate() ), login_time ),getdate() ))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time,

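As David explains on the linked thread, DATEDIFF returns an int, and here it's calculating the number of milliseconds between two DATETIME values. The largest value an int can hold is 2,147,483,647, which is just under 25 days' worth of milliseconds, so for any connection that's been open longer than that, DATEDIFF itself overflows before the outer convert(bigint, ...) ever sees the result. The replacement line works around this by asking DATEDIFF for whole minutes (a much smaller number), casting that to a bigint and multiplying by 60,000, then adding the leftover milliseconds.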
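If the arithmetic isn't obvious, here's a rough Python simulation of both versions. It's only a model of the T-SQL behaviour: `datediff_ms_int` is a made-up helper that mimics DATEDIFF's int return value by raising on overflow, and the sketch counts whole elapsed minutes. (SQL Server's DATEDIFF counts minute boundaries instead, but the leftover-milliseconds term absorbs the difference, so the total comes out the same.)

```python
from datetime import datetime, timedelta

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a SQL Server INT

def datediff_ms_int(start: datetime, end: datetime) -> int:
    """Hypothetical stand-in for DATEDIFF(ms, start, end) with an INT result."""
    ms = (end - start) // timedelta(milliseconds=1)
    if not INT_MIN <= ms <= INT_MAX:
        raise OverflowError("Difference of two datetime columns caused overflow")
    return ms

def idle_ms_workaround(start: datetime, end: datetime) -> int:
    """The fix from setup.sql: whole minutes * 60000, plus leftover milliseconds."""
    minutes = (end - start) // timedelta(minutes=1)  # small enough for an INT
    leftover = datediff_ms_int(start + timedelta(minutes=minutes), end)
    # Python ints are unbounded; in T-SQL this is where BIGINT is needed.
    return minutes * 60000 + leftover

login = datetime(2009, 1, 1)
now = login + timedelta(days=30, milliseconds=1500)

try:
    datediff_ms_int(login, now)
except OverflowError as err:
    print("original line fails:", err)

print(idle_ms_workaround(login, now))  # 2592001500
print(round(INT_MAX / 86_400_000, 2))  # 24.86 (days of milliseconds an INT can hold)
```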

Anyway, change that line, run setup.sql again, and it works – acres of lovely statistical goodness at my fingertips.

Next step – make sure the SQL 2005 compatibility level hasn't broken anything, then modify the compatibility level on the live server, and then I'll be able to get some real live performance stats based on actual web traffic, which should make for interesting reading.

Sunday, 8 February 2009

Videos from SkillsMatter Open Source .NET Exchange

Reviews and videos from the SkillsMatter Open Source .NET event last month, including my jQuery lecture, are now online at the SkillsMatter site.

(Note to self: Don’t say “um” so much next time.)

Saturday, 7 February 2009

Is a Crisp a Value Object?

I’m sat in a railway station in the French Alps, waiting for the TGV to Paris, with nothing much to do except eat crisps and watch Oggy and the Cockroaches on French TV. They say good writing draws inspiration from the world around it, so – because Oggy and the Cockroaches defies all rational discussion – let’s talk about crisps. Ed left a comment on my “lazy loading lunchbox” post that got me thinking about the semantics of value objects – what exactly is a value object, and what role would it play in the whole rucksack/lunchbox scenario.

Ruck sack = Aggregate Root. RuckSackRepository is the only mechanism for retrieving the object graph. The CrispBag is an Entity and of course has no repository. The Crisp Bag is just an array / collection of Crisps, which are in turn Value Objects.

On careful reflection, I don't think a crisp is a value object – at least, not in the playground scenario I was dealing with – because I'd argue that every crisp has a distinct identity and lifecycle. Don't believe me? Here, have some crisps. No, it's OK, go on, have the whole bag... <crunch, crunch, crunch> Now that you're happily tucking into your crisps – guess what? Colin licked one of those crisps earlier when you weren't looking. Suddenly the question of "which crisp?" becomes extremely important. Is the contaminated trouser-crisp one of the handful left safely in the bag – or have you already eaten it? If we model crisps as value objects, we have no way of telling. We can tell how many crisps are left, sure, but we can’t distinguish between them, and we can’t model any operation that would affect the state of a single crisp whilst leaving the other crisps unaffected.
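Here’s the difference sketched in Python, purely as an illustration (the classes are hypothetical, not from Ed’s comment or the lunchbox post). Modelled as an entity, each crisp has its own identity and state, so “which one?” has an answer; modelled as a frozen value object, two crisps of the same flavour are simply equal.

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count(1)

@dataclass(eq=False)  # eq=False: == falls back to object identity
class Crisp:
    flavour: str
    licked_by: set = field(default_factory=set)
    crisp_id: int = field(default_factory=lambda: next(_next_id))

@dataclass(frozen=True)  # value object: equal whenever the values are equal
class CrispValue:
    flavour: str

bag = [Crisp("cajun squirrel") for _ in range(5)]
bag[2].licked_by.add("Colin")  # one particular crisp changes state
eaten, left_in_bag = bag[:3], bag[3:]

print(any(c.licked_by for c in eaten))        # True: you ate it, sorry
print(any(c.licked_by for c in left_in_bag))  # False

# As values, two crisps of the same flavour are indistinguishable.
print(CrispValue("cajun squirrel") == CrispValue("cajun squirrel"))  # True
```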

OK, so if a crisp isn’t a value object, then what is? I think that’s a much harder question to address, because as with so many patterns, the answer depends on the scenario you’re modelling.

The value object pattern is typically used to model quantities, descriptions and measurements, where two objects can be considered equal if they have the same value (state). Two classic examples are dates and money. In most domain models, “January 2nd, 1978” will always refer to exactly the same thing – a particular 24-hour period sometime in the late 1970s – so you can model your date/time fields as value objects (remembering to include time zone information if necessary). If you and I were both born on 2nd January 1978, then it’s valid to assume we have the same birthday, because that date can only refer to one thing. (By way of comparison, if your father and my father are both called Dave Beattie, we’re not necessarily related, because there could be any number of people with that name in our domain model.)
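Python’s standard `date` type already behaves as a value object, which makes the contrast easy to show. The `Person` class below is a hypothetical entity, added just for the comparison:

```python
from dataclasses import dataclass
from datetime import date

# Value object: date compares by value, so two separately
# constructed dates for the same day are equal.
my_birthday = date(1978, 1, 2)
your_birthday = date(1978, 1, 2)
print(my_birthday == your_birthday)  # True: same date, same birthday

@dataclass(eq=False)  # entity: identity, not attributes, decides equality
class Person:
    name: str

my_father = Person("Dave Beattie")
your_father = Person("Dave Beattie")
print(my_father == your_father)  # False: same name, possibly different people
```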

Money is interesting, because money is a value object by law in most economies - that's how it's treated in business and in court. In legal terms, this is known as fungibility. Point is, when we ask “does this credit equal that debt?”, we’re only interested in the amount. You can repay a £10 debt with any ten pounds as long as the numbers add up; you don't need to track the specific ten-pound-note that was originally borrowed. When you deposit £50 at the bank, you can't go back a week later and ask if that particular fifty pounds is still there - as soon as you deposit it, it becomes part of your account balance and ceases to exist as £50 in its own right.
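A minimal sketch of money as a value object in Python, assuming a hypothetical `Money` type with an amount and a currency code (a real system would also have to worry about rounding and currency conversion). Because it’s frozen and compares by value, any ten pounds equals any other ten pounds:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)  # immutable, and equal whenever amount and currency match
class Money:
    amount: Decimal
    currency: str = "GBP"

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("can't add different currencies")
        return Money(self.amount + other.amount, self.currency)

debt = Money(Decimal("10"))
repayment = Money(Decimal("5")) + Money(Decimal("5"))  # two fivers, not the original tenner
print(debt == repayment)  # True: any ten pounds settles a ten-pound debt

# The £50 deposit stops existing in its own right; only the new balance remains.
balance = Money(Decimal("120")) + Money(Decimal("50"))
print(balance.amount)  # 170
```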

Remember, though, that value objects are just one modelling pattern, and you need to be sure that you’ve chosen the right pattern for your particular scenario. If you're writing a system for the CIA to track marked dollar bills, you'll need to model those bills as entities so you can track individual bills (e.g. using their serial number). Point is, when you do this, you're not treating them as currency, you're treating them as artifacts. The same would apply to an antiques house dealing in rare coins - it's not the face value of the coins that matters, it's their age, rarity and historical significance that's important. If you’re writing a game where the player can travel through an infinite number of parallel universes, you might need to track an infinite number of different instances of “January 12th, 1978”.  
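When the specific piece of paper matters, the model flips. Here’s a rough Python sketch with a hypothetical `MarkedBill` class and made-up serial numbers, where identity comes from the serial number rather than the face value:

```python
from dataclasses import dataclass, field
from decimal import Decimal

@dataclass
class MarkedBill:
    # Entity: the serial number is the identity. Face value and history are
    # excluded from comparisons, so they don't affect "is this the same bill?"
    serial_number: str
    face_value: Decimal = field(compare=False)
    sightings: list = field(default_factory=list, compare=False)

bill = MarkedBill("AB12345678", Decimal("20"))
bill.sightings.append("deposited at a bank")

same_bill_found_later = MarkedBill("AB12345678", Decimal("20"))
another_twenty = MarkedBill("CD87654321", Decimal("20"))

print(bill == same_bill_found_later)  # True: same serial number
print(bill == another_twenty)         # False: same value, different bill
```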

If it’s not clear how to model a particular element of your domain, try asking “which one?” If the question makes sense within your own scenario:

Me: “Colin licked one of those crisps!”
You: “Which one?”

Me: “Hey, says here this house was once owned by Chris Columbus!”
You: “Which one?”

- then you’re probably dealing with entities. If the question “which one” is meaningless in the context of your domain:

Me: “I only paid £1 in income tax last year!”
You: “Oh yeah? Which one?”

Me: “I was born on January 12th, 1978”
You: “Really? Which one?”

- then you’re probably better off modelling the subject of the question as a value object.