Tuesday, 26 October 2010

“If It Ain’t Broke…” – Why Good Code Doesn’t Guarantee Happy Users

For years, my sites have processed online payments using a firm called Protx. Protx was acquired by Sage a few years back, rebranded as SagePay, and since then they’ve been gradually making changes to their system. Nothing too drastic, and always with plenty of warning that changes were forthcoming. Their payment API is generally rock-solid, powerful, flexible, and has allowed us to cope with all sorts of complicated payment scenarios without too much fuss. So far, so good.

About two months ago, SagePay announced that a preview of their new administration system was running on their test servers, and they invited everyone to go and have a look. Their old system was kinda clunky, but it worked, and our accounts team had been using it quite happily for years. So I went. I looked. The new system had lots of shiny Ajax – OK, fair enough, this is 2010 after all – and horizontal scrolling. No, really. Check this out:

image

I have a 1600x1200 screen because information is nice and I like being able to see lots of it at once. SagePay have decided, for some reason, that our payment transactions are best presented in a fixed-size 800x300px window. Fine, though – that’s what technical previews are for; lots of people reported this and were assured it would be sorted out.

Well, the new system launched over the weekend. The old one’s gone, the new one’s live, things like the horizontal scrolling have not been sorted out, and customers’ reaction to the new system is, well, a bit negative. A thread on their support forums is full of gems like:

“A complete crock. Should be rolled back immediately.” (here)

“this is a complete and utter disaster. This system needs rolling back as soon as possible.” (here)

“The system is rejecting the order because of the delivery address line 2 which contained 58-62 xxxxx Road. Your system is not allowing the - in the address (This address is the formatted address that is provided by Royal Mail). This needs to be fixed quickly.” (here)

OK, fine. Big company rolls out shiny upgrade, upsets customers, breaks APIs, whatever. Big companies do this a lot.

But - I was expecting something different here, because last week, Mat – SagePay’s “Chief Nerd” – published this blog post, in which he talks quite rationally and eloquently about the forthcoming improvements. And it sounds like quite good stuff, too. Stuff like:

“To accommodate this we’ve changed the way we develop software, not only at a language level, but also at the process level. We’ve embraced Test Driven Development, Agile methodologies, Continuous Integration and parallel automated testing.”

“By writing tests first and only then writing code that passes those tests, we know that our software does what it is supposed to.”

“This demonstrable code quality gives our developers much more confidence in their code, frees them to refactor software that behaves sub-optimally, and ensures the test team’s time isn’t wasted on trivial bugs.”

(quotes from http://sagepay.wordpress.com/2010/10/18/if-it-aint-broke/)

OK. Mat, I believe you. If your comments about improved capacity and security are correct, this actually represents a huge achievement for the SagePay technical team. But I have to wonder… if the chief nerd is doing everything right, why are the customers so upset?

Well, this is what I think probably happened. First, they’ve been working on this since June 2009; the first real customer preview was in September 2010, and it’s now live, just over a month later. That’s not a series of incremental improvements, delivering discrete chunks of business value every few weeks or months. That’s the big rewrite wearing a not-very-convincing Agile disguise.

Second, one of the core tenets of the Agile Manifesto is “customer collaboration over contract negotiation” – and to get that right, you have to know who your customers really are, and collaborate with them. This is tricky, because to the developers, the ‘customer’ is probably the product owner – but in reality, the customer should be the person who’s going to use the product. Now a firm like SagePay probably can’t call their customers in off the street to collaborate on a big project like this – but they can talk to them.

I think it’s really easy – particularly in big companies – to get this wrong. I have made this mistake many times. You launch version 1.0, you spend a couple of years babysitting it and fielding the support calls, and you end up thinking you know exactly what all your customers want… except you don’t, because happy users never call support. They just use the product and get on with their lives. You can find out what your users don’t like from support calls and complaints, but to find out what they do like, you need to get out there and talk to them. It sounds like many of the features in the new SagePay system were based on feature requests and complaints from a vocal minority who weren’t really representative of the user base as a whole. It feels like you’re doing exactly the right thing, but you’re not actually collaborating with your customer.

REAL USERS don't care about TEST DATA

Third – technical previews are all very well, but real users don’t care about test data. When I tried the new MySagePay tech preview last month, it had about a dozen transactions in it, from 2006, when we were testing an upgrade to our own payment system. That’s not a realistic test of the system, because there’s no incentive to actually do anything with it. The only way to get real feedback is to get real people to use the software to do real work – and you can’t do real work with test data.

In a nutshell, I think they missed the distinction between have we built the right thing? and have we built it right?

TDD, Continuous Integration, refactoring – these will all help you build it right, but agile’s also about making sure you’re building the right thing, and I think SagePay dropped the ball on this one.
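To make that distinction concrete, here’s a minimal sketch in Python. I have no idea what SagePay’s validation code actually looks like, so everything here is made up for illustration: the `is_valid_address_line` function, the character rule, and the tests are all hypothetical. The point is only that a test suite can be completely green and still encode the wrong idea of what a real address looks like.

```python
import re

# Hypothetical rule, invented for this sketch (NOT SagePay's real validation):
# an address line is 1-100 letters, digits, spaces, commas or full stops.
ADDRESS_LINE = re.compile(r"[A-Za-z0-9 ,.]{1,100}")


def is_valid_address_line(line: str) -> bool:
    """Return True if the whole line matches the (assumed) allowed characters."""
    return ADDRESS_LINE.fullmatch(line) is not None


# Tests written first, from the developer's own idea of an address.
def test_accepts_plain_street_address():
    assert is_valid_address_line("12 High Street")


def test_rejects_empty_line():
    assert not is_valid_address_line("")


def test_rejects_markup():
    assert not is_valid_address_line("<script>")


if __name__ == "__main__":
    # Every test passes: the code was "built right" against its own spec.
    test_accepts_plain_street_address()
    test_rejects_empty_line()
    test_rejects_markup()

    # The case nobody wrote a test for: a hyphenated house-number range,
    # like the one in the forum complaint quoted earlier.
    print(is_valid_address_line("58-62 Example Road"))
```

All three tests pass, and the hyphenated address is still rejected. No amount of test coverage catches that, because the problem isn’t in the code; it’s in the assumptions the tests were written from. You only find it by putting real data, from real customers, through the system.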

Mat even says (my emphasis):

“…any change introduces risk. … We might produce new software that performs identically to the old for a given payment protocol only to find that two thousand customers are using a non-documented ‘feature’ of the old system that we’ve now written out.”

Well, that’s exactly what they did. On Friday, https://live.sagepay.com/txstatus/txstatus.asp was returning real-time status for previous payment transactions – and now it’s gone. Vanished. What used to take 500 milliseconds in an automated script now takes a real person 90 seconds or so – including all that lovely horizontal scrolling they have to do.

Still, all is not lost. If SagePay really do have a clean, new architecture, and full test coverage, and a decent agile process in place, then it should be straightforward to respond to this customer feedback. Respond to change, instead of following a plan. Some tweaks at the presentation layer, maybe a couple of new properties on various view models and controllers, a handful of new methods on the supporting services, and it shouldn’t take long to deliver a product that combines the scalability and security of the new system with a UI that does exactly what the customers need.

Mat, if you’re reading this, I’d be really interested to hear what you guys did today whilst your support team were running around putting out fires. I’d love to see what your product backlog looks like right now, and how you’re condensing the torrent of feedback into user stories and work items. There’s not enough sharing in this industry, and if you and your team can be as open about things today as you were this time last week, you’ll probably help a whole bunch of people – including me – next time we find ourselves babysitting a tricky launch.
