Quality Control

Friday, 1 August 2008
by adrian
filed under About Us
Comments: 51

[Suggested listening while reading this post: Quality Control – Jurassic 5]

Prior to moving to London to join Last.fm I worked on credit card software for a leading international bank. When it comes to dealing with people’s money there isn’t much room for mistakes, and buggy code can have major consequences. For these reasons there were a number of processes and systems in place to reduce the likelihood of software errors.

Despite what some of our more critical users may think, we do actually have a number of similar systems (and some novel additions) in place at Last.fm. We use software like Cacti, Ganglia, Nagios and JMX to monitor many aspects of our running infrastructure and the results are made available in a number of ways – from coloured graphs to arcanely-formatted log files. So much information is churned out that one could easily spend all day just looking at all the output until one’s mind buckled under the data overload. For this reason we selectively take the most vital data (things like database load, web request times, uptime status of core machines) and show it on eye-catching displays in our operations room.

Status display screens.

The setup shown above is great for being able to look up and get a quick feel for the current state of our systems. Blinking red and graphs with huge spikes are rarely a good thing. In addition to these displays we also have a number of alerts (e-mail, SMS, irccat) that get triggered if things go wrong while we are away from the screens (yes, it does happen). There is nothing quite like the joy of being woken up in the early hours of the morning with a barrage of text messages containing the details of each and every machine that has unexpectedly crashed.

While all of this is very useful for keeping an eye on the code while it is running, it’s also good to be able to put the code through some checks and balances before we unleash it on the wider world. One means to this end is the venerable Hudson – a continuous integration engine that constantly builds our software, checks it for style and common coding errors, then instruments and tests it and reports on any violations that may have been introduced since the last time it ran. We have over 30 internal projects that use Hudson and a few thousand tests which run over the code. Hudson comes with a web interface and can be configured to send email when people “break the build” (e.g. by making a change that causes a test to fail). We decided that this wasn’t nearly humiliating enough and followed this suggestion (our setup pictured below) to introduce a more public form of punishment.

The bears that haunt our developers’ nightmares.

These 3 bears sit in a prominent position and watch our developers’ every move. When things are good we have a green bear gently glowing and purring, when changes are being processed a yellow bear joins the party, and if the build gets broken the growling evil red bear makes an appearance. The developer who broke things usually goes a similar shade of red while frantically trying to fix whatever was broken while the others chortle in the background.
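For the curious, the bear logic described above can be sketched in a few lines of Python. This is an illustration only, not the code that drives the real bears: the function name `bears_lit` and its two boolean inputs are hypothetical, and it assumes the yellow bear lights up alongside whichever bear reflects the last finished build.

```python
from typing import List


def bears_lit(last_build_ok: bool, building: bool) -> List[str]:
    """Return the bears that should glow for a given build state.

    Assumption (not confirmed by the post): yellow is lit *in addition to*
    green or red while a build is in progress.
    """
    # Green means the last finished build passed; red means it broke.
    bears = ["green" if last_build_ok else "red"]
    # Yellow "joins the party" while changes are being processed.
    if building:
        bears.append("yellow")
    return bears


if __name__ == "__main__":
    # Walk through the four possible combinations of inputs.
    for ok in (True, False):
        for busy in (False, True):
            print(f"ok={ok} building={busy} -> {bears_lit(ok, busy)}")
```

How the two inputs are obtained (polling the build server, parsing its email, and so on) and how the lamps are switched are left out on purpose, since the post does not say.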

Amid all this hi-tech digital trickery, it is sometimes nice to be able to cast one’s mind back to the simpler analogue age and the measuring devices of the past. For example, we hooked up an analogue meter like those used in many industries for decades, fed it some different input and ended up with a literal desktop dashboard that measures average website response time.

Web response time meter.

It is strangely mesmerising to see this meter rev up and down as website demand changes over the day (or we manage to overload our data centre’s power supply and a significant portion of our web farm gets to take an unexpected break from service).
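As a rough sketch of the idea behind such a meter (not the actual wiring or code used here), the Python below keeps a rolling average of recent response times and scales it to a needle position between 0 and 1. The class name `ResponseTimeMeter`, the 1000 ms full-scale value and the window size are all assumptions made for the example.

```python
from collections import deque


class ResponseTimeMeter:
    """Rolling average of response times, scaled to a needle position."""

    def __init__(self, full_scale_ms: float = 1000.0, window: int = 5) -> None:
        # full_scale_ms: hypothetical response time that pins the needle.
        self.full_scale_ms = full_scale_ms
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def record(self, response_ms: float) -> None:
        """Add one measured response time in milliseconds."""
        self.samples.append(response_ms)

    def average_ms(self) -> float:
        """Average of the samples currently in the window (0.0 if empty)."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def needle_position(self) -> float:
        """Fraction of full deflection, clamped to the range 0.0 to 1.0."""
        fraction = self.average_ms() / self.full_scale_ms
        return max(0.0, min(1.0, fraction))


if __name__ == "__main__":
    meter = ResponseTimeMeter(full_scale_ms=1000.0, window=3)
    for ms in (100, 200, 300):
        meter.record(ms)
    print(meter.average_ms(), meter.needle_position())
```

Averaging over a window keeps the needle from twitching on every single request, and clamping stops a slow outlier from driving the output past full scale. Turning the 0 to 1 value into a voltage for a physical meter depends entirely on the hardware and is not shown.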

On the whole we have a great variety of options for keeping our eyes on the quality prize, thanks in no small measure to the efforts of the open source software community who crafted all the software I have mentioned. Of course the biggest challenge to ensuring quality is still the human component – getting people to actually use these tools and instilling the desire and motivation to make software as bug-free as possible. If any of you out there use similar tools that you are passionate about let us know. I’d also love to hear if anyone has any other amusing or original systems to keep quality control fun and fresh. For me, I’ve got a glowing green bear to keep company….

Comments

  1. David
    1 August, 13:34

    What do you use to control that web response time meter?

  2. Francis
    1 August, 13:38

    A lot of this stuff is really really awesome. The monitoring tools are a personal interest of mine, I like seeing them work and used in clever ways. Glad to see you can enjoy your jobs. Or at the very least write a blog post saying that you enjoy your jobs ;-)

  3. Patrik
    1 August, 13:49

    Oversized regression test gummy bears. Awesome!

  4. Thomas
    1 August, 14:03

    But I love the taste of the red bears most. Couldn’t the yellow one be the worst case? :-)

  5. wink
    1 August, 14:22

    I think you guys just crashed the Hudson wiki pages by linking to it in this post ;)

  6. Phil Kirkham
    1 August, 14:41

    That analogue meter is cool !!

  7. Kris
    1 August, 14:44

    Thanks for this – obviously your efforts are paying off as I can’t remember the last time I had issues on the site (with the exception of the server rack outage of several weeks ago, or whatever it was). Much much more stable than it was three years ago….

  8. mr_maxis
    1 August, 14:46

    Nice stuff and funny indicators :)

    Are you also dealing with the Last.fm Status? Because it’s broken…

  9. wetelectric
    1 August, 15:27

    https://hudson.dev.java.net/
    Looks interesting. Although I worry about the time it takes to set up these things.

  10. Alan71
    1 August, 15:29

    Does the angry gummy bear give the chocolate commit sentence?

  11. Russ Smith
    1 August, 15:49

    Is that subversion commit status screen a custom app?

  12. anonymous
    1 August, 15:51

    so when’s the old last.fm coming back? it’s blatantly obvious that this whole entry is set up to gain sympathy or to try to be ‘zany’ and ‘cool’ and make people forget about what happened.

    if you really cared about last.fm you would listen to your users. without us, where would you be?

  13. Laurie Denness
    1 August, 15:58

    Russ Smith: The SVN commit stuff also displays Trac tickets closing and is a custom piece of front-end loving by our wonderful Matthew Ogle.

    If anyone else cares, the graphs screen is made up of some of my awful PHP which displays Cacti graphs in different sizes and time periods in a rotation, and the status page is an even worse piece of PHP that parses the Nagios status file, and makes the best use of the “text-decoration: blink” style tag ;)

  14. Adrian
    1 August, 16:13

    wetelectric – out of all the Continuous Integration tools I have tried out, Hudson was by far the easiest to set up and configure. You just install Tomcat, deploy Hudson, add svn/cvs details for your project, enter the path to your build file, done! Even just having your projects being compiled is a good start, you can always add the fancier stuff later. Also, there are considerable time savings to be had by preventing bugs instead of hunting them down afterwards. I hope you get the chance to try Hudson out.

    anonymous – I actually started on this blog post before the relaunch and have only now had time to put it live, this is not a big conspiracy I promise. I think the Feedback Forum is a more constructive place for further discussion on this issue.

  15. anonymous
    1 August, 16:26

    the feedback forum is NOT constructive because you simply don’t listen or respond to any of the suggestions about the real problems with the website. what is the actual point of last.fm now? another facebook/myspace clone? some highranking executive really made a bad decision forcing this change on everyone. i know it’s not your fault and you can’t do or change anything, but don’t you remember what happened to friendster? RIP last.fm :[

  16. Jason
    1 August, 16:34

    I love that SVN commit monitor, that’s really cool.

    Everything else is old hat, seen it all before :P.

  17. hydo
    1 August, 17:10

    Damn, we need one of those svn commit monitors.

  18. Matthew Ogle
    1 August, 17:12

    @anonymous: We read the forums every day, and we’ve been making changes and improvements nearly daily too. The point of Last.fm is exactly what it’s always been — music. Hopefully you can stick around and help us improve.

    @Jason, glad you like! I might try to spiff it up a bit and post some code if there’s interest, but it’s pretty simple.

  19. Andrew
    1 August, 17:37

    At Tellme, we use a red rotating police siren type light that goes off whenever our code tree breaks. It’s an excellent visual reminder to go along with the more detailed e-mail that goes out (which always includes a quick link to the “last checkins” viewer on our source control system).

    Our team is pretty well trained now and build breakages, which are pretty rare, are usually diagnosed and fixed quickly.

    Our system isn’t very sophisticated, we just use an X10 controller and a webserver running on an old box dedicated to this purpose.

  20. Daryl
    1 August, 18:33

    @anonymous, not every Last.fm user shares your opinions on the new design. Besides, this clearly isn’t an appropriate place to raise this debate.

    @Adrian, I absolutely love this post. An insight into your testing and monitoring systems relates closely to what I do in my day-to-day programming job.

    I also love the gummy bears.

  21. Daniel
    1 August, 18:45

    Fix your interface.

  22. ThatGuy
    1 August, 19:13

    I like the suggested listening for reading the post. Have you thought about playing music when the bears are different colors?
    Green = anything Green Day, On the Greener Side by Michelle Shocked

    Yellow = Yellow by Coldplay, Yellow Submarine by The Beatles, Big Yellow Taxi by Joni Mitchell (not the Counting Crows version)

    Red = 99 Red Balloons – Nena, Red Scab – Adam Ant, Flaming Red – Echo and the Bunnymen, Rudolf The Red Nosed Reindeer – lots of artists, Red Red Sun – INXS

    Thoughts?

  23. julio_lima
    1 August, 20:56

    Have you guys ever heard of HP OpenView? =)

  24. method219
    1 August, 21:08

    I’ve just had the most intriguing idea… a steampunk NOC… yes… yes, very interesting…

  25. Kjartan Ólason
    1 August, 21:12

    Give us the source for the SVN commit screen and how to control the meters and the bears, and I promise you that we’ll love last.fm even more!

    Thank you for a great blog.

  26. Henk Poley
    1 August, 21:23

    I once read about people using a traffic light, and playing different versions of “Hallelujah”: like the Worms Holy Hand-grenade (red) and Rufus Wainwright – Broken Hallelujah (yellow).

  27. jeff
    2 August, 05:37

    so good.

  28. eesn
    2 August, 07:04

    the meter!! :D

  29. Dom
    2 August, 10:51

    This is all well and good, but none of it explains what possessed you to break the site in such a horrific fashion in the first place.

    All these monitoring systems obviously aren’t working when you introduce an update that breaks almost every single feature on the site, yet fail to react when users complain.

  30. Rob Szarka
    2 August, 20:07

    Thanks for the peek under the hood, Adrian. I’m surprised to see so much Java being used, given that AFAIK it’s pretty much PHP driving the site itself. Maybe there are more mature Java-based tools for managing builds and monitoring performance?

    Somebody ought to write up a how-to on that response time meter for Make magazine!

  31. PAStheLoD
    2 August, 22:48

    @julio_lima: a friend of mine is having regular nightmares over HP OpenView :P

    That analogue ping-o-meter is awesome. Yes, it’s so cool, one must actually have to type text like this, a simple “+1” just isn’t enough :)

    And about the issues/feedback, last.fm should poll the userbase about the interface. Plus letting the o’school folk have the old one they want so badly doesn’t seem like an über-impossible task :P

  32. New
    3 August, 11:06

    Thanks for the great stuff.

  33. Josh
    3 August, 12:49

    I don’t have much to contribute but I always enjoy these behind-the-scenes posts. Keep up the great work, guys!

  34. anonymoose
    4 August, 10:02

    Just fix the fucking layout. We already know you have cool stuff on this site, we like it, but there is still the unresolved situation of crappy, unfunctional design. I was honestly expecting the next blog post to be some kind of ultimate response to this problem, but no, you just ignored everyone. Way to go, staff.

  35. Mike
    4 August, 16:22

    Can’t help but notice The Holloways stickers on the window!

  36. Seth
    5 August, 02:50

    I’m sorry, but the new look of last.fm is disgusting. I haven’t visited in a while, but the design is the worst thing I have seen on the internet. What were you guys thinking? Hey, let’s give ourselves the feel of myspace? Bring back the old look, or improve it. This is a setback for last.fm, the internet and the music community. PLEASE FIX!

  37. coregamer
    5 August, 16:17

    I don’t mind the new look. It’s pretty good.

  38. Are W
    5 August, 17:59

    I quite like the new last.fm design (had to mention that, given how vocal the dissatisfied lot is). And as a programmer, I really appreciated this insight into how you run your day-to-day development.

    We have a simpler version of your bears – a few LCD screens spread around, which go red when cruisecontrol breaks and play some rather unpleasant sample – sound of glass breaking or similar. The mugshot of the offending coder is shown onscreen. Not very pleasant when you’re a novice dev, I can tell you ;)

    When the build is fixed, the screen turns green and “I feel good” is played, courtesy of James Brown. And it does feel good :)

  39. Daniel Browne
    6 August, 07:07

    I have to say I really like the new Last.fm look, even though I wasn’t really using it a lot before the changeover.

    The analog meter and the bears are brilliant! If I worked on a team with more than 2 people I’d do something like that. :)

  40. afallowhorizon
    8 August, 11:26

    Really, it’s the evil eyebrows that have been drawn on the red bear that makes the picture.

  41. Alex
    12 August, 10:24

    I enjoy the new look. After using the beta, and flicking between the two, the superiority of the new design is pretty clear.

  42. pepsifloat
    14 August, 18:44

    the new LAST.FM design is terrible.
    we all appreciate reaaaally your hard work and involvement, but pleeeease – listen to the users:
    http://www.last.fm/forum/21717/_/433174/41#f7087270
    nobody likes it so please change it back.

  43. generaldusty
    15 August, 13:16

    I like the new site more than the old one. I think it’s more responsive and has a much cleaner design.

  44. Peter
    15 August, 14:01

    Simple fix for new site: bring the shoutbox back to the top-right.

    Problem solved.

    [And why am I previewing this comment first?!?]

  45. Joe
    17 August, 05:16

    A lot of people have been complaining about the new layout, and to be honest, I didn’t really care for the old layout. In fact, I like the new layout quite a lot!

    (Disclosure: I am a web designer/developer, but not for last.fm!)

  46. Jack Dunn
    17 August, 12:52

    You people do realise that this isn’t the place nor time to be discussing issues on the site design.

    Like the displays a lot. I’m off to play with Hudson hehe..

  47. alice tragedy
    17 August, 13:11

    haha. love the bears.

  48. arcadiandream
    18 August, 14:23

    Coming from an age when RJ used to keep us updated about everything, this is really illuminating. Thanks for sharing!

  49. web design company
    19 August, 12:54

    I have a teddy bear named red bear. My sister gave him to me when I was born, she was 3. He still sleeps with me. He is not angry…

  50. empsy
    19 August, 15:49

    i already got used to this new design, however i still feel like it needs a lot of improvements

  51. eLLen
    21 August, 00:03

    Those bears are SO cute!
    And very effective at the same time, it seems. X)


Comments are closed for this entry.