The Adventures of a Data Team Intern

Monday, 12 December 2011
by Ashley Diamond
filed under About Us
Comments: 0

In the beginning was the word, and the word was Last.fm. Then I joined for three months…

In February 2011 I joined Last.fm’s data team as an intern. Since I’m studying part-time through the Open University I was able to start a flexible internship here. The data team work on many of the back end systems that Last.fm relies on, such as the software for the streaming servers, and the services that manage the songs in the music catalogue.

One of Last.fm’s many streaming servers.

I spent my first weeks writing unit tests, trying to increase the line coverage on a few small projects to over 70%. It was great to see real Java code in action as opposed to looking at examples in exercise books – this was a chance to read lots of it. Finding ways to test it meant that I really had to understand it too. Thanks to team lead Adrian’s enthusiasm for testing, I’m left with a real appreciation of unit testing and how it allows you to confidently change code and know that it still does what it was originally designed and built to do.

With the help of the other members on the team I have been able to improve my coding style, naming conventions and object-interaction design as well as briefly dabbling with other things such as Hive, JDBC, Spring and concurrency; but not at the same time (snigger!)

Great Expectations

Last.fm operates a distributed storage structure named Hadoop which consists of over 60 nodes, each with terabytes of data storage and gigabytes of memory. It is constantly being upgraded and new storage space added – this is necessary since it stores over 300GB of extra data every single day. I was invited to help with a hard-disk upgrade on 10 of the nodes, fulfilling a lifelong dream of mine to visit a data centre. I took a trip to London’s luxurious docklands with 80TB of 2TB hard-disks in the back of a taxi.

Hadoop cluster with 10 nodes offline, hard-disks unclipped and waiting to be removed.

I had many expectations of what it would be like at the data centre, and was eagerly anticipating the millions of pounds worth of equipment I might see. However, the reality of the environment in the data centre was upsetting to my nervous system. For several hours I endured a droning in my ears akin to a jet engine, whilst simultaneously being frozen when walking down one aisle (the cold aisle where the servers take in cold air that is pumped through the floor), and then being boiled whilst walking down the next (where the servers kick out all the hot air). I now know why I was advised to wear “hot pants and a warm jacket”. A lot of walking back and forth between aisles to self-regulate my own body temperature was required!

The Hadoop cluster’s old hard-disks awaiting removal from their caddies… and hundreds of screws that keep them in place! These hard-disks were recycled by adding extra capacity to the streaming servers.

After this shock to the senses I was left with a noticeable impression of how much time and effort was put into fail-over equipment and practices to ensure that Last.fm’s systems stay running come rain or shine or diesel fumes. All hardware is planned and installed with redundancy, and file-systems are designed so that important data is replicated. For example, the streaming servers’ tracks are replicated at least three times and most systems are mirrored in all three of Last.fm’s data centres.

What you can expect to get from an internship at Last.fm

- An environment where people care about doing things right the first time around.

- A place where redundancy matters and where people worry about things going wrong, even when they are going right.

- An agile development process with sprint planning.

- Guidance on naming conventions, coding style and object-interaction.

- A purely open-source development environment – the only experience with Microsoft I had was the logo on my keyboard.

- A relaxed atmosphere where people are free to wear and say whatever they want, and where meetings spontaneously burst out anywhere and at any time around you.

- Opportunities to dip your toes into new technologies such as Hadoop.

- A chance to be surrounded by decades of Java experience, where people can advise you on how to do things right – and kindly point you to an API page when you try and get them to do it for you :).

- The ability to work with software development tools such as: Subversion, Maven, JUnit and Cobertura.

- An insight into building and developing code that is robust, dependable and only released when it has gone through multiple stages of thorough testing.

One small step for a member of the data team, one giant leap for an intern.

During my time at Last.fm I was fortunate enough to observe a real programming team in its day-to-day activities and also had a chance to improve my own programming skills by solving real problems.

Internships can be a good way to increase job prospects and I’m grateful to be able to put the name ‘Last.fm’ on my CV. If I had instead worked at ‘QualTekXYZmobile’ then perhaps potential employers wouldn’t have heard of it – or worse, would think I’d invented it. Who knows… maybe your interviewer was listening to Last.fm before you walked into your interview!

Library and streaming services outage

Monday, 18 July 2011
by Colin M. Strickland
filed under Announcements and About Us
Comments: 291

Since 04:00 GMT on Sunday morning, the primary Radb service has been exhibiting intermittent problems meeting acceptable service levels. This means that libraries, scrobble counts and the services associated with them (stats, radio stations etc) appear to be broken when you use Last.fm.

We’re working very hard to fix this as soon as possible, and we’ve had engineers on it throughout yesterday and last night, but we wanted to keep you posted here with what’s happened and what we know so far.


UPDATE (19/07/11 11:29):
We’ve re-enabled access to your library and chart service after analysing this morning’s traffic in a little more depth.

As we continue working on fixing the underlying faults it may be that we have to switch them off again for short periods of time. In the meantime keep checking the status page for information about the service.

Thanks again for your patience.


All of our services here at Last.fm are predicated around keeping track of your accumulated music plays – your scrobbles – and using statistical methods on the historical data to build awesome things.

Because scrobbles power everything dynamic and wonderful on the site, we need fast realtime access to the scrobbles and associated summary data (library, charts, neighbourhoods, etc.). And with millions of people scrobbling, and the size of the historical scrobbles set, doing this fast in realtime, while updating the same sets with new plays as fast as can be managed, is a significant challenge. We have a custom, in-memory database service that we designed and implemented to address these needs. It is called Radb. It is usually pretty awesome at answering these sorts of questions in a predictably constant time.

But since 4am on Sunday Radb has been failing. The reasons for this are unclear. Failing over to the redundant service providers helps a little, but not enough. The reasons for this are also unclear. We have engineers working flat out to diagnose this problem.

To remove as much stress as we can from the database service layer, we have additionally disabled most of our radio, library and recommendation services, temporarily. We have also stopped accepting new scrobbles for the meantime, for a similar reason. Scrobbling clients will detect this state and will cache your scrobbles until the submission service is reactivated, so no need to panic about scrobbles being lost.

Keep an eye on the status page, and the appropriate forum posts. We’ll update with more information and estimates as we have them.

The Power of Sound

Wednesday, 29 June 2011
by Chrissie Hines
filed under Announcements and About Us
Comments: 2

A live music experience is such a powerful one, with each person taking moments from a gig that they will enjoy and remember for a long time to come. Some of us go back to see a favourite band or artist time and time again and somehow there is something special and different about each performance.

In the commercial team, which is where I am based, we really enjoy the challenge of pulling some of that spirit into work for relevant and interesting brands. A perfect opportunity to do this arrived with HP who are looking to explore the Power of Sound to help promote their new range of Envy laptops and the sound quality from the incorporated Beats Technology.

So, we’re blending the power of the Hype Charts and our expertise in the live arena to pull together some really special events over the next few months, and we want you to join us.

First up we’ve been on the road chatting to artists at our summer festival shows, asking them about The Power of Sound. The interviews from Liverpool Sound City can be seen here and they feature artists as varied as Frank Turner, Akala and Willy Nelson. Each has their own take on the concept, and it’s great to hear each artist talk about it in their own unique and diverse way. I can’t choose a favourite from this batch but I am still amazed that the Dutch Uncles reference Bon Iver, J Dilla, Frank Zappa and Biggie Smalls all in one video!

The interviews from Get Loaded will be ready in the next few days and we’ll be at Sonisphere and SW4 amongst others for more. Keep up to date on new interviews in the radio player (UK only for these I’m afraid!) on our website and on HPUK’s Facebook page.

Next up we’ve got three pop up acoustic sets, all set up with the help of Black Cab Sessions. Our first was with Slow Club in Soho Square, and they were great. They powered on through the rain to sing a couple of songs, including a new one from forthcoming album Paradise. We’ve got pictures up on our Flickr page, and you can find footage from the set and an exclusive interview here.

All this is working towards a main gig at the end of summer… but we’re keeping details about that one secret for now. What I can say is that a team of passionate people are working on getting a great line-up as I type, and we are all set to make it a fantastic event.

As a little bonus, all of the artists that are taking part in the project will help curate an HP Power of Sound Custom Radio station, which will be ready to launch in a few days. If you want to give some input into the content of the station, again, please head to HP’s FB page.

And last but not least we are delighted to let you all know that we are releasing a Last.fm app for the HP Touchpad which we are sure will get a whole lot more people scrobbling!

Huge thanks to everyone involved in pulling these off. Make sure you keep an eye on our Twitter page for info about the live sets throughout the next few months, and for footage from the sessions!

Wishing you a great summer.

Last.fc Take To The Field

Tuesday, 21 June 2011
by Nick Calafato
filed under About Us
Comments: 8

The fine people at Big Scary Monsters kept up with summer tradition by lovingly putting together the 5th annual BSM 5-a-side Football Tournament – a fun-filled day featuring various music and tech folk attempting to prove their muscle on the football pitch.

The tournament provided the perfect opportunity to show what a very green Last.fc can do away from the comfortable confines of our computers and out on the wide open playing field.

As the torrential rain cleared and the sun peered through, spirits prior to our opening game against the burly Disc Manufacturing Services were high:

It was a rocky start as we conceded a few goals early on – despite Lumberjack’s best attempts at the long range screamers he’d probably been studying on YouTube prior to kick-off:

The game ended Last.fc 4-7 Disc Manufacturing Services. It was a blow – but we were not to be put down too soon as we brushed aside Rosa Valle in our second group game 5-1 with goals from the quick feet of Pbad, the power of eartle, the flamboyance of Daniel1986 and the beard of yours truly. And of course a world class penalty save from the safe hands of Lumberjack – documented on video here.

Our last opponents in the group stage were our friends in Drowned In Sound. All we needed was a draw to qualify for the Last 16, but with our energy levels hitting red we perished to two strong goals and, with no reply, the game ended Last.fc 0-2 Drowned In Sound.

An early exit didn’t stop us enjoying a few post-match bevvies and an all important Last.fc team shot. Rest assured – we will be back next year fitter and stronger than ever!

Top L>R: Lumberjack, eartle, good_bone, darkspark88, y0b1tch

Bottom L>R: nedflanders1979, Daniel1986, Omar711, Pbad

(Congratulations to the Old Blue Last who won the tournament beating Disc Manufacturing Services 3-0 in the final)

If it doesn't Scrabble, it doesn't count.

Tuesday, 14 June 2011
by Matthew Hawn
filed under Stuff Other People Made and About Us
Comments: 11

“I’m in the lab all day, I Scrabble all night
I got a Bedazzler so my outfit’s tight
When it comes to panache I can’t be beat
I got the most style from below 14th street”

- Beastie Boys, “Shazam!”

About a month ago a press release appeared on the HarperCollins website noting that some new terms “from the digital world” were going to be added to the official Scrabble dictionary. Those words included: wiki, fansite, webzine, darknet, and best of all… scrobble.

Not being a habitual reader of lexicographical press releases, I missed this at Last.fm HQ until Dan in our sales team mentioned it in a note he was sending out to some people we work with. This little factoid got me more excited than when Matt, our data griot, decided to livetweet his first listen to the recent Lady Gaga album. You see, it is my firm belief that Scrabble and the music tech world have a lot in common. Hear me out.

Scrabble, like a lot of music, has an annoyingly complicated copyright history: Hasbro owns the rights to the game in the US and Canada, Mattel owns the rights in the rest of the world, and Electronic Arts own the rights to digital versions. It’s amazing you don’t need to be a lawyer to play the game.

Which explains what happened in 2008 when a pair of brothers created an online version of Scrabble (Scrabulous) that worked really well on Facebook. More than half a million new fans of the game were born, but the first response from Scrabble’s corporate masters was to shut it down and sue the pants off the two brothers who made it, rather than figure out how to work with them to bring the game to more people. Which is pretty much how the music industry has worked for the last decade.

But it’s not just lawsuits that Scrabble and music have in common:

- Stephen Malkmus and the guys from Pavement were well known for their Scrabble games on tour. Courtney Love was famous for wanting to play against Stephen and he was equally famous for beating her in tile-to-tile combat.

- Elvis Costello likes to call himself the “rock and roll Scrabble champion.” And this Etsy shop will sell you a Scrabble tile pendant with Declan MacManus’s face on it so you can show your allegiance.

- The Beastie Boys are Scrabble junkies, and Ad-Rock even goes looking for competition on the road, dropping in at local Scrabble club events.

- There are dozens of songs with Scrabble in their titles on Last.fm, including an excellent one about a Scrabble date gone bad from Milky Wimpshake.

The only bad thing about ‘scrobble’ making it into Scrabble is that it’s pretty tough to pull off in a real game; it’s eight letters and you’ll need the only two Bs in the bag. But it’s worth it. ‘Scrobble’ has three 3-point letters in it and it’s worth 14 points on its own… much more if you can hit a multiple-word-score or squeeze it in to an existing cluster.
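For anyone who wants to check the sums, here is a minimal Python sketch of the scoring. It assumes the standard English-language tile values; the function name `word_score` and the `word_multiplier` parameter are just illustrative, and premium letter squares are ignored.

```python
# Standard English-language Scrabble tile values.
TILE_VALUES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def word_score(word: str, word_multiplier: int = 1) -> int:
    """Sum the face value of each tile, then apply any word multiplier."""
    return word_multiplier * sum(TILE_VALUES[letter] for letter in word.upper())


print(word_score("scrobble"))     # face value of the eight tiles
print(word_score("scrobble", 3))  # the same word on a triple-word-score square
```

The C and the two Bs are the three 3-point letters; the other five tiles are worth a point each.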

Our motto is the same in Scrabble as it is for music: Make every play count.


Scrabble cat, a sometimes visitor to Last.fm HQ

Berlin Buzzwords 2011

Monday, 13 June 2011
by Andrew Clegg
filed under About Us
Comments: 5

Last week, Gilda Maurice (from our data team), Steve Whilton (from our product team) and I went to Berlin for a couple of days. While Steve hurtled round Berlin from meeting to meeting (rather him than me), Gilda and I headed over to the Berlin Buzzwords conference at the Urania conference centre, just south of the famous Tiergarten.

It’s an annual meetup for engineers, scientists and other assorted hackers in the field of ‘big data’. The problems of processing and analysing the amount of data generated on the social web have required a whole new set of approaches, and we’re very keen on keeping up with new developments in this area, especially if they can help us make Last.fm better.

Two of the main open-source data tools we rely on at Last.HQ are Hadoop, a framework for parallel storage and querying of data on a cluster of servers, and Solr, a search engine based on the Lucene toolkit. Solr drives the search functions on the web site, and Hadoop does much of the behind-the-scenes number crunching, such as generating the weekly charts and calculating artist similarities. Lucene and Hadoop have both been very influential in this field, so it was fitting that the conference opened with a keynote from Doug Cutting who originated both projects.

In fact, Doug Cutting’s intro set the tone pretty well — probably half the talks were on Lucene or Hadoop, or other technologies that build on them. We learnt how to tune Solr performance and measure its relevance, how to improve its accuracy with a dash of linguistics, and how to visualize the topics within a given set of search results. Facebook and StumbleUpon presented their experiences of HBase, a Hadoop-backed database for storing and querying massive quantities of user data and content in real time, and JTeam took us through Mahout, a machine-learning toolkit for clustering and classification tasks, also based on Hadoop. A few of the talks went further into computer science theory, but always with a view to producing high-volume applications ready for web-scale data.

It’s hard to pick favourites out of such a dense line-up, but we particularly liked Joseph Turian’s talk on new data-mining techniques (semantic hashing, graph parallelism and unsupervised semantic parsing), and Stanislaw Osinski’s session on clustering and visualizing Solr search results with Carrot2, accompanied by a beautiful demo. Mark Miller and Rod Cope gave some sound advice on scaling Solr and HBase, and Chris Wensel took us through designing algorithms to manipulate and extract data from Hadoop.

Sadly there was no way we could catch all the talks we wanted to see, with three rooms running in parallel each day, but thankfully all the talks were filmed — the slides are available here (apart from a few which are yet to appear), and the organisers will be making all the videos available soon.


Steve and Andy on a Berlin rooftop. Photo by Gilda.

Last.fm starts the summer early

Wednesday, 4 May 2011
by Helen Taylor
filed under About Us and Announcements
Comments: 3

Since 1995, Camden Crawl has established itself as the May Day Bank Holiday weekend’s hottest ticket, and even though it had competition from the Royal Wedding this year it was a great way of kicking our live series Last.fm Presents into gear for the summer season.

The baroque setting of Koko was the venue for our own stage, treating Camden to a cocktail of seven UK acts rising up the Hype Charts. First up was Dinosaur Pile-Up, a band touched by the hand of grunge, whose track “My Rock ‘n’ Roll” proved a mission statement for the night. Lethal Bizzle is a star of Last.fm’s grime tag, and he embraced the spirit of Camden Crawl with a stage dive ahead of indie rockers Mazes, who’ve featured heavily in our Hype Chart over the past month.

British Sea Power transformed the Last.fm Presents stage into an ode to nature: foliage sprung up as footage of sea birds played. Epic favourites such as “Waving Flags” gained the biggest reception, before the sets took a turn for the electronic with the last two acts.

Simian Mobile Disco proved that knob-twiddling needn’t be a static affair, and the crowd agreed – four levels of Koko got down to “Audacity of Huge” and “Hustler” – while Hudson Mohawke excelled as last performer of the evening, his mix of r ‘n’ b vocals and groundshaking bass making for a warped electro trip into the night.

If you’re feeling like you missed out on a brilliant night, well, you sort of did. No fear though, Last.fm Presents have a packed festival season ahead this summer.

If you are heading down to (deep breath) The Great Escape, ATP, Liverpool Sound City, Get Loaded, Sonisphere, Rock Werchter, Truck, Field Day, Underage, Summer Sundae or SW4 then keep an eye out for our LFM lobbyists who’ll be ready to shower you with Last.fm goodies, including our tag stickers.

It’s going to be a great summer!

Live in Austin

Thursday, 10 March 2011
by Stefan Baumschlager
filed under Announcements and About Us
Comments: 11

Every spring the music industry descends upon the capital city of Texas to celebrate music in its many facets, genres & tags even. You see where this is going don’t you?

After a two year hiatus we’re bringing back the live SXSW tagging bonanza so that you can go nuts across town with those little red tag stickers. Here’s something to refresh your memory:

It’s simple really; whenever you see someone with sticker sheets in hand ask them to give you a couple so you can share them with your friends and start tagging the real world SXSW.

The fun doesn’t stop there of course! We encourage you to take pictures of your guerrilla tagging, upload your pics to flickr and tag them with ‘tagsxsw2011’ and ‘lastfm:event=1732494’ (that’s right we’re talking about flickr tags now – keep up!).

We’ve also updated the Band Aid group page so that you can easily find the bands you’d be crazy to miss this year! Enter your Last.fm username and you’re on your way.

If you want to browse the full line up as well as your recommended line up, just head to the SXSW 2011 Festival Page. Remember: the bands with the little burning flame icons next to them are the – yes – hot ones, who are destined for big big things in 2011 and beyond.

Finally we’ve got a little mission for you: SXSW always has tons of official showcases & shows, but equally there are a plethora of unofficial shows in someone’s backyard. If you happen to see that we’re missing bands you know are performing in some way, shape or form at this year’s SXSW, please take 2 minutes to add them to the line up.

Thank you, and see you in Austin!

PS: if you want to get in touch while I’m out there, please do; follow @baumschlager on Twitter.

Launching Xbox, Part 2 - SSD Streaming

Monday, 14 December 2009
by Mike Brodbelt
filed under About Us and Tips and Tricks
Comments: 18

This is the second in a series of posts from the Last.fm engineering team covering the geekier aspects of our recent Last.fm on Xbox LIVE launch. Part one (“The War Room”) is here.

The music streaming architecture at Last.fm is an area of our infrastructure that has been scaling steadily for some time. The final stage of delivering streams to users fetches the raw mp3 data from a MogileFS distributed file system before passing it through our audio streaming software, which handles the actual audio serving. There are two main considerations with this streaming system: physical disk capacity, and raw IO throughput. The number of random IO operations a storage system can support has a big effect on how many users we can serve from it, so this number (IOPS) is a metric we’re very interested in. The disk capacity of the cluster has effectively ceased being a problem with the capacities available from newer SATA drives, so our biggest concern is having enough IO performance across the cluster to serve all our concurrent users. To put some numbers on this, a single 7200rpm SATA drive can produce enough IOPS to serve around 300 concurrent connections.

We’ve been using MogileFS for years at Last.fm, and it’s served us very well. As our content catalogue has grown, so has our userbase. As we’ve added storage to our streaming cluster, we’ve also been adding IO capacity in step with that, since each disk added into the streaming cluster brings with it more IOPS. From the early days, when our streaming machines were relatively small, we’ve moved up to systems built around the Supermicro SC846 chassis. These provide cost effective high-density storage, packing 24 3.5” SATA drives into 4U, and are ideal for growing our MogileFS pool.

Changing our approach

The arrival of Xbox users on the Last.fm scene pushed us to do some re-thinking on our approach to streaming. For the first time, we needed a way to scale up the IO capacity of our MogileFS cluster independently of the storage capacity. Xbox wasn’t going to bring us any more content, but was going to land a lot of new streaming users on our servers. So, enter SSDs…

Testing our first SSD based systems

We’d been looking at SSDs with interest for some time, as IO bottlenecks are common in any infrastructure dealing with large data volumes. We hadn’t deployed them in any live capacity before though, and this was an ideal opportunity to see whether the reality lived up to the marketing! Having looked at a number of SSD specs and read about many of the problems early adopters had encountered, we felt as though we were in a position to make an informed decision. So, earlier this year, we managed to get hold of some test kit to try out. Our test rig was an 8 core system with 2 X5570 CPUs and 12GB of RAM (a SunFire X4170).

Into this, we put 2 hard disks for the OS, and 4 Intel X25-E SSDs.

We favoured the Intel SSDs because they’ve had fantastic reviews, and they were officially supported in the X4170. The X25-E drives advertise in excess of 35,000 read IOPS, so we were excited to see what they could do, and in testing, we weren’t disappointed. Each single SSD can support around 7000 concurrent listeners, and the serving capacity of the machine topped out at around 30,000 concurrent connections in its tested configuration – here it is half way through a test run (wider image here):

Spot which devices are the SSDs… (wider image here)

At that point its network was saturated, which was causing buffering and connection issues, so with 10GigE cards it might have been possible to push this configuration even higher. We tested both the 32GB versions (which Sun have explicitly qualified with the X4170), and the 64GB versions (which they haven’t). We ended up opting for the 64GB versions, as we needed to be able to get enough content onto the SSDs for us to serve a good number of user requests, otherwise all that IO wasn’t going to do us any good. To get these performance figures, we had to tune the Linux scheduler defaults a bit:-

echo noop > /sys/block/sda/queue/scheduler
echo 32 > /sys/block/sda/queue/read_ahead_kb

This is set for each SSD – by default Linux uses scheduler algorithms that are optimised for hard drives, where each seek carries a penalty, so it’s worth reading extra data in while the drive head is in position. There’s basically zero seek penalty on an SSD, so those assumptions fall down.
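Since the two settings have to be applied to every SSD, it can be handy to wrap them in a small helper. The following is only a sketch of the same two writes in Python, not the script we used: the function name `tune_ssd` and the `sysfs_root` parameter are made up for illustration, and it needs to run as root against the real `/sys/block`.

```python
from pathlib import Path


def tune_ssd(device: str, sysfs_root: Path = Path("/sys/block")) -> None:
    """Apply the two queue settings shown above to one block device."""
    queue = sysfs_root / device / "queue"
    # "noop" does no request reordering, as there is no seek penalty to avoid.
    (queue / "scheduler").write_text("noop\n")
    # Read ahead only 32 KB instead of the larger hard-drive-oriented default.
    (queue / "read_ahead_kb").write_text("32\n")


# Example usage (hypothetical device names, run as root):
#     for dev in ("sdc", "sdd", "sde", "sdf"):
#         tune_ssd(dev)
```

Settings written this way do not survive a reboot, so they would need reapplying at boot time. Newer multi-queue kernels call the equivalent scheduler "none" rather than "noop".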

Going into production

Once we were happy with our test results, we needed to put the new setup into production. Doing this involved some interesting changes to our systems. We extended MogileFS to understand the concept of “hot” nodes – storage nodes that are treated preferentially when servicing requests for files. We also implemented a “hot class” – when a file is put into this class, MogileFS will replicate it onto our SSD based nodes. This allows us to continually move our most popular content onto SSDs, effectively using them as a faster layer built on top of our main disk based storage pool.
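To give a feel for the idea of a "hot class", here is a rough Python sketch of one way to decide which files belong on the SSD tier: take the most-played tracks first until the SSD capacity is used up. This is an illustration only and not the MogileFS code; the `Track` type, the `pick_hot_tracks` function, the key format and the numbers are all hypothetical, and this post does not describe how popularity is actually measured.

```python
from typing import NamedTuple


class Track(NamedTuple):
    key: str      # storage key for the file (hypothetical format)
    plays: int    # recent play count
    size_mb: int  # file size in megabytes


def pick_hot_tracks(tracks: list[Track], ssd_capacity_mb: int) -> list[str]:
    """Greedily choose the most-played tracks that still fit on the SSD tier."""
    hot, used = [], 0
    for track in sorted(tracks, key=lambda t: t.plays, reverse=True):
        if used + track.size_mb <= ssd_capacity_mb:
            hot.append(track.key)
            used += track.size_mb
    return hot


# Made-up catalogue for illustration.
catalogue = [
    Track("track/1", plays=9000, size_mb=5),
    Track("track/2", plays=120, size_mb=4),
    Track("track/3", plays=4500, size_mb=6),
    Track("track/4", plays=30, size_mb=3),
]
print(pick_hot_tracks(catalogue, ssd_capacity_mb=12))
```

Re-running a selection like this periodically is what would keep the SSD layer filled with whatever is currently popular.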

We also needed to change the way MogileFS treats disk load. By default, it looks at the percentage utilisation figure from iostat, and tries to send requests to the most lightly-loaded disk with the requested content. This is another assumption that breaks down when you use SSDs, as they do not suffer from the same performance degradation under load that a hard drive does; a 95% utilised SSD can still respond many times faster than a 10% utilised hard drive. So, we extended the statistics that MogileFS retrieves from iostat to also include the wait time (await) and the service time (svctm) figures, so that we have better information about device performance.
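The difference between the two approaches can be sketched in a few lines of Python. The numbers below are invented for illustration, and the class and function names are hypothetical; this post does not spell out how the extended MogileFS weighs await against svctm, so ranking by await first is simply an assumption.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    name: str
    util_pct: float  # iostat %util
    await_ms: float  # iostat await: average time a request spends queued and served
    svctm_ms: float  # iostat svctm: average service time per request


def pick_by_utilisation(devices: list[DeviceStats]) -> str:
    """Default behaviour described above: prefer the least-utilised device."""
    return min(devices, key=lambda d: d.util_pct).name


def pick_by_latency(devices: list[DeviceStats]) -> str:
    """Latency-aware alternative: prefer the device that answers fastest."""
    return min(devices, key=lambda d: (d.await_ms, d.svctm_ms)).name


# Invented figures: a lightly loaded hard drive and a busy SSD.
devices = [
    DeviceStats("hdd-1", util_pct=10.0, await_ms=12.0, svctm_ms=8.0),
    DeviceStats("ssd-1", util_pct=95.0, await_ms=0.4, svctm_ms=0.1),
]
print(pick_by_utilisation(devices))  # the hard drive wins on utilisation
print(pick_by_latency(devices))      # the SSD wins on response time
```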

Once those changes had been made, we were ready to go live. We used the same hardware as our final test configuration (SunFire X4170 with Intel X25-E SSDs), and we are now serving over 50% of our streaming from these machines, which have less than 10% of our total storage capacity. The graph below shows when we initially put these machines live.

You can see the SSD machines starting to take up load on the right of the graph – this was with a relatively small amount of initial seed content, so the offload from the main cluster was much smaller than we’ve since seen after filling the SSDs with even more popular tracks.
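Putting together the per-drive figures quoted earlier in this post gives a sense of scale. The Python below is just back-of-the-envelope arithmetic on those numbers (around 300 connections per 7200rpm SATA drive, around 7000 per SSD, four SSDs in the test machine); the function name is illustrative and the results are rough estimates, not measurements.

```python
import math

HDD_CONNECTIONS = 300   # concurrent connections per 7200rpm SATA drive
SSD_CONNECTIONS = 7000  # concurrent listeners per Intel X25-E SSD
SSDS_PER_MACHINE = 4    # SSDs fitted in the test machine


def drives_needed(listeners: int, per_drive: int) -> int:
    """Number of drives required to serve a given number of listeners."""
    return math.ceil(listeners / per_drive)


print(drives_needed(30_000, HDD_CONNECTIONS))  # hard drives for 30,000 listeners
print(SSDS_PER_MACHINE * SSD_CONNECTIONS)      # listeners from four SSDs alone
print(SSD_CONNECTIONS / HDD_CONNECTIONS)       # hard drives replaced per SSD
```

On these figures, the 30,000 connections served by one SSD machine would otherwise need about a hundred hard drives' worth of IO.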

Conclusions

We all had great fun with this project, and built a new layer into our streaming infrastructure that will make it easy to scale upwards. We’ll be feeding our MogileFS patches back to the community, so that other MogileFS users can make use of them where appropriate and improve them further. Finally, thanks go to all the people who put effort into making this possible – all of the crew at Last.HQ, particularly Jonty for all his work on extending MogileFS, and Laurie and Adrian for lots of work testing the streaming setup. Also thanks to Andy Williams and Nick Morgan at Sun Microsystems for getting us an evaluation system and answering lots of questions, and to Gareth Tucker and David Byrne at Intel for their help in getting us the SSDs in time.

Launching Xbox, Part 1 - The War Room

Monday, 7 December 2009
by Laurie Denness
filed under About Us and Tips and Tricks
Comments: 16

As many of you noticed, a few weeks ago we launched Last.fm on Xbox LIVE in the US and UK. It probably goes without saying that this project was a big operation for us, taking up a large part of the team’s time over the last few months. Now that the dust has settled, we thought we’d write a short series of blog posts about how we prepared for the launch and some of the tech changes we made to ensure that it all went smoothly.

0 Hour: Monitoring.

First up, let me introduce myself. My name is Laurie and I’ve been a Sysadmin here at Last.fm for almost two and a half years now. As well as doing the usual sysadmin tasks (turning things off and on again) I also look after our monitoring systems, including a healthy helping of Cacti, a truck of Nagios and a bucket-load of Ganglia. Some say I see mountains in graphs. Others say my graphs are in fact whales. But however you look at it, I’m a strong believer in “if it moves, graph it”.

To help with our day-to-day monitoring we use four overhead screens in our operations room, with a frontend for Cacti (CactiView) and Nagios (Naglite2) that I put together. This works great for our small room, but we wanted something altogether more impressive — and more importantly, useful — for the Xbox launch.

At Last.HQ we’re big fans of impressive launches. Not a week goes by without us watching some kind of launch, be it the Large Hadron Collider, or one of the numerous NASA space launches.

We put a plan into action late on Monday evening (the night before launch), and it quickly turned into a “How many monitors can you fit into a room” game. In the end though, being able to see as many metrics as possible became useful.

So, ladies and gentlemen…

Welcome to the war room

Every spare 24” monitor in the office, two projectors, a few PCs and an awesome projector clock for a true “war room” style display (and to indicate food time).

Put it together and this is what you get:


Coupled with a quickly thrown together Last.fm style NASA logo (courtesy of our favourite designer), we were done. And this is where we spent 22 hours on the day of the launch, staring at the graphs, maps, alerts, Twitter feeds... you name it, we had it.

It was pretty exciting to sit and watch the graphs climb higher and higher, and watch the twists and turns as entire areas of the world woke up, went to work, came back from work (or school) and went to sleep. We had conference calls with Microsoft to make sure everything was running smoothly and share the latest exciting stats. (Half a million new users signed up to Last.fm through their Xbox consoles in the first 24 hours!)

As well as the more conventional style graphs, we also had some fun putting together some live numbers to keep up to speed on things in a more real time fashion. This was a simple combination of a shell script full of wizardry to get the raw number, then piped through the Unix tools “figlet” (which makes “bubble art” from standard text) and “cowsay” (which produces an ASCII version of a cow with a speech bubble saying whatever you please).
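The wizardry script itself isn't reproduced here, but the piping part is simple enough to sketch. The Python below assumes figlet and cowsay are installed; the function names are made up for illustration, and whatever command you pass in stands in for the real number-fetching script.

```python
import shutil
import subprocess


def build_pipeline(number_command: str) -> str:
    """Return a shell pipeline that renders a command's output as a talking cow."""
    # figlet turns the text on stdin into large ASCII-art letters;
    # cowsay -n keeps figlet's line breaks instead of re-wrapping them.
    return f"{number_command} | figlet | cowsay -n"


def render(number_command: str) -> str:
    """Run the pipeline, falling back to the plain number if a tool is missing."""
    have_tools = shutil.which("figlet") and shutil.which("cowsay")
    command = build_pipeline(number_command) if have_tools else number_command
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Something like `print(render("echo 500000"))` run in a loop on a spare monitor gives roughly the effect described, with `echo` replaced by the script that fetches the live figure.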

Looking after Last.fm on a daily basis is a fun task with plenty of interesting challenges. But when you’ve spent weeks working 12-hour days and all weekend, it really pays to sit back in a room with all your co-workers (and good friends!) and watch people enjoy it. Your feedback has been overwhelming, and where would we have been without Twitter to tell us what you thought in real time?

Coming Next Time

We had to make several architectural changes to our systems to support this launch, from improved caching layers to modifying the layout of our entire network. Watch this space soon for the story of how SSDs saved Xbox…