What's cooking in the Last.fm playlisting lab

Thursday, 27 September 2012
by Mark Levy
filed under About Us
Comments: 47

In the Music Information Retrieval team here at Last.fm we’re currently developing a new generation of smart playlisting engines, and we’d like to take the chance to give you a sneak preview of what they can do, as well as explaining a bit more about playlisting services in general.

You can think of playlisting engines as falling into two categories: one repeatedly chooses which track to stream next when you’re listening to an internet radio station like any of Last.fm’s radio stations; the other selects a single set of tracks from a collection all in one go, like iTunes Genius or Google Music’s Instant Mix. While in theory these do similar jobs, as every good scientist knows, the difference between theory and practice is greater in practice than it is in theory, and in practice the requirements for these two types of playlists can be very different. Our new generation service is designed to provide instant playlists from collections of any size, and you can try a demo right now, or read on to find out more.

Last.fm instant playlisting

We’ll talk a bit more about radio playlisting in a separate post, but one of the main characteristics required from the other type of engine is the ability to choose from music collections of wildly varying sizes. Our existing engines have mostly been targeted at very large commercial catalogues containing millions of recordings – you can see them at work in the Last.fm Spotify app (start playing any track, go to the Now Playing tab in the app and click “Similar Tracks Playlist”).

The new generation of engines is designed to continue to do a really good job when choosing tracks from small personal collections. In practice that means we can’t rely on any single type of information to tell us which other tracks might be a good match for any particular playlist. Luckily, thanks to your scrobbles and tags, and a bit of audio analysis and machine learning magic on our side, we have three independent types of information linking artists and tracks. Another new feature is the ability to generate playlists based on mood and other musical properties. Finally, when playlisting from personal collections we’ve been able to experiment with ways of choosing the sequence of tracks that aren’t restricted by licensing rules.

But we know we still have a huge amount to learn before any machine can approach the skill of a human DJ, so we’ve built a simple demo to let you try out the services. Please let us know how you think we’re doing and we’ll incorporate your feedback into our final version of the new engines. Thanks for listening!

Genre Timelines and More Distinctive Lyrics

Thursday, 6 September 2012
by Janni Kovacs
filed under About Us
Comments: 10

For the past five months I have had the honour of being the next data team intern at Last.fm, building software and trying to make sense of what people now call Big Data™. In particular during my time here I looked at biographical data for artists, i.e. the place and the year a band was formed. This data is generated by Last.fm’s users and attached to artists’ wiki pages (see the factbox on the right of the page). There’s a nice number of artists where this type of data is available, so I was wondering what kind of analyses we could do with it.

When did this genre take off?

One thing that I was looking for in the data was empirical evidence of when certain genres became popular. Since we have a massive amount of user tag data available we can easily correlate tags and years and measure “popularity” of a genre by counting the number of artists formed in a specific year. Even with this data being skewed a bit towards the more popular artists, you can definitely see spikes of popularity for certain genres where you’d expect them:

Click for a larger version

Props to our users getting punk and post-punk in the right order!
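For the curious, the counting behind a chart like this can be sketched in a few lines of Python. This is only a minimal illustration of the idea described above (count the artists carrying a tag, grouped by the year they formed), not the code we actually ran; the sample records and the function names `genre_timeline` and `peak_year` are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (artist, year formed, tags).
# Invented for illustration; this is not real Last.fm data.
ARTISTS = [
    ("Band A", 1976, ["punk"]),
    ("Band B", 1977, ["punk"]),
    ("Band C", 1977, ["punk", "post-punk"]),
    ("Band D", 1979, ["post-punk"]),
    ("Band E", 1980, ["post-punk"]),
    ("Band F", 1980, ["post-punk"]),
]


def genre_timeline(artists):
    """Return {tag: Counter({year: number of artists formed that year})}."""
    timeline = defaultdict(Counter)
    for _name, year, tags in artists:
        for tag in tags:
            timeline[tag][year] += 1
    return timeline


def peak_year(timeline, tag):
    """Year in which the most artists carrying `tag` were formed."""
    return timeline[tag].most_common(1)[0][0]


timeline = genre_timeline(ARTISTS)
for tag in sorted(timeline):
    # Print the per-year counts and the peak year for each tag.
    print(tag, sorted(timeline[tag].items()), "peak:", peak_year(timeline, tag))
```

Plotting each tag's counts against the year gives a timeline of the kind shown in the charts.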

If you’re a fan of metal music maybe the following chart, showing the progression of metal subgenres from hard rock to death metal, will be of interest:

Click for a larger version

Distinctive lyrics for cities

Andrew did a fantastic job a while ago generating distinctive lyrics for certain genres. I was wondering if we could generate distinctive lyrics for cities as well. By taking about 75,000 song lyrics, matching them to artists’ location metadata from our wikis and applying a simple term frequency function to each word, we can generate a list of words that occur in some cities more often than in others. Please take these results with a grain of salt as they are skewed by several factors, especially towards the more popular artists:

Click to open full images in a new window.

Warning: they contain lyrics you may find offensive. Not safe for work.

London

Atlanta
 

Los Angeles

New York

Seattle

I really like that “sorry” is in London’s top 10
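If you want to play with the idea yourself, here is a rough sketch of one way to rank words by how distinctive they are for a city. The scoring is an assumption chosen for the example (a word's share of the city's lyrics, weighted by the log of how over-represented it is compared with all cities together) and not necessarily the exact term frequency function used for the images above. The toy lyrics and the name `distinctive_words` are invented.

```python
import math
from collections import Counter

# Hypothetical toy lyrics grouped by city; invented for illustration.
LYRICS_BY_CITY = {
    "London": ["sorry sorry love rain", "sorry love rain tea"],
    "Seattle": ["rain love coffee", "coffee coffee love"],
}


def distinctive_words(lyrics_by_city, top_n=3):
    """Rank each city's words by tf_city * log(tf_city / tf_overall)."""
    # Word counts per city, using naive whitespace tokenisation.
    city_counts = {
        city: Counter(word for song in songs for word in song.lower().split())
        for city, songs in lyrics_by_city.items()
    }
    # Word counts over all cities combined.
    overall = Counter()
    for counts in city_counts.values():
        overall.update(counts)
    overall_total = sum(overall.values())

    result = {}
    for city, counts in city_counts.items():
        city_total = sum(counts.values())
        scores = {}
        for word, count in counts.items():
            tf_city = count / city_total
            tf_overall = overall[word] / overall_total
            scores[word] = tf_city * math.log(tf_city / tf_overall)
        # Highest score first; alphabetical order breaks exact ties.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[city] = ranked[:top_n]
    return result


print(distinctive_words(LYRICS_BY_CITY, top_n=2))
```

Words that are common everywhere (like "love" in the toy data) score low for every city, while words concentrated in one city float to the top of that city's list.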

In internships you’ll often find that you’re given pointless work just to occupy yourself. This is not the case at Last.fm. You’ll be able to work on in-production code and be given plenty of time to do things on your own, whatever interests you. So even though the ball pit is no more (turns out they have to be cleaned once in a while), if you enjoy working on backend software and exploring immense data sets then this is the right place to do it.

Last.fc win the World Cup (almost)!

Thursday, 21 June 2012
by Michael Coffey
filed under About Us and Lunch Table
Comments: 16

Last weekend we decided to take a break from counting your scrobbles and spend our time playing a bit of football instead. It was a chance to swap our football table for a football pitch and take on a few other music-related entities at the 6th annual Big Scary Monsters 5-a-side tournament. I was suffering from a hurty knee, but went along to inspire the team Coach Taylor style. “CLEAR EYES, FULL HEARTS!”

Top L>R: Ben Spittle, Michael Horan, Michael Coffey, Sven Over, Paul Blunden

Bottom L>R: Dan Sleath, Matt Clark, Dom Amodio, Nick Calafato

There were 24 teams competing, first in a group then a knockout stage. Last year we went out in the group stage, but felt we’d had a tough group and were eager to prove we could do better this time. However, not even me shouting “man on”, “down the line”, or “well played, that’s liquid” could stop us losing our first game to a fantastic Abeano 7-0. Not a great start, but we followed it with a 2-2 draw against Fanzine about Rocking and then two wins against The Xcerts and Hassle, both 2-1, mainly thanks to the amazing Dan “The Cat” Sleath in goal. Our final group game was against our old friends Drowned in Sound. We’d lost to them at last year’s tournament and were outplayed in a friendly we’d arranged in the meantime, but neither team could break the deadlock and the game ended goalless.

Coach Coffey was an inspiration to his team.

This all meant we were through to the knockout stage where in round 1 we were up against Punktastic, who we’d heard were “pretty handy”. Both teams were tired, but another two late goals from Last.fm meant we were through to play Tall Ships in the quarter-finals. It was another tough game, but a clean sheet and a last minute goal amazingly put us through to the semi-finals, something none of us really believed was possible at the start of the day.

Last.fc warming up on Wembley’s doorstep.

We were on a high, dreaming of an open top bus tour of the “Shoreditch Triangle” upon our return, but next up were last year’s champions, Old Blue Last. It was clear straight away that these guys had played football before and not even our star goalkeeper could stop the onslaught. The dream was over, but we felt the 4-0 scoreline was respectable against a team of such quality.

Finally, Old Blue Last beat Abeano 2-0 in a rather exciting and closely fought final. We took solace in the fact these were the only teams we’d lost against, claimed third place having beaten the other semi-final loser, and went home looking forward to next year after a great time was had by all.

An update on Last.fm Password Security

Friday, 8 June 2012
by Matthew Hawn
filed under About Us and Announcements
Comments: 63

Hello from Last.fm HQ,

Earlier this week, Last.fm received an email that let us know a text file containing cryptographic strings for passwords (known as “hashes”) that might be connected to Last.fm had been posted to a password cracking forum. We immediately checked the file against our user database, and while this review continues, we felt it was important enough to act on.

We immediately implemented a number of key security changes around user data and we chose to be cautious and alert Last.fm users. We recommend that users change their password on Last.fm and on any other sites that use a similar password. All the updated passwords since yesterday afternoon have been secured with a more rigorous method for user data storage.

To reach as many users as quickly as possible, we are sending these alerts via social media, direct email and on the Last.fm site itself.

We take the security of our users very seriously, and going forward we want you to know we’re redoubling our efforts to protect our users’ data.

Thanks for your support,

The Last.fm Team

The Adventures of a Data Team Intern

Monday, 12 December 2011
by Ashley Diamond
filed under About Us
Comments: 0

In the beginning was the word, and the word was Last.fm. Then I joined for three months…

In February 2011 I joined Last.fm’s data team as an intern. Since I’m studying part-time through the Open University I was able to start a flexible internship here. The data team work on many of the back end systems that Last.fm relies on, such as the software for the streaming servers, and the services that manage the songs in the music catalogue.

One of Last.fm’s many streaming servers.

I spent my first weeks writing unit tests, trying to increase the line coverage on a few small projects to over 70%. It was great to see real Java code in action as opposed to looking at examples in exercise books – this was a chance to read lots of it. Finding ways to test it meant that I really had to understand it too. Thanks to team lead Adrian’s enthusiasm for testing, I’m left with a real appreciation of unit testing and how it allows you to confidently change code and know that it still does what it was originally designed and built to do.

With the help of the other members on the team I have been able to improve my coding style, naming conventions and object-interaction design as well as briefly dabbling with other things such as Hive, JDBC, Spring and concurrency; but not at the same time (snigger!)

Great Expectations

Last.fm operates a distributed storage structure named Hadoop which consists of over 60 nodes, each with terabytes of data storage and gigabytes of memory. It is constantly being upgraded and new storage space added – this is necessary since it stores over 300GB of extra data every single day. I was invited to help with a hard-disk upgrade on 10 of the nodes, fulfilling a lifelong dream of mine to visit a data centre. I took a trip to London’s luxurious docklands with 80TB of 2TB hard-disks in the back of a taxi.

Hadoop cluster with 10 nodes offline, hard-disks unclipped and waiting to be removed.

I had many expectations of what it would be like at the data centre, and was eagerly anticipating the millions of pounds worth of equipment I might see. However, the reality of the environment in the data centre was upsetting to my nervous system. For several hours I endured a droning in my ears akin to a jet engine, whilst simultaneously being frozen when walking down one aisle (the cold aisle where the servers take in cold air that is pumped through the floor), and then being boiled whilst walking down the next (where the servers kick out all the hot air). I now know why I was advised to wear “hot pants and a warm jacket”. A lot of walking back and forth between aisles to self-regulate my own body temperature was required!

The Hadoop cluster’s old hard-disks awaiting removal from their caddies… and hundreds of screws that keep them in place! These hard-disks were recycled by adding extra capacity to the streaming servers.

After this shock to the senses I was left with a noticeable impression of how much time and effort was put into fail-over equipment and practices to ensure that Last.fm’s systems stay running come rain or shine or diesel fumes. All hardware is planned and installed with redundancy, and file-systems are designed so that important data is replicated. For example, the streaming servers’ tracks are replicated at least three times and most systems are mirrored in all three of Last.fm’s data centres.

What you can expect to get from an internship at Last.fm

- An environment where people care about doing things right; the first time around.

- A place where redundancy matters and where people worry about things going wrong; even when they are going right.

- An agile development process with sprint planning.

- Guidance on naming conventions, coding style and object-interaction.

- A purely open-source development environment – the only experience with Microsoft I had was the logo on my keyboard.

- A relaxed atmosphere where people are free to wear and say whatever they want, and where meetings spontaneously burst out anywhere and at any time around you.

- Opportunities to dip your toes into new technologies such as Hadoop.

- A chance to be surrounded by decades of Java experience, where people can advise you on how to do things right – and kindly point you to an API page when you try and get them to do it for you :).

- The ability to work with software development tools such as: Subversion, Maven, JUnit and Cobertura.

- An insight into building and developing code that is robust, dependable and only released when it has gone through multiple stages of thorough testing.

One small step for a member of the data team, one giant leap for an intern.

During my time at Last.fm I was fortunate enough to observe a real programming team in its day to day activities and also had a chance to improve my own programming skills by solving real problems.

Internships can be a good way to increase job prospects and I’m grateful to be able to put the name ‘Last.fm’ on my CV. If I had instead worked at ‘QualTekXYZmobile’ then perhaps potential employers wouldn’t have heard of it – or worse, would think I’d invented it. Who knows… maybe your interviewer was listening to Last.fm before you walked into your interview!

Library and streaming services outage

Monday, 18 July 2011
by Colin M. Strickland
filed under Announcements and About Us
Comments: 291

Since 04:00 GMT on Sunday morning, the primary Radb service has been exhibiting intermittent problems meeting acceptable service levels. This means that libraries, scrobble counts and the services associated with them (stats, radio stations etc) appear to be broken when you use Last.fm.

We’re working very hard to fix this as soon as possible, and we’ve had engineers on it throughout yesterday and last night, but we wanted to keep you posted here with what’s happened and what we know so far.


UPDATE (19/07/11 11:29):
We’ve re-enabled access to your library and chart service after analysing this morning’s traffic in a little more depth.

As we continue working on fixing the underlying faults it may be that we have to switch them off again for short periods of time. In the meantime keep checking the status page for information about the service.

Thanks again for your patience.


All of our services here at Last.fm are built around keeping track of your accumulated music plays – your scrobbles – and using statistical methods on the historical data to build awesome things.

Because scrobbles power everything dynamic and wonderful on the site, we need fast realtime access to the scrobbles and associated summary data (library, charts, neighbourhoods, etc.). And with millions of people scrobbling, and the size of the historical scrobbles set, doing this fast in realtime, while updating the same sets with new plays as fast as can be managed, is a significant challenge. We have a custom, in-memory database service that we designed and implemented to address these needs. It is called Radb. It is usually pretty awesome at answering these sorts of questions in a predictably constant time.

But since 4am on Sunday Radb has been failing. The reasons for this are unclear. Failing over to the redundant service providers helps a little, but not enough. The reasons for this are also unclear. We have engineers working flat out to diagnose this problem.

To remove as much stress as we can from the database service layer, we have additionally disabled most of our radio, library and recommendation services, temporarily. We have also stopped accepting new scrobbles for the time being, for a similar reason. Scrobbling clients will detect this state and will cache your scrobbles until the submission service is reactivated, so no need to panic about scrobbles being lost.
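To give a feel for what that caching means, here is a minimal sketch of the general idea: keep plays in a local queue and only drop them once the service has accepted them. It is an illustration under assumptions, not the code in any real Last.fm client; the class `ScrobbleQueue`, the `submit` callable and the fake service below are all hypothetical.

```python
from collections import deque


class ScrobbleQueue:
    """Holds plays locally and submits them once the service accepts them."""

    def __init__(self, submit):
        # `submit` is a hypothetical callable that returns True if the service
        # accepted the scrobble and False if submissions are disabled.
        self._submit = submit
        self._pending = deque()

    def scrobble(self, track):
        """Cache the play first, then try to send everything cached so far."""
        self._pending.append(track)
        self.flush()

    def flush(self):
        """Send cached scrobbles oldest first; stop at the first refusal."""
        sent = 0
        while self._pending:
            if not self._submit(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return list(self._pending)


# A fake submission service that can be switched on and off.
service = {"up": False}
accepted = []


def fake_submit(track):
    if service["up"]:
        accepted.append(track)
    return service["up"]


queue = ScrobbleQueue(fake_submit)
queue.scrobble("Track 1")
queue.scrobble("Track 2")
print("cached while service is down:", queue.pending)

service["up"] = True
queue.flush()
print("accepted after reactivation:", accepted, "still cached:", queue.pending)
```

Because a play is only removed from the queue after a successful submission, nothing is lost while the service is switched off, and the plays arrive in their original order once it comes back.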

Keep an eye on the status page, and the appropriate forum posts. We’ll update with more information and estimates as we have them.

The Power of Sound

Wednesday, 29 June 2011
by Chrissie Hines
filed under Announcements and About Us
Comments: 2

A live music experience is such a powerful one, with each person taking moments from a gig that they will enjoy and remember for a long time to come. Some of us go back to see a favourite band or artist time and time again and somehow there is something special and different about each performance.

In the commercial team, which is where I am based, we really enjoy the challenge of pulling some of that spirit into work for relevant and interesting brands. A perfect opportunity to do this arrived with HP who are looking to explore the Power of Sound to help promote their new range of Envy laptops and the sound quality from the incorporated Beats Technology.

So, we’re blending the power of the Hype Charts and our expertise in the live arena to pull together some really special events over the next few months, and we want you to join us.

First up we’ve been on the road chatting to artists at our summer festival shows, asking them about The Power of Sound. The interviews from Liverpool Sound City can be seen here and feature artists as varied as Frank Turner, Akala and Willie Nelson. Each has their own take on the concept, and it’s great to hear each artist talk about it in their own unique way. I can’t choose a favourite from this batch but I am still amazed that the Dutch Uncles reference Bon Iver, J Dilla, Frank Zappa and Biggie Smalls all in one video!

The interviews from Get Loaded will be ready in the next few days and we’ll be at Sonisphere and SW4 amongst others for more. Keep up to date on new interviews in the radio player (UK only for these I’m afraid!) on our website and on HPUK’s Facebook page.

Next up we’ve got three pop-up acoustic sets, all set up with the help of Black Cab Sessions. Our first was with Slow Club in Soho Square, and they were great. They powered on through the rain to sing a couple of songs, including a new one from forthcoming album Paradise. We’ve got pictures up on our Flickr page, and you can find footage from the set and an exclusive interview here.

All this is working towards a main gig at the end of summer… but we’re keeping details about that one secret for now. What I can say is that a team of passionate people are working on getting a great line-up as I type, and we are all set to make it a fantastic event.

As a little bonus, all of the artists that are taking part in the project will help curate an HP Power of Sound Custom Radio station, which will be ready to launch in a few days. If you want to give some input into the content of the station again please head to HP’s FB page.

And last but not least we are delighted to let you all know that we are releasing a Last.fm app for the HP Touchpad which we are sure will get a whole lot more people scrobbling!

Huge thanks to everyone involved in pulling these off. Make sure you keep an eye on our Twitter page for info about the live sets throughout the next few months, and for footage from the sessions!

Wishing you a great summer.

Last.fc Take To The Field

Tuesday, 21 June 2011
by Nick Calafato
filed under About Us
Comments: 8

The fine people at Big Scary Monsters kept up with summer tradition by lovingly putting together the 5th annual BSM 5-a-side Football Tournament – a fun-filled day featuring various music and tech folk attempting to prove their muscle on the football pitch.

The tournament provided the perfect opportunity to show what a very green Last.fc can do away from the comfortable confines of our computers and out on the wide open playing field.

As the torrential rain cleared and the sun peered through, spirits prior to our opening game against the burly Disc Manufacturing Services were high:

It was a rocky start as we conceded a few goals early on – despite Lumberjack’s best attempts at the long range screamers he’d probably been studying on YouTube prior to kick-off:

The game ended Last.fc 4-7 Disc Manufacturing Services. It was a blow – but we were not to be put down too soon as we brushed aside Rosa Valle in our second group game 5-1 with goals from the quick feet of Pbad, the power of eartle, the flamboyance of Daniel1986 and the beard of yours truly. And of course a world class penalty save from the safe hands of Lumberjack – documented on video here.

Our last opponents in the group stage were our friends in Drowned In Sound. All we needed was a draw to qualify for the Last 16, but with our energy levels hitting red we perished to two strong goals and, with no reply, the game ended Last.fc 0-2 Drowned In Sound.

An early exit didn’t stop us enjoying a few post-match bevvies and an all important Last.fc team shot. Rest assured – we will be back next year fitter and stronger than ever!

Top L>R: Lumberjack, eartle, good_bone, darkspark88, y0b1tch

Bottom L>R: nedflanders1979, Daniel1986, Omar711, Pbad

(Congratulations to the Old Blue Last who won the tournament beating Disc Manufacturing Services 3-0 in the final)

If it doesn't Scrabble, it doesn't count.

Tuesday, 14 June 2011
by Matthew Hawn
filed under Stuff Other People Made and About Us
Comments: 11

“I’m in the lab all day, I Scrabble all night
I got a Bedazzler so my outfit’s tight
When it comes to panache I can’t be beat
I got the most style from below 14th street”

- Beastie Boys, “Shazam!”

About a month ago a press release appeared on the HarperCollins website noting that some new terms “from the digital world” were going to be added to the official Scrabble dictionary. Those words included: wiki, fansite, webzine, darknet, and best of all… scrobble.

Not being a habitual reader of lexicographical press releases, I missed this at Last.fm HQ until Dan in our sales team mentioned it in a note he was sending out to some people we work with. This little factoid got me more excited than when Matt, our data griot, decided to livetweet his first listen to the recent Lady Gaga album. You see, it is my firm belief that Scrabble and the music tech world have a lot in common. Hear me out.

Scrabble, like a lot of music, has an annoyingly complicated copyright history: Hasbro owns the rights to the game in the US and Canada, Mattel owns the rights in the rest of the world, and Electronic Arts own the rights to digital versions. It’s amazing you don’t need to be a lawyer to play the game.

Which explains what happened in 2008 when a pair of brothers created an online version of Scrabble (Scrabulous) that worked really well on Facebook. More than half a million new fans of the game were born, but the first response from Scrabble’s corporate masters was to shut it down and sue the pants off the two brothers who made it, rather than figure out how to work with them to bring the game to more people. Which is pretty much how the music industry has worked for the last decade.

But it’s not just lawsuits that Scrabble and music have in common:

- Stephen Malkmus and the guys from Pavement were well known for their Scrabble games on tour. Courtney Love was famous for wanting to play against Stephen and he was equally famous for beating her in tile-to-tile combat.

- Elvis Costello likes to call himself the “rock and roll Scrabble champion.” And this Etsy shop will sell you a Scrabble tile pendant with Declan MacManus’s face on it so you can show your allegiance.

- The Beastie Boys are Scrabble junkies, and Ad-Rock even goes looking for competition on the road, dropping in at local Scrabble club events.

- There are dozens of songs with Scrabble in their titles on Last.fm, including an excellent one about a Scrabble date gone bad from Milky Wimpshake.

The only bad thing about ‘scrobble’ making it into Scrabble is that it’s pretty tough to pull off in a real game; it’s eight letters and you’ll need the only two Bs in the bag. But it’s worth it. ‘Scrobble’ has three 3-point letters in it and it’s worth 14 points on its own… much more if you can hit a multiple-word-score or squeeze it into an existing cluster.
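If you don’t trust our arithmetic, here is a quick check in Python. It assumes the standard English-language tile values and ignores premium squares and the bonus for using all your tiles; the function name `word_score` is just for this example.

```python
# Standard English-language Scrabble tile values, grouped by points.
TILE_GROUPS = {
    1: "AEILNORSTU",
    2: "DG",
    3: "BCMP",
    4: "FHVWY",
    5: "K",
    8: "JX",
    10: "QZ",
}

# Flatten the groups into a letter -> points lookup table.
LETTER_POINTS = {
    letter: points
    for points, letters in TILE_GROUPS.items()
    for letter in letters
}


def word_score(word):
    """Face value of a word: the sum of its tile values, no board bonuses."""
    return sum(LETTER_POINTS[letter] for letter in word.upper())


word = "scrobble"
three_pointers = [c for c in word.upper() if LETTER_POINTS[c] == 3]
print(word, "scores", word_score(word), "with 3-point letters", three_pointers)
```

The three 3-point letters are the C and the two Bs; the other five letters are worth one point each.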

Our motto is the same in Scrabble as it is for music: Make every play count.


Scrabble cat, a sometimes visitor to Last.fm HQ

Berlin Buzzwords 2011

Monday, 13 June 2011
by Andrew Clegg
filed under About Us
Comments: 5

Last week, Gilda Maurice (from our data team), Steve Whilton (from our product team) and I went to Berlin for a couple of days. While Steve hurtled round Berlin from meeting to meeting (rather him than me), Gilda and I headed over to the Berlin Buzzwords conference at the Urania conference centre, just south of the famous Tiergarten.

It’s an annual meetup for engineers, scientists and other assorted hackers in the field of ‘big data’. The problems of processing and analysing the amount of data generated on the social web have required a whole new set of approaches, and we’re very keen on keeping up with new developments in this area, especially if they can help us make Last.fm better.

Two of the main open-source data tools we rely on at Last.HQ are Hadoop, a framework for parallel storage and querying of data on a cluster of servers, and Solr, a search engine based on the Lucene toolkit. Solr drives the search functions on the web site, and Hadoop does much of the behind-the-scenes number crunching, such as generating the weekly charts and calculating artist similarities. Lucene and Hadoop have both been very influential in this field, so it was fitting that the conference opened with a keynote from Doug Cutting who originated both projects.

In fact, Doug Cutting’s intro set the tone pretty well — probably half the talks were on Lucene or Hadoop, or other technologies that build on them. We learnt how to tune Solr performance and measure its relevance, how to improve its accuracy with a dash of linguistics, and how to visualize the topics within a given set of search results. Facebook and StumbleUpon presented their experiences of HBase, a Hadoop-backed database for storing and querying massive quantities of user data and content in real time, and JTeam took us through Mahout, a machine-learning toolkit for clustering and classification tasks, also based on Hadoop. A few of the talks went further into computer science theory, but always with a view to producing high-volume applications ready for web-scale data.

It’s hard to pick favourites out of such a dense line-up, but we particularly liked Joseph Turian’s talk on new data-mining techniques (semantic hashing, graph parallelism and unsupervised semantic parsing), and Stanislaw Osinski’s session on clustering and visualizing Solr search results with Carrot2, accompanied by a beautiful demo. Mark Miller and Rod Cope gave some sound advice on scaling Solr and HBase, and Chris Wensel took us through designing algorithms to manipulate and extract data from Hadoop.

Sadly there was no way we could catch all the talks we wanted to see, with three rooms running in parallel each day, but thankfully all the talks were filmed — the slides are available here (apart from a few which are yet to appear), and the organisers will be making all the videos available soon.


Steve and Andy on a Berlin rooftop. Photo by Gilda.