Tales of a data team intern

Friday, 26 June 2009
by fredrik
filed under About Us
Comments: 13

For the last six months, I have been a part of Last.fm’s data team while writing my master’s thesis together with Per – and lived to tell the tale.

Presently, I’m sitting in a quiet room where the heat makes more noise than do the people in it. See, music is a big thing here and the hackers on my team mostly prefer to code with their ears hugged by earphones. It wasn’t always this hot, though.

Back in January, Per and I landed in a chilly London where the people wore hats and greeted one another in charming ways. We were on a mission: write a data store that serves as back-end to an in-house visualization tool that renders graphs out of interesting numbers: scrobbles per king of pop, subscription signups broken down by country, that sort of thing.

For someone who digs distributed systems and Big Data, Last.fm is heaven. There is a sizable Hadoop cluster and many more terabytes of data than one can comfortably fathom. At the end of our tenure – and this is the best part – we were to release the code as open source. Sure enough, we had landed our dream gig.

The offices have got that typical Shoreditch media/tech post-startup vibe going: there are copies of The Economist and Wired in the foyer, a flipped-over skateboard lies next to the umbrella rack. The company occupies a whole floor which is split into two sides: the bizniz lot occupies one end (this is where the fax machine is) and the dev teams & ops (nerf guns at the ready) rule the other end. Working hours are generally between ten and seven, so coming in at eight o’clock sharp on your first day is definitely advised against. Not that anyone would.

As those who went before me have noted, Last.fm is a rather awesome place to work. There are some seriously brilliant heads here. Not only is the staff enthusiastic and contagiously dedicated, they also take great pride in their work, and that’s key to producing quality stuff. With passion it’s true.

Per and I were given an insane amount of freedom in implementing our data store. While we did get all the help we needed, both from our closest collaborators and from anyone else around the office whom we harassed with questions, all decisions regarding the project and its execution were ours to make. At the end of the day, we were ourselves fully responsible for our own fortunes and I think it is only from that kind of freedom and trust that truly brilliant things can come. And how did we fare then? Quite well, thank you. Zohmg is out there, and although it may not change the world just this year, it might make the life of a data analyst or two a tad easier.

All in all it has been a killer internship experience: I got to present at HUGUK, became involved with HBase and met people who have instilled inspiration in me that will last a long time. I have realized the benefits of working alongside incredibly passionate fellows who are committed to perfecting their trade. It will be hard to settle for anything less in the future.

Comments

  1. Luke
    26 June, 20:38

    Nice hours. Wish mine could be like that!

    ..RIP MJ..

  2. Jocke
    27 June, 06:59

    Oh, you’re finished? Congratz. I’m looking forward to your presentation.

  3. Andreas
    27 June, 12:18

    What’s so innovative about Zohmg? Creating data cubes with dimensions, hierarchies and measurements is standard work in business intelligence, so nothing new here for me to see. Did you just invent the wheel a second time?

  4. Erik Frey
    28 June, 00:24

    Andreas, Zohmg sits on a platform far more scalable (and cheaper) than the business intelligence stuff of yore, to handle a scale of data orders of magnitude greater. You might say they invented a way, waaaaay bigger wheel.

    Nice work, guys!

  5. Fredrik Möllerstrand
    30 June, 14:55

    Andreas: As Erik points out, our contribution is a distributed implementation of classic data cube stores. In particular, Zohmg is based on the Bigtable-like HBase which is a rather exciting technology in its own right.

  6. YuanBaby
    3 July, 05:21

    i love this post ,Tales of a data team intern

  7. tnelx
    12 July, 11:26

    last.fm should work with spotify to get their recommendation artist thing better.

  8. ismail yk
    14 July, 12:37

    Michael Jackson’s songs will play everytime.
    He can break to music lists or datas.

  9. dofus kamas
    16 July, 09:51

    great post, impressive

    Thanks for your sharing!!

  10. Alastair
    17 July, 13:42

    It’s a shame nobody at lastfm has the technical savvy to get the scrobbling feature working properly for ipods, and an even bigger shame that the admins don’t pay a single scrap of attention to the thousands of users constantly reporting bugs and faults with scrobbling in the forums…

  11. ladylostandblue
    19 July, 18:09

    you really should write books you have a special style

  12. Maik
    6 August, 13:32

    I agree with ladylostandblue. Good writing style. :)
    But one question: Why you chose Python?
    Would not something like C/C++ be much faster on such a big amount of data?

    Maik

  13. seslichat
    6 August, 19:02

    Thank You Veri Mach


Comments are closed for this entry.