The Adventures of a Data Team Intern

Monday, 12 December 2011
by ashley
filed under About Us

In the beginning was the word, and the word was Then I joined for three months…

In February 2011 I joined’s data team as an intern. Since I’m studying part-time through the Open University I was able to start a flexible internship here. The data team work on many of the back end systems that relies on, such as the software for the streaming servers, and the services that manage the songs in the music catalogue.

One of’s many streaming servers.

I spent my first weeks writing unit tests, trying to increase the line coverage on a few small projects to over 70%. It was great to see real Java code in action as opposed to looking at examples in exercise books – this was a chance to read lots of it. Finding ways to test it meant that I really had to understand it too. Thanks to team lead Adrian’s enthusiasm for testing, I’m left with a real appreciation of unit testing and how it allows you to confidently change code and know that it still does what it was originally designed and built to do.
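To give a flavour of what this looks like, here is a minimal sketch of the idea. The class and method names are made up for illustration and are not taken from any of the real projects. It uses a tiny hand-rolled check instead of JUnit so that it runs on its own without any libraries.

```java
// Hypothetical example, not code from the real projects.
public class DurationFormatter {

    // Formats a track length in seconds as m:ss, e.g. 125 becomes "2:05".
    static String format(int totalSeconds) {
        if (totalSeconds < 0) {
            throw new IllegalArgumentException("negative duration: " + totalSeconds);
        }
        int minutes = totalSeconds / 60;
        int seconds = totalSeconds % 60;
        // Pad single-digit seconds with a leading zero.
        return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
    }

    // Stand-in for JUnit's assertEquals, so the sketch has no dependencies.
    static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Each check pins down one piece of behaviour, including the edge cases.
        check("0:00", format(0));
        check("0:59", format(59));
        check("2:05", format(125));
        check("61:01", format(3661));
        System.out.println("all checks passed");
    }
}
```

If someone later rewrites `format` in a cleverer way, re-running the checks shows straight away whether it still does what it was originally built to do.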

With the help of the other members of the team I have been able to improve my coding style, naming conventions and object-interaction design, as well as briefly dabbling with other things such as Hive, JDBC, Spring and concurrency, but not at the same time (snigger!)

Great Expectations

operates a distributed storage cluster built on Hadoop, which consists of over 60 nodes, each with terabytes of data storage and gigabytes of memory. It is constantly being upgraded and having new storage space added – this is necessary since it stores over 300GB of extra data every single day. I was invited to help with a hard-disk upgrade on 10 of the nodes, fulfilling a lifelong dream of mine to visit a data centre. I took a trip to London’s luxurious Docklands with 80TB worth of 2TB hard-disks in the back of a taxi.

Hadoop cluster with 10 nodes offline, hard-disks unclipped and waiting to be removed.

I had many expectations of what it would be like at the data centre, and was eagerly anticipating the millions of pounds’ worth of equipment I might see. However, the reality of the environment in the data centre was upsetting to my nervous system. For several hours I endured a droning in my ears akin to a jet engine, whilst simultaneously being frozen when walking down one aisle (the cold aisle, where the servers take in cold air that is pumped through the floor), and then being boiled whilst walking down the next (where the servers kick out all the hot air). I now know why I was advised to wear “hot pants and a warm jacket”. A lot of walking back and forth between aisles to regulate my own body temperature was required!

The Hadoop cluster’s old hard-disks awaiting removal from their caddies… and hundreds of screws that keep them in place! These hard-disks were recycled by adding extra capacity to the streaming servers.

After this shock to the senses I was left with a noticeable impression of how much time and effort was put into fail-over equipment and practices to ensure that’s systems stay running come rain or shine or diesel fumes. All hardware is planned and installed with redundancy, and file-systems are designed so that important data is replicated. For example, the streaming servers’ tracks are replicated at least three times and most systems are mirrored in all three of’s data centres.

What you can expect to get from an internship at

- An environment where people care about doing things right the first time around.

- A place where redundancy matters and where people worry about things going wrong, even when they are going right.

- An agile development process with sprint planning.

- Guidance on naming conventions, coding style and object-interaction.

- A purely open-source development environment – the only experience with Microsoft I had was the logo on my keyboard.

- A relaxed atmosphere where people are free to wear and say whatever they want, and where meetings spontaneously burst out anywhere and at any time around you.

- Opportunities to dip your toes into new technologies such as Hadoop.

- A chance to be surrounded by decades of Java experience, where people can advise you on how to do things right – and kindly point you to an API page when you try to get them to do it for you :).

- The ability to work with software development tools such as Subversion, Maven, JUnit and Cobertura.

- An insight into building and developing code that is robust, dependable and only released when it has gone through multiple stages of thorough testing.

One small step for a member of the data team, one giant leap for an intern.

During my time at I was fortunate enough to observe a real programming team in its day-to-day activities, and also had a chance to improve my own programming skills by solving real problems.

Internships can be a good way to increase job prospects and I’m grateful to be able to put the name ‘’ on my CV. If I had instead worked at ‘QualTekXYZmobile’ then perhaps potential employers wouldn’t have heard of it – or worse, would think I’d invented it. Who knows… maybe your interviewer was listening to before you walked into your interview!

