Audio Fingerprinting for Clean Metadata

Wednesday, 29 August 2007
by rj
filed under Announcements

[UPDATE]
Phew, so we’ve received more than 1 million fingerprints so far… not bad for the first 24 hours. The most fingerprints submitted by a single user is 12,203. I’m sure that record won’t stand for long tho :)
The ‘server overloaded’ message should be silenced now. We are currently receiving ~42 fingerprints per second.
More news to follow tomorrow.
[/UPDATE]

The veteran Scrobblers amongst you will probably remember our “moderation system” – this was a user-voting system that let you propose and merge artists, ultimately fixing misspelled artists by creating aliases to the correct version.

We are planning to bring this back in a big way, addressing not only artists, but albums and tracks too.

We don’t want to have to vote on the really obvious stuff (“01 – Radiohead”), so we are going to do as much as possible automatically, with various algorithms and data mining tricks. Anything we can’t be 100% sure about will again be thrown open to a public vote.
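To give a feel for what “really obvious” could mean, here is a minimal sketch of one such rule: stripping a leading track number from an artist name. This is only an illustration, not the actual moderation logic; the function name `strip_track_number` and the regular expression are made up for this example.

```python
import re

# Hypothetical rule, not the real moderation code: a leading track number
# followed by a separator, e.g. "01 – Radiohead" or "7. Radiohead".
_LEADING_TRACK_NO = re.compile(r"^\s*\d{1,3}\s*[-–._)]\s*")


def strip_track_number(name: str) -> str:
    """Remove a leading track number from a tag value.

    If stripping would leave nothing, the trimmed input is returned instead.
    """
    cleaned = _LEADING_TRACK_NO.sub("", name, count=1).strip()
    return cleaned or name.strip()


print(strip_track_number("01 – Radiohead"))
print(strip_track_number("Radiohead"))
```

A separator is required after the digits so that names which are themselves numbers (“311”) are left alone. A rule this blunt will still get some real names wrong, which is exactly why the uncertain cases go to a vote.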

Phase 1 is now underway with the first public “beta” release of our new fingerprinting technology. This will mature into a nice sexy (free) API that lets you grab clean metadata based on an audio fingerprint. For now, all it does is send fingerprint data to our moderation system, so we can get the ball rolling. It doesn’t change any MP3 files on your computer. If you have a big MP3 collection, it will take a while… Thankfully it remembers where it got to, so you don’t have to do it all in one session.

Grab the fingerprinting app and let it scan your MP3 collection:

Download for:
Windows
Mac OS X
Linux .deb
Source code

What we’ll do next is figure out all the popular (mis)spellings for tracks with the same fingerprints. We will publish lots of stats, example data and graphs showing our progress as the fingerprint database grows in the coming weeks. We need people with MP3 collections (of any size/quality) to download and run the fingerprinter to make this work, so spread the word.
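Roughly, the idea of collecting spellings per fingerprint could be sketched like this. It is a toy illustration under stated assumptions, not our actual pipeline: the `(fingerprint_id, artist, title)` tuple layout, the function names, the `min_share` threshold and the sample data are all invented for the example.

```python
from collections import Counter, defaultdict


def spellings_by_fingerprint(submissions):
    """Map each fingerprint id to a Counter of the (artist, title) tags seen for it."""
    groups = defaultdict(Counter)
    for fingerprint_id, artist, title in submissions:
        groups[fingerprint_id][(artist, title)] += 1
    return groups


def pick_spelling(counts, min_share=0.9):
    """Return the dominant spelling if enough submissions agree on it.

    Returns None when there is no data or no spelling reaches min_share,
    i.e. the case that would be left to a public vote.
    min_share is an arbitrary illustrative threshold.
    """
    if not counts:
        return None
    spelling, n = counts.most_common(1)[0]
    return spelling if n / sum(counts.values()) >= min_share else None


# Made-up sample submissions: (fingerprint_id, artist tag, title tag).
submissions = [
    ("fp1", "Radiohead", "Creep"),
    ("fp1", "Radiohead", "Creep"),
    ("fp1", "01 – Radiohead", "Creep"),
    ("fp2", "Bjork", "Joga"),
    ("fp2", "Björk", "Jóga"),
]

groups = spellings_by_fingerprint(submissions)
for fingerprint_id, counts in groups.items():
    print(fingerprint_id, counts.most_common(), pick_spelling(counts, min_share=0.6))
```

With more submissions per fingerprint, the popular spelling stands out more clearly from the typos, which is why a large and messy set of collections is useful here.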

Remember, you don’t need to clean up your ID3 tags before running the fingerprint app: This time round, people with imperfect tags are actually going to be of some use to us, and don’t deserve all the terrible things we normally wish on them ;)

Download the app, and watch this space for lots of stats and graphs detailing our findings in the coming days and weeks!

RJ