Unfortunately, your assessment of MD5 isn't quite accurate. The MD5 checksum for a particular song (say, an mp3) can vary even if it's ripped from the same CD using the same software. Files made with a different bitrate or sampling rate, with a different encoder, or even with the same encoder on a different computer can all end up with different MD5 checksums. All it takes is one extra or changed byte to get a totally different MD5. MD5 is only useful for checking whether two files are byte-for-byte identical, not whether they contain the same song.
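To make the "one extra byte" point concrete, here is a minimal sketch using Python's standard `hashlib` module. The byte strings are made-up stand-ins, not real MP3 data, and the names `md5_hex`, `rip_a`, and `rip_b` are just for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for an encoded file (not real MP3 frames).
rip_a = b"ID3" + bytes(range(256)) * 4

# The same bytes with a single extra byte appended, e.g. padding or a tag.
rip_b = rip_a + b"\x00"

digest_a = md5_hex(rip_a)
digest_b = md5_hex(rip_b)

# A byte-for-byte identical copy always hashes to the same value.
digest_copy = md5_hex(bytes(rip_a))

print("rip_a:", digest_a)
print("rip_b:", digest_b)
print("rip_a == rip_b:", digest_a == digest_b)
print("rip_a == exact copy:", digest_a == digest_copy)
```

The two digests have nothing visibly in common even though the inputs differ by one byte, while the exact copy matches. So a matching MD5 tells you two files are the same bytes, and a mismatch tells you nothing about whether they are the same song.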