I think it will work well enough to just drop all leading numbers, dashes and dots. The result should still be a unique title for the artist, especially since other stats such as density and duration are checked as well.
My current thinking is this:
--convert to lowercase
--convert " and " to "+"
--convert "&" to "+"
--remove all whitespace
--remove umlauts (for example, "ö" becomes "o")
--remove digits from the beginning until a non-digit character is reached
--convert "featuring" to "feat."
--convert "remix" to "mix"
--remove these: ".", ",", ">", "<", "'", "´", "`", "(", ")", "[", "]", "-", "album", "version", "the", "original", "amazonexclusive", "soundtrack", "explicit", "disc", "cd", "single", "bonus", "track", "edition", "deluxe", "main", "musicalscore"
--remove all digits
--save the string between some of the steps, so that if one of them (like removing all digits) leaves too little to work with, the result of an earlier step is used instead
--apply these steps to both the title and the artist
--remove the artist name from the title if it appears there
--last, combine the title and artist into a search string for the track
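For anyone who wants to try the steps above locally, here is a rough Python sketch of how they could fit together. This is not the server's code. The function names, the minimum length of 3, the choice of which intermediate strings get saved, and the "|" used to join artist and title are all my own assumptions, since the list above doesn't pin them down.

```python
import re
import unicodedata

# Strings removed in the listed order (taken from the list above).
REMOVE_TOKENS = [
    ".", ",", ">", "<", "'", "´", "`", "(", ")", "[", "]", "-",
    "album", "version", "the", "original", "amazonexclusive", "soundtrack",
    "explicit", "disc", "cd", "single", "bonus", "track", "edition",
    "deluxe", "main", "musicalscore",
]

MIN_LEN = 3  # assumption: what counts as "too little to work with"


def strip_diacritics(text):
    # Decompose accented letters ("ö" -> "o" + combining mark), drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    s = text.lower()
    s = s.replace(" and ", "+").replace("&", "+")
    s = "".join(s.split())          # remove all whitespace
    s = strip_diacritics(s)
    saved = [s]                     # assumption: first saved fallback

    s = re.sub(r"^[0-9]+", "", s)   # leading digits only
    s = s.replace("featuring", "feat.").replace("remix", "mix")
    for token in REMOVE_TOKENS:
        s = s.replace(token, "")
    saved.append(s)                 # assumption: second saved fallback

    s = re.sub(r"[0-9]", "", s)     # all remaining digits
    saved.append(s)

    # Use the most processed string that is still long enough.
    for candidate in reversed(saved):
        if len(candidate) >= MIN_LEN:
            return candidate
    return saved[0]


def search_key(title, artist):
    t = normalize(title)
    a = normalize(artist)
    if a and a in t:
        without_artist = t.replace(a, "")
        if without_artist:          # keep the title if nothing else is left
            t = without_artist
    return a + "|" + t              # assumption: the join format is made up


print(search_key("02 - Keyboard Milk", "Röyksopp"))  # royksopp|keyboardmilk
```

One thing the sketch makes visible: because whitespace is gone before the word list is removed, "the" is removed as a plain substring, so "The Other Side (Album Version)" comes out as "orside". That is probably fine as long as the same steps are applied to every submission of the same song.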
You can test it here:
http://air.audio-surf.com/as/airgame_findaddsongid.php?song=02 - Keyboard Milk&artist=Röyksopp
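The test URL contains spaces and an "ö", which a browser will escape for you. If you call it from a script you may need to escape the parameters yourself. A small sketch with Python's standard library follows; the helper name is made up, and whether the server expects UTF-8 for non-ASCII characters is an assumption on my part.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://air.audio-surf.com/as/airgame_findaddsongid.php"


def build_test_url(song, artist):
    # Percent-encode both parameters (UTF-8 assumed); spaces become %20.
    query = urlencode({"song": song, "artist": artist}, quote_via=quote)
    return BASE_URL + "?" + query


print(build_test_url("02 - Keyboard Milk", "Röyksopp"))
```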
The modified title and artist are only used by the scoreboard server and players never see them.