Topic: More robust song search

murlough23

  • Sr. Member
  • Posts: 403
More robust song search
« on: February 18, 2010, 04:58:34 pm »
Some ideas for the "Search Songs" feature on the audio-surf site:

1) Search for substrings instead of exact string matches. Sometimes I want to check before playing a song to see how others have it tagged, but unless I can guess the exact syntax, I can't find it in the search. (Is it "Jason Mraz -- Lucky"? "Jason Mraz featuring Colbie Caillat - Lucky"? Or "Jason Mraz -- Lucky feat. Colbie Caillat"?)

2) Failing that (since I know LIKE operators can be much more expensive database operations than simple string matches), a wildcard search (e.g. "Lucky*") might be nice, as a way to indicate that I want to do a LIKE search without the system assuming that every single search is a LIKE. (Caveat: This could mess things up when a song title actually has an asterisk in it.)

3) Ability to search on artist AND song title combined. Say I'm looking for the song "Shine" by Vienna Teng. A million other artists have songs called "Shine". Vienna Teng has many other songs that have been played enough to bump "Shine" off of the search results list. It's impossible to find the scores for this exact song, short of actually playing it and then viewing the results.

4) In conjunction with #3, return more search results. Especially if I'm looking for variants of ways that a particular song has been tagged, it would be nice to see literally every song by an artist that is in the database. Those variants are likely to be low popularity-wise, so I don't want the list to cut off after the top N results. (I realize that a list which goes on forever wouldn't be optimal, though, so maybe just make N a higher number like 100 instead of 20 or whatever it is currently.)

5) Be more forgiving about punctuation. Apostrophes are already being automatically removed (though searching for anything with an apostrophe in it returns no results - strip 'em out of the search terms if you're gonna strip 'em out of the stored data), and an ampersand is treated as equivalent to "and". Matching on the alphanumeric characters alone is probably a good enough indicator that song titles match. This would also help to clear up discrepancies such as "REM" vs. "R.E.M." (Caveat: What happens when a song title is all punctuation? I know some smart-aleck band probably named a song "@" or something. Perhaps in that case an exact string match would be necessary to bring up that result, since otherwise you'd be searching on a blank song title. Ugh. I hate special cases.)
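The five suggestions above can be sketched together in Python with SQLite. This is a minimal sketch under assumptions, not the site's actual code: the `scores` table, its `artist_key`/`song_key`/`plays` columns and every function name are hypothetical. Each name gets a punctuation-free stored key (item 5), a `*` in the search term opts in to a `LIKE` query (items 1 and 2), artist and title can be combined with `AND` (item 3), and the result cap defaults to 100 (item 4).

```python
import re
import sqlite3
from typing import List, Optional, Tuple


def strip_key(text: str) -> str:
    """Lowercase, treat '&' as 'and', and keep only letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower().replace("&", "and"))


def normalize(text: str) -> str:
    """Stored/search key. All-punctuation titles (say, a song named "@") would
    collapse to "", so they keep their raw text and need an exact match."""
    return strip_key(text) or text


def to_pattern(term: str) -> Tuple[str, bool]:
    """Return (key, is_wildcard); an explicit '*' opts in to a LIKE search."""
    if "*" not in term:
        return normalize(term), False
    # strip_key() removes '%' and '_', so the pieces need no LIKE escaping.
    return "%".join(strip_key(piece) for piece in term.split("*")), True


def search(conn: sqlite3.Connection, song: str, artist: Optional[str] = None,
           limit: int = 100) -> List[Tuple[str, str]]:
    """Match on song title, optionally AND artist, returning up to `limit` rows."""
    clauses, params = [], []
    for column, term in (("song_key", song), ("artist_key", artist)):
        if term is None:
            continue
        key, wildcard = to_pattern(term)
        # Column names come from the fixed tuple above; user input stays in params.
        clauses.append(f"{column} LIKE ?" if wildcard else f"{column} = ?")
        params.append(key)
    sql = ("SELECT artist, song FROM scores WHERE " + " AND ".join(clauses)
           + " ORDER BY plays DESC LIMIT ?")
    return conn.execute(sql, params + [limit]).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory table with made-up play counts, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (artist TEXT, song TEXT, "
                 "artist_key TEXT, song_key TEXT, plays INTEGER)")
    rows = [("Vienna Teng", "Shine", 5), ("Some Other Band", "Shine", 50),
            ("R.E.M.", "Losing My Religion", 40),
            ("Jason Mraz", "Lucky feat. Colbie Caillat", 3),
            ("John O'Callaghan", "Striker", 2)]
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?)",
                     [(a, s, normalize(a), normalize(s), p) for a, s, p in rows])
    return conn
```

Because stored keys and search keys both come from `normalize()`, "John O'Callaghan", "john o callaghan" and "JohnOCallaghan" reduce to the same key. A `LIKE` pattern with a leading `%` generally can't use an ordinary index, which is the cost concern from item 2; keeping it opt-in via `*` limits how often that happens.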

blue_h3x

  • Hero Member
  • Posts: 4577
Re: More robust song search
« Reply #1 on: February 19, 2010, 02:29:03 am »
A little bit of string manipulation would solve most of these. If the input string is split on spaces and each substring is then used as a search term, e.g.
"some band"
would be split into ["some", "band"], and then a combination of queries would be used:
Code:
WHERE SONG = 'some band'
Code:
WHERE SONG = 'some' OR SONG = 'band'
Code:
WHERE SONG LIKE '%some%' AND SONG LIKE '%band%'
Code:
WHERE SONG LIKE '%some%' OR SONG LIKE '%band%'
Running these in order would put the more direct matches first, followed by the more abstract ones.
The same can be done for the artist.
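blue_h3x's four tiers can be run in sequence, keeping each song at the first tier where it turns up. A minimal sketch, assuming a hypothetical `songs` table with a single `song` column; the function names and the 100-row cap are illustrative. Equality uses `COLLATE NOCASE` so "Some Band" matches "some band", and any `%` or `_` the user types is escaped so it is matched literally.

```python
import sqlite3
from typing import List


def _escape_like(word: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def tiered_search(conn: sqlite3.Connection, query: str, limit: int = 100) -> List[str]:
    """Most direct matches first, then progressively looser ones (no duplicates)."""
    words = query.split()
    if not words:
        return []
    likes = [f"%{_escape_like(w)}%" for w in words]
    eq = "song = ? COLLATE NOCASE"
    like = "song LIKE ? ESCAPE '\\'"
    tiers = [
        (eq, [" ".join(words)]),                     # whole phrase, exact
        (" OR ".join([eq] * len(words)), words),     # any single word, exact
        (" AND ".join([like] * len(words)), likes),  # every word as a substring
        (" OR ".join([like] * len(words)), likes),   # any word as a substring
    ]
    seen, results = set(), []
    for where, params in tiers:
        sql = f"SELECT song FROM songs WHERE {where} ORDER BY song"
        for (song,) in conn.execute(sql, params):
            if song not in seen:
                seen.add(song)
                results.append(song)
    return results[:limit]
```

Running all four tiers and de-duplicating in Python is the simplest form; a real site would more likely stop early once enough rows are found, since the substring tiers are the expensive ones.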
Austria is just like Yorkshire, but they have bigger hills.... oh and they have real snow too

Jagori

  • Jr. Member
  • Posts: 99
Re: More robust song search
« Reply #2 on: February 19, 2010, 10:45:23 am »
I'd like to see this too. I've been unable to find songs because they're not in the first 20 or whatever results when searching by band, and I couldn't guess what combination of spaces and punctuation was used in the title or the like.

Razaeria

  • Hero Member
  • Posts: 511
Re: More robust song search
« Reply #3 on: February 19, 2010, 12:15:55 pm »
Very supportive of these features. It's a little bit too nitpicky at the moment.

Aquinox

  • Newbie
  • Posts: 30
Re: More robust song search
« Reply #4 on: February 20, 2010, 02:34:46 pm »
I hate the fact that the database side of Audiosurf is too simple and can't cope with the many different tagging/naming schemes out there, while the building of the track is done just about perfectly. I think the track-building analysis could be combined with the DB (to compare tracks), using waveforms and databases like Discogs/Gracenote.

If the latter is not possible to program, please at least make the handling of track names/artists/featuring artists/versions more universal, so that if I play a track, my results will appear alongside those of other players who played the same track with a slightly different filename.
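Making featuring credits "more universal" mostly means pulling them out of whichever field they were typed into. A heuristic sketch, assuming tags follow an "Artist - Title" or "Artist -- Title" convention; `canonical`, `Track` and the regexes are hypothetical names, not anything Audiosurf uses. It handles the "Jason Mraz" variants from the first post, but real-world tags will have cases it misses.

```python
import re
from typing import List, NamedTuple, Tuple

# "Artist - Title" or "Artist -- Title"; a hyphen inside a name (e.g. "Jay-Z")
# has no surrounding spaces, so it is not treated as the separator.
SEPARATOR = re.compile(r"\s+-{1,2}\s+")
# A trailing "feat. X", "ft. X" or "featuring X", optionally wrapped in () or [].
FEAT = re.compile(r"\s*[(\[]?\b(?:featuring|feat\.?|ft\.?)\s+([^)\]]+)[)\]]?\s*$",
                  re.IGNORECASE)


class Track(NamedTuple):
    artist: str
    title: str
    featured: Tuple[str, ...]


def _pull_featured(part: str) -> Tuple[str, List[str]]:
    """Split a trailing featuring credit off an artist or title string."""
    m = FEAT.search(part)
    if m is None:
        return part.strip(), []
    return part[:m.start()].strip(), [m.group(1).strip()]


def canonical(tag: str) -> Track:
    """Case-folded (artist, title, featured artists) for comparing tags."""
    parts = SEPARATOR.split(tag.strip(), maxsplit=1)
    artist, title = (parts[0], parts[1]) if len(parts) == 2 else ("", parts[0])
    artist, feat_a = _pull_featured(artist)
    title, feat_t = _pull_featured(title)
    featured = tuple(sorted(name.casefold() for name in feat_a + feat_t))
    return Track(artist.casefold(), title.casefold(), featured)
```

Scoreboards could then be keyed on `(artist, title)` alone, with featured artists kept as display information, so a file tagged without the featuring credit still lands on the same board.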

murlough23

  • Sr. Member
  • Posts: 403
Re: More robust song search
« Reply #5 on: February 20, 2010, 03:29:28 pm »
I've noticed a few songs where, if you mouse over the scores, the track has the ups and downs at the same points but the overall slope is different. I don't know if these are truly different recordings of the song (since the times are the same) or if that's just a side effect of different bitrates, etc. But that might make it hard for the game to use waveform analysis to determine that two seemingly distinct tracks are the same.
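The observation above (same ups and downs, different overall slope) suggests comparing tracks only after removing the slope. A minimal sketch of that idea, swapping in linear detrending plus Pearson correlation; nothing in the thread says Audiosurf does this. It assumes each ride is available as an equally long list of sampled track heights (hypothetical; the game isn't known to export these), and the 0.95 threshold is an arbitrary pick.

```python
from math import sqrt
from typing import List, Sequence


def detrend(ys: Sequence[float]) -> List[float]:
    """Subtract the least-squares straight line, keeping only the ups and downs."""
    n = len(ys)
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return [y - mean_y - slope * (x - mean_x) for x, y in enumerate(ys)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation: 1.0 means the same shape up to positive scale and offset."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var) if var > 0 else 0.0


def same_contour(a: Sequence[float], b: Sequence[float], threshold: float = 0.95) -> bool:
    """Same ups and downs at the same points, ignoring a different overall slope."""
    if len(a) != len(b) or len(a) < 3:
        return False
    return correlation(detrend(a), detrend(b)) >= threshold
```

Fitting a line is linear in the data, so adding any constant slope to a curve leaves its detrended residuals unchanged; that is why two rides differing only in overall slope compare as identical here.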

Aquinox

  • Newbie
  • Posts: 30
Re: More robust song search
« Reply #6 on: February 20, 2010, 03:49:57 pm »
Hmm.. So that means the concept of multiple users riding the same track is nonexistent, since slightly different recordings/formats cause the track to be different? That means the scores don't say that much, even if the length is the same..

murlough23

  • Sr. Member
  • Posts: 403
Re: More robust song search
« Reply #7 on: February 20, 2010, 03:58:59 pm »
Hard to say. It seems intuitive that if the track sounds the same to human ears, any difference in the curve or traffic should be negligible. But I don't know how the algorithm that generates the track works. There's some loss inherent in mp3 encoding, and most of what's lost is the part of the spectrum that the human ear can't detect anyway... at least until the bitrate gets low enough that things start to sound noticeably garbled. If elements of the track are being generated from those "out of range" frequencies or sounds, then I guess I would expect a WAV file of a song to have a different contour than an mp3 of the same song encoded at 320 kbps, which in turn would differ from one at 128 kbps, which would also differ from one at a variable bitrate, etc.

Somebody really oughta test this theory. I've actually got a lot of curious questions that involve detective work regarding what elements of the sound (or properties of the waveform) cause the racetrack to have particular properties. It's kind of fun to try and reverse engineer it.

Mincus

  • Hero Member
  • Posts: 2394
Re: More robust song search
« Reply #8 on: February 20, 2010, 04:24:02 pm »
Detecting songs from the audio itself would be hard; however, Audiosurf already does enough analysis of the track to make a "rollercoaster" image.
I suspect that it would be possible to compare tracks with similar names (to within, say, 10%) and split scoreboards accordingly.
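The suggestion above, comparing tracks with similar names to within about 10% and splitting scoreboards accordingly, might look like the following sketch. Everything here is an assumption: names are compared with Python's `difflib.SequenceMatcher` (0.8 cut-off chosen arbitrarily), "within 10%" is read as every sampled height lying within 10% of the reference curve's height range, and the curves and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def similar_names(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Fuzzy name match; the 0.8 cut-off is an arbitrary assumption."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= min_ratio


def within_tolerance(ref: Sequence[float], other: Sequence[float], tol: float = 0.10) -> bool:
    """Every sample of `other` lies within tol (10%) of `ref`'s height range."""
    if len(ref) != len(other) or not ref:
        return False
    span = (max(ref) - min(ref)) or 1.0
    return all(abs(x - y) <= tol * span for x, y in zip(ref, other))


def assign_boards(tracks: List[Tuple[str, Sequence[float]]]) -> List[int]:
    """Greedy grouping: join the first board whose founding track has a similar
    name and a curve within tolerance, otherwise open a new board."""
    boards: List[Tuple[str, Sequence[float]]] = []
    assignment: List[int] = []
    for name, curve in tracks:
        for index, (ref_name, ref_curve) in enumerate(boards):
            if similar_names(name, ref_name) and within_tolerance(ref_curve, curve):
                assignment.append(index)
                break
        else:
            boards.append((name, curve))
            assignment.append(len(boards) - 1)
    return assignment
```

Comparing each track only against a board's founding track keeps the grouping order-dependent but simple; it also shows why re-adjusting every existing entry would be a big job, since every stored score would have to pass through this once.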

There are reasons I doubt this will be done, however:
This would require a complete overhaul of the scoreboards, analysing every entry in them and re-adjusting them. That's a lot of work for one man to do (maybe two, if pwntastic got involved, since I understand he does some of the web stuff).
Working through the name-matching algorithm, coupled with reimplementing the scoreboards in Audiosurf itself (something Dylan has already shown reluctance to do in the most recent update), would also be a lot of work.

Audiosurf is a good game and it's easily worth twice what we pay for it, but for such a low (and one-off) price you can't expect perfection. The score system isn't 100%, but it works for the vast majority of songs in the majority of circumstances. I don't think we can expect more than that for the $10, €10 or £7 we paid for it.

I do wonder how many people would make an optional additional payment to fund development like this, though. I suspect not enough (although many on the forums might). I do think Audiosurf is priced too low, but raising its price now would be unfair. BUT, I'd rather Dylan worked on a new engine personally (and that's something I would be willing to give some additional funding to).

murlough23

  • Sr. Member
  • Posts: 403
Re: More robust song search
« Reply #9 on: February 20, 2010, 04:38:10 pm »
I don't expect perfection. Just suggesting improved features because there is a place that allows us to do so.

blue_h3x

  • Hero Member
  • Posts: 4577
Re: More robust song search
« Reply #10 on: February 20, 2010, 04:56:34 pm »
Granted, feedback and suggestions are welcome, but you have to understand that the workload vs. gain ratio plays a huge part in what gets done. Re-working the scores would be a huge task to take on, more so for just one or maybe two people.

murlough23

  • Sr. Member
  • Posts: 403
Re: More robust song search
« Reply #11 on: February 20, 2010, 06:02:34 pm »
I totally understand. Being a programmer myself, I know the difference between an easy mod that's just a quick tweak to the code and a time-consuming one that wouldn't really be worth the time/money spent.

In this case, we're asking about stuff like having the program analyze waveforms or even song titles, and basically playing "artificial intelligence" to determine what looks like it's the same. I think "organic intelligence" would be the better approach here - let users suggest "Hey, these two things look equivalent". That would put the onus on us, the players, to clean up the database. It would just need some sanity checks to make sure it wasn't abused. This still isn't "easy", but I think it would gain us a lot.

Wikipedia might be a good source of inspiration here. Anybody can edit it, but certain articles that are prone to vandalism are often "protected", so only trusted users (who have been part of the system for a while and not done anything untoward) can edit them. There's already a system to report suspicious scores, so maybe there should be a log entry like "These two songs were merged by So-and-So" that can be reported if it looks suspicious, with repeat offenders losing their ability to merge.

Just thinking out loud. That idea's not a perfect solution, but it's something to build on.
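The protected-merge idea could be modelled as a small registry: merges join songs using union-find, every merge is logged with who made it, and upheld reports count as strikes. A sketch with hypothetical names and thresholds (a 90-day trust period, three strikes); it does not undo bad merges, which a real system would also need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MergeRegistry:
    """Player-suggested "these two songs are the same" merges, with a trust gate."""
    min_days: int = 90     # hypothetical: account age before a player may merge
    max_strikes: int = 3   # hypothetical: upheld reports before the right is lost
    parent: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str, str]] = field(default_factory=list)  # (song_a, song_b, merged_by)
    strikes: Dict[str, int] = field(default_factory=dict)

    def canonical(self, song: str) -> str:
        """Union-find lookup: follow merges up to the representative song."""
        while self.parent.get(song, song) != song:
            song = self.parent[song]
        return song

    def can_merge(self, user: str, account_days: int) -> bool:
        return account_days >= self.min_days and self.strikes.get(user, 0) < self.max_strikes

    def merge(self, user: str, account_days: int, song_a: str, song_b: str) -> bool:
        """Join two songs' scoreboards if the user is trusted; log who did it."""
        if not self.can_merge(user, account_days):
            return False
        root_a, root_b = self.canonical(song_a), self.canonical(song_b)
        if root_a != root_b:
            self.parent[root_b] = root_a
        self.log.append((song_a, song_b, user))  # "merged by So-and-So", reportable
        return True

    def uphold_report(self, merged_by: str) -> None:
        """A moderator agreed a reported merge was abuse: record a strike."""
        self.strikes[merged_by] = self.strikes.get(merged_by, 0) + 1
```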

Aquinox

  • Newbie
  • Posts: 30
Re: More robust song search
« Reply #12 on: February 21, 2010, 06:23:43 am »
I think something like this would be doable. Maybe a system showing how many blocks of each color there were in the track played could also be incorporated, to verify that the tracks are the same. And add a report feature, but a bit more elaborate than the current one..
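The block-count check described above amounts to comparing two per-color histograms. A minimal sketch, assuming each ride's blocks could be listed as color labels (hypothetical; "red" and "blue" are placeholders, not the game's real data), with an arbitrary 5% tolerance.

```python
from collections import Counter
from typing import Iterable


def block_mix(blocks: Iterable[str]) -> Counter:
    """Count how many blocks of each color a ride contained."""
    return Counter(blocks)


def same_block_mix(a: Counter, b: Counter, tol: float = 0.05) -> bool:
    """Each color's count may differ by at most tol (5%) of the larger ride's total."""
    limit = tol * max(sum(a.values()), sum(b.values()), 1)
    # Counter returns 0 for a color that one ride never had.
    return all(abs(a[color] - b[color]) <= limit for color in set(a) | set(b))
```

A small tolerance rather than exact equality allows for the minor per-file differences discussed earlier in the thread, while still separating rides whose block mix is clearly different.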

I also find that the search engine (which also means the database) isn't working well. I have played the track 'Striker' by John O callaghan; when I search for just "striker" it doesn't appear at all, and when I search for "john o callaghan" only the Ummet Ozcan remix appears. The original mix can't be found at all, and it does exist, because I got a message saying I had been dethroned..
« Last Edit: February 22, 2010, 12:57:47 pm by Aquinox »

murlough23

  • Sr. Member
  • Posts: 403
Re: More robust song search
« Reply #13 on: March 04, 2010, 01:35:09 pm »
Is his name actually "John O'Callaghan"? If so, try searching without the apostrophe AND without a space after the O. Anything with an apostrophe goes into the database with the apostrophe stripped out, and spaces will cause an artist or song to be seen as a distinct string. Annoyingly, if you search for anything with an apostrophe in it, you will get zero results. If the system is smart enough to strip apostrophes out of the artist/song names for the scores it keeps track of, you'd think it could be similarly gracious when searching.

Passerby

  • Hero Member
  • Posts: 1494
Re: More robust song search
« Reply #14 on: March 04, 2010, 03:44:48 pm »
Quote from: murlough23 on February 20, 2010, 03:58:59 pm
Hard to say. It seems intuitive that if the track sounds the same to human ears, any difference in the curve or traffic should be negligible. But I don't know how the algorithm that generates the track works. There's some loss inherent in mp3 encoding, and most of what's lost is the part of the spectrum that the human ear can't detect anyway... at least until the bitrate gets low enough that things start to sound noticeably garbled. If elements of the track are being generated from those "out of range" frequencies or sounds, then I guess I would expect a WAV file of a song to have a different contour than an mp3 of the same song encoded at 320 kbps, which in turn would differ from one at 128 kbps, which would also differ from one at a variable bitrate, etc.

The argument that if it sounds the same it should play the same is kinda flawed. Take comparisons between mp3 and FLAC, or 16-bit 44.1 kHz WAV: I know that there are a lot of people who can't tell the difference between a 320 kb/s mp3 and a 16-bit 44.1 kHz WAV, but I also know there are a lot of people like me who can hear the difference, and all of the artifacts that mp3s leave in the high end.