On another note, you might consider parsing your referer with parse_url() and splitting the major pieces, such as domain and path, into separate columns. Then create a multi-column index on the domain and path. That index would probably be smaller than one on the full URL, likely just as fast, and possibly easier to work with. It's hard to say without giving it a shot, though.
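To make the idea concrete, here is a rough sketch of splitting a referer into domain and path and indexing the two columns together. It is written in Python with an in-memory SQLite database purely so it runs on its own; `urlsplit` stands in for parse_url(), and the table name `hits`, the columns `ref_domain` and `ref_path`, and the sample URLs are all made up for illustration. Note that this version drops the query string, which may or may not be what you want.

```python
import sqlite3
from urllib.parse import urlsplit


def split_referer(referer):
    """Return (domain, path) for a referer URL; missing parts become ''."""
    parts = urlsplit(referer)
    return (parts.hostname or "", parts.path or "")


# Hypothetical schema: one column per URL piece plus a multi-column index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, ref_domain TEXT, ref_path TEXT)"
)
conn.execute("CREATE INDEX idx_ref ON hits (ref_domain, ref_path)")

# Made-up sample referers.
sample_referers = [
    "http://www.example.com/search?q=php",
    "http://www.example.com/search?q=mysql",
    "http://blog.example.org/post/1",
]
for url in sample_referers:
    conn.execute(
        "INSERT INTO hits (ref_domain, ref_path) VALUES (?, ?)",
        split_referer(url),
    )


def count_hits(conn, referer):
    """Count rows whose domain and path match those of the given referer."""
    domain, path = split_referer(referer)
    row = conn.execute(
        "SELECT COUNT(*) FROM hits WHERE ref_domain = ? AND ref_path = ?",
        (domain, path),
    ).fetchone()
    return row[0]
```

Because the domain is the leading column of the index, a lookup on the domain alone can typically use the same index as well.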

by: gr8gonzo, posted on 2009-11-06 at 08:16:05, ID: 25760433
Theoretically, yes, it would reduce the index size, but I wouldn't put a UNIQUE index on the hash. You already have over 2 million rows and will probably have more in the future. Hash collisions (two different strings producing the same hash) are rare, but they still happen once in a blue moon. If you put a UNIQUE index on the hash, a collision would leave you unable to insert the second row.
Still, a non-unique index on the hashes should give you 1 record almost every time and SOMETIMES 2. You would have to adjust your code so that it checks the number of records returned and, if there is more than 1, does a more thorough lookup on the real value.
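A minimal sketch of that lookup pattern, again in Python with in-memory SQLite so it is runnable as-is. The table `referers`, its columns, and the choice of CRC32 as the hash are assumptions for illustration only; the thread does not say which hash is in use. The `hash_fn` parameter exists just so a deliberately weak hash can be swapped in to force a collision when testing.

```python
import sqlite3
import zlib


def referer_hash(referer):
    """CRC32 of the referer string (an assumed choice of hash)."""
    return zlib.crc32(referer.encode("utf-8"))


def make_db():
    """Create a hypothetical table with a NON-unique index on the hash."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE referers ("
        "id INTEGER PRIMARY KEY, referer TEXT, referer_hash INTEGER)"
    )
    conn.execute("CREATE INDEX idx_referer_hash ON referers (referer_hash)")
    return conn


def add_referer(conn, referer, hash_fn=referer_hash):
    """Insert a referer with its hash; colliding hashes are allowed."""
    cur = conn.execute(
        "INSERT INTO referers (referer, referer_hash) VALUES (?, ?)",
        (referer, hash_fn(referer)),
    )
    return cur.lastrowid


def find_referer(conn, referer, hash_fn=referer_hash):
    """Look up by hash, then confirm against the real value."""
    rows = conn.execute(
        "SELECT id, referer FROM referers WHERE referer_hash = ?",
        (hash_fn(referer),),
    ).fetchall()
    # Usually one row comes back; on a collision there may be several.
    matches = [row_id for row_id, stored in rows if stored == referer]
    return matches[0] if matches else None
```

This sketch compares the real value on every hit rather than only when more than one row comes back. That also covers the case where a single row matches the hash but actually holds a different string, for example when the referer you are looking up has not been stored yet.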