Link to home
Start Free TrialLog in
Avatar of Tx74Qp853mR
Tx74Qp853mR

asked on

Migrating High Scores to MySQL

Experts,

I'm working with a high score list based on Flash, PHP, and a flat file.  The back end needs migration to MySQL from flat file.  Flash will remain as the front end, and PHP/MySQL will be the new back end.

The high score list was implemented as follows:

Download ---

PHP reads the flat file and sends to Flash three lists.  One list for each game play level (easy, medium and hard.)  Each list is sorted by score.

Currently you cannot select different time periods for the lists, such as daily, weekly, monthly ... the served lists are based on all game plays regardless of when each play occurred.

The list for each game play level is sorted by score, and contains:
1) player name
2) score
3) city
4) country
5) ip
6) timestamp

Flash displays one sorted list for each game play level (easy, medium, and hard.)

Upload ---

Flash sends to PHP an encrypted slug consisting of (player name, game level, score, city, and country).  The Flash code generates game level and score.  The user provides player name, city, and country.

PHP decrypts the slug, adds ip and timestamp, and appends a new record to the flat file.

I need to:
a) Replace the flat file with MySQL
b) Create top score lists for daily, weekly, and monthly plays at each game play level
c) Automatically determine the player's "country" from his ip

Can you help with the following:
1) MySQL table(s) design
2) SQL queries design for daily, weekly and monthly top score lists for each game play level
3) Automatically extracting "country" from ip.

Many thanks.
SOLUTION
Avatar of Frosty555
Frosty555
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Determining the country from an IP address involves looking up the IP prefix in a database of known values.  There is no mathematical or logical approach to which IP is in which country, only a list based on what has been formally agreed upon.

You can download a CSV of all the IP prefixes and the countries they belong to, and add them to your MYSQL database. Once youv'e got taht you can easily do queries on the IP to identify the country (if it is possible).

A complete example is available here:
http://www.tutorialized.com/view/tutorial/Easy-Country-from-IP-with-PHP/33749

Look at how IPs are stored in this database. They are stored as a "DOUBLE" - e.g. in long integer format instead of in string format. PHP has a function called ip2long() which does the conversion. You will want to decide whether you want to store your IPs in your highscore table as doubles, or as strings, and try to stay fairly consistent throughout.

http://www.tutorialized.com/view/tutorial/Easy-Country-from-IP-with-PHP/33749
I don't know how your data is stored in you flat file. You may need to parse it line by line in PHP and execute INSERT statements into your database in order to import it. If it's in CSV format, you might be able to use phpmyadmin or mysql administrator to import the data.

In the example below, you can see a simple way of stepping through a text file line by line:
http://www.php-mysql-tutorial.com/wikis/php-tutorial/reading-a-file-using-php.aspx

You will of course have to use PHP's parsing functions like substr(), and explode() to take each line and get the appropriate data out of it, and then you'll have to query your mysql database with INSERT commands to insert the data you retrieved into the database.

If you encounter issues with speed (calling INSERT over and over is quite slow, only a few hundred records a second at best), a good way to speed it up is to use "grouped" insert statements. For example:

INSERT INTO mytable (name, score, ip) VALUES
('john', 50, '111.111.111.111')
(mary', 35, '111.111.111.111')
(joe', 12, '111.111.111.111')
('john', 50, '111.111.111.111')
(mary', 35, '111.111.111.111')
(joe', 12, '111.111.111.111');

And then you send that whole thing as a single query to MySql. So you build these "grouped" queries in batches of ten, fifty, maybe even a few hundred at a time, and then hit the database only once with your massive query string. This is a much faster way of doing inserts, but only necessary if you're actually going to be waiting longer for the slow way than it would take to code the fast way ;)
One last note, I suggest you send a salted "key" with your highscore when you do the upload. This will preven fraudulent uploads.

Basically, included in your encrypted slug of data, include an MD5 hash of, say, the person's name, score, the current day of the month, and a special secret "key" all concatenated together. Your PHP should perform a similar hash and ensure that everything checks out. This will make it difficult for somebody to forge a highscore submission by using alternative methods of communicating with your "upload" php script, and the only way it would fail is if somebody submits a score right at the stroke of midnight.
Ok, I spent some time with this, in the meantime Frosty555 has answered... here is my answer anyways:

How many games will be stored? If there will be very many, more than a few hundred each day, you should normalize, i.e. split the data into multiple tables: player, city, country and game. For a smaller project, something like the below could be used.

For the definition of a week (starting sunday or monday), modify the code according to this:

http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_week

For extracting the country from the ip, search google for "geolocation php", there are many scripts available, free and commercial. The free ones are less accurate, but might be good enough for your needs. I found a free one here: http://www.linuxjournal.com/article/7856 it is provided at the bottom of the snippet, slightly modified. But beware, it failed for several tests I did, i.e. it claimed the ip's was in Holland, when I know they are in Sweden and Norway.

# 1) MySQL table(s) design
 
create table games (
  id int not null primary key auto_increment,
  player varchar(255) not null,
  score int not null,
  city varchar(40) not null,
  country varchar(40) not null,
  ip int,
  timestamp datetime not null,
  level enum('easy','medium','hard') not null,
  index(timestamp)
);
 
# Inserting a game:
 
insert into games set 
  score=$score,
  level='$level',
  player='$player',
  city='$city',
  country='$country',
  ip=inet_aton('$ip'),
  timestamp=now()
 
# 2) SQL queries design for daily, weekly and monthly top score lists for each game play level
 
# today
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  CURDATE() and NOW()
order by score desc
limit 10
 
# some day ($date format: YYYY-MM-DD)
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  '$date' and 
  '$date' + interval 1 day - interval 1 second
order by score desc
limit 10
 
 
# this week
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  CURDATE() - interval 7 day and NOW() and -- this is to utilize the index
  WEEK(timestamp,0) = WEEK(CURDATE(),0)    -- configure week here, see link above
order by score desc
limit 10
 
# some week ($weekstart format: YYYY-MM-DD)
# calculate $weekstart with php
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  '$weekstart' and
  '$weekstart' + interval 7 day - interval 1 second 
order by score desc
limit 10
 
# this month ($monthstart format: YYYY-MM-DD)
# calculate $monthstart with php
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  '$monthstart' and NOW()
order by score desc
limit 10
 
# some month ($monthstart format: YYYY-MM-DD)
# calculate $monthstart with php
 
select player,city,country,score
from games
where level = '$level' and timestamp between 
  '$monthstart' and 
  '$monthstart' + interval 1 month - interval 1 second
order by score desc
limit 10
 
# geo location, script from this page: 
# http://www.linuxjournal.com/article/7856
 
function getLocationCaidaNetGeo($ip) { 
  $NetGeoURL = "http://netgeo.caida.org/perl/netgeo.cgi?target=".$ip; 
  if($NetGeoFP = fopen($NetGeoURL,r)) { 
    ob_start();
    fpassthru($NetGeoFP);
    $NetGeoHTML = ob_get_contents();
    ob_end_clean();
    fclose($NetGeoFP);
  }
  preg_match ("/LAT:(.*)/i", $NetGeoHTML, $temp) or die("Could not find element LAT");
  $location['LAT'] = $temp[1];
  preg_match ("/LONG:(.*)/i", $NetGeoHTML, $temp) or die("Could not find element LONG");
  $location['LONG'] = $temp[1];
  preg_match ("/COUNTRY:(.*)/i", $NetGeoHTML, $temp) or die("Could not find element COUNTRY");
  $location['COUNTRY'] = $temp[1];
  return $location;
}

Open in new window

Avatar of Tx74Qp853mR
Tx74Qp853mR

ASKER

@Frosty555

I need the SQL query to return the high scores ... so in addition to the queries you show above, which sort by time period, I'll need the query to sort by high score before selecting the top 10 records.  Can you show how to do that?

The flat file is a proprietary format using PHP.  It will need parsing to translate into MySQL.

The Flash-PHP interface uses a "salting" algorithm, so I think I'm covered there ... but thanks for mentioning it ... you're precisely right ... it's absolutely essential, to prevent cheats.

Many thanks
Oh, I forgot the order by clause. You would just put an "ORDER BY score DESC" before the LIMIT clause for any of the sql statements I posted.

Example:

SELECT * FROM highscores
WHERE TO_DAYS(now()) - TO_DAYS(datecreated) < 7
ORDER BY score DESC
LIMIT 10
@cxr

>> Ok, I spent some time with this, in the meantime Frosty555 has answered... here is my answer anyways: <<

Awesome ... no worries mate.

Many thanks.
@Frosty555

Thanks mate.

Will keep the thread open while I have a go at these.
@cxr

>> How many games will be stored? If there will be very many, more than a few hundred each day, you should normalize, i.e. split the data into multiple tables: player, city, country and game. For a smaller project, something like the below could be used. <<

What is the reason for normalization when storing more than say 400 scores per day?
(note: I assume what you mean by "games" is game-plays ... what is stored is the score, that is why I talked about 400+ "scores" per day.)

@ALL ...

Now let's talk about multiple "games" ... what I mean by a "game" is for example, chess, checkers, monopoly, etc.  Each of these is a different "game" in the following context:

If a website has multiple games, is it better to keep the top scores for each game in separate MySQL databases, or is it better to keep the top scores for all games in the same DB?

From a DB management point of view it's better to have a single large DB for the top-scores of all games ... i.e. only one DB to backup, etc.

But from a performance point of view it may be better to have separate top-score DBs for each game ... i.e. the SQL queries are simpler and probably would execute faster.  With a single large DB for all games, the tables need a "game" attribute, adding complexity to the queries.

On the other hand, if all games and top-score DBs are running on the same server there may be very little if any performance improvement with separate top-score DBs for each game, as opposed to a single large top-score DB for all games.

What are your thoughts on this matter?

I have a choice of using MySQL 4.1 or 5.0 ... any thoughts on which version is better and why?
>> What is the reason for normalization when storing more than say 400 scores per day?

Efficiency. With the currently proposed designs, player names, cities and countries are duplicated many times. This is bad db design. I can not find where you talked about 400+ game plays each day? It should not be a problem, though. Calculate the average record length, and multiply by number of records stored for one month, which is the longest period you need to do a query for. Then multiply by the number of simultaneous web requests for monthly highscore list. The total should not exceed available memory for the db server.

>> If a website has multiple games, is it better to keep the top scores for each game in separate MySQL
>> databases, or is it better to keep the top scores for all games in the same DB?

I would put it all in the same database, in the same table. Just add a column 'gametype' or similar. Let this be an integer, and store the names for the games in a separate table.

>> From a DB management point of view it's better to have a single large DB for the top-scores of all games ... i.e. only one DB to backup, etc.

Yes, but not only because of backups. By keeping it in the same database you also make it easier to do different types of statistics: country/city with most active players, time of day with most players etc, regardless of game type.

The single "game" or "gametype" attribute will add very little complexity, allmost insignificant. Full normalization would add more complexity, but the result would be a very fast database with "unlimited" possibilities for expansions and growth.

Whith a "gametype" integer in the single table design, the timestamp index should be a compound index: (gametype,timestamp).
>> I have a choice of using MySQL 4.1 or 5.0 ... any thoughts on which version is better and why?

It is not so important, you are not going to use any new features (SP, views, triggers, federated engine...).

Which 4.1 and which 5.0? Make sure it is relatively new.

http://dev.mysql.com/doc/refman/4.1/en/news-4-1-x.html
http://dev.mysql.com/doc/refman/5.0/en/news-5-0-x.html
@cxr

How would you backup a fully normalized DB? ... i.e. wouldn't DB backup require shutting down the site to prevent corrupting the backup?  With multiple game-types using the DB, the probability of a DB write during backup is very high, and equally high is the probability of corrupting the backup, no?

>> I can not find where you talked about 400+ game plays each day? <<

I simply used 400+ game plays per day to put a specific number on your threshold for DB normalization: "more than a few hundred [game plays] each day" ... and to ask why you would normalize above that threshold ... your answer was: "Efficiency."

>> The single "game" or "gametype" attribute will add very little complexity, allmost insignificant. Full normalization would add more complexity, but the result would be a very fast database with "unlimited" possibilities for expansions and growth. <<

I don't understand the internal workings of MySQL ... my instinct is that higher complexity would mean lower performance, but that is exactly the opposite of what you're saying ... and I believe you.

Can you show how to do a fully normalized DB for multiple game-types, including tables and queries?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@cxr

For simplicity I may leave "city" out entirely, so the user would only provide his "name" ... everything else would come from the client app and the server, including "country", which is derived from ip.

In that case, would the "player" table be indexed by "country_id"?

The "game level" is a sticky point ... some games have the 3 levels "easy, medium, hard", but other games don't have those levels.  Any ideas on how to get around this with a single DB for all games?
>> In that case, would the "player" table be indexed by "country_id"?

In the player table, table_id must be primary key, which is a special case of unique index. The player name ('player' column) should also have an index, to make lookups based on name fast. This index should probably also be unique, to prevent multiple players from having the same player name. This depends on your system, it is not an error the allow multiple players with the same name. If you need country_id in this table or not, and if this column should be indexed, also depends on your system. Do you need to associate a player with a country? Is it illegal for a player from one country to play if he is travelling and is in a different country? If so, you need the column player.country_id. Do you need to get fast access to a list of all players from a given country? Then you need an index on player.country_id.

If a game has no levels, you can store it as any level, for instance 'easy', and just code this into the application. You can also add more levels in the enum() data type, you can have up to 255 levels stored in a single byte. They are stored in the database as numbers 1, 2, 3 and so on, the words 'easy', 'medium' and 'hard' are stored just once in the table definition.
@cxr and Frosty555

I haven't had a chance to work through your examples, and I don't want to hold up your points much longer.

If I post a new question on this topic, how should I alert you, so that you may have the opportunity to continue commenting on the topic?
You can use the "Post a related question" feature, then we get a notification.