Link to home
Start Free TrialLog in
Avatar of williamskellorn
williamskellorn

asked on

Filter MySQL stored coordinates to best in 24 hour cycle using PHP

I have a database of messages received from a number of satellite-transmitter tagged birds on migration.

Amongst other data, the key columns in the table are ptt (a unique number to identify each tag/bird), location_class (a value that indicates the quality / accuracy of the message, in ascending order of quality the possible values are Z,B,A,0,1,2,3), location_date ( a unix timestamp for the message) and latitude and longitude.

I am currently using PHP to make a MySQL query to retrieve position coordinates and timestamps for each message with a location class of 1, 2 or 3 to populate JS gmap coordinate and timestamp arrays and render and animate routes via Googlemaps.

I would like to further limit the number of coordinate / timestamp pairs returned for each ptt by filtering to a single pair of coordinates and timestamp for each 24 hour cycle (which can start with the value of the first timestamp returned for each bird), choosing the message with the best location_class and latest timestamp during that cycle.

The aim is to reduce the amount of 'noise' in the map routes (and improve loading time) by reducing the amount of data going into the JS arrays.

I am unsure how best to go about this - my SQL knowledge is limited - whether it is possible to do this as part of the SQL query or whether I need to do an intermediate operation in PHP after retrieving the query to slim down the coordinates and timestamps. The PHP will be cached, so I am not concerned about performance on the server side.

A simplified snippet (below and attached) shows the current code to produce the JS arrays.

<?php
//set default timezone for date conversion
date_default_timezone_set('UTC');

$connection = mysql_connect ("localhost", "user", "pass");
$db_selected = mysql_select_db("birds", $connection);

// get a unique list of all birds
$get_ptts = "
SELECT distinct ptt from bird_data 
ORDER BY ptt";
$result=mysql_query($get_ptts) or die ("Could not retrieve unique PTT numbers: ".mysql_error());
// set up fresh arrays for bird names, earliest and latest transmission date for each bird (to later calculate the browseable date range) 
$birdsArray = array();
$allEarliestDates = array();
$allLatestDates = array();
?>

<script type="text/javascript">

<?php
// Iterates through the MySQL results, creating JS arrays for each PTT.
while ($row = @mysql_fetch_assoc($result)) {

	
	// Var for unique PTT number of current bird
	$bird = $row['ptt'];

	//get ptt, location_date, location_class, latitude, longitude for best row from each transmission for current bird
	$get_best_locs = "
	SELECT ptt, location_date, location_class, latitude, longitude from bird_data  
	WHERE (ptt=$bird)
	and (location_class=1 OR location_class=2 OR location_class=3)
	GROUP BY location_date
	ORDER BY location_date";

	$result2=mysql_query($get_best_locs); //or die ("Could not retrieve all best locations data: ".mysql_error());
	if (mysql_num_rows($result2)) {  // if one or more non-empty rows are returned..
	
		// find total number of rows returned to work out latest coordinates
		$number = mysql_num_rows($result2); //or die ("Could not retrieve number of rows of locations data: ".mysql_error());
		$i = 1;
		//Create fresh coordinates and location timestamp arrays
		$coordinates = "";
		$positionTime = "";
		
		while ($row2 = @mysql_fetch_assoc($result2)) {
			// print javascript to push each coordinate and timestamp row on to arrays for each bird
			print "
			coordsAll_" . $cuckoo_names[$bird] . ".push(new google.maps.LatLng(" . $row2['latitude'] . "," . $row2['longitude'] . ")); timesAll_" . $cuckoo_names[$bird] . ".push(" . $row2['location_date'] . "); // date - " . date('d.m.Y', $row2['location_date']) . " ";
		
			// if this is the first entry, add the date to the earliest transmission dates array for all birds
			if($i === 1) {
				array_push($allEarliestDates, $row2['location_date']);
				}
			
			// Create variables with latest location and date
			if($i == $number)
				{
				$LatestCoordinates = "new google.maps.LatLng(" .$row2['latitude'] . "," . $row2['longitude'] . ")";
				$LatestTransDate = $row2['location_date'];
			
				// add the latest date to the latest transmission date array for all birds
				array_push($allLatestDates, $row2['location_date']);
				}
			$i++;
			}

		} // end if $result2 contains non-empty rows
			
	} // end while PTT unique list from db exists
 ?>
 // JS  script continues here to create map code .....
 </script>

Open in new window

best-in-cycle-example.php
Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

Words like "filter" and "best" are not really terms of art in location-aware programming.  But with some discovery we might be able to suggest a good approach.

How many tables and rows are involved in the data base?  I only see bird_data so far.  Is that the only one?  How many rows in the table?  How many unique bird identities ("ptt")?

You might consider using a "down-select" process.  In this process, you would create a temporary table, perhaps using ENGINE=MEMORY, that would contain one row per bird.  The table would be populated in a first query, then the temporary table would be used to generate the JavaScript.

If you want to post some of the representative data, perhaps a hundred lines showing a few birds, we might be able to experiment with different queries to help winnow the results set down to just what you want.
Avatar of Julian Hansen
Ok so if I understand you correctly you want this

For Ptt: 1 whose first message was at 10am 10 June 2013 the next value for this PTT you want to choose is the one with the best location class before 10am 11 June 2013 and the next with the best location class before 10am 12 June 2013 - is that correct?

Just an asside you can do this in MySQL

SELECT * from bird_data where location class in (1,2,3)

Otherwise something like this might work

select max(location_class), ptt, year(location_date), month(location_date), day(location_date) from bird_data
where location_class in (1,2,3)
group by ptt, year(location_date), month(location_date), day(location_date);

This won't give you a 24 hour period starting from the actual time of the first message but will group the records by day showing the highest location class per ptt for each day.

The query will also not contain the lattitude and longitude of the record so you would need to use the results from this query to retrieve the data you are looking for with co-ordinates. See next post
query will also not contain the lattitude and longitude
Those data elements are part of the bird_data rows.  Just add them to the SELECT query.
(deleted)
This should work
SELECT MAX(location_class) AS location_class, ptt, lattitude, longitude, location_date FROM bird_data
WHERE location_class IN (1,2,3)
GROUP BY ptt, YEAR(location_date), MONTH(location_date), DAY(location_date);

Open in new window

Spell check: latitude vs lattitude?  That's why I want the author to give us some test data, so we can shake the typos out and give tested answers.
With correct spelling of latitude

SELECT MAX(location_class) AS location_class, ptt, latitude, longitude, location_date FROM bird_data
WHERE location_class IN (1,2,3)
GROUP BY ptt, YEAR(location_date), MONTH(location_date), DAY(location_date);

Open in new window

Avatar of williamskellorn
williamskellorn

ASKER

Thanks for the swift responses.

I've attached a sample of 200 rows exported via phpMyAdmin with standard settings to give a clear idea of the structure of the table. I'm aware some of the column types could be more appropriately defined - I inherited this project.

Some further clarification, thanks for requesting Ray -

bird_data is the only table in the DB.
32 columns including the column types mentioned above.
Currently 34,000 rows with 500 - 700 being added daily. This will tail off as we lose contact with some birds.
Currently 30 unique 'ptt' birds.
Over the next year the number of rows and unique ptts are likely to double - project may extend for another two years.

Thanks for the suggestions so far.
bird-data.sql
Excellent - thanks for posting the data.  I have business meetings for much of the rest of the day, but I'll return to it later.  Don't worry about column types - we can fix those ;-)  One thing that immediately comes to mind would be to set the lat/lon columns to DECIMAL data type - facilitates arithmetic.

More to follow, ~Ray
Your table has location_date as a varchar - is this correct?

How do we interpret that data?
ASKER CERTIFIED SOLUTION
Avatar of Julian Hansen
Julian Hansen
Flag of South Africa image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Julian H - yes, the location_date column stores unix timestamps, though only as varchar strings - inappropriately, as I'm aware.

Where the cycle begins and ends is arbitrary - as long as all rows are assessed and for each ptt number, for each period of 24 hours, we have the coordinates and timestamp of the row with the best location class, and if there are more than one with that class, the one with the latest timestamp - one row per cycle, no alternates..I hope this is what you mean when you refer to alternates.

Many thanks for actually putting a script together. I will hopefully have time to give this a go within the next two days - tied up on another project.

I'll report back then and dole out some points.
You can run the script on my server.   Just click the link and see if it's giving you what you expect.
http://www.laprbass.com/RAY_temp_williamskellorn.php

Best regards, ~Ray
In that case the query I posted should do the trick.

The storing of timestamp as varchar is not a major issue - it will result in a slight performance hit over having it as a native datetime but I don't think it is enough to put special effort into changing a working process. The MySQL FROM_UNIXTIME seems to do the job.
Great, that query was spot-on.

In using FROM_UNIXTIME to convert the timestamp I believe the output date / time value will be given in the timezone of the SQL server - I'm now looking at ensuring it stays in UTC instead, but this shouldn't be too hard.

Many thanks!
You are welcome - thanks for the points.
I believe the output date / time value will be given in the timezone of the SQL server
Yes, the server time in PHP is not connected to the SQL server.
Revisiting this issue as I've come around again to a point where I can implement this. If more appropriate I'll ask as a new question.

In testing I've found that the query suggested consistently returns one timestamp and set of coordinates per 24 hour period, for each bird, as expected. However, the coordinates returned are those for the table row first encountered with a location_class value of 1,2 or 3 within the time window, rather than the row with the max location_class within the time window. When investigating this, printing out the 'location_class' value always gives the max value during the period, regardless of whether it is taken from same row as the coordinates chosen (which are always from the first row encountered with a location_class of 1,2 or 3).

Do I need to have another query (possibly using a temporary table in memory) to somehow ensure the location_class, coordinates and timestamp are chosen from the same row, which has the max value for location_class? At this point I care less if this is the first row encountered rather than the preferred latest row with the max location_class, as long as it is the max!

If necessary I'll provide simplified versions of the code in use and an examples of the input and output data where I can see this occurring.

Thanks.