Solved

Storing and Retrieving Matrices from a Database

Posted on 2012-04-13
10
386 Views
Last Modified: 2016-02-10
How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use 3 tables: the document, the attributes and the document_attributes, or is that too simplistic and poor for performance. Do I not use mysql and something of the NoSQL sort like MongoDB?
0
Comment
Question by:aristanoble
  • 5
  • 3
10 Comments
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37842911
Please post a sample of the data that constitutes your matrix (or matrices).  Your suggestion about three tables sounds intuitively right (assuming document_attributes is a junction table) but it might be easier to understand the question if we can see the data.  Thanks, ~Ray
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37843659
Well, the articles table, for example, would contain a list of articles: article-1, article-2, ...article-n.
The attributes table would contain a list of attributes: attribute-1, attribute-2, ... attribute-n.
And the document_attributes junction table would contain a list of "junctions":
article-1_id:attribute-1_id, article-1_id:attribute-2_id, article-1_id:attribute-3_id
article-2_id:attribute-1_id
article-3_id:attribute-3_id,article-3_id:attribute-4_id;
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37843703
That sounds correct to me.  What kind of data is this in the article and attribute tables?  Please post the CREATE TABLE statements and some example data for these tables, thanks.

Data base performance is usually most affected by the size of the server memory, the indexing strategy and the complexity of the queries.  A query that joins two tables via a junction table is not one that might be called complex.
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 
LVL 1

Author Comment

by:aristanoble
ID: 37845589
Thanks Ray for your help. I haven't created the tables because I was first trying to figure out the best way to store this data or even if I should be using MySQL. So I'm still in the planning stages. The reason why I ask about how to store the data is because when it comes to retrieving the data to produce an incidence matrix, for example. Take for example the comparison of n documents and n terms:

      d1 d2 d3 ... dn
t1   0   1   0        1
t2   1   1   0        0
t3   1   1   1        0
...
tn   0   0   1        1

Where d = document, t = term, and 0/1 denotes the presence or absence of the term.
How would I select the data to be able to produce such a matrix?
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37846170
How would I select the data...?
The answer will be dependent on the data, which is why I want to see some test data so I understand what we're dealing with!

If I have some spare time this morning I may try to create a little test data set so I can show you how I would generate this matrix.  It's a fairly well understood design pattern in computer science.  Intersections generate the "1" values.
0
 
LVL 109

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 37846287
See if this makes sense to you.  If not, please post back with questions.  HTH, ~Ray
http://www.laprbass.com/RAY_temp_aristanoble.php
<?php // RAY_temp_aristanoble.php
error_reporting(E_ALL);
echo "<pre>";

// INTERSECTING DATA SETS:
// ADENINE CYTOSINE GUANINE THYMINE
// A       C T      G A     T


// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";        // GET THESE FROM YOUR HOSTING COMPANY
$db_user = "??";
$db_word = "??";

// LIVE DATABASE CREDENTIALS
require_once('RAY_live_data.php');


// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES



// CREATING A TABLE OF MOLECULES
$sql
= "CREATE TEMPORARY TABLE molecules
( m_key      INT         NOT NULL AUTO_INCREMENT
, nucleobase VARCHAR(24) NOT NULL DEFAULT ''
,                        PRIMARY KEY (m_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF LETTERS
$sql
= "CREATE TEMPORARY TABLE alphabet
(a_key      INT         NOT NULL AUTO_INCREMENT
, letter     VARCHAR(1)  NOT NULL DEFAULT ''
,                        PRIMARY KEY(a_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF INTERSECTIONS
$sql
= "CREATE TEMPORARY TABLE alphabet_and_molecules
( i_key         INT         NOT NULL AUTO_INCREMENT
, alphabet_key  INT NOT NULL DEFAULT 0
, molecules_key INT NOT NULL DEFAULT 0
,               PRIMARY KEY(i_key)
,                       KEY(alphabet_key)
,                       KEY(molecules_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// LOAD THE TABLES
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('ADENINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('CYTOSINE') " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('GUANINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('THYMINE')  " );

mysql_query( "INSERT INTO alphabet (letter) VALUES ('A') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('C') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('G') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('T') " );

// QUERY FOR THE X-AXIS MOLECULE NAMES
$sql = "SELECT m_key, nucleobase FROM molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $xaxis[$row['m_key']] = $row['nucleobase'];
    echo PHP_EOL . $row['m_key'] . '=' . $row['nucleobase'];
}
echo PHP_EOL;

// QUERY FOR THE Y-AXIS ALPHABET
$sql = "SELECT a_key, letter FROM alphabet";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $yaxis[$row['letter']] = $row['a_key'];
    echo PHP_EOL . $row['a_key'] . '=' . $row['letter'];
}
echo PHP_EOL;

// LOAD THE INTERSECTION TABLE
foreach ($xaxis as $molecules_key => $nucleobase)
{
    // CONSIDER EACH LETTER IN THE nucleobase
    $ltrs = str_split($nucleobase);
    foreach ($ltrs as $ltr)
    {
        // IF THIS LETTER IS PART OF OUR ALPHABET
        if (array_key_exists($ltr, $yaxis))
        {
            $alphabet_key = $yaxis[$ltr];
            $iql = "INSERT INTO alphabet_and_molecules ( alphabet_key, molecules_key ) VALUES ( $alphabet_key, $molecules_key )";
            $ies = mysql_query($iql) or die( "FAIL: $iql<br/>" . mysql_error() );
        }
    }
}

// SHOW THE INTERSECTION TABLE
$sql = "SELECT alphabet_key, molecules_key FROM alphabet_and_molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['alphabet_key'] . ' INTERSECTS WITH MOLECULES ' . $row['molecules_key'];
}
echo PHP_EOL;

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'T'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'T'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'C'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'C'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'Z' (FALSE)
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'Z'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
$num = mysql_num_rows($res);
if (!$num) echo PHP_EOL . 'ALPHABET Z INTERSECTS WITH MOLECULES IN NO ROWS';
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}
echo PHP_EOL;

// SHOW THE TABULAR MATRIX
echo "<table>";
echo PHP_EOL;

echo "<tr><td></td>";
foreach ($xaxis as $nucleobase)
{
    echo "<td>$nucleobase &nbsp; </td>";
}
echo "</tr>";
echo PHP_EOL;

foreach ($yaxis as $letter => $a_key)
{
    echo "<tr>";
    echo "<td>$letter &nbsp; </td>";
    foreach ($xaxis as $m_key => $nucleobase)
    {
        $sql
		= "SELECT
		  alphabet_key
		, molecules_key
		FROM
		  alphabet_and_molecules
		WHERE
		  alphabet_key = $a_key
		AND
		  molecules_key = $m_key
		"
		;
		$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
		$num = mysql_num_rows($res);
		if ($num)
		{
		    echo "<td>$num</td>";
		}
		else
		{
		    echo "<td>0</td>";
		}
    }
    echo "<tr>";
    echo PHP_EOL;
}
echo "</table>";
echo PHP_EOL;

Open in new window

0
 
LVL 1

Author Comment

by:aristanoble
ID: 37853949
I understand the solution and it's very insightful, thank you. My biggest concern would be beginning at line 259. Iterating over n molecules and connecting to mysql inside that iteration could be costly time wise and resource wise. Would there be a better way with or without MySQL (i.e. using another method/specialized software)?
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 37855605
... connecting to mysql inside that iteration could be costly time wise ...
"Connecting" is a term of art in data base work, and what you're actually looking at there is a query, not a connection.  Unless you have tested it and timed it you could not know how long that query takes, but I would like to suggest that it is representative of the relational nature of relational data bases.  It gives us the relationships of the alphabet keys to the molecules keys.  If these columns are indexed, it is essentially an in-memory operation and will complete in lightning fast times.  Having had some experience with these kinds of relational tables, I would not waste time looking for a better way to do it.
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

These days socially coordinated efforts have turned into a critical requirement for enterprises.
Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question