Solved

Storing and Retrieving Matrices from a Database

Posted on 2012-04-13
10
383 Views
Last Modified: 2016-02-10
How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use 3 tables: the document, the attributes and the document_attributes, or is that too simplistic and poor for performance. Do I not use mysql and something of the NoSQL sort like MongoDB?
0
Comment
Question by:aristanoble
  • 5
  • 3
10 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37842911
Please post a sample of the data that constitutes your matrix (or matrices).  Your suggestion about three tables sounds intuitively right (assuming document_attributes is a junction table) but it might be easier to understand the question if we can see the data.  Thanks, ~Ray
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37843659
Well, the articles table, for example, would contain a list of articles: article-1, article-2, ...article-n.
The attributes table would contain a list of attributes: attribute-1, attribute-2, ... attribute-n.
And the document_attributes junction table would contain a list of "junctions":
article-1_id:attribute-1_id, article-1_id:attribute-2_id, article-1_id:attribute-3_id
article-2_id:attribute-1_id
article-3_id:attribute-3_id,article-3_id:attribute-4_id;
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37843703
That sounds correct to me.  What kind of data is this in the article and attribute tables?  Please post the CREATE TABLE statements and some example data for these tables, thanks.

Data base performance is usually most affected by the size of the server memory, the indexing strategy and the complexity of the queries.  A query that joins two tables via a junction table is not one that might be called complex.
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37845589
Thanks Ray for your help. I haven't created the tables because I was first trying to figure out the best way to store this data or even if I should be using MySQL. So I'm still in the planning stages. The reason why I ask about how to store the data is because when it comes to retrieving the data to produce an incidence matrix, for example. Take for example the comparison of n documents and n terms:

      d1 d2 d3 ... dn
t1   0   1   0        1
t2   1   1   0        0
t3   1   1   1        0
...
tn   0   0   1        1

Where d = document, t = term, and 0/1 denotes the presence or absence of the term.
How would I select the data to be able to produce such a matrix?
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37846170
How would I select the data...?
The answer will be dependent on the data, which is why I want to see some test data so I understand what we're dealing with!

If I have some spare time this morning I may try to create a little test data set so I can show you how I would generate this matrix.  It's a fairly well understood design pattern in computer science.  Intersections generate the "1" values.
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 37846287
See if this makes sense to you.  If not, please post back with questions.  HTH, ~Ray
http://www.laprbass.com/RAY_temp_aristanoble.php
<?php // RAY_temp_aristanoble.php
error_reporting(E_ALL);
echo "<pre>";

// INTERSECTING DATA SETS:
// ADENINE CYTOSINE GUANINE THYMINE
// A       C T      G A     T


// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";        // GET THESE FROM YOUR HOSTING COMPANY
$db_user = "??";
$db_word = "??";

// LIVE DATABASE CREDENTIALS
require_once('RAY_live_data.php');


// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES



// CREATING A TABLE OF MOLECULES
$sql
= "CREATE TEMPORARY TABLE molecules
( m_key      INT         NOT NULL AUTO_INCREMENT
, nucleobase VARCHAR(24) NOT NULL DEFAULT ''
,                        PRIMARY KEY (m_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF LETTERS
$sql
= "CREATE TEMPORARY TABLE alphabet
(a_key      INT         NOT NULL AUTO_INCREMENT
, letter     VARCHAR(1)  NOT NULL DEFAULT ''
,                        PRIMARY KEY(a_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF INTERSECTIONS
$sql
= "CREATE TEMPORARY TABLE alphabet_and_molecules
( i_key         INT         NOT NULL AUTO_INCREMENT
, alphabet_key  INT NOT NULL DEFAULT 0
, molecules_key INT NOT NULL DEFAULT 0
,               PRIMARY KEY(i_key)
,                       KEY(alphabet_key)
,                       KEY(molecules_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// LOAD THE TABLES
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('ADENINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('CYTOSINE') " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('GUANINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('THYMINE')  " );

mysql_query( "INSERT INTO alphabet (letter) VALUES ('A') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('C') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('G') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('T') " );

// QUERY FOR THE X-AXIS MOLECULE NAMES
$sql = "SELECT m_key, nucleobase FROM molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $xaxis[$row['m_key']] = $row['nucleobase'];
    echo PHP_EOL . $row['m_key'] . '=' . $row['nucleobase'];
}
echo PHP_EOL;

// QUERY FOR THE Y-AXIS ALPHABET
$sql = "SELECT a_key, letter FROM alphabet";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $yaxis[$row['letter']] = $row['a_key'];
    echo PHP_EOL . $row['a_key'] . '=' . $row['letter'];
}
echo PHP_EOL;

// LOAD THE INTERSECTION TABLE
foreach ($xaxis as $molecules_key => $nucleobase)
{
    // CONSIDER EACH LETTER IN THE nucleobase
    $ltrs = str_split($nucleobase);
    foreach ($ltrs as $ltr)
    {
        // IF THIS LETTER IS PART OF OUR ALPHABET
        if (array_key_exists($ltr, $yaxis))
        {
            $alphabet_key = $yaxis[$ltr];
            $iql = "INSERT INTO alphabet_and_molecules ( alphabet_key, molecules_key ) VALUES ( $alphabet_key, $molecules_key )";
            $ies = mysql_query($iql) or die( "FAIL: $iql<br/>" . mysql_error() );
        }
    }
}

// SHOW THE INTERSECTION TABLE
$sql = "SELECT alphabet_key, molecules_key FROM alphabet_and_molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['alphabet_key'] . ' INTERSECTS WITH MOLECULES ' . $row['molecules_key'];
}
echo PHP_EOL;

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'T'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'T'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'C'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'C'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'Z' (FALSE)
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'Z'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
$num = mysql_num_rows($res);
if (!$num) echo PHP_EOL . 'ALPHABET Z INTERSECTS WITH MOLECULES IN NO ROWS';
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}
echo PHP_EOL;

// SHOW THE TABULAR MATRIX
echo "<table>";
echo PHP_EOL;

echo "<tr><td></td>";
foreach ($xaxis as $nucleobase)
{
    echo "<td>$nucleobase &nbsp; </td>";
}
echo "</tr>";
echo PHP_EOL;

foreach ($yaxis as $letter => $a_key)
{
    echo "<tr>";
    echo "<td>$letter &nbsp; </td>";
    foreach ($xaxis as $m_key => $nucleobase)
    {
        $sql
		= "SELECT
		  alphabet_key
		, molecules_key
		FROM
		  alphabet_and_molecules
		WHERE
		  alphabet_key = $a_key
		AND
		  molecules_key = $m_key
		"
		;
		$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
		$num = mysql_num_rows($res);
		if ($num)
		{
		    echo "<td>$num</td>";
		}
		else
		{
		    echo "<td>0</td>";
		}
    }
    echo "<tr>";
    echo PHP_EOL;
}
echo "</table>";
echo PHP_EOL;

Open in new window

0
 
LVL 1

Author Comment

by:aristanoble
ID: 37853949
I understand the solution and it's very insightful, thank you. My biggest concern would be beginning at line 259. Iterating over n molecules and connecting to mysql inside that iteration could be costly time wise and resource wise. Would there be a better way with or without MySQL (i.e. using another method/specialized software)?
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 37855605
... connecting to mysql inside that iteration could be costly time wise ...
"Connecting" is a term of art in data base work, and what you're actually looking at there is a query, not a connection.  Unless you have tested it and timed it you could not know how long that query takes, but I would like to suggest that it is representative of the relational nature of relational data bases.  It gives us the relationships of the alphabet keys to the molecules keys.  If these columns are indexed, it is essentially an in-memory operation and will complete in lightning fast times.  Having had some experience with these kinds of relational tables, I would not waste time looking for a better way to do it.
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
MySQL database data submission 7 39
function to display dynamic data in dropdown 8 27
Not needed 13 54
sql sentence 2 12
Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
This article discusses how to create an extensible mechanism for linked drop downs.
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now