Storing and Retrieving Matrices from a Database

How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use 3 tables: the document, the attributes and the document_attributes, or is that too simplistic and poor for performance. Do I not use mysql and something of the NoSQL sort like MongoDB?
LVL 1
aristanobleAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Ray PaseurCommented:
Please post a sample of the data that constitutes your matrix (or matrices).  Your suggestion about three tables sounds intuitively right (assuming document_attributes is a junction table) but it might be easier to understand the question if we can see the data.  Thanks, ~Ray
0
aristanobleAuthor Commented:
Well, the articles table, for example, would contain a list of articles: article-1, article-2, ...article-n.
The attributes table would contain a list of attributes: attribute-1, attribute-2, ... attribute-n.
And the document_attributes junction table would contain a list of "junctions":
article-1_id:attribute-1_id, article-1_id:attribute-2_id, article-1_id:attribute-3_id
article-2_id:attribute-1_id
article-3_id:attribute-3_id,article-3_id:attribute-4_id;
0
Ray PaseurCommented:
That sounds correct to me.  What kind of data is this in the article and attribute tables?  Please post the CREATE TABLE statements and some example data for these tables, thanks.

Data base performance is usually most affected by the size of the server memory, the indexing strategy and the complexity of the queries.  A query that joins two tables via a junction table is not one that might be called complex.
0
IT Pros Agree: AI and Machine Learning Key

We’d all like to think our company’s data is well protected, but when you ask IT professionals they admit the data probably is not as safe as it could be.

aristanobleAuthor Commented:
Thanks Ray for your help. I haven't created the tables because I was first trying to figure out the best way to store this data or even if I should be using MySQL. So I'm still in the planning stages. The reason why I ask about how to store the data is because when it comes to retrieving the data to produce an incidence matrix, for example. Take for example the comparison of n documents and n terms:

      d1 d2 d3 ... dn
t1   0   1   0        1
t2   1   1   0        0
t3   1   1   1        0
...
tn   0   0   1        1

Where d = document, t = term, and 0/1 denotes the presence or absence of the term.
How would I select the data to be able to produce such a matrix?
0
Ray PaseurCommented:
How would I select the data...?
The answer will be dependent on the data, which is why I want to see some test data so I understand what we're dealing with!

If I have some spare time this morning I may try to create a little test data set so I can show you how I would generate this matrix.  It's a fairly well understood design pattern in computer science.  Intersections generate the "1" values.
0
Ray PaseurCommented:
See if this makes sense to you.  If not, please post back with questions.  HTH, ~Ray
http://www.laprbass.com/RAY_temp_aristanoble.php
<?php // RAY_temp_aristanoble.php
error_reporting(E_ALL);
echo "<pre>";

// INTERSECTING DATA SETS:
// ADENINE CYTOSINE GUANINE THYMINE
// A       C T      G A     T


// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";        // GET THESE FROM YOUR HOSTING COMPANY
$db_user = "??";
$db_word = "??";

// LIVE DATABASE CREDENTIALS
require_once('RAY_live_data.php');


// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES



// CREATING A TABLE OF MOLECULES
$sql
= "CREATE TEMPORARY TABLE molecules
( m_key      INT         NOT NULL AUTO_INCREMENT
, nucleobase VARCHAR(24) NOT NULL DEFAULT ''
,                        PRIMARY KEY (m_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF LETTERS
$sql
= "CREATE TEMPORARY TABLE alphabet
(a_key      INT         NOT NULL AUTO_INCREMENT
, letter     VARCHAR(1)  NOT NULL DEFAULT ''
,                        PRIMARY KEY(a_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF INTERSECTIONS
$sql
= "CREATE TEMPORARY TABLE alphabet_and_molecules
( i_key         INT         NOT NULL AUTO_INCREMENT
, alphabet_key  INT NOT NULL DEFAULT 0
, molecules_key INT NOT NULL DEFAULT 0
,               PRIMARY KEY(i_key)
,                       KEY(alphabet_key)
,                       KEY(molecules_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// LOAD THE TABLES
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('ADENINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('CYTOSINE') " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('GUANINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('THYMINE')  " );

mysql_query( "INSERT INTO alphabet (letter) VALUES ('A') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('C') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('G') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('T') " );

// QUERY FOR THE X-AXIS MOLECULE NAMES
$sql = "SELECT m_key, nucleobase FROM molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $xaxis[$row['m_key']] = $row['nucleobase'];
    echo PHP_EOL . $row['m_key'] . '=' . $row['nucleobase'];
}
echo PHP_EOL;

// QUERY FOR THE Y-AXIS ALPHABET
$sql = "SELECT a_key, letter FROM alphabet";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $yaxis[$row['letter']] = $row['a_key'];
    echo PHP_EOL . $row['a_key'] . '=' . $row['letter'];
}
echo PHP_EOL;

// LOAD THE INTERSECTION TABLE
foreach ($xaxis as $molecules_key => $nucleobase)
{
    // CONSIDER EACH LETTER IN THE nucleobase
    $ltrs = str_split($nucleobase);
    foreach ($ltrs as $ltr)
    {
        // IF THIS LETTER IS PART OF OUR ALPHABET
        if (array_key_exists($ltr, $yaxis))
        {
            $alphabet_key = $yaxis[$ltr];
            $iql = "INSERT INTO alphabet_and_molecules ( alphabet_key, molecules_key ) VALUES ( $alphabet_key, $molecules_key )";
            $ies = mysql_query($iql) or die( "FAIL: $iql<br/>" . mysql_error() );
        }
    }
}

// SHOW THE INTERSECTION TABLE
$sql = "SELECT alphabet_key, molecules_key FROM alphabet_and_molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['alphabet_key'] . ' INTERSECTS WITH MOLECULES ' . $row['molecules_key'];
}
echo PHP_EOL;

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'T'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'T'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'C'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'C'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'Z' (FALSE)
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'Z'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
$num = mysql_num_rows($res);
if (!$num) echo PHP_EOL . 'ALPHABET Z INTERSECTS WITH MOLECULES IN NO ROWS';
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}
echo PHP_EOL;

// SHOW THE TABULAR MATRIX
echo "<table>";
echo PHP_EOL;

echo "<tr><td></td>";
foreach ($xaxis as $nucleobase)
{
    echo "<td>$nucleobase &nbsp; </td>";
}
echo "</tr>";
echo PHP_EOL;

foreach ($yaxis as $letter => $a_key)
{
    echo "<tr>";
    echo "<td>$letter &nbsp; </td>";
    foreach ($xaxis as $m_key => $nucleobase)
    {
        $sql
		= "SELECT
		  alphabet_key
		, molecules_key
		FROM
		  alphabet_and_molecules
		WHERE
		  alphabet_key = $a_key
		AND
		  molecules_key = $m_key
		"
		;
		$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
		$num = mysql_num_rows($res);
		if ($num)
		{
		    echo "<td>$num</td>";
		}
		else
		{
		    echo "<td>0</td>";
		}
    }
    echo "<tr>";
    echo PHP_EOL;
}
echo "</table>";
echo PHP_EOL;

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
aristanobleAuthor Commented:
I understand the solution and it's very insightful, thank you. My biggest concern would be beginning at line 259. Iterating over n molecules and connecting to mysql inside that iteration could be costly time wise and resource wise. Would there be a better way with or without MySQL (i.e. using another method/specialized software)?
0
Ray PaseurCommented:
... connecting to mysql inside that iteration could be costly time wise ...
"Connecting" is a term of art in data base work, and what you're actually looking at there is a query, not a connection.  Unless you have tested it and timed it you could not know how long that query takes, but I would like to suggest that it is representative of the relational nature of relational data bases.  It gives us the relationships of the alphabet keys to the molecules keys.  If these columns are indexed, it is essentially an in-memory operation and will complete in lightning fast times.  Having had some experience with these kinds of relational tables, I would not waste time looking for a better way to do it.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Algorithms

From novice to tech pro — start learning today.