?
Solved

Storing and Retrieving Matrices from a Database

Posted on 2012-04-13
10
Medium Priority
?
395 Views
Last Modified: 2016-02-10
How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use 3 tables: the document, the attributes and the document_attributes, or is that too simplistic and poor for performance. Do I not use mysql and something of the NoSQL sort like MongoDB?
0
Comment
Question by:aristanoble
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
10 Comments
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 37842911
Please post a sample of the data that constitutes your matrix (or matrices).  Your suggestion about three tables sounds intuitively right (assuming document_attributes is a junction table) but it might be easier to understand the question if we can see the data.  Thanks, ~Ray
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37843659
Well, the articles table, for example, would contain a list of articles: article-1, article-2, ...article-n.
The attributes table would contain a list of attributes: attribute-1, attribute-2, ... attribute-n.
And the document_attributes junction table would contain a list of "junctions":
article-1_id:attribute-1_id, article-1_id:attribute-2_id, article-1_id:attribute-3_id
article-2_id:attribute-1_id
article-3_id:attribute-3_id,article-3_id:attribute-4_id;
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 37843703
That sounds correct to me.  What kind of data is this in the article and attribute tables?  Please post the CREATE TABLE statements and some example data for these tables, thanks.

Data base performance is usually most affected by the size of the server memory, the indexing strategy and the complexity of the queries.  A query that joins two tables via a junction table is not one that might be called complex.
0
Get MySQL database support online, now!

At Percona’s web store you can order your MySQL database support needs in minutes. No hassles, no fuss, just pick and click. Pay online with a credit card.

 
LVL 1

Author Comment

by:aristanoble
ID: 37845589
Thanks Ray for your help. I haven't created the tables because I was first trying to figure out the best way to store this data or even if I should be using MySQL. So I'm still in the planning stages. The reason why I ask about how to store the data is because when it comes to retrieving the data to produce an incidence matrix, for example. Take for example the comparison of n documents and n terms:

      d1 d2 d3 ... dn
t1   0   1   0        1
t2   1   1   0        0
t3   1   1   1        0
...
tn   0   0   1        1

Where d = document, t = term, and 0/1 denotes the presence or absence of the term.
How would I select the data to be able to produce such a matrix?
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 37846170
How would I select the data...?
The answer will be dependent on the data, which is why I want to see some test data so I understand what we're dealing with!

If I have some spare time this morning I may try to create a little test data set so I can show you how I would generate this matrix.  It's a fairly well understood design pattern in computer science.  Intersections generate the "1" values.
0
 
LVL 111

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 37846287
See if this makes sense to you.  If not, please post back with questions.  HTH, ~Ray
http://www.laprbass.com/RAY_temp_aristanoble.php
<?php // RAY_temp_aristanoble.php
error_reporting(E_ALL);
echo "<pre>";

// INTERSECTING DATA SETS:
// ADENINE CYTOSINE GUANINE THYMINE
// A       C T      G A     T


// CONNECTION AND SELECTION VARIABLES FOR THE DATABASE
$db_host = "localhost"; // PROBABLY THIS IS OK
$db_name = "??";        // GET THESE FROM YOUR HOSTING COMPANY
$db_user = "??";
$db_word = "??";

// LIVE DATABASE CREDENTIALS
require_once('RAY_live_data.php');


// OPEN A CONNECTION TO THE DATA BASE SERVER
// MAN PAGE: http://php.net/manual/en/function.mysql-connect.php
if (!$db_connection = mysql_connect("$db_host", "$db_user", "$db_word"))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB CONNECTION: ";
    echo "<br/> $errmsg <br/>";
}

// SELECT THE MYSQL DATA BASE
// MAN PAGE: http://php.net/manual/en/function.mysql-select-db.php
if (!$db_sel = mysql_select_db($db_name, $db_connection))
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>NO DB SELECTION: ";
    echo "<br/> $errmsg <br/>";
    die('NO DATA BASE');
}
// IF WE GOT THIS FAR WE CAN DO QUERIES



// CREATING A TABLE OF MOLECULES
$sql
= "CREATE TEMPORARY TABLE molecules
( m_key      INT         NOT NULL AUTO_INCREMENT
, nucleobase VARCHAR(24) NOT NULL DEFAULT ''
,                        PRIMARY KEY (m_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF LETTERS
$sql
= "CREATE TEMPORARY TABLE alphabet
(a_key      INT         NOT NULL AUTO_INCREMENT
, letter     VARCHAR(1)  NOT NULL DEFAULT ''
,                        PRIMARY KEY(a_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// CREATING A TABLE OF INTERSECTIONS
$sql
= "CREATE TEMPORARY TABLE alphabet_and_molecules
( i_key         INT         NOT NULL AUTO_INCREMENT
, alphabet_key  INT NOT NULL DEFAULT 0
, molecules_key INT NOT NULL DEFAULT 0
,               PRIMARY KEY(i_key)
,                       KEY(alphabet_key)
,                       KEY(molecules_key)
)"
;
$res = mysql_query($sql);

// IF mysql_query() RETURNS FALSE, GET THE ERROR REASONS
if (!$res)
{
    $errmsg = mysql_errno() . ' ' . mysql_error();
    echo "<br/>QUERY FAIL: ";
    echo "<br/>$sql <br/>";
    die($errmsg);
}

// LOAD THE TABLES
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('ADENINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('CYTOSINE') " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('GUANINE')  " );
mysql_query( "INSERT INTO molecules (nucleobase) VALUES ('THYMINE')  " );

mysql_query( "INSERT INTO alphabet (letter) VALUES ('A') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('C') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('G') " );
mysql_query( "INSERT INTO alphabet (letter) VALUES ('T') " );

// QUERY FOR THE X-AXIS MOLECULE NAMES
$sql = "SELECT m_key, nucleobase FROM molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $xaxis[$row['m_key']] = $row['nucleobase'];
    echo PHP_EOL . $row['m_key'] . '=' . $row['nucleobase'];
}
echo PHP_EOL;

// QUERY FOR THE Y-AXIS ALPHABET
$sql = "SELECT a_key, letter FROM alphabet";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    $yaxis[$row['letter']] = $row['a_key'];
    echo PHP_EOL . $row['a_key'] . '=' . $row['letter'];
}
echo PHP_EOL;

// LOAD THE INTERSECTION TABLE
foreach ($xaxis as $molecules_key => $nucleobase)
{
    // CONSIDER EACH LETTER IN THE nucleobase
    $ltrs = str_split($nucleobase);
    foreach ($ltrs as $ltr)
    {
        // IF THIS LETTER IS PART OF OUR ALPHABET
        if (array_key_exists($ltr, $yaxis))
        {
            $alphabet_key = $yaxis[$ltr];
            $iql = "INSERT INTO alphabet_and_molecules ( alphabet_key, molecules_key ) VALUES ( $alphabet_key, $molecules_key )";
            $ies = mysql_query($iql) or die( "FAIL: $iql<br/>" . mysql_error() );
        }
    }
}

// SHOW THE INTERSECTION TABLE
$sql = "SELECT alphabet_key, molecules_key FROM alphabet_and_molecules";
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['alphabet_key'] . ' INTERSECTS WITH MOLECULES ' . $row['molecules_key'];
}
echo PHP_EOL;

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'T'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'T'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'C'
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'C'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}

// SHOW HOW TO USE THE INTERSECTION TABLE TO FIND molecules WITH A GIVEN letter 'Z' (FALSE)
$sql
= "SELECT
  a_key
, letter
, m_key
, nucleobase
, alphabet_key
, molecules_key
FROM
  alphabet
, molecules
, alphabet_and_molecules
WHERE
  letter = 'Z'
AND
  a_key = alphabet_key
AND
  m_key = molecules_key
"
;
$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
$num = mysql_num_rows($res);
if (!$num) echo PHP_EOL . 'ALPHABET Z INTERSECTS WITH MOLECULES IN NO ROWS';
while ($row = mysql_fetch_assoc($res))
{
    echo PHP_EOL . 'ALPHABET ' . $row['letter'] . ' INTERSECTS WITH MOLECULES ' . $row['nucleobase'];
}
echo PHP_EOL;

// SHOW THE TABULAR MATRIX
echo "<table>";
echo PHP_EOL;

echo "<tr><td></td>";
foreach ($xaxis as $nucleobase)
{
    echo "<td>$nucleobase &nbsp; </td>";
}
echo "</tr>";
echo PHP_EOL;

foreach ($yaxis as $letter => $a_key)
{
    echo "<tr>";
    echo "<td>$letter &nbsp; </td>";
    foreach ($xaxis as $m_key => $nucleobase)
    {
        $sql
		= "SELECT
		  alphabet_key
		, molecules_key
		FROM
		  alphabet_and_molecules
		WHERE
		  alphabet_key = $a_key
		AND
		  molecules_key = $m_key
		"
		;
		$res = mysql_query($sql) or die( "FAIL: $sql<br/>" . mysql_error() );
		$num = mysql_num_rows($res);
		if ($num)
		{
		    echo "<td>$num</td>";
		}
		else
		{
		    echo "<td>0</td>";
		}
    }
    echo "<tr>";
    echo PHP_EOL;
}
echo "</table>";
echo PHP_EOL;

Open in new window

0
 
LVL 1

Author Comment

by:aristanoble
ID: 37853949
I understand the solution and it's very insightful, thank you. My biggest concern would be beginning at line 259. Iterating over n molecules and connecting to mysql inside that iteration could be costly time wise and resource wise. Would there be a better way with or without MySQL (i.e. using another method/specialized software)?
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 37855605
... connecting to mysql inside that iteration could be costly time wise ...
"Connecting" is a term of art in data base work, and what you're actually looking at there is a query, not a connection.  Unless you have tested it and timed it you could not know how long that query takes, but I would like to suggest that it is representative of the relational nature of relational data bases.  It gives us the relationships of the alphabet keys to the molecules keys.  If these columns are indexed, it is essentially an in-memory operation and will complete in lightning fast times.  Having had some experience with these kinds of relational tables, I would not waste time looking for a better way to do it.
0

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There are times when I have encountered the need to decompress a response from a PHP request. This is how it's done, but you must have control of the request and you can set the Accept-Encoding header.
In this article, I’ll talk about multi-threaded slave statistics printed in MySQL error log file.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.
Suggested Courses

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question