Solved

Sorting many-to-many database records

Posted on 2007-03-27
17
325 Views
Last Modified: 2013-12-13
I have a database with one table of items with a number of many-to-many relationships associated with it.  For example:

Table "Items": itemId, itemName, itemDate
Table "People": personId, personName
Table "ItemPeople": itemId, personId
Table "Categories": categoryId, categoryName
Table "ItemCategories": itemId, categoryId

...et cetera.

In the end I'm going to be working with PHP 5 "Item" objects.  But I need to be able to sort those objects based on the object properties AND the people, categories, etc associated with them.  So in pseudo-SQL I want to do something like this:

SELECT *
FROM `Items`, `People`, `Categories`
ORDER BY `categoryName` ASC, `personName` DESC, `itemDate` ASC

(obviously the above code doesn't work--it's just to illustrate a point).

I've been playing with various types of subqueries and JOIN statements, but nothing seems to be working.  Anything that looks remotely promising takes several seconds per 100 records, which is in no way acceptable.  I think the problem is that I'm just not wrapping my brain around the best way to do this.

The options, as I see them are:

1. Use a fancy MySQL statement to get the correct sort
2. Grab the data via a few different statements and sort them via PHP using usort()
3. Build the object and use some kind of ItemSorter object to handle the sorting

Numbers 1 and 3 make the most sense to me, but both have their problems.  Doing everything in SQL seems like it would be the most efficient if I could get the query right; MySQL was built to gather and sort data after all.  Writing a sorting class also makes sense to me, but then it seems like I would have to initialize all the objects before doing the sort, which could cause problems when there are hundreds or thousands of Items to be sorted.

I KNOW other people have had to tackle this problem.  Anyone have a solution or able to point me towards an article that discusses this kind of issue?

Thanks,

Chris
0
Comment
Question by:inxil
  • 9
  • 5
  • 2
  • +1
17 Comments
 
LVL 20

Expert Comment

by:steelseth12
Comment Utility
If you give as the tables and the relationships we may be able to provide you with the appropriate SQL query.
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
How would you do the above five tables?  I want to get a all Items ordered first by their associated category (ascending), then by their associated people (descending), and finally by their dates (ascending).
0
 
LVL 12

Expert Comment

by:AdrianSRU
Comment Utility
This query should work for you:

SELECT *
FROM (((People INNER JOIN ItemPeople ON People.personId=ItemPeople.personId)
INNER JOIN Items ON ItemPeople.itemId=Items.itemId)
INNER JOIN ItemCategories ON Items.itemId=ItemCategories.itemId)
INNER JOIN Categories ON ItemCategories.categoryId=Categories.categoryId
ORDER BY Categories.categoryName, People.personName DESC, Items.itemDate;

The key is telling MySQL which fields match between the tables so that they can be joined together.

Hope this helps!


--Adrian
0
 
LVL 15

Accepted Solution

by:
JakobA earned 300 total points
Comment Utility
A good way to speed up joins is to create indexes on the fields to be joined. When a field (in both tables) are registered as an index sql can just start from one end and match the two tables together (like a zipper). When they are not it has to search through the second table again and again for each match to be found.

eg:
CREATE INDEX key_categoryId ON ItemCategories( categoryId )

Read more on syntax and possibilities here. http://dev.mysql.com/doc/refman/5.0/en/create-index.html

There is no need to create an index for Categories.categoryId in category as it is (I expect) the primary key there and therefor already perfectly indexed. It is only the foreign keys used that may need an index.

Normally MySQL is pretty good at finding the appropiate indexes for the task at hand, but you can als specificly tell it which index to use on a join. see the index hint here http://dev.mysql.com/doc/refman/5.0/en/join.html

Caution: apart from speeding things up, these extra indexes als take up a godd bit of space in your databas. and they are likely to make updating those tables a bit slower (because ALL indexes on the table need to be updated when a row is inserted or deleted. Dont make more indexes than you need.

regards JakobA
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
I realized after I posted this that the tables aren't indexed at all, and the linking tables have a unique primary key but are not indexed on their foreign keys.  Needless to say, I didn't design the database.  I'll get back to y'all when I try a few queries on the newly indexed tables.

Cheers,

CM
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
OK, here's a more specific query:

SELECT DISTINCT ri.`itemId`
FROM `items` ri

LEFT JOIN `metadata_links` ml_0
ON ml_0.`itemId` = ri.`itemId`
LEFT JOIN `metadata` m_0
ON m_0.`dataId` = ml_0.`dataId`

LEFT JOIN `metadata_links` ml_1
ON ml_1.`itemId` = ri.`itemId`
LEFT JOIN `metadata` m_1
ON m_1.`dataId` = ml_1.`dataId`

LEFT JOIN `metadata_links` ml_2
ON ml_2.`itemId` = ri.`itemId`
LEFT JOIN `metadata` m_2
ON m_2.`dataId` = ml_2.`dataId`

LEFT JOIN `metadata_links` ml_3
ON ml_3.`itemId` = ri.`itemId`
LEFT JOIN `metadata` m_3
ON m_3.`dataId` = ml_3.`dataId`

WHERE m_0.`typeId` = 4
AND m_1.`typeId` = 7
AND m_2.`typeId` = 3
AND m_3.`typeId` = 9

AND (
     `m_1`.`dataId` = 709
     OR `m_1`.`dataId` = 720
     OR `m_2`.`dataId` = 1
     OR `m_3`.`dataId` = 5150
)

ORDER BY m_0.`value` ASC, m_1.`value` DESC;


And here are my table structures:


CREATE TABLE `metadata` (
  `dataId` int(10) unsigned NOT NULL auto_increment,
  `typeId` int(10) unsigned NOT NULL,
  `value` varchar(255) default NULL,
  PRIMARY KEY  (`dataId`),
  KEY `dataType` (`typeId`)
)

CREATE TABLE `metadata_links` (
  `dataId` int(10) unsigned NOT NULL,
  `itemId` int(10) unsigned NOT NULL,
  KEY `dataId` (`dataId`,`itemId`)
)

CREATE TABLE `items` (
  `itemId` int(10) unsigned NOT NULL auto_increment,
  `date` char(4) default NULL,
  `description` text,
  PRIMARY KEY  (`itemId`),
  KEY `collectionId` (`collectionId`,`dateRangeId`,`permissionId`,`statusId`,`sectionTitleId`,`userId`),
  KEY `subSectionTitleId` (`subSectionTitleId`),
  FULLTEXT KEY `description` (`description`)
)


The query is being generated programmatically--sorry about the poor alias names.  Basically, it's joining metadata of different types from the metadata table (based on the user's input) and using that data to restrict the SELECT results and sort it.  For example, the above query was generated with the following:

$sb = new SearchBuilder($db);
$sb->addQuery(7, 709);
$sb->addQuery(7, 720);
$sb->addQuery(3, 1);
$sb->addQuery(9, 5150);
$sb->addSort(4, 'ASC');
$sb->addSort(7, 'DESC');
$query = $sb->getSQL();

addQuery(a, b) tells the system to search where metadata of typeId a matches dataId b.
addSort(a, b) tells the system to sort the results based on metadata of typeId a either ASC or DESC

I hope this is enough information to explain what I'm trying to do.  The problem is that the above query takes WAY too long to execute.  I'm pretty sure I have all the index set up right.  Does anything seem amiss?

Thanks for your help,

CM
0
 
LVL 12

Assisted Solution

by:AdrianSRU
AdrianSRU earned 200 total points
Comment Utility
You need an additional index on metadata_links for the itemId field.  With a compound index, like the one you already have on metadata_links, in order to use all of the fields in the index the columns have to be used in the JOIN/WHERE/... clause in the order that they are specified in the index.  When items and metadata_links are joined, the only field in the join is itemId and since that is the second field in the compound index, the index is not used.

ALTER TABLE metadata_links ADD INDEX (itemId);

If your query still does not execute as fast as you want then post the results of EXPLAIN on your query.


--Adrian
0
 
LVL 15

Assisted Solution

by:JakobA
JakobA earned 300 total points
Comment Utility
You might also consider the size of your join. In the case of repeated foreign keys the size of the resultant table rise exponentially

eg table items contain a itemId with value 2
table metadata_links (ml_0) contain that value twice in its itemId foreign key
so you get 2 rows there
table metadata_links (ml_1) contain that value twice in its itemId foreign key
so you are up to 4 rows now
table metadata_links (ml_2) contain that value twice in its itemId foreign key
making 8 rows
table metadata_links (ml_3) contain that value twice in its itemId foreign key
giving us a total of 16 rows just for the itemId join with itemid repeated twice as a foreign key.
Imagine if it was repeated 10 times (10^4 = 10000 rows) times the number of different itemId's

Your join table can grow quite big quite fast, fast enough to make pageswapping and diskaccesses a real resource drain



0
Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

 
LVL 7

Author Comment

by:inxil
Comment Utility
Jakob,

What's a more efficient way to get the same result, then?

CM
0
 
LVL 15

Expert Comment

by:JakobA
Comment Utility
Well one change you could make is to change from 'left join' to a plain 'inner join'

Your condition
WHERE m_0.`typeId` = 4
     AND m_1.`typeId` = 7
     AND m_2.`typeId` = 3
     AND m_3.`typeId` = 9

Imply that a maching row must be exist in all tables of the resultset, so we can scrap rows where that is not the case

Then we can use a temporary table to limit the result even more. fx:

CREATE TABLE temp_metadata
LIKE metadata;
INSERT INTO temp_metadata
SELECT * from metadata
WHERE metadata.typeId IN ( 4, 7, 3, 9 )
     AND metadata.dataId IN (709, 720, 1, 5150 )

shat should crunch the metadata down to a neat little one with containing only those rows that have a chance of ultimately being found

And then use that temporary table in your query (now LOTS or rows should get pruned from the compound table as the join progresses)

SELECT DISTINCT ri.`itemId`
FROM `items` ri

LEFT JOIN `metadata_links` ml_0
ON ml_0.`itemId` = ri.`itemId`
LEFT JOIN `temp_metadata` m_0       -- use temp table here
ON m_0.`dataId` = ml_0.`dataId`

LEFT JOIN `metadata_links` ml_1
ON ml_1.`itemId` = ri.`itemId`
LEFT JOIN `temp_metadata` m_1      -- and here
ON m_1.`dataId` = ml_1.`dataId`

LEFT JOIN `metadata_links` ml_2
ON ml_2.`itemId` = ri.`itemId`
LEFT JOIN `temp_metadata` m_2      -- and here
ON m_2.`dataId` = ml_2.`dataId`

LEFT JOIN `metadata_links` ml_3
ON ml_3.`itemId` = ri.`itemId`
LEFT JOIN `temp_metadata` m_3      -- and here
ON m_3.`dataId` = ml_3.`dataId`

WHERE m_0.`typeId` = 4
AND m_1.`typeId` = 7
AND m_2.`typeId` = 3
AND m_3.`typeId` = 9

AND (
     `m_1`.`dataId` = 709
     OR `m_1`.`dataId` = 720
     OR `m_2`.`dataId` = 1
     OR `m_3`.`dataId` = 5150
)
ORDER BY m_0.`value` ASC, m_1.`value` DESC;

by the way, didnt the indexing help ?

regards JakobA
0
 
LVL 15

Expert Comment

by:JakobA
Comment Utility
Oops forgot to change all the LEFT JOIN's to just JOIN's. you do that.
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
Actually, that was a mistake on my part.  The LEFT JOINs were right--those should have been ORs instead of ANDs.  But I've tried to tackle this in a different way so that I'm not joining quite so many results together.  I haven't tested it on a large query yet, but here's my example done using subqueries.  Would this be a better way to go about it?

Note: I'm using "~~~" for sorting 'cause I can't create an IF NULL column to sort on in this particular case (without duplicating the subquery, at least).


SELECT      ri.`itemId`
            , IFNULL((
                  SELECT m_0.`value`
                  FROM `metadata_links` AS ml_0
                  INNER JOIN `metadata` AS m_0
                        ON ml_0.`dataId` = m_0.`dataId`
                  WHERE ml_0.`itemId` = ri.`itemId`
                  AND m_0.`typeId` = 4
                  ORDER BY m_0.`value` ASC
                  LIMIT 1
            ), '~~~') AS 'm_0_value'
            , IFNULL((
                  SELECT m_1.`value`
                  FROM `metadata_links` AS ml_1
                  INNER JOIN `metadata` AS m_1
                        ON ml_1.`dataId` = m_1.`dataId`
                  WHERE ml_1.`itemId` = ri.`itemId`
                  AND m_1.`typeId` = 7
                  ORDER BY m_1.`value` DESC
                  LIMIT 1
            ), '~~~') AS 'm_1_value'

FROM `items` AS ri

WHERE ri.`itemId` = ANY (
      SELECT ml_2.`itemId`
      FROM `metadata_links` AS ml_2
      WHERE ml_2.`dataId` IN (709,720,1,5150)
)

ORDER BY `m_0_value` ASC, `m_1_value` DESC


How's that look?  I'll post another comment when I test it on a larger query (the above returns 46 results in around .2 seconds, so we'll see what happens when we're dealing with 500-1000 results).

Thanks,

CM
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
Oops!  Big-ish error in the above.  Will post fixed query in a minute.
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
Oh, the error isn't in the above statement, but rather a modified version of it.  Here's what I'm working with right now.  It seems to run pretty fast, but I'm not 100% sure it's actually doing what I want it to do... any thoughts?

SELECT ri.`itemId`

       , IFNULL((
       SELECT m_3.`value`
       FROM `metadata_links` AS ml_3
       INNER JOIN `metadata` AS m_3
            ON ml_3.`dataId` = m_3.`dataId`
       WHERE ml_3.`itemId` = ri.`itemId`
       AND m_3.`typeId` = 4
       ORDER BY m_3.`value` ASC
       LIMIT 1
       ), '~~~') AS 'm_3_value'

       , IFNULL((
       SELECT m_1.`value`
       FROM `metadata_links` AS ml_1
       INNER JOIN `metadata` AS m_1
            ON ml_1.`dataId` = m_1.`dataId`
       WHERE ml_1.`itemId` = ri.`itemId`
       AND m_1.`typeId` = 7
       ORDER BY m_1.`value` DESC
       LIMIT 1
       ), '~~~') AS 'm_1_value'

FROM `registry_items` ri

WHERE ri.`itemId` = ANY (
       SELECT ml_q.`itemId`
       FROM `metadata_links` AS ml_q
       JOIN `metadata` AS m_q
            ON ml_q.`dataId` = m_q.`dataId`
       WHERE m_q.`typeId` IN (7,3,9,4)
       AND ml_q.`dataId` IN (709,720,1,5150)
)

ORDER BY `m_3_value` ASC, `m_1_value` DESC


You can see that in my WHERE ri.`itemId` = ANY subquery I've added a join so that I can restrict the result to only itemIds where the typeId matches a type that I'm searching on.  It seems like this will speed things up 'cause I'm matching ri.`itemId` against less results, but also means I'm joining more data... not sure which is better.

Thanks for you're help so far!  I'm pretty sure were REALLY close to an answer.

Cheers,

CM
0
 
LVL 15

Expert Comment

by:JakobA
Comment Utility
wont the IFNULL give you NULL if the subselect return a value and '~~~' if it does not ?

Anyway the complexity of your query is getting a bit beyond what I can digest over a cup of tea, I dont understand what your are doing, but yes you do seem to be on the right track ;-))
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
No, IFNULL(A, B) returns A unless A is NULL, in which case it returns B.
0
 
LVL 7

Author Comment

by:inxil
Comment Utility
Thanks all for the help--I'm not 100% sure it's all working, but I'm definitely in the right direction thanks to you.

Cheers,

CM
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
Creating and Managing Databases with phpMyAdmin in cPanel.
The viewer will learn how to count occurrences of each item in an array.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now