asked on

Multi-term queries across many-to-many tables

Hello Experts

I am using a database to store "Items" which possess varying numbers of "Assets". Each Item has some fixed columns and may also may have zero, one or more Assets associated with it. Each Asset has a fixed number of columns of different types.

So I have a table of Items, a table of Assets, a many-to-many bridge table between them, and then do a two-stage join on the Item ID and Asset ID. This gives a results set containing one row per Item per Asset, which all works fine.

I can easily find Items that possess a single named Asset, but how do I draft a multi-term SQL query for a combination of named Assets, e.g. "find Items that possess Assets A and B but not C"?

I am sure this is a well-known problem :)

PortletPaul

what query are you using now for "I can easily find Items that possess a single named Asset"?

it will be far easier for you if we provide a sql query that will work straight away.

I could invent one, with names that won't be familiar to you, if you prefer.

NickDuffill

ASKER

Hello PortletPaul

Thanks for your quick response. At the moment I do not have a skeleton query for this. An extract from the schema is something like:

tblItem: item_ID, item_Name, item_Description
tblAsset: ast_ID, ast_Type, ast_Name, ast_Value
tblItemAsset: item_ID, ast_ID

Create view v1 as select * from tblItem left outer join (tblItemAsset on (tblItem.ItemID = tblItemAsset.ItemID) inner join tblAsset on (tblAsset.ast_ID = tblItemAsset.Ast_ID));

... returns a table of items x individual assets possessed, where they exist. I can then query this with

select * from v1 where ast_Name = "Asset A";

The above lines may not be precise, as I have simplified them. It also relies on the assumption that each Item can only possess one copy of a specific Asset.

If you can give me a hint on what approach to use to find combinations of assets, that would be really helpful - it doesn't have to be working code, just a strategy. If it involves redesign of the tables then that would be OK, although a solution in the query would be ideal.

Thanks again
Nick

awking00

If you could provide some sample data for the tblItem, tblAsset and tblItemAsset tables that include astName values of "A", "B", and "C" it would be most helpful.

PortletPaul

>> find Items ... a multi-term SQL query for a combination of named Assets, e.g. "find Items that possess Assets A and B but not C"?

select distinct
item_ID, item_Name, item_Description
from v1
where ast_name in('Asset A','Asset B')
and not exists (
                select v1b.item_id
                from v1 as v1b
                where ast_name in('Asset C')
                and v1.item_id = v1b.item_id
               )
;

Open in new window

Is this what you are after? i.e. There may be no need to alter that view - just amend the where clause. Note I'm assuming you just want the related item(s) hence the use of distinct to remove unwanted repetition.
see: http://sqlfiddle.com/#!3/39cf0/3

NickDuffill

ASKER

@PortletPaul

Thank you for taking the time to enter this in SQLFiddle - much appreciated.

This is a good step forwards but I think the "ast_name in(...)" term returns Items that have either Asset A or Asset B, rather than having both. I deleted the last line (2,2) of the tblItemAsset initialisation and still get "e-e" returned from the query. It is the "anding" part that has me stumped.

(Regarding the return of distinct lines, the back-end code that renders the results expects and handles repeated lines anyway, but is ignorant of the query. It just knows how to merge assets across multiple lines).

@Awking00 - thank you for your reply. PortletPaul has been kind enough to put together some sample data.

Best regards
Nick

SOLUTION

awking00

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER CERTIFIED SOLUTION

PortletPaul

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

NickDuffill

ASKER

Thank you so much for these detailed answers. I will study them by the end of the week and respond again.

In general it seems that although the many-to-many junction table is elegant in terms of data structure, it makes some types of query more complex. Is this a reasonable interpretation, and might there be better ways to structure the data?

Best regards
Nick

PortletPaul

Nick, the "query complexity" isn't due to the junction table. It is due to this:

>>the "ast_name in(...)" term returns Items that have either Asset A or Asset B, rather than having both. ... It is the "anding" part that has me stumped.

a simple query will produce the "either" effect - which you do not want
a slightly harder query will produce the "anding" effect - which you do want

not only that, you want that "anding", plus you want "excluding", together as 1 query

SQL is "row oriented" so as it scrolls through the rows it evaluates the where conditions and decides if that row is "in" or "out" (of the result). To produce that "anding" effect those row-by-row decisions are NOT sufficient by themselves so there needs to me more.

The "more" can be done via those "exists ( ... )" or "not exists ( ...)" subqueries, or it can be achieved using "group by" and "having". It is these extra bits that allow the query to combine the "in/out" decisions of multiple rows together, and hence providing that "anding" you crave.

short answer: it's your filtering desire that is "complex"

btw: I don't view these queries as complex :) and I suspect awking00 would agree

awking00

>> btw: I don't view these queries as complex :) and I suspect awking00 would agree <<
Wholeheartedly!

NickDuffill

ASKER

Thank you to PortletPaul and awking00 for your help, which has allowed this to be solved with a slightly modified version of the first variant suggested by PortletPaul. This makes an "exists" test on the junction table between items and assets. It does not use the counting technique so is a bit more long-winded, but because the application accepts user-defined elements of the SQL, I wanted to keep the number of clauses as low as possible and use repeated patterns. An example of the eventual code for a term in the Where clause that tests for an asset is

exists (select item_pk from v2 where (v2.ast_type = x) and (v2.ast_name like 'abc') and (v2.item_guid = v1.item_guid))

where
item_pk, item_guid are the primary key and guid
ast_name and ast_type are asset fields
v2 is a view that is a join between the item/asset junction table and the asset table
v1 is a view that is a join between the item table and v2.
The first two terms find specific assets, and the third ensures that it refers to the row currently under examination. Expressions of this form can be daisy-chained and bracketed to test combinations of multiple assets, which is what I originally needed to achieve.

PortletPaul, you hit the nail on the head when you identified that the root cause is the row-oriented nature of the SQL filtering. A generic filter of the form "where A and B but not C" is straightforward if all the fields exist in the same row, but not if they are in the same columns across multiple rows. I accept the comments about this not being a particularly complex query, but as part of the project I need to be able to explain the querying logic in broad terms to non-programmers. The "exists" test across the junction table will not be as easy to explain as an expression that references fields that are already part of the row being tested.

This seems to be working pretty well and allows queries on one or more assets attached to an item.

Thanks again for your help.
Nick

NickDuffill

ASKER

See last comment.

PortletPaul

Thanks Nick, & good luck explaining any of it to non-programmers. Cheers, Paul.