DB design question: multi-level attribute boolean search implementation?

Posted on 2004-09-10
Last Modified: 2012-06-27
Okay, just for the information, the question is sort of DB independant, but the implementation will most probably be on MySQL.

So, there will be a database of documents, which each will have a number of properties and categories.

Document A
categories: this document has let us say 3 levels of categories, like:
5 - Food - Textiles - Clothing - Watches - Household
5.18 Clothing
5.18.33 Shoes, shoe parts

so it has :
category level 1 = 5,
category level 2 = 18,
category level 3 = 33

Document B is of different type. It has some of the properties the same as document A, but some are different:
categories: this document uses a different category system, which has let's say 5 category levels:

3 - general document
3.1 - memo
A - important
A.1.B.2.F - secrecy level: high

so it has:
categories of  level 1 = 3; A
category of level 2 = 1
category of level 5 = F

The second example is totally random, just to explain that category system can be different.
Also, of course, any document can be in more than one category per level (ie. doc B is in top level categories 3 _and_ A).

Typical queries might be:

give me all documents which are in category "C" in this classification system, or in any of subcategories of it OR ( documents that have EXPIRY_DATE more than year in future AND NOT have subcategory "5.18")

So, the questions are:

1. What would be a good general table design be for such document storage, that would allow cross-category search on arbitrary number of attributes and categories, considering that additional classification systems will be added later.

2. Any ideas about performance of such a design?

The query builder and parser in programming language (PHP) will be a separate pain, but that's something I've already done in past, so I could reimplement it and tweak a bit.

(Wanted to give 1000 points, but said max is 500, however, for a good and concise answer, I will give 1000 total somehow)
Question by:gnudiff
  • 7
  • 5

Assisted Solution

solution46 earned 200 total points
ID: 12026711

OK, had a bit of a think about it and I would suggest the following... The table structure is based on SQL Server datatypes and so on, but these can be converted easily enough.

CategoryID (int, PK)
Level1 (tinyint)
Level2 (tinyint)
Level3 (tinyint)
Level4 (tinyint)
Level5 (tinyint)

DocumentID (int, PK)
DocumentName nvarchar(50)
Description nvarchar(500) -- or whatever

RecipientID (int, PK)
RecipientName (nvarchar(50)) -- or whatever
-- any other details you want to store about your recipients

DocumentID (int, PK, FK lookup to [Document]![DocumentID])
CategoryID (int, PK, FK lookup to [Category]![CategoryID])

DocumentID (int, PK, FK lookup to [Document]![DocumentID])
RecipientID (int, PK, FL lookup to [Recipient]![RecipientID])

I would normally recommend a hierarchical design of the Category table, but in this case I think this design will give you much better performance. You will need to enter each of the categories in the table but I don't think it will get too big (I would be surprised if it reaches a thousand rows). Searching for a Category is easy. For instance, category 1.2.3 can be found using...
WHERE Level1 = 1
AND Level2 = 2
AND Level3 = 3

... and all children of 1.2.3 can be found by leaving off the Level4 and Level5 lines.

The [DocumentRecipient] table will allow you to send each table to multiple recipients and easily search for them...
FROM Document d
    INNER JOIN DocumentRecipient dr
        on d.DocumentID = dr.DocumentID
WHERE dr.RecipientID = ...

Performance of this design should be pretty good (assuming your indexes are good an up to date) as there is no particularly complex processing to handle. A hierarchical Category design would be much slower (recursive searching of 'all children category' type searches) and much more complicated to programme.

Hope this helps,


Expert Comment

ID: 12026734
Sorry, just realised the categories can be alphanumeric. Change the datatypes of Level1 .. Level5 in [Category] to be text.

LVL 18

Assisted Solution

SjoerdVerweij earned 300 total points
ID: 12027993
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!


Expert Comment

ID: 12028997

I was thinking about a hierarchical (nested) data set for this but he's going to be slowed down quite a bit by the recursive seraching. My eventual thoughts were that this structure is simpler, is easier to programme and will run a bit faster. I know is isn't strictly ideal (non-normalised, etc., etc.)and the performance decrease wont be massive but if you can avoid using cursors for this, surely it has to be better?

If you think otherwise, I'd be really interested to hear your views as I have had to work on this sort of thing quite a few times myself.


LVL 18

Accepted Solution

SjoerdVerweij earned 300 total points
ID: 12029111
s46: I think your solution would have very ugly queries for questions such as

"give me all documents which are in category "C" in this classification system, or in any of subcategories of it OR ( documents that have EXPIRY_DATE more than year in future AND NOT have subcategory "5.18")"

If that is the most common query, personally, I would go for the simple way:

  DocumentID Not Null PK

  CategoryID Not Null PK
  Name Not Null
  ParentCategoryID Null

  DocumentID Not Null
  CategoryID Not Null

This makes everything easy, except for determining the full name of the category (5.18.33). There's two options to handle this, at least on SQL Server:

- Create a table-valued user-defined function to look them up when needed (table-valued since there can be more than one category)

- Add a field to DocumentCategory that holds the full name, and populate it through triggers.


Expert Comment

ID: 12029285

FROM Document d
    INNER JOIN DocumentCategory dc
        ON d.DocumentID = dc.DocumentID
    INNER JOIN Category c
        ON dc.CategoryID = c.CategroyID
WHERE c.Level1 = 'C'                                 -- gets everything in cat C and all it's subcategoried
OR (DateDiff('d', c.ExpiryDate, getdate()) > 365              -- not sure about my use of DateDiff() but you get the idea...:)
    AND (Level1 <> '5' OR (Level1 = '5' AND Level2 <> '18')))

how would you go about doing this in a nested structure? This would probably be one of the few times I would use a cursor (hence my dislike of the approach) but if you have an efficient way of getting all the subcategories I would love to know what it is (will probably give you some points myself!)

Just thought, I'm assuming SQL Server here; if the better way is in another RDBMS then you don't get any points...!
LVL 18

Expert Comment

ID: 12029473
Actually, you could do it in 1 SQL statement in SQL Server.... 2005  :-)

Otherwise, yep, you're looping (I'd use a while loop instead of a cursor). Depending on what you do most (selecting or updating), triggers or a UDF would take care of it.

Expert Comment

ID: 12029902
*grin* still stuck with 2k here.

anyway, I don't think my query is too ugly... kinda cute, in a really sad sort of way :)))

LVL 18

Expert Comment

ID: 12029937
Cute  :-)

I was really thinking more along the lines of "give me all documents in this category, regardless of level" -- however, that's really an issue only if the categories are a graph, not a tree.

Expert Comment

ID: 12030602
was working on the idea that for, say, a level 3 category to exist, it must share the same level 3 category as its parent and siblings: I think in this case my approach is more efficient. In any circumstance where a sibling would noy inherit properties from its parent like this, I agree with your nested / hierarchical approach.

Anyway, this poor chap is probably confused as hell by now :)

LVL 18

Expert Comment

ID: 12031517
Your solution is almost always more efficient when you're talking about server resources. Mine would be more efficient on your own time the second your manager walks in and says "how about a 6th level"?  :-)

Author Comment

ID: 12043531
Hello, and thanks for the answers.

The answer by solution46 is good, however, as I have no idea what might the upcoming category levels be, I think I will feel safer with the recursive table option given by SjoerdVervweij - the point about the 6th level might be well on spot here. Also the DBMS magazine links were very helpful.

I still have to think about the model, but thanks to both of you, I can now evaluate my options better.

Expert Comment

ID: 12045818
Fair enough - if the structure is not fixed, then SV's answer is definately better.

Glad you're sorted,


Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I annotated my article on ransomware somewhat extensively, but I keep adding new references and wanted to put a link to the reference library.  Despite all the reference tools I have on hand, it was not easy to find a way to do this easily. I finall…
These days, all we hear about hacktivists took down so and so websites and retrieved thousands of user’s data. One of the techniques to get unauthorized access to database is by performing SQL injection. This article is quite lengthy which gives bas…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question