Link to home
Start Free TrialLog in
Avatar of sheep7
sheep7

asked on

binary masks or other solution for query

Hello, I was wondering someone could help me with this problem:

A person has the desire to find products matching his needs on ingredients: vitamines, proteines, sodium. He selects these options using a website(php) out of a list of 500 ingredients, no use of db so far, the user isn't in the db.

There is a db with 3 products
Product 1 consists of vitamines, proteines and sodium
Product 2 consists of vitamines and sodium
Product 3 consists of vitamines, proteines, sodium and fat

How can I query that only products 1 and 2 are in the result?
(since product 3 also contains fat as ingredient)

The database hasn't been created yet so I can design this for optium usage.
I expect a load of 5 user request a minute with 500,000 products containing each 1-30 ingredients. These ingredients are dynamic and change/add or get deleted.

---
I have found some solution in using subqueries
PRODUCT
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
 
INGEDIENT
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
 
PRODUCT_VITAMINE
+---------+---------+------+-----+---------+-------+
| Field   | Type    | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| prod_id | int(11) | YES  |     | NULL    |       |
| ig_id  | int(11) | YES  |     | NULL    |       |
+---------+---------+------+-----+---------+-------+

select * from PRODUCT p where exists(select * FROM PRODUCT_VITAMINE pv,INGEDIENT i where pv.ig_id=i.id AND p.id = pv.prod_id and i.name in('vitamines','sodium','proteines')) AND not exists (select * FROM PRODUCT_VITAMINE pv,INGEDIENT i where pv.ig_id=i.id AND p.id = pv.prod_id and i.name  not in('vitamines','sodium','proteines'))
----

But I rather have 1 query and more simplier and faster. Maybe using binary masks. Anyone has an idea about how to arrange that for this problem or some other solution?

Regards and merry christmas!
ASKER CERTIFIED SOLUTION
Avatar of thenelson
thenelson

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of phoffric
phoffric

For Product 1,       (lngValue And 7) = (0111b And 0111b) = 0111b
But for Product 3, (lngValue And 7) = (1111b And 0111b) = 0111b

So, how did we eliminate the fat? The result of (lngValue And 7) is the same for Product 1 and 3. Maybe, use (IngValue And 8) = (IngValue And 1000b). This expression will be 0 if there is no fat, and not 0 if there is fat.
re: "matching his needs on ingredients: vitamines, proteines, sodium"
Since Product 2 is a good return, then it sounds like you want "at least" one of these ingredients (i.e., even one of these is good enough).
If this is the case, then how about using:
((lngValue And 7) !=0)  AND  (IngValue And 8) = 0
This does rather depend on the size of your database. If you have only 32 nutrients or less and a few hundred products, don't use a database at all. Save it as XML. I get the idea that this data will be fairly static: it isn't crucial that a new product that is added to the database will show up the next millisecond in user queries.

But if you really do need to use a database, then don't use a bitmapped field. It works well in memory, but in a SQL Server or Sybase database you will force a table scan on every query because, as far as I know, only Oracle implements bitwise operators at the database level. So your query will read the bitmap on every single row in the table, mask off the bits, and keep or not keep the row depending on the result.

The fundamental issue is that you have a many-to-many relationship between nutrients (vitamins, sodium, protein, etc) and products. The classical way to implement a many-to-many relationship is to have one table of nutrients:

ID                     Description
--                      ----------------
0                      Protein
1                      Sodium
2                      Vitamin B12
etc

and another table of products (which I won't bother to illustrate) and a link table with products and nutrients:

ProductID         NutrientID
--------------         --------------
12345              0
12345              1
12345              2
12367              1
12369              2

Then your queries will mostly be of 3 types:

1. All products with vitamin B12:  select distinct ProductID from linktable where NutrientID = 2 (this is the obvious way to do the query but you could treat is as a degenerate case of type 3)

2. All sodium-free products: select distinct ProductID from linktable where not exists (select * from linktable sodium where sodium.NutrientID = 1 and sodium.ProductID = linktable.ProductID)

3. All products with only protein and vitamin B12 (silly example, I know, but never mind that): select distinct ProductID from linktable
where exists (select * from linktable protein where protein.NutrientID = 0 and protein.ProductID = linktable.ProductID)
and exists (select * from linktable b12 where b12.NutrientID = 2 and b12.ProductID = linktable.ProductID)
and not exists (select * from linktable other where other.NutrientID not in (0,2) and other.ProductID = linktable.ProductID)

As you can see, if you allow your users free-form queries with an unknown number of options of nutrients to include or exclude, you will have to generate the SQL in your program on the fly: there is no way to just slot numbers into a where-clause. You could do that with a bitmapped field but, as I've explained, you really shouldn't unless your database is a toy with only a couple of hundred products. And if it's that small you shouldn't be bothering with a database at all.

But you can probably also see that generating the SQL won't be especially difficult.


Avatar of sheep7

ASKER

Thank you very much, products can have 1-30 ingredients, but there are in fact more than 30 ingredients, about 200. A query example would be helpful as well, but I'll figure it out, thanks for the start and idea's. Regards