Link to home
Start Free TrialLog in
Avatar of intoxicated_curveball
intoxicated_curveball

asked on

SQL Query logic question

I want to query 8 string/columns (NOT ROWS), the only stipulation is that there can't be more than 4 of the columns matching in the returned row. For example:

col1, col2, col3, col4, col5, col6, col7, col8
pizza, apple, tomato, pizza, pizza, apple, pizza, pizza - should NOT be returned
pizza, apple, tomato, pizza, pizza, apple, pizza, banana  - should be returned
Avatar of JesterToo
JesterToo
Flag of United States of America image

You need to clarify what datastore is to be queried... this question is categorized as both "access" and "sql server".  A working solution to this question will not be the same for both platforms.
Avatar of intoxicated_curveball
intoxicated_curveball

ASKER

It's MS Access 2007 - 2010 database using VBA to query
SOLUTION
Avatar of HainKurt
HainKurt
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Avatar of Jim Horn
Jim Horn
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@Jim Horn

your solution just gives ID and the opposite...

if we modify your solution a bit, here is another solution

with YourTable as 
(
select  1 id, 'pizza' col1, 'apple' col2, 'tomato' col3, 'pizza' col4, 'pizza' col5, 'apple' col6, 'pizza' col7, 'pizza' col8 -- should NOT be returned
union select 2, 'pizza', 'apple', 'tomato', 'pizza', 'pizza', 'apple', 'pizza', 'banana'  -- should be returned
union select 3, 'pizza', 'apple', 'tomato', 'pizza', 'pizza', 'orange', 'pizza', 'banana'  -- should be returned
union select 4, 'pizza', 'orange', 'tomato', 'pizza', 'pizza', 'apple', 'pizza', 'banana'  -- should be returned
), cte AS (
	SELECT id, col1 AS col FROM YourTable UNION
	SELECT id, col2 FROM YourTable UNION
	SELECT id, col3 FROM YourTable UNION
	SELECT id, col4 FROM YourTable UNION
	SELECT id, col5 FROM YourTable UNION
	SELECT id, col6 FROM YourTable UNION
	SELECT id, col7 FROM YourTable UNION
	SELECT id, col8 FROM YourTable)
select * from YourTable where id in (
SELECT id--, COUNT(col) AS distinct_column_count
FROM cte
GROUP BY id 
HAVING COUNT(col) >= 4
)

Open in new window

@Huseyin - My original post had HAVING COUNT(col) > 4, which I changed a couple of minutes later after rereading the requirements to < 4.

... and I just read the first asker comment that he is looking for an Access solution, and I posted SQL Server T-SQL, so I'm not sure if Access queries have CTE abilities and if that will work.

@intoxicated_curveball - Next time please spell out in what language you are looking for a solution, especially when adding multiple zones in a question, as unless explicitly stated most experts will read a question in their zone and assume that zone's language is what is needed.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Huseyin,

I implemented your second solution, it worked but made my query 33X slower than otherwise (I'm running a massive optimal query). Sorry I haven't had the chance to test others yet.
what is final query
can you upload a tiny db here with some demo  data and that query
The modified part of the query is the same as what you provided, the only difference with MS Access is you can't use Alias in the WHERE clause (you have to write out the whole construct everytime), etc.

I would provide sample db but it's pretty convoluted (I tried to trim it down). I can post if you still want to see.
ID: 42112625 should not be slow...

just create a new query, MyQuery

select t.*, '-'+col1+'-'+col2+'-'+col3+'-'+col4+'-'+col5+'-'+col6+'-'+col7+'-'+col8+'-' cmb from YourTable

Open in new window


and use

select * from myQuery
where (len(cmb)-len(replace(cmb, '-'+col1,''))) / (len(col1)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col2,''))) / (len(col2)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col3,''))) / (len(col3)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col4,''))) / (len(col4)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col5,''))) / (len(col5)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col6,''))) / (len(col6)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col7,''))) / (len(col7)+1) <= 4
  and (len(cmb)-len(replace(cmb, '-'+col8,''))) / (len(col8)+1) <= 4

Open in new window

I made sure to create the query this time, it is 12X slower now (I compare with/without). But it's OK - it's still a solution.
Thanks!