Link to home
Create AccountLog in
Avatar of Raahaugen
RaahaugenFlag for Denmark

asked on

SQL Query right join and where condition question

Hi Experts!

I would like to populate a list that always contains all active employees with initial and name from a given department, which is given by the 1st variable specified underneath.

The list should have a column containing the date (course day) the employee attended the course. The choosen course is given by the 2nd variable specified underneath.

Tables
tblEmployees (EM) contains employees
tblCourses contains the possible courses, an employee can attend
linkEmployeeCourse (EC) contains each link between the active employees and the course

Variables:
[Forms]![frmCourseOverview]![lstCourses] : contains the course ID to show status for
[Forms]![frmCourseOverview]![cboDepartment] : contains the department ID, so only employees from a specific department are in the output

Here's what I've come up with so far, but beside employees who took the course, this query only outputs any employees who did not attend a course at all.

SELECT 
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM linkEmployeeCourse AS EC 

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE 
            EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses] 
            AND EM.dtmDischarge Is Null OR EM.dtmDischarge > Date() 
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
     
ORDER BY EM.strInitial

Open in new window


I was wondering if something like this would work, but this query (underneath) are not working:

SELECT * from
(SELECT 
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM linkEmployeeCourse AS EC WHERE intCourseFK = [Forms]![frmCourseOverview]![lstCourses]
)

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE 
            AND EM.dtmDischarge Is Null OR EM.dtmDischarge > Date() 
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
     
ORDER BY EM.strInitial

Open in new window


Best regards

/Raahaugen
Avatar of mbizup
mbizup
Flag of Kazakhstan image

Have you tried using an INNER JOIN?  That will select only those employees who are in both tables.  A rRIGHT join will select all employees - even those that are not listed in the courses tables.

SELECT 
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM linkEmployeeCourse AS EC 

INNER JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE 
            EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses] 
            AND EM.dtmDischarge Is Null OR EM.dtmDischarge > Date() 
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
     
ORDER BY EM.strInitial

Open in new window

Avatar of Raahaugen

ASKER

I actually need all employees - not just the ones who did not attend the course. But I need all the employees to occur just once, and based on the chosen course (one of the variables), I want the date column to populate using the dtmCourseday value for those employees who took that course.
I tried this, but Access crashed when I try to execute... Do I have the right direction - why does it fail? :-(

SELECT EC.*
         , EM.dtmDischarge
         , EM.intDepartmentFK
FROM (
    SELECT dtmCourseday, intEmployeeFK 
    FROM linkEmployeeCourse 
    WHERE intCourseFK = [Forms]![frmCourseOverview]![lstCourses]) AS EC

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE 
     EM.dtmDischarge Is Null OR EM.dtmDischarge > Date() 
     AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]

ORDER BY EM.strInitial;

Open in new window

First of all, in your conditions you have AND and OR mixed up. This will twist your logic. Place the ORs between parenthesis.
Second, you have to move the variable from the WHERE to the ON. What is the main difference in this? Imagine these tables:
Table1
Col1  Col2
1        2
1        3
2        2
Table2
Col1
1
2
3

When you try to join them, you get this output (joining on col1):
Col1 Col1 Col2
1   1   2
1   1   3
2   2   2
3   NULL NULL

This is the table that gets built on which you'll be applying your where condition. If you say WHERE Col2=2, then you'll be removing the nulls. You can get around this by checking for both null and 2, which you can do with an OR enclosed in parenthesis, or with the more direct method of: WHERE Nz(Col2,2)=2 (if it's null, it returns 2, or the variable)

If you use the variable on the ON clause, though, this is the table that gets built:
Col1 Col1 Col2
1    1   2
2    2   2
3    NULL NULL

The ones that don't have Col2=2 are eliminated, and nulls are still generated. Since you're not restricting the field any more in the WHERE clause, this is your result, and the one you want.
Sorry to say

I am completely lost!

Could you please correct the syntax for me - or create a totally new one, that will work?
Ok, using your original query, I think this might work:
SELECT 
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM linkEmployeeCourse AS EC 

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK AND EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses] 

WHERE (EM.dtmDischarge Is Null OR EM.dtmDischarge > Date())
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
     
ORDER BY EM.strInitial

Open in new window

When you move the EC variable to the on clause, you will allow for nulls, not to mention that your table joining will be more effective, though you won't notice it unless you have thousands or millions of records. Also, surrounding the OR with parenthesis ensures that your logic is preserved. Without parenthesis, if dmtDischarge was null, any intDepartmentFK would be returned, because the OR just needs one true.
Could it be that access do not support this option, moving the EC variable to the ON clause? At least that is what the error message tells me, when I try to run it...

Any ideas?

Maybe I should notify that I try to run the query in Access 2007 - I forgot that in the initial question...
It could be. Access is very peculiar on what it allows or not. But try this instead:
SELECT 
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM linkEmployeeCourse AS EC 

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE (EM.dtmDischarge Is Null OR EM.dtmDischarge > Date())
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
            AND (EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses] 
                     OR EC.intCourseFK Is Null)
ORDER BY EM.strInitial

Open in new window

This will work as well, though it's not as efficient. Then again, Access is never as efficient as SQL. But as long as it meets your needs, this should be enough.
Thanks for helping me out

Actually this last query syntax seems to do exactly as my initial query syntax. If an employee has attended any other course, but not 'this' course, that employee does not occur in the list. It is because, although the employee does not have a matching record in the ListEmployeeCourse table containing the needed course ID, the field is not null either - in this case it is just containing another ID of another course (and it might even occur more than once, if the employee attended more courses).

Please help :-)
I see. That would be solved moving to the ON solution, but it seems that Access doesn't like it, so what you can do is separate it:
Create a view/query with just the EC table and the respective condition. Then, instead of joining to the table, join to the view/query. That should produce the expected results.
Is it not possible to make a nested query that would do the trick?
I always have lots of problems with Access and nested queries. In MS SQL, this would be super easy. All you would need to do is use my first syntax. But Access is very picky with subqueries and joins. The easiest way would be to create one for the table with the variable, then join with that instead of with the table directly. I know Access doesn't have a problem with this.

If you want it done in one go, then I can't help you further. I always end up doing multiple view/queries (however access renamed these lately) so I can join them properly. Overall, I usually prefer to move it to SQL and use linked tables. Ends up a lot easier.
Avatar of skullnobrains
skullnobrains

SELECT
            Right(EM.strInitial,3) AS strInitialShort
          , EM.strFirstname & " " & EM.strLastname AS strFullName
          , EC.dtmCourseday
          , EC.intCourseFK

FROM tblEmployees AS EM

left outer join linkEmployeeCourse AS EC  ON EC.intEmployeeFK = EM.intEmployeePK or EC.intEmployeeFK is null

WHERE (EM.dtmDischarge Is Null OR EM.dtmDischarge > Date())
            AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
            AND (EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses]
                     OR EC.intCourseFK Is Null)

group by EM.strFullName
ORDER BY EM.strInitial

---

the key here is OUTER if i understand your needs + explicitly allowing the join column to be null on the out side
i used a left join instead of a right for no specific reason
All,

Keep in mind that if you use a LEFT JOIN or RIGHT JOIN (both of which Access supports), you must be especially careful with your WHERE criteria.

For example:

SELECT *
FROM t1 LEFT JOIN t2 ON t1.ID = t2.ID
WHERE t1.Name = "Fred" and t2.Type = 7

Open in new window


The net effect of including t2.Type = 7 in the WHERE clause is to effectively make this an inner join: on the rows from t1 where t2 has no match, t2.Type is going to be null, and any time you use a comparison operator against null, the result is always false.  (Yes, even null = null evaluates to false.)

In most if not all of the queries above, I see this being violated.

Patrick
That is why I wanted to move the condition to the ON statement, but Access doesn't allow it, apparently. In which case, other than creating a separate query for the table with the variable, then left (or right) joining with that query, I can't see how to achieve the OP's intentions.
in modern database processors, putting every thing that is in an on clause in a where clause ie skipping the on clauses altogether will not change how the query is processed. on clauses just exist because they help the reader to understand the join by making the clause explicit

what it implies is you need to change where clauses to allow the table on the outer side of the join to contain null values. in this case

AND EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]

should be written as

AND ( EM.intDepartmentFK = [Forms]![frmCourseOverview]![cboDepartment]
  OR EM.intEmployeePK is null
  )

thanks for pointing it out. i probably was a little lazy there
>> on clauses just exist because they help the reader to understand the join by making the clause explicit

Actually, that isn't true. Using conditions on the ON clause will create smaller sets of each table to be joined, resulting in less overall rows. For example, if you have two tables, each with 1k records where your conditions will cut each in half. If you use the "old style" way, you'll generate a table of 1M records, on which you'll then cut 750k. If you use the ON clause, each table will be cut to 500 rows BEFORE joining, resulting in the same table of 250k, but with much better performance.
>> in modern database processors

i guess access is one of those meaning the query processor is probably capable to identify the where clauses relative to the join and almost definitely capable to filter the rows using the appropriate where clauses while performing the join, hence saving the memory efficiently

actually this explains the behavior described by @matthewspatrick that makes an additional where clause relative to a column on the "out" side of the query turn it into an "inner" query and not just being filtered after building a dataset with as many additional useless rows as the "in" side table contains

anyway, i guess we are drifting...
Both Access and MS SQL (and I believe any that is based on the SQL standard) follow a certain order when creating the result. From is created first, then where, then group, then having, then select, then order. Neither Access nor T-SQL try to identify anything from the WHERE clause before the JOINS are built. WHERE is always applied to a single table, which is the result of all joins. So if you don't specify an ON clause, you will build a full join table, thus rendering your code less efficient. For smaller tables, using current processor/RAM power, this isn't noticeable, but it is on larger ones.

Now, in this case, you can do this, or simply do a full outer join, which would be actually better, as it would include only those where the id matched or that are null on either side, whereas not including the on clause would also generate records where the ids don't match.
i'm not here for an argument with you, but most database processors out there including mssql, oracle, mysql,... have builtin query processors that handle such issues, and you can test it by running a bench if you are unsure. i'm not sure about access though, and even less if you're using the jet engine, but msde is supposed to be the same processor as in sql server, and sql server actually does it
I don't want an argument either, but in this case, either you're wrong, or you have to warn Microsoft that they're wrong:
http://msdn.microsoft.com/en-us/library/ms189499.aspx

"This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps. Conversely, because the SELECT clause is step 8, any column aliases or derived columns defined in that clause cannot be referenced by preceding clauses."

In fact, it's a shame that it's the way it is. I've always thought it would be wonderful to use Aliases everywhere on the query.
this is the way the query is parsed and not the way it is executed.

after a short googling, i wound this blog which pretty much illustrates what we were discussing.
http://blog.sqlauthority.com/2008/10/25/sql-server-simulate-inner-join-using-left-join-statement-performance-analysis/

and this here : http://www.databasejournal.com/features/mssql/article.php/3618061/MS-SQL-Joins---Part-1.htm

An INNER JOIN can also be written into a WHERE clause as shown below.

SELECT Customers.CustomerName, Sales.SaleDate
FROM Sales, Customers
WHERE Customers.CustomerName = Sales.CustomerName

and like i said before, even in cases where it cannot be smart enough to make an inner join, it will at list filter the data with the where clause while it is building the joined dataset. when there is more than one join, you may start to loose performance due to the optimiser's choices
ASKER CERTIFIED SOLUTION
Avatar of Raahaugen
Raahaugen
Flag of Denmark image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
I think, since in this case, the resultset of the subquery will never exceed 50 rows, it will work out alright :-)
yeah most likely ;)
anyway, access probably rewrites the query as such

SELECT 
      EC.*
    , EM.dtmDischarge
    , EM.intDepartmentFK
    , Right([strInitial],3) AS Initialer
    , [strFirstname] & " " & [strLastname] AS FullName

FROM linkEmployeeCourse AS EC 

RIGHT JOIN tblEmployees AS EM ON EC.intEmployeeFK = EM.intEmployeePK

WHERE (((EM.dtmDischarge) Is Null Or (EM.dtmDischarge)>Date()) 
    AND((EM.intDepartmentFK)=[Forms]![frmCourseOverview]![cboDepartment]))
    AND EC.intCourseFK = [Forms]![frmCourseOverview]![lstCourses]

ORDER BY Right([strInitial],3)

Open in new window


which is exactly the same query, and the same as the one i suggested without the right join as well if i guessed the data structure properly

btw, i'd be happy to know if there is a speed improvement without the subquery. it is possible actually given the fact that the subquery was right-joined with another query
Hey - sorry I was late on giving points. I hope that you can accept a 50/50 split :-) thanks for the whole discussion ;-)

Best regards
no problem on my side. sorry about that off-topic argument we had. thanks for posting your final code. regards