query joins multiple tables, but too many rows come back

Hi Experts,

I have a simple query like this:

SELECT distinct firstTable.id,
   username.emailaddress,
   othertable.id,
   anothertable.name,
   yetanothertable.date

from firstTable
     JOIN username ON ( firstTable.id = username.id )
     JOIN othertable ON (username.id = othertable.id)
     JOIN anothertable ON (username.othercolumn = anothertable.somecolumn)
     JOIN yetanothertable ON (anothertable.somecolumn=yetanothertable.somecolumn)

The yetanothertable.date in the select introduces dozens of rows I don't want because I only want the first (max) date back.  How can I do this?

Many thanks,
Mike
Asked by: thready
dsacker (Contract ERP Admin/Consultant) commented:
Is this a homework assignment? :)
thready (Author) commented:
i'm 40...   :-)
dsacker (Contract ERP Admin/Consultant) commented:
Well, that's an ambiguous answer, which I can assume is a "yes". :)

If this is homework, I can't completely answer this for you, but I can steer you in the right direction. Your last JOIN would need to be changed to a CROSS APPLY, selecting what you need. I must assume you have been assigned and have the appropriate references to figure out the rest.

If you're a veteran at SQL, the CROSS APPLY should immediately make sense. And if you're somewhere in between, I'm glad to help you a little further, but need transparency, as EE has a strict policy on this.

And if not a student, my apologies for the caution. We've had a barrage of student homework assignments posed as questions. Must be finals *lol*.
Jim Horn (Microsoft SQL Server Data Dude) commented:
>because I only want the first (max) date back
For starters, I don't see anywhere in your T-SQL an aggregate function like MIN() or MAX() that would tell SQL to return the minimum or maximum value of date.   If you're not familiar with this, I have an article called SQL Server GROUP BY Solutions that would be an excellent read.
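
As a rough illustration of the aggregate approach, the sketch below pre-aggregates yetanothertable to one row per somecolumn before joining. It reuses the placeholder table and column names from the question; the aliases latest, max_date and othertable_id are made up for this example, and it assumes only the date itself is needed from yetanothertable.

```sql
-- Sketch only: names follow the question's placeholders.
SELECT firstTable.id,
       username.emailaddress,
       othertable.id AS othertable_id,
       anothertable.name,
       latest.max_date
FROM firstTable
     JOIN username     ON firstTable.id = username.id
     JOIN othertable   ON username.id = othertable.id
     JOIN anothertable ON username.othercolumn = anothertable.somecolumn
     JOIN (
         -- One row per somecolumn, carrying only its latest date
         SELECT somecolumn, MAX([date]) AS max_date
         FROM yetanothertable
         GROUP BY somecolumn
     ) AS latest ON anothertable.somecolumn = latest.somecolumn;
```

If other columns from the row holding the latest date are also needed, a plain MAX() is no longer enough, because it returns the value but not the rest of that row.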

>Is this a homework assignment? :)
>i'm 40...   :-)
You're going to have to cut us some slack on this one.  When a question has few supporting details, experts here have to guess the asker's level from those details in order to tailor their solutions, and few details usually lead to the assumption that the asker is very new to T-SQL.

That, and you won't believe how many people try to get their homework completed here, and for multiple reasons EE does not want to be labeled a homework site.
Scott Pletcher (Senior DBA) commented:
CROSS APPLY or OUTER APPLY works beautifully for these types of requirements.  CROSS APPLY is the equivalent of an INNER JOIN; OUTER APPLY is the equivalent of a LEFT OUTER JOIN.  Thus, for your specific case:

SELECT firstTable.id,
    username.emailaddress,
    othertable.id,
    anothertable.name,
    ca1.date -- columns from the applied row are referenced through the alias ca1

 from firstTable
      JOIN username ON ( firstTable.id = username.id )
      JOIN othertable ON (username.id = othertable.id)
      JOIN anothertable ON (username.othercolumn = anothertable.somecolumn)
      CROSS APPLY (
          -- for each anothertable row, keep only the matching row with the latest date
          SELECT TOP (1) *
          FROM yetanothertable
          WHERE anothertable.somecolumn=yetanothertable.somecolumn
          ORDER BY yetanothertable.date DESC
      ) AS ca1
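
For comparison, here is a sketch of the OUTER APPLY form mentioned above. It assumes you also want to keep rows from anothertable that have no match in yetanothertable; the aliases y and othertable_id are made up for this example.

```sql
-- Sketch only: same placeholder tables as the question.
SELECT firstTable.id,
       username.emailaddress,
       othertable.id AS othertable_id,
       anothertable.name,
       ca1.[date]          -- NULL when no yetanothertable row matches
FROM firstTable
     JOIN username     ON firstTable.id = username.id
     JOIN othertable   ON username.id = othertable.id
     JOIN anothertable ON username.othercolumn = anothertable.somecolumn
     OUTER APPLY (
         -- Latest matching row, or no row at all
         SELECT TOP (1) y.[date]
         FROM yetanothertable AS y
         WHERE y.somecolumn = anothertable.somecolumn
         ORDER BY y.[date] DESC
     ) AS ca1;
```

With CROSS APPLY those unmatched rows would drop out of the result, just as they would with an INNER JOIN.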
thready (Author) commented:
One look at the questions I've asked across the board and you'll know I've been on EE for over 10 years and I work on just about everything.  I assure everyone I'm not a student in the faintest sense of the word.

Now to read up on cross apply...
Jim Horn (Microsoft SQL Server Data Dude) commented:
That may be the case, but most experts who answer questions here are working for another client at the same time, so we typically don't have much time for background research such as reading an asker's profile.  So again, you'll have to cut us some slack.
thready (Author) commented:
That might be a good feature to add to EE- when someone is obviously not a student based on previous questions, their profile could be marked as professional.
thready (Author) commented:
Now I've got this working, but I'm wondering about performance when there are a million rows in the database.  There's no index on the date column- of course I can add one.  Any thoughts?

Thank you!
Mike
dsacker (Contract ERP Admin/Consultant) commented:
Yes. You can create a non-unique index on the date (if it's unique, even better).
CREATE INDEX IX_tablename_fieldname
    ON tablename (fieldname)

Keep your table statistics updated, and re-index your table periodically, and you should see good performance.
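
A minimal sketch of that maintenance, reusing the placeholder names tablename and IX_tablename_fieldname from the CREATE INDEX above. How often to run it, and whether a rebuild is needed at all, depends on your workload.

```sql
-- Refresh optimizer statistics for the table (default sampling).
UPDATE STATISTICS tablename;

-- Rebuild one index; REORGANIZE is the lighter-weight alternative
-- when fragmentation is modest.
ALTER INDEX IX_tablename_fieldname ON tablename REBUILD;
```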
Jim Horn (Microsoft SQL Server Data Dude) commented:
>There's no index on the date column- of course I can add one.  Any thoughts?
It's a common ETL practice to pump data from your source into a 'staging' table that has no restrictions such as keys, indexes, constraints, etc.  Some people even have it all varchar() just to ensure that all rows are inserted into the database, even those that are obvious data type errors, such as birth_date = 'banana'.

Then you can execute SP's to 'scrub' the data, making sure dates are dates, numbers are numbers, keys have valid foreign keys, etc.

Then AFTER all this validation is done, either insert to the ultimate target table that has all the keys, indexes, etc., or if the staging table is fine then create those keys (i.e. drop them, then insert the data, then re-create afterwards) after the data load.   This is very common in data warehousing solutions.
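
The staging pattern could look roughly like the sketch below. The tables staging_person and person and their columns are hypothetical, invented only for illustration, and TRY_CONVERT assumes SQL Server 2012 or later.

```sql
-- Hypothetical staging table: everything lands as text so no row is rejected.
CREATE TABLE staging_person (
    person_id  varchar(50),
    birth_date varchar(50)
);

-- Hypothetical target table with real types and a key.
CREATE TABLE person (
    person_id  int  NOT NULL PRIMARY KEY,
    birth_date date NULL
);

-- Scrub step: list rows that fail validation, e.g. birth_date = 'banana'.
SELECT person_id, birth_date
FROM staging_person
WHERE TRY_CONVERT(int, person_id) IS NULL
   OR (birth_date IS NOT NULL AND TRY_CONVERT(date, birth_date) IS NULL);

-- Load step: insert only rows that convert cleanly.
-- Duplicate person_id values would still violate the primary key
-- and need their own check.
INSERT INTO person (person_id, birth_date)
SELECT TRY_CONVERT(int, person_id), TRY_CONVERT(date, birth_date)
FROM staging_person
WHERE TRY_CONVERT(int, person_id) IS NOT NULL
  AND (birth_date IS NULL OR TRY_CONVERT(date, birth_date) IS NOT NULL);
```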
Scott Pletcher (Senior DBA) commented:
If you create a covering index on yetanothertable, it should be on ( somecolumn, date ), in that order.  That is, the lookup column(s), followed by date; include another column or two as well if you need to also list them in the outer/main query.

For example:

SELECT firstTable.id,
     username.emailaddress,
     othertable.id,
     anothertable.name,
     ca1.date -- reference the CROSS APPLY alias (ca1), not the table name
    ,ca1.yet_another_column

  from firstTable
       JOIN username ON ( firstTable.id = username.id )
       JOIN othertable ON (username.id = othertable.id)
       JOIN anothertable ON (username.othercolumn = anothertable.somecolumn)
       CROSS APPLY (
           SELECT TOP (1) *
           FROM yetanothertable
           WHERE anothertable.somecolumn=yetanothertable.somecolumn
           ORDER BY yetanothertable.date DESC
       ) AS ca1


Index then should be: ( somecolumn, date ) include ( yet_another_column )
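
Written out as a statement, that index might look like the sketch below. The index name is made up, and yet_another_column stands for whatever extra columns the outer query actually lists.

```sql
-- Lookup column first, then date, so the TOP (1) ... ORDER BY date DESC
-- can be answered from the index; INCLUDE covers the extra column.
CREATE NONCLUSTERED INDEX IX_yetanothertable_somecolumn_date
    ON yetanothertable (somecolumn, [date])
    INCLUDE (yet_another_column);
```

Note that SELECT TOP (1) * inside the APPLY asks for every column, so the index only covers the query if the subquery's column list is narrowed to the columns in the index.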

Microsoft SQL Server