Get the 1st occurrence of a row based on column value

Russ Suter
Russ Suter used Ask the Experts™
on
Given the data set in the attached Excel spreadsheet, how would I query against that data so I get only the 1st occurrence of Property_Id?

I already have a query that gets the data I need but it's pretty slow. I'm looking to get this data in the fastest way possible.

Or, to ask another way, given the following simple set of data...

Element_Id     Property_Id     Value
 2528          1               Company conameshort
-4096          1             
 2528          2               Company Name
-4096          2               Text
-4096          3               Company Text Field
 2528          5               6
-4096          5               3

Open in new window

How do I get this...
Element_Id     Property_Id     Value
 2528          1               Company conameshort
 2528          2               Company Name
-4096          3               Company Text Field
 2528          5               6

Open in new window


properties.xlsx

I'm using Microsoft SQL Server 2012
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
Check out row_number:
select element_id, property_id, value
from (
  select element_id, property_id, value,
    row_number() over(partition by property_id order by property_id, element_id desc) rn
   from tab1
) t1
where rn=1

  

Open in new window

Russ SuterSenior Software Developer

Author

Commented:
Sorry, I forgot to mention. I need to do this using a CROSS APPLY. I cannot use any windowed functions to perform this task due to performance reasons.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
How about doing the cross apply in an inner query and using row_number on an outer query?

pseudo code:
select element_id, property_id, value
from (
  select element_id, property_id, value,
    row_number() over(partition by property_id order by property_id, element_id desc) rn
   from (
         select blah, blah blah from junk1 cross apply junk2
       ) t2
) t1
where rn=1
 

Open in new window

Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Russ SuterSenior Software Developer

Author

Commented:
That still uses ROW_NUMBER() which I can't use.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
As you know, I'm not really a SQL Server person.  I was thinking by moving the window function to an outer level and keeping the cross join inside, performance might not be an issue.

Guess a temp table or materialize view from the cross join then using row_number isn't an option either since it uses row_number?
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013
Commented:
>>"I already have a query that gets the data I need but it's pretty slow."

NB: analytic functions like row_number() are fast, cross apply may not be faster at all because you still need a where clause based on the outcome of the apply operator, which is way easier to achieve using row_number.

select element_id, ca.first_property_id, property_id, value
from tab1 AS T1
cross apply (
  select TOP(1) T2.property_id 
   from tab1 AS T2 
   where T2.property_id > T1.property_id 
   order by T2.property_id, T@.element_id DESC
) ca (first_property)

where ... what?

Open in new window

Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014
Commented:
I cannot use any windowed functions to perform this task due to performance reasons.

Hmm, odd.  Typically windowed functions perform well overall for tasks where they are appropriate.  The real performance issue would be how the table is clustered and the actual full DDL for the table.  

Do you have another column in the table, such as a datetime or identity (not quite as accurate as datetime, but better than nothing), that could be used for sorting?  Remember, SQL has no such as "first" record without an explicit sort order from you.

Based just on what you've shown, I guess you'd have to use something like this, which would be slower because of the function in the sort value:

SELECT Element_Id, Property_Id, Value
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Property_Id ORDER BY ABS(Element_id)) AS row_num
    FROM dbo.table_name
) AS query1
WHERE row_num = 1

--btw, don't worry about the "SELECT *": SQL will only pull columns you actually use in the outer query from the table
Russ SuterSenior Software Developer

Author

Commented:
Thanks all!

So it would appear that I was operating under a false assumption that ROW_NUMBER() is slow. I had an interesting few hours last weekend figuring out why my view was running so slow. It turned out that my view uses an APPLY clause to call a user defined function. It's the UDF that is using the ROW_NUMBER() function. When using CROSS APPLY, performance of the view is about twice as slow as when using OUTER APPLY. Furthermore, if I filter the view with any kind of WHERE clause, performance of the view drops even further when using CROSS APPLY. My solution was to just use OUTER APPLY for the view and now performance is great. The UDF is still using ROW_NUMBER(). Thanks all for your contributions.
PortletPaulEE Topic Advisor
Most Valuable Expert 2014
Awarded 2013

Commented:
What does that UDF return?

A scalar value?  e.g. RETURN INT
A table?

I suggest you consider possible alternatives for that function.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial