asked on

Get the 1st occurrence of a row based on column value

Given the data set in the attached Excel spreadsheet, how would I query against that data so I get only the 1st occurrence of Property_Id?

I already have a query that gets the data I need but it's pretty slow. I'm looking to get this data in the fastest way possible.

Or, to ask another way, given the following simple set of data...

Element_Id     Property_Id     Value
 2528          1               Company conameshort
-4096          1             
 2528          2               Company Name
-4096          2               Text
-4096          3               Company Text Field
 2528          5               6
-4096          5               3

Open in new window

How do I get this...

Element_Id     Property_Id     Value
 2528          1               Company conameshort
 2528          2               Company Name
-4096          3               Company Text Field
 2528          5               6

Open in new window

properties.xlsx

I'm using Microsoft SQL Server 2012

ASKER CERTIFIED SOLUTION

slightwv (䄆 Netminder)

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Russ Suter

ASKER

Sorry, I forgot to mention. I need to do this using a CROSS APPLY. I cannot use any windowed functions to perform this task due to performance reasons.

slightwv (䄆 Netminder)

How about doing the cross apply in an inner query and using row_number on an outer query?

pseudo code:

select element_id, property_id, value
from (
  select element_id, property_id, value,
    row_number() over(partition by property_id order by property_id, element_id desc) rn
   from (
         select blah, blah blah from junk1 cross apply junk2
       ) t2
) t1
where rn=1

Open in new window

Russ Suter

ASKER

That still uses ROW_NUMBER() which I can't use.

slightwv (䄆 Netminder)

As you know, I'm not really a SQL Server person. I was thinking by moving the window function to an outer level and keeping the cross join inside, performance might not be an issue.

Guess a temp table or materialize view from the cross join then using row_number isn't an option either since it uses row_number?

SOLUTION

PortletPaul

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Russ Suter

ASKER

Thanks all!

So it would appear that I was operating under a false assumption that ROW_NUMBER() is slow. I had an interesting few hours last weekend figuring out why my view was running so slow. It turned out that my view uses an APPLY clause to call a user defined function. It's the UDF that is using the ROW_NUMBER() function. When using CROSS APPLY, performance of the view is about twice as slow as when using OUTER APPLY. Furthermore, if I filter the view with any kind of WHERE clause, performance of the view drops even further when using CROSS APPLY. My solution was to just use OUTER APPLY for the view and now performance is great. The UDF is still using ROW_NUMBER(). Thanks all for your contributions.

PortletPaul

What does that UDF return?

A scalar value? e.g. RETURN INT
A table?

I suggest you consider possible alternatives for that function.