Get one row from each set of duplicates

Hi,
  I have a SQL Server database with a table named Documents.
  There are many duplicate rows with the same Document_No, but I need only one row for each document number.

From the following sample data, I am looking for a query:

   Doc1 Proj1 Path1 ......
   Doc1 Proj1 Path2
   Doc2 Proj1 Path2
   Doc3 Proj1 Path3

The query should return the Doc2 and Doc3 rows and one of the Doc1 rows (it does not matter which one).
Sam OZ Asked:

Ryan Chong (Business Systems Analyst, ex-Senior Application Engineer) Commented:
try:

;with cte as
(
  select a.*, row_number() over (partition by docfield order by projfield, pathfield ) idx from yourtable a
)
select * from cte where idx = 1
Sam OZ (Author) Commented:
Thanks. In fact, I have some 20 fields in the table. Will I have to order by all of them except DocNo?
     DocTable has fields like
         DocNo
         ProjNo
         Path
         RevNo
          ...... More fields

Can you please write the query for this ?
Ryan Chong (Business Systems Analyst, ex-Senior Application Engineer) Commented:
>>Will I have to do order by on all except DocNO

Not necessarily; what I wrote was only an example. You can also try this instead:

row_number() over (partition by docfield order by (select 1)) idx
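
For the table described in the question, the complete query might look like the sketch below. It assumes the table really is named DocTable with the columns DocNo, ProjNo, Path and RevNo as listed by the asker; the alias d is only for illustration. Because `order by (select 1)` gives SQL Server no meaningful sort key, which row of each DocNo ends up with idx = 1 is arbitrary, which matches the "it does not matter which one" requirement.

```sql
-- Sketch only: DocTable and its column names are taken from the question above.
;with cte as
(
  select d.*,
         row_number() over (partition by d.DocNo order by (select 1)) as idx
  from DocTable d
)
select DocNo, ProjNo, [Path], RevNo   -- add the remaining columns as needed
from cte
where idx = 1;
```

Listing the columns in the outer select keeps the helper column idx out of the result.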

Ryan Chong (Business Systems Analyst, ex-Senior Application Engineer) Commented:
"partition by" determines which columns group the records, while "order by" determines the order in which the rows are numbered within each group.
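
To see the effect of "partition by" and "order by" before any filtering, the numbering can be inspected directly. This is a minimal sketch: #sample is a hypothetical temp table loaded with the four rows from the question, and the column names docfield, projfield and pathfield are the placeholders used earlier in this thread.

```sql
-- #sample is a hypothetical temp table holding the rows from the question.
create table #sample (docfield varchar(20), projfield varchar(20), pathfield varchar(50));

insert #sample values ('Doc1','Proj1','Path1'), ('Doc1','Proj1','Path2'),
                      ('Doc2','Proj1','Path2'), ('Doc3','Proj1','Path3');

-- No filter yet: idx restarts at 1 for each docfield value,
-- and pathfield decides which row in a group is numbered first.
select s.*,
       row_number() over (partition by s.docfield order by s.pathfield) as idx
from #sample s;

drop table #sample;
```

The two Doc1 rows get idx 1 (Path1) and 2 (Path2), while Doc2 and Doc3 each get idx 1, so filtering on idx = 1 leaves one row per document.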
Mark Wills (Topic Advisor) Commented:
And, while Ryan is absolutely right, I would add that you need to partition by all the columns that together identify the rows you consider duplicates.

All the other fields pertain to the group defined in the partition by, so a revision will be per docfield, projfield.

Now, I can well imagine multiple revisions, so you may want to include that in your order by, e.g. RevNo desc (keeping the most recent). In that case we will need to know a bit more about your requirements.
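
As a rough illustration of the RevNo idea, the sketch below keeps the row with the highest revision per document and project. It assumes the asker's DocTable with DocNo, ProjNo and RevNo columns, and that a higher RevNo means a more recent revision; neither assumption has been confirmed in this thread.

```sql
-- Sketch only: table and column names come from the asker's description.
-- Assumes a higher RevNo is the more recent revision. If RevNo is stored
-- as text, '10' sorts before '9', so the datatype matters here.
;with cte as
(
  select d.*,
         row_number() over (partition by d.DocNo, d.ProjNo order by d.RevNo desc) as rn
  from DocTable d
)
select * from cte where rn = 1;
```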

In your case (borrowing Ryan's code):
;with cte as
(
  select a.*, row_number() over (partition by docfield, projfield  order by pathfield ) rn from yourtable a
)
select * from cte where rn = 1  -- and you should only select the columns you want

Mark Wills (Topic Advisor) Commented:
With due respect, it seems you didn't understand my comments about partitioning by all the columns that produce the duplicated values. There is a profound difference between partitioning and ordering, and it will make or break the result.

By way of example:
create table #projects (Docfield varchar(20), ProjField Varchar(20), Pathfield varchar(50))

-- same document, two different projects
insert #projects values ('Doc1','Proj1','Path1')
insert #projects values ('Doc1','Proj5','Path5')

-- partition by docfield only: both rows fall into one group,
-- so only the Proj1 row is returned
;with ryan_cte as
(
  select a.*, row_number() over (partition by docfield order by projfield, pathfield ) idx from #projects a
)
select * from ryan_cte where idx = 1

-- partition by docfield, projfield: each row is its own group,
-- so both rows are returned
;with mark_cte as
(
  select a.*, row_number() over (partition by docfield, projfield order by pathfield ) rn from #projects a
)
select * from mark_cte where rn = 1

And if we consider the prospect of RevNo, it could become even more complex.

Just saying, be careful. And we haven't even discussed datatypes or sort sequences (use of collation).
Mark Wills (Topic Advisor) Commented:
Hello?