Solved

How can i find a duplicate value  from a recordset?

Posted on 2003-12-08
38
509 Views
Last Modified: 2013-11-25
Is there any method or property to get the duplicate values in a recordset?

thanks
0
Comment
Question by:srikmaha
  • 15
  • 10
  • 7
  • +2
38 Comments
 
LVL 33

Expert Comment

by:hongjun
ID: 9897550
i don't think so
try using sql statements like this

select name from yourtable where name = 'John'


hongjun
0
 
LVL 9

Accepted Solution

by:
bhagyesht earned 25 total points
ID: 9902068
you can run the access's query wizard to make the query for you. it has a find duplicate query item

typically this is the SQL
SELECT First(Contacts.Contact_id) AS [Contact_id Field], First(Contacts.Contact) AS [Contact Field], First(Contacts.[Mobile Number]) AS [Mobile Number Field], Count(Contacts.Contact_id) AS NumberOfDups
FROM Contacts
GROUP BY Contacts.Contact_id, Contacts.Contact, Contacts.[Mobile Number]
HAVING (((Count(Contacts.Contact_id))>1) AND ((Count(Contacts.[Mobile Number]))>1));
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 9902279
Do you want to find a duplicate value simply in a column or an entire row which is a duplicate of another row (not possible if there is a primary key selected)?

Mayank.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9902384
mayankeagle : if there is a primary key how can a record be duplicated?
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 9902559
That is what I said, bhagyesht. >> not possible if there is a primary key selected
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9902577
ok sorry i read it as

not possible
if there is a primary key selected?
0
 
LVL 30

Assisted Solution

by:mayankeagle
mayankeagle earned 25 total points
ID: 9902671
I guess what srikmaha wants is to find duplicate records in any recordset itself.... not duplicate records in the table using a query like the one which you posted (which is, I agree, a really good effort :-) ).

But once you get a particular recordset, I guess it might be all-right to populate something like a Collection with objects that contain the recordset's data, and then see if the collection has duplicates. Might be better in terms of performance too, rather than iterating through the recordset for every row in the recordset and checking if its got a duplicate (which might not be possible with all languages/ recordsets either) - depends upon the type of recordset.

Mayank.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9902732
"(which is, I agree, a really good effort :-) )." this query is build using the wizard in under 30 seconds flat!. ;-)
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 9903359
Oops! Sorry I missed out that part :-)
0
 

Expert Comment

by:rgshafer
ID: 9917977
Sort the recordset, then if then next record is the same - Viola = Duplicate!!

Richard ;-)
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 9918061
Well, but we don't even know what language he is using. He hasn't told it as yet (matter of factly, he hasn't got back to us). And sorting of recordsets will not be allowed in all languages.

Moreover, in any case, he will have to iterate through the recordset and simultaneously compare two records (which seems to be something that is pretty tough.... because most recordsets (in most languages) maintain a pointer to the current record and its only that record which you can read at that time. Sorting might also result in an overhead (even if there is an in-built function to do that), because the in-built function will also be applying some sorting algorithm (with a best complexity of O (n log n)). Then again, you have to iterate through it to search for duplicates. That way, it would be better to search for duplicates directly, but I don't think that would be possible.

Moreover, in multi-user environments, it is not feasible to stay connected with the recordset and the database for such a long time due to locks, expenses incurred in connectivity in terms of resources, etc.

I guess its best to first store the recordset in a collection or an array kind of a thing, and then search for duplicates there.

Regards,
Mayank.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9918154
mayank : "And sorting of recordsets will not be allowed in all languages" just order by all the field and if its duplicate then the records will come together.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9918158
srikmaha: can we have some feedback?
0
 

Expert Comment

by:rgshafer
ID: 9918207
Mayank,

I appreciate your ideas yet consider this:

1. The data's SQL can have an Order By clause, thus the recordset is sorted without creating any algorithms
2. Recordsets can be disconnected:  Set rs.activeconnection = nothing
3. The recordset can be cloned and each recordset compared to the other - simple match logic - if size really matters ;-) than see #5
4. NEVER use collections on sizable amounts of data - collections ARE DEADLY SLOW - recordsets are much more effecient.
5. If he wants, stream the recordset to an XML document and supply an XPath statement to the SelectNodes(XPathStatement) and the NodeList will have your duplicates.
6. If he cannot use XML because of language issues - Get a Real Language - I just had to add that LOL - Just Kidding!!!

Richard ;-)
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 9918413
I really appreciate you guys' sugesstions, but:

1. The Order By clause will change the query. What if the query is not supposed to be changed? What if its like a paging request kind of thing using row-numbers or something of that sort (depending on the database), and the sorting is not to be applied?

2. >> Set rs.activeconnection = nothing

Like I said, we don't know the language which he/ she is using.

3. >> collections ARE DEADLY SLOW - recordsets are much more effecient.

Depends upon language again. And while using recordsets, you are connected to the database still (which really turns out to be expensive in multi-user environments). And you cannot have a disconnected recordset in all languages.

Keeping all these things in mind, I suggested staying disconnected and using the collection. Why to go as far as cloning the recordset :-) ? Just populate the collection, and close the recordset/ connection. Saves memory too.

BTW, srikmaha, what is your state?

Mayank.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 9918424
looks like srikmaha as done the disappearing act. I guess we can freeze posting comments on this until the asker responds.
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10319057
Split between bhagyesht and mayankeagle.
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 

Author Comment

by:srikmaha
ID: 10323252
I am reading a text file and putting all data into a virtual recordset of VB.I want to find the duplicate value in a specific fields.Niether I can use a complex query using group by nor any functions of sql.Becz the ssql functions do not support  MS text driver.
It is time consuming to use the collection and sort field by field for a milions of records.

Still I am trying to get the best solution.
Thank you friends.


0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10329262
You don't wanna change your query, you don't wanna use extra functions, you don't wanna use collections.... what do you want to do, then? For a disconnected access (which will be faster), Collections will be best. Otherwise, you might have to use an ORDER BY or GROUP BY (whatever you need) in the query.... or clone the RecordSet!

Mayank.
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10329265
The only efficient thing can be the algorithm that you use for sorting.
0
 

Expert Comment

by:rgshafer
ID: 10329506
If you are going to populate a collection, what are you populating it from?? A recordset?? Then clone it since the memory usage will be virtually the same, also the time it takes to populate a collection verses cloning the recordset equates to me being in the Carribean on the beach sipping Mia Tia's while you're still dizzy from the looping!!!   ;=))
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 10329524
srikmaha: dont you think posting this information earlier would have made life simpler for all of us?
are u ready to use a workaround?
Bhagyesh Trivedi
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 10329527
do u the output as unique values or duplicate values?
Bhagyesh Trivedi
0
 

Author Comment

by:srikmaha
ID: 10332550
Thanks a lot every one I am going to use the collection and later will sort it to find duplicate values.
0
 

Expert Comment

by:rgshafer
ID: 10335081
Collections are BAD ideas on lot's of records - you'll find this out on your own.

Your recordset must be disconnected.
Set rs.ActiveConnection = Nothing

Dim rs as ADODB.Recordset
Dim col as Collection

Sorting is native to a Recordset - rs.Sort !!!
Where is col.Sort ???

I hope your approach works acceptably for you.
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10339224
Whether it is native or not, it will still implement some algorithm which will have more or less the same time complexity.
0
 

Expert Comment

by:rgshafer
ID: 10341827
Like I said since you are stuck on doing it with collections and don't mind the performance penalty go for it, and when I open a restuarant where the price of a meal is many times over normal I'll invite you....
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10341939
Ha, I bet that I will come and eat!

BTW, I had suggested Collections at a time when we did not know the language which the questioner was using. Some languages like Java and the .NET framework do provide in-built methods for sorting. Some Collections are slow, like Vector (because it is synchroznied) and some are fast (like ArrayList). So we did not know what features the language used by the questioner gives. RecordSets are there in many languages.

And the reason why I had suggested Collections to be better than RecordSet was that at that time, we didn't know what kind of RecordSet it was. We assumed it is a normal DB RecordSet with an active Connection - which, I say is much more costlier and slower than a cached Collection would be. Moreover, having more active connections open will be more expensive in multi-user environments, and can also be more prone to exceptions - what if there is a database-access error? What will your RecordSet do in that case? But the Collection will still work.

Anyway, its up to the questioner to decide what he wishes to use.
0
 
LVL 9

Expert Comment

by:bhagyesht
ID: 10342078
srikmaha: well apart from the thanks if you have got your answer can u close this question so that there are no invitations to restuarants send out ;-)
Bhagyesh Trivedi
0
 

Expert Comment

by:rgshafer
ID: 10342484
Great I'll let you know ... LOL

Anyway, True, ArrayList's are a good option. I would use them if I had to derive the data from calculations, custom formating etc. (ie I didn't just get the data from the db or a file). If I just received the data from the db or a file why pull it out of an rs to a collection when in essense you already have it in a format that you can sort on? So you are talking 0 seconds since it is already in the rs verses XXX seconds to load a collection. then do a sort. If you set rs2 = rs1.clone it is still faster than looping and loading a collection since all you are doing is duplicating memory - you are not looping to reload the new rs2.

I wouldn't perform sorting on a recordset with an active connection for the very reason you mentioned earlier as as I previously suggested Set rs.activeconnection = nothing.

If you have a clientside rs than you can set the activeconnection to nothing. As per db recordset it can be any recordset (I'm not sure what you think might not be available on the rs if it wasn't a db recordset) once the recordset is loaded it's a recordset, of course you will not have db actions available but it's not a db recordset so you are not trying to db actions anyway.

All the same as you mentioned:
> Anyway, its up to the questioner to decide what he wishes to use.

Cheers,

Richard

0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10349485
>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10349487
>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.
0
 

Expert Comment

by:rgshafer
ID: 10349627

>>Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

A virtual recordset is a recordset you create (fields names, datatypes etc.) Since you didn't get the info from a database the metadata is missing so you create it yourself. Then you just do an open on it and you can add data - very easy! Plus it is obviously disconnected. Then after you add data, if you set up the field names and datatypes like a table in your database all you would need to do is set an active connection to the ActiveConnection property of the recordset and you can then update the data to your database.

Now, let me think a dumb line of reasoning here:
Srikmaha says:  I am reading a text file and putting all data into a virtual recordset of VB.
   Sooo he'll loop to load a recordset
   Then LOOOOP AGAIN to load a collection instead of cloning it (just a simple memory copy no looping a second time)
Now a STUPID Question - WHHHYYYY ????

mayankeagle are we missing something???

I think Srikmaha is smoking some kind of mind altering substance since we never hear from him!!

Srikmaha - are you seeing yellow, purple, green and orange stripped pink polka-dotted flying elephants yet!!!

Regards,
RGS


0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10350211
I agree that is all right.... the only thing is, it didn't seem all right initially ;-) because we didn't know what he was doing. However, ain't there a better way in VB 6 to read the contents from the file? Anyways, Srikmaha, we're waiting for your response.
0
 
LVL 30

Expert Comment

by:mayankeagle
ID: 10936411
Please proceed with that recommendation.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

A short article about a problem I had getting the GPS LocationListener working.
Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now