Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 522
  • Last Modified:

How can i find a duplicate value from a recordset?

Is there any method or property to get the duplicate values in a recordset?

thanks
0
srikmaha
Asked:
srikmaha
  • 15
  • 10
  • 7
  • +2
2 Solutions
 
hongjunCommented:
i don't think so
try using sql statements like this

select name from yourtable where name = 'John'


hongjun
0
 
bhagyeshtCommented:
you can run the access's query wizard to make the query for you. it has a find duplicate query item

typically this is the SQL
SELECT First(Contacts.Contact_id) AS [Contact_id Field], First(Contacts.Contact) AS [Contact Field], First(Contacts.[Mobile Number]) AS [Mobile Number Field], Count(Contacts.Contact_id) AS NumberOfDups
FROM Contacts
GROUP BY Contacts.Contact_id, Contacts.Contact, Contacts.[Mobile Number]
HAVING (((Count(Contacts.Contact_id))>1) AND ((Count(Contacts.[Mobile Number]))>1));
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Do you want to find a duplicate value simply in a column or an entire row which is a duplicate of another row (not possible if there is a primary key selected)?

Mayank.
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
bhagyeshtCommented:
mayankeagle : if there is a primary key how can a record be duplicated?
0
 
Mayank SAssociate Director - Product EngineeringCommented:
That is what I said, bhagyesht. >> not possible if there is a primary key selected
0
 
bhagyeshtCommented:
ok sorry i read it as

not possible
if there is a primary key selected?
0
 
Mayank SAssociate Director - Product EngineeringCommented:
I guess what srikmaha wants is to find duplicate records in any recordset itself.... not duplicate records in the table using a query like the one which you posted (which is, I agree, a really good effort :-) ).

But once you get a particular recordset, I guess it might be all-right to populate something like a Collection with objects that contain the recordset's data, and then see if the collection has duplicates. Might be better in terms of performance too, rather than iterating through the recordset for every row in the recordset and checking if its got a duplicate (which might not be possible with all languages/ recordsets either) - depends upon the type of recordset.

Mayank.
0
 
bhagyeshtCommented:
"(which is, I agree, a really good effort :-) )." this query is build using the wizard in under 30 seconds flat!. ;-)
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Oops! Sorry I missed out that part :-)
0
 
rgshaferCommented:
Sort the recordset, then if then next record is the same - Viola = Duplicate!!

Richard ;-)
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Well, but we don't even know what language he is using. He hasn't told it as yet (matter of factly, he hasn't got back to us). And sorting of recordsets will not be allowed in all languages.

Moreover, in any case, he will have to iterate through the recordset and simultaneously compare two records (which seems to be something that is pretty tough.... because most recordsets (in most languages) maintain a pointer to the current record and its only that record which you can read at that time. Sorting might also result in an overhead (even if there is an in-built function to do that), because the in-built function will also be applying some sorting algorithm (with a best complexity of O (n log n)). Then again, you have to iterate through it to search for duplicates. That way, it would be better to search for duplicates directly, but I don't think that would be possible.

Moreover, in multi-user environments, it is not feasible to stay connected with the recordset and the database for such a long time due to locks, expenses incurred in connectivity in terms of resources, etc.

I guess its best to first store the recordset in a collection or an array kind of a thing, and then search for duplicates there.

Regards,
Mayank.
0
 
bhagyeshtCommented:
mayank : "And sorting of recordsets will not be allowed in all languages" just order by all the field and if its duplicate then the records will come together.
0
 
bhagyeshtCommented:
srikmaha: can we have some feedback?
0
 
rgshaferCommented:
Mayank,

I appreciate your ideas yet consider this:

1. The data's SQL can have an Order By clause, thus the recordset is sorted without creating any algorithms
2. Recordsets can be disconnected:  Set rs.activeconnection = nothing
3. The recordset can be cloned and each recordset compared to the other - simple match logic - if size really matters ;-) than see #5
4. NEVER use collections on sizable amounts of data - collections ARE DEADLY SLOW - recordsets are much more effecient.
5. If he wants, stream the recordset to an XML document and supply an XPath statement to the SelectNodes(XPathStatement) and the NodeList will have your duplicates.
6. If he cannot use XML because of language issues - Get a Real Language - I just had to add that LOL - Just Kidding!!!

Richard ;-)
0
 
Mayank SAssociate Director - Product EngineeringCommented:
I really appreciate you guys' sugesstions, but:

1. The Order By clause will change the query. What if the query is not supposed to be changed? What if its like a paging request kind of thing using row-numbers or something of that sort (depending on the database), and the sorting is not to be applied?

2. >> Set rs.activeconnection = nothing

Like I said, we don't know the language which he/ she is using.

3. >> collections ARE DEADLY SLOW - recordsets are much more effecient.

Depends upon language again. And while using recordsets, you are connected to the database still (which really turns out to be expensive in multi-user environments). And you cannot have a disconnected recordset in all languages.

Keeping all these things in mind, I suggested staying disconnected and using the collection. Why to go as far as cloning the recordset :-) ? Just populate the collection, and close the recordset/ connection. Saves memory too.

BTW, srikmaha, what is your state?

Mayank.
0
 
bhagyeshtCommented:
looks like srikmaha as done the disappearing act. I guess we can freeze posting comments on this until the asker responds.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Split between bhagyesht and mayankeagle.
0
 
srikmahaAuthor Commented:
I am reading a text file and putting all data into a virtual recordset of VB.I want to find the duplicate value in a specific fields.Niether I can use a complex query using group by nor any functions of sql.Becz the ssql functions do not support  MS text driver.
It is time consuming to use the collection and sort field by field for a milions of records.

Still I am trying to get the best solution.
Thank you friends.


0
 
Mayank SAssociate Director - Product EngineeringCommented:
You don't wanna change your query, you don't wanna use extra functions, you don't wanna use collections.... what do you want to do, then? For a disconnected access (which will be faster), Collections will be best. Otherwise, you might have to use an ORDER BY or GROUP BY (whatever you need) in the query.... or clone the RecordSet!

Mayank.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
The only efficient thing can be the algorithm that you use for sorting.
0
 
rgshaferCommented:
If you are going to populate a collection, what are you populating it from?? A recordset?? Then clone it since the memory usage will be virtually the same, also the time it takes to populate a collection verses cloning the recordset equates to me being in the Carribean on the beach sipping Mia Tia's while you're still dizzy from the looping!!!   ;=))
0
 
bhagyeshtCommented:
srikmaha: dont you think posting this information earlier would have made life simpler for all of us?
are u ready to use a workaround?
Bhagyesh Trivedi
0
 
bhagyeshtCommented:
do u the output as unique values or duplicate values?
Bhagyesh Trivedi
0
 
srikmahaAuthor Commented:
Thanks a lot every one I am going to use the collection and later will sort it to find duplicate values.
0
 
rgshaferCommented:
Collections are BAD ideas on lot's of records - you'll find this out on your own.

Your recordset must be disconnected.
Set rs.ActiveConnection = Nothing

Dim rs as ADODB.Recordset
Dim col as Collection

Sorting is native to a Recordset - rs.Sort !!!
Where is col.Sort ???

I hope your approach works acceptably for you.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Whether it is native or not, it will still implement some algorithm which will have more or less the same time complexity.
0
 
rgshaferCommented:
Like I said since you are stuck on doing it with collections and don't mind the performance penalty go for it, and when I open a restuarant where the price of a meal is many times over normal I'll invite you....
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Ha, I bet that I will come and eat!

BTW, I had suggested Collections at a time when we did not know the language which the questioner was using. Some languages like Java and the .NET framework do provide in-built methods for sorting. Some Collections are slow, like Vector (because it is synchroznied) and some are fast (like ArrayList). So we did not know what features the language used by the questioner gives. RecordSets are there in many languages.

And the reason why I had suggested Collections to be better than RecordSet was that at that time, we didn't know what kind of RecordSet it was. We assumed it is a normal DB RecordSet with an active Connection - which, I say is much more costlier and slower than a cached Collection would be. Moreover, having more active connections open will be more expensive in multi-user environments, and can also be more prone to exceptions - what if there is a database-access error? What will your RecordSet do in that case? But the Collection will still work.

Anyway, its up to the questioner to decide what he wishes to use.
0
 
bhagyeshtCommented:
srikmaha: well apart from the thanks if you have got your answer can u close this question so that there are no invitations to restuarants send out ;-)
Bhagyesh Trivedi
0
 
rgshaferCommented:
Great I'll let you know ... LOL

Anyway, True, ArrayList's are a good option. I would use them if I had to derive the data from calculations, custom formating etc. (ie I didn't just get the data from the db or a file). If I just received the data from the db or a file why pull it out of an rs to a collection when in essense you already have it in a format that you can sort on? So you are talking 0 seconds since it is already in the rs verses XXX seconds to load a collection. then do a sort. If you set rs2 = rs1.clone it is still faster than looping and loading a collection since all you are doing is duplicating memory - you are not looping to reload the new rs2.

I wouldn't perform sorting on a recordset with an active connection for the very reason you mentioned earlier as as I previously suggested Set rs.activeconnection = nothing.

If you have a clientside rs than you can set the activeconnection to nothing. As per db recordset it can be any recordset (I'm not sure what you think might not be available on the rs if it wasn't a db recordset) once the recordset is loaded it's a recordset, of course you will not have db actions available but it's not a db recordset so you are not trying to db actions anyway.

All the same as you mentioned:
> Anyway, its up to the questioner to decide what he wishes to use.

Cheers,

Richard

0
 
Mayank SAssociate Director - Product EngineeringCommented:
>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.
0
 
rgshaferCommented:

>>Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

A virtual recordset is a recordset you create (fields names, datatypes etc.) Since you didn't get the info from a database the metadata is missing so you create it yourself. Then you just do an open on it and you can add data - very easy! Plus it is obviously disconnected. Then after you add data, if you set up the field names and datatypes like a table in your database all you would need to do is set an active connection to the ActiveConnection property of the recordset and you can then update the data to your database.

Now, let me think a dumb line of reasoning here:
Srikmaha says:  I am reading a text file and putting all data into a virtual recordset of VB.
   Sooo he'll loop to load a recordset
   Then LOOOOP AGAIN to load a collection instead of cloning it (just a simple memory copy no looping a second time)
Now a STUPID Question - WHHHYYYY ????

mayankeagle are we missing something???

I think Srikmaha is smoking some kind of mind altering substance since we never hear from him!!

Srikmaha - are you seeing yellow, purple, green and orange stripped pink polka-dotted flying elephants yet!!!

Regards,
RGS


0
 
Mayank SAssociate Director - Product EngineeringCommented:
I agree that is all right.... the only thing is, it didn't seem all right initially ;-) because we didn't know what he was doing. However, ain't there a better way in VB 6 to read the contents from the file? Anyways, Srikmaha, we're waiting for your response.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Please proceed with that recommendation.
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 15
  • 10
  • 7
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now