Link to home
Start Free TrialLog in
Avatar of srikmaha
srikmaha

asked on

How can i find a duplicate value from a recordset?

Is there any method or property to get the duplicate values in a recordset?

thanks
Avatar of hongjun
hongjun
Flag of Singapore image

i don't think so
try using sql statements like this

select name from yourtable where name = 'John'


hongjun
ASKER CERTIFIED SOLUTION
Avatar of bhagyesht
bhagyesht
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Mayank S
Do you want to find a duplicate value simply in a column or an entire row which is a duplicate of another row (not possible if there is a primary key selected)?

Mayank.
mayankeagle : if there is a primary key how can a record be duplicated?
That is what I said, bhagyesht. >> not possible if there is a primary key selected
ok sorry i read it as

not possible
if there is a primary key selected?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
"(which is, I agree, a really good effort :-) )." this query is build using the wizard in under 30 seconds flat!. ;-)
Oops! Sorry I missed out that part :-)
Avatar of rgshafer
rgshafer

Sort the recordset, then if then next record is the same - Viola = Duplicate!!

Richard ;-)
Well, but we don't even know what language he is using. He hasn't told it as yet (matter of factly, he hasn't got back to us). And sorting of recordsets will not be allowed in all languages.

Moreover, in any case, he will have to iterate through the recordset and simultaneously compare two records (which seems to be something that is pretty tough.... because most recordsets (in most languages) maintain a pointer to the current record and its only that record which you can read at that time. Sorting might also result in an overhead (even if there is an in-built function to do that), because the in-built function will also be applying some sorting algorithm (with a best complexity of O (n log n)). Then again, you have to iterate through it to search for duplicates. That way, it would be better to search for duplicates directly, but I don't think that would be possible.

Moreover, in multi-user environments, it is not feasible to stay connected with the recordset and the database for such a long time due to locks, expenses incurred in connectivity in terms of resources, etc.

I guess its best to first store the recordset in a collection or an array kind of a thing, and then search for duplicates there.

Regards,
Mayank.
mayank : "And sorting of recordsets will not be allowed in all languages" just order by all the field and if its duplicate then the records will come together.
srikmaha: can we have some feedback?
Mayank,

I appreciate your ideas yet consider this:

1. The data's SQL can have an Order By clause, thus the recordset is sorted without creating any algorithms
2. Recordsets can be disconnected:  Set rs.activeconnection = nothing
3. The recordset can be cloned and each recordset compared to the other - simple match logic - if size really matters ;-) than see #5
4. NEVER use collections on sizable amounts of data - collections ARE DEADLY SLOW - recordsets are much more effecient.
5. If he wants, stream the recordset to an XML document and supply an XPath statement to the SelectNodes(XPathStatement) and the NodeList will have your duplicates.
6. If he cannot use XML because of language issues - Get a Real Language - I just had to add that LOL - Just Kidding!!!

Richard ;-)
I really appreciate you guys' sugesstions, but:

1. The Order By clause will change the query. What if the query is not supposed to be changed? What if its like a paging request kind of thing using row-numbers or something of that sort (depending on the database), and the sorting is not to be applied?

2. >> Set rs.activeconnection = nothing

Like I said, we don't know the language which he/ she is using.

3. >> collections ARE DEADLY SLOW - recordsets are much more effecient.

Depends upon language again. And while using recordsets, you are connected to the database still (which really turns out to be expensive in multi-user environments). And you cannot have a disconnected recordset in all languages.

Keeping all these things in mind, I suggested staying disconnected and using the collection. Why to go as far as cloning the recordset :-) ? Just populate the collection, and close the recordset/ connection. Saves memory too.

BTW, srikmaha, what is your state?

Mayank.
looks like srikmaha as done the disappearing act. I guess we can freeze posting comments on this until the asker responds.
Split between bhagyesht and mayankeagle.
Avatar of srikmaha

ASKER

I am reading a text file and putting all data into a virtual recordset of VB.I want to find the duplicate value in a specific fields.Niether I can use a complex query using group by nor any functions of sql.Becz the ssql functions do not support  MS text driver.
It is time consuming to use the collection and sort field by field for a milions of records.

Still I am trying to get the best solution.
Thank you friends.


You don't wanna change your query, you don't wanna use extra functions, you don't wanna use collections.... what do you want to do, then? For a disconnected access (which will be faster), Collections will be best. Otherwise, you might have to use an ORDER BY or GROUP BY (whatever you need) in the query.... or clone the RecordSet!

Mayank.
The only efficient thing can be the algorithm that you use for sorting.
If you are going to populate a collection, what are you populating it from?? A recordset?? Then clone it since the memory usage will be virtually the same, also the time it takes to populate a collection verses cloning the recordset equates to me being in the Carribean on the beach sipping Mia Tia's while you're still dizzy from the looping!!!   ;=))
srikmaha: dont you think posting this information earlier would have made life simpler for all of us?
are u ready to use a workaround?
Bhagyesh Trivedi
do u the output as unique values or duplicate values?
Bhagyesh Trivedi
Thanks a lot every one I am going to use the collection and later will sort it to find duplicate values.
Collections are BAD ideas on lot's of records - you'll find this out on your own.

Your recordset must be disconnected.
Set rs.ActiveConnection = Nothing

Dim rs as ADODB.Recordset
Dim col as Collection

Sorting is native to a Recordset - rs.Sort !!!
Where is col.Sort ???

I hope your approach works acceptably for you.
Whether it is native or not, it will still implement some algorithm which will have more or less the same time complexity.
Like I said since you are stuck on doing it with collections and don't mind the performance penalty go for it, and when I open a restuarant where the price of a meal is many times over normal I'll invite you....
Ha, I bet that I will come and eat!

BTW, I had suggested Collections at a time when we did not know the language which the questioner was using. Some languages like Java and the .NET framework do provide in-built methods for sorting. Some Collections are slow, like Vector (because it is synchroznied) and some are fast (like ArrayList). So we did not know what features the language used by the questioner gives. RecordSets are there in many languages.

And the reason why I had suggested Collections to be better than RecordSet was that at that time, we didn't know what kind of RecordSet it was. We assumed it is a normal DB RecordSet with an active Connection - which, I say is much more costlier and slower than a cached Collection would be. Moreover, having more active connections open will be more expensive in multi-user environments, and can also be more prone to exceptions - what if there is a database-access error? What will your RecordSet do in that case? But the Collection will still work.

Anyway, its up to the questioner to decide what he wishes to use.
srikmaha: well apart from the thanks if you have got your answer can u close this question so that there are no invitations to restuarants send out ;-)
Bhagyesh Trivedi
Great I'll let you know ... LOL

Anyway, True, ArrayList's are a good option. I would use them if I had to derive the data from calculations, custom formating etc. (ie I didn't just get the data from the db or a file). If I just received the data from the db or a file why pull it out of an rs to a collection when in essense you already have it in a format that you can sort on? So you are talking 0 seconds since it is already in the rs verses XXX seconds to load a collection. then do a sort. If you set rs2 = rs1.clone it is still faster than looping and loading a collection since all you are doing is duplicating memory - you are not looping to reload the new rs2.

I wouldn't perform sorting on a recordset with an active connection for the very reason you mentioned earlier as as I previously suggested Set rs.activeconnection = nothing.

If you have a clientside rs than you can set the activeconnection to nothing. As per db recordset it can be any recordset (I'm not sure what you think might not be available on the rs if it wasn't a db recordset) once the recordset is loaded it's a recordset, of course you will not have db actions available but it's not a db recordset so you are not trying to db actions anyway.

All the same as you mentioned:
> Anyway, its up to the questioner to decide what he wishes to use.

Cheers,

Richard

>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.
>> I'm not sure what you think might not be available on the rs if it wasn't a db recordset)

The questioner said: >> I am reading a text file and putting all data into a virtual recordset of VB.

Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

>> as I previously suggested Set rs.activeconnection = nothing. If you have a clientside rs than you can set the
>> activeconnection to nothing.

Hmmm.... you can do that, but initially, we were not knowing which language he was using. You can't do that in all languages.

Regards,
Mayank.

>>Don't know why he was doing that. I have never done that and I don't think that I will ever do it either ;-)

A virtual recordset is a recordset you create (fields names, datatypes etc.) Since you didn't get the info from a database the metadata is missing so you create it yourself. Then you just do an open on it and you can add data - very easy! Plus it is obviously disconnected. Then after you add data, if you set up the field names and datatypes like a table in your database all you would need to do is set an active connection to the ActiveConnection property of the recordset and you can then update the data to your database.

Now, let me think a dumb line of reasoning here:
Srikmaha says:  I am reading a text file and putting all data into a virtual recordset of VB.
   Sooo he'll loop to load a recordset
   Then LOOOOP AGAIN to load a collection instead of cloning it (just a simple memory copy no looping a second time)
Now a STUPID Question - WHHHYYYY ????

mayankeagle are we missing something???

I think Srikmaha is smoking some kind of mind altering substance since we never hear from him!!

Srikmaha - are you seeing yellow, purple, green and orange stripped pink polka-dotted flying elephants yet!!!

Regards,
RGS


I agree that is all right.... the only thing is, it didn't seem all right initially ;-) because we didn't know what he was doing. However, ain't there a better way in VB 6 to read the contents from the file? Anyways, Srikmaha, we're waiting for your response.
Please proceed with that recommendation.