ADO.Net - Evaluating t-sql against data in a DataSet / DataTable - returning a 'distinct column' row subset from ado.net

I wrote a stored procedure that didn't work because it was iterating through thousands of records and performing selects for each iteration. Extremely inefficient, and much to slow to be useful.

I hadn't used ado.net for anything more then the basics before yesterday, populating readers and datasets, looping through records, but I did think one day it might come in handy, possibly when a bad sproc just wasn't cutting it. I still haven’t done anything interesting with ado.net, but after poking around, playing with the .Select() method and reading msdn’s fascinating explanation of the .Expression property and .FindRows() and what have you, it seems that ado.net doesn't really act as a database in the logical sky as I would have hoped. I don't see a way to simply evaluate a "select distinct column" statement against data in a datatable/dataview/dataset.

What I'd like to do is return an entire table's data from my database into a disconnected holder in my app. Once it's there. I want to do a bunch of stuff to it, starting with returning a set of distinct values on one of the columns, and iterating through that collection doing 'count(*)'s where string evaluations on a column return true.

If all I have at my disposal is .Select() and .FindRows(), it seems as though I may be out of luck. Is there any way to return a subset of records in a ado.net holder that mirror the results of using the 'distinct' keyword on a column in the recordset? How would I retreive a record count of rows in a datatable where a varchar column matches a string? Is there any way to evaluate t-sql against data in a dataset/datatable/dataview?

bd
brooklynDevAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Bob LearnedCommented:
I don't see how returning all rows to the client, and then performing a Select would be any better than running from a Stored Procedure.

Bob
brooklynDevAuthor Commented:
because the sproc returns a recordset and then loops through the recordset preforming selects for each record. Beleive me, it can't really be done like this on the db side. I can post the sproc if you'd like to see it.
Bob LearnedCommented:
Sure, if it is not a gonzo beast :)

Bob
OWASP Proactive Controls

Learn the most important control and control categories that every architect and developer should include in their projects.

brooklynDevAuthor Commented:
declare @days int
set @days = -90

CREATE TABLE #DistinctReferrers (
     id integer identity(1, 1) primary key,
     http_referer varchar(8000) )

CREATE TABLE #DistinctReferreringDomains (
     id integer identity(1, 1) primary key,
     domain varchar(8000),
     numReferrals int,
     firstDateReferred datetime )

insert into #DistinctReferrers
select distinct http_referer from session where dateEntered > dateadd( d, -90, getDate() )
and http_referer <> 'none' and http_referer <> '' and http_referer not like '%tunababy.com%' and http_referer like '%http%'

declare @done bit
set @done = 0

while ( EXISTS (SELECT 1 FROM #DistinctReferrers) )
Begin
     declare @count int
     declare @domainLogged int
     declare @recordID int
     declare @referrer varchar( 512 )
     declare @domain varchar( 256 )
     declare @numReferrals int
     declare @firstDateReferred datetime
     select top 1 @recordID = id, @referrer = http_referer from #DistinctReferrers
     set @domain = substring( @referrer, charindex( '//', @referrer) + 2, len( @referrer ) )
     set @domain = substring( @domain, 0, charindex( '/', @domain) )
     if ( len(@domain) = 0 ) set @domain = substring( @referrer, charindex( '//', @referrer) + 2, len( @referrer ) )
     select @numReferrals = count(*), @firstDateReferred = min(dateEntered) from session where http_referer like '%'+ @domain + '%'
     select @domainLogged = count(*) from #DistinctReferreringDomains where domain = @domain
     if ( @domainLogged = 0 ) insert #DistinctReferreringDomains( domain, numReferrals, firstDateReferred ) values( @domain, @numReferrals, @firstDateReferred )
     delete from #DistinctReferrers where id = @recordID
End

select * from #DistinctReferreringDomains

drop table #DistinctReferrers
drop table #DistinctReferreringDomains
Bob LearnedCommented:
Could you tell me in English what this query does?

Bob
brooklynDevAuthor Commented:
sure,

This sproc is part of an asp.net web application. The query returns a list of referring domain with the number of referrals each domain has referred.

For each session that is created by asp.net when a visitor surfs my website I track the referrer server variable. What I'm trying to return for a stats page is a list of referring domains with the number of referrals each of them kicked down.

What I do in the query is, I get a list of distinct referrers, then I loop through that list, parse out the domain for each record, count the number of referrals for that domain, write a new record in a temporary table, and skip that domain if it shows up in the loop again, writing one record per domain in my temporary table: #DistinctReferreringDomains. Then I have a nice list of unique referring domains and the number of sessions created from that referrer in my temporary table, and I return a select *, and everything is roses, except the query takes a million years to run.

Bob LearnedCommented:
And, what does the schema look like, with some example data?

Bob
brooklynDevAuthor Commented:
the query works with one table from the db:

Session

and two columns for that table:

http_referer and dateEntered

Here is some sample data:

http_referer                                                                                 dateEntered
-------------------------------------------------------------------                               ------------------------------------------------------
http://www.wreckingpit.com/psycho/sslink/links.php?cat=7                2005-07-01 12:58:53.313
none                                                                                           2005-07-01 12:24:43.480
http://www.juxtapoz.com/html/links.html                                         2005-07-01 12:20:21.940
http://www.purpleglitter.com/links/Style/Clothing/                             2005-07-01 11:47:25.360
none                                                                                           2005-07-01 11:01:23.890
none                                                                                           2005-07-01 08:55:44.337
http://www.juxtapoz.com/html/links.html                                         2005-07-01 08:48:15.847
http://www.juxtapoz.com/html/links.html                                         2005-07-01 07:41:47.020
http://www.juxtapoz.com/html/links.html                                         2005-07-01 07:09:12.320
none                                                                                           2005-07-01 06:17:05.027

(10 row(s) affected)

grayeCommented:
I've got an SQL Engine class written in VB.Net that might be what you're looking for.   It is designed to execute SQL commands against DataTables in a detached DataSet.

Hummm... I didn't implement the DISTINCT keyword exactly like what you're expecting (I made it distinct for the entire row).  But you might be able to look at the source code and make some tweaks to implement the syntax that you're after.

I'm not quite finished with the project (hope to get back to it this week, actually), but it's compete enough..  the only thing lacking is left joins.  It's available as VB.Net source code only.

You're welcome to take a peek at http://home.hot.rr.com/graye/WorkInProgress.htm
brooklynDevAuthor Commented:
that's pretty cool graye, wonder if ms will ever build a sql engine like that for ado.net
grayeCommented:
I doubt it... it's not in VS2005
Bob LearnedCommented:
Try this query instead, and tell me how badly I missed:


SELECT Count(*), Min(DateEntered), Domain
FROM (

    SELECT dateEntered, CASE WHEN Patindex('http%://_%._%/%', http_referer)>0 Then
                Substring(http_referer, charindex('://',http_referer) + 3, charindex('/',http_referer, charindex('://',http_referer) + 3) - (charindex('://',http_referer) + 3))
            ELSE
                Substring(@ht,charindex('://',http_referer) + 3, Len(http_referer))
         END As Domain
    FROM Session
    WHERE dateEntered > dateadd( d, -90, getDate() )
    AND http_referer LIKE '%http%'
    AND http_referer NOT LIKE '%tunababy.com%' ) As Raw
GROUP BY Domain

Bob

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
brooklynDevAuthor Commented:
that query worked beautifully, thank you for the tsql lesson, I'm sure I'll use your techniques many more times
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
.NET Programming

From novice to tech pro — start learning today.