[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 251
  • Last Modified:

An efficient way to process a filtered set of DataRows

An efficient way to process a filtered set of DataRows

My VB 2008 app requires processing a subset of DataRows from a child table for each row in the parent table. When I first designed the app, I used all of the rows belonging to a particular parent row, so I used drParent.GetChildRows to get the rows, and worked off that. That works very efficiently.

The latest and greatest version of the app requires using filtered subgroups of the rows: the rows can be revised, and I need to process each revision group separately. So a row might indicate that it's active for all revisions, another is active for revisions 0-2, another for 3-end, another for just revision 1, another for 4-end, etc.

This seemed like an ideal situation for a Linq query, so I set up such a query against the DataTable, with WHERE clauses for the parent table linking key and the active revision. But I'm getting a massive performance hit on large databases. It seems to me, looking at the results and what I'm seeing happening with the debugger, that the query is being reprocessed each time a reference is made to it, which means that the time to process my whole file is rising exponentially as the file gets bigger. (One particular function that previously took 10 milliseconds now takes over 500 ms with a 4000-row table.)

So that doesn't seem like a workable solution for me. I need to use a different methodology, but I'm not sure what's the most efficient way. I could use a DataView, but then I lose strong typing, and I'll have to go back through all my code to rewrite every reference to a DataRow. What I'd like is some sort of strongly typed grouping of rows from a DataTable that I can create with different filtering choices with minimal overhead. Suggestions?
  • 4
  • 3
1 Solution
Consider processing the entire child table and placing the rows into a Dictionary<Revision, DataRow>
Then when processing each parent row, you just index into the Dictionary with each needed revision to get its DataRow.
If you have multiple DataRow per Revision then use List<DataRow>
ElrondCTAuthor Commented:
I wasn't familiar with the Dictionary class, but I don't think that's going to help me. I only want the rows that belong to a particular parent table row, there can be multiple child rows for each revision, and a single row may be active in multiple revisions. It doesn't look to me like there's an easy way to recognize all of that in a Dictionary item.

List(Of DataRow) looks like it could probably do what I want. However, I'm not sure of the best way to load the list from the table. What comes to mind is to filter with a DataView, then loop through the results and load each result row into the List. Does that sound like the right approach? (Anything more direct would be preferable.)
So drParent has a Revision column in it which maps to a Revision column in the other DataTable?
What I am suggesting is do some coding and do the join in code, rather than using the inefficient query engine.
So each Revision may have multiple rows from the other DataTable that match?
If that be the case then you would want Dictionary<Revision,List<DataRow>>
Provide some names of the tables and columns and their types and I can put together a code sample.
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

ElrondCTAuthor Commented:
Not quite as simple as that (though I've debated restructuring it that way). In names following, I'll use "dt" as a prefix for a DataTable, and "dr" for the DataRow. dtParent has a Revision column. dtChild has a RevisionStart and a RevisionEnd column, because a child row may be active for one, multiple, or all revisions.

I need to go through all revisions, processing the child rows that apply to that revision. Very simplified, it would be (with a Linq query to show getting the child rows):

For i = 0 to drParent.Revision
    LinqChild = From drqChild in dtChild _
        Where drqChild.fldKey = drParent.fldKey _
        And drqChild.RevisionStart <= i _
        And drqChild.RevisionEnd >= i _
        Order By drqChild.OrderNumber
   < Process child records >
Next i

I've been experimenting with alternatives, and dtChild.Select gives reasonable results; it's slower than things were before I had multiple revisions, but the result set is strongly typed, sorted, and referenced by index, so I don't have to rewrite all the rest of my code. But I'm open to a Dictionary solution if that would be more efficient; I'm still not sure, though, how to load that efficiently.
Basically this pre-groups the child rows based on the Revisions they relate to
    Dim dRev2Children As New Dictionary(Of Integer, List(Of DataRow))
    Dim ldrChildren As List(Of DataRow)

    ' Construct the dictionary from dtChild
    For Each drChild In dtChild.Rows
        For r = drChild.RevisionStart To drChild.RevisionEnd
            If Not dRev2Children.TryGetValue(r, ldrChildren) Then
                ldrChildren = New List(Of DataRow)
                dRev2Children.Add(r, ldrChildren)
            End If

    For Each drParent In dtParent.Rows
        ' Now given a drParent, use the related dtChild rows
        ldrChildren = dRev2Children(drParent.Revision)
        For Each drChild In ldrChildren
            ' use members of drChild and drParent

Open in new window

Also if you want to be even more efficient, you can avoid loading all the data into a DataTable and instead use a DataReader to load everything into structures.
ElrondCTAuthor Commented:
Sorry for the delay in responding. The directory idea is an interesting one, but as I looked at implementing it, I realized that it actually won't work in this situation, because in many of the cases, I have to also do date filtering, and just checking dates inside a For Each loop (use if it matches, skip if it doesn't) is insufficient, because I need the count of valid rows in various places.

So I think my best solution is going to be to do a .Select on the DataTable. The selection itself can't use strong typing, but the result set does, and that's more useful (and requires less recoding). But thanks for mentioning the Directory class; I wasn't previously aware of it, and it could be useful in the future.

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now