Suggestions: Ways to maintain a .NET data object against an SQL source?

I have a scenario where a webpage starts by displaying up to 1000 records from an SQL Server, each with about 20 relatively small fields, and where I then want to keep that view in sync with the database.
I am using .NET and my approach has been to select the appropriate data and then periodically requery with a WHERE clause against the datetime fields, so that only data modified since the last known update date is retrieved.  After each refresh, the last update date is set to the highest datetime value encountered so far, and so on and so forth.  This has worked well for smaller datasets, but I'm curious what the best way is to manage the integration of these updates into the in-memory DataTable/class or other type of data object.
The data does have a unique primary key. New rows may appear in update checks (their modified date being the date they were created), but rows do not get deleted, which simplifies the requirements.
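For concreteness, the merge step being asked about can be sketched like this in C#. The `Record` shape (Id, Modified, Name) and the use of a dictionary are illustrative assumptions, not part of the original setup: each row from the delta requery is upserted into the in-memory cache by primary key, and the "last update" high-water mark is advanced.

```csharp
using System;
using System.Collections.Generic;

// Illustrative row shape; the real data has ~20 fields and an int-like primary key.
class Record
{
    public int Id;
    public DateTime Modified;
    public string Name;
}

static class DeltaMerge
{
    // Upserts each delta row into the cache (keyed by primary key) and
    // returns the new high-water mark for the next requery.
    public static DateTime Apply(
        IDictionary<int, Record> cache, IEnumerable<Record> delta, DateTime lastUpdate)
    {
        foreach (var row in delta)
        {
            cache[row.Id] = row;  // insert new key, or replace the existing row
            if (row.Modified > lastUpdate) lastUpdate = row.Modified;
        }
        return lastUpdate;
    }
}
```

The returned value then becomes the `@since` parameter of the next `WHERE ModifiedDate > @since` requery.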

I have read about using selects against the DataTable to find the row, merges and other approaches, but am not sure which method typically gives the best performance.
I would expect only 1-2 rows to be updated at any time with most initial datasets typically being under 100 rows.

There is no need to save back to the datasource so this doesn't need to be considered in the suggestions.

The datasource is SQL Server 2000, which I understand is outdated, although I am guessing this is largely irrelevant for the purpose of managing the data efficiently at the web server end.

Any advice appreciated.
Jacques Bourgeois (James Burger), President, commented:
"The data does have a unique a primary key" is why I understood that you do not have one. A primary key is always unique.

You seem to have access to the database, since you say that you could modify the indexes. So here are a few thoughts, understanding that your users will have different filters but always receive the same set of fields.

First, take a look at the database diagrams (there were diagrams in 2000, weren't there? The stuff that is used to define relations between the tables. I did not use SQL Server at that time) and make sure that the relations are properly defined, with indexes on the fields that are used in these relations. This is one of the most important factors as far as speed is concerned when you have many links between tables, because the query optimizer then has the proper information to do a good job.

If you know that some fields will be filtered on most of the time, an index on these might also be a good idea. There are always fields and field combinations, such as CustomerName/PostalCode, that are queried very regularly.

The idea of a temporary table to hold the results is not bad. But instead of a regular requery of the whole thing, I would see if it would not be more interesting to use triggers to update that temporary table. Triggers fire automatically each time there is a change in the original data, so your temporary table would always be up to date.

All you need in your application is then to requery your temp table every 30 seconds.
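By way of illustration, such a trigger might look like the following T-SQL sketch (SQL Server 2000 era syntax). `SourceTable`, `CopyTable` and the column names are placeholders, and a real version would cover DELETE too if deletions ever mattered:

```sql
-- Illustrative only: keeps a copy table in step with a source table.
-- "inserted" is the pseudo-table of new/changed rows inside a trigger.
CREATE TRIGGER trg_SourceTable_Sync
ON SourceTable
AFTER INSERT, UPDATE
AS
BEGIN
    -- Update copy rows that already exist
    UPDATE c
    SET c.SomeField = i.SomeField,
        c.ModifiedDate = i.ModifiedDate
    FROM CopyTable c
    INNER JOIN inserted i ON c.Id = i.Id

    -- Insert rows that are new to the copy
    INSERT INTO CopyTable (Id, SomeField, ModifiedDate)
    SELECT i.Id, i.SomeField, i.ModifiedDate
    FROM inserted i
    LEFT JOIN CopyTable c ON c.Id = i.Id
    WHERE c.Id IS NULL
END
```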

As for the technology to use...

A DataReader is usually the fastest way to retrieve information for display only. But since it gets the data one row at a time and does not keep track of it, you need to record that data somewhere if you want to have the necessary information to refresh it regularly. A custom collection is usually the best solution performance-wise, because all the other methods create extra overhead to be able to perform operations that you do not need when all you do is display the data.

As far as I am concerned, I almost always work with custom collections... because I have the leisure to do so. I do not work in an enterprise environment anymore, so deadlines and production constraints are not weighing on my shoulders. I have all the time I need to do things my way, in what I consider to be the most efficient way of working with data. The problem with a custom collection in many real-world situations is that it does not provide any mechanism other than to collect the data. You will need to design and code the class that will record the data in memory, and do the same for all the mechanisms that will be used to refresh the data. That can take a long time compared to...
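A sketch of what filling such a custom collection from a DataReader might look like. The `Item` shape and column positions are invented for illustration; in production the reader would come from `SqlCommand.ExecuteReader()` against the real query:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Illustrative row type for the custom collection.
class Item
{
    public int Id;
    public string Name;
}

static class Loader
{
    // Reads every row from the reader into a dictionary keyed by the
    // primary key, so later refreshes can find rows without scanning.
    public static Dictionary<int, Item> Load(IDataReader reader)
    {
        var rows = new Dictionary<int, Item>();
        while (reader.Read())
        {
            var item = new Item
            {
                Id = reader.GetInt32(0),    // ordinal 0 = primary key (assumed)
                Name = reader.GetString(1)  // ordinal 1 = a display field (assumed)
            };
            rows[item.Id] = item;
        }
        return rows;
    }
}
```

Taking `IDataReader` rather than `SqlDataReader` costs nothing and makes the loading logic testable without a database.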

DataSets and DataTables. They do a lot for you, at the extra cost that the data takes a lot more space in memory, for the overhead needed to do that job and for a lot of functions you will never use. One has to be careful too, because DataSets are overused. Most of the time a DataTable is sufficient. See my article on the subject. A DataTable is thus often the best thing to use in a programming environment where you are always in a rush to get things done yesterday.
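A minimal illustration of the bookkeeping a DataTable does for you, with invented Id/Name columns: once `PrimaryKey` is set, `LoadDataRow` inserts a new row or updates the existing one with the same key, which is exactly the upsert a refresh needs.

```csharp
using System;
using System.Data;

static class TableDemo
{
    // Builds an in-memory table with an enforced primary key.
    // Column names are illustrative.
    public static DataTable Build()
    {
        var table = new DataTable("Cache");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.PrimaryKey = new[] { table.Columns["Id"] };
        return table;
    }

    // LoadDataRow upserts by primary key: new keys are inserted,
    // existing keys have their values replaced.
    public static void Upsert(DataTable table, int id, string name)
    {
        table.LoadDataRow(new object[] { id, name }, true);
    }
}
```

`table.Rows.Find(id)` then gives keyed lookup without writing any collection code yourself.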

Stuff such as LINQ seems at first to be the easiest to use. Personally, I do not like working with those "we do everything for you" technologies. I like to have some control over what is happening. I want to look at the data eye to eye. I am a SQL Server person, and do not like it when Microsoft starts saying things like "We will continue to make some investments in LINQ to SQL based on customer feedback. This post was about making our intentions for future innovation clear and to call out the fact that as of .NET 4.0, LINQ to Entities will be the recommended..." (Clarifying the message on L2S Futures).

They go for the generic mechanism. What I liked about ADO.NET compared with ADO is that finally we had classes specifically geared to each type of database. SqlClient is a lot better than the "do it the same for everybody" OleDb. Data entities are the OleDb of LINQ. I understand that many programmers out there have to work with different sources of data. Learning that there is a technology that favors them is good for them, but not for me.

I have to say that I never liked LINQ anyway and have never used it in a real application. Great at first. Until you want to do stuff that it cannot do. Until you have bugs that you cannot understand because too many things are hidden from you. I consider that time spent on something that cannot do everything is time lost that could be invested in mastering a technology that can. So I would stick to basic ADO.NET (DataTables, DataReaders and their friends).

This being said, I know very good programmers who look at LINQ as they look at God. So maybe there is something I did not grasp there. And investing in data entities might be absolutely necessary for somebody who is not approaching the end of his career as I am, because if you look at the latest technologies developed by Microsoft, such as Windows Store applications, they use data entities instead of ADO.NET. The DataTable is not available in a Windows Store application, and I am not sure it will have a strong place in what is coming with that Windows 10 that will run on many different platforms.

In any case, the real answer is always the same: test.
Jacques Bourgeois (James Burger), President, commented:
There is never a definitive answer to such a question; too many factors have to be considered.

What is the type of your dataset: DataReader, DataTable, DataSet, custom collection, LINQ, data entity, something else?

Do you use databinding or do you fill the display yourself? Do you display the data in a grid, in a table or as simple text?

What is the expected size of the thing in the future? This seems to be relatively low for now, so it might not make a big difference how you do it if you code the right way for the mechanisms you use.

And there might be more factors that do not come directly to my mind.

If the data is simply for display and is no bigger than what you say, then simply recalling the original query and refreshing the display with the results might be the best solution.

A little more traffic than bringing back only the modified rows, but on most systems that would compensate for running a new query that checks an added extra field (last update) and updates that field. And maybe not. Since you do not have a primary key, what is the type of the field(s) that you use to identify the rows? Some types of fields slow the performance, so the difference between a complete requery and a filtered one can vary. You have to test.

A whole requery will be simpler for you but might be a little slower to update the screen for the user, depending on whether you databind the whole dataset to the display or handle the display of individual rows yourself.

A merge might be better for other situations, but implies that you temporarily deal with 2 datasets + the time it takes for the merge. This might be faster with small to medium datasets, but might start to be a problem as the amount of data grows if there is not enough memory to handle multiple datasets at the same time.
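That merge approach can be sketched as follows, with invented columns: the delta (modified rows only) arrives in a second table with the same schema, and because both tables define the primary key, `DataTable.Merge` updates matching rows instead of appending duplicates.

```csharp
using System;
using System.Data;

static class MergeDemo
{
    // Both the main table and the delta must share this schema, including
    // the primary key, for Merge to match rows. Columns are illustrative.
    public static DataTable NewTable()
    {
        var t = new DataTable();
        t.Columns.Add("Id", typeof(int));
        t.Columns.Add("Name", typeof(string));
        t.PrimaryKey = new[] { t.Columns["Id"] };
        return t;
    }

    // Rows with keys already in main are updated; new keys are added.
    public static void ApplyDelta(DataTable main, DataTable delta)
    {
        main.Merge(delta);
        main.AcceptChanges();
    }
}
```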

The thing is that if it works for you now, "if it ain't broken, don't fix it". If you do not expect the amount of data to go up significantly, you should be OK in the future.

If we had a straight answer to give to you right now for your current situation, it would be useless if you consider the preceding sentence.

And if it is as a lead in case the load eventually goes up, then it would not be really useful, because things will have changed at that point. You might not be on SQL Server 2000 anymore (you know you will have to upgrade eventually), the network will be faster, and there will be a new browser and a new version of the framework with new technologies. What is best today might not be the best at that point.

If it is out of curiosity, to prepare for future projects and situations, then it would also be useless, because no two situations are the same. The combination of data / retrieval method / display mode always changes somewhat.

The final answer is that the only real way to know what is best in a given situation is to test different methods for that situation, before putting it into production, typically with a big set of data built specifically for the test.

Any other answer would come from the experience of programmers who were not working with the same data, in the same environment, and maybe with different technology components, and would only be a guess.
dgloveruk (Author) commented:
Hi Jacques,
Thank you for your reply.
The nature of the query is such that, despite returning a fairly small result set, it joins across many tables and does not perform terribly quickly at the database side.  Since the tables belong to an off-the-shelf product rather than our own creation, I am limited in how I might modify or index the original table designs to improve the query times.  There are also many fields upon which my end users may filter, with any number of filters applied simultaneously, which I suspect is burdening the query.

In some simulations using a stored procedure with parameters, the stored procedures were taking over 2 seconds to run when timed using the profiler.
Given that my screen updates need to happen every 30 seconds and there could be up to 30 or so sessions running at once, requerying each time did not seem like a good option.  Each user would be requesting a different view, so it is unfortunately not possible for them to share a datasource.

My thinking has been that I ought to recreate a selection of the data in another table I have more control over, apply my own indexes, and keep that up to date with a periodic requery of the original source that applies date criteria to filter for updated records.  So in essence the SQL server is syncing periodically to a table that my end users can then query.  Since the original source without filters will return 30,000 rows, creating a table out of these results won't be an especially large overhead.
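That periodic sync into a controlled staging table might be sketched in T-SQL roughly as below. All names, and the `@LastSync` high-water mark, are placeholders; `SourceView` stands in for the expensive multi-table join:

```sql
-- Illustrative sketch, not the actual schema. @LastSync is the highest
-- ModifiedDate seen on the previous run.

-- Update staging rows whose source rows changed since the last sync
UPDATE s
SET s.SomeField = src.SomeField,
    s.ModifiedDate = src.ModifiedDate
FROM StagingTable s
INNER JOIN SourceView src ON src.Id = s.Id
WHERE src.ModifiedDate > @LastSync

-- Insert source rows not yet present in staging
INSERT INTO StagingTable (Id, SomeField, ModifiedDate)
SELECT src.Id, src.SomeField, src.ModifiedDate
FROM SourceView src
LEFT JOIN StagingTable s ON s.Id = src.Id
WHERE s.Id IS NULL AND src.ModifiedDate > @LastSync
```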

My question, perhaps better phrased (or not), was: of the technologies you mentioned (DataReader, DataTable, DataSet, custom collection and LINQ), is there one that stands out to you as being easier to manage updates against, and which provides better flexibility for filtering on multiple criteria at the web server rather than the SQL server?
Just to clarify, Jacques, I DO have a primary key; you seem to have read that I did not.  As you understood, I do not need to allow the user to update the data, I just need an efficient way of keeping it true to the underlying datasource, with a delay of up to 30-60 seconds being acceptable if necessary.

dgloveruk (Author) commented:
Thank you for your considered and comprehensive answer Jacques.
I will use the DataReader with custom collection approach.  I am familiar with this approach, and I'm not under pressure to fire out applications by trading off design preferences for speed.  You helped me with another question about disposing of objects in this area.
I won't be able to use triggers, unfortunately.  Or rather, because so many tables are involved in the query, I would presumably need a trigger per table, which sounds like it could get messy.

I feel similarly about LINQ, although I can't use the excuse of getting near the end of my working life!
I have found LINQ to be elegant, but hard to create working models with.

It strikes me as similar to when I went from classic ASP to .NET: much of what I did became easier, but the more unusual required behaviours became harder.

Thank you for your advice and particularly your balanced approach and considerations of finesse versus practicality.
