Solved

C# SQL Query considerations against a large database

Posted on 2007-03-25
Last Modified: 2013-11-07
I am querying a database table and performing a calculation in each row returned.  I'd like to put the result of this calculation right back in the same table, but I'm unsure of how to do this properly.  The table is very large - 100,000 - 1,000,000 rows - so I can't bring the table local, it must stay on the server.  Here is what I'm currently doing:

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open();//Open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

     while (tableReader.Read())//true if there are more rows; otherwise, false
     {
        alpha = Convert.ToDouble(tableReader.GetValue(0));
        beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);
      }
.............

I need to use the "distribution" value in another query against the same database, so I'm thinking the best way to do this would be to add a "DISTRIBUTION" column to the table I'm querying and store the "distribution" value in it.  Would I add this data as soon as it is calculated, i.e.

SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL syntax: no COLUMN keyword, and FLOAT rather than Double
SqlUpdateCommand1.CommandText = "ALTER TABLE table ADD DISTRIBUTION FLOAT";
SqlUpdateCommand1.ExecuteNonQuery();

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open();//Open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

     while (tableReader.Read())//true if there are more rows; otherwise, false
     {
        alpha = Convert.ToDouble(tableReader.GetValue(0));
        beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         SqlUpdateCommand1.CommandText = "UPDATE table SET DISTRIBUTION = " + distribution.ToString() + " WHERE stuff;";
         SqlUpdateCommand1.ExecuteNonQuery();
      }
................

This seems pretty clunky, but I think it would work.  I think the best method would be to create a dataset based on the query, calculate the values, add and fill the column, and then update the server database from the dataset.  The problem is that there is too much data, so filling a local dataset isn't an option.  Is there a way I could add the "distribution" column in bulk, once all distribution values are calculated?

Thanks in advance for the help!!
Question by:nbb007

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 18787729
what kind of calculation is this distribution value?
if possible, you could make that a function in the sql server directly...

Author Comment

by:nbb007
ID: 18789708
The calculation is a "Cumulative Beta Distribution", and is described here: http://en.wikipedia.org/wiki/Beta_distribution

I am using a library that contains an implementation of this function, so I'm not even sure how it is implemented.  Specifically, I am using a library that I bought from SyncFusion:
"Syncfusion.Windows.Forms.Chart.Statistics.UtilityFunctions.BetaCumulativeDistribution".  An example of its use can be found on their website: http://www.syncfusion.com/support/evalcenter/default.aspx?cNode=468.

I would love to be able to do this calculation directly in SQL, but I have no idea how to accomplish this.  I did a lot of searching on doing this calculation directly in SQL before buying the SyncFusion library, so I don't know if it can be done in an SQL query...

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 18790986
>I would love to be able to do this calculation directly in SQL
do you have sql server 2005? you could create a CLR function, which maps that function and hence could be used directly in SQL.

otherwise, I would suggest that you stay with your code, with a "minor" but effective change:
instead of submitting each UPDATE individually, put some 1000 updates together and submit them as one batch. this will avoid many server round trips and work a lot faster.

using System.Text;

SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL syntax: no COLUMN keyword, and FLOAT rather than Double
SqlUpdateCommand1.CommandText = "ALTER TABLE table ADD DISTRIBUTION FLOAT";
SqlUpdateCommand1.ExecuteNonQuery();

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open();//Open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

     StringBuilder query = new StringBuilder();
     int query_count  = 0;

     while (tableReader.Read())//true if there are more rows; otherwise, false
     {
        alpha = Convert.ToDouble(tableReader.GetValue(0));
        beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         query_count ++;
         query.Append("UPDATE table SET DISTRIBUTION = " + distribution.ToString() + " WHERE stuff;");
 
         if (query_count >= 1000)
         {
           SqlUpdateCommand1.CommandText= query.ToString();
           SqlUpdateCommand1.ExecuteNonQuery();
           query = new StringBuilder();
           query_count = 0;
         } // end if query_count
      } // end while (tableReader.Read())

      if (query_count > 0)
      {
        SqlUpdateCommand1.CommandText= query.ToString();
        SqlUpdateCommand1.ExecuteNonQuery();
      }
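
One caveat with building the batch by string concatenation: distribution.ToString() formats with the current culture, so on a machine that uses a decimal comma the generated SQL would not parse. Below is a minimal sketch of a batch builder that formats the values with the invariant culture instead. The class BatchBuilder, the table name MyTable and the key column ID are hypothetical stand-ins; the post above only says "WHERE stuff".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class BatchBuilder
{
    // Turns (id, value) pairs into UPDATE statements and groups them into
    // strings holding at most batchSize statements each.
    // MyTable and ID are placeholder names, not taken from the original post.
    public static List<string> Build(IEnumerable<KeyValuePair<int, double>> rows, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        var batches = new List<string>();
        var sql = new StringBuilder();
        int count = 0;

        foreach (KeyValuePair<int, double> row in rows)
        {
            // "R" round-trips the double; InvariantCulture forces a decimal point.
            sql.Append("UPDATE MyTable SET DISTRIBUTION = ")
               .Append(row.Value.ToString("R", CultureInfo.InvariantCulture))
               .Append(" WHERE ID = ")
               .Append(row.Key.ToString(CultureInfo.InvariantCulture))
               .Append(";");
            count++;

            if (count >= batchSize)
            {
                batches.Add(sql.ToString());
                sql.Length = 0; // reset the builder for the next batch
                count = 0;
            }
        }

        // Flush the final, partially filled batch.
        if (count > 0) batches.Add(sql.ToString());
        return batches;
    }
}
```

Each returned string would then be assigned to CommandText and sent with ExecuteNonQuery(), as in the loop above. Note that a NaN or infinite result would still produce invalid SQL, so such values would need to be filtered out or written as NULL first.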


Author Comment

by:nbb007
ID: 18794766
Ok, good advice.  I am currently using SQL Server 2000, so the CLR route wouldn't work for me.  I do like the batch approach, however - SQL will execute a string of 1000 queries at once?  There is no way to use the current position of the TableReader object to update the particular row it is currently referring to, is there?

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 500 total points
ID: 18794795
>SQL will execute a string of 1000 queries at once?
yes.

> There is no way to use the current position of the TableReader object to update the particular row it is currently refering to, is there?
there is, but that will either do what you are currently doing (update each single row) or what you are trying to avoid (i.e. read the entire data set at once...)

Expert Comment

by:bhushanvinay
ID: 20067468
Just looking at your problem, you could try something wild.

CREATE a datatable_target
data column a
data column b
data column c -- calculated.

while (datatable_source.read())
{
   add rows to the new table from your old table
   with any calculated value
}
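
The pseudocode above could be sketched in C# roughly as follows. This is only an illustration: the class TargetTableBuilder and the calculate delegate are hypothetical, and the column names a, b and c are taken from the outline. It also keeps every copied row in memory, so for a table of the size described in the question it would have to be run in chunks.

```csharp
using System;
using System.Data;

public static class TargetTableBuilder
{
    // Reads the first two columns of each source row, computes a third value,
    // and stores all three in a new in-memory DataTable.
    public static DataTable Build(IDataReader source, Func<double, double, double> calculate)
    {
        var target = new DataTable("datatable_target");
        target.Columns.Add("a", typeof(double));
        target.Columns.Add("b", typeof(double));
        target.Columns.Add("c", typeof(double)); // calculated column

        while (source.Read())
        {
            double a = Convert.ToDouble(source.GetValue(0));
            double b = Convert.ToDouble(source.GetValue(1));
            target.Rows.Add(a, b, calculate(a, b));
        }

        return target;
    }
}
```

The delegate would be the place to plug in the distribution calculation from the question. With .NET 2.0 or later, a table filled this way could be pushed to the server with SqlBulkCopy.WriteToServer(DataTable) rather than row by row.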

You can also make this type-safe if you create an XSD and generate the table from it.

Don't know if it helps you.

Regards
Vinay

Question has a verified solution.
