C# SQL Query considerations against a large database

I am querying a database table and performing a calculation on each row returned. I'd like to store the result of this calculation back in the same table, but I'm unsure how to do this properly. The table is very large (100,000 to 1,000,000 rows), so I can't bring the table local; it must stay on the server. Here is what I'm currently doing:

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      while (tableReader.Read()) // true if there are more rows; otherwise, false
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);
      }
.............

I need to use the "distribution" value in another query against the same database, so I'm thinking the best way to do this would be to add a "DISTRIBUTION" column to the table I'm querying and store the "distribution" value there. Should I write each value as soon as it is calculated, e.g.:

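SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL: no COLUMN keyword, FLOAT is the double-precision type, and the
// reserved word "table" has to be bracketed when used as a name
SqlUpdateCommand1.CommandText = "ALTER TABLE [table] ADD DISTRIBUTION FLOAT";
SqlUpdateCommand1.ExecuteNonQuery();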

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      // Prepare the UPDATE once; only the parameter value changes per row
      SqlUpdateCommand1.CommandText = "UPDATE [table] SET DISTRIBUTION = @distribution WHERE stuff";
      SqlUpdateCommand1.Parameters.Add("@distribution", System.Data.SqlDbType.Float);

      while (tableReader.Read()) // true if there are more rows; otherwise, false
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         SqlUpdateCommand1.Parameters["@distribution"].Value = distribution;
         SqlUpdateCommand1.ExecuteNonQuery(); // one server round trip per row
      }
................

This seems pretty clunky, but I think it would work. I think the best method would be to create a DataSet based on the query; calculate, add, and fill the column; and then update the server database from the DataSet. The problem is that there is too much data, so filling a local DataSet isn't an option. Is there a way I could add the "distribution" column in bulk, once all distribution values are calculated?

Thanks in advance for the help!!
nbb007 asked:
Guy Hengel [angelIII / a3] (Billing Engineer) commented:
>SQL will execute a string of 1000 queries at once?
Yes. SQL Server runs the semicolon-separated statements as a single batch, in one round trip.

> There is no way to use the current position of the TableReader object to update the particular row it is currently referring to, is there?
There is, but it will either do what you are currently doing (update each row individually) or what you are trying to avoid (read the entire data set at once).
Guy Hengel [angelIII / a3] (Billing Engineer) commented:
What kind of calculation is this distribution value?
If possible, you could make it a function in SQL Server directly...
nbb007 (author) commented:
The calculation is a "Cumulative Beta Distribution", and is described here: http://en.wikipedia.org/wiki/Beta_distribution
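For reference, the cumulative beta distribution described on that page is the regularized incomplete beta function. It is a function of a point x in [0, 1] as well as the two shape parameters α and β; the post does not show how the two-argument call CalcUtility.BetaCumulativeDistribution(alpha, beta) supplies x, so the library's own documentation is the place to confirm the parameter order.

$$
F(x;\alpha,\beta) = I_x(\alpha,\beta) = \frac{B(x;\alpha,\beta)}{B(\alpha,\beta)}
= \frac{\int_0^x t^{\alpha-1}(1-t)^{\beta-1}\,dt}{\int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt},
\qquad 0 \le x \le 1,\quad \alpha,\beta > 0
$$

In general this integral has no elementary closed form, so evaluating it needs a numerical method (for example a series or continued-fraction expansion), which is part of why it is hard to express as a plain SQL expression.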

I am using a library that contains an implementation of this function, so I'm not even sure how it is implemented. Specifically, I am using a library that I bought from Syncfusion:
"Syncfusion.Windows.Forms.Chart.Statistics.UtilityFunctions.BetaCumulativeDistribution". An example of its use can be found on their website: http://www.syncfusion.com/support/evalcenter/default.aspx?cNode=468.

I would love to be able to do this calculation directly in SQL, but I have no idea how to accomplish it. I did a lot of searching on doing this calculation directly in SQL before buying the Syncfusion library, so I don't know whether it can be done in an SQL query...
Guy Hengel [angelIII / a3] (Billing Engineer) commented:
> I would love to be able to do this calculation directly in SQL
Do you have SQL Server 2005? You could create a CLR function that wraps that library function, so it could be used directly in SQL.

Otherwise, I would suggest that you stay with your code, with a "minor" but effective change:
instead of submitting each UPDATE individually, put some 1000 UPDATEs together and submit them as one batch. This avoids many server round trips and works a lot faster.

using System.Globalization;
using System.Text;

SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL syntax: no COLUMN keyword, FLOAT for double precision, bracketed name
SqlUpdateCommand1.CommandText = "ALTER TABLE [table] ADD DISTRIBUTION FLOAT";
SqlUpdateCommand1.ExecuteNonQuery();

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      StringBuilder query = new StringBuilder();
      int query_count = 0;

      while (tableReader.Read()) // true if there are more rows; otherwise, false
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         // Queue the statement instead of sending it right away.
         // InvariantCulture keeps '.' as the decimal separator in the SQL text.
         query_count++;
         query.Append("UPDATE [table] SET DISTRIBUTION = "
                      + distribution.ToString("R", CultureInfo.InvariantCulture)
                      + " WHERE stuff;");

         // Every 1000 statements, send them to the server as one batch
         if (query_count >= 1000)
         {
            SqlUpdateCommand1.CommandText = query.ToString();
            SqlUpdateCommand1.ExecuteNonQuery();
            query = new StringBuilder();
            query_count = 0;
         } // end if query_count
      } // end while (tableReader.Read())

      // Flush the last, partially filled batch
      if (query_count > 0)
      {
         SqlUpdateCommand1.CommandText = query.ToString();
         SqlUpdateCommand1.ExecuteNonQuery();
      }
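The batching logic in the code above can also be pulled out into a small helper that is easy to test without a database. This is a minimal sketch, not part of the original answer: the class name UpdateBatcher, the key column ID and the table name [table] are hypothetical, and it assumes each row can be identified by an integer key (the thread leaves the WHERE clause unspecified as "stuff").

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper (not from the thread): groups per-row UPDATE statements
// into T-SQL batches of at most batchSize statements each.
// Assumes the table is [table] and each row is identified by an integer ID column.
public static class UpdateBatcher
{
    public static List<string> BuildBatches(
        IEnumerable<KeyValuePair<int, double>> rows, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<string>();
        var sb = new StringBuilder();
        int count = 0;

        foreach (KeyValuePair<int, double> row in rows)
        {
            // NaN and Infinity have no T-SQL literal, so reject them up front.
            if (double.IsNaN(row.Value) || double.IsInfinity(row.Value))
                throw new ArgumentException("Non-finite value for ID " + row.Key);

            // "R" produces a round-trippable value; InvariantCulture guarantees '.'
            // as the decimal separator whatever the machine's regional settings are.
            sb.Append("UPDATE [table] SET DISTRIBUTION = ")
              .Append(row.Value.ToString("R", CultureInfo.InvariantCulture))
              .Append(" WHERE ID = ")
              .Append(row.Key.ToString(CultureInfo.InvariantCulture))
              .Append(";\n");
            count++;

            // A full batch becomes one command text, i.e. one server round trip.
            if (count == batchSize)
            {
                batches.Add(sb.ToString());
                sb.Clear();
                count = 0;
            }
        }

        // Flush the final, partially filled batch.
        if (count > 0)
            batches.Add(sb.ToString());

        return batches;
    }
}
```

Each returned string would then be assigned to SqlUpdateCommand1.CommandText and run with ExecuteNonQuery(), as in the loop above. With 2,500 rows and a batch size of 1000, this yields three batches of 1000, 1000 and 500 statements.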

nbb007 (author) commented:
OK, good advice. I am currently using SQL Server 2000, so the CLR route wouldn't work for me. I do like the batch approach, however. SQL will execute a string of 1000 queries at once? There is no way to use the current position of the TableReader object to update the particular row it is currently referring to, is there?
bhushanvinay commented:
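Just looking at your problem, you could try something wild: create a target table with columns a and b plus a calculated column c, and fill it from your old table.

// Target table: the source columns plus the calculated one
DataTable target = new DataTable("datatable_target");
target.Columns.Add("a", typeof(double));
target.Columns.Add("b", typeof(double));
target.Columns.Add("c", typeof(double)); // calculated

while (datatable_source.Read())
{
   // Copy each row from your old table, adding the calculated value
   double a = Convert.ToDouble(datatable_source.GetValue(0));
   double b = Convert.ToDouble(datatable_source.GetValue(1));
   target.Rows.Add(a, b, CalcUtility.BetaCumulativeDistribution(a, b));
}

You can also make this type-safe, if you want, by creating an XSD and generating the table from it.

Don't know if it helps you.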

Regards
Vinay