C# SQL Query considerations against a large database

Posted on 2007-03-25
Last Modified: 2013-11-07
I am querying a database table and performing a calculation on each row returned.  I'd like to put the result of this calculation right back into the same table, but I'm unsure how to do this properly.  The table is very large (100,000 to 1,000,000 rows), so I can't bring it local; it must stay on the server.  Here is what I'm currently doing:

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      while (tableReader.Read()) // true while there are more rows
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);
      }
.............

I need to use the "distribution" value in another query against the same database, so I'm thinking the best way to do this would be to add a "DISTRIBUTION" column to the table I'm querying and store the "distribution" value there.  Would I add this data as soon as it is calculated, i.e.

SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL takes no COLUMN keyword here, and its double-precision type is float
SqlUpdateCommand1.CommandText = "ALTER TABLE table ADD DISTRIBUTION float";
SqlUpdateCommand1.ExecuteNonQuery();

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      while (tableReader.Read()) // true while there are more rows
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         // one round trip per row; "WHERE stuff" stands for a condition that identifies the current row
         SqlUpdateCommand1.CommandText = "UPDATE table SET DISTRIBUTION = " + distribution.ToString() + " WHERE stuff;";
         SqlUpdateCommand1.ExecuteNonQuery();
      }
................

This seems pretty clunky, but I think it would work.  I think the best method would be to create a dataset based on the query, calculate the values, add and fill the column, and then update the server database from the dataset.  The problem is that there is too much data, so filling a local dataset isn't an option.  Is there a way I could add the "distribution" column in bulk, once all distribution values are calculated?

Thanks in advance for the help!!
Question by:nbb007

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 18787729
What kind of calculation is this distribution value?
If possible, you could implement it as a function directly in SQL Server...

Author Comment

by:nbb007
ID: 18789708
The calculation is a "Cumulative Beta Distribution", and is described here: http://en.wikipedia.org/wiki/Beta_distribution

I am using a library that contains an implementation of this function, so I'm not even sure how it is implemented.  Specifically, I am using a library that I bought from SyncFusion:
"Syncfusion.Windows.Forms.Chart.Statistics.UtilityFunctions.BetaCumulativeDistribution".  An example of its use can be found on their website: http://www.syncfusion.com/support/evalcenter/default.aspx?cNode=468.

I would love to be able to do this calculation directly in SQL, but I have no idea how to accomplish this.  I did a lot of searching on doing this calculation directly in SQL before buying the SyncFusion library, so I don't know whether it can be done in an SQL query...

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 18790986
>I would love to be able to do this calculation directly in SQL
Do you have SQL Server 2005? You could create a CLR function that wraps that function, so it could be called directly from SQL.

Otherwise, I would suggest that you stay with your code, with a "minor" but effective change:
instead of submitting each UPDATE individually, put some 1000 updates together and submit them as one batch. This avoids many server round trips and works a lot faster.

using System.Data.OleDb;
using System.Data.SqlClient;
using System.Text;

SqlConnection custConn = new SqlConnection(...);
custConn.Open();
SqlCommand SqlUpdateCommand1 = new SqlCommand();
SqlUpdateCommand1.Connection = custConn;
// T-SQL takes no COLUMN keyword here, and its double-precision type is float
SqlUpdateCommand1.CommandText = "ALTER TABLE table ADD DISTRIBUTION float";
SqlUpdateCommand1.ExecuteNonQuery();

using (OleDbConnection accessConnect = new OleDbConnection())
{
   try
   {
      accessConnect.Open(); // open the data connection

      OleDbDataReader tableReader = (new OleDbCommand(sqlQuery, accessConnect)).ExecuteReader();

      StringBuilder query = new StringBuilder();
      int query_count = 0;

      while (tableReader.Read()) // true while there are more rows
      {
         alpha = Convert.ToDouble(tableReader.GetValue(0));
         beta = Convert.ToDouble(tableReader.GetValue(1));

         distribution = CalcUtility.BetaCumulativeDistribution(alpha, beta);

         query_count++;
         // "WHERE stuff" stands for a condition that identifies the current row
         query.Append("UPDATE table SET DISTRIBUTION = " + distribution.ToString() + " WHERE stuff;");

         if (query_count >= 1000)
         {
            // submit the accumulated statements in a single round trip
            SqlUpdateCommand1.CommandText = query.ToString();
            SqlUpdateCommand1.ExecuteNonQuery();
            query = new StringBuilder();
            query_count = 0;
         } // end if query_count
      } // while (tableReader.Read())

      // submit whatever is left over from the last partial batch
      if (query_count > 0)
      {
         SqlUpdateCommand1.CommandText = query.ToString();
         SqlUpdateCommand1.ExecuteNonQuery();
      }
   }
   finally
   {
      custConn.Close();
   }
}
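
The batching step can also be pulled out into a small helper that only builds the SQL text, which makes it easy to check without a database. What follows is a minimal sketch under stated assumptions, not the code used in this thread: the class `UpdateBatcher`, the method `BuildBatches`, the table `dbo.Samples` and an integer key column `Id` are all hypothetical names standing in for "table" and "WHERE stuff". It formats each value with the invariant culture, because a plain `ToString()` on a machine with a comma decimal separator would produce invalid SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class UpdateBatcher
{
    // Hypothetical schema: dbo.Samples with an integer key "Id" and a float column "DISTRIBUTION".
    // Yields SQL strings that each contain at most batchSize UPDATE statements.
    public static IEnumerable<string> BuildBatches(
        IEnumerable<KeyValuePair<int, double>> rows, int batchSize)
    {
        // Because this method is an iterator, the check runs on first enumeration.
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        StringBuilder batch = new StringBuilder();
        int count = 0;

        foreach (KeyValuePair<int, double> row in rows)
        {
            // "R" round-trips the double; InvariantCulture forces '.' as the decimal separator.
            // NaN and infinity are not valid SQL literals and are assumed not to occur.
            batch.Append("UPDATE dbo.Samples SET DISTRIBUTION = ")
                 .Append(row.Value.ToString("R", CultureInfo.InvariantCulture))
                 .Append(" WHERE Id = ")
                 .Append(row.Key.ToString(CultureInfo.InvariantCulture))
                 .Append(";\n");

            if (++count == batchSize)
            {
                yield return batch.ToString();
                batch.Length = 0; // reuse the builder for the next batch
                count = 0;
            }
        }

        // Flush the final partial batch, if any.
        if (count > 0) yield return batch.ToString();
    }
}
```

Each string it yields would be assigned to `CommandText` and run with `ExecuteNonQuery()`, exactly as in the loop above. Only numeric values are concatenated here; if the WHERE condition ever involved text, parameters would be the safer choice.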


Author Comment

by:nbb007
ID: 18794766
OK, good advice.  I am currently using SQL Server 2000, so the CLR route won't work for me.  I do like the batch approach, however.  SQL Server will execute a string of 1000 queries at once?  There is no way to use the current position of the tableReader object to update the particular row it currently refers to, is there?

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 2000 total points
ID: 18794795
>SQL will execute a string of 1000 queries at once?
Yes.

> There is no way to use the current position of the TableReader object to update the particular row it is currently refering to, is there?
There is, but it will either do what you are currently doing (update each row individually) or do what you are trying to avoid (i.e. read the entire data set at once...)

Expert Comment

by:bhushanvinay
ID: 20067468
Just looking at your problem, you could try something wild.

CREATE a datatable_target
data column a
data column b
data column c -- calculated.

while (datatable_source.read())
{
   add rows to the new table from your old table
   with any calculated value
}

You can also make this type-safe if you create an XSD and generate the table from it.

Don't know if it helps you.

Regards
Vinay
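
One possible reading of this suggestion, which also touches the original question about filling the column in bulk, is a staging table: write (key, value) pairs into a separate narrow table in chunks, then fill DISTRIBUTION with a single joined UPDATE on the server. The sketch below is only an illustration under assumptions. The names `StagingSketch`, `dbo.Samples`, `dbo.DistributionStaging` and `Id` are hypothetical, and it assumes the source table has an integer key.

```csharp
using System;
using System.Data;

public static class StagingSketch
{
    // Hypothetical staging table holding one (key, value) pair per source row.
    public const string CreateStagingSql =
        "CREATE TABLE dbo.DistributionStaging (" +
        "Id int NOT NULL PRIMARY KEY, Distribution float NOT NULL)";

    // One set-based statement copies every staged value into the real table.
    public const string ApplySql =
        "UPDATE s SET DISTRIBUTION = t.Distribution " +
        "FROM dbo.Samples AS s " +
        "INNER JOIN dbo.DistributionStaging AS t ON t.Id = s.Id";

    // Creates an empty in-memory chunk whose columns match the staging table.
    // Only one chunk is held in memory at a time, not the whole data set.
    public static DataTable NewStagingChunk()
    {
        DataTable chunk = new DataTable("DistributionStaging");
        chunk.Columns.Add("Id", typeof(int));
        chunk.Columns.Add("Distribution", typeof(double));
        return chunk;
    }
}
```

In the read loop, each computed value would be added with `chunk.Rows.Add(id, distribution)`. Once the chunk reaches a few thousand rows it would be sent to the staging table and emptied with `chunk.Clear()`. `SqlBulkCopy.WriteToServer(DataTable)` from .NET 2.0 is one candidate for that step, though whether it suits a given server version should be checked. After the last chunk, `ApplySql` runs once.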