  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1821

Foreach loop very slow

This application reads data from a huge file and then, after doing some calculations, stores the data in another table.

But the loop doing this (see below) is taking a really long time. Since the file sometimes contains thousands of records, the entire process takes days.

Can I replace this foreach loop with something else? I tried using Parallel.ForEach and it did help. I am new to this, so I would appreciate your help.


using (command = new SqlCommand("[sp]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;

    // One stored procedure call per record in the report.
    foreach (record someRecord in someReport.)          // collection name elided
    {
        command.Parameters.Clear();
        command.Parameters.Add(....)                    // parameters elided
        command.Prepare();

        using (dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                if ()                                   // condition elided
                {
                    // log something
                }
                else if ()                              // condition elided
                {
                    // log something
                }
            }
        }
    }
}
mkrchAsked:
1 Solution
 
sachitjainCommented:
It looks like a batch-update problem. The .NET code above is not the optimal choice for solving it, because each and every record is pulled over the database connection individually. Rather, try doing it this way:

1. Create one stored procedure that implements your if-log-something-else-log-something logic for each of the records in the SomeReport object.
2. The challenge would then be how the collection of records gets accessed by the SQL Server stored procedure. There are two ways: if these records come from some external file, export them directly into a staging table in the DB itself; if they are created/updated through application logic only, generate an XML document containing all the records and pass that XML into the stored procedure. Within the stored procedure you can then iterate through each record in the XML and do the logging.
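
For illustration only, a minimal T-SQL sketch of the XML variant (using the xml type rather than varchar for simplicity); the procedure name, staging table, and element names are assumptions, not taken from this thread:

-- Hypothetical sketch: shred an XML parameter into a staging table,
-- then do the conditional logging with set-based SQL instead of
-- one round trip per record.
CREATE PROCEDURE dbo.ProcessReportXml
    @records XML  -- e.g. <Records><Record Id="1" Value="abc"/></Records>
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.ReportStaging (Id, Value)
    SELECT r.value('@Id',    'INT'),
           r.value('@Value', 'VARCHAR(100)')
    FROM   @records.nodes('/Records/Record') AS x(r);

    -- From here on, the if/else logging can run as set-based queries
    -- against dbo.ReportStaging.
END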
 
Alan WarrenCommented:
Make sure your sp runs slick in SQL Server Management Studio before looking for problems in the .NET code.

How long does it take to execute the stored procedure "sp" using SQL Server Management Studio?

If it takes a long time to execute in SSMS, you can choose the option to display the execution plan (Ctrl+L) when you execute the sp in an SSMS query window; the plan often suggests optimisation actions. Often there are missing index details, which you can get by right-clicking in the background of the Execution Plan window and choosing "Missing Index Details...".

Might want to do the same for your "log something" SQL too.

Alan
 
DcpKingCommented:
As I understand you, you want to read in some data from a flat file (like a .csv or similar text file), fix up problems in the data, and stick it into a table.

In your place I'd use BULK INSERT in some SQL code to pull the info into a staging table, and then sort things out there. The bulk insert subsystem has been around quite a while and it's really quick; .NET really isn't designed for pulling millions of records in at a time!

If you're put off by the idea of BULK INSERT, then go take a look at Pinal Dave's example - it really can be that easy!
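
A minimal sketch of what that could look like, assuming a comma-delimited file with a header row; the table, columns and path are illustrative only:

-- Hypothetical sketch: load the whole flat file into a staging table
-- in one operation, then clean the data up with set-based SQL.
CREATE TABLE dbo.ReportStaging
(
    Mdn       VARCHAR(20),
    SomeDate  DATETIME,
    SomeValue BIGINT
);

BULK INSERT dbo.ReportStaging
FROM 'C:\data\huge_file.csv'   -- path is an assumption
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);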

hth

Mike
 
Olaf DoschkeSoftware DeveloperCommented:
As others have pointed out, this is really a server-side job: you pull data through the LAN only to make a small decision about what to log. You have also redacted the only other part of that inner loop, besides the sp execution, which could consume a reasonable amount of time: your if and else-if conditions. If they call into something slow, of course the loop takes long. If they are merely simple conditions that you could also put into a T-SQL WHERE clause, it would be faster to do so and not pull the data to the client side at all.

You don't just have a loop, you have two nested loops: for each record of the report you execute an sp and loop over all its result records via while (dr.Read()).

Staying on the .NET side, there is one big potential for making this faster: command.Prepare();

Preparing a command is very good for executing a command repeatedly, but it must be done once for all calls, and you currently prepare on every single call, which makes the preparation useless. It's meant to be done once for many similar calls.

I assume only the parameter values change from call to call, not the parameter number, names, types or anything else. So do the Prepare outside of the loop, and in the loop just change the parameter values.
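
A minimal C# sketch of that change, reusing the placeholder names from the question; the parameter name, type and the .Records collection are illustrative assumptions:

using (var command = new SqlCommand("[sp]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;

    // Add parameters and prepare ONCE, before the loop.
    // Prepare() requires explicit types and sizes on the parameters.
    command.Parameters.Add("@p1", SqlDbType.VarChar, 20);  // illustrative parameter
    command.Prepare();

    foreach (record someRecord in someReport.Records)      // collection name assumed
    {
        // Inside the loop only the parameter VALUES change.
        command.Parameters["@p1"].Value = someRecord.SomeValue;  // illustrative

        using (var dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                // log something, as in the original
            }
        }
    }
}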

There's not much potential in preparing an sp call anyway; it's just an EXEC of the sp name with parameters, and the code inside the sp is already prepared, precompiled code by virtue of being an sp. So don't expect much from this change alone, but preparing once per command execution misses the meaning of Prepare.

Anything else strongly depends on the sp and how long each execution takes. A loop itself is never slow; it's just a multiplier of how long a single iteration needs. So optimize the sp called in each iteration in the first place, and if that doesn't suffice, see whether you can put the whole loop inside SQL Server to get rid of the LAN bottleneck through which you pull data.

The answer to why the program takes so long most probably lies in the code of the sp.

Bye, Olaf.
 
mkrchAuthor Commented:
Execution plan
I ran the stored procedure and it does seem to be slow. I am attaching the execution plan. Basically we are doing updates or inserts in the procedure. Let me know how I can fix this.
Thanks
 
mkrchAuthor Commented:
@alanwarrenP I understand that bulk inserts at the database level would be much easier, but we cannot implement that at this time, so I'll have to find a solution in .NET itself.

Thank you
 
Alan WarrenCommented:
Hi mkrch,

the execution plan snapshot doesn't appear to offer much scope in the way of suggested optimisation. Wondering about the @data param: a name like that usually indicates varchar(max) or varbinary. Are we passing Binary Large Object (BLOB) data up the tube here?

Alan
 
Alan WarrenCommented:
Hi mkrch,

Do all the records in the huge file need to be migrated to the db, or only the ones that have changed or don't yet exist in the db?

We are currently initiating a log something action for each and every record.

Wondering if we could parse the huge file, find the records which are new and the records that are modified, dump the rest of them and only send the ones we need up the tube.

It may be a case of updating records that already exist with the same data we already have, in which case we don't need to send them.

If we could weed those records out prior to invoking logging, the process that takes days may be reduced to hours or even minutes.

Does your huge file have a timestamp field, and does your destination table have a timestamp field? If so, we could compare the timestamp and the length of the @data param; if both match, there's a good chance we don't need to send that record, because we already have it.
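
A hedged sketch of that filter, assuming the file has already been loaded into a staging table and both tables carry a timestamp-like ModifiedDate column; all names here are illustrative:

-- Hypothetical sketch: select only the rows that are new or changed,
-- so unchanged records never go up the tube.
SELECT s.*
FROM dbo.ReportStaging AS s
LEFT JOIN dbo.Destination AS d
       ON d.Id = s.Id
WHERE d.Id IS NULL                       -- new record
   OR d.ModifiedDate <> s.ModifiedDate;  -- changed record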

Alan
 
Olaf DoschkeSoftware DeveloperCommented:
>Basically we are doing updates or inserts in the procedure.

Could you simply post the code? The execution plan is not unusual in the case of an insert; the update of a clustered index takes the main time, and there's nothing to optimize there. It's executing each insert separately, which takes much longer than doing a bulk operation; that's the culprit.

>we cannot implement this at this time
Why?

There's another thing the sp must do, it must return records, otherwise your inner while loop would not make any sense:

using (dr = command.ExecuteReader())
{
    while (dr.Read())
    {
        ...
    }
    ....
}

command is the call of the sp, and since you use ExecuteReader, you read something back.

What is the overall operation, anyway? Is it as Alan guesses: you're updating your data from a file which contains old, updated and new data? Then the best thing to do is to insert all the data into a temp or staging table and check for old, updated and new from there, from the inside.

You're sending in all the data one record at a time anyway, and this is not the way to go with huge amounts of data. If you need to process huge amounts of data, and even more so process it against another huge amount of data, then it's better to put the data together in one place to enable bulk processing.

Bye, Olaf.
 
mkrchAuthor Commented:
Yes, Alan is correct here. We are reading data from a flat file (a CSV file) and then calculating total values. We are saving the information in a List<T> and then executing the stored procedure, which checks for 3 different conditions. If the conditions are true, we update 3 tables; if false, we insert new values.
The stored procedure returns error codes, and the following loop updates a log file based on the results of the stored procedure. You are correct when you say that each separate insert is the root of the problem.
while (dr.Read())
{
    ...
}

Can you send me an example of how I can insert all the data into a temp or staging table and check for old, updated and new from there, from the inside?

Thanks again, I appreciate your help.
I have to use the .NET application because management does not wish to go that route. I have presented the idea and they are against it.
 
DcpKingCommented:
Write a stored procedure that pulls in the file using BULK INSERT and then invoke that from a .NET call? The way you're going now is doomed to take a very long time!
 
Alan WarrenCommented:
Hi mkrch,

Re: "The stored procedure return errorcodes , and the following loop , updates a log file based on the results of the stored procedure."

Wondering what sort of errors you are returning?

Obviously reducing the error incidence would be a gain.

However, should an error occur, the stored procedure could be modified to append the error details to the error_log_table directly, since we are already there, rather than returning the error codes to the app and then initiating another round trip to the SQL server.

Any chance of taking a peek at the sproc that returns the error codes and the sproc that appends errors to the error log?

Alan
 
Olaf DoschkeSoftware DeveloperCommented:
DcpKing has already said, in short, what to do with the flat file to get it read into a staging table. Even shorter: do a BULK INSERT.

The part about "check for old, updated and new from there" is easy if the staging table contains data shaped like the database table you want to merge it with. If, as in your case, you "are updating 3 tables", then it seems a bit more complicated; you'd have to explain that a bit more.

I'll illustrate the simpler case. Assume you have a customers table and want to update it from a flat file containing updated customer info and new customers. Assuming that flat file has already been read in via BULK INSERT into a table customersStaging, you do a MERGE afterwards:

MERGE customers AS target
USING (SELECT * FROM customersStaging) AS source (id, name)
ON (target.id = source.id)
WHEN MATCHED THEN
    UPDATE SET name = source.name
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (source.id, source.name);

That updates existing customers and inserts new ones in one go. Take a look in Books Online about MERGE; it also has an OUTPUT clause to log the actions done on the data, e.g. the changes applied.
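
For illustration, the same sketch extended with an OUTPUT clause; the log table dbo.MergeLog and its columns are assumptions:

-- Hypothetical sketch: let MERGE log what it did, in the same statement.
MERGE customers AS target
USING (SELECT * FROM customersStaging) AS source (id, name)
ON (target.id = source.id)
WHEN MATCHED THEN
    UPDATE SET name = source.name
WHEN NOT MATCHED THEN
    INSERT (id, name)
    VALUES (source.id, source.name)
OUTPUT $action, inserted.id INTO dbo.MergeLog (action_taken, customer_id);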

This is possible inside SQL and only inside; you need to read in the flat file first, as MERGE can't take a flat file as its source. But if you go that way, such a merge of new data into the database is one command only, including logging output of what's been done.

You can even make use of this in more complicated cases, where you want to merge sales data into a stock table, for example. Take a look at the help, it has samples.

Bye, Olaf.
 
mkrchAuthor Commented:
@Olaf_Doschke  
Merge statement sounds like a great idea. I have too many if exists clauses. Can I convert those into merge? Secondly, the sp updates or inserts in 3 tables. Can I do this using one single merge statement, or will I need to implement 3 separate statements?

thanks
 
Olaf DoschkeSoftware DeveloperCommented:
As MERGE defines one target, it seems you need 3 merges, yes. But the good news is that it will most probably make a nice replacement for the if exists subqueries, as you can do quite complex queries for the source and also use more complex matching conditions than just a matching id.

Bye, Olaf.
 
DcpKingCommented:
Thanks, Olaf :)
 
mkrchAuthor Commented:
Let me try and get back to you. Thanks
 
mkrchAuthor Commented:
Here is the sproc. I converted this into 3 merge statements and am getting more errors.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER PROCEDURE [dbo].[proc]

    (
        @variable1                  VARCHAR(20),
        @variable2                  DateTime,
        @variable3                  BIGINT,
        @variable4                  INT,      
        @variable5                  BIGINT,
        @variable6                  INT,
        @variable7                  DateTime
    ) AS
BEGIN

Declare @VID INT
Declare @NowDate DateTime
Declare @VariableDate Date

Declare @SUCCESS_RETURN_VALUE INT
Declare @ERROR_Condition1_RETURN INT
Declare @ERROR_Condition12_RETURN INT

Set @SUCCESS_RETURN_VALUE = 0
Set @ERROR_Condition1_RETURN = 2
Set @ERROR_Condition12_RETURN = 1

select @VID = db_id, @VariableDate = convert(date, some_date) from tbl_devices where db_device_mdn = @variable1

if(@VID is not null and convert(date, @variable2) >= @VariableDate)

Begin
    if not exists(select 1 from Table1 where id = @VID)
    Begin
        Insert into Table1 values(...............)
    End
    Else
    Begin
        update Table1 set a = @variable7, b = 0, c=
        where id = @VID
    End

    if not exists(select 1 from Table2 where id = @VID)
    Begin
        insert into Table2 values (.............)
    End
    Else
    Begin
        update Table2 set ----------------------
            where id = @VID
    End
      
    if exists(select 1 from Table3 where device_id = @VID and somedate = @variable2)
    Begin
        update Table3 set ----------------
    End
    else
    Begin
        insert into Table3 values (---)
    End

    select @SUCCESS_RETURN_VALUE as [ErrorCode]
End
else if(@VID is not null and convert(date, @variable2) < @VariableDate)
Begin
      select @ERROR_Condition1_RETURN as [ErrorCode]
End
Else
Begin
      select @ERROR_Condition12_RETURN as [ErrorCode]
End

end

I will post the merge statement in a few mins
 
Alan WarrenCommented:
Hi mkrch,

Rather than returning the error code to the app, or prior to returning it, you could invoke the error_report stored procedure(s) from this stored procedure.
That would save you a round trip, or maybe a couple of round trips, to the server, and you wouldn't need the "log something" calls in your originally posted C# code.
Begin
      select @ERROR_Condition1_RETURN as [ErrorCode]
      EXEC dbo.Report_Error @ERROR_Condition1_RETURN
End
Else
Begin
      select @ERROR_Condition12_RETURN as [ErrorCode]
      EXEC dbo.Report_Error @ERROR_Condition12_RETURN
End

Alan
 
mkrchAuthor Commented:
Merge statement did not resolve the issue. Right now I am trying out table-valued parameters: I copied my list into a DataTable and am passing it as a table-valued parameter to the stored procedure. I have used aliases as I go, but for some reason I am getting
Invalid object name '@TableVar'. Below is my stored procedure. Please help!


Alter PROCEDURE [dbo].[proc]
    (
        @TableVar [dbo].[TableVar] READONLY
    )
AS
BEGIN
    Set nocount on
    Declare @MYID INT
    Declare @SOMEDate DateTime
    Declare @NEWDate Date

    select @MYID = some_id,
           @SOMEDate = convert(date, db_some_date)
    from tbl_some
    where someval = (select tb.val1 from [@TableVar] as tb);

    Merge DeviceBillCycle as target
    using (Select x.val1, x.val2, x.val3, x.val4,
                  x.val5, x.val6, x.val7 from [@TableVar] as x)
          as Source
    on val_id = @MYID
    when matched and @MYID != NULL then
        update set MDN  = Source.[MDN],
                   val1 = Source.[val1],
                   val2 = Source.[val2],
                   val3 = Source.[val3],
                   val4 = Source.[val4],
                   val5 = Source.[val5],
                   val6 = Source.[val6]
    when not matched and @MYID != NULL then
        insert
        values (Source.[val1],
                Source.[val2],
                Source.[val3],
                Source.[val4],
                Source.[val5],
                Source.[val6],
                Source.[val7],
                [@MYID]);
end
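
For what it's worth, the likely cause of the error as posted: square brackets make SQL Server treat [@TableVar] as a regular object name rather than the table-valued parameter, which produces exactly this "Invalid object name '@TableVar'" message; also, @MYID != NULL never evaluates to true under default ANSI_NULLS. A minimal corrected sketch of those two spots, keeping the posted names (the insert column list is an assumption):

-- Sketch: reference the table-valued parameter without brackets,
-- and test for NULL with IS NOT NULL instead of != NULL.
select @MYID = some_id,
       @SOMEDate = convert(date, db_some_date)
from tbl_some
where someval = (select tb.val1 from @TableVar as tb);  -- no brackets

Merge DeviceBillCycle as target
using (Select x.val1, x.val2, x.val3, x.val4,
              x.val5, x.val6, x.val7 from @TableVar as x)  -- no brackets
      as Source
on target.val_id = @MYID
when matched and @MYID is not null then
    update set val1 = Source.val1  -- remaining columns as in the original
when not matched and @MYID is not null then
    insert (val1, val2, val3, val4, val5, val6, val7, some_id)  -- column list assumed
    values (Source.val1, Source.val2, Source.val3, Source.val4,
            Source.val5, Source.val6, Source.val7, @MYID);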
 
mkrchAuthor Commented:
I forgot to add: I am planning to use this merge statement inside a CTE or a cursor so that I can loop within the server and not go back to .NET again and again. Let me know what you think about this approach. Thank you so much for all your help so far.
