Solved

Process Large Volume of Records

Posted on 2009-05-14
15
820 Views
Last Modified: 2013-11-08
I'm looking for a way to improve performance on inserting/updating a bunch of records in a SQL database.  I can't use any native SQL DTS or SSIS tools to get the data in there either.  So I'm stuck using C# to read in this CSV file and compare each record with that of what is in the SQL table, and if it's not there, insert it.  My code works, however, it's far from efficient.  I've tried a couple things, such as changing one of my foreach loops to a for loop and instead of using a List<>, I'm now using an array of objects.

Right now I have a CSV file that gets parsed into a datatable with over 150,000 records in it.  The SQL table, at this point, doesn't have anything in it, so it will insert all of these records which calls for over 150,000 insert statements.  Once they're all in there, this will get a new CSV file each week and compare all the records again and see if ANY property has changed, and if so, update that record, and again, if anything new is there, add that record, which should only be 1000 or so.

The problem is the amount of memory being used in this process.  Because of all the SQL calls, SQL is starting to jump up in the amount of memory it uses, and I let the app run for about 10 minutes yesterday, and it got up to about 600mb of ram usage and kept climbing.  I've even tried doing a GC.Collect after every 500 records or so and that doesn't seem to be helping either.

I've attached the code here to see if anyone can shed any light for some optimization or some techniques I can use for processing this many records.
private void ParseMovieData(DataTable dataTable)

	{           

 	    Movie[] movies = repository.FindAll().ToArray();

            int countCreated = 0;

            int countUpdated = 0;

            int countTotal = 0;

            int movieCount = movies.Length;

            Movie movie = null;

            DataRow row;

            bool update;
 

            do

            {

                row = dataTable.Rows[0];

                update = false;
 

                for (int i = 0; i < movieCount; i++)

                {

                    

                    movie = movies[i];
 

                    if (string.IsNullOrEmpty(row.ItemArray[0].ToString()) ||

                        row.ItemArray[0].ToString().ToUpper() != movie.Name.ToUpper()) continue;

                    update = true;

                    break;

                }
 

                if (update)

                {

                    UpdateMovie(repository, row, movie);

                    countUpdated++;

                    Console.Write("{0}: Added: {1}.  Updated: {2}.\r", DateTime.Now, countCreated, countUpdated);

                }

                else

                {

                    AddMovie(repository, row, new Movie());

                    countCreated++;

                    Console.Write("{0}: Added: {1}.  Updated: {2}.\r", DateTime.Now, countCreated, countUpdated);

                }

                

                dataTable.Rows.Remove(row);
 

                if (countTotal > 500)

                {

                    GC.Collect();

                    countTotal = 0;

                }
 

            } while (dataTable.Rows.Count > 0);
 

            UpdateLog(true, countCreated, countUpdated, "Update success.");

	}
 

	private static void AddMovie(IRepository<Movie> repository, DataRow row, Movie movie)

        {

            movie.Name = row.ItemArray[0].ToString();

            movie.Studio = row.ItemArray[1].ToString();
 

            if (!string.IsNullOrEmpty(row.ItemArray[2].ToString()))

            {

                movie.Released = DateTime.Parse(row.ItemArray[2].ToString());

            }

            else

            {

                movie.Released = null;

            }
 

            movie.Status = row.ItemArray[3].ToString();

            movie.Sound = row.ItemArray[4].ToString();

            movie.Versions = row.ItemArray[5].ToString();
 

            if (row.ItemArray[6] != null && !string.IsNullOrEmpty(row.ItemArray[6].ToString()))

            {

                movie.Price = Convert.ToDouble(row.ItemArray[6]);

            }

            else

            {

                movie.Price = null;

            }
 

            movie.Rating = row.ItemArray[7].ToString();

            movie.Year = row.ItemArray[8].ToString();

            movie.Genre = row.ItemArray[9].ToString();

            movie.Aspect = row.ItemArray[10].ToString();
 

            if (row.ItemArray[11] != null && !string.IsNullOrEmpty(row.ItemArray[11].ToString()))

            {

                movie.UPC = Convert.ToInt64(row.ItemArray[11]);

            }

            else

            {

                movie.UPC = null;

            }
 

            if (!string.IsNullOrEmpty(row.ItemArray[12].ToString()))

            {

                movie.DVDReleaseDate = DateTime.Parse(row.ItemArray[12].ToString());

            }

            else

            {

                movie.DVDReleaseDate = null;

            }
 

            if (row.ItemArray[13] != null && !string.IsNullOrEmpty(row.ItemArray[13].ToString()))

            {

                movie.Dbid = Convert.ToInt64(row.ItemArray[13]);

            }

            else

            {

                movie.Dbid = null;

            }
 

            if (!string.IsNullOrEmpty(row.ItemArray[14].ToString()))

            {

                movie.TimeStamp = DateTime.Parse(row.ItemArray[14].ToString());

            }

            else

            {

                movie.TimeStamp = null;

            }
 

            repository.Create(movie);

        }
 

	private static void UpdateMovie(IRepository<Movie> repository, DataRow row, Movie movie)

        {

            bool update = false;
 

            #region Set Properties
 

            #region Set Name
 

            if ((!string.IsNullOrEmpty((string)row.ItemArray[0])) && movie.Name != (string)row.ItemArray[0])

            {

                movie.Name = (string)row.ItemArray[0];

                update = true;

            }
 

            #endregion
 

            #region Set Studio
 

            if (movie.Studio != GetStringToUpdate(row.ItemArray[1], movie.Studio))

            {

                movie.Studio = GetStringToUpdate(row.ItemArray[1], movie.Studio);

                update = true;

            }
 

            #endregion
 

            #region Set Released Date
 

            if (movie.Released != GetDateToUpdate(row.ItemArray[2], movie.Released))

            {

                movie.Released = GetDateToUpdate(row.ItemArray[2], movie.Released);

                update = true;

            }
 

            #endregion
 

            #region Set Status
 

            if (movie.Status != GetStringToUpdate(row.ItemArray[3], movie.Status))

            {

                movie.Status = GetStringToUpdate(row.ItemArray[3], movie.Status);

                update = true;

            }
 

            #endregion
 

            #region Set Sound
 

            if (movie.Sound != GetStringToUpdate(row.ItemArray[4], movie.Sound))

            {

                movie.Sound = GetStringToUpdate(row.ItemArray[4], movie.Sound);

                update = true;

            }
 

            #endregion
 

            #region Set Versions
 

            if (movie.Versions != GetStringToUpdate(row.ItemArray[5], movie.Versions))

            {

                movie.Versions = GetStringToUpdate(row.ItemArray[5], movie.Versions);

                update = true;

            }
 

            #endregion
 

            #region Set Price
 

            if (movie.Price != GetDoubleToUpdate(row.ItemArray[6], movie.Price))

            {

                movie.Price = GetDoubleToUpdate(row.ItemArray[6], movie.Price);

                update = true;

            }
 

            #endregion
 

            #region Set Rating
 

            if (movie.Rating != GetStringToUpdate(row.ItemArray[7], movie.Rating))

            {

                movie.Rating = GetStringToUpdate(row.ItemArray[7], movie.Rating);

                update = true;

            }
 

            #endregion
 

            #region Set Year
 

            if (movie.Year != GetStringToUpdate(row.ItemArray[8], movie.Year))

            {

                movie.Year = GetStringToUpdate(row.ItemArray[8], movie.Year);

                update = true;

            }
 

            #endregion
 

            #region Set Genre
 

            if (movie.Genre != GetStringToUpdate(row.ItemArray[9], movie.Genre))

            {

                movie.Genre = GetStringToUpdate(row.ItemArray[9], movie.Genre);

                update = true;

            }
 

            #endregion
 

            #region Set Aspect
 

            if (movie.Aspect != GetStringToUpdate(row.ItemArray[10], movie.Aspect))

            {

                movie.Aspect = GetStringToUpdate(row.ItemArray[10], movie.Aspect);

                update = true;

            }
 

            #endregion
 

            #region Set UPC
 

            if (movie.UPC != GetLongToUpdate(row.ItemArray[11], movie.UPC))

            {

                movie.UPC = GetLongToUpdate(row.ItemArray[11], movie.UPC);

                update = true;

            }
 

            #endregion
 

            #region Set DVDReleaseDate
 

            if (movie.DVDReleaseDate != GetDateToUpdate(row.ItemArray[12], movie.DVDReleaseDate))

            {

                movie.DVDReleaseDate = GetDateToUpdate(row.ItemArray[12], movie.DVDReleaseDate);

                update = true;

            }
 

            #endregion
 

            #region Set DBID
 

            if (movie.Dbid != GetLongToUpdate(row.ItemArray[13], movie.Dbid))

            {

                movie.Dbid = GetLongToUpdate(row.ItemArray[13], movie.Dbid);

                update = true;

            }
 

            #endregion
 

            #region Set TimeStamp
 

            if (movie.TimeStamp != GetDateToUpdate(row.ItemArray[14], movie.TimeStamp))

            {

                movie.TimeStamp = GetDateToUpdate(row.ItemArray[14], movie.TimeStamp);

                update = true;

            }
 

            #endregion
 

            #endregion
 

            if (update)

            {

                repository.Update(movie);

            }

        }
 

	private static long? GetLongToUpdate(object item, long? property)

        {

            if (item != null)

            {

                if (property != Convert.ToInt64(item))

                {

                    return Convert.ToInt64(item);

                }

            }

            else

            {

                return null;

            }
 

            return property;

        }
 

        private static double? GetDoubleToUpdate(object item, double? property)

        {

            if (item != null)

            {

                if (property != Convert.ToDouble(item))

                {

                    return Convert.ToDouble(item);

                }

            }

            else

            {

                return null;

            }
 

            return property;

        }
 

        private static string GetStringToUpdate(object item, string property)

        {

            if (item != null)

            {

                if (property != (string)item)

                {

                    return (string)item;

                }

            }

            else

            {

                return null;

            }
 

            return property;

        }
 

        private static DateTime? GetDateToUpdate(object item, DateTime? property)

        {

            if (item != null && !string.IsNullOrEmpty(item.ToString()))

            {

                if (property != null)

                {

                    if (property != DateTime.Parse(item.ToString()))

                    {

                        return DateTime.Parse(item.ToString());

                    }

                }

                else

                {

                    return DateTime.Parse(item.ToString());

                }

            }

            else

            {

                return null;

            }
 

            return property;

        }

Open in new window

0
Comment
Question by:Phreak3eb
  • 7
  • 4
  • 2
  • +2
15 Comments
 
LVL 9

Expert Comment

by:Gorkem Yuksel
ID: 24387230
Hi Phreak3eb,

I do something similar in an application I created.  To make it efficient and really fast.. I grab the CSV, parse it into a datatable object, then do a BulkCopy into the SQL server into a temporary table.  From there, it is fairly easy to write a stored procedure to check if it exists, or requires updating...  You can cut down your processing time and memory usuage by about 80-90%.

Cheers,

G.
0
 
LVL 9

Expert Comment

by:tculler
ID: 24387312
Well, I'd need more details to give a really sophisticated explanation, but this should help (as long as the file is line-delimited. I know you're saying CSV, but usually they're line-delimited anyway)


private void LoadValuesExample(System.Data.IDbConnection myConnection)

{

	// Keep adding {N} to formatting args for every field you need

	const System.String INSERT_INTO = "INSERT INTO tableName VALUES({0}, {1});";
 

	using(System.Data.IDbCommand cmd = myConnection.CreateCommand())

	using(System.IO.StreamReader reader = new System.IO.StreamReader(System.IO.File.OpenRead("Path.csv")))

	{

		while(!reader.EndOfStream)

		{

			// Make sure ReadLine() calls match number of formatting args

			cmd.CommandText = System.String.Format(INSERT_INTO, reader.ReadLine(), reader.ReadLine());

			cmd.ExecuteNonQuery();

		}

	}

}

Open in new window

0
 
LVL 22

Expert Comment

by:dportas
ID: 24388448
Can you use BULK INSERT for this? 150,000 rows is not large . . . unless you try to do silly things like inserting it one row at a time using C#!
0
 
LVL 22

Accepted Solution

by:
dportas earned 450 total points
ID: 24388531
Better still, use OPENROWSET. You could do the whole thing with one INSERT statement:

INSERT INTO YourTable (col1, col2, col3)
SELECT col1, col2, col3
FROM OPENROWSET( BULK 'C:\File01.txt',
 FORMATFILE = 'c:\file01.xml') AS t;

0
 

Author Comment

by:Phreak3eb
ID: 24391518
gyuksel:

Could you explain what you mean by or how to do a "BulkCopy"?
0
 
LVL 26

Assisted Solution

by:Anurag Thakur
Anurag Thakur earned 50 total points
ID: 24391624
if i have to do what you are trying to achieve i will be using openrowset as suggested by dportas
what i will do is that first i will import the data to a temporary table and then put in my logic to add data that does not exist in the main table and in the end empty the temp table and this logic will continue again and again

some links for reference purposes
http://support.microsoft.com/kb/321686
http://www.tech-archive.net/Archive/SQL-Server/microsoft.public.sqlserver/2007-05/msg00017.html
0
 

Author Comment

by:Phreak3eb
ID: 24399147
Ok. I've been able to get the data from the CSV file into a staging SQL table using OPENROWSET as suggested by dportas and raqi0017.  It works really well.  Now I'm just left with the problem of updating the records in the main table based on changes to fields in the staging table AND inserting new records from staging table into main table.

Can you guys give me a quick example of how I could use a stored proc that would compare the fields and update as needed and insert records from staging that are new into the main table?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 22

Assisted Solution

by:dportas
dportas earned 450 total points
ID: 24399238
0
 

Author Comment

by:Phreak3eb
ID: 24399575
Ok. I'm trying to use MERGE, but when I use the attached code in a query, I'm getting:

Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'MERGE'.
Msg 137, Level 15, State 2, Line 5
Must declare the scalar variable "@Dbid".

And this was taken directly off the technet link you gave me.  So I'm not sure why I'm getting a syntax error.
BEGIN

    SET NOCOUNT ON;

           

    MERGE Movies.Dbid AS target

    USING (SELECT @Dbid, @Status, @Studio, @Released, @Sound, @Versions, @Price, @Rating, @Year, @Genre, @Aspect, @UPC, @DVDReleaseDate, @TimeStamp, @name) AS source (Dbid, Status, Studio, Released, Sound, Versions, Price, Rating, Year, Genre, Aspect, UPC, DVDReleaseDate, TimeStamp, Name)

    ON (target.Dbid = source.Dbid)

    WHEN MATCHED THEN 

        UPDATE SET	Name = source.Name,

					Dbid = source.Dbid,

					Status = source.Status,

					Released = source.Release,

					Sound = source.Sound,

					Versions = source.Versions,

					Price = source.Price,

					Rating = source.Rating,

					Year = source.Year,

					Genre = source.Genre,

					Aspect = source.Aspect,

					UPC = source.UPC,

					DVDReleaseDate = source.DVDReleaseDate,

					TimeStamp = source.TimeStamp					

	WHEN NOT MATCHED THEN	

	    INSERT (Dbid, Status, Studio, Released, Sound, Versions, Price, Rating, Year, Genre, Aspect, UPC, DVDReleaseDate, TimeStamp, Name)

	    VALUES (source.Dbid, source.Status, source.Studio, source.Released, source.Sound, source.Versions, source.Price, source.Rating, source.Year, source.Genre, source.Aspect, source.UPC, source.DVDReleaseDate, source.TimeStamp, source.Name)

	    --OUTPUT deleted.*, $action, inserted.* INTO #MyTempTable;

END;

Open in new window

0
 
LVL 22

Expert Comment

by:dportas
ID: 24399820
You posted in the SQL Server 2008 zone, are you sure you are using SQL Server 2008? MERGE isn't supported in earlier versions.

It also seems you haven't declared all your variables.
0
 

Author Comment

by:Phreak3eb
ID: 24400040
Yeah. I realized that my dev box is using SQL 2005.  I'll try the 2008 box later.
0
 
LVL 26

Expert Comment

by:Anurag Thakur
ID: 24401213
i will just achieve the same functionality by using the select statements
something like this
insert into tablename (colnames........)
    select colnamens.....
    from temptable
      where condition to exclude the records and find unique ones
0
 

Author Comment

by:Phreak3eb
ID: 24433929
Ok. I got the MERGE statement to work... kind of.  If there aren't rows in the target table that match to the source table, it inserts them as it should.  However, if I change the Studio field for instance in the source table, it won't update it in the target table.  Here is the exact MERGE statement I'm using.  (Slightly modified from the one I made in a previous post).
MERGE INTO [Movies] As [target]

    USING (SELECT [Dbid], [Status], Studio, Released, Sound, Versions, Price, Rating, [Year], Genre, Aspect, UPC, DVDReleaseDate, [TimeStamp], Name FROM MoviesStaging) AS source

    ON ([target].[Dbid] = source.[Dbid])

    WHEN MATCHED THEN 

        UPDATE SET	Name = source.Name,

			[Dbid] = source.[Dbid],

			[Status] = source.[Status],

			Released = source.Released,

			Sound = source.Sound,

			Versions = source.Versions,

			Price = source.Price,

			Rating = source.Rating,

			[Year] = source.[Year],

			Genre = source.Genre,

			Aspect = source.Aspect,

			UPC = source.UPC,

			DVDReleaseDate = source.DVDReleaseDate,

			[TimeStamp] = source.[TimeStamp]					

	WHEN NOT MATCHED THEN	

	    INSERT ([Dbid], [Status], Studio, Released, Sound, Versions, Price, Rating, [Year], Genre, Aspect, UPC, DVDReleaseDate, [TimeStamp], Name)

	    VALUES (source.[Dbid], source.[Status], source.Studio, source.Released, source.Sound, source.Versions, source.Price, source.Rating, source.[Year], source.Genre, source.Aspect, source.UPC, source.DVDReleaseDate, source.[TimeStamp], source.Name);

Open in new window

0
 

Author Comment

by:Phreak3eb
ID: 24433956
Nevermind. I just realized I didn't have the Studio field in my update statement.  Everything is working fine.  Thank you for your help.
0
 

Author Closing Comment

by:Phreak3eb
ID: 31581509
50 to raqi0017 for providing links to examples I used to get the final OPENROWSET statement to work correctly.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
This video demonstrates how to create an example email signature rule for a department in a company using CodeTwo Exchange Rules. The signature will be inserted beneath users' latest emails in conversations and will be displayed in users' Sent Items…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now