Process Large Volume of Records

I'm looking for a way to improve performance on inserting/updating a bunch of records in a SQL database.  I can't use any native SQL DTS or SSIS tools to get the data in there either.  So I'm stuck using C# to read in this CSV file and compare each record with that of what is in the SQL table, and if it's not there, insert it.  My code works, however, it's far from efficient.  I've tried a couple things, such as changing one of my foreach loops to a for loop and instead of using a List<>, I'm now using an array of objects.

Right now I have a CSV file that gets parsed into a datatable with over 150,000 records in it.  The SQL table, at this point, doesn't have anything in it, so it will insert all of these records which calls for over 150,000 insert statements.  Once they're all in there, this will get a new CSV file each week and compare all the records again and see if ANY property has changed, and if so, update that record, and again, if anything new is there, add that record, which should only be 1000 or so.

The problem is the amount of memory being used in this process.  Because of all the SQL calls, SQL is starting to jump up in the amount of memory it uses, and I let the app run for about 10 minutes yesterday, and it got up to about 600mb of ram usage and kept climbing.  I've even tried doing a GC.Collect after every 500 records or so and that doesn't seem to be helping either.

I've attached the code here to see if anyone can shed any light for some optimization or some techniques I can use for processing this many records.
private void ParseMovieData(DataTable dataTable)
	{           
 	    Movie[] movies = repository.FindAll().ToArray();
            int countCreated = 0;
            int countUpdated = 0;
            int countTotal = 0;
            int movieCount = movies.Length;
            Movie movie = null;
            DataRow row;
            bool update;
 
            do
            {
                row = dataTable.Rows[0];
                update = false;
 
                for (int i = 0; i < movieCount; i++)
                {
                    
                    movie = movies[i];
 
                    if (string.IsNullOrEmpty(row.ItemArray[0].ToString()) ||
                        row.ItemArray[0].ToString().ToUpper() != movie.Name.ToUpper()) continue;
                    update = true;
                    break;
                }
 
                if (update)
                {
                    UpdateMovie(repository, row, movie);
                    countUpdated++;
                    Console.Write("{0}: Added: {1}.  Updated: {2}.\r", DateTime.Now, countCreated, countUpdated);
                }
                else
                {
                    AddMovie(repository, row, new Movie());
                    countCreated++;
                    Console.Write("{0}: Added: {1}.  Updated: {2}.\r", DateTime.Now, countCreated, countUpdated);
                }
                
                dataTable.Rows.Remove(row);
 
                if (countTotal > 500)
                {
                    GC.Collect();
                    countTotal = 0;
                }
 
            } while (dataTable.Rows.Count > 0);
 
            UpdateLog(true, countCreated, countUpdated, "Update success.");
	}
 
	private static void AddMovie(IRepository<Movie> repository, DataRow row, Movie movie)
        {
            movie.Name = row.ItemArray[0].ToString();
            movie.Studio = row.ItemArray[1].ToString();
 
            if (!string.IsNullOrEmpty(row.ItemArray[2].ToString()))
            {
                movie.Released = DateTime.Parse(row.ItemArray[2].ToString());
            }
            else
            {
                movie.Released = null;
            }
 
            movie.Status = row.ItemArray[3].ToString();
            movie.Sound = row.ItemArray[4].ToString();
            movie.Versions = row.ItemArray[5].ToString();
 
            if (row.ItemArray[6] != null && !string.IsNullOrEmpty(row.ItemArray[6].ToString()))
            {
                movie.Price = Convert.ToDouble(row.ItemArray[6]);
            }
            else
            {
                movie.Price = null;
            }
 
            movie.Rating = row.ItemArray[7].ToString();
            movie.Year = row.ItemArray[8].ToString();
            movie.Genre = row.ItemArray[9].ToString();
            movie.Aspect = row.ItemArray[10].ToString();
 
            if (row.ItemArray[11] != null && !string.IsNullOrEmpty(row.ItemArray[11].ToString()))
            {
                movie.UPC = Convert.ToInt64(row.ItemArray[11]);
            }
            else
            {
                movie.UPC = null;
            }
 
            if (!string.IsNullOrEmpty(row.ItemArray[12].ToString()))
            {
                movie.DVDReleaseDate = DateTime.Parse(row.ItemArray[12].ToString());
            }
            else
            {
                movie.DVDReleaseDate = null;
            }
 
            if (row.ItemArray[13] != null && !string.IsNullOrEmpty(row.ItemArray[13].ToString()))
            {
                movie.Dbid = Convert.ToInt64(row.ItemArray[13]);
            }
            else
            {
                movie.Dbid = null;
            }
 
            if (!string.IsNullOrEmpty(row.ItemArray[14].ToString()))
            {
                movie.TimeStamp = DateTime.Parse(row.ItemArray[14].ToString());
            }
            else
            {
                movie.TimeStamp = null;
            }
 
            repository.Create(movie);
        }
 
	private static void UpdateMovie(IRepository<Movie> repository, DataRow row, Movie movie)
        {
            bool update = false;
 
            #region Set Properties
 
            #region Set Name
 
            if ((!string.IsNullOrEmpty((string)row.ItemArray[0])) && movie.Name != (string)row.ItemArray[0])
            {
                movie.Name = (string)row.ItemArray[0];
                update = true;
            }
 
            #endregion
 
            #region Set Studio
 
            if (movie.Studio != GetStringToUpdate(row.ItemArray[1], movie.Studio))
            {
                movie.Studio = GetStringToUpdate(row.ItemArray[1], movie.Studio);
                update = true;
            }
 
            #endregion
 
            #region Set Released Date
 
            if (movie.Released != GetDateToUpdate(row.ItemArray[2], movie.Released))
            {
                movie.Released = GetDateToUpdate(row.ItemArray[2], movie.Released);
                update = true;
            }
 
            #endregion
 
            #region Set Status
 
            if (movie.Status != GetStringToUpdate(row.ItemArray[3], movie.Status))
            {
                movie.Status = GetStringToUpdate(row.ItemArray[3], movie.Status);
                update = true;
            }
 
            #endregion
 
            #region Set Sound
 
            if (movie.Sound != GetStringToUpdate(row.ItemArray[4], movie.Sound))
            {
                movie.Sound = GetStringToUpdate(row.ItemArray[4], movie.Sound);
                update = true;
            }
 
            #endregion
 
            #region Set Versions
 
            if (movie.Versions != GetStringToUpdate(row.ItemArray[5], movie.Versions))
            {
                movie.Versions = GetStringToUpdate(row.ItemArray[5], movie.Versions);
                update = true;
            }
 
            #endregion
 
            #region Set Price
 
            if (movie.Price != GetDoubleToUpdate(row.ItemArray[6], movie.Price))
            {
                movie.Price = GetDoubleToUpdate(row.ItemArray[6], movie.Price);
                update = true;
            }
 
            #endregion
 
            #region Set Rating
 
            if (movie.Rating != GetStringToUpdate(row.ItemArray[7], movie.Rating))
            {
                movie.Rating = GetStringToUpdate(row.ItemArray[7], movie.Rating);
                update = true;
            }
 
            #endregion
 
            #region Set Year
 
            if (movie.Year != GetStringToUpdate(row.ItemArray[8], movie.Year))
            {
                movie.Year = GetStringToUpdate(row.ItemArray[8], movie.Year);
                update = true;
            }
 
            #endregion
 
            #region Set Genre
 
            if (movie.Genre != GetStringToUpdate(row.ItemArray[9], movie.Genre))
            {
                movie.Genre = GetStringToUpdate(row.ItemArray[9], movie.Genre);
                update = true;
            }
 
            #endregion
 
            #region Set Aspect
 
            if (movie.Aspect != GetStringToUpdate(row.ItemArray[10], movie.Aspect))
            {
                movie.Aspect = GetStringToUpdate(row.ItemArray[10], movie.Aspect);
                update = true;
            }
 
            #endregion
 
            #region Set UPC
 
            if (movie.UPC != GetLongToUpdate(row.ItemArray[11], movie.UPC))
            {
                movie.UPC = GetLongToUpdate(row.ItemArray[11], movie.UPC);
                update = true;
            }
 
            #endregion
 
            #region Set DVDReleaseDate
 
            if (movie.DVDReleaseDate != GetDateToUpdate(row.ItemArray[12], movie.DVDReleaseDate))
            {
                movie.DVDReleaseDate = GetDateToUpdate(row.ItemArray[12], movie.DVDReleaseDate);
                update = true;
            }
 
            #endregion
 
            #region Set DBID
 
            if (movie.Dbid != GetLongToUpdate(row.ItemArray[13], movie.Dbid))
            {
                movie.Dbid = GetLongToUpdate(row.ItemArray[13], movie.Dbid);
                update = true;
            }
 
            #endregion
 
            #region Set TimeStamp
 
            if (movie.TimeStamp != GetDateToUpdate(row.ItemArray[14], movie.TimeStamp))
            {
                movie.TimeStamp = GetDateToUpdate(row.ItemArray[14], movie.TimeStamp);
                update = true;
            }
 
            #endregion
 
            #endregion
 
            if (update)
            {
                repository.Update(movie);
            }
        }
 
	private static long? GetLongToUpdate(object item, long? property)
        {
            if (item != null)
            {
                if (property != Convert.ToInt64(item))
                {
                    return Convert.ToInt64(item);
                }
            }
            else
            {
                return null;
            }
 
            return property;
        }
 
        private static double? GetDoubleToUpdate(object item, double? property)
        {
            if (item != null)
            {
                if (property != Convert.ToDouble(item))
                {
                    return Convert.ToDouble(item);
                }
            }
            else
            {
                return null;
            }
 
            return property;
        }
 
        private static string GetStringToUpdate(object item, string property)
        {
            if (item != null)
            {
                if (property != (string)item)
                {
                    return (string)item;
                }
            }
            else
            {
                return null;
            }
 
            return property;
        }
 
        private static DateTime? GetDateToUpdate(object item, DateTime? property)
        {
            if (item != null && !string.IsNullOrEmpty(item.ToString()))
            {
                if (property != null)
                {
                    if (property != DateTime.Parse(item.ToString()))
                    {
                        return DateTime.Parse(item.ToString());
                    }
                }
                else
                {
                    return DateTime.Parse(item.ToString());
                }
            }
            else
            {
                return null;
            }
 
            return property;
        }

Open in new window

Phreak3ebAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Gorkem YukselCommented:
Hi Phreak3eb,

I do something similar in an application I created.  To make it efficient and really fast.. I grab the CSV, parse it into a datatable object, then do a BulkCopy into the SQL server into a temporary table.  From there, it is fairly easy to write a stored procedure to check if it exists, or requires updating...  You can cut down your processing time and memory usuage by about 80-90%.

Cheers,

G.
0
tcullerCommented:
Well, I'd need more details to give a really sophisticated explanation, but this should help (as long as the file is line-delimited. I know you're saying CSV, but usually they're line-delimited anyway)

private void LoadValuesExample(System.Data.IDbConnection myConnection)
{
	// Keep adding {N} to formatting args for every field you need
	const System.String INSERT_INTO = "INSERT INTO tableName VALUES({0}, {1});";
 
	using(System.Data.IDbCommand cmd = myConnection.CreateCommand())
	using(System.IO.StreamReader reader = new System.IO.StreamReader(System.IO.File.OpenRead("Path.csv")))
	{
		while(!reader.EndOfStream)
		{
			// Make sure ReadLine() calls match number of formatting args
			cmd.CommandText = System.String.Format(INSERT_INTO, reader.ReadLine(), reader.ReadLine());
			cmd.ExecuteNonQuery();
		}
	}
}

Open in new window

0
dportasCommented:
Can you use BULK INSERT for this? 150,000 rows is not large . . . unless you try to do silly things like inserting it one row at a time using C#!
0
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

dportasCommented:
Better still, use OPENROWSET. You could do the whole thing with one INSERT statement:

INSERT INTO YourTable (col1, col2, col3)
SELECT col1, col2, col3
FROM OPENROWSET( BULK 'C:\File01.txt',
 FORMATFILE = 'c:\file01.xml') AS t;

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Phreak3ebAuthor Commented:
gyuksel:

Could you explain what you mean by or how to do a "BulkCopy"?
0
Anurag ThakurTechnical ManagerCommented:
if i have to do what you are trying to achieve i will be using openrowset as suggested by dportas
what i will do is that first i will import the data to a temporary table and then put in my logic to add data that does not exist in the main table and in the end empty the temp table and this logic will continue again and again

some links for reference purposes
http://support.microsoft.com/kb/321686
http://www.tech-archive.net/Archive/SQL-Server/microsoft.public.sqlserver/2007-05/msg00017.html
0
Phreak3ebAuthor Commented:
Ok. I've been able to get the data from the CSV file into a staging SQL table using OPENROWSET as suggested by dportas and raqi0017.  It works really well.  Now I'm just left with the problem of updating the records in the main table based on changes to fields in the staging table AND inserting new records from staging table into main table.

Can you guys give me a quick example of how I could use a stored proc that would compare the fields and update as needed and insert records from staging that are new into the main table?
0
dportasCommented:
0
Phreak3ebAuthor Commented:
Ok. I'm trying to use MERGE, but when I use the attached code in a query, I'm getting:

Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'MERGE'.
Msg 137, Level 15, State 2, Line 5
Must declare the scalar variable "@Dbid".

And this was taken directly off the technet link you gave me.  So I'm not sure why I'm getting a syntax error.
BEGIN
    SET NOCOUNT ON;
           
    MERGE Movies.Dbid AS target
    USING (SELECT @Dbid, @Status, @Studio, @Released, @Sound, @Versions, @Price, @Rating, @Year, @Genre, @Aspect, @UPC, @DVDReleaseDate, @TimeStamp, @name) AS source (Dbid, Status, Studio, Released, Sound, Versions, Price, Rating, Year, Genre, Aspect, UPC, DVDReleaseDate, TimeStamp, Name)
    ON (target.Dbid = source.Dbid)
    WHEN MATCHED THEN 
        UPDATE SET	Name = source.Name,
					Dbid = source.Dbid,
					Status = source.Status,
					Released = source.Release,
					Sound = source.Sound,
					Versions = source.Versions,
					Price = source.Price,
					Rating = source.Rating,
					Year = source.Year,
					Genre = source.Genre,
					Aspect = source.Aspect,
					UPC = source.UPC,
					DVDReleaseDate = source.DVDReleaseDate,
					TimeStamp = source.TimeStamp					
	WHEN NOT MATCHED THEN	
	    INSERT (Dbid, Status, Studio, Released, Sound, Versions, Price, Rating, Year, Genre, Aspect, UPC, DVDReleaseDate, TimeStamp, Name)
	    VALUES (source.Dbid, source.Status, source.Studio, source.Released, source.Sound, source.Versions, source.Price, source.Rating, source.Year, source.Genre, source.Aspect, source.UPC, source.DVDReleaseDate, source.TimeStamp, source.Name)
	    --OUTPUT deleted.*, $action, inserted.* INTO #MyTempTable;
END;

Open in new window

0
dportasCommented:
You posted in the SQL Server 2008 zone, are you sure you are using SQL Server 2008? MERGE isn't supported in earlier versions.

It also seems you haven't declared all your variables.
0
Phreak3ebAuthor Commented:
Yeah. I realized that my dev box is using SQL 2005.  I'll try the 2008 box later.
0
Anurag ThakurTechnical ManagerCommented:
i will just achieve the same functionality by using the select statements
something like this
insert into tablename (colnames........)
    select colnamens.....
    from temptable
      where condition to exclude the records and find unique ones
0
Phreak3ebAuthor Commented:
Ok. I got the MERGE statement to work... kind of.  If there aren't rows in the target table that match to the source table, it inserts them as it should.  However, if I change the Studio field for instance in the source table, it won't update it in the target table.  Here is the exact MERGE statement I'm using.  (Slightly modified from the one I made in a previous post).
MERGE INTO [Movies] As [target]
    USING (SELECT [Dbid], [Status], Studio, Released, Sound, Versions, Price, Rating, [Year], Genre, Aspect, UPC, DVDReleaseDate, [TimeStamp], Name FROM MoviesStaging) AS source
    ON ([target].[Dbid] = source.[Dbid])
    WHEN MATCHED THEN 
        UPDATE SET	Name = source.Name,
			[Dbid] = source.[Dbid],
			[Status] = source.[Status],
			Released = source.Released,
			Sound = source.Sound,
			Versions = source.Versions,
			Price = source.Price,
			Rating = source.Rating,
			[Year] = source.[Year],
			Genre = source.Genre,
			Aspect = source.Aspect,
			UPC = source.UPC,
			DVDReleaseDate = source.DVDReleaseDate,
			[TimeStamp] = source.[TimeStamp]					
	WHEN NOT MATCHED THEN	
	    INSERT ([Dbid], [Status], Studio, Released, Sound, Versions, Price, Rating, [Year], Genre, Aspect, UPC, DVDReleaseDate, [TimeStamp], Name)
	    VALUES (source.[Dbid], source.[Status], source.Studio, source.Released, source.Sound, source.Versions, source.Price, source.Rating, source.[Year], source.Genre, source.Aspect, source.UPC, source.DVDReleaseDate, source.[TimeStamp], source.Name);

Open in new window

0
Phreak3ebAuthor Commented:
Nevermind. I just realized I didn't have the Studio field in my update statement.  Everything is working fine.  Thank you for your help.
0
Phreak3ebAuthor Commented:
50 to raqi0017 for providing links to examples I used to get the final OPENROWSET statement to work correctly.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
.NET Programming

From novice to tech pro — start learning today.