David Bach (United States) asked:

More Efficient Inserting Of Rows For SQL Server

Greetings:

I have a database with a table called "RawTestData", which contains just short of 1,000,000 rows.

I need to parse each row into 3 columns and store the result in a table (in the same database) called "RawTestDataFormatted".

A C# method is used to insert each row.

The rows are committed at a rate of 23.18 rows per second. At this rate, the job will take roughly 11.8 hours to complete.

I'm looking for a more efficient way of committing the rows. My C# method is as follows:
	protected bool SaveRawTestDataFormatted(string strStudentName,
											int intTestNumber,
											DateTime dtSection,
											ref long bigRawTestDataId)
	{
		cmdParse.Parameters.Clear();

		try
		{
			cnnParse.ConnectionString = ConfigurationManager.ConnectionStrings["ParseConnectString"].ConnectionString;

			cmdParse.Parameters.Add("@strStudentName", SqlDbType.NVarChar, 50).Value = strStudentName;
			cmdParse.Parameters.Add("@intTestNumber", SqlDbType.Int).Value = intTestNumber;
			cmdParse.Parameters.Add("@dtSection", SqlDbType.DateTime).Value = dtSection;
			cmdParse.Parameters.Add("@bigRawTestDataId", SqlDbType.BigInt).Direction = ParameterDirection.Output;

			cmdParse.Connection = cnnParse;
			cmdParse.CommandText = "up_i_RawTestDataFormatted";
			cmdParse.CommandType = CommandType.StoredProcedure;

			// Note: the connection is opened and closed on every call,
			// i.e. once per inserted row.
			cnnParse.Open();
			cmdParse.ExecuteNonQuery();
			cnnParse.Close();

			bigRawTestDataId = Convert.ToInt64(cmdParse.Parameters["@bigRawTestDataId"].Value);

		}
		catch (Exception ex)
		{
			Response.Write("Error: " + ex.Message);
			return false;
		}
		return true;
	}


The T-SQL is as follows:
USE [DB_117672_training]
GO
/****** Object:  StoredProcedure [dbo].[up_i_RawTestDataFormatted]    Script Date: 6/14/2019 1:40:30 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:		
-- Create date: 6/13/2019
-- Description:	Insert raw data formatted row
-- =============================================
ALTER PROCEDURE [dbo].[up_i_RawTestDataFormatted] 
	-- Add the parameters for the stored procedure here
	@strStudentName			nvarchar(50),
	@intTestNumber			int,
	@dtSection				datetime,
	@bigRawTestDataId		bigint				OUT
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

    -- Insert statements for procedure here
	INSERT		RawTestDataFormatted
					(strStudentName,
					intTestNumber,
					dtSection)
		VALUES		(@strStudentName,
					@intTestNumber,
					@dtSection)

	SET			@bigRawTestDataId = SCOPE_IDENTITY()
END


Any assistance you might provide would be most appreciated.

Thank you,
David Bach
kaufmed (United States):

You probably want BULK INSERT for this task.

ste5an:

Another open question: why C#? Why don't you do it in T-SQL?

If it must be in C#, then you should look into parallel operations. Maybe a simple Parallel.For can do it, assuming that you have an optimal possibility to partition your data.
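
For the bulk-load route, ADO.NET's SqlBulkCopy is the client-side counterpart of BULK INSERT and does not require a flat file. The following is a minimal sketch only: it assumes the parsing has already produced a DataTable whose columns match the destination table, and it borrows the connection-string name and table name from the question; everything else is illustrative.

using System.Configuration;
using System.Data;
using System.Data.SqlClient;

static class BulkLoader
{
	// Sketch: stream pre-parsed rows into dbo.RawTestDataFormatted in
	// batches instead of one stored-procedure call per row.
	public static void Load(DataTable rows)
	{
		string connStr = ConfigurationManager.ConnectionStrings["ParseConnectString"].ConnectionString;

		using (var bulk = new SqlBulkCopy(connStr))
		{
			bulk.DestinationTableName = "dbo.RawTestDataFormatted";
			bulk.BatchSize = 10000; // commit in chunks, not per row
			bulk.ColumnMappings.Add("strStudentName", "strStudentName");
			bulk.ColumnMappings.Add("intTestNumber", "intTestNumber");
			bulk.ColumnMappings.Add("dtSection", "dtSection");
			bulk.WriteToServer(rows); // one round trip per batch
		}
	}
}

One caveat: SqlBulkCopy does not hand back the identity value generated for each row, so if bigRawTestDataId is needed afterwards (as the output parameter in the question suggests), the keys would have to be read back in a separate query.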
David Bach (ASKER):

Greetings kaufmed and ste5an;

kaufmed - I was hoping to avoid creating a flat file and loading it back into SQL Server using BULK INSERT.

ste5an - The solution does not require C#. If you have a suggestion using T-SQL, I am most interested. I can see I confused the description of what I'm after; my apologies. I'm using C# to invoke the T-SQL INSERT.

I look forward to your responses.

Much thanks,
David Bach
ste5an: Please post the DDL of your table, including existing indexes. Also add some sample data, and explain your extraction/splitting logic.
ASKER CERTIFIED SOLUTION by John Tsioumpris (Greece)

[The accepted solution text is available to Experts Exchange members only.]
kaufmed: Where does the source data come from?
David Bach (ASKER):

John - if I understand you correctly, I would generate an INSERT statement for each row, then send the 132,930 INSERT statements for Table-A, along with INSERT statements for the 4M to 5M rows for Table-B, which has a one-to-many relationship with Table-A.

Is this correct, John?

kaufmed - The source is Table-C, which contains raw log data that I parse into formatted rows.

ste5an - the DDL for Table-A is as follows:
/****** Object:  Table [dbo].[RawTestDataFormatted]    Script Date: 6/16/2019 2:40:22 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[RawTestDataFormatted](
	[bigRawTestDataId] [bigint] IDENTITY(1,1) NOT NULL,
	[strStudentName] [nvarchar](50) NOT NULL,
	[intTestNumber] [int] NOT NULL,
	[dtSection] [datetime] NOT NULL,
 CONSTRAINT [PK_RawTestDataFormatted] PRIMARY KEY CLUSTERED 
(
	[bigRawTestDataId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


The DDL for Table-B is:
/****** Object:  Table [dbo].[RawTestDataFormattedAnswer]    Script Date: 6/16/2019 2:43:32 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[RawTestDataFormattedAnswer](
	[fkbigRawTestDataId] [bigint] NOT NULL,
	[intQuestionNumber] [int] NOT NULL,
	[chrAnswer] [char](1) NOT NULL,
 CONSTRAINT [PK_RawTestDateFormattedAnswers] PRIMARY KEY CLUSTERED 
(
	[fkbigRawTestDataId] ASC,
	[intQuestionNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[RawTestDataFormattedAnswer]  WITH CHECK ADD  CONSTRAINT [FK_RawTestDataFormattedAnswer_RawTestDataFormatted] FOREIGN KEY([fkbigRawTestDataId])
REFERENCES [dbo].[RawTestDataFormatted] ([bigRawTestDataId])
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[RawTestDataFormattedAnswer] CHECK CONSTRAINT [FK_RawTestDataFormattedAnswer_RawTestDataFormatted]
GO


I learned from my client's hosting company that they do not allow the BULK INSERT T-SQL command. I'm considering generating INSERT statements for each row of both tables and running them from SSMS. I'm hoping this will be faster than opening and closing a connection for each row.
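
As a rough illustration of that batching idea (a sketch, not code from the thread): the generated statements can be grouped so that each round trip inserts many rows over one connection that stays open. SQL Server accepts at most 1,000 row value lists in a single INSERT ... VALUES statement. The tuple input type and the literal formatting below are assumptions made for the sketch.

using System;
using System.Configuration;
using System.Data.SqlClient;
using System.Linq;
using System.Text;

static class BatchInsert
{
	// Sketch: emit multi-row INSERT statements in batches of up to
	// 1,000 rows each, over a single connection opened once.
	public static void Run((string Name, int Test, DateTime Section)[] rows)
	{
		string connStr = ConfigurationManager.ConnectionStrings["ParseConnectString"].ConnectionString;

		using (var cnn = new SqlConnection(connStr))
		{
			cnn.Open(); // open once, not once per row

			for (int i = 0; i < rows.Length; i += 1000)
			{
				var sql = new StringBuilder(
					"INSERT dbo.RawTestDataFormatted (strStudentName, intTestNumber, dtSection) VALUES ");

				sql.Append(string.Join(",", rows.Skip(i).Take(1000).Select(r =>
					$"(N'{r.Name.Replace("'", "''")}', {r.Test}, '{r.Section:yyyy-MM-ddTHH:mm:ss}')")));

				using (var cmd = new SqlCommand(sql.ToString(), cnn))
				{
					cmd.ExecuteNonQuery(); // one round trip per 1,000 rows
				}
			}
		}
	}
}

Doubling single quotes in the name keeps the generated script valid; where the host permits, parameterized alternatives such as table-valued parameters would be a safer way to pass the rows.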

Update - my client's hosting company got back to me with the following answer:
"I'm afraid we don't support any type of BULK INSERT operations because of the load it puts on the server. The only suggestion I can offer is that you load your data locally, then back up your database, upload it via FTP, and restore it using our Control Panel."
This is an idea I may try as well.

Please let me know if you have additional input for me.


Much thanks,
David Bach
Greetings kaufmed, ste5an and John:

I want to thank all of you for your diligence, time and patience with me.

I was able to use John's suggestion to execute roughly 25,000 to 30,000 INSERT statements at a time to fill Table-A. I also abandoned the two-table idea and deleted Table-B. After six passes of INSERT statements, Table-A was filled with the data required.

Thank you again,
David Bach
John Tsioumpris: Glad it worked to your liking... I am on vacation and see messages only occasionally.