Solved

# Importing an Excel file with multiple headers into SQL Server

Posted on 2013-11-16
1,020 Views
Last Modified: 2016-02-11
I have multiple Excel files; each has one sheet but contains 130 headers, with each header having different data.

I'd like to import each of these header rows, with its data, into its own table in SQL Server.

Do I first split each header row of data out into its own Excel file? If so, how do I automate that?

Or can I use SSIS to do this? I know very basic SSIS and am not familiar with the scripting in it, though.

Each header in the Excel file is structured as follows (also see the example pic):

The first header would be this:
ITEM = ORG_V
DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15
column names
data

The second header would be this:
ITEM = TER_V
DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15
column names
data

The headers can appear at any row number, and the amount of data in each Excel file differs, but every block starts with "ITEM =" and then, on the next row, "DATE =".
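If splitting the file first turns out to be the easier route, it can be automated with a short script rather than by hand. A minimal sketch in Python, assuming the file is saved as CSV and each block starts with an "ITEM =" line as described above (the sample rows here are hypothetical):

```python
import re

def split_blocks(lines):
    """Group raw CSV lines into (item, block_lines) chunks.

    Assumes each block begins with a line containing "ITEM =" and
    runs until the next such line, per the layout described above.
    """
    blocks = []
    current_item, current = None, []
    for line in lines:
        m = re.search(r"ITEM\s*=\s*(\S+)", line)
        if m:
            if current_item is not None:
                blocks.append((current_item, current))
            # strip trailing commas left over from empty CSV cells
            current_item, current = m.group(1).rstrip(","), [line]
        elif current_item is not None:
            current.append(line)
    if current_item is not None:
        blocks.append((current_item, current))
    return blocks

# Hypothetical sample with the structure from the question:
sample = [
    "ITEM = ORG_V,,,",
    "DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15,,,",
    "BSC,BTS,SECTOR,BAND",
    "0,168,0,0",
    "ITEM = TER_V,,,",
    "DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15,,,",
    "BSC,BTS,SECTOR,BAND",
    "0,168,1,0",
]

for item, block in split_blocks(sample):
    print(item, len(block))
```

Run hourly, the same loop could write each chunk out to its own file (e.g. a hypothetical `{item}.csv` naming scheme) ready for a per-table bulk load.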
toimport.png
Question by:k1ng87
17 Comments

LVL 38

Expert Comment

ID: 39654821
Are you saying each of the headers/data have to go into a separate table?

Or are you saying the rows of data need to go into a single table with the header and date(s) as the identifier? Since there are multiple files, it sounds like this is going to be an ongoing process.

So I would look at DB design very carefully to start with.

The way I would go about it is to stage it through an Access DB, using VBA that ends up writing into the SQL DB. If this is a daily process, or depending on what results you want to query on, I would advise against 130 separate tables. Is there some logical grouping, e.g. ORG_V and ORG_V_ADV in one table and TER_V and TER_V_ADV in another?

LVL 1

Author Comment

ID: 39655101
For analytics purposes, I'd like it to go into 130 different tables...one for each table in the csv file...

So each header row with its data would be a table in the DB....but I don't need the header imported each time...

LVL 38

Expert Comment

ID: 39655129
It is still doable. I would suggest that you pre-create all the tables though, with the suggested field sizes and types.

My assumption is the table columns remain the same, just the header/name is changing?

Is it a CSV or an Excel file? It makes a difference in how it is done.

LVL 51

Expert Comment

ID: 39655210
Because the header row determines what happens to the detail rows (in terms of location), it is going to be difficult to do anything other than row by row.

You have to first get the header so you can then decide what / where the details go.

Which is fine; we just need a cursor. But there are challenges with a variable-format spreadsheet. Ideally we can import as 'n' columns and then inspect/test each row to decide what we should do with the rest of those columns (such as extract headers/dates etc.).

Maybe you could run a formula in the spreadsheet to propagate the header info onto the detail rows, so that every detail row has complete information. Maybe insert two columns, ITEM and DATE, then do a bulk import into a tagging table so you can process as needed.

It might be worth considering putting it all in one table and using views instead of the 130-odd different "tables".

Is it possible to include that PNG as a sample spreadsheet?
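The propagate-the-headers idea above does not have to be a spreadsheet formula; a pre-processing script can tag each detail row before the bulk import into the tagging table. A rough sketch in Python, assuming the block layout shown in the question (the sample rows here are hypothetical):

```python
import re

def flatten(lines):
    """Carry the current ITEM and DATE range down onto every data row.

    Returns (item, date_from, date_to, raw_row) tuples, so the whole
    file can be bulk-loaded into a single tagging/staging table.
    Assumes detail rows start with a digit, as in the question.
    """
    item = date_from = date_to = None
    out = []
    for line in lines:
        m = re.search(r"ITEM\s*=\s*([A-Z_]+)", line)
        if m:
            item = m.group(1)
            continue
        m = re.search(r"DATE\s*=\s*(.+?)\s*~\s*([^,]+)", line)
        if m:
            date_from, date_to = m.group(1), m.group(2)
            continue
        if re.match(r"\s*\d", line):  # detail row: first cell is numeric
            out.append((item, date_from, date_to, line.strip()))
    return out

# Hypothetical sample:
sample = [
    "ITEM = ORG_V,,,",
    "DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15,,,",
    "BSC,BTS,SECTOR,BAND",
    "0,168,0,0",
    "0,168,1,0",
]
rows = flatten(sample)
print(rows[0])
# → ('ORG_V', '2013-07-22 10:00', '2013-07-22 10:15', '0,168,0,0')
```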

LVL 51

Expert Comment

ID: 39655291
Just a couple more questions for you...

1) how often does this import happen
2) do the detail column names ever change
3) what is the space before the header lines - is it a tab or is it spaces
4) with the footers, do you need to check if it was "OK"

LVL 45

Expert Comment

ID: 39656387
Rather than having separate tables, you should have one table with key data about the row.  The simplest scheme would be to transform the Excel data into something like the following.  Please note that I only used the first few columns of detail data.
ITEM       STARTDATE         ENDDATE           BSC  BTS  SECTOR  BAND
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  1       0
TER_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
TER_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V_ADV  2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V_ADV  2013-07-22 10:00  2013-07-22 10:15  0    168  0       0

When you needed to do something with this data, you would include the key columns in your Where clause.
Example:
Select BSC, BAND, FA, ASSGN_BA
From mytable
Where ITEM='TER_V'

A more normalized configuration would have two tables: one with the key data (ITEM, STARTDATE, ENDDATE) plus an identity column, and the data table with an additional column holding the identity value from the key-data table.

LVL 1

Author Comment

ID: 39656463
Mark_Willis:

the answers to your questions:

1) how often does this import happen: import would happen every hour
2) do the detail column names ever change: no, the column names and positions never change
3) what is the space before the header lines - is it a tab or is it spaces: it looks like they are 6 spaces
4) with the footers, do you need to check if it was "OK": no, do not need to check the footers, can just skip those

aikimark:

For analytics purposes, I think it is better if each were its own table.

I've attached a sample file but be aware that some tables have different headers (such as the last two)

I've also attached a tab delimited file which contains what the headers are for each table.
sample.csv
headers.txt

LVL 1

Author Comment

ID: 39656480
Another thing I should mention is that I have the option of getting the data as an Excel file or as a CSV file. Not sure if either makes it easier.

LVL 45

Expert Comment

ID: 39656519
If you can post the Excel version of the CSV file you posted, we might be able to advise you on the best solution path(s).  Looking at the CSV file, I assume the data is coming from a Linux/Unix system.

I withdraw my prior recommendation. Your header differences affect the number, names, and nature of the data columns. In the limited view of the headers and data, these were the same, which led to my recommendation of a single-table result.

LVL 51

Accepted Solution

Mark Wills earned 500 total points
ID: 39659929
Either CSV or Excel; it doesn't matter too much. It seems to be a pretty "clean" file.

so, the steps...

1) create a staging table
2) bulk import
3) fix up those ITEM and DATES
4) with everything fixed, decide where they should really live
5) in the code example below, I use a view, but you could insert into a real table

The assumption is that if the first column is numeric then it is data; otherwise it is header stuff...
The code uses the CSV file for the bulk import; there is an equivalent for XLSX.

-- create a staging table

if object_id('ee_staging_table','U') is not null drop table ee_staging_table
create table ee_staging_table (
    BSC varchar(100), BTS varchar(20), SECTOR varchar(20), BAND varchar(20), FA varchar(20),
    ASSGN_BAND varchar(20), ATT varchar(20), RE_ORG varchar(20), OVD_CMP varchar(20), OVD_BCP varchar(20),
    A1_OOS varchar(20), CMP_MSC varchar(20), BS_REL1 varchar(20), MSC_REL1 varchar(20), MOB_REL1 varchar(20),
    ASR varchar(20), ATP_UNA varchar(20), CMP_ATP varchar(20), CE_UNA varchar(20), WALSH_UNA varchar(20),
    TCH_OVP varchar(20), SYN_TO varchar(20), BCP_CEP varchar(20), CMP_BCP varchar(20), NOT_ACQ varchar(20),
    MS_ATP1 varchar(20), MS_ATP2 varchar(20), SO_REJ varchar(20), REV_NO_FRM_FROM_CEP2 varchar(20),
    FWD_NO_FRM_FROM_ATP2 varchar(20), BAD_FRM2 varchar(20), BS_REL2 varchar(20), MSC_REL2 varchar(20),
    MOB_REL2 varchar(20), ASC1 varchar(20), ASC2 varchar(20), REV_NO_FRM_FROM_CEP3 varchar(20),
    FWD_NO_FRM_FROM_ATP3 varchar(20), FWD_NO_FRM_FROM_MGW3 varchar(20), BAD_FRM3 varchar(20),
    BS_REL3 varchar(20), MSC_REL3 varchar(20), MOB_REL3 varchar(20), ANS varchar(20),
    REV_NO_FRM_FROM_CEP4 varchar(20), FWD_NO_FRM_FROM_ATP4 varchar(20), FWD_NO_FRM_FROM_MGW4 varchar(20),
    BAD_FRM4 varchar(20), BS_REL4 varchar(20), MSC_REL4 varchar(20), MOB_REL4 varchar(20),
    AVG_ANS varchar(20), AVG_CPL varchar(20), AVG_DRP varchar(20), AVG_TRF varchar(20),
    AVG_HLD varchar(20), AVG_DLY varchar(20), LINK_UNA varchar(20)
)
GO

-- now import

BULK INSERT ee_staging_table
FROM 'c:\ee\Q_28296291.csv'         -- needs to reflect correct path and name of your csv file
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR = '\n'
)
GO

--Check the content of the table.

select * from ee_staging_table
GO

-- so that is the import done...

-- but we want to add a couple of columns for ease of processing

alter table ee_staging_table add ID int identity primary key, ITEM varchar(20), DateFrom datetime, Dateto datetime
GO

-- now we have the file... We can loop around and fix
declare @id varchar(800)
declare @datefrom datetime
declare @dateto datetime
declare @item varchar(20)
declare @bsc varchar(100)

declare c cursor for select id,bsc from ee_staging_table

open c

fetch next from c into @id,@bsc
while @@fetch_status = 0
begin

if @bsc like '%ITEM = %'
begin
set @item = ltrim(substring(@bsc,13,100))   -- we have a header
end

if @bsc like '%DATE = %'
begin
set @datefrom = substring(@bsc,13,16)   -- we have a header
set @dateto   = substring(@bsc,32,16)   -- we have a header
end

if isnumeric(@bsc) = 1
begin
update ee_staging_table set item=@item,datefrom=@datefrom,dateto=@dateto where id=@id
end

fetch next from c into @id,@bsc

end

close c
deallocate c
go

-- now let's see what we really have

select * from ee_staging_table where item is not null
-- or better: delete where item is null

-- so, let's now do something a bit different and create a view over our table

if object_id('vw_TER_V_ADV','V') is not null drop view vw_TER_V_ADV
go

create view vw_TER_V_ADV
as
select *   -- don't use *; you really should use the column names
from ee_staging_table where item = 'ter_v_adv'
go

-- and repeat for the other "items", then use just like any table viz:

select item,* from vw_TER_V_ADV


LVL 38

Expert Comment

ID: 39661213
mark_wills,

Very nice code.

LVL 51

Expert Comment

ID: 39662497
@jimpen, thanks :)

There is another possible solution getting the ID's for each "header" and updating the rows in between.

But even with the RBAR (row by agonising row) cursor approach, it seems to work reasonably quickly given the complexity of the format; the secret is the ID clustered index used for the update. There is always a place for cursors, you just need to be careful when and how...

There should be a bit more error checking in there (like checking that we have correct dates and items) before the update. But essentially, it turns out to be reasonably quick and easy to insert additional error handling, or split up into different tables (instead of views).

One thing that is potentially missing is the frequency of processing new data, and whether that new data is incremental or a replacement.

Cheers,
Mark
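The set-based alternative mentioned above (find the ID of each header row, then update the data rows in between) can be sketched as a correlated update instead of a cursor. A minimal demo, using SQLite from Python in place of SQL Server and hypothetical staging rows, just to illustrate the idea:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table staging (id integer primary key, bsc text, item text);
insert into staging (id, bsc) values
    (1, 'ITEM = ORG_V'),
    (2, 'DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15'),
    (3, 'BSC'),
    (4, '0'),
    (5, 'ITEM = TER_V'),
    (6, 'DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15'),
    (7, 'BSC'),
    (8, '0');
""")

# One set-based update: every data row (first character numeric) takes
# its ITEM from the nearest preceding header row -- no cursor needed.
con.execute("""
    update staging
    set item = (
        select ltrim(substr(h.bsc, 8))
        from staging h
        where h.id <= staging.id and h.bsc like 'ITEM =%'
        order by h.id desc
        limit 1
    )
    where bsc glob '[0-9]*'
""")
rows = con.execute(
    "select id, item from staging where item is not null order by id"
).fetchall()
print(rows)  # → [(4, 'ORG_V'), (8, 'TER_V')]
```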

LVL 1

Author Comment

ID: 39664387
Sorry I haven't got back to you guys...I'll give it a shot tonight and am kind of excited to try this!

LVL 51

Expert Comment

ID: 39723642
How did you go? Have you had a chance to try it out, and/or is there anything else we can help you with?

LVL 48

Expert Comment

ID: 40387531
I've requested that this question be deleted for the following reason:

Not enough information to confirm an answer.

LVL 51

Expert Comment

ID: 40387532
The code in http:#a39659929 works and does answer the question fully.
