Solved

Importing an Excel file with multiple headers into SQL Server

Posted on 2013-11-16
758 Views
Last Modified: 2016-02-11
I have multiple Excel files, each of which has one sheet but contains 130 headers, with each header having different data.

I'd like to import each one of these headers, with its data, into its own table in SQL Server.

Do I first split each header's rows of data out into its own Excel file? If so, how do I automate that?

Or can I use SSIS to do this? I know very basic SSIS, though, and am not familiar with the scripting in it.

Each header in the Excel file is structured as follows (also see the example pic):

The first header would be this:
      ITEM = ORG_V
      DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15
column names
data

The second header would be this:
      ITEM = TER_V
      DATE = 2013-07-22 10:00 ~ 2013-07-22 10:15
column names
data


The headers can be at any random row number, and the amount of data in each Excel file differs, but they all start with "ITEM =" and then, in the next row, "DATE =".
toimport.png
Question by:k1ng87
17 Comments
 
LVL 38

Expert Comment

by:Jim P.
ID: 39654821
Are you saying each of the headers/data sets has to go into a separate table?

Or are you saying the rows of data need to go into a single table with the header and date(s) as the identifier? And since this involves multiple files, it sounds like it is going to be an ongoing process.

So I would look at DB design very carefully to start with.

The way I would go about it is to stage it through an Access DB, using VBA that ends up writing into the SQL DB. If this is a daily process, or depending on what results you want to query on, I would advise against 130 separate tables. Is there some logical grouping, e.g. ORG_V and ORG_V_ADV in one table and TER_V and TER_V_ADV in a separate one?
 
LVL 1

Author Comment

by:k1ng87
ID: 39655101
For analytics purposes, I'd like it to go into 130 different tables... one for each table in the csv file...

So each header row with its data would be a table in the db... but I don't need the header imported each time...
 
LVL 38

Expert Comment

by:Jim P.
ID: 39655129
It is still doable. I would suggest that you pre-create all the tables though, with the suggested field sizes and types.

My assumption is the table columns remain the same, just the header/name is changing?

Is it a CSV or an Excel file? It makes a difference in how it is done.
 
LVL 51

Expert Comment

by:Mark Wills
ID: 39655210
Because the header row determines what happens to the detail rows (in terms of location), it is going to be difficult to do anything other than row-by-row processing.

You first have to get the header so you can then decide what/where the details go.

Which is fine; we just need a cursor. But there are challenges with a variable-format spreadsheet. Ideally we import it as 'n' columns and then inspect/test each row to decide what to do with the rest of those columns (such as extracting headers/dates, etc.).

Maybe you could run a formula in the spreadsheet to propagate the header info onto the detail rows, so that every detail row has complete information. Maybe insert two columns, ITEM and DATE, then do a bulk import into a tagging table so you can process it as needed.

It might be worth considering putting it all in one table and using views instead of 130-odd different "tables".

Instead of that PNG, is it possible to attach a sample spreadsheet?
 
LVL 51

Expert Comment

by:Mark Wills
ID: 39655291
Just a couple more questions for you...

1) how often does this import happen
2) do the detail column names ever change
3) what is the space before the header lines - is it a tab or is it spaces
4) with the footers, do you need to check if it was "OK"
 
LVL 45

Expert Comment

by:aikimark
ID: 39656387
Rather than having separate tables, you should have one table with key data about the row.  The simplest scheme would be to transform the Excel data into something like the following.  Please note that I only used the first few columns of detail data.
ITEM       STARTDATE         ENDDATE           BSC  BTS  SECTOR  BAND
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V      2013-07-22 10:00  2013-07-22 10:15  0    168  1       0
TER_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
TER_V      2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V_ADV  2013-07-22 10:00  2013-07-22 10:15  0    168  0       0
ORG_V_ADV  2013-07-22 10:00  2013-07-22 10:15  0    168  0       0


When you needed to do something with this data, you would include the key columns in your Where clause.
Example:
Select BSC, BAND, FA, ASSGN_BA 
From mytable
Where ITEM='TER_V'


A more normalized configuration would have two tables: the key data (ITEM, STARTDATE, ENDDATE) in one table along with an identity column, and the data table with an additional column holding the identity value from the key table. A sketch follows.
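For illustration, a minimal sketch of that two-table layout; the table and column names here are hypothetical (only the first few detail columns are shown):

create table header_key (
    key_id    int identity primary key,
    ITEM      varchar(20),
    STARTDATE datetime,
    ENDDATE   datetime
)

create table detail_data (
    key_id int not null references header_key(key_id),  -- identity value from the key table
    BSC    varchar(20),
    BTS    varchar(20),
    SECTOR varchar(20),
    BAND   varchar(20)
    -- ...remaining detail columns
)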
 
LVL 1

Author Comment

by:k1ng87
ID: 39656463
Mark Wills:

The answers to your questions:

1) how often does this import happen: import would happen every hour
2) do the detail column names ever change: no, the column names and positions never change
3) what is the space before the header lines - is it a tab or is it spaces: it looks like they are 6 spaces
4) with the footers, do you need to check if it was "OK": no, do not need to check the footers, can just skip those

aikimark:

For analytics purposes, I think it is better if each were its own table.


I've attached a sample file, but be aware that some tables have different headers (such as the last two).

I've also attached a tab delimited file which contains what the headers are for each table.
sample.csv
headers.txt
 
LVL 1

Author Comment

by:k1ng87
ID: 39656480
Another thing I should mention is that I have the option of getting the data as an Excel file or as a CSV file. Not sure if one or the other makes it easier.
 
LVL 45

Expert Comment

by:aikimark
ID: 39656519
If you can post the Excel version of the CSV file you posted, we might be able to advise you on the best solution path(s). Looking at the CSV file, I assume the data is coming from a Linux/Unix system.

I withdraw my prior recommendation. Your header differences affect the number, names, and nature of the data columns. In the limited view of the headers and data I had, these were the same, which led to my recommendation of a single-table result.
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
ID: 39659929
Either CSV or Excel, it doesn't matter too much. It seems to be a pretty "clean" file.

so, the steps...

1) create a staging table
2) bulk import
3) fix up those ITEM and DATES
4) with everything fixed, decide where they should really live
5) in the code example below, I use a view, but you could insert into a real table

The assumption is that if the first column is numeric then it is data, otherwise it is header stuff...
The code uses the CSV file for the bulk import; there is an equivalent for XLSX (see the commented-out OPENROWSET alternative after the import step below).

-- create a staging table

if object_id('ee_staging_table','U') is not null drop table ee_staging_table
create table ee_staging_table (
    BSC varchar(100), BTS varchar(20), SECTOR varchar(20), BAND varchar(20), FA varchar(20),
    ASSGN_BAND varchar(20), ATT varchar(20), RE_ORG varchar(20), OVD_CMP varchar(20), OVD_BCP varchar(20),
    A1_OOS varchar(20), CMP_MSC varchar(20), BS_REL1 varchar(20), MSC_REL1 varchar(20), MOB_REL1 varchar(20),
    ASR varchar(20), ATP_UNA varchar(20), CMP_ATP varchar(20), CE_UNA varchar(20), WALSH_UNA varchar(20),
    TCH_OVP varchar(20), SYN_TO varchar(20), BCP_CEP varchar(20), CMP_BCP varchar(20), NOT_ACQ varchar(20),
    MS_ATP1 varchar(20), MS_ATP2 varchar(20), SO_REJ varchar(20), REV_NO_FRM_FROM_CEP2 varchar(20),
    FWD_NO_FRM_FROM_ATP2 varchar(20), BAD_FRM2 varchar(20), BS_REL2 varchar(20), MSC_REL2 varchar(20),
    MOB_REL2 varchar(20), ASC1 varchar(20), ASC2 varchar(20), REV_NO_FRM_FROM_CEP3 varchar(20),
    FWD_NO_FRM_FROM_ATP3 varchar(20), FWD_NO_FRM_FROM_MGW3 varchar(20), BAD_FRM3 varchar(20),
    BS_REL3 varchar(20), MSC_REL3 varchar(20), MOB_REL3 varchar(20), ANS varchar(20),
    REV_NO_FRM_FROM_CEP4 varchar(20), FWD_NO_FRM_FROM_ATP4 varchar(20), FWD_NO_FRM_FROM_MGW4 varchar(20),
    BAD_FRM4 varchar(20), BS_REL4 varchar(20), MSC_REL4 varchar(20), MOB_REL4 varchar(20),
    AVG_ANS varchar(20), AVG_CPL varchar(20), AVG_DRP varchar(20), AVG_TRF varchar(20),
    AVG_HLD varchar(20), AVG_DLY varchar(20), LINK_UNA varchar(20)
)
GO

-- now import

BULK INSERT ee_staging_table
FROM 'c:\ee\Q_28296291.csv'         -- needs to reflect correct path and name of your csv file
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR = '\n'
)
GO
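
-- If you get the XLSX instead of the CSV, an equivalent load can be done via
-- OPENROWSET and the ACE OLE DB provider. (A sketch only, with assumptions:
-- the provider must be installed, 'Ad Hoc Distributed Queries' enabled, the
-- sheet named Sheet1, and the sheet's column count must match the staging
-- table. Left commented out so the CSV path above is the one that runs.)
--
-- INSERT INTO ee_staging_table
-- SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
--        'Excel 12.0;Database=c:\ee\Q_28296291.xlsx;HDR=NO;IMEX=1',
--        'SELECT * FROM [Sheet1$]')
-- GO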

--Check the content of the table.

select * from ee_staging_table
GO

-- so that is the import done...

-- but we want to add a couple of columns for ease of processing

alter table ee_staging_table add ID int identity primary key, ITEM varchar(20), DateFrom datetime, Dateto datetime
GO

-- now we have the file... We can loop around and fix
declare @id int
declare @datefrom datetime
declare @dateto datetime
declare @item varchar(20)
declare @bsc varchar(100)

declare c cursor for select id,bsc from ee_staging_table order by id   -- order by id so rows are processed in file order

open c 
 
fetch next from c into @id,@bsc
while @@fetch_status = 0 
begin 

   if @bsc like '%ITEM = %'
   begin
      set @item = ltrim(substring(@bsc,13,100))   -- header line: capture the ITEM value
   end 

   if @bsc like '%DATE = %'
   begin
      -- "      DATE = " is 13 characters (6 leading spaces), so the from/to dates start at positions 14 and 33
      set @datefrom = substring(@bsc,14,16)
      set @dateto   = substring(@bsc,33,16)
   end 

   if isnumeric(@bsc) = 1
   begin
      -- data row: tag it with the most recently seen header values
      update ee_staging_table set item=@item,datefrom=@datefrom,dateto=@dateto where id=@id
   end
           
   fetch next from c into @id,@bsc

end
 
close c
deallocate c
go 

-- now let's see what we really have

select * from ee_staging_table where item is not null
-- or, better, delete the rows where item is null

-- so, let's now do something a bit different and create a view over our table

if object_id('vw_TER_V_ADV','V') is not null drop view vw_TER_V_ADV
go

create view vw_TER_V_ADV
as 
select *   -- don't use * here; you really should list the column names
from ee_staging_table where item = 'ter_v_adv'
go

-- and repeat for the other "items", then use just like any table viz:

select item,* from vw_TER_V_ADV

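Rather than hand-writing 130-odd CREATE VIEW statements, they can be generated from the distinct ITEM values. A sketch, assuming the staging table above and that the ITEM values are safe to use in object names:

declare @item varchar(20), @sql nvarchar(max)

declare v cursor for
    select distinct item from ee_staging_table where item is not null

open v
fetch next from v into @item
while @@fetch_status = 0
begin
    set @sql = N'create view vw_' + @item
             + N' as select * from ee_staging_table where item = ''' + @item + N''''
    exec sp_executesql @sql   -- each CREATE VIEW runs as its own batch
    fetch next from v into @item
end

close v
deallocate v
go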

 
LVL 38

Expert Comment

by:Jim P.
ID: 39661213
mark_wills,

Very nice code.
 
LVL 51

Expert Comment

by:Mark Wills
ID: 39662497
@jimpen, thanks :)

There is another possible solution: getting the IDs for each "header" row and updating the rows in between.
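
A sketch of that set-based version, assuming the same staging table (each data row picks up the nearest preceding ITEM and DATE rows; the substring offsets match the cursor version above):

update s
set s.item     = ltrim(substring(h.bsc,13,100)),
    s.datefrom = substring(d.bsc,14,16),
    s.dateto   = substring(d.bsc,33,16)
from ee_staging_table s
cross apply (select top 1 h2.bsc
             from ee_staging_table h2
             where h2.id < s.id and h2.bsc like '%ITEM = %'
             order by h2.id desc) h
cross apply (select top 1 d2.bsc
             from ee_staging_table d2
             where d2.id < s.id and d2.bsc like '%DATE = %'
             order by d2.id desc) d
where isnumeric(s.bsc) = 1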

But even with the RBAR (row-by-agonising-row) cursor approach, it seems to work reasonably quickly given the complexity of the format, and the secret is that the ID, as a clustered index, is used for the update. There is always a place for cursors; you just need to be careful about when and how...

There should be a bit more error checking in there (like checking that we have correct dates and items) before the update. But essentially, it turns out to be reasonably quick and easy to insert additional error handling, or to split the data up into different tables instead of views, as sketched below.
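
The split-into-tables variant is just as short per table; a sketch for one ITEM (SELECT INTO creates the table on the fly):

if object_id('TER_V_ADV','U') is not null drop table TER_V_ADV
go

select *
into TER_V_ADV
from ee_staging_table
where item = 'ter_v_adv'
go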

One thing that is still open is the frequency of processing new data, and whether that new data is incremental or a replacement.

Cheers,
Mark
 
LVL 1

Author Comment

by:k1ng87
ID: 39664387
Sorry I haven't got back to you guys...I'll give it a shot tonight and am kind of excited to try this!
 
LVL 51

Expert Comment

by:Mark Wills
ID: 39723642
How did you go? Have you had a chance to try it out, and/or is there anything else we can help you with?
 
LVL 45

Expert Comment

by:Martin Liss
ID: 40387531
I've requested that this question be deleted for the following reason:

Not enough information to confirm an answer.
 
LVL 51

Expert Comment

by:Mark Wills
ID: 40387532
The code in http:#a39659929 works and does answer the question fully.
