Link to home
Start Free TrialLog in
Avatar of TeknikDev
TeknikDevFlag for United States of America

asked on

Select a specific Excel file to import into SQL table in SSIS, HOW?!!!

Hello,

What I'm trying to do is insert data into a SQL table from an excel spreadsheet. The challenge is being able to grab the most recent excel spreadsheet as the data source.

The file has a naming convention similar to:
File(1).xls, File(2).xls, File(3), File(4) etc.,

Can someone tell me how I can grab the latest file? The issue is the person who is placing the file in the directory only increments the file each day, so if today was Monday, the file will be named File(1).xls, and Tuesday would be File(2).xls. To make things more complicated, we cannot used "Date modified" becasue the file is swept up by a an auto-sync program which makes all of the dates the same because it all gets downloaded and synced at the same time.

Also, the data has been placed on its own individual cell in the excel spreadsheet.

So the logic would be grabbing the file with the greatest number in the parenthesis and then pulling the data off the spreadsheet from specific cells.

Right now, the data is being inserted correctly if I preselect an excel spreadsheet, however, not pulling the most recent file in the folder.

The data is being inserted using Excel Source --> OLE DB Destination. I'm using SQL command and doing a select * from [SHEET1$A2:A2] as name, [SHEET1$D3:D3] as Address, [SHEET1$F5:F5] as Invoice  

(Yes the data I need are all on separate lines).

Does anyone have any idea how to accomplish this?  Thanks so much in advance!
ASKER CERTIFIED SOLUTION
Avatar of sentner
sentner
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of TeknikDev

ASKER

So I assume I drag a "Script Task" into the Control flow prior to the Import so it can grab the largest file?
From looking at the SSIS documentation, it looks like you can set up an "Execute Process" task.  This can be used to execute a batch file.  You should of course modify the code to first change to the proper directory where the files are stored.  

http://msdn.microsoft.com/en-us/library/ms141166.aspx
http://www.sqlmonster.com/Uwe/Forum.aspx/sql-server-ssis/1725/run-bat-file-from-SSIS

Thank you sentner, do you think of another solution without having to use the batch? I rather keep everything inside SSIS. Only because we are doing something simliar for 3 other jobs but its using a flat file text file which is obviously easier.
Yes, you can probably do the whole thing within SSIS as a script task, by essentially porting the batch script to that language.  If you do that you probably don't need to copy the latest file, but rather just find the name and pass it back to the task that does the import.

There are several threads on MSDN doing a similar thing.  See:  
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/9ec4e837-6188-4eb2-aac0-f5f38a89bc91
Hmm....i need to find the code that will find the files and grab the largest number in parenthesis. Then take the values in specific cells and import in SQL.
Yep, don't know that you can just "find" the code existing.  Your best bet is to find something "similar" and modify it.  I'd suggest looping through all the files in the directory, using a regular expression to match the pattern and pull out the number enclosed within parens (regex search format would be something like ".*\(([0-9]+)\).xls"), and compare against the biggest one found so far like I did in the batch file.  You then reconstruct the filename from that, or you could save that name as a separate variable inside the loop.

thats the thing, im new to this., i wouldnt know how to start coding.
Looks like I might be able to get away with using the last modified date as the critieria. If ideally only 1 .xls file gets put in the directory. I can try to pick up the newest file as long as the modified date is today's date.

Can someone pleaseeee provide help!
It can still be done using a script task.  As I said, I'm not familiar with SSIS, though I have found several threads on this type of thing out there from googling.  

This link has some sample code that can be tweaked... I'll take a blind stab at doing some of that in the code box.  I'm sure there are plenty of bugs in there, since I have nothing to test on, but it should be a place to start.  I'm not sure how you get the "useFiles" variable into something useful to SSIS, but that's where the filename to use should end up.

Public Class ScriptMain    
  Inherits UserComponent     
  Public Overrides Sub FindNewestFile()        
    Dim fileName As String         
    Dim useFile As String = String.Empty
    Dim number As Integer
    Dim biggest As Integer = 0
    Dim pattern As String = String.Empty
    Dim r As Regex = Nothing
    pattern = “.*\(([0-9])\).*”
    r = New Regex(pattern, RegexOptions.Compiled)

    For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())            
      number = Integer.parse(Regex.Replace(fileName, pattern, “$1”))
      if number > biggest
        biggest = number
        useFile = fileName
      End If
    Next         
    
  End Sub 
End Class

Open in new window

blah, where does the inherit goes? it doesnt like it when i copy and paste, could I just not use a function and put the code in the main section?
I suppose so.. As I said, I'm not familiar with SSIS specifically, but I am versed in other scripting and programming languages.  I'm just trying to help steer you in the right direction.
Thank you!

Can someone help me write a vb script to find the largest of number in parenthesis and the modified date equals today and the select a few cells in Ms excel spreadsheet and enter into SQL table? I need help ASAP! My boss needs this done by end of the week!
Hi,
You can use that as a reference to get the last creation date of the file and import only the daily file

http://asqlb.blogspot.com/2011/08/ssis-process-files-that-has-been.html

Share  the excel file and let me know which cells you need to import
Thanks, but I manage to complete this process with the help of batch files!  

Huslayer, thanks for that, it looked like it would have helped, but I found an alternative solution.

This was the answer to selecting the largest value in the parenthesis for excel file name.
So i ran a test today and the import and everything worked fine up until today. For some reason, I think its once it hits the number 10 or double digits in the parenthesis.

For file 1-9, it worked file until the 10th. Any idea how I can expand the check to grab the largest to 6 or 7 decimal places instead of 1?
Are you using the VB script, or the batch file?   I had tested the batch file with double-digits to ensure it worked before I pasted it.  Can you paste the script you put into place, so I can see where it may be failing?
@echo off
set /a _biggest=1
Setlocal EnableDelayedExpansion
 
for /f "tokens=2delims=()" %%i in ( 'dir /b "TEST1 (*).xls"' ) do (
   if %%i GTR !_biggest! (set /a _biggest=%%i)
)

echo "Latest: TEST1 (%_biggest%).xls"

echo copy /Y "E:\File\TEST1 (%_biggest%).xls"

copy "E:\File\TEST1 (%_biggest%).xls"   "E:\File Main (0).xls"
 
batch file
I thought of another thing, the batch file is not in the same directory. It cannot be because it syncs with the server. Not sure if thats an issue.
Just to be clear, you have "TEST1" listed as the filename, so the next files are not "TEST2", "TEST3", etc, right?  It has to have the number within the parentheses.

I just tested again, creating files name "TEST1 (1).xls" through "TEST1 (13).xls" and each time the script picks the biggest number.  I even then made a couple that were several digits long, and it still worked as expected.

Can you try removing the "@echo off" line, and run it, so we can see what it's doing?

ok something really funny occured. so i ran the batch directly and it copied the largest file over, but the ssis package does not and copies the wrong file. WHy is that?
It shouldn't matter whether it's in the same directory, but you should add a line before the "for" loop, that changes to the correct directory where the files are located.  Just a line that reads something like "cd c:\datadir"
Could the SSIS package be running the script from the wrong directory, and thus grabbing a file from whatever directory it runs it from?

Add that "CD" line to the script, and try again.
hmmmmmm....in ssis for the "execute process task" I set the working directory to the path which contains all the files to compare and it works.

strange...................................ill contineu to test.
Yes, if you set the working directory from SSIS it should work as well.  The working directory is the place where it will run the script from.  The other way to do it is to set the directory within the script itself (which is what I was suggesting).
wait, the workign directory has been set to where the files are located where it does the compare not where the batch file is. Should it be where the batch file should be?
No, the working directory should be where the files are.  But if you add the "CD" line to the batch file, you won't have to worry about where the working directory is set.
can u provide the syntax and where it should go?
Sure.  replace "C:\datadir" in the code below with the actual location where the excel files are stored.

the "/d" flag to "cd" tells it to also change the to that drive if necessary.

@echo off

cd /d "C:\temp"

set /a _biggest=1
Setlocal EnableDelayedExpansion
 
for /f "tokens=2delims=()" %%i in ( 'dir /b "TEST1 (*).xls"' ) do (
   if %%i GTR !_biggest! (set /a _biggest=%%i)
)

echo "Latest: TEST1 (%_biggest%).xls"

echo copy /Y "E:\File\TEST1 (%_biggest%).xls" 

copy "E:\File\TEST1 (%_biggest%).xls"   "E:\File Main (0).xls"

Open in new window

Ok I added the CD before the FOR loop and added the working directory path and the issue is fixed.

Seems like the batch file still couldnt find the right file until I put CD before the FOR loop and SSIS couldnt execute the batch file correctly until the working directory was added.

Thanks for your help!
Sorry... I called it "C:\temp" in that script instead of "C:\datadir", so THAT is what you should replace with the actual location of the excel files.
sorry my latest comments were prior to you posting your code with the CD stuff. But either way, it works. I am too lazy to proceed to test without clearing out the working directory in SSIS. But as long as it works, im good!

Thanks!