Solved

SSIS: Find Similar Files and move to another folder

Posted on 2010-08-23
8
466 Views
Last Modified: 2012-05-10
I have a package that loops thru files that may look like ths:

2001SurveyCompanyA_Q.xlsx
2001SurveyCompanyA_R.xlsx
2001SurveyCompanyA_S.xlsx
2001SurveyCompanyA_C.xlsx

sometimes there may be more than one company survey:

2001SurveyCompanyB_Q.xlsx
2001SurveyCompanyB_R.xlsx
2001SurveyCompanyB_S.xlsx
2001SurveyCompanyB_C.xlsx

2001SurveyCompanyC_Q.xlsx
2001SurveyCompanyC_R.xlsx
2001SurveyCompanyC_S.xlsx
2001SurveyCompanyC_C.xlsx


Now, i need to process one file group at a time, t here will always be the four files together, the name of the company will be the company name to the underscore. but every group will have a q,r,s and c file.  I want to proces them one company at a time, then move to the next set.

right now, i use a for each loop, to get every file, and process them by a variable to the _r.xlsx,_c.xlsx, etc....so, i want to take similar files, process them, then move to the next set. possible in ssis ?
0
Comment
Question by:basile
  • 4
  • 4
8 Comments
 
LVL 22

Expert Comment

by:8080_Diver
ID: 33510671
Absolutely!  In your For Each loop, it would seem that you are segregating the files based upon the last (rightmost) 7 characters.  However, I have a couple of questions:
  1. Why do you feel compelled to process the files "one company at a time"?  Is this because of how you think of the processing or is this some sort of requirement that is being imposed on you?
  2. Just to clarify, is my understanding that all _R.xlsx files are to be in one folder, all _Q.xlsx files in another, etc.?
0
 
LVL 1

Author Comment

by:basile
ID: 33510863
The reason that we have to procreas each type separare is that they go ínto different tables. They are not in different folders just to be clear. All files are delivered to the same folder. _q files go to a question table that is built on the fly with a select into statement. All _ tables are built on the fly. Make sense? All companies will have the four files. Once I process the four files, the loop has to find the next four files. Whats good here Is that everything to The left of The underscosre, Is The company name. So for each company, I nee to process it's four files, then move on to the next set.
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 33511191
Okay, I am a litle confused.  
The reason that we have to procreas each type separare is that they go ínto different tables.
So, the data for a _?.xlsx is processed to a given table.  Is that by company or are all files of a given type processed to the same table?
Are the different tables identified by the type of file (e.g. Q, R, etc.) or are they also identified by the company name?
In other words, if you process the following files, how do they look in your database?
  • 2001SurveyCompanyA_Q.xlsx
  • 2001SurveyCompanyB_Q.xlsx
  • 2001SurveyCompanyC_Q.xlsx
Do they all wind up in a Qeustions table or do they wind up in a 3 separate tables, segregated (and I would assume named) by company?  If they are in the same table, do you carry the Company Name (e.g. 2001SurveyCompanyA) in each row so you can identify which rows came from each company?
Again, though, I am wondering whether the processing by company is more important than just making sure all files are processed.  In other words, I am looking at the possibility that processing all of the _Q.xlsx files and then all of the _R.xlsx files might not be easier to accomplish.

0
 
LVL 1

Author Comment

by:basile
ID: 33511594
ok, so to answer your questions.

_?.xlsx

let's see. to explain easier.

Each company has data for each type _?.xlsx. Let's say company had the four files _q,_c,_q,_r the first time the loop goes thru, it check for the existence of the table. If it's q, it's called a question table. If hte table exists it's dropped. So as the loop processes, it builds the table based on the variable that holds _?.xlsx. IT builds the four tables. On the next run, it drops the tables and repeats. There is only ever 1 company stored in the four tables at any given time. As it is now, we only have one company every time it runs, so it's very easy, it check the tables, drops them, recreates based on the content in the spreadsheet then moves to the next file.

If i proces the four files together, it would be like this:

2001SurveyCompanyA_Q.xlsx -----> Question table
2001SurveyCompanyA_C.xlsx -----> Choice Table
2001SurveyCompanyA_R.xlsx -----> Response Table
2001SurveyCompanyA_S.xlsx ------> Survey Table

In each table there will only ever be one company's data, until the next run. Now, if there are going to be multiple companies in the folder, that is where the change comes, and the requirement for me to process only that company' four files, then move to the nexc company (and it's four files).

Each company is identified, by a variable that takes the left string all the way to the underscore and that value is stored in each table for later processing.

FYI: afte these tabels are loaded, the data is then moved to a data mart, but that;s a differnet set of rules. That is why we can drop the tables after every loop, this is just a staging area.
0
Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

 
LVL 22

Expert Comment

by:8080_Diver
ID: 33512306
May i suggest a slightly different approach.  If you carry the Company Name in the staging table and then break the data out into the data mart based upon extracts that use the company name, you can process all of the files in one fell swoop and then break the data out.
I can foresee an objection in that you would have to handle the break out a bit differently but i have to wonder how you know which company should be associated with the data in the staging.
However, if you set up a FOR EACH within a FOR EACH, you can probaably initially find all of the _Q.xlsx files, extract the company name portion and use that to set up your inner FOR EACH loop to find all of the files for the given Company, you can probably pull off doing things the way you are wanting to.
0
 
LVL 1

Author Comment

by:basile
ID: 33515335
Ok, that's where the problem lies, i'm not sure, how to pass the variable from the outer to the inner for each loop. Didn't even know you could do such a thing. Is there any guidance ? I can post the .xml of the package here, and maybe you can assist me with this? and setting up how i'm looking to do so?
0
 
LVL 22

Accepted Solution

by:
8080_Diver earned 500 total points
ID: 33515496
basile,
I don't work from XML. ;-)  I use Visual Studio/BIDS.
If you have it in Bids, then you can add the variable that you are using for the outer FOR EACH as a variable of the over all package.  I would then add the one for the inner loop as an over all package variable.
You will need 2 fileconnections, one for each loop, and you'll need to set those up so that they use expressions that use the appropriate variable for the "connection" (i.e. fully pathed file name).
 
0
 
LVL 1

Author Closing Comment

by:basile
ID: 33641773
This answer didn't quite get me all the there, but it was a good starting point.
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Dynamics crm 2011 8 45
C# Application Local DB Connection String 23 59
using t-sql EXISTS 8 23
When to use an Aggregate Function. 18 36
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now