yoshcomputers
asked on
MYSQL stored procedure import fixed width text file
I am looking to import a large dataset from the IRS located here:
http://www.irs.gov/uac/SOI-Tax-Stats-Exempt-Organizations-Business-Master-File-Extract-(EO-BMF)
at the bottom of the page are file regional files, here is one of them:
http://www.irs.gov/file_source/pub/irs-soi/eo1.zip
The file is an ascii file with fixed width columns as defined here:
Position(s) Contents
1 9 Employer Identification Number (EIN).
10 79 Primary Name of Organization.
80 114 In Care of Name.
115 149 Street Address.
150 171 City.
172 173 State.
174 183 Zip Code.
184 - 187 Group Exemption Number.
188 189 Subsection Code.
190 Affiliation Code.
191 194 Classification Code(s).
195 200 Ruling Date.
201 Deductibility Code.
202 203 Foundation Code.
204 212 Activity Codes.
213 Organization Code.
214 215 Exempt Organization Status Code (New)
216 - 221 The Advance Ruling process is obsolete as of July 2008. These positions will be blank or zeroed out. For further information, please check the URL below: http://www.irs.gov/charities/charitable/article/0,,id=185568,00.html:
222 - 227 Tax Period.
228 Asset Code.
229 Income Code.
230 - 231 Filing Requirement Code.
232 PF Filing Requirement Code. (New)
233 - 235 Blanks.
236 - 237 Accounting Period.
238 - 250 Asset Amount.
251 - 263 Income Amount.
264 If the Income Amount is negative this contains a negative sign.
265 - 277 Form 990 Revenue Amount
278 If Revenue Amount is negative this contains a negative sign.
279 - 282 National Taxonomy of Exempt Entities (NTEE) Code.
283 - 317 Sort Name (Secondary Name Line).
318 0A - Line Feed
This data is updated monthly and I would like to have a stored procedure that I can run to grab the file and import it to a table with the appropriate columns without having to redefine all the column widths each time.
I am both a novice at stored procedures and have never used Load Data InFile (which is what the other forums suggested).
Could someone show me the template of what the stored procedure should look like so that I can see how to define the different column widths (just show a few and I can figure out the rest).
What I imagine doing is uploading the files to the server then having a script on my server run that will call the stored procedure with a parameter of the file name to import from (there will be 4 files in total that I will be needing to import). Since all the files have the identical column width structure and just have different file names that should allow me to repeat this processes as often as needed.
http://www.irs.gov/uac/SOI-Tax-Stats-Exempt-Organizations-Business-Master-File-Extract-(EO-BMF)
at the bottom of the page are file regional files, here is one of them:
http://www.irs.gov/file_source/pub/irs-soi/eo1.zip
The file is an ascii file with fixed width columns as defined here:
Position(s) Contents
1 9 Employer Identification Number (EIN).
10 79 Primary Name of Organization.
80 114 In Care of Name.
115 149 Street Address.
150 171 City.
172 173 State.
174 183 Zip Code.
184 - 187 Group Exemption Number.
188 189 Subsection Code.
190 Affiliation Code.
191 194 Classification Code(s).
195 200 Ruling Date.
201 Deductibility Code.
202 203 Foundation Code.
204 212 Activity Codes.
213 Organization Code.
214 215 Exempt Organization Status Code (New)
216 - 221 The Advance Ruling process is obsolete as of July 2008. These positions will be blank or zeroed out. For further information, please check the URL below: http://www.irs.gov/charities/charitable/article/0,,id=185568,00.html:
222 - 227 Tax Period.
228 Asset Code.
229 Income Code.
230 - 231 Filing Requirement Code.
232 PF Filing Requirement Code. (New)
233 - 235 Blanks.
236 - 237 Accounting Period.
238 - 250 Asset Amount.
251 - 263 Income Amount.
264 If the Income Amount is negative this contains a negative sign.
265 - 277 Form 990 Revenue Amount
278 If Revenue Amount is negative this contains a negative sign.
279 - 282 National Taxonomy of Exempt Entities (NTEE) Code.
283 - 317 Sort Name (Secondary Name Line).
318 0A - Line Feed
This data is updated monthly and I would like to have a stored procedure that I can run to grab the file and import it to a table with the appropriate columns without having to redefine all the column widths each time.
I am both a novice at stored procedures and have never used Load Data InFile (which is what the other forums suggested).
Could someone show me the template of what the stored procedure should look like so that I can see how to define the different column widths (just show a few and I can figure out the rest).
What I imagine doing is uploading the files to the server then having a script on my server run that will call the stored procedure with a parameter of the file name to import from (there will be 4 files in total that I will be needing to import). Since all the files have the identical column width structure and just have different file names that should allow me to repeat this processes as often as needed.
ASKER
I found out in Mysql you cannot have LOAD DATA INFILE in a stored procedure
this (brief) page discusses a workaround that appears relevant
http://kedar.nitty-witty.com/blog/using-load-data-infile-with-stored-procedure-workaround-mysql
http://kedar.nitty-witty.com/blog/using-load-data-infile-with-stored-procedure-workaround-mysql
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I appreciate the example, though it did leave something to the imagination to fill is some details on implementation. It certainly got me on the right direction.
ASKER
LOAD DATA LOCAL INFILE 'C:/Users/Yosh/Downloads/e
(@var1)
SET
irs_ein=SUBSTR(@var1,1,9),
irs_name=SUBSTR(@var1,10,7
irs_address=SUBSTR(@var1,1
irs_city=SUBSTR(@var1,150,
irs_state=SUBSTR(@var1,172
irs_zipcode=SUBSTR(@var1,1
irs_groupexemptionnumber=S
irs_subsectioncode=SUBSTR(
irs_affiliationcode=SUBSTR
irs_classificationcode=SUB
irs_exemptstatuscode=SUBST
irs_nteecode=SUBSTR(@var1,
irs_sortname=SUBSTR(@var1,