Solved

How to split excel file by columns from irs.gov website

Posted on 2014-02-03
8
665 Views
Last Modified: 2014-02-12
Want to split large text file into multiple excel spreadsheets by column.

The end goal is to parse the file so it can be uploaded to Access 2013 which limits the number of columns in a table to 250.

I would like to know how to:
1.) use the field layout to turn the text file into a tab separated text file
2.) how to split the file by columns and insert each section of columns into a separate sheet inside the existing workbook.

Thanks!
dwillpower
10eo.zip
10eoderl.xlsx
0
Comment
Question by:dwillpower
  • 3
  • 3
8 Comments
 
LVL 35

Expert Comment

by:PatHartman
ID: 39830591
I presume the point of this exercise is to have data you can use for analysis.  Importing all those columns into 4 tables with ~ 250 columns each will simply leave you with a worse nightmare.  I would use OLE automation to open the spreadsheet and column by column, insert rows into an Access table.  I didn't spend a lot of time with it but there are several repeating groups and I would use a separate many-side table for each.  So you would end up with a "main" table and several related tables.  The related tables will each have only 4 columns - UniqueID (autonumber) ParentID (FK to main table), DataPointName, DataPointValue.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39831824
I'm running this code in Access, since it seems like Excel isn't necessary for this to import correctly from the text file source.  Before running this code, you should create a table in your Access database with the following fields:
Key: text
Name: text(255)
Value: text(255)
I did not specify any primary key for this table.

You will need to change the path string literals associated with the two constants (cSpec, cData).

Option Explicit

Public Sub Q_28355228()
    Dim oFS As Object
    Dim oTS As Object
    Dim strLine As String
    Dim vSpecs As Variant
    Dim oXL As Object
    Dim owkb As Object
    Dim owks As Object
    Dim rs As Recordset
    Dim strKeys As String
    Dim lngLoop As Long
    Dim vKey As Variant
    Dim oDic As Object
    Dim vItem As Variant
    Const cSpec As String = "C:\Users\Mark\Downloads\10eoderl.xlsx"
    Const cData As String = "C:\Users\Mark\Downloads\10eo.txt"
    Set oXL = CreateObject("excel.application")
    Set owkb = oXL.Workbooks.Open(cSpec)
    Set owks = owkb.Worksheets("EO990_10")
    vSpecs = owks.Range(owks.Range("A5"), owks.Range("A5").End(xlDown).Offset(0, 3))
    Set owks = Nothing
    owkb.Close
    Set owkb = Nothing
    Set oXL = Nothing
    Set rs = DBEngine(0)(0).OpenRecordset("Q_28355228", dbOpenTable)
    Set oDic = CreateObject("scripting.dictionary")
    Set oFS = CreateObject("scripting.filesystemobject")
    Set oTS = oFS.OpenTextFile(cData)
    Do Until oTS.AtEndOfStream
        strLine = oTS.ReadLine
        For lngLoop = LBound(vSpecs, 1) To UBound(vSpecs, 1)
            oDic(vSpecs(lngLoop, 1)) = Mid$(strLine, vSpecs(lngLoop, 3), vSpecs(lngLoop, 4))
        Next
        strKeys = vbNullString
        For Each vKey In Array("ein", "taxpd")
            strKeys = strKeys & oDic(vKey)
        Next
        For Each vItem In oDic
            rs.AddNew
                rs!Key = strKeys
                rs!Name = vItem
                rs!Value = Trim(oDic(vItem))
            rs.Update
        Next
        oDic.RemoveAll
    Loop
    oTS.Close

End Sub

Open in new window


Once the data has been imported, you can use a crosstab query to look at the data in columns.  You may use the crosstabs to make virtual tables or populate real tables.
Crosstab Example:
TRANSFORM First(Q_28355228.Value) AS FirstOfValue
SELECT Q_28355228.Key
FROM Q_28355228
WHERE Name in ("name","dba_name","state","zip","pt1_num_vtng_gvrn_bdy_mems","pt1_num_ind_vtng_mems","tot_num_empls",
"tot_num_vlntrs","tot_gro_ubi","net_unrltd_bus_txbl_incm","contri_grnts_py","prog_srvc_rev_py","invst_incm_py","oth_rev_py",
"contri_grnts_cy","prog_srvc_rev_cy","invst_incm_cy","oth_rev_cy","tot_rev_cy","grnts_and_smlr_amts_cy","bnfts_pd_to_mems_cy",
"slrs_etc_cy","tot_prof_fndrsng_exp_cy","tot_fndrsng_exp_cy","oth_expns_cy","tot_expns_cy","rev_less_expns_cy","pt1_tot_asts_eoy",
"pt1_tot_liab_eoy","net_asts_or_fund_bals_eoy","desc_ln_5021c3","sch_b_rqrd","pltcl_actvs","lbbyng_actvs","subj_to_prxy_tx",
"donr_advsd_fnd","cnsrv_easemnt","colls_of_art","crdt_cnslng","term_of_perm_endwmts","bal_sht_amts_reptd","aud_fincl_stmts",
"cnsld_audt_fin_stmt","school","frgn_offc","frgn_actvs","more_than_5000k_to_orgs","more_than_5000k_to_inds","prof_fndrsng",
"fndrsng_actvs","gaming","hospital","grnts_to_orgs","grnts_to_inds","sch_j_rqrd","tx_exmpt_bnds","invst_tx_exmpt_bnds","escrw_acct",
"on_behalf_of_issr","excss_bnft_trans","pr_excss_bnft_trans","ln_to_ofcr_or_dqp","grnt_to_rltd_prsn","bus_rltnshp_with_org",
"bus_rltnshp_thru_fam","ofcr_ent_with_bus_rltnshp","ded_non_csh_contri","ded_contris_of_art","terminated","partl_liqdtn","disrgd_entity",
"rltd_entity","rltd_org_cntrld_entity","trnsfr_to_non_chrtbl_org ","actvs_cndctd_prtshp","sch_o_req","num_with_1096","num_w2g_incld",
"cmplnc_with_bkup_wthldng","num_of_empls","emplmn_tx_rets_fld","unrltd_bus_incm","frm990t_fld","frgn_fincl_acct","prohib_tx_shltr_trans",
"txbl_prty_notif")
GROUP BY Q_28355228.Key
PIVOT Q_28355228.Name;

Open in new window


Note: you can't get all the columns in a single crosstab query result, because 826 is just too many for Access to handle.
0
 

Author Comment

by:dwillpower
ID: 39843397
Thanks! Great!

One problem so far,  

vSpecs = owks.Range(owks.Range("A5"), owks.Range("A5").End(xlDown).Offset(0, 3))

it errors on (xldown) saying it not defined?

Thanks!
D
0
Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

 
LVL 45

Expert Comment

by:aikimark
ID: 39843644
Oops.  Try this.
vSpecs = owks.Range(owks.Range("A5"), owks.Range("A5").End(xlDown).Offset(0, 3)).Value

Open in new window

0
 

Author Comment

by:dwillpower
ID: 39848760
Still getting the variable not defined xldown.....
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39849464
replace xldown with -4121
0
 

Author Closing Comment

by:dwillpower
ID: 39854136
Rockstar!
0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Phishing attempts can come in all forms, shapes and sizes. No matter how familiar you think you are with them, always remember to take extra precaution when opening an email with attachments or links.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question