Solved

Excel VBA: How to load whole text (CSV) file in memory and parse strings

Posted on 2012-12-30
4
4,473 Views
Last Modified: 2013-01-18
Hello!

First at all ... please sorry for my English!

Data file

I have CSV file (text file with TAB delimiters). It's a XER file used by Primavera P6 Professional v8 for import/export data.

First task: How to load XER file into Workbook?

Mostly XER file size is not big: 0.5 – 3.3 MB (5200 – 17000 rows). Maximum filesize (seldom) is 20 MB (160 000 rows).

And I think it's have sense to load whole file in memory and then parse it.
How to do it?

Next task is Parsing

1. XER file is a text file with TAB delimited columns.

2. XER file consist from block of date.  Each block have 3 parts:
First line (started from "%T") of block is Table header. Second line ("%F") is Fields headers. And next rows contain data ("%R").

Example:
%T	RCATVAL
%F	rsrc_catg_id	rsrc_catg_type_id	seq_num	rsrc_catg_short_name	rsrc_catg_name	parent_rsrc_catg_id
%R	56	22	0	Yes	Yes	
%R	27968	22	0	Yes	Yes	
%R	28020	27776	10	Yes	Description
%T	NEXT_TABLE
%F new fields set
%R A B C

Open in new window

     
3. Each table have regular structure.

4. Fields sets in each table (text block) are different.

5. Values of data are variant (but stored in XER file as text)

6. XER file not contain blank rows or special data blocks separator.

7. Last row of XER file is "%E".

My plane is:
A. Parse XER file in memory row by row.
B. When new table ("%T") appears create new Sheet.
C. Then write fields headers ("%F") and rows ("%R")

Questions

1. What is more efficient method for split row (by TAB) to cells (when you don't know columns count, at least for first row "%F" of table)?

2. Maybe if we deal with large data files (160 000 rows) have sence to load whole file (20 MB) into memory, split to tables (block of data) via RegExp and then parse block to rows (or somehow paste them into sheets as array).

3. How to determine total rows count for showing form with progress bar?

4. How to determine Excel version for setup rows limit? --> Global Const EXCEL_ROW_LIMIT = 65536 (or 1048576)

P.S. Now I have some parser (attached) but this one hav¿ slow performance (because in load procedure used approach "symbol by symbol"). Source of XERparser is Oracle Support web-site (access only by suscription). Parser description is here: http://www.plannertuts.com/tutorials/using-primaveras-xer-file-parser-import-wbs-from-excel-without-the-sdk-840/
Difference 2003/2007 versions of Parser is only setup "Global Const EXCEL_ROW_LIMIT = 65536 (or 1048576)" (as far as I know)

Resume

There is base XER Parser (from Oracle) and my task is to optimize performance for load date.

Excel 2003 compatible code needed

Any help will be high appreciated!
xerFileParserBuilder-2003-and-20.zip
0
Comment
Question by:Last_Free_Man
  • 2
4 Comments
 
LVL 16

Accepted Solution

by:
terencino earned 500 total points
Comment Utility
Hi Last_Free_Man, you have done some very nice work with this so far. Have you considered using FileSystemObject for the parsing work? Just add a reference to Microsoft Scripting Runtime and it is good to go. It has excellent file management capability, and the OpenTextFile method converts the file into a TextStream object, which you can manipulate easily, for example searching and counting lines. Each line can be quickly converted into an array by using the Split function on the TAB delimiter and then load into another array until you are ready to commit it to a worksheet. It should be able to handle your files without an issue, and you can tweak the code a bit to improve performance.

While this is likely to work better in Excel 2010, it should be fine for Excel 2003
Hope that helps
...Terry
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Is the above solution suits you or you need a VBA solution pls let me know.

Also at first hand, you mention Code to be Excel 2003 compatible however Excel 2003 support only 65536 rows and you mention rows could easily go to 160000 hence presume you will need at least Excel 2007 is this a problem ? if 2003 we can put the overflow on 65536 to subseuent worksheets would this be an option for you if you need to stick to 2003 ?

Last but not least can you post the longest XER file you have so we can build the macro around it ?
gowflow
0
 

Author Closing Comment

by:Last_Free_Man
Comment Utility
Thanks! Very substantial links! HQ answer!
0
 

Author Comment

by:Last_Free_Man
Comment Utility
In e-book "VBA and macros : Microsoft Excel 2010 / Bill Jelen, Tracy Syrstad." there is good example

Sub ReadTxtLines()
    
    ' Read Entire TXT to Memory and Parse
    
    ' Submitted by Suat Mehmet Ozgur of Istanbul, Turkey. Suat develops applications in Excel,
    ' Access, and Visual Basic.
    ' This sample takes a different approach to reading a text file. Instead of reading one record
    ' at a time, the macro loads the entire text file into memory in a single string variable. The
    ' macro then parses the string into individual records. The advantage of this method is that
    ' you access the file on disk only one time. All subsequent processing occurs in memory and
    ' is very fast:
    
    'No need to install Scripting Runtime library since we used late binding
    Dim sht As Worksheet
    Dim fso As Object
    Dim fil As Object
    Dim txt As Object
    Dim strtxt As String
    Dim tmpLoc As Long
    
    'Working on active sheet
    Set sht = ActiveSheet
    'Clear data in the sheet
    sht.UsedRange.ClearContents
    
    'File system object that we need to manage files
    Set fso = CreateObject(“Scripting.FileSystemObject”)
    
    'File that we like to open and read
    Set fil = fso.GetFile("c:\test.txt")
    
    'Opening file as a TextStream
    Set txt = fil.OpenAsTextStream(1)
    
    'Reading file include into a string variable at once
    strtxt = txt.ReadAll
    
    'Close textstream and free the file. We don’t need it anymore.
    txt.Close
    
    'Find the first placement of new line char
    tmpLoc = InStr(1, strtxt, vbCrLf)
    
    'Loop until no more new line
    Do Until tmpLoc = 0
    
        'Use A column and next empty cell to write the text file line
        sht.Cells(sht.Rows.Count, 1).End(xlUp).Offset(1).Value = _
        Left(strtxt, tmpLoc - 1)
        
        'Remove the parsed line from the variable that we stored file include
        strtxt = Right(strtxt, Len(strtxt) - tmpLoc - 1)
        
        'Find the next placement of new line char
        tmpLoc = InStr(1, strtxt, vbCrLf)
    
    Loop
    
    'Last line that has data but no new line char
    sht.Cells(sht.Rows.Count, 1).End(xlUp).Offset(1).Value = strtxt
    
    'It will be already released by the ending of this procedure but
    ' as a good habit, set the object as nothing.
    Set fso = Nothing

End Sub

Open in new window

0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

I. Introduction In a previous article (http://www.experts-exchange.com/Web_Development/Document_Imaging/A_6537-PaperPort-Upgrade-How-to-download-and-install-updated-versions-of-PaperPort-11-and-12.html) (now deprecated), I discussed how to upgrad…
When the confidentiality and security of your data is a must, trust the highly encrypted cloud fax portfolio used by 12 million businesses worldwide, including nearly half of the Fortune 500.
Sometimes we receive PDF files that are in the wrong orientation. They may be sideways or even upside down. This most commonly happens with scanned or faxed documents. It is possible to rotate the view of these PDFs with the free Adobe Reader produc…
We often encounter PDF files that are pure images, that is, they do not have text characters, but instead contain only raster graphics. The most common causes of this are document scanning software and faxing software/services that create image-only…

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now