Solved

How do i read a text file using VBA and transpose the content

Posted on 2009-07-08
18
933 Views
Last Modified: 2013-11-10
Hi,

I have a text file that has a structure like this

The first 3 numbers of each row are field tags and are followed by field values
There are variations in the number of fields per record and also cases where one field has multiple value lines and hence the field tag appears multiple times as shown in the examples below (at the end of this problem description)
There is also a possibility of file headers and footers appearing but they have to be ignored

Basically, tag 210 indicates the start of a new record. So, every time a tag 210 is encountered - it indicates a new record.

I need help by way of a solution which
1). Accepts the location of the .txt file(s) as a parameter
2). Starts reading the file(s)
3). Picks up all the record(s) one by one - transposes the content like this

Field Names 210         211        213          220        222        225        226      240(line1)  240(line2)
Values         <value>  <value>  <value>   <value>  <value> <value> <blank> <value>    <value>
                     <value>  <value>  <value>   <value>  <value> <value> <blank> <value>    <value>


4). Saves the resultant .txt/.xls file - with a suffix to the original file name - e.g. input file abc.txt; output file abc_output.txt or abc_output.xls (as the case maybe)

The output can be in the form of an excel workbook (with 50k records per sheet) [shown above]

OR

in the form of another text file with field header and the consequent values being separated by a delimiter " | " [pipe]

Would appreciate if an expert could help me out with a solution to this problem. I can do this in Visual Fox, but there is no s/w installed and hence have to rely on VBA - Excel combination. Would also appreciate if the vba code to be incorporate in the excel sheet is shared (instead of the excel workbook with protected code content). I can use the code to customize the solution further and this would be a learning for me.

Thx in advance

Examples of variations
==================
210  0011204783884620314879439          
211  0000000058001101059023000000020    
212  00689710000022000       31483  002
213             00000000000000          
220                0007091835000076261  
225  ABCD INTERNATIONAL                
226  "XYZ" B.V.                    
232200000000000000918300003            
261  091    0104090000000113994   065707
262HY4701    000                  528002
2637165740000000000000WV1EXZN          
2646147110063227                        
265XYZ TANK EN W>DIDAM            
2663196734000478388462281         282  
2670900673002008PIN 0050123            
268  00000010070000            0000000  
269000000000000000000000000067300172209

OR

210  0011205359724900310966604          
211  0000000074242000259000400000168    
212  00008910000022000       10122  002
213             00000000000000          
220                10122188RA0DEA000E  
225  NNNN CUSTOMER CARE CENTE          
226  Make me Feel.....                  
232100000000000000918300007            
235  Andy ABCstraat                  
236  1328 LA  ABC                    
240  12213216/3005038988                

OR

210  0011204539011070584139314          
211  0000000510682000059022000000251    
212  00001310000022000       00112  112
213             000000000000000767090702
220PR20085000518467PRGTI20095001567637  
222  NONREF          00                
225  ABC COFFEE BV            
226  BBA REGIO RDAM          
232100000000000000918300008            
240  BBB UTILITEITSBOUW REGIO WEST      
240  FACTUUR*EN*  4200703047            
240  4200704195                        
245  10000000000021515                  

OR

210  0011204814464860129264202          
211  0000017682122022259022000000001    
212  00000110000022000       13132  002
213             000000000000004977090702
220PR20080000397952PRGTI20090001103908  
222  NONREF          00                
225  XYZ PHARMA BV                    
226  ABCMEDICA XYZ B.V.            
232200000000000000918300001            
240  225000/9801578 DIVERSE ORDERS XX  
240  225000/09801578XX                  
240  6222NL MAASTRICHT                  
253000000
0
Comment
Question by:kiran_bajaj
  • 9
  • 6
18 Comments
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
Here a quick and dirty solution in VBA tested on you example file in Excel. There comes an input-file-dialog where you can chose the file. The transposed file contains semicolon-separated values. You can import them in Excel, for instance. Feel free to improve and modify.


Public Function OpenSingleFile()

Dim Filter As String, Title As String

Dim FilterIndex As Integer

Dim Filename As Variant

' File filters

Filter = "All Files (*.*),*.*"
 

FilterIndex = 1
 

Title = "Select a File to Open"
 
 

With Application

    Filename = .GetOpenFilename(Filter, FilterIndex, Title)

End With
 
 

If Filename = False Then

    MsgBox "No file was selected."

    Exit Function

End If

OpenSingleFile = Filename
 

End Function
 

Public Sub test()

Const mMax = 999
 

file = OpenSingleFile
 

Dim line

Dim fso As New FileSystemObject

Dim ofs
 

Dim table(1 To mMax, 1 To mMax) As String

Dim numElements(1 To mMax) As Integer
 

'read lines to table

Set ofs = fso.OpenTextFile(file)
 

Do Until ofs.AtEndOfStream

    line = ofs.ReadLine

    mKey = Mid(line, 1, 3)

    mValue = Mid(line, 4)

    

    

    If (mKey <= mMax) Then

    

    'header

    If numElements(mKey) = 0 Then

        table(mKey, numElements(mKey) + 1) = mKey

        numElements(mKey) = numElements(mKey) + 1

    End If

    

    'body

    table(mKey, numElements(mKey) + 1) = mValue

    

    numElements(mKey) = numElements(mKey) + 1

    End If
 

Loop
 
 

delim = ";"
 
 

'determine range

rangefrom = 0

rangeto = 0
 

For i = 1 To mMax

    If numElements(i) >= 1 Then

        If rangefrom = 0 Then rangefrom = i

        rangeto = i

    End If

Next i
 
 

'output transposed table

Dim outlines(1 To mMax) As String
 

For i = 1 To mMax

    

    For j = rangefrom To rangeto

        If numElements(j) >= i Then

            outlines(i) = outlines(i) & table(j, i) & delim

        Else

            outlines(i) = outlines(i) & delim

        End If

    Next j

Next i
 
 
 

Set txtstr = fso.CreateTextFile(file & ".txt", True)
 

For k = 1 To mMax

    lens = Len(outlines(k))

    minlen = (rangeto - rangefrom) + 1  'throw away empty lines

    If lens > minlen Then

        txtstr.WriteLine outlines(k)

    End If

Next k
 
 

End Sub

Open in new window

0
 

Author Comment

by:kiran_bajaj
Comment Utility
Hi cmorbach

Thx for the solution

There is one glitch though
When i run it i get the following error "Compile Error: User-defined type not defined" against the code line "Dim fso As New FileSystemObject"; and then the yellow highlighter is focussed on the Public Sub test() function

Sorry if this question is very basic, but if u could help me close this, it would be great

Thx & Rgds
Kiran Bajaj
Error-in-Solution---Read-a-text-.JPG
0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
Sorry, I forgot to mention: You need to have a reference to the "Microsoft Scripting Runtime" (srcrun.dll)

In VBA go to menu item Extras -> References -> and select "Microsoft Scripting Runtime" in the list.
0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
No, it's not called Extras (like in the german version), but Tools - see:

http://www.experts-exchange.com/Microsoft/Development/MS_Access/Access_Coding-Macros/Q_24392448.html
0
 

Author Comment

by:kiran_bajaj
Comment Utility
Hi cmorbach,

Upon compiling the code and running it
i get the error message "Run Time Error '9':
Subscript out of range
this is after i have selected the file

Info on the file This file has 72321 lines (size 2,967 KB)
Also - the code has to look out for tag "210"  to know that it is the start of a record.
If the first 3 chars of the line =210; then it is the start of the record; End of the record is achieved when the next line is Tag 210

If u could let me know what to do next

Thx & Rgds.
Kiran Bajaj.
0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
Well, I thought you wanted to practice VBA!?!  ;)

My code is not robust, i.e. the input data has to have the format  XXXDDDD....
where XXX ist a number and DDD... some Data but there MUST be data there or - in line 44 in my post - the code tries to read more than there is to read...

catch this e.g. by testing the strlen of line read from file against 3, if it's larger then continue reading the line (code: above line 43)

Can you? Or should I help?
0
 

Author Comment

by:kiran_bajaj
Comment Utility
Hi cmorbach,

Regarding the comment

the input data has to have the format  XXXDDDD....
where XXX ist a number and DDD... some Data but there MUST be data there <yes data is always there, maybe spaces in some cases, but data always follows the tag>

or - in line 44 in my post - the code tries to read more than there is to read... <the mvalue for which the routine is giving an error is on line no 6915 (mkey = 240  ) and mvalue is "  680-69000803                       "

The whole record for reference
210  0011204263352440484234315          
211  0000045814202000059022000000699    
212  00002310000022000       00112  112
213             000000000000001332090702
220PR20085000524743PRGTI20095001590506  
222  NONREF          00                
225  VOLKERRAIL CONTRACTING            
226  DURA VERMEER RAILINFRA            
232100000000000000918300008            
240  BGC 240609 B60                    
240  680-69000492 680-69000535          
240  680-69000803                      
245  10000000000021182                  

Maybe i dont understand what you are trying to communicate. Can you check why the routine is getting an error for this record

And for info, each record is 40 chars max.

Do let me know (and yes i want to practice vba - but i am a beginner and it is taking some time .... but will surely get there some day ...  :-)

Thx & Rgds,
Kiran Bajaj.
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
Try this modified code. VBA/Excel is very restricted in creating large arrays. If this doesn't work, try to split the input file in 2, 3, ... parts...


Public Function OpenSingleFile()

Dim Filter As String, Title As String

Dim FilterIndex As Integer

Dim Filename As Variant

' File filters

Filter = "All Files (*.*),*.*"
 

FilterIndex = 1
 

Title = "Select a File to Open"
 
 

With Application

    Filename = .GetOpenFilename(Filter, FilterIndex, Title)

End With
 
 

If Filename = False Then

    MsgBox "No file was selected."

    Exit Function

End If

OpenSingleFile = Filename
 

End Function
 

Public Sub test()

Const mMax = 9999
 

file = OpenSingleFile
 

Dim line

Dim fso As New FileSystemObject

Dim ofs
 

Dim table(200 To 300, 1 To mMax) As String

Dim numElements(1 To 9999) As Integer
 

'read lines to table

Set ofs = fso.OpenTextFile(file)
 

c = 0
 

Do Until ofs.AtEndOfStream

    line = ofs.ReadLine

    

    c = c + 1

    Debug.Print c

    

    If c = 8577 Then

        c = c

    End If

    

    

    

    mKey = Mid(line, 1, 3)

    mValue = Mid(line, 4)

    

    

    If (mKey <= mMax) Then

    

        'header

        If numElements(mKey) = 0 Then

            table(mKey, numElements(mKey) + 1) = mKey

            numElements(mKey) = numElements(mKey) + 1

        End If

        

        'body

        table(mKey, numElements(mKey) + 1) = mValue

        

        numElements(mKey) = numElements(mKey) + 1

    Else

        

    End If
 

Loop
 
 

delim = ";"
 
 

'determine range

rangefrom = 0

rangeto = 0
 

For i = 1 To mMax

    If numElements(i) >= 1 Then

        If rangefrom = 0 Then rangefrom = i

        rangeto = i

    End If

Next i
 
 

'output transposed table

Dim outlines(1 To mMax) As String
 

For i = 1 To mMax

    

    For j = rangefrom To rangeto

        If numElements(j) >= i Then

            outlines(i) = outlines(i) & table(j, i) & delim

        Else

            outlines(i) = outlines(i) & delim

        End If

    Next j

Next i
 
 
 

Set txtstr = fso.CreateTextFile(file & ".txt", True)
 

For k = 1 To mMax

    lens = Len(outlines(k))

    minlen = (rangeto - rangefrom) + 1  'throw away empty lines

    If lens > minlen Then

        txtstr.WriteLine outlines(k)

    End If

Next k
 
 

End Sub

Open in new window

0
 

Author Comment

by:kiran_bajaj
Comment Utility
Hi cmorbach,

This does work to a large extent.

I manually split the file into 9 files of approx 8500 (complete records) lines each. I stripped the first file of the header and the last file of the footer

Header Record :
10101T  0H0907024100090702EUR2112004.6  
102000009                              

Footer Record
9900100282641878200006302071416        
991  004283001928009425020512843        

So i got 9 separate files - which i imported into access to carry on with some reconciliation with another table (thru queries)

2 discrepancies in the output
1). The tag 245 does not get aligned with the right record - for e.g. if there are 3 records in a text file and the tag 245 is present for record 2 and 3 (record 1 has no tag 245) - the routine populates values of tag 245 in record 1 and 2 and not in 2 and 3.
2). In case mulitple lines of Tag 240 are present per record - they get spread across multiple lines

Am attaching 3 files to illustrate this
A). test file
B). test.txt.txt file (output of running the execution)
C). excel file (where i have imported the file - to show the discrepancy)

I know this is asking for a lot and in hindsight this question seems to be at least worth 400 points. But if u can also provide me with the script to:

A). Strip the headers and the footers
B). Split the file into approx 8500 lines each (complete records; keeping in mind the new record starts with 210)
C). Provide for processing of multiple files within the \in directory and generate output in the \out directory.

Appreciate yr co-operation thus far.
Thx in advance

Kiran Bajaj.
test.txt
test.txt.txt
Test-Output-Analysis.xls
0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
>1). The tag 245 does not get aligned with the right record
True, so far my script just _transposes_ (and combines)...it doesn't consider records


>2). In case mulitple lines of Tag 240 are present per record - they get spread across multiple lines
see 1)


>A). Strip the headers and the footers
>B). Split the file into approx 8500 lines each (complete records; keeping in mind the new record starts with 210)
>C). Provide for processing of multiple files within the \in directory and generate output in the \out directory.
1 and 2 must be fixed so you can really use it later. a) is no problem
With the information that every record contains indices between 210 and 260 or so, i can tweek the vba array so that they can hold more  100000+ lines so you don't need the splitting...or I read line by line until the next 210 record, do the processing and free memory, and continue until the rest of the file

Let me start doing this in about 12 hours.
0
 

Author Comment

by:kiran_bajaj
Comment Utility
Hi cmorbach,

Any success on this.

Thx & Rgds,
Kiran Bajaj.
0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
Hi and here you go with your script! I should have written 12 _days_ instead - sorry for waiting.

Have a look at it.
Option Explicit
 
 

Public Function OpenSingleFile()

    Dim Filter As String, Title As String

    Dim FilterIndex As Integer

    Dim Filename As Variant

    

    ' File filters

    Filter = "All Files (*.*),*.*"

    

    FilterIndex = 1

    

    Title = "Select a File to Open"

    

    With Application

        Filename = .GetOpenFilename(Filter, FilterIndex, Title)

    End With

    

    

    If Filename = False Then

        OpenSingleFile = ""

        Exit Function

    End If

    OpenSingleFile = Filename

    

End Function
 

Public Sub TransposeRecords()
 

'constants

Const recordStart As Integer = 210

Const recordMax As Integer = 270

Const recordWidth As Integer = recordMax - recordStart + 1

Const delim As String = ";"
 

Dim iFileName, oFileName As String
 
 

'get input filename

iFileName = OpenSingleFile

If iFileName = "" Then Exit Sub
 
 

'output filename

oFileName = iFileName & "_out.txt"
 
 

'open files

Dim fso As New FileSystemObject

Dim ioFileStream

Dim ooFileStream
 

Set ioFileStream = fso.OpenTextFile(iFileName)

Set ooFileStream = fso.CreateTextFile(oFileName, True)  'create and overwrite
 
 

'read line by line

Dim record(recordStart To recordMax) As String

Dim numLines  As Integer

Dim numRecords  As Integer

Dim outstr As String

Dim line, mKey, mValue As String
 

numLines = 0

numRecords = 0
 

Do Until ioFileStream.AtEndOfStream

startOfLoop:
 

    'read line

    line = ioFileStream.ReadLine

    mKey = Mid(line, 1, 3)

    mValue = Mid(line, 4)

    

    'check for key, ignore if out of range

    If (mKey < recordStart) Or (mKey > recordMax) Then On 1 GoTo startOfLoop

        

    'reading record finished?

    If (numLines > 0) And (mKey = recordStart) Then

        

        'first record? so add header

        If numRecords = 0 Then

            

            outstr = ""

            For c = recordStart To recordMax

                outstr = outstr & c & delim

                record(c) = ""

            Next c

    

            'append to file

            ooFileStream.WriteLine outstr

            

        End If

        

        'create transposition and clear record

        outstr = ""

        For c = recordStart To recordMax

            outstr = outstr & record(c) & delim

            record(c) = ""

        Next c
 

        'append to file

        ooFileStream.WriteLine outstr

        

        If numRecords = 0 Then numRecords = numRecords + 1

    

    End If

    

    'record not finished, so add entry

    record(mKey) = record(mKey) + mValue

    

    'prevent overflow on large files

    If numLines = 0 Then numLines = numLines + 1

Loop
 
 

End Sub

Open in new window

0
 
LVL 3

Expert Comment

by:cmorbach
Comment Utility
one remark: if you want the values in the cells to be stripped by spaces, add the trim() function in line 111 - see code below

second: this script should be able to handle files of (almost) arbitrary size - I tested it with a >100 MB file, so no splitting or such is needed
    'record not finished, so add entry

    record(mKey) = record(mKey) + Trim(mValue)

Open in new window

0
 
LVL 3

Accepted Solution

by:
cmorbach earned 125 total points
Comment Utility
third: there is an error in the code so that first and last record are not written - take this version instead
Option Explicit
 
 

Public Function OpenSingleFile()

    Dim Filter As String, Title As String

    Dim FilterIndex As Integer

    Dim Filename As Variant

    

    ' File filters

    Filter = "All Files (*.*),*.*"

    

    FilterIndex = 1

    

    Title = "Select a File to Open"

    

    With Application

        Filename = .GetOpenFilename(Filter, FilterIndex, Title)

    End With

    

    

    If Filename = False Then

        OpenSingleFile = ""

        Exit Function

    End If

    OpenSingleFile = Filename

    

End Function
 
 

Public Sub TransposeRecords()

 

'constants

Const recordStart As Integer = 210

Const recordMax As Integer = 270

Const recordWidth As Integer = recordMax - recordStart + 1

Const delim As String = ";"

 

Dim iFileName, oFileName As String

 

 

'get input filename

iFileName = OpenSingleFile

If iFileName = "" Then Exit Sub

 

 

'output filename

oFileName = iFileName & "_out.txt"

 

 

'open files

Dim fso As New FileSystemObject

Dim ioFileStream

Dim ooFileStream

 

Set ioFileStream = fso.OpenTextFile(iFileName)

Set ooFileStream = fso.CreateTextFile(oFileName, True)  'create and overwrite

 

 

'read line by line

Dim record(recordStart To recordMax) As String

Dim numLines  As Integer

Dim numRecords  As Integer

Dim outstr As String

Dim line, mKey, mValue As String

Dim c As Integer

Dim eof As Boolean

 

numLines = 0

numRecords = 0
 

Do While True

    eof = ioFileStream.AtEndOfStream

     

    'read line

    If Not eof Then

        line = ioFileStream.ReadLine

        mKey = Mid(line, 1, 3)

        mValue = Mid(line, 4)

    End If

    

    'check for key, ignore if out of range

    If eof Or ((mKey >= recordStart) And (mKey <= recordMax)) Then

           

       'reading record finished?

       If eof Or ((numLines > 0) And (mKey = recordStart)) Then

           

           'first record? so add header

           If numRecords = 0 Then

               

               outstr = ""

               For c = recordStart To recordMax

                   outstr = outstr & c & delim

               Next c

       

               'append to file

               ooFileStream.WriteLine outstr

               

           End If

           

           'create transposition and clear record

           outstr = ""

           For c = recordStart To recordMax

               outstr = outstr & record(c) & delim

               record(c) = ""

           Next c

    

           'append to file

           ooFileStream.WriteLine outstr

           

           'eof?

           If eof Then Exit Do

           

           If numRecords = 0 Then numRecords = numRecords + 1

       

       End If 'reading record finished?

       

       'record not finished, so add entry

       record(mKey) = record(mKey) + Trim(mValue)

       

       'prevent overflow on large files

       If numLines = 0 Then numLines = numLines + 1

       

    End If 'check for key, ignore if out of range

    

Loop
 
 

ioFileStream.Close

ooFileStream.Close
 

 

End Sub

Open in new window

0
 

Author Closing Comment

by:kiran_bajaj
Comment Utility
Thx; cmorbach for the solution. Many Thx. This has got me initiated into VBA. Worked fine with some 5-6 files that i ran through; the transpose is working as expected.
Thx & Rgds,
Kiran Bajaj.
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
Although it can be difficult to imagine, someday your child will have a career of his or her own. He or she will likely start a family, buy a home and start having their own children. So, while being a kid is still extremely important, it’s also …
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now