Solved

How do i read a text file using VBA and transpose the content

Posted on 2009-07-08
18
956 Views
Last Modified: 2013-11-10
Hi,

I have a text file that has a structure like this

The first 3 numbers of each row are field tags and are followed by field values
There are variations in the number of fields per record and also cases where one field has multiple value lines and hence the field tag appears multiple times as shown in the examples below (at the end of this problem description)
There is also a possibility of file headers and footers appearing but they have to be ignored

Basically, tag 210 indicates the start of a new record. So, every time a tag 210 is encountered - it indicates a new record.

I need help by way of a solution which
1). Accepts the location of the .txt file(s) as a parameter
2). Starts reading the file(s)
3). Picks up all the record(s) one by one - transposes the content like this

Field Names 210         211        213          220        222        225        226      240(line1)  240(line2)
Values         <value>  <value>  <value>   <value>  <value> <value> <blank> <value>    <value>
                     <value>  <value>  <value>   <value>  <value> <value> <blank> <value>    <value>


4). Saves the resultant .txt/.xls file - with a suffix to the original file name - e.g. input file abc.txt; output file abc_output.txt or abc_output.xls (as the case maybe)

The output can be in the form of an excel workbook (with 50k records per sheet) [shown above]

OR

in the form of another text file with field header and the consequent values being separated by a delimiter " | " [pipe]

Would appreciate if an expert could help me out with a solution to this problem. I can do this in Visual Fox, but there is no s/w installed and hence have to rely on VBA - Excel combination. Would also appreciate if the vba code to be incorporate in the excel sheet is shared (instead of the excel workbook with protected code content). I can use the code to customize the solution further and this would be a learning for me.

Thx in advance

Examples of variations
==================
210  0011204783884620314879439          
211  0000000058001101059023000000020    
212  00689710000022000       31483  002
213             00000000000000          
220                0007091835000076261  
225  ABCD INTERNATIONAL                
226  "XYZ" B.V.                    
232200000000000000918300003            
261  091    0104090000000113994   065707
262HY4701    000                  528002
2637165740000000000000WV1EXZN          
2646147110063227                        
265XYZ TANK EN W>DIDAM            
2663196734000478388462281         282  
2670900673002008PIN 0050123            
268  00000010070000            0000000  
269000000000000000000000000067300172209

OR

210  0011205359724900310966604          
211  0000000074242000259000400000168    
212  00008910000022000       10122  002
213             00000000000000          
220                10122188RA0DEA000E  
225  NNNN CUSTOMER CARE CENTE          
226  Make me Feel.....                  
232100000000000000918300007            
235  Andy ABCstraat                  
236  1328 LA  ABC                    
240  12213216/3005038988                

OR

210  0011204539011070584139314          
211  0000000510682000059022000000251    
212  00001310000022000       00112  112
213             000000000000000767090702
220PR20085000518467PRGTI20095001567637  
222  NONREF          00                
225  ABC COFFEE BV            
226  BBA REGIO RDAM          
232100000000000000918300008            
240  BBB UTILITEITSBOUW REGIO WEST      
240  FACTUUR*EN*  4200703047            
240  4200704195                        
245  10000000000021515                  

OR

210  0011204814464860129264202          
211  0000017682122022259022000000001    
212  00000110000022000       13132  002
213             000000000000004977090702
220PR20080000397952PRGTI20090001103908  
222  NONREF          00                
225  XYZ PHARMA BV                    
226  ABCMEDICA XYZ B.V.            
232200000000000000918300001            
240  225000/9801578 DIVERSE ORDERS XX  
240  225000/09801578XX                  
240  6222NL MAASTRICHT                  
253000000
0
Comment
Question by:kiran_bajaj
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 6
18 Comments
 
LVL 3

Expert Comment

by:cmorbach
ID: 24804093
Here a quick and dirty solution in VBA tested on you example file in Excel. There comes an input-file-dialog where you can chose the file. The transposed file contains semicolon-separated values. You can import them in Excel, for instance. Feel free to improve and modify.

Public Function OpenSingleFile()
Dim Filter As String, Title As String
Dim FilterIndex As Integer
Dim Filename As Variant
' File filters
Filter = "All Files (*.*),*.*"
 
FilterIndex = 1
 
Title = "Select a File to Open"
 
 
With Application
    Filename = .GetOpenFilename(Filter, FilterIndex, Title)
End With
 
 
If Filename = False Then
    MsgBox "No file was selected."
    Exit Function
End If
OpenSingleFile = Filename
 
End Function
 
Public Sub test()
Const mMax = 999
 
file = OpenSingleFile
 
Dim line
Dim fso As New FileSystemObject
Dim ofs
 
Dim table(1 To mMax, 1 To mMax) As String
Dim numElements(1 To mMax) As Integer
 
'read lines to table
Set ofs = fso.OpenTextFile(file)
 
Do Until ofs.AtEndOfStream
    line = ofs.ReadLine
    mKey = Mid(line, 1, 3)
    mValue = Mid(line, 4)
    
    
    If (mKey <= mMax) Then
    
    'header
    If numElements(mKey) = 0 Then
        table(mKey, numElements(mKey) + 1) = mKey
        numElements(mKey) = numElements(mKey) + 1
    End If
    
    'body
    table(mKey, numElements(mKey) + 1) = mValue
    
    numElements(mKey) = numElements(mKey) + 1
    End If
 
Loop
 
 
delim = ";"
 
 
'determine range
rangefrom = 0
rangeto = 0
 
For i = 1 To mMax
    If numElements(i) >= 1 Then
        If rangefrom = 0 Then rangefrom = i
        rangeto = i
    End If
Next i
 
 
'output transposed table
Dim outlines(1 To mMax) As String
 
For i = 1 To mMax
    
    For j = rangefrom To rangeto
        If numElements(j) >= i Then
            outlines(i) = outlines(i) & table(j, i) & delim
        Else
            outlines(i) = outlines(i) & delim
        End If
    Next j
Next i
 
 
 
Set txtstr = fso.CreateTextFile(file & ".txt", True)
 
For k = 1 To mMax
    lens = Len(outlines(k))
    minlen = (rangeto - rangefrom) + 1  'throw away empty lines
    If lens > minlen Then
        txtstr.WriteLine outlines(k)
    End If
Next k
 
 
End Sub

Open in new window

0
 

Author Comment

by:kiran_bajaj
ID: 24806420
Hi cmorbach

Thx for the solution

There is one glitch though
When i run it i get the following error "Compile Error: User-defined type not defined" against the code line "Dim fso As New FileSystemObject"; and then the yellow highlighter is focussed on the Public Sub test() function

Sorry if this question is very basic, but if u could help me close this, it would be great

Thx & Rgds
Kiran Bajaj
Error-in-Solution---Read-a-text-.JPG
0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24806456
Sorry, I forgot to mention: You need to have a reference to the "Microsoft Scripting Runtime" (srcrun.dll)

In VBA go to menu item Extras -> References -> and select "Microsoft Scripting Runtime" in the list.
0
Get 15 Days FREE Full-Featured Trial

Benefit from a mission critical IT monitoring with Monitis Premium or get it FREE for your entry level monitoring needs.
-Over 200,000 users
-More than 300,000 websites monitored
-Used in 197 countries
-Recommended by 98% of users

 
LVL 3

Expert Comment

by:cmorbach
ID: 24806486
No, it's not called Extras (like in the german version), but Tools - see:

http://www.experts-exchange.com/Microsoft/Development/MS_Access/Access_Coding-Macros/Q_24392448.html
0
 

Author Comment

by:kiran_bajaj
ID: 24806581
Hi cmorbach,

Upon compiling the code and running it
i get the error message "Run Time Error '9':
Subscript out of range
this is after i have selected the file

Info on the file This file has 72321 lines (size 2,967 KB)
Also - the code has to look out for tag "210"  to know that it is the start of a record.
If the first 3 chars of the line =210; then it is the start of the record; End of the record is achieved when the next line is Tag 210

If u could let me know what to do next

Thx & Rgds.
Kiran Bajaj.
0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24806719
Well, I thought you wanted to practice VBA!?!  ;)

My code is not robust, i.e. the input data has to have the format  XXXDDDD....
where XXX ist a number and DDD... some Data but there MUST be data there or - in line 44 in my post - the code tries to read more than there is to read...

catch this e.g. by testing the strlen of line read from file against 3, if it's larger then continue reading the line (code: above line 43)

Can you? Or should I help?
0
 

Author Comment

by:kiran_bajaj
ID: 24806989
Hi cmorbach,

Regarding the comment

the input data has to have the format  XXXDDDD....
where XXX ist a number and DDD... some Data but there MUST be data there <yes data is always there, maybe spaces in some cases, but data always follows the tag>

or - in line 44 in my post - the code tries to read more than there is to read... <the mvalue for which the routine is giving an error is on line no 6915 (mkey = 240  ) and mvalue is "  680-69000803                       "

The whole record for reference
210  0011204263352440484234315          
211  0000045814202000059022000000699    
212  00002310000022000       00112  112
213             000000000000001332090702
220PR20085000524743PRGTI20095001590506  
222  NONREF          00                
225  VOLKERRAIL CONTRACTING            
226  DURA VERMEER RAILINFRA            
232100000000000000918300008            
240  BGC 240609 B60                    
240  680-69000492 680-69000535          
240  680-69000803                      
245  10000000000021182                  

Maybe i dont understand what you are trying to communicate. Can you check why the routine is getting an error for this record

And for info, each record is 40 chars max.

Do let me know (and yes i want to practice vba - but i am a beginner and it is taking some time .... but will surely get there some day ...  :-)

Thx & Rgds,
Kiran Bajaj.
0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24807720
Try this modified code. VBA/Excel is very restricted in creating large arrays. If this doesn't work, try to split the input file in 2, 3, ... parts...

Public Function OpenSingleFile()
Dim Filter As String, Title As String
Dim FilterIndex As Integer
Dim Filename As Variant
' File filters
Filter = "All Files (*.*),*.*"
 
FilterIndex = 1
 
Title = "Select a File to Open"
 
 
With Application
    Filename = .GetOpenFilename(Filter, FilterIndex, Title)
End With
 
 
If Filename = False Then
    MsgBox "No file was selected."
    Exit Function
End If
OpenSingleFile = Filename
 
End Function
 
Public Sub test()
Const mMax = 9999
 
file = OpenSingleFile
 
Dim line
Dim fso As New FileSystemObject
Dim ofs
 
Dim table(200 To 300, 1 To mMax) As String
Dim numElements(1 To 9999) As Integer
 
'read lines to table
Set ofs = fso.OpenTextFile(file)
 
c = 0
 
Do Until ofs.AtEndOfStream
    line = ofs.ReadLine
    
    c = c + 1
    Debug.Print c
    
    If c = 8577 Then
        c = c
    End If
    
    
    
    mKey = Mid(line, 1, 3)
    mValue = Mid(line, 4)
    
    
    If (mKey <= mMax) Then
    
        'header
        If numElements(mKey) = 0 Then
            table(mKey, numElements(mKey) + 1) = mKey
            numElements(mKey) = numElements(mKey) + 1
        End If
        
        'body
        table(mKey, numElements(mKey) + 1) = mValue
        
        numElements(mKey) = numElements(mKey) + 1
    Else
        
    End If
 
Loop
 
 
delim = ";"
 
 
'determine range
rangefrom = 0
rangeto = 0
 
For i = 1 To mMax
    If numElements(i) >= 1 Then
        If rangefrom = 0 Then rangefrom = i
        rangeto = i
    End If
Next i
 
 
'output transposed table
Dim outlines(1 To mMax) As String
 
For i = 1 To mMax
    
    For j = rangefrom To rangeto
        If numElements(j) >= i Then
            outlines(i) = outlines(i) & table(j, i) & delim
        Else
            outlines(i) = outlines(i) & delim
        End If
    Next j
Next i
 
 
 
Set txtstr = fso.CreateTextFile(file & ".txt", True)
 
For k = 1 To mMax
    lens = Len(outlines(k))
    minlen = (rangeto - rangefrom) + 1  'throw away empty lines
    If lens > minlen Then
        txtstr.WriteLine outlines(k)
    End If
Next k
 
 
End Sub

Open in new window

0
 

Author Comment

by:kiran_bajaj
ID: 24812173
Hi cmorbach,

This does work to a large extent.

I manually split the file into 9 files of approx 8500 (complete records) lines each. I stripped the first file of the header and the last file of the footer

Header Record :
10101T  0H0907024100090702EUR2112004.6  
102000009                              

Footer Record
9900100282641878200006302071416        
991  004283001928009425020512843        

So i got 9 separate files - which i imported into access to carry on with some reconciliation with another table (thru queries)

2 discrepancies in the output
1). The tag 245 does not get aligned with the right record - for e.g. if there are 3 records in a text file and the tag 245 is present for record 2 and 3 (record 1 has no tag 245) - the routine populates values of tag 245 in record 1 and 2 and not in 2 and 3.
2). In case mulitple lines of Tag 240 are present per record - they get spread across multiple lines

Am attaching 3 files to illustrate this
A). test file
B). test.txt.txt file (output of running the execution)
C). excel file (where i have imported the file - to show the discrepancy)

I know this is asking for a lot and in hindsight this question seems to be at least worth 400 points. But if u can also provide me with the script to:

A). Strip the headers and the footers
B). Split the file into approx 8500 lines each (complete records; keeping in mind the new record starts with 210)
C). Provide for processing of multiple files within the \in directory and generate output in the \out directory.

Appreciate yr co-operation thus far.
Thx in advance

Kiran Bajaj.
test.txt
test.txt.txt
Test-Output-Analysis.xls
0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24819630
>1). The tag 245 does not get aligned with the right record
True, so far my script just _transposes_ (and combines)...it doesn't consider records


>2). In case mulitple lines of Tag 240 are present per record - they get spread across multiple lines
see 1)


>A). Strip the headers and the footers
>B). Split the file into approx 8500 lines each (complete records; keeping in mind the new record starts with 210)
>C). Provide for processing of multiple files within the \in directory and generate output in the \out directory.
1 and 2 must be fixed so you can really use it later. a) is no problem
With the information that every record contains indices between 210 and 260 or so, i can tweek the vba array so that they can hold more  100000+ lines so you don't need the splitting...or I read line by line until the next 210 record, do the processing and free memory, and continue until the rest of the file

Let me start doing this in about 12 hours.
0
 

Author Comment

by:kiran_bajaj
ID: 24847035
Hi cmorbach,

Any success on this.

Thx & Rgds,
Kiran Bajaj.
0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24910567
Hi and here you go with your script! I should have written 12 _days_ instead - sorry for waiting.

Have a look at it.
Option Explicit
 
 
Public Function OpenSingleFile()
    Dim Filter As String, Title As String
    Dim FilterIndex As Integer
    Dim Filename As Variant
    
    ' File filters
    Filter = "All Files (*.*),*.*"
    
    FilterIndex = 1
    
    Title = "Select a File to Open"
    
    With Application
        Filename = .GetOpenFilename(Filter, FilterIndex, Title)
    End With
    
    
    If Filename = False Then
        OpenSingleFile = ""
        Exit Function
    End If
    OpenSingleFile = Filename
    
End Function
 
Public Sub TransposeRecords()
 
'constants
Const recordStart As Integer = 210
Const recordMax As Integer = 270
Const recordWidth As Integer = recordMax - recordStart + 1
Const delim As String = ";"
 
Dim iFileName, oFileName As String
 
 
'get input filename
iFileName = OpenSingleFile
If iFileName = "" Then Exit Sub
 
 
'output filename
oFileName = iFileName & "_out.txt"
 
 
'open files
Dim fso As New FileSystemObject
Dim ioFileStream
Dim ooFileStream
 
Set ioFileStream = fso.OpenTextFile(iFileName)
Set ooFileStream = fso.CreateTextFile(oFileName, True)  'create and overwrite
 
 
'read line by line
Dim record(recordStart To recordMax) As String
Dim numLines  As Integer
Dim numRecords  As Integer
Dim outstr As String
Dim line, mKey, mValue As String
 
numLines = 0
numRecords = 0
 
Do Until ioFileStream.AtEndOfStream
startOfLoop:
 
    'read line
    line = ioFileStream.ReadLine
    mKey = Mid(line, 1, 3)
    mValue = Mid(line, 4)
    
    'check for key, ignore if out of range
    If (mKey < recordStart) Or (mKey > recordMax) Then On 1 GoTo startOfLoop
        
    'reading record finished?
    If (numLines > 0) And (mKey = recordStart) Then
        
        'first record? so add header
        If numRecords = 0 Then
            
            outstr = ""
            For c = recordStart To recordMax
                outstr = outstr & c & delim
                record(c) = ""
            Next c
    
            'append to file
            ooFileStream.WriteLine outstr
            
        End If
        
        'create transposition and clear record
        outstr = ""
        For c = recordStart To recordMax
            outstr = outstr & record(c) & delim
            record(c) = ""
        Next c
 
        'append to file
        ooFileStream.WriteLine outstr
        
        If numRecords = 0 Then numRecords = numRecords + 1
    
    End If
    
    'record not finished, so add entry
    record(mKey) = record(mKey) + mValue
    
    'prevent overflow on large files
    If numLines = 0 Then numLines = numLines + 1
Loop
 
 
End Sub

Open in new window

0
 
LVL 3

Expert Comment

by:cmorbach
ID: 24910629
one remark: if you want the values in the cells to be stripped by spaces, add the trim() function in line 111 - see code below

second: this script should be able to handle files of (almost) arbitrary size - I tested it with a >100 MB file, so no splitting or such is needed
    'record not finished, so add entry
    record(mKey) = record(mKey) + Trim(mValue)

Open in new window

0
 
LVL 3

Accepted Solution

by:
cmorbach earned 125 total points
ID: 24910786
third: there is an error in the code so that first and last record are not written - take this version instead
Option Explicit
 
 
Public Function OpenSingleFile()
    Dim Filter As String, Title As String
    Dim FilterIndex As Integer
    Dim Filename As Variant
    
    ' File filters
    Filter = "All Files (*.*),*.*"
    
    FilterIndex = 1
    
    Title = "Select a File to Open"
    
    With Application
        Filename = .GetOpenFilename(Filter, FilterIndex, Title)
    End With
    
    
    If Filename = False Then
        OpenSingleFile = ""
        Exit Function
    End If
    OpenSingleFile = Filename
    
End Function
 
 
Public Sub TransposeRecords()
 
'constants
Const recordStart As Integer = 210
Const recordMax As Integer = 270
Const recordWidth As Integer = recordMax - recordStart + 1
Const delim As String = ";"
 
Dim iFileName, oFileName As String
 
 
'get input filename
iFileName = OpenSingleFile
If iFileName = "" Then Exit Sub
 
 
'output filename
oFileName = iFileName & "_out.txt"
 
 
'open files
Dim fso As New FileSystemObject
Dim ioFileStream
Dim ooFileStream
 
Set ioFileStream = fso.OpenTextFile(iFileName)
Set ooFileStream = fso.CreateTextFile(oFileName, True)  'create and overwrite
 
 
'read line by line
Dim record(recordStart To recordMax) As String
Dim numLines  As Integer
Dim numRecords  As Integer
Dim outstr As String
Dim line, mKey, mValue As String
Dim c As Integer
Dim eof As Boolean
 
numLines = 0
numRecords = 0
 
Do While True
    eof = ioFileStream.AtEndOfStream
     
    'read line
    If Not eof Then
        line = ioFileStream.ReadLine
        mKey = Mid(line, 1, 3)
        mValue = Mid(line, 4)
    End If
    
    'check for key, ignore if out of range
    If eof Or ((mKey >= recordStart) And (mKey <= recordMax)) Then
           
       'reading record finished?
       If eof Or ((numLines > 0) And (mKey = recordStart)) Then
           
           'first record? so add header
           If numRecords = 0 Then
               
               outstr = ""
               For c = recordStart To recordMax
                   outstr = outstr & c & delim
               Next c
       
               'append to file
               ooFileStream.WriteLine outstr
               
           End If
           
           'create transposition and clear record
           outstr = ""
           For c = recordStart To recordMax
               outstr = outstr & record(c) & delim
               record(c) = ""
           Next c
    
           'append to file
           ooFileStream.WriteLine outstr
           
           'eof?
           If eof Then Exit Do
           
           If numRecords = 0 Then numRecords = numRecords + 1
       
       End If 'reading record finished?
       
       'record not finished, so add entry
       record(mKey) = record(mKey) + Trim(mValue)
       
       'prevent overflow on large files
       If numLines = 0 Then numLines = numLines + 1
       
    End If 'check for key, ignore if out of range
    
Loop
 
 
ioFileStream.Close
ooFileStream.Close
 
 
End Sub

Open in new window

0
 

Author Closing Comment

by:kiran_bajaj
ID: 31607761
Thx; cmorbach for the solution. Many Thx. This has got me initiated into VBA. Worked fine with some 5-6 files that i ran through; the transpose is working as expected.
Thx & Rgds,
Kiran Bajaj.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Some code to ensure data integrity when using macros within Excel. Also included code that helps secure your data within an Excel workbook.
This article describes how to use a set of graphical playing cards to create a Draw Poker game in Excel or VB6.
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question