Link to home
Start Free TrialLog in
Avatar of forever7
forever7

asked on

Importing data and outputting to excel

Hi

Hopefully the request below is possible!

I have a large number of text files with similar data. Each file contains header information and coordinate information.

What is required is to import the coordinate information in the text file to an excel spreadsheet. For example, if there are 100 profiles, I would be able to import all files and the coordinate data will ouput to a spreadsheet.

Please note that only the first 4 coordinates are required to be imported (i.e. The "003" and "004" attributes only)

Example file

ABC=Any Text
FGH=0.78990
YYY=02/12/2010
RRR=PC34522
+00.0000 +00.0000 003
+01.4301 +00.0229 004
+05.0447 +00.1422 004
+06.4763 +00.1573 003
-03.9258 -00.2992 001
-06.4687 +07.3191 000
-06.4687 +07.3191 000
+03.7969 +07.1102 000
+14.4819 +06.7197 000
+14.5178 +01.4609 000
+12.0899 -00.0104 002

What is required is given in the attached spreadsheet.

Any assitance greatlty appreciated
File001.txt
output.xlsx
Avatar of redmondb
redmondb
Flag of Afghanistan image

Hi, forever7.

Please see attached (code below). Couple of things...
(1) I didn't bother with formatting the output - let's get the rest sorted first!
(2) The macro opens a "file open" dialogue for you to select the text files you want. Is that OK, or do you want all .txt files to be opened?
Option Explicit

Sub Process_Text_Files()
Dim basebook As Workbook

'Add workbook with one sheet
Set basebook = Workbooks.Add(xlWBATWorksheet)

Call Get_TXT_Files_Space_Delimited(basebook)

Call Process_Sheets(basebook)

basebook.Sheets("Processed_Text_Files").Move

basebook.Close savechanges:=False

End Sub

Sub Process_Sheets(basebook As Workbook)
Dim xNewSheet As Worksheet
Dim xOldSheet As Worksheet
Dim xSheet    As Worksheet
Dim xResponse As Long
Dim xRow      As Long
Dim xHold     As String

For Each xSheet In basebook.Worksheets
    xHold = xHold & xSheet.Name & " # " & Chr(9)
Next

Set xNewSheet = basebook.Sheets.Add
xNewSheet.Name = "Processed_Text_Files"
xNewSheet.Range("B1:H1") = Array("003", , "004", , "004", , "003")
xNewSheet.Range("B2:I2") = Array("X", "Y", "X", "Y", "X", "Y", "X", "Y")
xRow = 3


For Each xOldSheet In ActiveWorkbook.Sheets
    
    If xOldSheet.Name <> "Processed_Text_Files" Then
        xNewSheet.Cells(xRow, 1).Value = xOldSheet.Name
        xOldSheet.Range("A5:B5").Copy xNewSheet.Cells(xRow, 2)
        xOldSheet.Range("A6:B6").Copy xNewSheet.Cells(xRow, 4)
        xOldSheet.Range("A7:B7").Copy xNewSheet.Cells(xRow, 6)
        xOldSheet.Range("A8:B8").Copy xNewSheet.Cells(xRow, 8)
        xRow = xRow + 1
    End If

Next
    
End Sub

Sub Get_TXT_Files_Space_Delimited(basebook As Workbook)
'For Excel 2000 and higher
'Careful - This strips leading and trailing blanks!
Dim Fnum As Long
Dim mysheet As Worksheet
Dim TxtFileNames As Variant
Dim QTable As QueryTable

TxtFileNames = Application.GetOpenFilename(filefilter:="TXT Files (*.txt), *.txt", MultiSelect:=True)

If IsArray(TxtFileNames) Then

    On Error GoTo Cleanup

    With Application
        .ScreenUpdating = False
        .EnableEvents = False
    End With

    'Loop through the array with txt files
    For Fnum = LBound(TxtFileNames) To UBound(TxtFileNames)

        'Add a new worksheet for the name of the txt file
        Set mysheet = Worksheets.Add(After:=basebook.Sheets(basebook.Sheets.Count))
        On Error Resume Next
            mysheet.Name = Right(TxtFileNames(Fnum), Len(TxtFileNames(Fnum)) - InStrRev(TxtFileNames(Fnum), "\", , 1))
        On Error GoTo 0

        With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & TxtFileNames(Fnum), Destination:=Range("A1"))
            .TextFilePlatform = xlWindows
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileSpaceDelimiter = True
            .Refresh BackgroundQuery:=False
        End With
        
        ActiveSheet.QueryTables(1).Delete
        
    Next Fnum

    'Delete the first sheet of basebook
    On Error Resume Next
    Application.DisplayAlerts = False
        basebook.Worksheets(1).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0

Cleanup:

    With Application
        .ScreenUpdating = True
        .EnableEvents = True
    End With
    
End If

End Sub

Open in new window

Regards,
Brian.Process-Text-Files.xlsm
Here's another solution.

1. Put this workbook in the same directory as the text files.
2. Click the button.

Edit: Tweaked the file a bit.
LoopThroughText2.xlsm
Avatar of forever7
forever7

ASKER

Thanks very much for both posts.

Brian - your solution is the ouput I was looking for. To answer your question, the "file open" dialogue is what I needed. A few things :

1) The files actually have extensions *.SC0 through to *.SC9 ( as well as *.txt files).

2) The files are split between header information and coordinate information. There can be much more "header" information in the files and the "coordinate" information can be several rows down and may not be in the same position. What will be constant is that the coordinate information will always be after the header information. I attach another example.

kind regards
file003.txt
I should have added. Could these comments be incorporated too. Thanks again
forever7,

(1) The different file types will  be awkward - unless we display "*.*". Are there other files that you wish to exclude?

(2) OK. The entries before the ones you want are effectively random - which might cause a spurious match. So, how about searching from the bottom? Are all the entries below the ones you want in the same format? Are they always in the order "003", "004", "004", "003"?

Thanks,
Brian.
Hi Brian,

1) as long as (*.SC?) files can be imported then *.* would be ok. There will ever only be these type of files at a time. they are effectivley text files anyway.

2) It will always be the fist 4 rows of the co-odinate data.

"003", "004", "004", "003" will be the order 99% of the time. On some occasions you might get "003", "004", "003", "004". However, the order will always be the same in all the files that are in "one batch" to be imported.

I am assuming the code writes "003", "004", "004", "003" to cells rather than it reading the "003" etc from the text file? - This is ok for me beacuse if on the 1% occasion they are the wrong way aroung - I can alter the column headers to suit. When I come to view the tabulated data, it would be easy to spot if one of the files imported had the coordinates in the wrong order.

kind regards
Thanks, forever7.

I'd done the attached before seeing your latest post, so I'm afraid it's a bit out of date. However, it's bed-time here, and this is my last post for the night so I'll give it to you to play with. Hopefully, I should be back here in seven or eight hours.

A few points...
(1) First the good - it'll process any file extensions you throw at it and it'll handle your 99% (with a proviso, see below). Please test these. I've also included your comments.
(2) Even with the 99%, if the last header entry has data in column C then the macro will get upset (not destructively - it'll prompt you to ignore this text file or to terminate the whole run).
(3) It won't like the 1% and will throw the ignore/terminate message mentioned in (2).
(4) An afterthought - all the header entries appear to include an equal sign. Is that universal?

Option Explicit

Sub Process_Text_Files()
Dim basebook As Workbook

'Add workbook with one sheet
Set basebook = Workbooks.Add(xlWBATWorksheet)

Call Get_TXT_Files_Space_Delimited(basebook)

Call Process_Sheets(basebook)

basebook.Sheets("Processed_Text_Files").Move

basebook.Close savechanges:=False

End Sub

Sub Process_Sheets(basebook As Workbook)
Dim xNewSheet As Worksheet
Dim xOldSheet As Worksheet
Dim xSheet    As Worksheet
Dim xResponse As Long
Dim xRow      As Long
Dim xFirst    As Range

' Create the output Sheet (and its headers)...
Set xNewSheet = basebook.Sheets.Add
xNewSheet.Name = "Processed_Text_Files"
xNewSheet.Range("B1:H1") = Array("003", , "004", , "004", , "003")
xNewSheet.Range("B2:I2") = Array("X", "Y", "X", "Y", "X", "Y", "X", "Y")
xRow = 3

' Process the files' Sheets (ignoring the output sheet)...
For Each xOldSheet In ActiveWorkbook.Sheets
    
    If xOldSheet.Name <> "Processed_Text_Files" Then
    
        ' Find first coordinate.
        ' The files are split between header information and coordinate information. There can be much more "header" information in the files
        ' and the "coordinate" information can be several rows down and may not be in the same position. What will be constant is that the
        ' coordinate information will always be after the header information.
        ' So we go to the bottom and "Ctrl-Up" to, hopefully, find the first coordinate entry.
        Set xFirst = xOldSheet.Range("C" & xOldSheet.Range("A1").SpecialCells(xlLastCell).Row).End(xlUp)
        ' However, if our luck is bad then the last header entry will also have data in Column C and we'll end up there.
        ' So we need to check that we're in the right place...
        If xFirst <> 3 Or xFirst.Offset(1, 0) <> 4 Or xFirst.Offset(2, 0) <> 4 Or xFirst.Offset(3, 0) <> 3 Then
            ' Oops...
            xResponse = MsgBox(xOldSheet.Name & "'s layout not recognised." & Chr(10) _
                            & """OK"" to skip this file or ""Cancel"" to terminate run.", vbOKCancel, "Process Text Files")
            If xResponse = 2 Then
                MsgBox ("Run terminating...")
                Exit Sub
            End If
        Else
            ' All OK. so copy this file's first four coordinates to the output sheet...
            xNewSheet.Cells(xRow, 1).Value = xOldSheet.Name
            xFirst.Offset(0, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 2)
            xFirst.Offset(1, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 4)
            xFirst.Offset(2, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 6)
            xFirst.Offset(3, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 8)
            xRow = xRow + 1
        End If
    End If

Next
    
End Sub

Sub Get_TXT_Files_Space_Delimited(basebook As Workbook)
'For Excel 2000 and higher
'Careful - This strips leading and trailing blanks!
Dim Fnum As Long
Dim mysheet As Worksheet
Dim TxtFileNames As Variant
Dim QTable As QueryTable

' The files have extensions *.SC0 through to *.SC9 (as well as *.txt files).
TxtFileNames = Application.GetOpenFilename(filefilter:="All Files (*.*), *.*", MultiSelect:=True)

If IsArray(TxtFileNames) Then

    On Error GoTo Cleanup

    With Application
        .ScreenUpdating = False
        .EnableEvents = False
    End With

    'Loop through the array with txt files
    For Fnum = LBound(TxtFileNames) To UBound(TxtFileNames)

        'Add a new worksheet for the name of the txt file
        Set mysheet = Worksheets.Add(After:=basebook.Sheets(basebook.Sheets.Count))
        On Error Resume Next
            mysheet.Name = Right(TxtFileNames(Fnum), Len(TxtFileNames(Fnum)) - InStrRev(TxtFileNames(Fnum), "\", , 1))
        On Error GoTo 0

        With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & TxtFileNames(Fnum), Destination:=Range("A1"))
            .TextFilePlatform = xlWindows
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileSpaceDelimiter = True
            .Refresh BackgroundQuery:=False
        End With
        
        ActiveSheet.QueryTables(1).Delete
        
    Next Fnum

    'Delete the first sheet of basebook
    On Error Resume Next
    Application.DisplayAlerts = False
        basebook.Worksheets(1).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0

Cleanup:

    With Application
        .ScreenUpdating = True
        .EnableEvents = True
    End With
    
End If

End Sub

Open in new window

Regards,
Brian.Process-Text-Files-V2.xlsm
Hi Brian,

The solution above is excellent - I ran it on batch of files and picked up some rogue data. (e.g there was no attribute 003 first. this is an added bonus.

I was reviewing some of our data files and it seems that we have more "003", "004", "003", "004" than we previously thought. But as i mentioned before, the batches of files are either one or the other (unless it is rogue/dodgy data).

the solution above (v2 works). Would it be possible to have two versions (i.e one that searches for "003", "004", "004", "003" and one for "003", "004", "003", "004". this are the only combinations that will appear.

thanks again and very much appreciate this.
forever7,

No problem and thanks for the kind words!

(1) I think two macros would be a mistake - it's more work to maintain two than one and they're bound to get subtly out of synch at some stage in their life. Instead, I can just change the macro to handle both types. That OK?

(2) Did you come across entries where the last header entry had data in column 3? If not, does that mean that it never happens?

Thanks,
Brian.
Brian,

1) Even better- on macro would be ideal.

2) I am not sure what you mean by column 3. By "column 3" doy mean the "003", "004" etc
forever7,

(1) OK, I'll get on to it now.

(2) Correct. The data is read in "space-delimited". There are 2 spaces in the coordinate entries and so that translates to 3 fields. The scary bit is that if the last header was something like "NAME=Test file II" then the "II" would end up in column 3 immediately above the coordinates and so would get selected instead of the first coordinate entry. Hence (2) in my last post - any idea?

edit:
Oops, do you want the output sheet's heading to be set by the Version ("003, 004" etc.) or do you want it to be the same, regardless of the Version?

Until/unless I hear from you to the contrary, I'm going with the former, but it's no biggie either way.

Thanks,
Brian.
forever7,

Here we go...
Option Explicit

Sub Process_Text_Files()
Dim basebook As Workbook

'Add workbook with one sheet
Set basebook = Workbooks.Add(xlWBATWorksheet)

Call Get_TXT_Files_Space_Delimited(basebook)

Call Process_Sheets(basebook)

basebook.Sheets("Processed_Text_Files").Move

basebook.Close savechanges:=False

End Sub

Sub Process_Sheets(basebook As Workbook)
Dim xNewSheet As Worksheet
Dim xOldSheet As Worksheet
Dim xSheet    As Worksheet
Dim xResponse As Long
Dim xRow      As Long
Dim xFirst    As Range
Dim xBad_Set  As Boolean
Dim xSet_Type As String
' Create the output Sheet (and its headers)...
Set xNewSheet = basebook.Sheets.Add
xNewSheet.Name = "Processed_Text_Files"
xNewSheet.Range("B2:I2") = Array("X", "Y", "X", "Y", "X", "Y", "X", "Y")
xRow = 3

' Process the files' Sheets (ignoring the output sheet)...
For Each xOldSheet In ActiveWorkbook.Sheets
    
    If xOldSheet.Name <> "Processed_Text_Files" Then
    
        ' Find first coordinate.
        ' The files are split between header information and coordinate information. There can be much more "header" information in the files
        ' and the "coordinate" information can be several rows down and may not be in the same position. What will be constant is that the
        ' coordinate information will always be after the header information.
        ' So we go to the bottom and "Ctrl-Up" to, hopefully, find the first coordinate entry.
        Set xFirst = xOldSheet.Range("C" & xOldSheet.Range("A1").SpecialCells(xlLastCell).Row).End(xlUp)
        ' However, if our luck is bad then the last header entry will also have data in Column C and we'll end up there.
        ' Also, there are two layouts for the first four rows of coordinates "003", "004", "004", "003" (the majority) and
        ' "003", "004", "003", "004". (Arbitrarily, the first of these is Set A and the second is Second B.) Each run will be composed
        ' entirely of one Set or the other. So we need to check each file...
        ' (A) If it's the first file, this tells us the Set type (and allows us to complete the output sheet's heading).
        ' (B) Do subsequent files agree with the initial Set type?
        xBad_Set = False
        If xFirst = 3 And xFirst.Offset(1, 0) = 4 And xFirst.Offset(2, 0) = 4 And xFirst.Offset(3, 0) = 3 Then
            If xSet_Type = "" Then
                xSet_Type = "A"
                xNewSheet.Range("B1:H1") = Array("'003", , "'004", , "'004", , "'003")
            End If
            If xSet_Type <> "A" Then xBad_Set = True
        ElseIf xFirst = 3 And xFirst.Offset(1, 0) = 4 And xFirst.Offset(2, 0) = 3 And xFirst.Offset(3, 0) = 4 Then
            If xSet_Type = "" Then
                xNewSheet.Range("B1:H1") = Array("'003", , "'004", , "'003", , "'004")
                xSet_Type = "B"
            End If
            If xSet_Type <> "B" Then xBad_Set = True
        Else
            xBad_Set = True
        End If
                
        If xBad_Set Then
            ' Oops...
            Debug.Print xOldSheet.Name & "'s layout not recognised."
            xResponse = MsgBox(xOldSheet.Name & "'s layout not recognised." & Chr(10) _
                            & """OK"" to skip this file or ""Cancel"" to terminate run.", vbOKCancel, "Process Text Files")
            If xResponse = 2 Then
                MsgBox ("Run terminating...")
                Exit Sub
            End If
        Else
            ' All OK. so copy this file's first four coordinates to the output sheet...
            xNewSheet.Cells(xRow, 1).Value = xOldSheet.Name
            xFirst.Offset(0, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 2)
            xFirst.Offset(1, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 4)
            xFirst.Offset(2, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 6)
            xFirst.Offset(3, -2).Resize(1, 2).Copy xNewSheet.Cells(xRow, 8)
            xRow = xRow + 1
        End If
    End If

Next
    
End Sub

Sub Get_TXT_Files_Space_Delimited(basebook As Workbook)
'For Excel 2000 and higher
'Careful - This strips leading and trailing blanks!
Dim Fnum As Long
Dim mysheet As Worksheet
Dim TxtFileNames As Variant
Dim QTable As QueryTable

' The files have extensions *.SC0 through to *.SC9 (as well as *.txt files).
TxtFileNames = Application.GetOpenFilename(filefilter:="All Files (*.*), *.*", MultiSelect:=True)

If IsArray(TxtFileNames) Then

    On Error GoTo Cleanup

    With Application
        .ScreenUpdating = False
        .EnableEvents = False
    End With

    'Loop through the array with txt files
    For Fnum = LBound(TxtFileNames) To UBound(TxtFileNames)

        'Add a new worksheet for the name of the txt file
        Set mysheet = Worksheets.Add(After:=basebook.Sheets(basebook.Sheets.Count))
        On Error Resume Next
            mysheet.Name = Right(TxtFileNames(Fnum), Len(TxtFileNames(Fnum)) - InStrRev(TxtFileNames(Fnum), "\", , 1))
        On Error GoTo 0

        With ActiveSheet.QueryTables.Add(Connection:="TEXT;" & TxtFileNames(Fnum), Destination:=Range("A1"))
            .TextFilePlatform = xlWindows
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileSpaceDelimiter = True
            .Refresh BackgroundQuery:=False
        End With
        
        ActiveSheet.QueryTables(1).Delete
        
    Next Fnum

    'Delete the first sheet of basebook
    On Error Resume Next
    Application.DisplayAlerts = False
        basebook.Worksheets(1).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0

Cleanup:

    With Application
        .ScreenUpdating = True
        .EnableEvents = True
    End With
    
End If

End Sub

Open in new window

Regards,
Brian.Process-Text-Files-V3.xlsm
Hi Brian,

sorry for not replying before now. I am still trying to test it on various scenarios. It flagged a few profiles as "layer not recognised" but i cannot fgiure out why as they are the same as the others in the batch. In fact I tried to run one file and the program did not recognise the layout

I will get back with some comments..

Thanks again for this.
forever7,

Please post the ones with unrecognised layout - it should be easy for me to spot the issue.

Thanks,
Brian.
Brian,

two files uploaded. The "14nam.txt" is not recognised.

Further to the posts :

1) all header information contains an "equals (=)" sign.
2) The last header information could possibly contain a space. The last header information is usually does not. Does this macro search from the bottom.

thanks
10cam.txt
14nam.txt
ASKER CERTIFIED SOLUTION
Avatar of redmondb
redmondb
Flag of Afghanistan image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'll test now and post back shortly.
Brian,

I have ran the files on a number of batches and the macro works on all of them.

The last header information could possibly contain a space
It would need two of them to cause a problem. Is that possible?


It isn't possible - if the spaces are there then then the file will be noticed.

Thank you very much for all your help and patience (especially when I presented you with the different scenarios).  

the code snippets will be useful to try an understand in the future.

Thanks and Kind regards
Thanks, forever7.

I monitor "my" questions for at least a couple of weeks after they're closed, so please feel free to post here if you have any queries about things such as ...
(1) The code.
(2) Files which the code doesn't handle.
(3) Plan B.

Regards,
Brian.