Link to home
Start Free TrialLog in
Avatar of Kanwaljit Singh Dhunna
Kanwaljit Singh DhunnaFlag for India

asked on

Import Data from Multiple Text Files in Excel

Hi Experts,

Objective
I need to import data from Multiple Text files (.txt or .rpt or .lst) into Excel and then Split the same in different columns.
I am doing it manually with a specified set of rules and process flow. I understand it could be automated and so need the help of the experts..

Attachments
I have attached five Sample Text Files and One Excel file. Excel File is the Catalyst to import all the the files.

SheetName and Purpose
Start      To maintain Documentation.
Original      To Store and Update the formula, Logic and pattern of thoughts. Also to be used as Template for the "Working" sheet. Not to be amended otherwise.
Template      To be used for creation of New Sheets.
Working      Here the working will be done and then data will copied on to the new sheets created on the basis of "Template" sheet.
New Sheets      To save the Text files data after processing in "Working" sheet. These sheets will be created using "Template" sheet.

Process Flow
Given in Sheet "Start" in the attached Excel File.

Version
I am using Windows 10 and Excel 2010

Regards
Kanwaljit
000001-Import-3.0-EEE.rpt
000002-Import-3.0-EEE.rpt
000003-Import-3.0-EEE.rpt
000004-Import-3.0-EEE.rpt
000005-Import-3.0-EEE.rpt
Import-3.0---EEE-1.xlsm
Avatar of Kanwaljit Singh Dhunna
Kanwaljit Singh Dhunna
Flag of India image

ASKER

Thanks Sir !
I appreciate the Guidance !
Avatar of Roy Cox
Ron de Bruin as an article and sample code to do this. Have a read and see if it helps. Post back if you need help making it work
Thanks for the Comment !

I have given a process flow which I manually follow to get the desired results (in the Excel File itself). I am reproducing the same here.

     Process Flow
1      Remove the Char(12) from the.txt or.rpt files. To do this, open all the text files in Notepad ++ and replace the Char(12) with nothing. To be done before starting the Macro. So not a part of the Coding.
2      Start the Macro. This will ask for Text files to be selected for copying. Here option to select multiple files needs to be provided, though working will be one Text file at a time.
      Note : Also We can keep all the Raw Text files in Single Folder to ease working. So any option to select "Multiple Files" or "all the files in a folder" would be handy.
3      All the data in the selected.rpt file is copied and then pasted in the "Working" sheet starting from Cell A5 downwards in a single column only.
4      Clear the Clipboard and save the file.
5      Serial Number in Column B of "Working" is auto-filled till the last cell of the new used range (it will ensure updation of Format of Used Range in that column also)
6      Data is now sorted based on Column A of "Working" Sheet in ascending order.
7      Data is filtered in Column A based on the criteria Where Text File a - Begins with "---------------" or b - Contains "????*Total :"
8      Filtered rows are deleted to remove junk lines.
9      The used range is reset to remove the Dirty Range and Reset the Last Used Cell in "Working" sheet.
10      Data is again sorted based on Column B of "Working" Sheet in ascending order to restore the original sequence.
11      Now we need to Split the data in Text File Column by using formula given in Range "A01FormulaRange". It can be done in either of the following ways preferring the Faster option
a      Copy the "A01FormulaRange" and Paste the same on to entire "A02CopytoRng" and clear the clipboard
b      Copy the "A01FormulaRange" and Paste the same on to Range C6:AA6 of "Working" sheet. Now select the "A02CopyToRng" and then using Ctrl+D option copy down the entire range.
      Note -      I have not used the Text Import wizard as it takes entirety on large data.
      I have not used text to column as that will result in loss of formatting in some cases, and we had to use additional macro to change the format for each destination column.
      I have not used text to column as I might need to change the Sequence of columns and that will require additional code and processing time but can be managed easily via formula.

12      We need to remove the formula from the Range "A02CopyToRng" and keep the values only. Can be done using "SpecialPasteValuesonly"
13      Now we need to Move the data of "A03DataRange" to a new sheet so that we can save the data for future working and start working on the next Text File. It can be done in either of the following ways
a      Using ASAP Utilities Create required sheets with Specified Names using "Template" as a template. In such a case destination Sheet for Data pastin can be selected by matching the Destination Sheet name with the 1st 6 Digits of the Text File Name. I have entered two such sheets named 000004 and 000005
b      Create a New Sheet in this file itself using a macro. Here the Sheet Name will the 1st Six Digits of the Text File. Sheet "Template" Will be used as template for the new sheet.
c      Paste the data in a New Excel File which is having pre-defined sheets (as in 14a) or create the Sheets in a New Excel File (as in 14b).
      Note - Saving the data in a New Excel File might prove faster at the time of working of this file as well as saving the New File due to file size (as we will be adding new sheets to this file)
      The Text Files have been named in such a way that the Leftmost 6 characters denotes a Unique code to identify the Unit.
14      Clear the Clipboard.
15      Save this File. In case the new file is created, we need to save and close the same. Here we need to reopen the New file everytime a Text file is processed.
      Note : I don't know whether it will be better / speedier to keep the new file open or to close and reopen it everytime (in case the data is saved in a new file).
16      Now we need to reset the "Working" sheet to restart the process for the Next Text File on a Virgin Sheet. This can done in two ways.
a      Delete Everything in "Working" sheet from row 8 onwards and Reset the Last Used Cell.
b      Copy the Sheet "Original" and PasteAll on to the Sheet "Working". Delete the Dirty Range and Reset the Last Used Cell. Clear the Clipboard.
      Note : I don't know which of the above would be faster.
17      Macro selects the Next Text file to be for copying.
18      Repeat Step 3 to 17

Now
-->I am not to import and merge all the files in a Single Worksheet.
-->I have tried importing via Text Import Utility and for large files the speed is a BIG BIG issues
-->I tried different options and copying all the data in one column is the Best in Speed.

Regards
Kanwaljit
I have rephrased the Objective to provide more clarity of the issue on the face of the question itself.

Objective
I need to import data from Multiple Text files (.txt or .rpt or .lst) into Excel and then Split the same in different columns.
Text files are imported one at a time, splitted data is saved in a worksheet and then the next Text file is imported, splitted data is saved in another worksheet and so on.
I am doing it manually with a specified set of rules and process flow. I understand it could be automated and so need the help of the experts..
This is a Regex solution.  Also, I think there's probably an ADO solution, but the connection string I've tested isn't working.
Sub Q_29000204_Regex()
    Dim strData As String
    Dim rngTgt As Range
    Dim vPushData As Variant
    Dim lngRow As Long
    
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long
    
    Dim oFS, oTS, oFile
    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = " *(\S{9})   (\S{16})  (\S.{24})   (\d\d-\d\d-\d\d\d\d)  (\d[^ ]*\d*) +(\d[^ ]*\d*) +(\d[^ ]*\d*) +(\d[^ ]*\d*) +(\S{10})"

    Set rngTgt = Worksheets("TestImport").Range("C6")
    Application.ScreenUpdating = False

    Set oFS = CreateObject("scripting.filesystemobject")
    
    For Each oFile In oFS.getfolder("C:\Users\Mark\Downloads\Q_29000204").Files
        Set oTS = oFile.OpenAsTextStream(ForReading, TristateFalse)
        strData = oTS.readall
        Debug.Print Now, "Processing: " & oFile.Name
        oTS.Close
        
        If oRE.test(strData) Then
            Set oMatches = oRE.Execute(strData)
            ReDim vPushData(0 To oMatches.Count - 1, 0 To 8)
            lngRow = 0
            For Each oM In oMatches
                With oM
                    For lngSM = 0 To .submatches.Count - 1
                        'rngTgt.Offset(0, lngSM).Value = .submatches(lngSM)
                        vPushData(lngRow, lngSM) = .submatches(lngSM)
                    Next
                    lngRow = lngRow + 1
                    'Set rngTgt = rngTgt.Offset(1)
                End With
            Next
            rngTgt.Worksheet.Range(rngTgt, rngTgt.Offset(lngRow - 1, 8)).Value = vPushData
            Set rngTgt = rngTgt.End(xlDown).Offset(1)
        Else
            Debug.Print oFile.Name, "Does not match pattern"
        End If
    Next

    Application.ScreenUpdating = True
    MsgBox "Import Completed" & vbCr & Now
End Sub

Open in new window

This routine assumes that all the input files are in the same folder.
Hi Aikimark,

Thanks a Lot for the time and efforts.
I have tested the solution and attached the resulting workbook.

Feedback :

--Speed looks good. (Speed is a major issue for me. I have files ranging from 1000 to 150000 rows. So I am finding a solution that has to be superfast primarily.)
--Data is imported in a Single Worksheet instead of each text file being saved to a different worksheet.
--Sr No is not auto filled
--Format is required to be copied till the last used row
--Gives a debug indication and focus is set to code line --->Set oTS = oFile.OpenAsTextStream(ForReading, TristateFalse)

Out of curiosity / lack of knowledge
--Whether the data is imported line by line here or entire file/Files is/are copied in the Excel file and the splitted in Columns ?
--I am doing it manually as per the process described above and it is working fairly fast only except the Manual involvement which I am trying to overcome. Even with Formula to be copied to 110000 R * 28 C area it wasn't taking more than 30 seconds for Calculation. Isn't it possible / better / faster to automate THAT process ? (No Offences Please. You are the Best judge of the solution, but I must share my experience honestly. I was doing it earlier via VBA code but that takes a LOT of TIME which was hurting and so I had to reset the Button again and compare all the alternative options. Also that is why I am interested in knowing whether the code is importing line by line or Copying the entire file/files. I have posted that code below. The Actual data contains a lot more columns)
--It was giving some morefunc.xll related message. Do I need to do something in regard to that ?

Option Explicit
Sub Import()
Dim Fn As String, F As Integer, Str As String, Arr(30) As String
Dim r As Long, c As Long, i As Long, WSn As String, WSNum As Long, WSl As String
Dim pos() As Variant, p1 As Integer, p2 As Integer
pos = Array(1, 14, 32, 60, 72, 80, 92, 104, 116, 128, 140, 152, 164, 176, 188, 194, 201, 225, 251, 260, 262) 'Check starting position 152 - there is no data in sample in this position
On Error GoTo Err
Fn = getfile()
If Len(Fn) = 0 Then
    MsgBox "File not selected", vbOKOnly
    Exit Sub
End If
    
F = FreeFile
Open Fn For Input As #F
WSn = getnum(Fn)
On Error GoTo cont
Worksheets(WSn).Select
i = MsgBox("Sheet with name: " & WSn & " already exists. Delete and continue import?", vbOKCancel)
If i = vbCancel Then Exit Sub
Worksheets(WSn).Delete
cont:
'On Error GoTo Err
WSl = Worksheets(Worksheets.Count).Name
Worksheets("Headers").Copy After:=Worksheets(WSl)
Worksheets(Worksheets.Count).Select
Worksheets(Worksheets.Count).Name = WSn
r = 3
    Do While Not EOF(F)
        Line Input #F, Str
        
        Str = Replace(Str, Chr(12), "")
        If InStr(1, Str, "Total") = 0 And InStr(1, Str, "-------------") = 0 And Len(Str) > 100 Then
            For i = 0 To UBound(pos)
                If i = UBound(pos) Then
                    p2 = Len(Str)
                Else
                    p2 = pos(i + 1) - pos(i)
                End If
                Arr(i) = Trim(Mid(Str, pos(i), p2))
            Next i
            Cells(r, 1) = r - 2
            Cells(r, 25) = Left(Arr(15), 2)
            Cells(r, 24) = Arr(6) - Arr(12)
            For i = 0 To UBound(pos)
                Cells(r, i + 2) = Arr(i)
            Next i
            r = r + 1
        End If
    Loop
    
Close #F
Range("G1").Formula = "=Sum(G3:G" & r - 1 & ")"
Range("H1").Formula = "=Sum(H3:H" & r - 1 & ")"
Range("I1").Formula = "=Sum(I3:I" & r - 1 & ")"
Range("N1").Formula = "=Sum(N3:N" & r - 1 & ")"
Range("X1").Formula = "=Sum(X3:X" & r - 1 & ")"
Exit Sub
Err:
    MsgBox "Error:" & Error(Err.Number), vbOKOnly, "Error"
   
End Sub

Public Function getfile() As String
  With Application.FileDialog(msoFileDialogOpen)
    .initialFilename = ActiveWorkbook.Path
    .Filters.Clear
    .Filters.Add "Rpt files (*.rpt)", "*.rpt", 1
    .Title = "Select file"
    .AllowMultiSelect = False
    If .Show = -1 Then getfile = .SelectedItems(1)
  End With
End Function

Public Function getnum(A As String) As String
Dim i As Integer
Dim pos As Integer
pos = 0
getnum = ""
'Find last slash
For i = 1 To Len(A)
    If Mid(A, i, 1) = "\" Then pos = i + 1
Next i
If pos > 0 Then getnum = Mid(A, pos, 6)
End Function

Open in new window

Q_29000204.xlsm
Did you change the folder literal value?
Yes , Because the files were in different folder. So I changed the Path.
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks a Lot Sir !
did you mean to accept my comment as the solution?  How did it solve your problem?