Link to home
Start Free TrialLog in
Avatar of bluue s
bluue s

asked on

Macro to convert data in .txt into excel format

Macro to convert data in .txt into excel format

Attached is the .txt (input) and .xlsx (output) files.

-I have also stated the type of data in row 4.
-Not all the data in .txt needs to be transferred in xlsx, only those stated in .xlsx (output) needs to be considered.
-Ignore the logic of the numbers. This is just to show what and where the data needs to be placed.
PDF-Page-1-Example_INPUT.txt
PDF-Page-1-Example_OUTPUT.xlsx
Avatar of Hakan Yılmaz
Hakan Yılmaz
Flag of Türkiye image

For simplification and generalization, please explain a little about the structure of your TXT file
  1. Which values are fixed?
  2. Which parts are repetitive?
  3. Where the record start and end?
  4. Are values always presented in single line?
I've analyzed your txt and marked field names with +++, values with --- and groups by ///. File at the attachment.
Company data looks unstructured.
Company Data
Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)  
494 MILESTONE RD.,                  
AMP.ADDRESS NAME NUMBER,  
10100 CITY NAME.                     
COUNTRY NAME                           
Unloading Point
                             

Open in new window

PDF-Page-1-Example_INPUT-HAKAN.txt
Will there be multiple sets of data in each txt file or just one?
SOLUTION
Avatar of Ejgil Hedegaard
Ejgil Hedegaard
Flag of Denmark image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bluue s
bluue s

ASKER

Thanks Ejgil, but the output is not what I want.

What I meant is those in the headers shall stay as headers and those that are supposed to be the data shall be below the headers.
Meaning:

Total Numbers      JUZZ Order      ABC - Order Line      Sales Unit      BU Order Number  and so on ................
123456789             4567891               0004A                        USA1234      19U0025916            and so on ................
123456780           4567822           0005A                        USA1235      19U0043800            and so on ................
123456751           4523456           0009A                        USA1236      19U0032111            and so on ................


All the .txt files will be in a specified folder.
Thus, when the macro is initiated, I will open the folder and all the data in this .txt files are read and tabulated as above accordingly.
Avatar of bluue s

ASKER

@Hakan,

    Which values are fixed? = Those in the headers are fixed.

    Which parts are repetitive? = Those in the headers are considered repetitive since they are fixed.

    Where the record start and end?
= See my reply to Ejgil -  https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html?anchorAnswerId=42822467#a42822467

    Are values always presented in single line?
= No = see my reply to Ejgil -  https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html?anchorAnswerId=42822467#a42822467


@Aikimark,

Will there be multiple sets of data in each txt file or just one?  
= There will be multiple .txt files - See my reply to Ejgil - https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html?anchorAnswerId=42822467#a42822467 

Thanks !
Let me restate: Will there be more than one row's worth of data within any given input file?
Could you please clarify what this means?
In your excel file, you took "ABCDE FGH IJKLM L.P. PTY(T)" part as "Company Data" and "COUNTRY NAME" as "Delivery Address".
Company Data
Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)  
494 MILESTONE RD.,                  
AMP.ADDRESS NAME NUMBER,  
10100 CITY NAME.                     
COUNTRY NAME                           
Unloading Point
                             

Open in new window

Following questions were about the structure of your TXT file.
Which values are fixed? For example, is "494 Milestone RD.," above fixed or variable? Is the number of rows between the Delivery Address and COUNTRY NAME always same?
Which parts are repetitive? I mean, are there any single record that contains multiple PackInformation or Prodinformation or Some - Info?
Where the record start and end? Is a single TXT file always contains only 1 record for your output file?
Are values always presented in single line? Can COUNTRY NAME span to 2 rows?
Avatar of bluue s

ASKER

@aikimark
Let me restate: Will there be more than one row's worth of data within any given input file?
= Yes, there will be more than one row of data
Avatar of bluue s

ASKER

@Hakan

Which values are fixed? For example, is "494 Milestone RD.," above fixed or variable?
= Variable - They can be changed.
Is the number of rows between the Delivery Address and COUNTRY NAME always same?
= Yes

Which parts are repetitive? I mean, are there any single record that contains multiple PackInformation or Prodinformation or Some - Info?
=You can IGNORE the following fields:

ABR - Agreement Basic Report Checklist
NR: _________
Revceived _________ Finalized_______________  Agreement Accepted (Yes/No)_________
PackInformation
Prodinformation
Some - Info
Drop Point
USAB6USA0521729AMS
Unloading Point
TYPE Calculation Data

which means those fields and data NOT found in the .xlsx OUTPUT, they are NOT required.

Where the record start and end?
= N (after Separate Category)

Is a single TXT file always contains only 1 record for your output file?
= Yes

Are values always presented in single line? Can COUNTRY NAME span to 2 rows?
= Yes, single line; Country Name is 1 row.
Dear @Bluue
We don't know which row corresponds to a field and which row corresponds to a data in your TXT file. We must, first, clarify this. I've made assumptions while analyzing your file.
We may further assume that data will always appear in the same rows of each TXT file. But here we are trying to provide you a lasting solution.

I personally don't believe that this information is well structured. This part is not following the same harmony of previous Field/Data pairs. Therefore requires special attention.
This looks like a multiline address value. We have to understand its boundaries.
Company Data
Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)  
494 MILESTONE RD.,                  
AMP.ADDRESS NAME NUMBER,  
10100 CITY NAME.                     
COUNTRY NAME                           
Unloading Point
                             

Open in new window

Avatar of bluue s

ASKER

Company Data
Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)  
494 MILESTONE RD.,                  
AMP.ADDRESS NAME NUMBER,  
10100 CITY NAME.                     
COUNTRY NAME                           
Unloading Point

Open in new window


From Company Data to COUNTRY NAME is always 6 rows.
What do you mean
headers shall stay as headers and those that are supposed to be the data shall be below the headers.

That is how it is.
Headers in Sheet1 are not changed, and data is below.
I have just changed on the column specification, because the headers does not match the txt file.

All the .txt files will be in a specified folder.
So there will be multiple files.
One dataset per file, or more in each file.
Avatar of bluue s

ASKER

Please see attached.

Meaning I have multiple .txt files in a folder.
Macro initiated and read the txt files data and place them in the excel format as per "Output" tab.
Row 2 and onwards would be the data from the .txt files
and so on.....
PDF-Page-1-Example_OUTPUT.xlsx
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Here is a routine that does the importing:
Sub Q_29139469(ByVal parmPath As String)
    Dim vCols As Variant
    vCols = Array("Total Numbers", "JUZZ Order", "ABC - Order Line", "Sales Unit", "BU Order Number", "BU Customer Number", "Product Number", "Package", "TYP", "Weight per piece", "Storage Unit", "SPC", "Huge", "A", "B", "Storing Date", "Storage Time", "Internal BU Number", "Company Order", "Location", "Internal Pack", "Ordered Qty", "Pick Qty", "Delivered Qty", "Pick Date", "Pick Time", "Picked by", "Packed as", "Gross Weight", "Product Weight", "Net Weight", "Packed by", "Date", "Some", "Sequence", "Date Finished", "Time Finished", "CMT", "Shipment No.", "Order Priority", "Company Data", "Delivery Address", "Range", "Maximum Category", "Separate Category")
    
    Dim strData As String
    Dim vItem As Variant
    Dim lngLoop As Long
    
    Dim dicCol As Object
    Set dicCol = CreateObject("scripting.dictionary")
    For lngLoop = LBound(vCols) To UBound(vCols)
        dicCol(vCols(lngLoop)) = lngLoop + 1
    Next
    
    Dim rngTgT As Range
    Dim wks As Worksheet
    Set wks = ThisWorkbook.ActiveSheet
    Set rngTgT = wks.Cells(1, 1)
    rngTgT.Resize(1, UBound(vCols) + 1).Value = vCols
    
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "(" & "\b" & Join(vCols, "\b|\b") & "\b" & ")\r\n([^\r]*)"
    
    Dim oFS, oTS, oFile
    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
    Set oFS = CreateObject("scripting.filesystemobject")
    For Each oFile In oFS.getfolder(parmPath).Files
        If oFile.Name Like "*.txt" Then
            'Debug.Print oFile.Name
            Set oTS = oFS.OpenTextFile(oFile.Path, ForReading, True, TristateFalse)
            strData = oTS.readall
            oTS.Close
            
            If oRE.test(strData) Then
                Set oMatches = oRE.Execute(strData)
                For Each oM In oMatches
                    With oM
                        If .submatches(0) = "Total Numbers" Then
                            Set rngTgT = rngTgT.Offset(1)
                        End If
                        For lngSM = 0 To .submatches.Count - 1
                            rngTgT.Cells(1, dicCol(.submatches(0))).Value = .submatches(1)
                        Next
                    End With
                Next
            End If
        
        End If
    Next

End Sub

Open in new window

You would invoke it like this:
Q_29139469 "C:\users\mark\downloads\Q_29139469"

Open in new window

NOTE: I put a couple of copies of your input file in this folder.

It seems to provide the results you seek.  Please test in your environment.

I did assume that there would always be a Total Numbers text at the start of each set of data.  The other pairs of text could appear in any order.
Dear @aikimark, this is beautiful but can't catch all columns. If I understand right, following should be considered.
  1. "Shipment No." doesn't have word boundary on the right side.
  2. "Delivery Adress" isn't followed by CRLF.
  3. "Storage Time" should be "Storing Time"
  4. "Ordered QTY." doesn't have word boundary on the right side.
  5. "Pick Qty." doesn't have word boundary on the right side.
  6. "Delivered Qty." doesn't have word boundary on the right side.

Dear @bluee, actually, @aikimark's code should be the answer for your problem. I suggest you to tidy your source files for the remaining fields.
It would have been most helpful to have a truly representational input file.
Thanks, Hakan.
Avatar of bluue s

ASKER

@Aikimark,
I can't initiate the macro, and i think because I don't have  the folder "C:\users\mark\downloads\Q_29139469"

I tried to change the macro to adjust the destination folder but could not find in the macro text.

Could you change to ""C:\temp\package txt\" ?

Thanks !
Dear @bluue
@aikimark provided a generic import method. It will import all txt files in the folder specified by parameter. You may use it wherever you want in your codes.
Example below will use this method to import all txt files under specified folders.
Sub ImportTXTs()
     Q_29139469 ActiveWorkbook.Path 'this line will import all txt files in the same folder with active workbook.
     Q_29139469 "C:\users\mark\downloads\Q_29139469" 'this line will import all txt files under this folder.
     Q_29139469 "D:\" 'this line will import all txt files under the root of D:\ drive.
End Sub

Open in new window

You may want to ask a path from the user and pass it to this method to import from that folder.
Again, thanks to Hakan for the great explainer comment.  The path of the input files is passed into the routine as a parameter.  I did my testing from the immediate window.  You will invoke the code as best fits into your production environment.

I looked at your Excel workbook and see that you only captured the "COUNTRY NAME" line from your input file.  I don't think that is right.
Avatar of bluue s

ASKER

hi all

I asked is because
i copied aikimark's code in the module and run (Alt+F8), but no macro came out for me to select:

User generated image
Avatar of bluue s

ASKER

@aikimark

I looked at your Excel workbook and see that you only captured the "COUNTRY NAME" line from your input file.  I don't think that is right.

Do you mean the header "Delivery Address" has the data COUNTRY NAME is not correct ?
If so, this is what I need. I don't need to the full address. I just need the country name to know where it is delivered to that's all.
Dear @aikimark,
Could you extend your array to 2 dimensions? Together with the fieldnames, we may define linefeed counts.
1 for field names, 1 for prescribing linefeeds to skip before data row
This may hurt your Join method, but I think it will be able to capture all data.

We will start at 1st linefeed after Company Data, read until next linefeed.
So this will capture Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)

We will start at 4th linefeed after Delivery Adress, read until next linefeed.
So this will capture COUNTRY NAME

As this might happen in OCR situations, I think, because of the comlex structure of a PDF file, 1st row of delivery address is being considered in the sameline with Delivery Address field name.
Company Data
Delivery Adress    ABCDE FGH IJKLM L.P. PTY(T)  
494 MILESTONE RD.,                  
AMP.ADDRESS NAME NUMBER,  
10100 CITY NAME.                     
COUNTRY NAME                           
Unloading Point

Open in new window

@bluue

Since the routine has a parameter, it won't show up in the dialog window.

Simple test:
1. Position your cursor to the Immediate Window (Ctrl+G when you are looking at the VB Project)
2. Type a statement like the one I or Hakan showed you to invoke the routine with a string parameter that points to the folder containing the txt files.
3. Press Enter
4. Look at the results in the worksheet

Almost as simple test:
1. Create a new routine, like Hakan showed
2. Modify it so that it only invokes the routine once, changing the parameter to point to the folder containing the txt files
3. Go back to your worksheet
4. Bring up the dialog
5. Select and Run your new routine.
6. Look at the results in the worksheet
@Hakan

The period character, following the text, does constitute a 'word boundary'.  It is a word character (\w) followed by a non-word character.  I think the pattern will need to change, especially for the address and the Storage Time field.
Dear @aikimark
It is Storing Time in the example txt, not Storage Time. It is working when I correct.
Because some of our field names ending with a non-word character, the \b in Shipment No.\b is trying to find position where we completely have the field name Shipment No. followed by a word character. But we have only field name followed by empty space or linefeed. In our case, we don't have such boundary.
Also need to mention the dot means any character and we really have dot in place.
Avatar of bluue s

ASKER

@Aikimark

I encounter an error after doing the following:

in the window pane I typed" C:\Users\xxxxxxx\Downloads\Folder 1\package txt\"

Compile Error:
Expected: Line number or statement or end of statement

OK | Help
That string should be the parameter of the routine I posted.  I don't see a routine name.
Avatar of bluue s

ASKER

Yes, I pasted the exact routine that you mentioned in:
 https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html?anchorAnswerId=42823260#a42823260

so do you mean I am wrong to enter
"C:\Users\xxxxxxx\Downloads\Folder 1\package txt\"  in the window pane?

Then what should be the exact statement i must type in the window pane so as to invoke the macro if my .txt files are found in "C:\Users\xxxxxxx\Downloads\Folder 1\package txt\" ?
Then what should be the exact statement
This was in the second code snippet in my earlier comment.  If this doesn't work, remove the trailing backslash character in your string.
Q_29139469 "C:\Users\xxxxxxx\Downloads\Folder 1\package txt\"

Open in new window

Will there always be a "Company Data" and "Delivery Adress" in the input data for every set of data?

I've tweaked the regex pattern in this version of the code.  It seems to parse everything but the company name and country.  Depending on your answer to my question (above), I should be able to get it all with the next tweak.
Sub Q_29139469(ByVal parmPath As String)
    Dim vCols As Variant
    vCols = Array("Total Numbers", "JUZZ Order", "ABC - Order Line", "Sales Unit", "BU Order Number", "BU Customer Number", "Product Number", "Package", "TYP", "Weight per piece", "Storage Unit", "SPC", "Huge", "A", "B", "Storing Date", "Storing Time", "Internal BU Number", "Company Order", "Location", "Internal Pack", "Ordered QTY", "Pick Qty", "Delivered Qty", "Pick Date", "Pick Time", "Picked by", "Packed as", "Gross Weight", "Product Weight", "Net Weight", "Packed by", "Date Finished", "Time Finished", "Date", "Some", "Sequence", "CMT", "Shipment No", "Order Priority", "Delivery Adress", "Company Data", "Range", "Maximum Category", "Separate Category")
    
    Dim strData As String
    Dim vItem As Variant
    Dim lngLoop As Long
    
    Dim dicCol As Object
    Set dicCol = CreateObject("scripting.dictionary")
    For lngLoop = LBound(vCols) To UBound(vCols)
        dicCol(vCols(lngLoop)) = lngLoop + 1
    Next
    
    Dim rngTgT As Range
    Dim wks As Worksheet
    Set wks = ThisWorkbook.ActiveSheet
    Set rngTgT = wks.Cells(1, 1)
    rngTgT.Resize(1, UBound(vCols) + 1).Value = vCols
    
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM As Object
    Dim oSM As Object
    Dim lngSM As Long
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "(" & "\b" & Join(vCols, "\b|\b") & "\b" & ")\.?\s+([^\r]*)\r\n"
    
    Dim oFS, oTS, oFile
    Const ForReading = 1, ForWriting = 2, ForAppending = 8
    Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
    Set oFS = CreateObject("scripting.filesystemobject")
    For Each oFile In oFS.getfolder(parmPath).Files
        If oFile.Name Like "*.txt" Then
            'Debug.Print oFile.Name
            Set oTS = oFS.OpenTextFile(oFile.Path, ForReading, True, TristateFalse)
            strData = oTS.readall
            oTS.Close
            
            If oRE.test(strData) Then
                Set oMatches = oRE.Execute(strData)
                For Each oM In oMatches
                    With oM
                        If .submatches(0) = "Total Numbers" Then
                            Set rngTgT = rngTgT.Offset(1)
                        End If
                        For lngSM = 0 To .submatches.Count - 1
                            rngTgT.Cells(1, dicCol(.submatches(0))).Value = .submatches(1)
                        Next
                    End With
                Next
            End If
        
        End If
    Next

End Sub

Open in new window

Avatar of bluue s

ASKER

@aikimark

Will there always be a "Company Data" and "Delivery Adress" in the input data for every set of data?

=Yes
Avatar of bluue s

ASKER

@aikimark,

your code in  https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html?anchorAnswerId=42825172#a42825172
doesn't work.

I have 5 .txt in the folder. But only 1 row came out.
And even then, 1st row should contain headers but headers and data are all in 1 row.
There should be 1 row of header and 5 rows of data.
You need to post an actual input file.
Avatar of bluue s

ASKER

my actual input file can be found in my question :
https://www.experts-exchange.com/questions/29139469/Macro-to-convert-data-in-txt-into-excel-format.html

file name = "PDF-Page-1-Example_INPUT.txt"
That file has been edited.  It is not an original file.  It contains misspellings and contrived data, such as "COUNTRY NAME".
Avatar of bluue s

ASKER

That file has been edited.  It is not an original file.  It contains misspellings and contrived data, such as "COUNTRY NAME".

= it is the original txt file. Only the original data is changed for privacy and confidentiality.
the original data is changed
Thus my assertion, "That file has been edited."
Avatar of bluue s

ASKER

But the type and nature of the data is the same.
Even the position of the data is the same.
For example, if the real data for the country name was "United States of America", am I wrong to just type "COUNTRY NAME" as a generic data ?
If country name is an obstacle for you, please ignore it, meaning, you just capture all the rest of the data and leave the country name out.
Depending on your text editor, the file may change.  I've experienced people who posted files, created on a *IX system, that had been edited.  The resulting file changed the end-of-line sequence from line-feed to carriage-return-line-feed.  My regex pattern looked for CrLf sequences, which didn't exist in the unedited file.

Also, there is at least one spelling error, which probably doesn't exist in the unedited file.

In the end it comes down to us being able to have a conversation in this thread about your problem and my solution to the problem.  If I can't recreate the error(s) you area experiencing, then I probably won't be able to actually solve the problem.
Avatar of bluue s

ASKER

I am using a normal wordpad for .txt.

Where is the spelling error, if there is spelling error, just ignore it.
I just need to have the consolidation data in excel, basically to have the raw data to do analysis as right now i am typing manually by looking at the pdf files, so whatever minor truncated or spelling error, i can take care of it.
Do you have Notepad++ ?
Avatar of bluue s

ASKER

No I don't.
What am i suppose to do if i were to download it via https://notepad-plus-plus.org/download/v7.6.4.html ?
What is the reason that I download notepad ++? how does it help me in this question ??
and which one would you recommend to download ?
Notepad++ is a better text editor.  It will let you see the non-visible characters and it won't change the end-of-line characters when you open/save the file.  It will allow you to compare documents if you install the Compare plug-in.  You can also change your view to split-screen mode, allowing a visual comparison of two documents, side-by-side.
Avatar of bluue s

ASKER

Are you able to do edit the macro to read the multiple .txt files ?
It's ok to miss out or truncated. At least I have most of the data for analysis.
If row numbers won't change, you may use this straightforward code. You need to change the targetfolder to your folder first.
Sub readlinebyline()
    Dim fso As Object
    Dim targetfolder As Object
    Dim iterfile As Object
    Dim txtreader As Object
    Dim txtcontents() As String
    Dim ws As Worksheet
    Dim targetrow As Integer
    
    Set ws = ActiveSheet
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set targetfolder = fso.GetFolder("C:\Users\hakan.yilmaz\Desktop")
    targetrow = 1
    
    For Each iterfile In targetfolder.Files
        
        Interaction.DoEvents
        
        If Right(iterfile.Name, 4) = ".txt" Then

            targetrow = targetrow + 1

            Set txtreader = fso.OpenTextFile(iterfile)

            Do
                ReDim Preserve txtcontents(txtreader.Line)
                txtcontents(UBound(txtcontents)) = txtreader.ReadLine()
            Loop While Not txtreader.AtEndOfStream
 
            If UBound(txtcontents) >= 109 Then
                ws.Cells(targetrow, 1) = txtcontents(5) 'Total Numbers
                ws.Cells(targetrow, 2) = txtcontents(9) 'JUZZ Order
                ws.Cells(targetrow, 3) = txtcontents(11) 'ABC-Order Line
                ws.Cells(targetrow, 4) = txtcontents(15) 'Sales Unit
                ws.Cells(targetrow, 5) = txtcontents(17) 'BU Order Number
                ws.Cells(targetrow, 6) = txtcontents(19) 'BU Customer Number
                ws.Cells(targetrow, 7) = txtcontents(21) 'Product Number
                ws.Cells(targetrow, 8) = txtcontents(23) 'Package
                ws.Cells(targetrow, 9) = txtcontents(25) 'TYP
                ws.Cells(targetrow, 10) = txtcontents(27) 'Weigh per piece
                ws.Cells(targetrow, 11) = txtcontents(30) 'Storage Unit
                ws.Cells(targetrow, 12) = txtcontents(32) 'SPC
                ws.Cells(targetrow, 13) = txtcontents(34) 'Huge
                ws.Cells(targetrow, 14) = txtcontents(36) 'A
                ws.Cells(targetrow, 15) = txtcontents(38) 'B
                ws.Cells(targetrow, 16) = txtcontents(40) 'Storing Date
                ws.Cells(targetrow, 17) = txtcontents(42) 'Storage Time
                ws.Cells(targetrow, 18) = txtcontents(44) 'Internal BU Number
                ws.Cells(targetrow, 19) = txtcontents(46) 'Company Order
                ws.Cells(targetrow, 20) = txtcontents(48) 'Location
                ws.Cells(targetrow, 21) = txtcontents(50) 'Internal Pack
                ws.Cells(targetrow, 22) = txtcontents(52) 'Ordered Qty
                ws.Cells(targetrow, 23) = txtcontents(54) 'Pick QTY
                ws.Cells(targetrow, 24) = txtcontents(56) 'Delivered QTY
                ws.Cells(targetrow, 25) = txtcontents(58) 'Pick Date
                ws.Cells(targetrow, 26) = txtcontents(60) 'Pick Time
                ws.Cells(targetrow, 27) = txtcontents(62) 'Picked by
                ws.Cells(targetrow, 28) = txtcontents(65) 'Picked as
                ws.Cells(targetrow, 29) = txtcontents(67) 'Gross Weight
                ws.Cells(targetrow, 30) = txtcontents(69) 'Product Weight
                ws.Cells(targetrow, 31) = txtcontents(71) 'Net Weight
                ws.Cells(targetrow, 32) = txtcontents(73) 'Packed by
                ws.Cells(targetrow, 33) = txtcontents(76) 'Date
                ws.Cells(targetrow, 34) = txtcontents(78) 'Some
                ws.Cells(targetrow, 35) = txtcontents(80) 'Sequence
                ws.Cells(targetrow, 36) = txtcontents(82) 'Date Finished
                ws.Cells(targetrow, 37) = txtcontents(84) 'Time Finished
                ws.Cells(targetrow, 38) = txtcontents(88) 'CMT
                ws.Cells(targetrow, 39) = txtcontents(90) 'Shipment No.
                ws.Cells(targetrow, 40) = txtcontents(92) 'Order Priority
                ws.Cells(targetrow, 41) = txtcontents(94) 'Company Data
                ws.Cells(targetrow, 42) = txtcontents(98) 'Delivery Address
                ws.Cells(targetrow, 43) = txtcontents(105) 'Range
                ws.Cells(targetrow, 44) = txtcontents(107) 'Maximum Category
                ws.Cells(targetrow, 45) = txtcontents(109) 'Separate Category
            End If

            txtreader.Close

            Set txtreader = Nothing

        End If
    
    Next iterfile
    
    Set ws = Nothing
    Set targetfolder = Nothing
    Set fso = Nothing
End Sub

Open in new window

@Hakan

Performance Hint: You might try a txtreader.ReadAll and then Split() the results, rather than ReDim-ing an array.

What do you do if there are multiple sets of (row) data in each input file?
Dear @aikimark,
Thanks for your suggestion, this is a quite nice replacement for ReadLine method.
This solution doesn't consider the possibility of multiple records in a single file or any other complexity.
To deal with multiline data, user need to manually combine the rows and map them to the corresponding columns as I did.
@bluue

What utility are you using to extract the text layer from the PDF?
Avatar of bluue s

ASKER

@Hakan, thanks for the macro. it is very simple and fast but miss out record 3 and 5 (i have place 5 .txt files and only 1, 2 and 4 came out)
It is strange as all the fields are in fixed position. What could have gone wrong ?

@aikimark, just go to pdf, save as *plain text (.txt).
If you have a lot of PDFs, you will want to use a utility that can process a lot of files in batch mode.

Are there any options for the format of the txt file when you save it?
Are you using Adobe Reader or Adobe Acrobat?
Dear @bluue
There are two possibilities.
Last 4 character of the file name must be ".txt". If it is not, the code will not read the file.
The file must contain minimum 109 lines. If it doesn't have, you must at least see an empty row in your worksheet for this file.
I'm not sure but as @aikimark mentioned, this may be caused by missing new line (CR+LF) characters.

Dear @aikimark
Maybe we can read all file at once, then replace any CR or LF with CR+LF and then split by CR+LF?
Avatar of bluue s

ASKER

@aikimark

If you have a lot of PDFs, you will want to use a utility that can process a lot of files in batch mode.
= What utility do you recommend ?

Are there any options for the format of the txt file when you save it?
=
Pdf Document (.pdf)
Plain text (.txt)
MS Powerpoint presentation (.pptx)
MS Word (.docx)
MS Excel (.xlsx)
Pdf A/Document (.pdf)
Pdf X/Document (.pdf)


Are you using Adobe Reader or Adobe Acrobat?
=PDF X-Change Editor to save the .txt files

I have Adobe Reader XI.

Is there any difference ?
Avatar of bluue s

ASKER

Dear @bluue
There are two possibilities.
Last 4 character of the file name must be ".txt". If it is not, the code will not read the file.
The file must contain minimum 109 lines. If it doesn't have, you must at least see an empty row in your worksheet for this file.
I'm not sure but as @aikimark mentioned, this may be caused by missing new line (CR+LF) characters.

@Hakan

all are name .txt
Yes there is an empty row. Row 3 and 5 are empty.
There is no header but this is not an issue, i can just paste the headers myself. important thing is the data below the headers.
Avatar of bluue s

ASKER

The file must contain minimum 109 lines. If it doesn't have, you must at least see an empty row in your worksheet for this file.

How to have a work around that even if less than 109 lines, the data will be out ?
@Hakan
Maybe we can read all file at once, then replace any CR or LF with CR+LF
If the file is from a *ix system, I can change the regex pattern to match it.  It is possible to make the change you suggest, but it would be difficult to do this correctly if we don't know the actual contents of the file.  For instance, if there is a mixture of the three end-of-line character sequences, such a global Replace() invocation would not help.  We would have to first change the CrLf sequences, if any exist, into something else before replacing the individual Cr or Lf characters.  Then the 'something else' strings would need to be changed back into CrLf.
@bluue

Have you installed Notepad++ ?

What utility do you recommend ?
* In past projects I've used PDF2txt.  It cost about $35
* XPDF is a good utility: http://www.xpdfreader.com/
You can use the Powershell script at the bottom of this article:
* PDF Helper is a good ($29) utility: https://www.pdf-helper.com/pdf-to-text/
* PDFMate is free: https://www.pdfmate.com/pdf-to-text.html
* Depending on the number of PDF files you need to convert, you might be able to use an online conversion site.
Examples
* I may be inclined to use Python
Examples
* You can upload your PDFs to your Google Drive and use Google's conversion utilities to extract the text.
https://support.google.com/drive/answer/176692?co=GENIE.Platform%3DDesktop&hl=en
https://www.wired.com/2010/06/google-docs-can-now-convert-images-and-pdfs-to-text/
Avatar of bluue s

ASKER

@aikimark
Have you installed Notepad++ ?

No not yet.
Possible to have a work around for  those less than 109 lines?

Thanks for the suggestion on pdf to text - but which one would you recommend for script idiot like me - just need to click and run and then all pdf will turn to txt files ? Thanks !
No not yet.
Let me know when you do.

recommend for script idiot like me
Select or evaluate items at the top of the list.

How many PDF files need to be converted?
Is this a one-time conversion or a regularly scheduled conversion?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bluue s

ASKER

@Hakan

Nothing happens when I run the macro.
Pls advise.
Avatar of bluue s

ASKER

How many PDF files need to be converted?
Is this a one-time conversion or a regularly scheduled conversion?

= a regularly scheduled conversion
Did you change the path on line 58?
Avatar of bluue s

ASKER

Did you change the path on line 58?

Yes i did.
I just noticed you have two different "Location" data. I've changed the code to return 2nd.
Specify the folder path by excluding the last "\" and make sure you have non-empty txt files in the folder you specifiy.
Sub allatonce()

    Dim fso As Object, targetfolder As Object, iterfile As Object, txtreader As Object
    Dim txtcontents As String
    Dim regexsettings As Object, regexoperator As Object, regexmatchcol As Object, regexmatches As Object
    Dim regexiteration As Variant
    Dim targetws As Object
    Dim targetwsrow As Integer
    
    Set regexsettings = CreateObject("Scripting.Dictionary")
    regexsettings.Add "Total Numbers", "\bTotal Numbers\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "JUZZ Order", "\bJUZZ Order\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "ABC - Order Line", "\bABC - Order Line\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Sales Unit", "\bSales Unit\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "BU Order Number", "\bBU Order Number\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "BU Customer Number", "\bBU Customer Number\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Product Number", "\bProduct Number\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Package", "\bPackage\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "TYP", "\bTYP\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Weight per piece", "\bWeight per piece\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Storage Unit", "\bStorage Unit\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "SPC", "\bSPC\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Huge", "\bHuge\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "A", "\bA\b\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "B", "\bB\b\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Storing Date", "\bStoring Date\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Storage Time", "\bStoring Time\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Internal BU Number", "\bInternal BU Number\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Company Order", "\bCompany Order\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Location", "Location[\S\s]*\bLocation\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Internal Pack", "\bInternal Pack\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Ordered Qty", "\bOrdered QTY\.\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Pick Qty", "\bPick Qty\.\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Delivered Qty", "\bDelivered Qty\.\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Pick Date", "\bPick Date\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Pick Time", "\bPick Time\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Picked by", "\bPicked by\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Packed as", "\bPacked as\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Gross Weight", "\bGross Weight\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Product Weight", "\bProduct Weight\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Net Weight", "\bNet Weight\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Packed by", "\bPacked by\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Date", "\bDate\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Some", "\bSome\s*\n([^\n\r]*)[\n\r]" 'special treatment
    regexsettings.Add "Sequence", "\bSequence\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Date Finished", "\bDate Finished\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Time Finished", "\bTime Finished\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "CMT", "\bCMT\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Shipment No.", "\bShipment No.\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Order Priority", "\bOrder Priority\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Company Data", "\bCompany Data\s*Delivery Adress\s*([^\n\r]*)[\n\r]" 'special treatment
    regexsettings.Add "Delivery Address", "\bDelivery Adress(?:.*[\r\n]){4}([^\r\n]*)[\r\n]" 'special treatment
    regexsettings.Add "Range", "\bRange\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Maximum Category", "\bMaximum Category\s*([^\n\r]*)[\n\r]"
    regexsettings.Add "Separate Category", "\bSeparate Category\s*([^\n\r]*)[\n\r]"

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set targetfolder = fso.GetFolder("C:\Hakan")
    Set targetws = ActiveSheet
    Set regexoperator = CreateObject("VBScript.RegExp")
    
    targetwsrow = 1

    For Each iterfile In targetfolder.Files
        
        Interaction.DoEvents
        
        If Right(iterfile.Name, 4) = ".txt" Then

            targetwsrow = targetwsrow + 1

            Set txtreader = fso.OpenTextFile(iterfile)
            txtcontents = txtreader.ReadAll()
            txtreader.Close
            Set txtreader = Nothing

            Set regexmatches = CreateObject("Scripting.Dictionary")
            For Each regexiteration In regexsettings
                regexoperator.Pattern = regexsettings(regexiteration)
                Set regexmatchcol = regexoperator.Execute(txtcontents)
                If regexmatchcol.Count > 0 Then regexmatches.Add regexiteration, regexmatchcol(0).SubMatches(0)
            Next
            For i = 1 To targetws.UsedRange.Columns.Count
                targetws.Cells(targetwsrow, i) = regexmatches(targetws.Cells(1, i).Value)
            Next i
            Set regexmatches = Nothing

        End If
    
    Next iterfile
    
    Set regexoperator = Nothing
    Set targetws = Nothing
    Set targetfolder = Nothing
    Set fso = Nothing
    Set regexsettings = Nothing

End Sub

Open in new window

Avatar of bluue s

ASKER

@ Hakan, thanks for the code.
But just don't understand why it doesn't work after i have changed the directory and also the code

when i mean code, for example :
the generic text i gave was Maximum Category but in actual text is "Maximum Products" so I just changed to

regexsettings.Add "Maximum Products", "\bMaximum Products\s*([^\n\r]*)[\n\r]" 

Open in new window


accordingly. This concept applies the same for the rest of the text.

Nothing happens when i click on the macro.

What could have gone wrong ??
I used a dictionary object to store field name & regex patterns like @aikimark did.
Here in my solution, first parameter, the Key, holds the field names in your Excel file. Second parameter, the Item, holds the RegEx patterns to find corresponding field values in txt file.
I am actually here to guide you by answering your questions. But here I spent more time to explain than you spent for understanding.
If you want to learn VBA, please search google or EE for some tutorials.
@bluue

I'm still waiting for you to install/use Notepad++

I'm still waiting for you to evaluate and install/use some PDF text extraction utility.

==================
@Hakan
I spent more time to explain than you spent for understanding
After enough questions, you'll get a feel for this.
Avatar of bluue s

ASKER

@Hakan

I used a dictionary object to store field name & regex patterns like @aikimark did.
Here in my solution, first parameter, the Key, holds the field names in your Excel file. Second parameter, the Item, holds the RegEx patterns to find corresponding field values in txt file.
I am actually here to guide you by answering your questions. But here I spent more time to explain than you spent for understanding.
If you want to learn VBA, please search google or EE for some tutorials.

I am just trying to find out why the 2nd code doesn't do anything as i thought i might have done something wrong.
i would love to learn VBA but i have no time, coping with 2 young kids and work. Yes i will go through proper training and research tutorials than to ask here for sure. Thks for the time.
Avatar of bluue s

ASKER

@Aikimark
I'm still waiting for you to install/use Notepad++

I'm still waiting for you to evaluate and install/use some PDF text extraction utility.

It's easier said than done, let me find time as i am toggling between 2 pc - 1 is mine and 1 is company.
i can only install new software in my own pc.
Avatar of bluue s

ASKER

Thanks Ejgil for your great efforts - you know exactly what I wanted at once without much clarification, I didn't expect the macro to work until i re-try your macro again, although company data and delivery address is overlapping but the bulk of the data is in so it is of very great help ! The ability to choose the folder for the text files is a big bonus as I don't have to change the directory in the macro.
*NOTE: the file that you posted on 19 March is the solution, i accidentally also marked the one on 17 march which is not correct.

Thanks Hakan for the code.
for those that are less than 109 lines, i just retweaked it and change the line number for each field.
just paste as values in excel file to know the row number - just a quick workaround for now.

Thanks aikimark for the tips of notepad ++ and pdf utility.
In the sample file, the text "ABCDE FGH IJKLM L.P. PTY(T)", in the same line as "Delivery Address" was used as "Company Data" and the text "COUNTRY NAME" was "Delivery Address", so I used that, expecting it made sense with the real data.
It will be easy to change if you want something else.
It could also be all the lines after "Delivery Address" in the txt file.