Link to home
Start Free TrialLog in
Avatar of Zihif
ZihifFlag for United States of America

asked on

OpenAsTextStream a .doc file in VBA (Excel)

Hey all!

What I need to do is to open a .doc file, loop through each line, extract specific information and save that to a .txt file.  I have about 5,000-6,000 folders that I need to loop through, each containing anywhere from 5 to 200 files that I need this sub to run through.  Which means, this isn't a process that can be managed manually.

Below is the code that I have... it works Perfectly when trying to open .txt files... However, my current need is to have this work on .doc files as well.  I've tried .OpenAsTextStream(1, -1) and (1, 0), however neither one of those work.  I've tried Open objItem for Input as #1, but that doesn't work... Any other options here?  Any eclectic solutions out there?

Sub BID_SUMMARY_CREATION()
Dim FSO1, FSO2, FSO3, FSO4
Dim objItem As Object
Dim parFolder
Dim objFolder
Dim WLine As String
Dim VNUM As Long
Dim EndUser As String
Dim Dumpfile, RF, A1, A2
Dim Default1 As Integer
Dim BID As String
Dim ZDrive As String
Dim WriteTrue As String
Dim ExtID As String

ZDrive = "Z:\"
Set FSO2 = CreateObject("scripting.filesystemobject")
Set FSO3 = CreateObject("scripting.filesystemobject")
Set FSO4 = CreateObject("scripting.filesystemobject")

For Default1 = 2 To 5698
    BID = Worksheets("sheet1").Cells(Default1, 1).Value
    Debug.Print "BID: " & BID
    objFolder = Dir(ZDrive & BID & "*", vbDirectory)
    Debug.Print "objFolder: " & objFolder
    If Len(objFolder) >= 4 Then
    Set parFolder = FSO2.getfolder(ZDrive & objFolder)
    Set FSO1 = CreateObject("scripting.filesystemobject")
    Set A1 = FSO1.createtextfile(parFolder & "\Bid Summary.txt", True)
        For Each objItem In parFolder.Files
                Debug.Print "parFolder: " & parFolder
                Debug.Print "objItem: " & objItem
                Set Dumpfile = FSO3.getfile(objItem)
                Set RF = Dumpfile.openastextstream(1, -2)
                While Not RF.atendofstream
                    WLine = RF.readline
                    Debug.Print "WLine: " & WLine
                    If Left(WLine, 7) = "Version" Then
                        VNUM = Trim(Right(WLine, 4))
                        Debug.Print "VNUM: " & VNUM
                    End If
                    If Left(WLine, 8) = "Customer" Then
                        EndUser = Trim(Right(WLine, (Len(WLine) - 9)))
                        Debug.Print "EndUser: " & EndUser
                    End If
                    If Trim(WLine) = "Transaction Summary Table" Then
                        WriteTrue = "True"
                    ElseIf Trim(WLine) = "End Transaction Summary Table" Then
                        WriteTrue = "False"
                    End If
                    If WriteTrue = "True" Then
                        If Left(WLine, 4) <> "    " Then
                        If Trim(WLine) <> "" Then
                            A1.writeline WLine & "      V " & VNUM
                            Debug.Print "Wrote Line to file."
                        End If
                        End If
                    ElseIf Trim(WLine) = "End Transaction Summary Table" Then
                        A1.writeline WLine & "      V " & VNUM
                        Debug.Print "Wrote Line to file."
                    End If
                Wend
                RF.Close
        Next objItem
    Else
        Worksheets("sheet1").Cells(Default1, 2).Value = "No Folder Found!"
    End If
    VNUM = Empty
    EndUser = ""
    BID = ""
    WriteTrue = ""
    A1.Close
Next
End Sub

'=================
Thanks in advance for looking at this!
Zihif

Avatar of nmcdermaid
nmcdermaid

A DOC file is not a text file its a binary file.

If you want to export lines of text from a DOC file, write a Word VBA Macro.
Avatar of Zihif

ASKER

nmcdermaid

Well, if it's a binary file then I should be able to Open objItem for Binary as #1 and read it from there?  I've tried this method and it still just returns encrypted lines.

Any other thoughts or suggestions?

Zihif
Its saved in whatever binary file format that Microsoft Word is in. If you can get the spec of the .DOC format from Microsoft then you can happily open it as binary, parse it using the spec, then get the text out.

However even if you were to crack the format, you would find that it would most likely change in the next version of Word. That's why a Word macro is a good idea... you can just paste the macro into the next version of Word and it'll work.

If you want you can (from your existing code) use Word to save the current doc as a text file then process that text file, then go on to the next doc

If you are interested then I can whip up some code.





Avatar of Zihif

ASKER

nmcdermaid

I wrote the code in Word, and I'm still running into the same problem.  What method should I use to open the .doc files so I can read'em?

Zihif
ASKER CERTIFIED SOLUTION
Avatar of nmcdermaid
nmcdermaid

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Zihif

ASKER

Holy Crap!!

Sorry I haven't responded in awhile... I got busy going here, doing this, blah blah blah...

nmcdermaid

Thanks a ton for your help on this.

Zihif