NVIT
asked on
Extract field headers and values between quotes
Note to admin: I have posted to VBScript topic before (I think). Why doesn't it work now? so, I put VBA, even though I want a VBS or VBScript *sigh*
Here is a sample of a text file with records.
I'd like to extract the fields and their values
So it becomes...
Here is a sample of a text file with records.
I'd like to extract the fields and their values
<ET FP="Public\ACAD\PROJECTS\1400\In\112315A\Pages from Coconut Island Infrastructure and Bldg Improvements As-Builts.pdf" SZ="392051" LB="2040856" AT="1448320810" CT="1448320810" MT="1448320810" ST="9" AR="0" XA="8192" IN="0" OST="0" IIMG="9406"/>
<ET FP="Public\ACAD\PROJECTS\1400\In\112316A" SZ="11366959" LB="2041241" AT="1479924188" CT="1479924186" MT="1479924188" ST="10" AR="0" XA="8192" IN="0" OST="0" IIMG="9407"/>
<ET FP="Public\ACAD\PROJECTS\1400\In\112316A\SCHEME B.pdf" SZ="11366959" LB="2041242" AT="1479924188" CT="1479924188" MT="1479924188" ST="9" AR="0" XA="8192" IN="0" OST="0" IIMG="9408"/>
<ET FP="Public\ACAD\PROJECTS\1400\In\112316B" SZ="4968231" LB="2052356" AT="1479942325" CT="1479942322" MT="1479942325" ST="10" AR="0" XA="8192" IN="0" OST="0" IIMG="9409"/>
<ET FP="Public\ACAD\PROJECTS\1400\In\112316B\Nohea-PHASE 2 SITE PLAN.PDF" SZ="1539304" LB="2052357" AT="1479942325" CT="1479942325" MT="1479942325" ST="9" AR="0" XA="8192" IN="0" OST="0" IIMG="9410"/>
<ET FP="Public\ACAD\PROJECTS\1400\In\112316B\Project Location.pdf" SZ="3428927" LB="2053863" AT="1479942325" CT="1479942325" MT="1479942325" ST="9" AR="0" XA="8192" IN="0" OST="0" IIMG="9411"/>
So it becomes...
Public\ACAD\PROJECTS\1400\In\112315A\Pages from Coconut Island Infrastructure and Bldg Improvements As-Builts.pdf,392051,2040856,1448320810,1448320810,1448320810,9,0,8192,0,0,9406
Public\ACAD\PROJECTS\1400\In\112316A,11366959,2041241,1479924188,1479924186,1479924188,10,0,8192,0,0,9407
Public\ACAD\PROJECTS\1400\In\112316A\SCHEME B.pdf,11366959,2041242,1479924188,1479924188,1479924188,9,0,8192,0,0,9408
Public\ACAD\PROJECTS\1400\In\112316B,4968231,2052356,1479942325,1479942322,1479942325,10,0,8192,0,0,9409
Public\ACAD\PROJECTS\1400\In\112316B\Nohea-PHASE 2 SITE PLAN.PDF,1539304,2052357,1479942325,1479942325,1479942325,9,0,8192,0,0,9410
Public\ACAD\PROJECTS\1400\In\112316B\Project Location.pdf,3428927,2053863,1479942325,1479942325,1479942325,9,0,8192,0,0,9411
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Just to clarify, the code I posted earlier produced an output file exactly matching the sample desired output that was posted by NVIT.
~bp
~bp
ASKER
@Aikimark and Bill: Not sure how I should grade this one because I did say I wanted the header, which Bill's kind of provides but Aiki doesn't. I was hoping for a cleaner header line, i.e. just the field names and commas. I can live with that. On the other hand, although I didn't post a sample, Bill's missed the top record, which Aiki's provides. So, I'll divide it equally. To show, here's the top 2 lines:
Bill's:
" SZ=, LB=, AT=, CT=, MT=, ST=, AR=, XA=, IN=, OST=, IIMG=
Public,813139942623,66,146 1377486,12 17736680,1 461377486, 10,0,8192, 0,0,1
Aiki's:
,813139942623,64,0,0,0,10, 16,8192,0, 0,0
Public,813139942623,66,146 1377486,12 17736680,1 461377486, 10,0,8192, 0,0,1
@JesterToo: Yours was close but the 2nd-to last field missed the comma. Thus making the last field 07 instead of 0,7
Bill's:
" SZ=, LB=, AT=, CT=, MT=, ST=, AR=, XA=, IN=, OST=, IIMG=
Public,813139942623,66,146
Aiki's:
,813139942623,64,0,0,0,10,
Public,813139942623,66,146
@JesterToo: Yours was close but the 2nd-to last field missed the comma. Thus making the last field 07 instead of 0,7
NVIT,
Curios, where did you request a header record, I don't see it which was why I didn't plan on one.
~bp
Curios, where did you request a header record, I don't see it which was why I didn't plan on one.
~bp
In your question text, you wrote:
Since this question is now closed, I recommend that you open a new question and ask about a header line. It would be helpful to those experts to have a link to this question thread.
I'd like to extract the fields and their valuesAs Bill Prew noted in his most recent comment, you neither asked for a header nor showed any header in your output example.
Since this question is now closed, I recommend that you open a new question and ask about a header line. It would be helpful to those experts to have a link to this question thread.
Did the input text you posted accurately represent the actual input data?
ASKER
I was not clear. My apologies...
Is field and header the same? I stated in the title but not in the body.
The fields / headers would be on the left of the values. e.g.
i.e.
Is field and header the same? I stated in the title but not in the body.
The fields / headers would be on the left of the values. e.g.
FP="Public\ACAD\PROJECTS\1400\In\112316A" SZ="11366959" LB=
i.e.
FP, SZ, LB, ...
Give this try:
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Const TristateTrue = -1
Const TristateFalse = 0
Const TristateUseDefault = -2
strInFile = "B:\ee\EE28990550\in.txt"
strOutFile = "B:\ee\EE28990550\out.txt"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInFile = objFSO.OpenTextFile(strInFile, ForReading, False, TriStateUseDefault)
Set objOutFile = objFSO.OpenTextFile(strOutFile, ForWriting, True, TriStateUseDefault)
Set objRegex = New RegExp
objRegex.Global = True
blnHeader=True
Do Until objInFile.AtEndOfStream
strLine = objInFile.ReadLine
If blnHeader = True Then
blnHeader = False
strTemp = ""
objRegex.Pattern = " ([a-zA-Z0-9]+?)="
Set objMatches = objRegex.Execute(strLine)
For Each objMatch In objMatches
If strTemp = "" Then
strTemp = objMatch.SubMatches(0)
Else
strTemp = strTemp & "," & objMatch.SubMatches(0)
End If
Next
objOutFile.WriteLine strTemp
End If
strTemp = ""
objRegex.Pattern = "\""(.+?)\"""
Set objMatches = objRegex.Execute(strLine)
For Each objMatch In objMatches
If strTemp = "" Then
strTemp = objMatch.SubMatches(0)
Else
strTemp = strTemp & "," & objMatch.SubMatches(0)
End If
Next
objOutFile.WriteLine strTemp
Loop
objInFile.Close
objOutFile.Close
~bp
Open in new window
Note: It might even be possible to do something like this for the entire file, but my initial regex tests weren't promising.