save elements of string in access table

I have some string output from a html page which has had the tags removed by a reg exp pattern

I would like to save the info like this
PAN EUROPEAN ST 1100 ABS [field1]
     
    ST1100AV    [field2]
    AUSTRIA      [field3]
    1997             [field4]
    SC26B           [field5]
    JH2SC26B*    [field6]
    VM100001     [field7]
    VM100401    [field8]


so:
example output of which i want to save model by model in a table in access:
Number of model(s) : 4
             
      
        
          
          
          
          
          
          
          
          
        
            
    
    
    PAN EUROPEAN ST 1100 ABS
      
    ST1100AV
    AUSTRIA
    1997
    SC26B 
    JH2SC26B*
    VM100001
    VM100401
  
      
    
    
    PAN EUROPEAN ST 1100 ABS
      
    ST1100AV
    UNITED KINGDOM
    1997
    SC26B 
    JH2SC26B*
    VM100001
    VM100635
  
      
    
    
    PAN EUROPEAN ST 1100 ABS
      
    ST1100AV
    EUROPE
    1997
    SC26B 
    JH2SC26B*
    VM100001
    VM100624
  
      
    
    
    PAN EUROPEAN ST 1100 ABS
      
    ST1100AV
    ITALY
    1997
    SC26B 
    JH2SC26B*
    VM100006
    VM100445

Open in new window


reg exp function
Function stripHTML(strHTML)

'Strips the HTML tags from strHTML
   Dim oRE As Object
  Dim objRegExp, strOutput
  Set oRE = CreateObject("VBScript.RegExp")

    
  oRE.IgnoreCase = True
  oRE.Global = True
  oRE.Pattern = "<(.|\n)+?>"

  'Replace all HTML tag matches with the empty string
  strOutput = oRE.Replace(strHTML, "")

  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  
  stripHTML = Trim(strOutput)    'Return the value of strOutput

  Set oRE = Nothing
End Function

Open in new window

PeterBaileyUkAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Rey Obrero (Capricorn1)Commented:
first let us try if we can create a text file from the output of the function  stripHTML with this codes


sub createHtmlTextFile()
dim fs as object, f
set fs=createoblect("scripting.filesystemobject")
set f =fs.createtextfile("c:\htmlfile.txt", true)
      f.writeline(stripHTML(strHTML))
      f.close
end sub

if that is successful, you can open and read the htmltext.txt line by line and append to a table


post back the result.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
PeterBaileyUkAuthor Commented:
ok i passed the text:   createHtmlTextFile (x)

output as attached
htmlfile.txt
0
Robert ShermanOwnerCommented:
You should probably reference the earlier question, because the original format you were getting the data in may end up being better suited to being processed line-by-line, as it contains some structure to work with rather than just relying on line counts and spacing as you would with the stripped-down version.

EDIT: Further clarification, the benefit to the data as it was in your earlier question would be the tags giving you some idea of where you are in the file...  where you get to a <tr> you know you're at the start of a new record, <td> would be the next field, etc..   You'd still need to do some text processing to get to the actual data, but having the tags could save you from things like stray line-feed characters throwing fields or records off by one, etc...
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

PeterBaileyUkAuthor Commented:
the xml was with errors, so i gave up on using the dom model as i couldnt parse anything.
0
PeterBaileyUkAuthor Commented:
I wonder if I can add a comma at the end of each line to avoid counting
0
Robert ShermanOwnerCommented:
Perhaps in the code where you strip out the HTML tags, prior to that regexp just do a regular text replace of <tr> to something symbolic of a new record and <td> to something to represent field boundaries?   Something that wouldn't occur in the data, like {record} and {field}...  

(EDIT: after you make those replacements, the regex would then strip out the rest of the tags as it did before..)

At least that way, as you are going through the file, you can read lines in until you hit {record} and then you switch into
"reading data into a record" mode.   You might want to change </tr> into {/record} as well..  

This might seem like overkill, but you'll be glad you did it this way later when you are trying to troubleshoot why a set of data with a lot of records is behaving strangely based on something you didn't account for happening in the data.
0
Robert ShermanOwnerCommented:
I'll just throw in one more quick thing which I didn't think to ask on the earlier question...  is this something you are hoping to run as a "one-off" to just bring in a bulk of data, or will this be something that runs regularly?   If it lands closer to the one-off side, then you might not need to get as fancy as I suggested.  

Of course, most of the "one-off" things I have been asked to do in the past have had a strange way of being needed again.. and again.. :D
0
PeterBaileyUkAuthor Commented:
Its a one off but the scrape is quite large, ive just added the word boundary by replacing those tags as you suggested, ive attached the output file which is i guess what you expected.
0
PeterBaileyUkAuthor Commented:
I tried getting rid of the extra spaces too but they wont disappear either
0
Rey Obrero (Capricorn1)Commented:
<I tried getting rid of the extra spaces too but they wont disappear either >

because they are not spaces but, Tab

you can eliminate them with the replace() function

replace(strLine,chr(9),"")

to remove trailing and leading spaces

trim(replace(strLine,chr(9),"") )


were you able to read the text file line by line?
0
PeterBaileyUkAuthor Commented:
havnt got that far yet, i will just get rid of the tbs
0
PeterBaileyUkAuthor Commented:
interestingly with this sHTML = Replace(strLine, Chr(9), "") applied it results in sHTML =""
0
PeterBaileyUkAuthor Commented:
its ok i found out why
0
PeterBaileyUkAuthor Commented:
its looking healthier now
htmlfile.txt
0
PeterBaileyUkAuthor Commented:
as you can see I added also the commas to separate the field data, its very close. the logic of :
anything to the left of a comma within a boundary would pick them out
0
Rey Obrero (Capricorn1)Commented:
apply the trim() function like what i did above to make the text file cleaner

trim(replace(strLine,chr(9),"") )
0
Robert ShermanOwnerCommented:
In order to clean up the excess spaces with the trim function, as Rey has suggested, you will need to read the file back in and trim each line before writing it back out to a new file.  

Set inFile = fs.OpenTextFile("c:\htmlfile.txt", ForReading)
Set outFile = fs.CreateTextFile("c:\htmlfile_trim", ForWriting)
Do while inFile.AtEndOfStream = False
  strLine = inFile.ReadLine
  outFile.WriteLine(trim(strLine)
Loop
inFile.close
outFile.close

Open in new window


From there, getting this into a table would consist of something along the lines of

Set inFile = fs.OpenTextFile("c:\htmlfile_trim.txt", ForReading)

Do while inFile.AtEndOfStream = False
  strLine = inFile.ReadLine
  if strLine = "Boundary" Then 
     ' add a new record to your table using a Recordset, etc..
     ' 

     strLine = inFile.ReadLine         ' blank line after boundary
     strLine = inFile.ReadLine         ' this should be our first field

     ' set record.fieldname = strLine

    strLine = inFile.ReadLine        ' second field, repeat accordingly...

    ' then, save the record to the table 

Loop

Open in new window


That's the general gist of it...   alot depends on the line numbering, if an extra line appears in the middle of a record, or a line is missing that record will have data in the wrong fields or be missing a field..  the "boundary" markers help at least limit such issues to impacting individual records and not the entire import.
0
PeterBaileyUkAuthor Commented:
I am getting error 91 here so something is not quite right syntax or declaration wise
Set inFile = fs.OpenTextFile("c:\htmlfile.txt", ForReading)
Set outFile = fs.CreateTextFile("c:\htmlfile_trim", ForWriting)
0
Robert ShermanOwnerCommented:
Code 91 usually caused by missing declaration or initialization of object variables..  The code I posted was meant to build on the earlier code, so these need to happen at some earlier point in the code:

dim fs as object, inFile, outFile
set fs=createobject("scripting.filesystemobject")

Open in new window


Oh, and I neglected to add an extension to the output filename, suggest adding a .txt extension to the output filename.  (But that wouldn't cause an error)
0
PeterBaileyUkAuthor Commented:
its bypassed that error now it says error 5 on the set
0
Robert ShermanOwnerCommented:
Try changing

Set outFile = fs.CreateTextFile("c:\htmlfile_trim", ForWriting)

to

Set outFile = fs.CreateTextFile("c:\htmlfile_trim.txt", True)
0
PeterBaileyUkAuthor Commented:
I made the minor txt correction

Sub CleanTextFile()
Dim fs As Object, inFile, outFile
Set fs = CreateObject("scripting.filesystemobject")

Set inFile = fs.OpenTextFile("n:\htmlfile.txt", ForReading)
Set outFile = fs.CreateTextFile("n:\htmlfile_trim.txt", ForWriting)
Do While inFile.AtEndOfStream = False
  strLine = inFile.ReadLine
  outFile.WriteLine (Trim(strLine))
Loop
inFile.Close
outFile.Close

End Sub
0
PeterBaileyUkAuthor Commented:
ok i found a reference that was missing attached is the output file, still looks odd though
0
PeterBaileyUkAuthor Commented:
0
Rey Obrero (Capricorn1)Commented:
post the codes you used to produce the text file.
might be better if you upload a copy of the db with the relevant objects.
0
PeterBaileyUkAuthor Commented:
cutdown attached, opening the form and clicking the button will start the process.
ee.mdb
0
PeterBaileyUkAuthor Commented:
it run the main routine then in there is the one called submit which calls the text routines.
0
Rey Obrero (Capricorn1)Commented:
@peter,

replace this line
sHTML = Replace(sHTML, "</td>", ",</td>")

with

sHTML = Replace(sHTML, "</td>", ",")

run the codes and see the outcome of the text file.
0
PeterBaileyUkAuthor Commented:
is it where its removed the right hand closing TD
0
Rey Obrero (Capricorn1)Commented:
yes, just replace it with a comma
0
PeterBaileyUkAuthor Commented:
its created a respectable txtfile now, are the spaces or whatever they are at the beginning going to be an issue bringing it back into access?
htmlfinish.txt
0
Rey Obrero (Capricorn1)Commented:
create these two subs

create a clean text file
Sub CleanHtmlText()
Dim fs As Object, inFile, outFile, strLine As String, strNewText
Set fs = CreateObject("scripting.filesystemobject")

Set inFile = fs.OpenTextFile(CurrentProject.Path & "\htmlfile.txt", ForReading)
Set outFile = fs.createtextfile(CurrentProject.Path & "\htmlfile_trim.txt", ForWriting)
Do While inFile.AtEndOfStream = False
    strLine = inFile.ReadLine
    strLine = Replace(strLine, "Boundary", "")
    If Len(strLine) > 0 Then
        strNewText = strNewText & strLine & vbCrLf
    End If
     
Loop
  outFile.Write strNewText

inFile.Close
outFile.Close

End Sub

Open in new window


append to table

Sub AppendToTable()
Dim s As String, j As Integer, strArr() As String, x
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("tblbailey")
Open CurrentProject.Path & "\htmlfile_trim.txt" For Input As #1
Do Until EOF(1)
    Line Input #1, s
    'remove the trailing comma
    s = Replace(s, ",", "")
    ReDim Preserve strArr(j)
    strArr(j) = s
    j = j + 1
    If j = 8 Then
        rs.AddNew
        rs!Field1 = strArr(0)
        rs!Field2 = strArr(1)
        rs!Field3 = strArr(2)
        rs!Field4 = strArr(3)
        rs!Field5 = strArr(4)
        rs!Field6 = strArr(5)
        rs!Field7 = strArr(6)
        rs!Field8 = strArr(7)
        rs.Update
        
         j = 0
    End If
  
Loop
Close #1

End Sub

Open in new window

0
PeterBaileyUkAuthor Commented:
its saving but with the boundary word removed its missing the fields. in the TblModelLookup table the first four rows are how it should be the rows after is how it actually saved.

uve attached a cutdown and the txt files
ee-.mdb
storefail.JPG
htmlfinish.zip
0
PeterBaileyUkAuthor Commented:
the remove odd takes away that odd decimal at the beginning and the odd first boundary word.

in this data the chassis number (vin) can be one of four models
vin  JH2SC26B2VM100323, id 4028
= PAN EUROPEAN ST 1100 ABS from the four countries
0
PeterBaileyUkAuthor Commented:
the vin and vehicl id are passed to the submit
0
Rey Obrero (Capricorn1)Commented:
read your original post...
0
PeterBaileyUkAuthor Commented:
Yes i see its my fault i hadnt specified for multiple records ok i will close the question as the appendto table is working perfectly.  any objection to me sharing the points between you and i think it was Richard?
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.