Parsing HTML code

Hi - has anyone had experience taking HTML code, searching it for all possible graphics paths, changing those paths, and then resaving the HTML?
Further explanation...
    I am creating a "packager" so that the "graphic guys and gals" can build an HTML page using graphics from several sources (i.e. paths) across our network.
    When the HTML file is put into "the packager", it searches through the HTML code, finds all external references, copies those files locally, and strips the paths off within the HTML code. In other words, the HTML would run properly if all the graphic files were in the same directory as the HTML file (which is what I want).
    The program then makes a cab file of this HTML file and all the graphic files.

* What is this used for? Well, we send all types of graphic formats to be displayed on remote advertising machines, but these are all single files (mostly .swf files). We need to keep this methodology (1 file), so that's why the "packaging".
I've got the interface done - they can drag and drop files or pick them from a file chooser.

I've got the cab file maker done.

I can probably search the HTML text for the obvious "src=" and "BACKGROUND=" and ".jpg" and ".gif" etc., then copy the files and strip out the paths, but I just wondered if anyone has done this kind of thing before?


thanks in advance

Carl Tawn (Systems and Integration Developer) Commented:
I've done a couple of similar things. One was XML based, converting absolute paths to relative paths before packaging the document and uploading it to the web. I've also done a fair bit of tag stripping and such with HTML pages.

I haven't come across a simple way of doing it. It's usually just a case of reading the page line by line, looking for the start and end of tags, and looking for "src=", "background=" and the rest.

Probably not what you wanted to hear :o)

bczingo (Author) Commented:
Not all I wanted but I appreciate the comment Carl
- what other tags are there that would point to external files?
Carl Tawn (Systems and Integration Developer) Commented:
Not many really.

src and background, obviously, for images. Maybe <object> tags and <param> tags if you embed Flash movies or anything like that. Possibly "includes" if you use external stylesheets or JS files.

Also background-image (I think) if you use styles rather than just the "background" attribute.
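
For anyone reading along, the tags and attributes listed above can be collected with a real HTML parser rather than a line-by-line text search. This is only a rough sketch in Python (not the VB used elsewhere in this thread), built on the standard-library `html.parser` module. The names `PATH_ATTRS`, `ReferenceCollector` and `find_references` are made up for illustration, and the attribute list is an assumption, not a complete inventory. `<param>` values are not handled.

```python
import re
from html.parser import HTMLParser

# Attributes that commonly hold a path to an external file. This list is an
# illustrative assumption based on the tags mentioned above, not exhaustive.
PATH_ATTRS = {"src", "background", "href", "data"}

# Pulls the path out of CSS such as: background-image: url('x/y.png')
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)", re.IGNORECASE)


class ReferenceCollector(HTMLParser):
    """Collects (tag, attribute, value) triples that look like file references."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        for name, value in attrs:
            if not value:
                continue
            if name in PATH_ATTRS and tag != "a":
                # Ordinary hyperlinks (<a href>) are links, not packaged files.
                self.references.append((tag, name, value))
            elif name == "style":
                for url in CSS_URL.findall(value):
                    self.references.append((tag, "style", url))


def find_references(html_text):
    """Return every file reference found in html_text, in document order."""
    collector = ReferenceCollector()
    collector.feed(html_text)
    return collector.references
```

Because the parser lower-cases names, `SRC=`, `src=` and `Src=` are all treated alike, which is the same effect the text-compare search is after.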
Do you have an example HTML file (or at least part of one)?

bczingo (Author) Commented:
example wouldn't help really - could be anything at all - pick a page - any page :)

thanks again carl

(I thought I had posted this comment before. Guess I didn't hit submit.)

Ok, could you give an example of a line of HTML and how it should be changed? I'm just not entirely sure how you want to alter the path is all :-)

Like... <IMG SRC=""> changes to <IMG SRC="images/file.jpg">?

Sorry about the confusion :/

bczingo (Author) Commented:
well I have it done :) $$$$$

Not foolproof, but it gets the job done splendidly.
- If anyone's interested in the rest of the code, give me a holler.

Here's just the parse stuff.
As you can see, if there are other tags to look for besides
    " src"   " background"  " background-image"
 then just put in another Case statement.

Private Type theFilesInStr
    startLoc As Long
    endLoc As Long
    FileName As String
End Type
Public tF() As theFilesInStr
Private Const WEBSPACE = "%20"
Private theString As String
Private backupString As String

Public Function parseFindAttachments(sentfile As String) As Boolean
    Dim t As Double
    Dim namePart As String
    Dim fileStartLoc As Long
    Dim fileLen As Long
    Dim strLoc As Long
    Dim equalsLoc As Long
    Dim quoteLoc1 As Long
    Dim quoteLoc2 As Long
    Dim lastStrLoc As Long
    Dim startLook As Long
    Dim numFiles As Long
    Dim whatLookMain As String
    Dim whatLook As String
    Dim fsoFileSys As New FileSystemObject
    Dim theFile As String
    Dim whatCount As Long

    Dim f
    Set f = fsoFileSys.OpenTextFile(sentfile, ForReading, False)
    theString = f.ReadAll
    f.Close
    backupString = theString
    'the string has the file!!
    numFiles = -1
    whatCount = 0
    'outer loop: one pass per attribute name
    Do
        whatCount = whatCount + 1
        Select Case whatCount
            Case 1
                whatLookMain = " src"
            Case 2
                whatLookMain = " background"
            Case 3
                whatLookMain = " background-image"
        End Select
        startLook = 1
        strLoc = 0
        'inner loop: find each quoted value that follows this attribute name
        Do
            equalsLoc = 0
            quoteLoc1 = 0
            quoteLoc2 = 0
            whatLook = whatLookMain
            strLoc = InStr(startLook, theString, whatLook, vbTextCompare)
            If strLoc > 0 Then
                startLook = strLoc + 1
                whatLook = "="
                equalsLoc = InStr(startLook, theString, whatLook, vbTextCompare)
                If equalsLoc > 0 Then
                    startLook = equalsLoc + 1
                    whatLook = Chr(34)                     ' quote sign
                    quoteLoc1 = InStr(startLook, theString, whatLook, vbTextCompare)
                    If quoteLoc1 > 0 Then
                        startLook = quoteLoc1 + 1
                        whatLook = Chr(34)
                        quoteLoc2 = InStr(startLook, theString, whatLook, vbTextCompare)
                        If quoteLoc2 > 0 Then
                            numFiles = numFiles + 1
                            fileStartLoc = quoteLoc1 + 1
                            fileLen = (quoteLoc2 - quoteLoc1) - 1
                            theFile = Mid(theString, fileStartLoc, fileLen)
                            ReDim Preserve tF(numFiles)
                            tF(numFiles).startLoc = fileStartLoc
                            tF(numFiles).endLoc = fileLen
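                            'fs.getNamePart and replaceChars_TSB are not defined in this snippet
                            namePart = fs.getNamePart(theFile)
                            If InStr(namePart, WEBSPACE) > 0 Then
                                'turn %20 back into a real space in the file name
                                tF(numFiles).FileName = replaceChars_TSB(namePart, WEBSPACE, " ")
                            Else
                                tF(numFiles).FileName = namePart
                            End If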
                         '   Debug.Print theFile
                        '    Debug.Print tF(numFiles).FileName
                            startLook = quoteLoc2 + 1
                        End If
                    End If
                End If
            End If
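        Loop While quoteLoc2 > 0
    'stop after the third Case; a fourth pass would rescan " background-image"
    Loop While whatCount < 3

    'True when at least one quoted reference was found
    parseFindAttachments = (numFiles >= 0)
End Function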
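
The function above only records where each reference sits. The "strip the paths off" step from the original question could look roughly like the following. This is a hedged sketch in Python rather than VB, it is not the rest of the author's code, and the names `ATTR_RE` and `strip_paths` are invented for illustration. Like the VB code, it assumes double-quoted attribute values and only handles `src` and `background`.

```python
import re
from urllib.parse import unquote

# Matches src="..." or background="..." with a double-quoted value,
# ignoring case (an assumption that mirrors the VB search above).
ATTR_RE = re.compile(r'(\b(?:src|background)\s*=\s*")([^"]*)(")', re.IGNORECASE)


def strip_paths(html_text):
    """Return (rewritten_html, copies).

    rewritten_html has every matched path cut down to its last segment.
    copies is a list of (original path, local file name) pairs for the
    file-copy step, with %20-style escapes decoded in the local name.
    """
    copies = []

    def _replace(match):
        original = match.group(2)
        # Keep only what follows the last forward slash or backslash.
        leaf = re.split(r"[\\/]", original)[-1]
        copies.append((original, unquote(leaf)))
        return match.group(1) + leaf + match.group(3)

    return ATTR_RE.sub(_replace, html_text), copies
```

The reference left in the HTML keeps its `%20` escapes, while the decoded name is what the copied file on disk would be called, which is the same split the `WEBSPACE` replacement makes.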

Glad you got it solved :)

I don't think my comment should be the accepted answer though...

bczingo (Author) Commented:
Probably not :)
Don't know how to change it though.
