Solved

parsing HTML code

Posted on 2004-08-26
Last Modified: 2010-05-02
Hi - has anyone had experience taking HTML code, searching it for all possible graphics paths, changing those paths, and then re-saving the HTML?
Further explanation...
    I am creating a "packager" so that the "graphic guys and gals" can create an HTML page using graphics from several sources (i.e. paths) across our network.
    When the HTML file is put in "the packager", it would search through the HTML code, find all external references, copy those files locally, and strip the paths off within the HTML code. In other words, the HTML would run properly if all the graphic files were in the same directory as the HTML file (which is what I want).
    The program then makes a cab file of this HTML file and all the graphic files.

* What is this used for? Well, we send all types of graphic formats to be displayed on remote advertising machines, but these are all single files (mostly .swf files). We need to keep this methodology (1 file), so that's why the "packaging".

I've got the interface done - they can drag and drop files or pick them from a file chooser.

I've got the cab file maker done.

I can probably search the HTML text for the obvious "src=" and "BACKGROUND=" and ".jpg" and ".gif" etc., then copy the files and strip out the paths, but I just wondered if anyone has done this kind of thing before?

(OR IS THERE A WAY THAT A WEB PAGE CAN BE PACKAGED WITH ALL GRAPHICS CONTAINED WITHIN?) Obviously I'm not a web page programmer :)

Thanks in advance


Question by:bczingo
9 Comments
 
LVL 52

Expert Comment

by:Carl Tawn
ID: 11907971
I've done a couple of similar things. One was XML based, converting absolute paths to relative paths before packaging the document and uploading it to the web. I've also done a fair bit of tag stripping and such with HTML pages.

I haven't come across a simple way of doing it. It's usually just a case of reading the page line by line, looking for the start and end of tags, and looking for "src=", "background=" and the rest.

Probably not what you wanted to hear :o)

 
LVL 3

Author Comment

by:bczingo
ID: 11908180
Not all I wanted, but I appreciate the comment, Carl.
What other tags are there that would point to external files?
 
LVL 52

Expert Comment

by:Carl Tawn
ID: 11910353
Not many, really.

src and background, obviously, for images. Maybe <object> and <param> tags if you embed Flash movies or anything like that. Possibly "includes" if you use external stylesheets or JS files.

Also background-image (I think), if you use styles rather than just the "background" attribute.
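
One thing worth keeping in mind for the style case: in CSS, background-image is normally written as background-image: url(...) rather than as an attribute followed by "=". The following is only a rough VB6 sketch of how those url(...) targets could be collected; FindCssUrls is a made-up name for illustration, not something from this thread, and it assumes any quotes around the target are properly matched.

```vb
' Hypothetical helper (not from the thread): collects the targets of
' CSS url(...) references found anywhere in the HTML text.
Public Function FindCssUrls(ByVal html As String) As Collection
    Dim results As New Collection
    Dim pos As Long
    Dim closePos As Long
    Dim target As String

    pos = InStr(1, html, "url(", vbTextCompare)
    Do While pos > 0
        closePos = InStr(pos + 4, html, ")")
        If closePos = 0 Then Exit Do            ' unterminated url( : stop scanning
        target = Trim$(Mid$(html, pos + 4, closePos - pos - 4))
        ' Drop optional surrounding quotes: url("a.gif") or url('a.gif')
        If Len(target) >= 2 Then
            If Left$(target, 1) = Chr$(34) Or Left$(target, 1) = "'" Then
                target = Mid$(target, 2, Len(target) - 2)
            End If
        End If
        If Len(target) > 0 Then results.Add target
        pos = InStr(closePos + 1, html, "url(", vbTextCompare)
    Loop

    Set FindCssUrls = results
End Function
```

For example, passing in <body style="background-image: url('img/bg.gif')"> should give a collection holding the single item img/bg.gif.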
 
LVL 7

Expert Comment

by:Burbble
ID: 11919207
Do you have an example HTML file (or at least part of one)?

-Burbble
 
LVL 3

Author Comment

by:bczingo
ID: 11931273
Burbble,
An example wouldn't really help - it could be anything at all. Pick a page - any page :)


Thanks again, Carl.

(I thought I had posted this comment before.) Guess I didn't hit submit.


 
LVL 7

Accepted Solution

by:
Burbble earned 250 total points
ID: 11931600
Ok, could you give an example of a line of HTML and how it should be changed? I'm just not entirely sure how you want to alter the path is all :-)

Like... <IMG SRC="http://www.mysite.com/images/file.jpg"> changes to <IMG SRC="images/file.jpg">?

Sorry about the confusion :/

-Burbble
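
For the "everything in the same directory" case described in the question, the change boils down to keeping only the file name from each reference. A minimal VB6 sketch of that step, assuming VB6's InStrRev and Replace are available; FileNameOnly is a hypothetical name used only for illustration:

```vb
' Hypothetical helper (not from the thread): returns the file name
' portion of a URL, UNC path or local path.
Public Function FileNameOnly(ByVal ref As String) As String
    Dim cutPos As Long
    Dim slashPos As Long

    ' Ignore any query string or fragment after the file name
    cutPos = InStr(ref, "?")
    If cutPos > 0 Then ref = Left$(ref, cutPos - 1)
    cutPos = InStr(ref, "#")
    If cutPos > 0 Then ref = Left$(ref, cutPos - 1)

    ' Find the last separator, whichever style the path uses
    slashPos = InStrRev(ref, "/")
    If InStrRev(ref, "\") > slashPos Then slashPos = InStrRev(ref, "\")

    ' Everything after the last separator, with %20 decoded to a space
    FileNameOnly = Replace(Mid$(ref, slashPos + 1), "%20", " ")
End Function
```

With this, FileNameOnly("http://www.mysite.com/images/file.jpg") gives file.jpg, and a network path such as \\server\share\art\my%20logo.gif gives my logo.gif (the server and share names here are made up).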
 
LVL 3

Author Comment

by:bczingo
ID: 11947689
Well, I have it done :) $$$$$

Not foolproof, but it gets the job done splendidly.
- If anyone's interested in the rest of the code, give me a holler.

Here's just the parse stuff.
As you can see, if there are other tags to look for other than just
    " src"   " background"   " background-image"
then just put in another Case statement.
           

' Needs a project reference to "Microsoft Scripting Runtime" for FileSystemObject.
' fs.getNamePart and replaceChars_TSB are helpers from the rest of the project (not shown).
Private Type theFilesInStr
    startLoc As Long      ' position in theString of the first character of the path
    endLoc As Long        ' note: holds the length of the path, not its end position
    FileName As String    ' file name only, with %20 turned back into spaces
End Type
Public tF() As theFilesInStr
Private Const WEBSPACE = "%20"
Private Const NUM_LOOKS As Long = 3     ' number of Case entries below
Private theString As String
Private backupString As String

' Scans sentfile for quoted attribute values and records each one in tF().
' Returns True if at least one reference was found.
' It takes the first "=" and the first pair of double quotes after the
' attribute name, so unquoted or single-quoted values are not handled.
Public Function parseFindAttachments(sentfile As String) As Boolean
    Dim namePart As String
    Dim fileStartLoc As Long
    Dim fileLen As Long
    Dim strLoc As Long
    Dim equalsLoc As Long
    Dim quoteLoc1 As Long
    Dim quoteLoc2 As Long
    Dim startLook As Long
    Dim numFiles As Long
    Dim whatLookMain As String
    Dim whatLook As String
    Dim fsoFileSys As New FileSystemObject
    Dim theFile As String
    Dim whatCount As Long
    Dim f As TextStream

    Set f = fsoFileSys.OpenTextFile(sentfile, ForReading, False)
    theString = f.ReadAll
    backupString = theString
    'the string has the file!!
    numFiles = -1
    whatCount = 0
    Do
        whatCount = whatCount + 1
        Select Case whatCount
            Case 1
                whatLookMain = " src"
            Case 2
                whatLookMain = " background"
            Case 3
                whatLookMain = " background-image"
        End Select
        startLook = 1
        strLoc = 0
        Do
            equalsLoc = 0
            quoteLoc1 = 0
            quoteLoc2 = 0
            whatLook = whatLookMain
            strLoc = InStr(startLook, theString, whatLook, vbTextCompare)
            If strLoc > 0 Then
                startLook = strLoc + 1
                whatLook = "="
                equalsLoc = InStr(startLook, theString, whatLook, vbTextCompare)
                If equalsLoc > 0 Then
                    startLook = equalsLoc + 1
                    whatLook = Chr(34)                     ' quote sign
                    quoteLoc1 = InStr(startLook, theString, whatLook, vbTextCompare)
                    If quoteLoc1 > 0 Then
                        startLook = quoteLoc1 + 1
                        whatLook = Chr(34)
                        quoteLoc2 = InStr(startLook, theString, whatLook, vbTextCompare)
                        If quoteLoc2 > 0 Then
                            numFiles = numFiles + 1
                            fileStartLoc = quoteLoc1 + 1
                            fileLen = (quoteLoc2 - quoteLoc1) - 1
                            theFile = Mid(theString, fileStartLoc, fileLen)
                            ReDim Preserve tF(numFiles)
                            tF(numFiles).startLoc = fileStartLoc
                            tF(numFiles).endLoc = fileLen
                            namePart = fs.getNamePart(theFile)
                            If InStr(namePart, WEBSPACE) > 0 Then
                                tF(numFiles).FileName = replaceChars_TSB(namePart, WEBSPACE, " ")
                            Else
                                tF(numFiles).FileName = namePart
                            End If
                            ' Debug.Print theFile
                            ' Debug.Print tF(numFiles).FileName
                            startLook = quoteLoc2 + 1
                        End If
                    End If
                End If
            End If
        Loop While quoteLoc2 > 0
    Loop While whatCount < NUM_LOOKS    ' one pass per Case entry; a 4th pass would repeat Case 3

    f.Close
    parseFindAttachments = (numFiles >= 0)

End Function
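
The posted code covers only the finding step. As a rough illustration of the rewriting step (stripping the path inside the HTML text itself), here is a hedged VB6 sketch. StripAttributePaths is a hypothetical name and this is not the rest of bczingo's code; it handles only double-quoted values of a single attribute, and it requires that nothing but spaces and "=" sit between the attribute name and the opening quote.

```vb
' Hypothetical helper (not the author's code): rewrites every double-quoted
' value of one attribute (for example "src") so only the file name remains.
Public Function StripAttributePaths(ByVal html As String, ByVal attrName As String) As String
    Dim result As String
    Dim copyFrom As Long      ' first character not yet copied to result
    Dim attrPos As Long
    Dim q1 As Long
    Dim q2 As Long
    Dim between As String
    Dim refValue As String
    Dim cut As Long

    copyFrom = 1
    attrPos = InStr(1, html, " " & attrName, vbTextCompare)
    Do While attrPos > 0
        q1 = InStr(attrPos + 1 + Len(attrName), html, Chr$(34))
        If q1 = 0 Then Exit Do
        q2 = InStr(q1 + 1, html, Chr$(34))
        If q2 = 0 Then Exit Do

        ' Text between the attribute name and the opening quote must be just "="
        between = Mid$(html, attrPos + 1 + Len(attrName), q1 - attrPos - 1 - Len(attrName))
        If Trim$(between) = "=" Then
            refValue = Mid$(html, q1 + 1, q2 - q1 - 1)
            cut = InStrRev(refValue, "/")
            If InStrRev(refValue, "\") > cut Then cut = InStrRev(refValue, "\")
            ' Copy up to and including the opening quote, then the bare file name
            result = result & Mid$(html, copyFrom, q1 - copyFrom + 1) & Mid$(refValue, cut + 1)
            copyFrom = q2                      ' resume copying at the closing quote
            attrPos = InStr(q2 + 1, html, " " & attrName, vbTextCompare)
        Else
            ' Not really this attribute (e.g. a longer name); keep looking
            attrPos = InStr(attrPos + 1, html, " " & attrName, vbTextCompare)
        End If
    Loop

    StripAttributePaths = result & Mid$(html, copyFrom)
End Function
```

Called with the line <IMG SRC="http://www.mysite.com/images/file.jpg"> and "src", it returns <IMG SRC="file.jpg">. Building the output left to right avoids having to adjust stored positions after each replacement, which is the main reason for this design.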

 
LVL 7

Expert Comment

by:Burbble
ID: 11953731
Glad you got it solved :)

I don't think my comment should be the accepted answer though...

-Burbble
 
LVL 3

Author Comment

by:bczingo
ID: 11958275
'prolly not :)
Don't know how to change it though