x_terminat_or_3

Parsing problem

Hi all

I'm facing a problem


This is what I need to do:

In HTML documents there are the following tags present:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but with different values for the lang parameter.

I need to be able to read these entries and store them in memory for easy processing.


Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1.500pts A-grade



Good luck



With kind regards


x_terminat_or_3
Shauli

'in form declaration
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String
Dim cLoop As Integer, cCount As Integer, fNum As Integer
fNum = FreeFile 'use a free file number instead of hard-coding #1
Open sbFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, htmLine
        'only lines that contain a meta tag get a slot in the arrays
        If InStr(1, htmLine, "<meta", vbTextCompare) > 0 Then
            'grow all three arrays together so index cCount always refers to the same tag
            ReDim Preserve MetaTags.strName(cCount)
            ReDim Preserve MetaTags.strLang(cCount)
            ReDim Preserve MetaTags.strContent(cCount)
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
            For cLoop = 0 To UBound(lineSplit)
                fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                If UBound(fracSplit) > 0 Then
                    'test the attribute name (text before the first quote), not the value
                    If InStr(1, fracSplit(0), "name=", vbTextCompare) > 0 Then
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf InStr(1, fracSplit(0), "lang=", vbTextCompare) > 0 Then
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf InStr(1, fracSplit(0), "content=", vbTextCompare) > 0 Then
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                End If
            Next cLoop
            cCount = cCount + 1
        End If
    Loop
Close #fNum
End Sub

'Just for testing, you can locate three listboxes on your form (List1, List2 and List3) and use the code below instead of the code in the command button click event above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub

x_terminat_or_3

ASKER

Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can be present more than once, with different values for the lang parameter (and for the content parameter as well).

<meta name="xlevel" content="xx">


<meta name="xthreadid" content="xx"> (this tag is optional and may not be present)

After consideration, for the storage:
kindly load these into the following structure

Type xProps
  lang As String
  title As String
  description As String
  keywords As String
End Type

Type xMetas
  docprops() As xProps
  threadid As Long
  level As Long
End Type


I'm sorry for all the work, but I'm not that good with string parsing anymore, because I need to do projects in many different languages (vb, html, javascript, asp, php,... and lately they decided to start teaching me assembly as well --- argh!)

If it's OK with you guys, the "input" is found in strBuffer.
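
For anyone reading along, here is a minimal sketch of how the structure requested above could be filled from strBuffer in VB6. It is not the accepted answer from this thread. GetAttr, LangIndex and ParseMetas are hypothetical names made up for illustration, and the code assumes it lives in a standard module, that attribute values are always double-quoted, and that no content value contains a ">" character.

```vb
' Sketch for a standard module (.bas); all procedure names are hypothetical.
Option Explicit

Public Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Public Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

' Returns the double-quoted value of attribute attrName inside one tag,
' or "" when the attribute is absent.
Private Function GetAttr(ByVal tagText As String, ByVal attrName As String) As String
    Dim p As Long, q As Long
    p = InStr(1, tagText, " " & attrName & "=" & Chr$(34), vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(attrName) + 3      ' skip the space, the name, "=" and the opening quote
    q = InStr(p, tagText, Chr$(34))
    If q > 0 Then GetAttr = Mid$(tagText, p, q - p)
End Function

' Returns the index of the entry for langCode in m.docprops, appending a
' new entry (and incrementing used) when the language has not been seen yet.
Private Function LangIndex(m As xMetas, ByVal langCode As String, used As Long) As Long
    Dim i As Long
    For i = 0 To used - 1
        If StrComp(m.docprops(i).lang, langCode, vbTextCompare) = 0 Then
            LangIndex = i
            Exit Function
        End If
    Next i
    ReDim Preserve m.docprops(used)
    m.docprops(used).lang = langCode
    LangIndex = used
    used = used + 1
End Function

' Scans strBuffer for <meta ...> tags and groups the x* tags by language.
Public Function ParseMetas(ByVal strBuffer As String) As xMetas
    Dim result As xMetas
    Dim pos As Long, tagEnd As Long, n As Long, idx As Long
    Dim tagText As String, tagName As String, content As String

    pos = InStr(1, strBuffer, "<meta ", vbTextCompare)
    Do While pos > 0
        tagEnd = InStr(pos, strBuffer, ">")
        If tagEnd = 0 Then Exit Do             ' unterminated tag: stop scanning
        tagText = Mid$(strBuffer, pos, tagEnd - pos + 1)
        tagName = LCase$(GetAttr(tagText, "name"))
        content = GetAttr(tagText, "content")

        Select Case tagName
            Case "xtitle", "xdescription", "xkeywords"
                ' look up the slot first, then write into it
                idx = LangIndex(result, GetAttr(tagText, "lang"), n)
                If tagName = "xtitle" Then
                    result.docprops(idx).title = content
                ElseIf tagName = "xdescription" Then
                    result.docprops(idx).description = content
                Else
                    result.docprops(idx).keywords = content
                End If
            Case "xlevel"
                result.level = CLng(Val(content))
            Case "xthreadid"
                result.threadid = CLng(Val(content))   ' optional tag; stays 0 when absent
        End Select
        pos = InStr(tagEnd, strBuffer, "<meta ", vbTextCompare)
    Loop
    ParseMetas = result
End Function
```

Calling it would look like `Dim m As xMetas` followed by `m = ParseMetas(strBuffer)`. Tags that share a lang value end up in the same docprops entry, which seems to be the intent of the structure. Note that docprops stays unallocated when no xtitle, xdescription or xkeywords tag is found, so calling UBound on it would raise an error in that case.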