Parsing problem

Posted on 2004-10-15
Last Modified: 2008-01-09
Hi all

I'm facing a problem

This is what I need to do:

In HTML documents there are the following tags present:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but with different values for the lang parameter.

I need to be able to read these entries and store them in memory for easy processing.

Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1.500pts A-grade

Good luck

With kind regards

Question by: x_terminat_or_3
LVL 19

Expert Comment

ID: 12322959
'in form declaration
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String, cLoop As Integer, cCount As Integer
Open sbFile For Input As #1
    Do Until EOF(1)
        Line Input #1, htmLine
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
            'Only handle lines that hold a meta tag with more than one attribute
            If InStr(1, htmLine, "<meta", vbTextCompare) > 0 And UBound(lineSplit) > 0 Then
                'Grow all three arrays together so index cCount always refers to the same tag
                ReDim Preserve MetaTags.strName(cCount)
                ReDim Preserve MetaTags.strLang(cCount)
                ReDim Preserve MetaTags.strContent(cCount)
                For cLoop = 0 To UBound(lineSplit)
                    If InStr(1, lineSplit(cLoop), "name=") > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "lang=") > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "content=") > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                Next cLoop
                'Count only the lines that were actually stored
                cCount = cCount + 1
            End If
    Loop
Close #1
End Sub

'Just for testing, you can locate three listboxes on your form (List1, List2 and List3) and use the code below instead of the code in the command button click event above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub
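
The line-by-line approach above assumes that each meta tag sits on a single line. As a rough sketch only, the function below scans a whole buffer for `<meta ... >` spans instead, so tags broken across lines are still found. `FindMetaTags` is a hypothetical helper name, not part of the answer above, and it assumes that no attribute value contains a ">" character.

```vb
Option Explicit

'Hypothetical helper: returns every <meta ...> tag found in sBuffer,
'one string per tag, regardless of line breaks inside the tag.
'Assumption: attribute values never contain ">".
Public Function FindMetaTags(ByVal sBuffer As String) As Collection
    Dim tags As New Collection
    Dim pStart As Long, pEnd As Long

    'vbTextCompare makes the search match <meta, <META, <Meta, ...
    pStart = InStr(1, sBuffer, "<meta", vbTextCompare)
    Do While pStart > 0
        pEnd = InStr(pStart, sBuffer, ">")
        If pEnd = 0 Then Exit Do        'unterminated tag: stop scanning
        tags.Add Mid$(sBuffer, pStart, pEnd - pStart + 1)
        'InStr returns 0 once the start position is past the end of the buffer
        pStart = InStr(pEnd + 1, sBuffer, "<meta", vbTextCompare)
    Loop

    Set FindMetaTags = tags
End Function
```

Each item of the returned collection is the full text of one tag, which can then be split into attributes with the same Split calls used in ParseHtmlFile.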



Assisted Solution

learning_t0_pr0gram earned 100 total points
ID: 12323578
Another way would be to fetch the page with the Inet control or something similar:

Dim xName() As String, xLang() As String, xContent() As String

Private Sub Command1_Click()

    Dim Page As String

    Page = Inet1.OpenURL("")

    ReDim xName(0)
    ReDim xLang(0)
    ReDim xContent(0)

    Call ParseInfo(Page)
End Sub

Private Sub ParseInfo(Page As String)

  Dim Tmp() As String, pStart As Long, pEnd As Long, a As Long

  'Every chunk after the first starts right after <meta name=
  Tmp() = Split(Page, "<meta name=", -1, vbTextCompare)

  For a = 1 To UBound(Tmp)
      ReDim Preserve xName(a - 1)
      ReDim Preserve xLang(a - 1)
      ReDim Preserve xContent(a - 1)
      'Tmp(a) begins with the opening quote; stop just before the closing one
      xName(a - 1) = Mid(Tmp(a), 2, InStr(2, Tmp(a), """") - 2)
      'lang is optional (xlevel and xthreadid do not carry it)
      pStart = InStr(1, Tmp(a), "lang=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 6
          pEnd = InStr(pStart, Tmp(a), """")
          xLang(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
      pStart = InStr(1, Tmp(a), "content=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 9
          pEnd = InStr(pStart, Tmp(a), """")
          xContent(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
  Next a

End Sub

i think that would work =\
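
The InStr/Mid pattern used above for lang and content can be folded into one reusable function. This is only a sketch: `GetAttrValue` is a hypothetical name, and it assumes double-quoted values and a single space in front of the attribute name.

```vb
Option Explicit

'Hypothetical helper: returns the value of attribute sAttr inside one tag,
'or "" when the attribute is absent.
'Assumptions: values are double-quoted and the attribute name is
'preceded by a space (not a tab or a line break).
Public Function GetAttrValue(ByVal sTag As String, ByVal sAttr As String) As String
    Dim pStart As Long, pEnd As Long
    Dim sKey As String

    'The leading space stops "lang=" from matching inside "xlang="
    sKey = " " & sAttr & "="""
    pStart = InStr(1, sTag, sKey, vbTextCompare)
    If pStart = 0 Then Exit Function        'attribute not present

    pStart = pStart + Len(sKey)
    pEnd = InStr(pStart, sTag, """")
    If pEnd = 0 Then Exit Function          'no closing quote

    GetAttrValue = Mid$(sTag, pStart, pEnd - pStart)
End Function
```

For a tag such as `<meta name="xlevel" content="3">`, asking for "lang" gives an empty string instead of a wrong offset, which is the case the optional lang attribute needs.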

Author Comment

ID: 12323976
Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can be present more than once, with different values for the lang parameter (and for the content parameter as well).

<meta name="xlevel" content="xx">

<meta name="xthreadid" content="xx"> this tag is optional (may not be present)

After consideration, for the storage, kindly load these into the following structure:

type xProps
  lang as string
  title as string
  description as string
  keywords as string
end type

type xMetas
  docprops() as xProps
  threadid as long
  level as long
end type

I'm sorry for all the work, but I'm not that good with string parsing anymore because I need to do projects in many different languages (VB, HTML, JavaScript, ASP, PHP, ... and lately they decided to start teaching me assembly as well, argh!)

If it's ok with you guys, the "input" is found in strBuffer
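
One possible way to fill the structure above, one tag at a time, is sketched here. `AddMeta` is a hypothetical helper, and the extra `count` field is an assumption added to the requested type so that an empty docprops array never has to be probed with UBound. The code is meant for a standard module (.bas).

```vb
Option Explicit

'Types as requested in the author comment
Public Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Public Type xMetas
    docprops() As xProps
    count As Long           'assumption: number of docprops entries in use
    threadid As Long
    level As Long
End Type

'Hypothetical helper: files one parsed tag (name, lang, content) into m.
Public Sub AddMeta(m As xMetas, ByVal sName As String, _
                   ByVal sLang As String, ByVal sContent As String)
    Dim i As Long

    sName = LCase$(sName)
    Select Case sName
        Case "xlevel"
            m.level = CLng(Val(sContent))
            Exit Sub
        Case "xthreadid"
            m.threadid = CLng(Val(sContent))
            Exit Sub
        Case "xtitle", "xdescription", "xkeywords"
            'per-language tags are handled below
        Case Else
            Exit Sub        'not one of the five tags of interest
    End Select

    'Find the entry for this language; i ends at m.count when none matches
    For i = 0 To m.count - 1
        If StrComp(m.docprops(i).lang, sLang, vbTextCompare) = 0 Then Exit For
    Next i
    If i = m.count Then
        ReDim Preserve m.docprops(m.count)
        m.docprops(i).lang = sLang
        m.count = m.count + 1
    End If

    Select Case sName
        Case "xtitle":       m.docprops(i).title = sContent
        Case "xdescription": m.docprops(i).description = sContent
        Case "xkeywords":    m.docprops(i).keywords = sContent
    End Select
End Sub
```

Calling AddMeta once per tag found in strBuffer groups the title, description and keywords of each language into a single docprops entry, while xlevel and xthreadid land in the two Long fields. A missing xthreadid tag simply leaves threadid at 0.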
LVL 32

Accepted Solution

Erick37 earned 400 total points
ID: 12325425
It can be done using the MSHTML library.
Add a reference to the "Microsoft HTML Object Library"

Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type

Dim idoc As HTMLDocument
Dim odoc As New HTMLDocument
Dim mElement As HTMLMetaElement
Dim xp() As xMetaTags
Dim i As Long
Dim sName As String, sLang As String, sContent As String

'Create the HTML document from file
Set idoc = odoc.createDocumentFromUrl("c:\test.html", vbNullString)
Do While idoc.ReadyState <> "complete": DoEvents: Loop

'Prepare the array
ReDim xp(0)

'Loop the "meta" elements in the document
For Each mElement In idoc.all.tags("meta")
    Debug.Print mElement.outerHTML
    'getAttribute can return Null for a missing attribute; & "" turns that into ""
    sName = LCase(mElement.getAttribute("name") & "")
    sLang = LCase(mElement.getAttribute("lang") & "")
    'Keep the content in its original case
    sContent = mElement.getAttribute("content") & ""
    If (sName = "xdescription") Or _
        (sName = "xtitle") Or _
        (sName = "xkeywords") Or _
        (sName = "xlevel") Or _
        (sName = "xthreadid") Then
        'Expand the array
        ReDim Preserve xp(i)
        xp(i).name = sName
        xp(i).lang = sLang
        xp(i).content = sContent
        i = i + 1
    End If
Next mElement


Hope it helps!
