Solved

Parsing problem

Posted on 2004-10-15
Last Modified: 2008-01-09
Hi all

I'm facing a problem


This is what I need to do:

In HTML documents there are the following tags present:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but with different values for the lang attribute.

I need to be able to read these entries and store them in memory for easy processing.


Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1.500pts A-grade



Good luck



With kind regards


x_terminat_or_3
Question by:x_terminat_or_3
 
LVL 19

Expert Comment

by:Shauli
ID: 12322959
'in form declaration
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String, cLoop As Integer, cCount As Integer
Open sbFile For Input As #1
    Do Until EOF(1)
        Line Input #1, htmLine
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
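            'only lines that hold a meta tag get an entry
            If InStr(1, htmLine, "<meta", vbTextCompare) > 0 And UBound(lineSplit) > 0 Then
                'grow all three arrays together so one index addresses one tag
                ReDim Preserve MetaTags.strName(cCount)
                ReDim Preserve MetaTags.strLang(cCount)
                ReDim Preserve MetaTags.strContent(cCount)
                For cLoop = 0 To UBound(lineSplit)
                    'match the attribute name plus its opening quote, not just the bare word
                    If InStr(1, lineSplit(cLoop), "name=" & Chr(34)) > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "lang=" & Chr(34)) > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "content=" & Chr(34)) > 0 Then
                        fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                Next cLoop
                'advance only after a tag was stored, so the arrays have no empty gaps
                cCount = cCount + 1
            End If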
        Loop
Close #1
End Sub

'Just for testing, you can locate three listboxes on your form (List1, List2 and List3) and use the code below instead of the code in the command button click event above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub

S

LVL 4

Assisted Solution

by:learning_t0_pr0gram
learning_t0_pr0gram earned 100 total points
ID: 12323578
Another way would be to fetch the page with the Inet control (or something similar) and parse the returned string:

Dim xName() as string, xLang() as string, xContent() as string

Private Sub Command1_Click()

    Dim Page as string

    Page = Inet1.OpenURL("http://www.yourpage.com")

    ReDim xName(0)
    ReDim xLang(0)
    ReDim xContent(0)

    Call ParseInfo(Page)
 
End Sub


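Private Sub ParseInfo(Page As String)

  Dim Tmp() As String, pStart As Long, pEnd As Long, a As Long

  Tmp() = Split(Page, "<meta name=")

  For a = 1 To UBound(Tmp)
      ReDim Preserve xName(a - 1)
      ReDim Preserve xLang(a - 1)
      ReDim Preserve xContent(a - 1)
      'keep only this tag; the last piece otherwise runs to the end of the page
      pEnd = InStr(1, Tmp(a), ">")
      If pEnd > 0 Then Tmp(a) = Left(Tmp(a), pEnd)
      'Tmp(a) starts with the opening quote; take the text up to the closing quote
      xName(a - 1) = Mid(Tmp(a), 2, InStr(2, Tmp(a), """") - 2)
      'lang is optional (xlevel and xthreadid have none)
      pStart = InStr(1, Tmp(a), "lang=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 6
          pEnd = InStr(pStart, Tmp(a), """")
          xLang(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
      'case-insensitive search, because the tags use lowercase "content"
      pStart = InStr(1, Tmp(a), "content=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 9
          pEnd = InStr(pStart, Tmp(a), """")
          xContent(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
  Next

End Sub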

I think that would work.
LVL 2

Author Comment

by:x_terminat_or_3
ID: 12323976
Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can each appear more than once, with different values for the lang attribute (and for the content attribute as well).

<meta name="xlevel" content="xx">


<meta name="xthreadid" content="xx"> this tag is optional (may not be present)

After some consideration, for the storage,
kindly load these into the following structure:

type xProps
  lang as string
  title as string
  description as string
  keywords as string
end type

type xMetas
  docprops() as xProps
  threadid as long
  level as long
end type


I'm sorry for all the work, but I'm not that good at string parsing anymore, because I have to do projects in many different languages (VB, HTML, JavaScript, ASP, PHP, ... and lately they decided to start teaching me assembly as well, argh!)

If it's ok with you guys, the "input" is found in strBuffer
LVL 32

Accepted Solution

by:
Erick37 earned 400 total points
ID: 12325425
It can be done using the MSHTML library.
Add a reference to the "Microsoft HTML Object Library"

'~~~~~~~~~~~~~~~~~~~~~~~
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type
'~~~~~~~~~~~~~~~~~~~~~~~~


'~~~~~~~~~~~~~~~~~~~~~~~~
Dim idoc As HTMLDocument
Dim odoc As New HTMLDocument
Dim mElement As HTMLMetaElement
Dim xp() As xMetaTags
Dim i As Long
Dim sName As String, sLang As String, sContent As String

'Create the HTML document from file
Set idoc = odoc.createDocumentFromUrl("c:\test.html", vbNullString)
Do While idoc.ReadyState <> "complete": DoEvents: Loop

'Prepare the array
ReDim xp(0)

'Loop the "meta" elements in the document
For Each mElement In idoc.all.tags("meta")
   
    Debug.Print mElement.outerHTML
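    'getAttribute returns a Variant; & "" guards against Null for a missing attribute
    sName = LCase(mElement.getAttribute("name") & "")
    sLang = LCase(mElement.getAttribute("lang") & "")
    'keep the content as written: lowercasing would alter titles and descriptions
    sContent = mElement.getAttribute("content") & ""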
   
    If (sName = "xdescription") Or _
        (sName = "xtitle") Or _
        (sName = "xkeywords") Or _
        (sName = "xlevel") Or _
        (sName = "xthreadid") Then
       
        'Expand the array
        ReDim Preserve xp(i)
       
        xp(i).name = sName
        xp(i).lang = sLang
        xp(i).content = sContent
       
        i = i + 1
    End If

Next
'~~~~~~~~~~~~~~~~~~~~~~~~~~
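
The loop above fills a flat array, while the asker wanted one xProps entry per language plus single level and threadid values. A minimal sketch of that regrouping follows. BuildMetas and its nTags parameter are hypothetical names, not part of the original answer. It assumes the xProps and xMetas types from the author's comment are declared at module level next to xMetaTags, and that the tag names in xp() are lowercase, as the loop above stores them.

```vb
'Hypothetical helper: folds the flat xp() array into the asker's xMetas layout.
'nTags is the number of entries actually stored (the value of i after the loop).
Private Function BuildMetas(xp() As xMetaTags, ByVal nTags As Long) As xMetas
    Dim m As xMetas
    Dim i As Long, j As Long
    Dim n As Long   'number of languages stored in m.docprops so far

    For i = 0 To nTags - 1
        Select Case xp(i).name
            Case "xlevel"
                'Val returns 0 for non-numeric text instead of raising an error
                m.level = CLng(Val(xp(i).content))
            Case "xthreadid"
                m.threadid = CLng(Val(xp(i).content))
            Case "xtitle", "xdescription", "xkeywords"
                'Look for an entry that already holds this language
                For j = 0 To n - 1
                    If m.docprops(j).lang = xp(i).lang Then Exit For
                Next j
                'j = n means no match was found: append a new entry
                If j = n Then
                    ReDim Preserve m.docprops(n)
                    m.docprops(n).lang = xp(i).lang
                    n = n + 1
                End If
                Select Case xp(i).name
                    Case "xtitle": m.docprops(j).title = xp(i).content
                    Case "xdescription": m.docprops(j).description = xp(i).content
                    Case "xkeywords": m.docprops(j).keywords = xp(i).content
                End Select
        End Select
    Next i

    BuildMetas = m
End Function
```

Called right after the loop, for example as `myMetas = BuildMetas(xp, i)` with `myMetas` declared `As xMetas`. If no language-specific tag was found, docprops stays undimensioned, so check for that before calling UBound on it. A missing xthreadid tag simply leaves threadid at 0.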

Hope it helps!