
Parsing problem

Posted on 2004-10-15
Last Modified: 2008-01-09
Hi all

I'm facing a problem


This is what I need to do:

In HTML documents there are the following tags present:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but each with a different value for the lang parameter.

I need to be able to read these entries and store them in memory for easy processing.


Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1.500pts A-grade



Good luck



With kind regards


x_terminat_or_3
Question by:x_terminat_or_3
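For the "how to store them in memory" part, one option is a keyed lookup instead of parallel arrays. The sketch below is a minimal VB6 example, assuming a tag is identified by its name plus its lang value (tags without lang get an empty lang). StoreMeta, LookupMeta, DemoMeta and mMetas are hypothetical names; it uses only the built-in Collection object, so no extra reference is needed.

```vb
'Module level: one Collection holds every tag, keyed by "name|lang"
Private mMetas As New Collection

'Store one tag's content; a repeated name/lang pair replaces the earlier value
Private Sub StoreMeta(ByVal sName As String, ByVal sLang As String, ByVal sContent As String)
    Dim sKey As String
    sKey = LCase$(sName) & "|" & LCase$(sLang)
    On Error Resume Next
    mMetas.Remove sKey          'raises an error (ignored here) when the key is new
    On Error GoTo 0
    mMetas.Add sContent, sKey
End Sub

'Return the stored content, or "" when that name/lang pair was never stored
Private Function LookupMeta(ByVal sName As String, ByVal sLang As String) As String
    On Error Resume Next        'a missing key raises error 5; the result stays ""
    LookupMeta = mMetas.Item(LCase$(sName) & "|" & LCase$(sLang))
End Function

'Example use with made-up values
Private Sub DemoMeta()
    StoreMeta "xtitle", "en", "Hello"
    StoreMeta "xtitle", "nl", "Hallo"
    StoreMeta "xlevel", "", "3"                'no lang attribute: empty lang
    Debug.Print LookupMeta("xtitle", "nl")     'prints Hallo
    Debug.Print LookupMeta("xlevel", "")       'prints 3
End Sub
```

VB's Collection has no Exists method, which is why the sketch traps the errors from Remove and Item. If adding a reference to the Microsoft Scripting Runtime is acceptable, Scripting.Dictionary offers Exists and Keys instead.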
4 Comments
 
LVL 19

Expert Comment

by:Shauli
ID: 12322959
'in form declaration
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()
'All three arrays share one index, so entry n of each belongs to the same tag.
'Assumes each <meta> tag is on a single line and its values are in double quotes.

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String
Dim sAttr As String, cLoop As Integer, cCount As Integer, fNum As Integer
fNum = FreeFile
Open sbFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, htmLine
        'Skip lines without a meta tag, so other tags do not add entries
        If InStr(1, htmLine, "<meta", vbTextCompare) > 0 Then
            'Grow all three arrays together so the indexes stay in step
            ReDim Preserve MetaTags.strName(cCount)
            ReDim Preserve MetaTags.strLang(cCount)
            ReDim Preserve MetaTags.strContent(cCount)
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
            For cLoop = 0 To UBound(lineSplit)
                fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                If UBound(fracSplit) > 0 Then
                    'fracSplit(0) ends with the attribute name, e.g. "<meta name=" or "lang="
                    sAttr = LCase$(Trim$(fracSplit(0)))
                    If sAttr Like "*name=" Then
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf sAttr Like "*lang=" Then
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf sAttr Like "*content=" Then
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                End If
            Next cLoop
            cCount = cCount + 1   'count only the lines that held a meta tag
        End If
    Loop
Close #fNum
End Sub

'Just for testing, you can place three list boxes on your form (List1, List2 and List3) and use the code below instead of the Command1_Click code above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub

S

 
LVL 4

Assisted Solution

by:learning_t0_pr0gram
learning_t0_pr0gram earned 100 total points
ID: 12323578
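Another way would be to fetch the page directly with the Inet control (the Microsoft Internet Transfer Control, placed on the form as Inet1) and parse the returned string:

'Results: one entry per <meta> tag; the three arrays share the same index
Dim xName() As String, xLang() As String, xContent() As String

Private Sub Command1_Click()

    Dim Page As String

    Page = Inet1.OpenURL("http://www.yourpage.com")

    Call ParseInfo(Page)

End Sub

'Returns the value of attribute sAttr inside one tag, or "" if the attribute is missing
Private Function AttrValue(ByVal sTag As String, ByVal sAttr As String) As String
    Dim pStart As Long, pEnd As Long
    pStart = InStr(1, sTag, sAttr & "=""", vbTextCompare)
    If pStart = 0 Then Exit Function
    pStart = pStart + Len(sAttr) + 2        'skip the name, = and the opening quote
    pEnd = InStr(pStart, sTag, """")
    If pEnd > 0 Then AttrValue = Mid$(sTag, pStart, pEnd - pStart)
End Function

Private Sub ParseInfo(ByVal Page As String)

  Dim Tmp() As String, sTag As String, pClose As Long, a As Long

  ReDim xName(0)
  ReDim xLang(0)
  ReDim xContent(0)

  'Case-insensitive split, so <META ...> is found as well
  Tmp = Split(Page, "<meta ", -1, vbTextCompare)

  For a = 1 To UBound(Tmp)
      'Only search up to the end of this tag, not into the rest of the page
      pClose = InStr(Tmp(a), ">")
      If pClose > 0 Then sTag = Left$(Tmp(a), pClose) Else sTag = Tmp(a)
      ReDim Preserve xName(a - 1)
      ReDim Preserve xLang(a - 1)
      ReDim Preserve xContent(a - 1)
      xName(a - 1) = AttrValue(sTag, "name")
      xLang(a - 1) = AttrValue(sTag, "lang")
      xContent(a - 1) = AttrValue(sTag, "content")
  Next a

End Sub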

I think that would work =\
 
LVL 2

Author Comment

by:x_terminat_or_3
ID: 12323976
Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can be present more than once, with different values for the lang parameter (and for the content parameter as well).

<meta name="xlevel" content="xx">


<meta name="xthreadid" content="xx"> this tag is optional (may not be present)

After some consideration, for storage, please load these into the following structure:

type xProps
  lang as string
  title as string
  description as string
  keywords as string
end type

type xMetas
  docprops() as xProps
  threadid as long
  level as long
end type


I'm sorry for all the work, but I'm not that good with string parsing anymore, because I need to do projects in many different languages (VB, HTML, JavaScript, ASP, PHP... and lately they decided to start teaching me assembly as well, argh!)

If it's OK with you guys, the input is in the string variable strBuffer.
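The clarified request (parse strBuffer into the xProps/xMetas structure) can also be met with plain VB6 string functions. This is a minimal sketch, not a full HTML parser: it assumes every attribute is preceded by a single space, values are in double quotes, and no value contains a ">" character. MetaAttr, ParseMetas and ShowMetas are hypothetical names; the two types are the ones given above.

```vb
'Types as requested in the question
Private Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Private Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

'Hypothetical helper: value of attribute sAttr inside one tag, "" if absent
Private Function MetaAttr(ByVal sTag As String, ByVal sAttr As String) As String
    Dim p As Long, q As Long
    p = InStr(1, sTag, " " & sAttr & "=""", vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(sAttr) + 3              'skip the space, the name, = and the quote
    q = InStr(p, sTag, """")
    If q > 0 Then MetaAttr = Mid$(sTag, p, q - p)
End Function

'Hypothetical parser: fills m from sBuffer, returns the number of docprops entries
Private Function ParseMetas(ByVal sBuffer As String, m As xMetas) As Long
    Dim p As Long, q As Long, n As Long, i As Long
    Dim sTag As String, sName As String, sLang As String, sContent As String

    Erase m.docprops
    m.threadid = 0                      'stays 0 when the optional xthreadid is absent
    m.level = 0

    p = InStr(1, sBuffer, "<meta", vbTextCompare)
    Do While p > 0
        q = InStr(p, sBuffer, ">")
        If q = 0 Then Exit Do           'unterminated tag: stop
        sTag = Mid$(sBuffer, p, q - p + 1)
        sName = LCase$(MetaAttr(sTag, "name"))
        sLang = MetaAttr(sTag, "lang")
        sContent = MetaAttr(sTag, "content")

        Select Case sName
        Case "xlevel"
            m.level = Val(sContent)
        Case "xthreadid"
            m.threadid = Val(sContent)
        Case "xtitle", "xdescription", "xkeywords"
            'Find the docprops entry for this language, or append a new one
            For i = 0 To n - 1
                If StrComp(m.docprops(i).lang, sLang, vbTextCompare) = 0 Then Exit For
            Next i
            If i = n Then
                ReDim Preserve m.docprops(n)
                m.docprops(n).lang = sLang
                n = n + 1
            End If
            Select Case sName
            Case "xtitle": m.docprops(i).title = sContent
            Case "xdescription": m.docprops(i).description = sContent
            Case "xkeywords": m.docprops(i).keywords = sContent
            End Select
        End Select

        p = InStr(q + 1, sBuffer, "<meta", vbTextCompare)
    Loop
    ParseMetas = n
End Function

'Example: dump what was found in strBuffer
Private Sub ShowMetas(ByVal strBuffer As String)
    Dim meta As xMetas, k As Long, nProps As Long
    nProps = ParseMetas(strBuffer, meta)
    Debug.Print "level=" & meta.level, "threadid=" & meta.threadid
    For k = 0 To nProps - 1
        With meta.docprops(k)
            Debug.Print .lang, .title, .description, .keywords
        End With
    Next k
End Sub
```

ParseMetas returns the number of docprops entries. When it returns 0, docprops is unallocated and UBound(meta.docprops) would raise error 9, so loop with the returned count instead. Tags that share a lang value (compared case-insensitively) end up in the same docprops entry.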
 
LVL 32

Accepted Solution

by:Erick37
Erick37 earned 400 total points
ID: 12325425
It can be done using the MSHTML library.
Add a reference to the "Microsoft HTML Object Library" (Project > References).

'~~~~~~~~~~~~~~~~~~~~~~~
'General Declarations section of the form
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type

'Results are kept at module level so other procedures can use them
Dim xp() As xMetaTags
Dim xpCount As Long     'number of filled entries in xp()
'~~~~~~~~~~~~~~~~~~~~~~~~


'~~~~~~~~~~~~~~~~~~~~~~~~
Private Sub Command1_Click()
    Dim idoc As HTMLDocument
    Dim odoc As New HTMLDocument
    Dim mElement As HTMLMetaElement
    Dim i As Long
    Dim sName As String, sLang As String, sContent As String

    'Create the HTML document from file
    Set idoc = odoc.createDocumentFromUrl("c:\test.html", vbNullString)
    If idoc Is Nothing Then Exit Sub
    Do While idoc.ReadyState <> "complete": DoEvents: Loop

    'Prepare the array
    ReDim xp(0)

    'Loop the "meta" elements in the document
    For Each mElement In idoc.all.tags("meta")

        Debug.Print mElement.outerHTML
        '& "" turns Null (attribute missing) into an empty string
        sName = LCase$(mElement.getAttribute("name") & "")
        sLang = LCase$(mElement.getAttribute("lang") & "")
        'Content is stored as written; lowercasing it would change titles and descriptions
        sContent = mElement.getAttribute("content") & ""

        If (sName = "xdescription") Or _
            (sName = "xtitle") Or _
            (sName = "xkeywords") Or _
            (sName = "xlevel") Or _
            (sName = "xthreadid") Then

            'Expand the array
            ReDim Preserve xp(i)

            xp(i).name = sName
            xp(i).lang = sLang
            xp(i).content = sContent

            i = i + 1
        End If

    Next

    'With no matching tags, xp(0) is only an empty placeholder
    xpCount = i
End Sub
'~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope it helps!
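The accepted answer collects the matching tags as flat xMetaTags records, while the author asked for the xProps/xMetas structure. Below is a minimal VB6 sketch of folding those records into it, assuming name and lang are already lowercased (as the loop above does) and that nCount is the number of filled xp() entries. FoldMetaTags is a hypothetical name, and the types are repeated only so the block stands alone; declare each type once per module.

```vb
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type

Private Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Private Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

'Hypothetical helper: fold the first nCount entries of xp() into m,
'one docprops entry per language
Private Sub FoldMetaTags(xp() As xMetaTags, ByVal nCount As Long, m As xMetas)
    Dim k As Long, j As Long, n As Long

    Erase m.docprops
    For k = 0 To nCount - 1
        Select Case xp(k).name
        Case "xlevel"
            m.level = Val(xp(k).content)
        Case "xthreadid"
            m.threadid = Val(xp(k).content)
        Case "xtitle", "xdescription", "xkeywords"
            'Reuse the entry for this language, or append a new one
            For j = 0 To n - 1
                If m.docprops(j).lang = xp(k).lang Then Exit For
            Next j
            If j = n Then
                ReDim Preserve m.docprops(n)
                m.docprops(n).lang = xp(k).lang
                n = n + 1
            End If
            Select Case xp(k).name
            Case "xtitle": m.docprops(j).title = xp(k).content
            Case "xdescription": m.docprops(j).description = xp(k).content
            Case "xkeywords": m.docprops(j).keywords = xp(k).content
            End Select
        End Select
    Next k
End Sub
```

For example, right after the For Each loop in the code above, `Dim meta As xMetas` followed by `FoldMetaTags xp, i, meta` would fill meta. When i is 0 the call leaves docprops unallocated, so check the count before calling UBound on it.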