Parsing problem

Posted on 2004-10-15
Last Modified: 2008-01-09
Hi all

I'm facing a problem


This is what I need to do:

The HTML documents contain the following tags:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but with different values for the lang attribute.

I need to be able to read these entries and store them in memory for easy processing.


Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1,500 points, A grade.



Good luck



With kind regards


x_terminat_or_3
Question by:x_terminat_or_3
LVL 19

Expert Comment

by:Shauli
ID: 12322959
'In the form's General Declarations section
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String
Dim cLoop As Integer, cCount As Integer, fNum As Integer
fNum = FreeFile
Open sbFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, htmLine
        'Only lines that contain a meta tag are stored (one tag per line assumed)
        If InStr(1, htmLine, "<meta", vbTextCompare) > 0 Then
            'Grow all three arrays together so index cCount is the same tag in each
            ReDim Preserve MetaTags.strName(cCount)
            ReDim Preserve MetaTags.strLang(cCount)
            ReDim Preserve MetaTags.strContent(cCount)
            'Split into attribute fragments such as: name="xtitle
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
            For cLoop = 0 To UBound(lineSplit)
                fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                If UBound(fracSplit) > 0 Then
                    If InStr(1, lineSplit(cLoop), "name=", vbTextCompare) > 0 Then
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "lang=", vbTextCompare) > 0 Then
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf InStr(1, lineSplit(cLoop), "content=", vbTextCompare) > 0 Then
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                End If
            Next cLoop
            cCount = cCount + 1
        End If
    Loop
Close #fNum
End Sub

'Just for testing, you can place three ListBoxes on your form (List1, List2 and List3) and use the code below instead of the code in the command button click event above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub

S

LVL 4

Assisted Solution

by:learning_t0_pr0gram
learning_t0_pr0gram earned 100 total points
ID: 12323578
Another way would be to download the page with the Inet control (or something similar) and parse the returned HTML string:

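Dim xName() As String, xLang() As String, xContent() As String

Private Sub Command1_Click()

    Dim Page As String

    'Download the page as one string (Inet1 = Microsoft Internet Transfer Control)
    Page = Inet1.OpenURL("http://www.yourpage.com")

    ReDim xName(0)
    ReDim xLang(0)
    ReDim xContent(0)

    Call ParseInfo(Page)

End Sub


Private Sub ParseInfo(Page As String)

  Dim Tmp() As String, sTag As String
  Dim a As Long, pEnd As Long

  'Every element after Tmp(0) starts right after "<meta name="
  Tmp() = Split(Page, "<meta name=", -1, vbTextCompare)

  For a = 1 To UBound(Tmp)
      'Look only at this tag, not at the rest of the page that follows it
      pEnd = InStr(1, Tmp(a), ">")
      If pEnd > 0 Then sTag = Left$(Tmp(a), pEnd) Else sTag = Tmp(a)

      ReDim Preserve xName(a - 1)
      ReDim Preserve xLang(a - 1)
      ReDim Preserve xContent(a - 1)
      'sTag starts with the opening quote, so the name ends before the next quote
      pEnd = InStr(2, sTag, """")
      If pEnd > 2 Then xName(a - 1) = Mid$(sTag, 2, pEnd - 2)
      xLang(a - 1) = AttrValue(sTag, "lang")
      xContent(a - 1) = AttrValue(sTag, "content")
  Next

End Sub

'Returns the quoted value of attribute sAttr in sTag, or "" if it is absent
Private Function AttrValue(sTag As String, sAttr As String) As String
  Dim pStart As Long, pEnd As Long
  pStart = InStr(1, sTag, sAttr & "=""", vbTextCompare)
  If pStart = 0 Then Exit Function
  pStart = pStart + Len(sAttr) + 2
  pEnd = InStr(pStart, sTag, """")
  If pEnd > pStart Then AttrValue = Mid$(sTag, pStart, pEnd - pStart)
End Function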

I think that would work.
LVL 2

Author Comment

by:x_terminat_or_3
ID: 12323976
Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can be present more than once, with different values for the lang attribute (and for the content attribute as well).

<meta name="xlevel" content="xx">


<meta name="xthreadid" content="xx"> (this tag is optional and may not be present)

After some consideration, for the storage,
kindly load these into the following structure:

Type xProps
  lang As String
  title As String
  description As String
  keywords As String
End Type

Type xMetas
  docprops() As xProps
  threadid As Long
  level As Long
End Type


I'm sorry for all the work, but I'm not that good with string parsing anymore, because I need to do projects in many different languages (VB, HTML, JavaScript, ASP, PHP, ... and lately they decided to start teaching me assembly as well, argh!)

If it's OK with you guys, the input HTML is in the string variable strBuffer.
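One way to fill that structure straight from strBuffer with plain string functions (no MSHTML reference) is a minimal sketch like the following. It assumes every tag runs from `<meta` to the next `>` and that attribute values are double-quoted; FillMetas, MetaAttr and PropsIndex are hypothetical helper names, and the two types are the ones requested above, declared Private so they can live in a form module.

```vb
'Form declarations: the structure requested above, made Private for a form module
Private Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Private Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

'Hypothetical helper: value of attribute sAttr="..." inside sTag, or "" if absent
Private Function MetaAttr(ByVal sTag As String, ByVal sAttr As String) As String
    Dim pStart As Long, pEnd As Long
    pStart = InStr(1, sTag, sAttr & "=""", vbTextCompare)
    If pStart = 0 Then Exit Function
    pStart = pStart + Len(sAttr) + 2          'first character of the value
    pEnd = InStr(pStart, sTag, """")          'closing quote
    If pEnd > pStart Then MetaAttr = Mid$(sTag, pStart, pEnd - pStart)
End Function

'Hypothetical helper: index of the docprops entry for sLang, appending one if needed
Private Function PropsIndex(m As xMetas, ByVal sLang As String, nProps As Long) As Long
    Dim k As Long
    For k = 0 To nProps - 1
        If StrComp(m.docprops(k).lang, sLang, vbTextCompare) = 0 Then
            PropsIndex = k
            Exit Function
        End If
    Next k
    ReDim Preserve m.docprops(nProps)
    m.docprops(nProps).lang = sLang
    PropsIndex = nProps
    nProps = nProps + 1
End Function

'Fills m from the HTML in sHtml and returns the number of docprops entries
Private Function FillMetas(ByVal sHtml As String, m As xMetas) As Long
    Dim pos As Long, pEnd As Long, nProps As Long, k As Long
    Dim sTag As String, sName As String, sContent As String

    Erase m.docprops
    m.threadid = 0
    m.level = 0

    pos = InStr(1, sHtml, "<meta", vbTextCompare)
    Do While pos > 0
        pEnd = InStr(pos, sHtml, ">")
        If pEnd = 0 Then Exit Do                'unterminated tag: stop
        sTag = Mid$(sHtml, pos, pEnd - pos + 1)
        sName = LCase$(MetaAttr(sTag, "name"))
        sContent = MetaAttr(sTag, "content")

        Select Case sName
            Case "xtitle", "xdescription", "xkeywords"
                'One docprops entry per language
                k = PropsIndex(m, MetaAttr(sTag, "lang"), nProps)
                If sName = "xtitle" Then m.docprops(k).title = sContent
                If sName = "xdescription" Then m.docprops(k).description = sContent
                If sName = "xkeywords" Then m.docprops(k).keywords = sContent
            Case "xlevel"
                m.level = Val(sContent)
            Case "xthreadid"                    'optional tag
                m.threadid = Val(sContent)
        End Select

        pos = InStr(pEnd, sHtml, "<meta", vbTextCompare)
    Loop
    FillMetas = nProps
End Function
```

Call it as `n = FillMetas(strBuffer, m)` with `Dim m As xMetas`, and check `n > 0` before reading `m.docprops`: the array stays unallocated when no xtitle, xdescription or xkeywords tag is found.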
LVL 32

Accepted Solution

by: Erick37
Erick37 earned 400 total points
ID: 12325425
It can be done using the MSHTML library.
Add a reference to the "Microsoft HTML Object Library" (Project > References).

'~~~~~~~~~~~~~~~~~~~~~~~
'General declarations section of the form
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type

'Module-level, so the results are still available after LoadMetaTags returns
Dim xp() As xMetaTags
'~~~~~~~~~~~~~~~~~~~~~~~~


'~~~~~~~~~~~~~~~~~~~~~~~~
Private Sub LoadMetaTags()
Dim idoc As HTMLDocument
Dim odoc As New HTMLDocument
Dim mElement As HTMLMetaElement
Dim i As Long
Dim sName As String, sLang As String, sContent As String

'Create the HTML document from file
Set idoc = odoc.createDocumentFromUrl("c:\test.html", vbNullString)
'Wait until MSHTML has finished loading the document
Do While idoc.ReadyState <> "complete": DoEvents: Loop

'Prepare the array
ReDim xp(0)

'Loop the "meta" elements in the document
For Each mElement In idoc.all.tags("meta")

    Debug.Print mElement.outerHTML
    '& "" turns Null (attribute not present) into an empty string
    sName = LCase$(mElement.getAttribute("name") & "")
    sLang = LCase$(mElement.getAttribute("lang") & "")
    'Do not lowercase the content: titles and descriptions keep their case
    sContent = mElement.getAttribute("content") & ""

    If (sName = "xdescription") Or _
        (sName = "xtitle") Or _
        (sName = "xkeywords") Or _
        (sName = "xlevel") Or _
        (sName = "xthreadid") Then

        'Expand the array
        ReDim Preserve xp(i)

        xp(i).name = sName
        xp(i).lang = sLang
        xp(i).content = sContent

        i = i + 1
    End If

Next
End Sub
'~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope it helps!
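The loop above leaves flat name/lang/content records in xp(), while the question asks for the xMetas structure. A minimal sketch of that grouping step follows, assuming names and langs were lowercased as in the loop; GroupMetas is a hypothetical name, and the three type declarations are repeated so the sketch stands on its own (keep only one copy of each per module).

```vb
'Declarations: the record type from the loop above plus the requested structure
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type

Private Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Private Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

'Hypothetical helper: groups the flat records into m, one docprops entry per lang.
'Records with an unknown or empty name (such as the unused xp(0)) are skipped.
'Returns the number of docprops entries filled.
Private Function GroupMetas(tags() As xMetaTags, m As xMetas) As Long
    Dim i As Long, k As Long, n As Long

    Erase m.docprops
    m.threadid = 0
    m.level = 0
    For i = LBound(tags) To UBound(tags)
        Select Case tags(i).name
            Case "xlevel"
                m.level = Val(tags(i).content)
            Case "xthreadid"
                m.threadid = Val(tags(i).content)
            Case "xtitle", "xdescription", "xkeywords"
                'Find the entry for this language; k = n after the loop if none exists
                For k = 0 To n - 1
                    If m.docprops(k).lang = tags(i).lang Then Exit For
                Next k
                If k = n Then
                    ReDim Preserve m.docprops(n)
                    m.docprops(n).lang = tags(i).lang
                    n = n + 1
                End If
                Select Case tags(i).name
                    Case "xtitle": m.docprops(k).title = tags(i).content
                    Case "xdescription": m.docprops(k).description = tags(i).content
                    Case "xkeywords": m.docprops(k).keywords = tags(i).content
                End Select
        End Select
    Next i
    GroupMetas = n
End Function
```

After LoadMetaTags has run, `n = GroupMetas(xp, m)` with `Dim m As xMetas` fills one docprops entry per language; check `n > 0` before reading `m.docprops`.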