Solved

Parsing problem

Posted on 2004-10-15
Last Modified: 2008-01-09
Hi all

I'm facing a problem


This is what I need to do:

In HTML documents there are the following tags present:

<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">

Multiple entries are possible, but with different values for the lang parameter.

I need to be able to read these entries and store them in memory for easy processing.


Any ideas for the parser part & how to store them in memory?

A complete solution is worth 1.500pts A-grade



Good luck



With kind regards


x_terminat_or_3
Question by:x_terminat_or_3
4 Comments
 
LVL 19

Expert Comment

by:Shauli
ID: 12322959
'in form declaration
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml

'to call the sub
Private Sub Command1_Click()
Call ParseHtmlFile("c:\My Documents\test.html")
End Sub

'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()

Private Sub ParseHtmlFile(ByVal sbFile As String)
Dim htmLine As String, lineSplit() As String, fracSplit() As String
Dim cLoop As Integer, cCount As Integer, fNum As Integer
fNum = FreeFile
Open sbFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, htmLine
        'only look at lines that contain a meta tag
        If InStr(1, htmLine, "<meta", vbTextCompare) > 0 Then
            'grow all three arrays together so the indexes stay aligned,
            'even when a tag has no lang attribute
            ReDim Preserve MetaTags.strName(cCount)
            ReDim Preserve MetaTags.strLang(cCount)
            ReDim Preserve MetaTags.strContent(cCount)
            lineSplit = Split(htmLine, Chr(34) & " ", -1)
            For cLoop = 0 To UBound(lineSplit)
                fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                If UBound(fracSplit) > 0 Then
                    'test the attribute name only, not the value that follows it
                    If InStr(1, fracSplit(0), "name=", vbTextCompare) > 0 Then
                        MetaTags.strName(cCount) = fracSplit(1)
                    ElseIf InStr(1, fracSplit(0), "lang=", vbTextCompare) > 0 Then
                        MetaTags.strLang(cCount) = fracSplit(1)
                    ElseIf InStr(1, fracSplit(0), "content=", vbTextCompare) > 0 Then
                        MetaTags.strContent(cCount) = fracSplit(1)
                    End If
                End If
            Next cLoop
            'count meta tags, not file lines
            cCount = cCount + 1
        End If
    Loop
Close #fNum
End Sub

'Just for testing, you can locate three listboxes on your form (List1, List2 and List3) and use the code below instead of the code in the command button click event above:

Private Sub Command1_Click()
Dim c As Integer

Call ParseHtmlFile("c:\My Documents\test.html")

For c = 0 To UBound(MetaTags.strName)
    List1.AddItem MetaTags.strName(c)
Next c

For c = 0 To UBound(MetaTags.strLang)
    List2.AddItem MetaTags.strLang(c)
Next c

For c = 0 To UBound(MetaTags.strContent)
    List3.AddItem MetaTags.strContent(c)
Next c
End Sub

S

 
LVL 4

Assisted Solution

by:learning_t0_pr0gram
learning_t0_pr0gram earned 400 total points
ID: 12323578
Another way would be to fetch the page with the Inet control (or something similar) and parse the returned string:

Dim xName() as string, xLang() as string, xContent() as string

Private Sub Command1_Click()

    Dim Page as string

    Page = Inet1.OpenURL("http://www.yourpage.com")

    ReDim xName(0)
    ReDim xLang(0)
    ReDim xContent(0)

    Call ParseInfo(Page)
 
End Sub


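Private Sub ParseInfo(Page as string)

  Dim Tmp() As String, a As Long, pStart as long, pEnd as long

  Tmp() = Split(Page, "<meta name=")

  For a = 1 to UBound(Tmp)
      ReDim Preserve xName(a - 1)
      ReDim Preserve xLang(a - 1)
      ReDim Preserve xContent(a - 1)
      'Tmp(a) starts with the opening quote of the name value;
      'the length stops just before the closing quote
      xName(a - 1) = Mid(Tmp(a), 2, instr(2, Tmp(a), """") - 2)
      'lang is optional (xlevel and xthreadid do not have it)
      pStart = InStr(1, Tmp(a), "lang=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 6
          pEnd = Instr(pStart, Tmp(a), """")
          xLang(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
      'text compare, so "content" and "Content" both match
      pStart = InStr(1, Tmp(a), "content=""", vbTextCompare)
      If pStart > 0 Then
          pStart = pStart + 9
          pEnd = Instr(pStart, Tmp(a), """")
          xContent(a - 1) = Mid(Tmp(a), pStart, pEnd - pStart)
      End If
  Next

End Sub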

i think that would work =\
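
The InStr lookups above could also be wrapped in one small helper, so that a missing attribute simply comes back as an empty string. This is only a sketch: AttrValue and ListMetaTags are made-up names, and it assumes that attribute values are always double-quoted and that no content value contains a ">" character.

```vb
'Hypothetical helper: returns the value of attribute sAttr inside one tag,
'or "" when the attribute is absent. Assumes double-quoted values.
Private Function AttrValue(ByVal sTag As String, ByVal sAttr As String) As String
    Dim pStart As Long, pEnd As Long
    'the leading space keeps the attribute name from matching inside another word
    pStart = InStr(1, sTag, " " & sAttr & "=""", vbTextCompare)
    If pStart = 0 Then Exit Function
    'skip the space, the attribute name, the = sign and the opening quote
    pStart = pStart + Len(sAttr) + 3
    pEnd = InStr(pStart, sTag, """")
    If pEnd > 0 Then AttrValue = Mid$(sTag, pStart, pEnd - pStart)
End Function

'Hypothetical driver: walks the page text tag by tag and prints what it finds.
Private Sub ListMetaTags(ByVal sHtml As String)
    Dim pTag As Long, pClose As Long, sTag As String
    pTag = InStr(1, sHtml, "<meta", vbTextCompare)
    Do While pTag > 0
        'assumption: the first ">" after "<meta" closes the tag
        pClose = InStr(pTag, sHtml, ">")
        If pClose = 0 Then Exit Do
        sTag = Mid$(sHtml, pTag, pClose - pTag + 1)
        Debug.Print AttrValue(sTag, "name"), AttrValue(sTag, "lang"), AttrValue(sTag, "content")
        pTag = InStr(pClose, sHtml, "<meta", vbTextCompare)
    Loop
End Sub
```

Calling ListMetaTags Page from Command1_Click would list every meta tag in the downloaded page. Tags without a lang attribute print an empty second column.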
 
LVL 2

Author Comment

by:x_terminat_or_3
ID: 12323976
Hi guys

I think a little clarification is in order.

The only things that interest us are the following meta tags:

<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">

The three tags above can be present more than once, with different values for the lang parameter (and for the content parameter as well).

<meta name="xlevel" content="xx">


<meta name="xthreadid" content="xx"> this tag is optional (may not be present)

After some consideration about the storage:
kindly load these into the following structure

type xProps
  lang as string
  title as string
  description as string
  keywords as string
end type

type xMetas
  docprops() as xProps
  threadid as long
  level as long
end type


I'm sorry for all the work, but I'm not that good anymore with string parsing, because I need to do projects in many different languages (vb, html, javascript, asp, php,... and lately they decided to start teaching me assembly as well --- argh!)

If it's ok with you guys, the "input" is found in strBuffer
 
LVL 32

Accepted Solution

by:
Erick37 earned 1600 total points
ID: 12325425
It can be done using the MSHTML library.
Add a reference to the "Microsoft HTML Object Library"

'~~~~~~~~~~~~~~~~~~~~~~~
Private Type xMetaTags
    name As String
    lang As String
    content As String
End Type
'~~~~~~~~~~~~~~~~~~~~~~~~


'~~~~~~~~~~~~~~~~~~~~~~~~
Dim idoc As HTMLDocument
Dim odoc As New HTMLDocument
Dim mElement As HTMLMetaElement
Dim xp() As xMetaTags
Dim i As Long
Dim sName As String, sLang As String, sContent As String

'Create the HTML document from file
Set idoc = odoc.createDocumentFromUrl("c:\test.html", vbNullString)
Do While idoc.ReadyState <> "complete": DoEvents: Loop

'Prepare the array
ReDim xp(0)

'Loop the "meta" elements in the document
For Each mElement In idoc.all.tags("meta")
   
    Debug.Print mElement.outerHTML
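    'getAttribute may return Null for a missing attribute; & "" turns that into an empty string
    sName = LCase(mElement.getAttribute("name") & "")
    sLang = LCase(mElement.getAttribute("lang") & "")
    'keep the original case of the content value
    sContent = mElement.getAttribute("content") & ""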
   
    If (sName = "xdescription") Or _
        (sName = "xtitle") Or _
        (sName = "xkeywords") Or _
        (sName = "xlevel") Or _
        (sName = "xthreadid") Then
       
        'Expand the array
        ReDim Preserve xp(i)
       
        xp(i).name = sName
        xp(i).lang = sLang
        xp(i).content = sContent
       
        i = i + 1
    End If

Next
'~~~~~~~~~~~~~~~~~~~~~~~~~~

Hope it helps!
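
The loop above collects a flat list of name/lang/content triples. One possible way to file them into the xProps/xMetas structure asked for in the author's comment is sketched below. LangIndex and StoreMeta are made-up helper names. The sketch assumes the types and procedures live in the same module, and that xlevel and xthreadid hold plain numbers.

```vb
'Types as requested by the question author
Private Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Private Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

'Hypothetical helper: returns the index of the docprops entry for sLang,
'adding a new entry when that language has not been seen yet.
'nProps is the number of entries used so far and is updated in place.
Private Function LangIndex(m As xMetas, ByVal sLang As String, nProps As Long) As Long
    Dim k As Long
    For k = 0 To nProps - 1
        If m.docprops(k).lang = sLang Then
            LangIndex = k
            Exit Function
        End If
    Next k
    ReDim Preserve m.docprops(nProps)
    m.docprops(nProps).lang = sLang
    LangIndex = nProps
    nProps = nProps + 1
End Function

'Hypothetical helper: files one name/lang/content triple into the structure.
Private Sub StoreMeta(m As xMetas, nProps As Long, ByVal sName As String, _
                      ByVal sLang As String, ByVal sContent As String)
    Dim idx As Long
    Select Case LCase$(sName)
        Case "xtitle"
            idx = LangIndex(m, sLang, nProps)
            m.docprops(idx).title = sContent
        Case "xdescription"
            idx = LangIndex(m, sLang, nProps)
            m.docprops(idx).description = sContent
        Case "xkeywords"
            idx = LangIndex(m, sLang, nProps)
            m.docprops(idx).keywords = sContent
        Case "xlevel"
            'assumption: the content is numeric; Val gives 0 otherwise
            m.level = CLng(Val(sContent))
        Case "xthreadid"
            m.threadid = CLng(Val(sContent))
    End Select
End Sub
```

After the For Each loop has finished, the collected entries could be handed over with something like this (metas, n and j are new local variables):

```vb
Dim metas As xMetas, n As Long, j As Long
For j = 0 To i - 1
    StoreMeta metas, n, xp(j).name, xp(j).lang, xp(j).content
Next j
```

n then holds the number of distinct languages found, and metas.docprops(0) to metas.docprops(n - 1) hold one entry per language.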