x_terminat_or_3
Parsing problem
Hi all
I'm facing a problem. This is what I need to do:
In HTML documents there are the following tags present:
<meta name="xdescription" lang="xx" content="xxxxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxxx">
<meta name="xlevel" content="xxx">
<meta name="xthreadid" content="xxxx">
Multiple entries are possible, but with different values for the lang parameter.
I need to be able to read these entries and store them in memory for easy processing.
Any ideas for the parser part & how to store them in memory?
A complete solution is worth 1,500 points (A grade).
Good luck
With kind regards
x_terminat_or_3
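For the parser part, one option (not taken from the thread) is the VBScript regular expression engine, which a VB6 program can create by late binding with `CreateObject("VBScript.RegExp")`, so no project reference is needed. The sketch below assumes attribute values are always double-quoted, as in the samples above; `GetMetaAttr` and `ListMetaTags` are hypothetical names.

```vb
Option Explicit
' Sketch only: GetMetaAttr and ListMetaTags are hypothetical names.
' VBScript.RegExp is created by late binding, so no project reference is needed.

' Returns the value of attrName="..." inside one tag, or "" if the attribute is absent.
' attrName becomes part of the regex pattern, so pass plain names such as "name" or "lang".
Public Function GetMetaAttr(ByVal tag As String, ByVal attrName As String) As String
    Dim re As Object, found As Object
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' \b stops "name" from matching inside a longer word; group 1 captures the quoted value
    re.Pattern = "\b" & attrName & "\s*=\s*""([^""]*)"""
    Set found = re.Execute(tag)
    If found.Count > 0 Then GetMetaAttr = found.Item(0).SubMatches(0)
End Function

' Prints name, lang and content of every <meta ...> tag in strBuffer to the Immediate window.
Public Sub ListMetaTags(ByVal strBuffer As String)
    Dim re As Object, tags As Object, t As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                       ' return every match, not just the first
    re.IgnoreCase = True
    re.Pattern = "<meta\b[^>]*>"           ' one match per meta tag
    Set tags = re.Execute(strBuffer)
    For Each t In tags
        Debug.Print GetMetaAttr(t.Value, "name"), GetMetaAttr(t.Value, "lang"), GetMetaAttr(t.Value, "content")
    Next t
End Sub
```

Because the negated class `[^>]*` also matches line breaks, a tag that is split over several lines is still returned as a single match, which a line-by-line reader would miss.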
ASKER
Hi guys
I think a little clarification is in order.
The only things that interest us are the following meta tags:
<meta name="xdescription" lang="xx" content="xxxx">
<meta name="xtitle" lang="xx" content="xxxx">
<meta name="xkeywords" lang="xx" content="xxxx">
The three tags above can be present more than once, with different values for the lang parameter (and for the content parameter as well).
<meta name="xlevel" content="xx">
<meta name="xthreadid" content="xx">  (this tag is optional and may not be present)
After consideration, for the storage, kindly load these into the following structure:
Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type
Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type
I'm sorry for all the work, but I'm not that good with string parsing anymore, because I have to do projects in many different languages (VB, HTML, JavaScript, ASP, PHP, ... and lately they decided to start teaching me assembly as well. Argh!)
If it's OK with you guys, the input is found in strBuffer.
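For the xMetas structure requested above, a possible sketch (not taken from the thread) for a standard module (.bas) is shown below. `ParseMetas`, `StoreMeta` and `AttrValue` are hypothetical names. It assumes attributes are double-quoted and separated by whitespace, that no attribute value contains a `>`, and that a tag without a `lang` attribute is filed under an empty `lang`; `xlevel` and `xthreadid` are converted with `Val`, so a non-numeric value becomes 0.

```vb
Option Explicit
' Sketch for a standard module (.bas). ParseMetas, StoreMeta and AttrValue are hypothetical names.
' Types as given in the question.
Type xProps
    lang As String
    title As String
    description As String
    keywords As String
End Type

Type xMetas
    docprops() As xProps
    threadid As Long
    level As Long
End Type

' Returns the value of attrName="..." inside one tag, or "" if it is absent.
Private Function AttrValue(ByVal tag As String, ByVal attrName As String) As String
    Dim p As Long, q As Long
    ' treat line breaks and tabs inside the tag as plain spaces
    tag = Replace(Replace(Replace(tag, vbCr, " "), vbLf, " "), vbTab, " ")
    p = InStr(1, tag, " " & attrName & "=""", vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(attrName) + 3                ' skip the space, the name, "=" and the opening quote
    q = InStr(p, tag, """")
    If q > p Then AttrValue = Mid$(tag, p, q - p)
End Function

' Files one tag into metas; tags with the same lang share one docprops entry.
Private Sub StoreMeta(metas As xMetas, ByVal tagName As String, ByVal tagLang As String, ByVal tagContent As String)
    Dim i As Long, n As Long, idx As Long
    Select Case LCase$(tagName)
        Case "xlevel": metas.level = Val(tagContent)
        Case "xthreadid": metas.threadid = Val(tagContent)
        Case "xtitle", "xdescription", "xkeywords"
            n = -1
            On Error Resume Next             ' UBound fails while docprops is still empty
            n = UBound(metas.docprops)
            On Error GoTo 0
            idx = -1
            For i = 0 To n
                If StrComp(metas.docprops(i).lang, tagLang, vbTextCompare) = 0 Then idx = i: Exit For
            Next i
            If idx = -1 Then                 ' first tag for this language: add an entry
                idx = n + 1
                ReDim Preserve metas.docprops(idx)
                metas.docprops(idx).lang = tagLang
            End If
            Select Case LCase$(tagName)
                Case "xtitle": metas.docprops(idx).title = tagContent
                Case "xdescription": metas.docprops(idx).description = tagContent
                Case "xkeywords": metas.docprops(idx).keywords = tagContent
            End Select
    End Select
End Sub

' Scans strBuffer for <meta ...> tags and fills metas from scratch.
Public Sub ParseMetas(ByVal strBuffer As String, metas As xMetas)
    Dim p As Long, q As Long, tag As String
    Erase metas.docprops
    metas.threadid = 0
    metas.level = 0
    p = InStr(1, strBuffer, "<meta", vbTextCompare)
    Do While p > 0
        q = InStr(p, strBuffer, ">")
        If q = 0 Then Exit Do                ' unterminated tag: stop scanning
        tag = Mid$(strBuffer, p, q - p + 1)
        StoreMeta metas, AttrValue(tag, "name"), AttrValue(tag, "lang"), AttrValue(tag, "content")
        p = InStr(q + 1, strBuffer, "<meta", vbTextCompare)
    Loop
End Sub
```

Usage would be `Dim metas As xMetas` followed by `ParseMetas strBuffer, metas`. Grouping by `lang` means each language ends up as one `docprops` entry holding its title, description and keywords, which matches the xProps layout.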
Option Explicit
Private Type parseHtml
    strName() As String
    strLang() As String
    strContent() As String
End Type
Dim MetaTags As parseHtml
'to call the sub
Private Sub Command1_Click()
    Call ParseHtmlFile("c:\My Documents\test.html")
End Sub
'sub to parse and store in variables
'The variables are ARRAYS as in the type above:
'MetaTags.strName()
'MetaTags.strLang()
'MetaTags.strContent()
Private Sub ParseHtmlFile(ByVal sbFile As String)
    Dim htmLine As String, lineSplit() As String, fracSplit() As String, attrName As String
    Dim cLoop As Integer, cCount As Integer, fNum As Integer
    fNum = FreeFile                          'ask VB for a free file number instead of hard-coding #1
    Open sbFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, htmLine
        'cut the line into attribute chunks at every closing quote followed by a space
        lineSplit = Split(htmLine, Chr(34) & " ", -1)
        'assumes one <meta> tag per line, as in the example
        If UBound(lineSplit) > 0 And InStr(1, htmLine, "<meta", vbTextCompare) > 0 Then
            'grow the three arrays together so index cCount always belongs to the same tag
            ReDim Preserve MetaTags.strName(cCount)
            ReDim Preserve MetaTags.strLang(cCount)
            ReDim Preserve MetaTags.strContent(cCount)
            For cLoop = 0 To UBound(lineSplit)
                fracSplit = Split(lineSplit(cLoop), Chr(34), -1)
                If UBound(fracSplit) > 0 Then
                    'the attribute name is the last word before the quote: "name=", "lang=" or "content="
                    attrName = LCase$(Mid$(fracSplit(0), InStrRev(fracSplit(0), " ") + 1))
                    Select Case attrName
                        Case "name=": MetaTags.strName(cCount) = fracSplit(1)
                        Case "lang=": MetaTags.strLang(cCount) = fracSplit(1)
                        Case "content=": MetaTags.strContent(cCount) = fracSplit(1)
                    End Select
                End If
            Next cLoop
            cCount = cCount + 1                  'count tags, not lines
        End If
    Loop
    Close #fNum
End Sub
'Just for testing, you can place three list boxes on your form (List1, List2 and List3) and use the code below instead of the Command1_Click code above:
Private Sub Command1_Click()
    Dim c As Integer
    Call ParseHtmlFile("c:\My Documents\test.html")
    For c = 0 To UBound(MetaTags.strName)
        List1.AddItem MetaTags.strName(c)
    Next c
    For c = 0 To UBound(MetaTags.strLang)
        List2.AddItem MetaTags.strLang(c)
    Next c
    For c = 0 To UBound(MetaTags.strContent)
        List3.AddItem MetaTags.strContent(c)
    Next c
End Sub