Solved

HTML Post Parsing with VB

Posted on 2002-06-21
Last Modified: 2010-05-02
OK, I am really stumped by VB's strange regex support and need some help with post-processing HTML docs before indexing them. I can do what I want in C in minutes, but I like to keep everything in VB, so here is what I need.

I need to read HTML, resolve the unresolved (relative) URLs, and ignore certain scripts within it, such as openWindow commands: <SCRIPT>....openWindow.....</SCRIPT>.

Hope someone has some code or knows where I can get a parser module to do this.

0
Comment
Question by:ohmeohmy
10 Comments
 
LVL 1

Expert Comment

by:Benjy
ID: 7098274
listening
0
 
LVL 4

Expert Comment

by:gencross
ID: 7098506
You want to catch web pages before they are processed by the browser, scrub them, and then send them to the browser?
0
 

Author Comment

by:ohmeohmy
ID: 7098615
gencross, Sure, whatever works to clean them up...
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7099111
If you already have the code in C, maybe we could translate it...
0
 
LVL 38

Expert Comment

by:PaulHews
ID: 7099247
Look up the InStr (also InStrRev) function in the docs.  It returns the position of a substring within a larger string.  Once you have the position and the length (usually from subtracting two positions), extract the substring using the Mid$ function.

Since you want to clean the code of these substrings, you can then use the Replace function to replace the substrings with empty strings.
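
A rough sketch of that find/extract/remove loop, written in Python only because it is short to show. str.find stands in for InStr and slicing stands in for Mid$. Cutting the span out by position has the same effect as calling Replace with an empty string. The function name remove_between and the sample markers are made up for illustration, and the match is case-sensitive, so in VB you would probably pass vbTextCompare to InStr.

```python
def remove_between(text, start_marker, end_marker):
    """Remove every span running from start_marker through end_marker."""
    pos = 0
    while True:
        # Like InStr(pos, text, start_marker): position of the next opening marker.
        start = text.find(start_marker, pos)
        if start == -1:
            break
        # Look for the closing marker only after the opening one.
        end = text.find(end_marker, start + len(start_marker))
        if end == -1:
            break  # unterminated span: leave the rest of the text untouched
        # Like Mid$ on either side of the span: keep what is before and after it.
        text = text[:start] + text[end + len(end_marker):]
        pos = start  # continue scanning from where the span used to be
    return text


cleaned = remove_between("a<script>openWindow()</script>b", "<script>", "</script>")
```

This removes every marked span, so a filter on the span's contents (for example, only spans that mention openWindow) would still have to be added before the cut.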

0
 
LVL 9

Expert Comment

by:GivenRandy
ID: 7099364
0
 

Author Comment

by:ohmeohmy
ID: 7100090
Well, translating the regex to VB is a good suggestion, I reckon. In C/Perl you can quickly resolve the URLs in HTML by simply doing something like this:

# Read the file contents into one string
open(IN, "microsoft.htm") or die "Cannot open microsoft.htm: $!";
while (<IN>) {
    $html .= $_;  # .= appends each line of the file
}
close IN;

# Use this base URL
$BASE_URL = "http://microsoft.com";

# Do a case-insensitive global search/replace
# This resolves href="/ or src="/ or whatever else starts with "/
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known forms of relative URLs in HTML, like ones lacking double quotes or slashes and whatnot. All very easy, and it requires little code. Ya can also do the same for removing certain scripts, because you can combine matching with substitution. How I always do that: first strip all line breaks to make one big continuous line, then look between <script> and </script>, and if I get a match on something undesirable I replace it with nothing to remove it.
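
The two steps above (prefix root-relative URLs with the base, then drop script blocks that mention something undesirable) could be sketched roughly like this. It is Python rather than Perl or VB, purely as an illustration of the logic. The function names and the default "openWindow" filter are assumptions, not anything from a library.

```python
import re

BASE_URL = "http://microsoft.com"  # same base URL as in the Perl snippet


def resolve_root_relative(html, base_url=BASE_URL):
    # Same idea as s!"/!"$BASE_URL/!g : every  "/  becomes  "<base>/
    # Like the Perl one-liner, this also rewrites protocol-relative "//host" URLs.
    return html.replace('"/', '"' + base_url + '/')


# One <script>...</script> block. DOTALL lets "." cross line breaks, so the
# line breaks do not have to be stripped out beforehand.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def drop_scripts_containing(html, unwanted="openWindow"):
    def keep_or_drop(match):
        block = match.group(0)
        # Replace the block with nothing only when it mentions the unwanted text.
        return "" if unwanted.lower() in block.lower() else block
    return SCRIPT_BLOCK.sub(keep_or_drop, html)


page = '<a href="/a.htm">x</a><SCRIPT>\nopenWindow("p")\n</SCRIPT><script>ok()</script>'
result = drop_scripts_containing(resolve_root_relative(page))
```

Scripts that do not mention the unwanted text are left in place, which matches the "replace with nothing only on a match" idea.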

How would ya do the same thing in VB?




0
 

LVL 49

Expert Comment

by:DanRollins
ID: 7923733
Hi ohmeohmy,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Refund points and save as a 0-pt PAQ.

ohmeohmy, Please DO NOT accept this comment as an answer.
EXPERTS: Post a comment if you are certain that an expert deserves credit.  Explain why.
==========
DanRollins -- EE database cleanup volunteer
0
 
LVL 1

Accepted Solution

by:
Computer101 earned 0 total points
ID: 7929877
Points refunded and placed in PAQ

Computer101
E-E Admin
0
