Solved: HTML Post Parsing with VB

Benjy

listening

gencross

You want to catch webpages before they are processed by browser, scrub them, and then send them to the browser?

James

ASKER

gencross, Sure, whatever works to clean them up...

Richie_Simonetti

If you already have the code in C maybe we could translate it...

PaulHews

Look up the InStr (also InStrRev) function in the docs. It shows the position of a substring within a larger string. Then when you have the position and length (usually from subtracting two positions), extract the substring using mid$ function.

Since you want to clean the code of these substrings, then you can use the replace function to replace the substrings with empty strings.

GivenRandy

Using RegEx will get you there a lot quicker (try with VB.NET). Have you tried these links:

http://www.amazon.com/exec/obidos/ASIN/B0000632ZU/randygivennancyg/104-6125570-3694356

http://www.codeguru.com/columns/VB/PK041702.html

http://directory.google.com/Top/Computers/Programming/Languages/Regular_Expessions/

http://www.aivosto.com/regexpr.html

James

ASKER

Well this is a good suggestion to translate regex to VB I reckon. In C/perl you can quickly resolve HTML by simply doing something like this:

# Let's put a file contents into a string
open (IN, "microsoft.htm");
while (<IN>){
$html .= $_; # this apends each line of file .=
}
close IN;

# Use this base URL
$BASE_URL = "http://microsoft.com";

# do a case insensitive global search/replace
# This resolves href="/ or src="/or whatever else
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known instances of realtive urls in HTML like lack of double quotes/slashes and what not. All very easy and requires little code. Ya can also do the same for removing certain scripts because you can do matching with substitution. How I always do that is first strip all line breaks to make on big continous line then I would just look between <script> </script> and if I get a match of something undesirable I replace with nothing to remove it.

How would ya do the same above in VB??

James

ASKER

Well this is a good suggestion to translate regex to VB I reckon. In C/perl you can quickly resolve HTML by simply doing something like this:

# Let's put a file contents into a string
open (IN, "microsoft.htm");
while (<IN>){
$html .= $_; # this apends each line of file .=
}
close IN;

# Use this base URL
$BASE_URL = "http://microsoft.com";

# do a case insensitive global search/replace
# This resolves href="/ or src="/or whatever else
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known instances of realtive urls in HTML like lack of double quotes/slashes and what not. All very easy and requires little code. Ya can also do the same for removing certain scripts because you can do matching with substitution. How I always do that is first strip all line breaks to make on big continous line then I would just look between <script> </script> and if I get a match of something undesirable I replace with nothing to remove it.

How would ya do the same above in VB??

DanRollins

Hi ohmeohmy,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

Refund points and save as a 0-pt PAQ.

ohmeohmy, Please DO NOT accept this comment as an answer.
EXPERTS: Post a comment if you are certain that an expert deserves credit. Explain why.
==========
DanRollins -- EE database cleanup volunteer