Solved

HTML Post Parsing with VB

Posted on 2002-06-21
10
199 Views
Last Modified: 2010-05-02
Ok I am really blinded by VB's strange regex and need some HELP with post processing HTML docs before indexing em. I can do what I want in C in minutes but I like to keep everything VB.......so here what I need....

I need to read HTML and resolve the unresolved urls and ignore certain scripts within like openWindow commands <SCRIPT>....openWindow.....</SCRPIT>.

Hope someone has some code or know where I can get a parser MOD to do this.

0
Comment
Question by:ohmeohmy
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
10 Comments
 
LVL 1

Expert Comment

by:Benjy
ID: 7098274
listening
0
 
LVL 4

Expert Comment

by:gencross
ID: 7098506
You want to catch webpages before they are processed by browser, scrub them, and then send them to the browser?
0
 

Author Comment

by:ohmeohmy
ID: 7098615
gencross, Sure, whatever works to clean them up...
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7099111
If you already have the code in C maybe we could translate it...
0
 
LVL 38

Expert Comment

by:PaulHews
ID: 7099247
Look up the InStr (also InStrRev) function in the docs.  It shows the position of a substring within a larger string.  Then when you have the position and length (usually from subtracting two positions), extract the substring using mid$ function.

Since you want to clean the code of these substrings, then you can use the replace function to replace the substrings with empty strings.

0
 
LVL 9

Expert Comment

by:GivenRandy
ID: 7099364
0
 

Author Comment

by:ohmeohmy
ID: 7100090
Well this is a good suggestion to translate regex to VB I reckon. In C/perl you can quickly resolve HTML by simply doing something like this:

# Let's put a file contents into a string
open (IN, "microsoft.htm");
while (<IN>){
$html .= $_; # this apends each line of file .=
}
close IN;

# Use this base URL
$BASE_URL = "http://microsoft.com";

# do a case insensitive global search/replace
# This resolves href="/ or src="/or whatever else
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known instances of realtive urls in HTML like lack of double quotes/slashes and what not. All very easy and requires little code. Ya can also do the same for removing certain scripts because you can do matching with substitution. How I always do that is first strip all line breaks to make on big continous line then I would just look between <script> </script> and if I get a match of something undesirable I replace with nothing to remove it.

How would ya do the same above in VB??




0
 

Author Comment

by:ohmeohmy
ID: 7100106
Well this is a good suggestion to translate regex to VB I reckon. In C/perl you can quickly resolve HTML by simply doing something like this:

# Let's put a file contents into a string
open (IN, "microsoft.htm");
while (<IN>){
$html .= $_; # this apends each line of file .=
}
close IN;

# Use this base URL
$BASE_URL = "http://microsoft.com";

# do a case insensitive global search/replace
# This resolves href="/ or src="/or whatever else
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known instances of realtive urls in HTML like lack of double quotes/slashes and what not. All very easy and requires little code. Ya can also do the same for removing certain scripts because you can do matching with substitution. How I always do that is first strip all line breaks to make on big continous line then I would just look between <script> </script> and if I get a match of something undesirable I replace with nothing to remove it.

How would ya do the same above in VB??




0
 
LVL 49

Expert Comment

by:DanRollins
ID: 7923733
Hi ohmeohmy,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Refund points and save as a 0-pt PAQ.

ohmeohmy, Please DO NOT accept this comment as an answer.
EXPERTS: Post a comment if you are certain that an expert deserves credit.  Explain why.
==========
DanRollins -- EE database cleanup volunteer
0
 
LVL 1

Accepted Solution

by:
Computer101 earned 0 total points
ID: 7929877
Points refunded and placed in PAQ

Computer101
E-E Admin
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
This article describes some techniques which will make your VBA or Visual Basic Classic code easier to understand and maintain, whether by you, your replacement, or another Experts-Exchange expert.
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…

740 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question