HTML Post Parsing with VB

Posted on 2002-06-21
Last Modified: 2010-05-02
OK, I am really stumped by VB's strange regex support and need some help post-processing HTML docs before indexing them. I can do what I want in C in minutes, but I'd like to keep everything in VB, so here's what I need:

I need to read HTML, resolve the unresolved (relative) URLs, and ignore certain scripts within it, such as openWindow commands: <SCRIPT>....openWindow.....</SCRIPT>.

Hope someone has some code or knows where I can get a parser module to do this.

Question by:James
10 Comments
 
LVL 1

Expert Comment

by:Benjy
ID: 7098274
listening
 
LVL 4

Expert Comment

by:gencross
ID: 7098506
You want to catch web pages before they are processed by the browser, scrub them, and then send them on to the browser?
 

Author Comment

by:James
ID: 7098615
gencross, sure, whatever works to clean them up...
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 7099111
If you already have the code in C, maybe we could translate it...
 
LVL 38

Expert Comment

by:PaulHews
ID: 7099247
Look up the InStr (and InStrRev) function in the docs. It returns the position of a substring within a larger string. Once you have a position and a length (usually from subtracting two positions), extract the substring using the Mid$ function.

Since you want to clean the HTML of these substrings, you can then use the Replace function to replace them with empty strings.
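
A minimal VB6 sketch of that approach, for illustration only: it finds each <script>...</script> block with InStr, extracts it with Mid$, and drops the block if it contains a marker such as openWindow. RemoveScriptsContaining is a hypothetical helper name, not a built-in. The sketch splices the string with Left$ and Mid$ rather than calling Replace, on the assumption that only the one block just found should be removed; Replace(html, block, "") would remove every identical copy at once.

```vb
' Remove every <script>...</script> block that contains the given marker.
' RemoveScriptsContaining is a hypothetical helper, not part of VB.
Public Function RemoveScriptsContaining(ByVal html As String, _
                                        ByVal marker As String) As String
    Dim startPos As Long, endPos As Long, blockLen As Long
    Dim block As String

    ' vbTextCompare makes every search case-insensitive (<SCRIPT>, <script>, ...)
    startPos = InStr(1, html, "<script", vbTextCompare)
    Do While startPos > 0
        endPos = InStr(startPos, html, "</script>", vbTextCompare)
        If endPos = 0 Then Exit Do          ' unterminated block: leave the rest alone

        ' Length comes from subtracting the two positions
        blockLen = endPos + Len("</script>") - startPos
        block = Mid$(html, startPos, blockLen)

        If InStr(1, block, marker, vbTextCompare) > 0 Then
            ' Cut the block out; the next search resumes at the same position
            html = Left$(html, startPos - 1) & Mid$(html, startPos + blockLen)
        Else
            ' Keep this block and continue searching after it
            startPos = startPos + blockLen
        End If

        startPos = InStr(startPos, html, "<script", vbTextCompare)
    Loop

    RemoveScriptsContaining = html
End Function
```

Calling RemoveScriptsContaining(html, "openWindow") should leave scripts that do not mention the marker untouched.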

 

Author Comment

by:James
ID: 7100090
Well, translating the regex to VB is a good suggestion, I reckon. In C/Perl you can quickly resolve the URLs in HTML by simply doing something like this:

use strict;
use warnings;

# Let's put the file contents into a string
open(my $in, '<', 'microsoft.htm') or die "Cannot open microsoft.htm: $!";
my $html = '';
while (<$in>) {
    $html .= $_;    # .= appends each line of the file
}
close $in;

# Use this base URL
my $BASE_URL = "http://microsoft.com";

# Do a case-insensitive global search/replace.
# This resolves href="/ or src="/ or whatever else starts with "/
$html =~ s!"/!"$BASE_URL/!gi;
print $html;
--------cut---------

Ya can repeat the substitution for other known forms of relative URLs in HTML, like ones missing the double quotes or the leading slash and whatnot. All very easy, and it requires little code. Ya can also do the same for removing certain scripts, because you can combine matching with substitution. How I always do that is to first strip all line breaks to make one big continuous line, then look between <script> and </script>, and if I get a match on something undesirable I replace it with nothing to remove it.

How would ya do the same above in VB??
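
One possible VB6 translation, as a sketch under stated assumptions rather than a definitive answer. The built-in Replace function with vbTextCompare stands in for the s!...!...!gi substitution, and the late-bound VBScript.RegExp object (assumed to be installed, as it normally is with Windows Script) handles the script matching. ReadWholeFile, ResolveRootRelative and StripScriptsMatching are hypothetical helper names introduced here for illustration.

```vb
' Read an entire file into one string (equivalent of the Perl while loop).
Public Function ReadWholeFile(ByVal path As String) As String
    Dim f As Integer, buf As String
    f = FreeFile
    Open path For Binary Access Read As #f
    buf = Space$(LOF(f))        ' size the buffer to the file length
    Get #f, , buf               ' fill it in one read
    Close #f
    ReadWholeFile = buf
End Function

' Prefix every "/ with "<baseUrl>/ , like s!"/!"$BASE_URL/!gi in Perl.
Public Function ResolveRootRelative(ByVal html As String, _
                                    ByVal baseUrl As String) As String
    ' """/" is the two-character string "/ ; -1 means replace all occurrences
    ResolveRootRelative = Replace(html, """/", """" & baseUrl & "/", _
                                  1, -1, vbTextCompare)
End Function

' Remove each <script>...</script> block whose text contains marker.
Public Function StripScriptsMatching(ByVal html As String, _
                                     ByVal marker As String) As String
    Dim re As Object, matches As Object
    Dim i As Long, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' [\s\S] matches any character including line breaks, so the text does
    ' not need to be flattened to one line first; *? is a non-greedy match
    re.Pattern = "<script\b[^>]*>[\s\S]*?</script>"

    Set matches = re.Execute(html)
    ' Walk backwards so earlier FirstIndex values stay valid after each cut
    For i = matches.Count - 1 To 0 Step -1
        Set m = matches.Item(i)
        If InStr(1, m.Value, marker, vbTextCompare) > 0 Then
            ' FirstIndex is 0-based; Mid$ positions are 1-based
            html = Left$(html, m.FirstIndex) & _
                   Mid$(html, m.FirstIndex + m.Length + 1)
        End If
    Next i

    StripScriptsMatching = html
End Function

Public Sub Main()
    Dim html As String
    html = ReadWholeFile("microsoft.htm")
    html = ResolveRootRelative(html, "http://microsoft.com")
    html = StripScriptsMatching(html, "openWindow")
    Debug.Print html
End Sub
```

Like the Perl one-liner, ResolveRootRelative rewrites every "/ it finds, so other relative forms (unquoted attributes, paths without a leading slash) would each need their own substitution.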




 
LVL 49

Expert Comment

by:DanRollins
ID: 7923733
Hi ohmeohmy,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Refund points and save as a 0-pt PAQ.

ohmeohmy, Please DO NOT accept this comment as an answer.
EXPERTS: Post a comment if you are certain that an expert deserves credit.  Explain why.
==========
DanRollins -- EE database cleanup volunteer
 
LVL 1

Accepted Solution

by:
Computer101 earned 0 total points
ID: 7929877
Points refunded and placed in PAQ

Computer101
E-E Admin