Solved

Two Easy Qs About Retrieving Website Source and Modifying Strings

Posted on 2003-02-27
Last Modified: 2010-04-07
What would the command be to access and store the source code from a website into a string?
What is the function to search a string and replace certain parts?
For the second question, a link to a website is fine.
Question by:magglass1
20 Comments
 
LVL 101

Expert Comment

by:mlmcc
ID: 8035063
Replace (stringtosearch,stringtofind,stringtoreplace)

returns a string with the replacements

mystring = Replace(mystring,"xyz","abc")

replaces all occurrences of xyz with abc

mlmcc
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 8035069
Open the page
View --> SOURCE
Save

Then open the file in VB

mlmcc
0
 
LVL 1

Author Comment

by:magglass1
ID: 8035111
I want it to be dynamic so that if a variable = "http://www.experts-exchange.com", then it will load the source for that website into a string variable.
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8035180
To get the code of a webpage as a string, put an Internet Transfer Control (Inet1) on a form and add this code:

myHTML = Inet1.OpenURL("http://www.wherever.com")

The myHTML variable then contains the source as a string.

If you want to change it, you can use the Replace function as above, and then write the string into an instance of Explorer or a WebBrowser control. This, however, is very heavy-handed. For anything even slightly advanced, you are much better off learning about the DOM (Document Object Model) of an HTML document so that you can manipulate all parts of a webpage as you please.

Kindest regards,
Rhaedes
0
 
LVL 1

Author Comment

by:magglass1
ID: 8035347
What text do I put in a TextBox to get a line break?
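
For reference, a minimal sketch of one answer. In VB6 a line break in a TextBox is the carriage return plus line feed pair, available as the built-in constant vbCrLf. The control name Text1 is an assumption, and the TextBox needs MultiLine set to True at design time.

```vb
' Assumes a TextBox named Text1 with MultiLine = True (set at design time).
Private Sub Form_Load()
    ' vbCrLf is Chr$(13) & Chr$(10), the pair a Windows text box expects.
    Text1.Text = "First line" & vbCrLf & "Second line"
End Sub
```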
0
 
LVL 14

Expert Comment

by:aelatik
ID: 8035355
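Here's a sample you can use without adding a reference:

Private Sub Form_Load()
    MsgBox GetHTMLCode("http://www.google.com")
End Sub

Public Function GetHTMLCode(URL As String) As String
    Dim IE As Object
    ' Late binding: no project reference to Internet Explorer is needed
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate URL
    ' Wait until the browser reports it is no longer busy
    While IE.Busy
        DoEvents
    Wend
    GetHTMLCode = IE.Document.body.innerHTML
    ' Close the hidden browser instance so it does not linger in memory
    IE.Quit
    Set IE = Nothing
End Function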
0
 
LVL 1

Author Comment

by:magglass1
ID: 8035403
I am currently using

GetHTML.Enabled = False
myHTML = Inet1.OpenURL(PageURL.Text)
Text1.Text = myHTML
GetHTML.Enabled = True

and it works just how I want.  Now I just need to change some things in the string full of HTML.  Also, how would I copy a file such as http://www.google.com/Afile.wav to a location in the current App directory?
0
 
LVL 1

Author Comment

by:magglass1
ID: 8035449
Can this be done using FileCopy() and specifying a URL as the source file?
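
FileCopy works on file system paths, so an http:// URL will not work as its source. One possible approach, sketched below, is to ask the Internet Transfer Control for raw bytes and write them to disk. It assumes the Inet1 control already used in this thread. SaveUrlToFile is a hypothetical helper name, and there is no error handling for failed downloads.

```vb
' Assumes an Internet Transfer Control named Inet1 on the form.
' SaveUrlToFile is a hypothetical helper name.
Private Sub SaveUrlToFile(ByVal sUrl As String, ByVal sLocalPath As String)
    Dim bData() As Byte
    Dim iFile As Integer

    ' icByteArray asks OpenURL for raw bytes instead of text,
    ' so binary files such as .wav are not mangled.
    bData = Inet1.OpenURL(sUrl, icByteArray)

    ' Delete any old copy first; Put would otherwise overwrite
    ' only the leading bytes of a longer existing file.
    If Dir$(sLocalPath) <> "" Then Kill sLocalPath

    iFile = FreeFile
    Open sLocalPath For Binary Access Write As #iFile
    Put #iFile, , bData
    Close #iFile
End Sub
```

It could be called as SaveUrlToFile "http://www.google.com/Afile.wav", App.Path & "\Afile.wav". Note that App.Path already ends in a backslash when the program runs from a drive root.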
0
 
LVL 16

Expert Comment

by:Richie_Simonetti
ID: 8035581
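A small modification to aelatik's code:
Private Sub Form_Load()
    MsgBox GetHTMLCode("http://www.google.com")
End Sub

Public Function GetHTMLCode(URL As String) As String
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Navigate URL
    ' 4 = READYSTATE_COMPLETE: wait until the document has fully loaded
    While IE.ReadyState <> 4
        DoEvents
    Wend
    ' documentElement includes the <head> section as well as the body
    GetHTMLCode = IE.Document.documentElement.innerHTML
    ' Close the hidden browser instance when done
    IE.Quit
    Set IE = Nothing
End Function

But I prefer the Inet control for this.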
0
 
LVL 1

Expert Comment

by:Computer101
ID: 8038959
Listening

Computer101
E-E Admin
0
 
LVL 1

Author Comment

by:magglass1
ID: 8041085
So that you understand specifically what I want, I will tell you what I am trying to do.  I also increased the points.

What I am attempting is to make a proxy-like script.  It will connect to the specified website and download the HTML of that page.  Then it will search through the HTML, find the files that are linked to (images, Flash animations, videos, other web pages, etc.), and download those files to the web server.  Next, all the links and source URLs in the HTML file will be changed to point to the copied files, which should keep the same path except for the first part (e.g. http://www.something.com/index.htm would change to http://www.webserver.com/folderwithfiles/index.htm).  The links in the HTML that lead to other pages will be modified so that they point back to the proxy script page.  This whole task is easier than it seems.  I just need a little help with some commands.
Thanks!
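
The link rewriting described above can be roughly sketched with the Replace function mentioned earlier in the thread. This is only a sketch: RewriteBase is a hypothetical helper name, and it handles only absolute URLs that start with the old base. Relative links would need real parsing.

```vb
' RewriteBase is a hypothetical helper name.
Public Function RewriteBase(ByVal sHtml As String, _
                            ByVal sOldBase As String, _
                            ByVal sNewBase As String) As String
    ' Start at character 1, replace every occurrence (-1),
    ' and ignore upper/lower case (vbTextCompare).
    RewriteBase = Replace(sHtml, sOldBase, sNewBase, 1, -1, vbTextCompare)
End Function
```

For example: myHTML = RewriteBase(myHTML, "http://www.something.com/", "http://www.webserver.com/folderwithfiles/")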
0
 
LVL 1

Author Comment

by:magglass1
ID: 8041126
The comment I just made is geared more towards the PHP section, but let me modify part of it for the VB program I am working on.  Instead of saving the files to the server, I want them saved into a directory with folders.  The program will basically copy the website to a certain link depth.  In that case, files within pages linked to by the main page will also need to be downloaded.
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8041234
Tell me if this concept meets your needs. If so, I can provide the code.

First you download the HTML page. If you do this using the Save As option (automated, of course), then this does most of the work for you, since it will download the webpage itself and all the images and dependent files as well. It will also update all the links to these in the main page so that everything works.
This much can be achieved using the code I've posted at
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/Q_20532410.html

Once this is done, all that is missing is to go through all the anchors in the downloaded HTML file, and
i) download the files that they point to (but not the images/dependencies?) and store them in the same folder as the other dependent files.
ii) change the anchors in the original downloaded HTML file so that they point to the files in your folder.

Is this what you need?
Kindest regards,
Rhaedes
0
 
LVL 1

Author Comment

by:magglass1
ID: 8042180
Thanks for the reply.  That is pretty much what I want.  The only difference between what the Save As option does and what I want to do involves the way linked files are stored.  I want a directory system constructed similar to that of the page being copied.  For example, if www.aol.com has an image stored in www.aol.com/images/pic.jpg, then pic.jpg will be stored in the folder www.aol.com and the subfolder images at the location you choose to save everything.  In theory, if you were to have it copy the whole website, you would end up with a file structure almost identical to that of the site's web server.

With using Save As, all other files are stored in the same folder.  This causes problems when you want to save multiple pages on the same site.  A new folder and index file are created for each page.  When you have 26 of these files and folders, things get a bit crowded and they don't link to each other.  With my method, the file system stays more organized and the files all link to each other.

I already read your code before I posted a link from there to here.  All it does is automate the task of doing Save As.  Unless I missed something...

Thanks in advance for your help!
0
 
LVL 5

Expert Comment

by:Rhaedes
ID: 8043599
Okay, I think I see what you need, which really amounts to a lot of href rewriting and careful directory structuring.
But let's go through this carefully. Say you have a top-level directory on your hard drive called 'C:\AllSites' in which you want to store everything.
Basically, within AllSites you need to create a subdirectory for every domain, which could usefully have the same name as the domain. When you download a file from, say, 'http://www.wherever.com/stuff/nonsense/index.htm', this should appear on your hard drive as 'C:\AllSites\www.wherever.com\stuff\nonsense\index.htm'. Now if the index.htm contains an image referenced absolutely as 'http://www.somewherelse.com/anImage.jpg', then the program should create a new subdirectory called 'C:\AllSites\www.somewherelse.com' and save the jpg there, and the href in the index.htm should be rewritten as a relative link (ideally) to the file on your hard drive.
In essence then, for every main page you download you need to parse all the relative and absolute hrefs, create directories where necessary, and then rewrite the hrefs to relatively reference files on your hard disk.
Also, as I understand it, you need to download all the non-dependent files in all the anchors (and scripts) in your main page, and rewrite their hrefs in a similar manner.
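
The URL-to-folder mapping described above could be sketched roughly as follows. UrlToLocalPath and EnsureFolder are hypothetical helper names, and the default file name index.htm is an assumption. The sketch does not handle query strings, port numbers, or characters that are illegal in Windows file names, and it expects the root to be a drive-letter path such as C:\AllSites.

```vb
' UrlToLocalPath and EnsureFolder are hypothetical helper names.
' Maps "http://host/dir/file" to sRoot & "\host\dir\file".
Public Function UrlToLocalPath(ByVal sUrl As String, _
                               ByVal sRoot As String) As String
    Dim lPos As Long

    ' Strip the scheme ("http://") if present.
    lPos = InStr(sUrl, "://")
    If lPos > 0 Then sUrl = Mid$(sUrl, lPos + 3)

    ' A URL ending in "/" or a bare host name has no file name;
    ' index.htm is assumed here.
    If Right$(sUrl, 1) = "/" Then
        sUrl = sUrl & "index.htm"
    ElseIf InStr(sUrl, "/") = 0 Then
        sUrl = sUrl & "/index.htm"
    End If

    UrlToLocalPath = sRoot & "\" & Replace(sUrl, "/", "\")
End Function

' Creates every missing folder leading up to the file in sFilePath.
' Assumes a drive-letter path such as "C:\AllSites\...".
Public Sub EnsureFolder(ByVal sFilePath As String)
    Dim lPos As Long
    Dim sDir As String

    lPos = InStr(4, sFilePath, "\")     ' start after the "C:\" prefix
    Do While lPos > 0
        sDir = Left$(sFilePath, lPos - 1)
        If Dir$(sDir, vbDirectory) = "" Then MkDir sDir
        lPos = InStr(lPos + 1, sFilePath, "\")
    Loop
End Sub
```

With the root 'C:\AllSites', the URL 'http://www.wherever.com/stuff/nonsense/index.htm' maps to 'C:\AllSites\www.wherever.com\stuff\nonsense\index.htm', matching the layout described above. Calling EnsureFolder on that path before saving creates the intermediate folders.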

Well, of course it can be done. But it is a fair bit of work, and to some degree it is re-inventing the wheel. What about using an off-line browser? Or perhaps simply setting up a webbrowser to navigate through the links in the site and storing everything you need in the cache?

Kindest regards,
Rhaedes
0
 
LVL 1

Author Comment

by:magglass1
ID: 8043795
What if you, for example, wanted to copy a website and store it on a CD?  You couldn't just have it stored in the cache.
0
 
LVL 5

Accepted Solution

by:
Rhaedes earned 105 total points
ID: 8043956
...Then what about using an off-line browser? There are many applications available that will already accomplish (most of?) what you need. HTTrack Website Copier (http://www.httrack.com/index.php), for example, is a freeware program that will let you download a whole website, which you can then store on a CD if needs be.
With the final comments of my last post, I was really just wondering if you really need to go to the trouble of building your own custom VB app to do it.

Kindest regards,
Rhaedes
0
 
LVL 1

Author Comment

by:magglass1
ID: 8043992
I will give that program a try...
0
 
LVL 1

Author Comment

by:magglass1
ID: 8048627
That program works well enough for me right now.  I have one more question (still related to the main question).

If I were making a program to send and receive data using Winsock, how would I put different parts into one string?  For example, if I had a user name, password, and message, how could I transmit them all at once to another computer and then have the receiving program divide that one string into three, with the user name, password, and message each in a different string?
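
One common way to do this, sketched here, is to glue the parts together with a separator character using Join and cut them apart again with Split. PackFields and UnpackFields are hypothetical helper names. The sketch assumes Chr$(1) never appears in the user name or password.

```vb
' PackFields and UnpackFields are hypothetical helper names.
' Assumption: Chr$(1) never occurs in the user name or password.
Public Function PackFields(ByVal sUser As String, _
                           ByVal sPass As String, _
                           ByVal sMsg As String) As String
    ' Join puts the separator between the array elements.
    PackFields = Join(Array(sUser, sPass, sMsg), Chr$(1))
End Function

Public Sub UnpackFields(ByVal sData As String, _
                        ByRef sUser As String, _
                        ByRef sPass As String, _
                        ByRef sMsg As String)
    Dim sParts() As String

    ' A limit of 3 keeps any stray separators inside the message part.
    sParts = Split(sData, Chr$(1), 3)
    If UBound(sParts) = 2 Then
        sUser = sParts(0)
        sPass = sParts(1)
        sMsg = sParts(2)
    End If
End Sub
```

With a Winsock control (the name Winsock1 is assumed), the sender would call Winsock1.SendData PackFields(user, pass, msg), and the receiver would read the string with Winsock1.GetData sData, vbString inside the DataArrival event before calling UnpackFields. Since TCP delivers a stream, one send can arrive in several pieces, so a real program would probably also need an end-of-message marker.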
0
