• Status: Solved

Download HTML to computer for processing

I am new to VB. I want to write a program that:

First, downloads some HTML pages from a certain site (the pages are generated by CGI) to a specified directory.

Then, opens each HTML file and processes it.

Finally, saves the list of processed data to a spreadsheet file and deletes the original HTML file.

Can you tell me how to do the download step and the open-file step?

Thanks
Asked by: leekf
 
bruintje Commented:
Hi leekf,

Is that spreadsheet Excel?

If so, there's an easier way.

:O)Bruintje
 
glass_cookie Commented:
Hi!

Maybe you'd like to try using the INet control to download pages onto your PC:

Download...
http://www.vb-helper.com/HowTo/inetgetchunk.zip
Description: Use the Internet Transfer Control's Execute method and GetChunk to download a file (3K)

That's it!

glass cookie : )
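
For readers who do not want to unpack the sample, here is a minimal sketch of the download step. It uses the control's simpler OpenURL method rather than the Execute/GetChunk pair the sample demonstrates. It assumes a form that hosts a Microsoft Internet Transfer Control named Inet1; the helper name DownloadPage and its parameters are illustrative and do not come from the thread.

```vb
' Assumes a form with a Microsoft Internet Transfer Control named Inet1.
' DownloadPage, strUrl and strPath are hypothetical names for illustration.
Private Sub DownloadPage(ByVal strUrl As String, ByVal strPath As String)
    Dim strHtml As String
    Dim f As Integer

    ' OpenURL blocks until the whole document has been retrieved
    strHtml = Inet1.OpenURL(strUrl, icString)

    f = FreeFile                    ' next unused file number
    Open strPath For Output As #f
    Print #f, strHtml;              ' trailing semicolon: no extra line break
    Close #f
End Sub
```

A call would pass the page address and the target file, for example DownloadPage "http://www.example.com/page.cgi", "c:\temp\page.htm" (both values are placeholders).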
 
glass_cookie Commented:
Here's a last one:

Download...
http://www.planetsourcecode.com/vb/scripts/ShowZip.asp?lngWId=1&lngCodeId=21636&strZipAccessCode=ODE%5F216366631
Description: A Complete Multi Downloader (using multithreads)

That's it!

glass cookie : )
 
leekf (Author) Commented:
Thanks, really cool.

But after I download an HTML file using the Multi Downloader, what is the code to open it so that I can edit it?
 
bruintje Commented:
Still wondering: is this in Excel or not?
 
leekf (Author) Commented:
Oh, in fact it doesn't need to be Excel. Even a text file will be OK. Thanks.
 
Richie_Simonetti (IT Operations) Commented:
You could open the resulting HTML file just like any other plain file:
Dim strContents As String
Dim f As Integer
f = FreeFile                            ' next unused file number
Open "c:\somefile.htm" For Input As #f
    strContents = Input(LOF(f), f)      ' read the whole file into the string
Close #f

' do what you want with strContents
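
The last step in the original question (saving the processed list and deleting the HTML file) can be sketched the same way. This is only an illustration under assumptions: the helper name SaveAndCleanUp, its parameters, and the choice of a comma-separated text file (which Excel can open) are not from the thread.

```vb
' Hypothetical helper: writes each item on its own line of a text/CSV file,
' then deletes the downloaded HTML file.
Private Sub SaveAndCleanUp(items() As String, ByVal lngCount As Long, _
                           ByVal strCsvPath As String, ByVal strHtmlPath As String)
    Dim f As Integer
    Dim i As Long

    f = FreeFile
    Open strCsvPath For Output As #f
    For i = 1 To lngCount
        Write #f, items(i)      ' Write # puts quotes around strings, which suits CSV
    Next i
    Close #f

    Kill strHtmlPath            ' remove the original HTML file
End Sub
```

The loop starts at index 1 only because the array used later in this thread is filled from 1; adjust it to match your own array.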
 
bruintje Commented:
LOL, glued your keys together Richie ;)
 
leekfAuthor Commented:
... sorry.. i miss out something in my question...

i also want to know how to fetch all the URL and email address that contain in the html file

thanks
0
 
glass_cookie Commented:
Hi!

You could use the InStr() function to find the position in the text (after opening the file) where "http://" occurs.  From there, you could simply use the Mid function to get the string.  In other words:

1. Use the InStr function to determine where the "http://" lies.
2. Use the InStr function again, starting from the position of the first character of the "http://", to find ".htm" or ".html" and so determine the end of the address.
3. Use the Mid function to retrieve the address, using the positions of its first and last characters.

Sorry, no time to code this for you right now.  Let me know if you need me to code it for you : )  I'll probably do it tomorrow or some days later when I'm free : )

That's it!

glass cookie : )
 
leekf (Author) Commented:
Thanks. I have tried this method before:

Private Sub newprocess_Click()
    ' Give each variable its own type; "Dim a, b As String" leaves a as a Variant
    Dim unprocess0 As String, endvalue As String
    Dim inresult1 As Long, inresult2 As Long, i As Integer
    Dim result(100) As String

    unprocess0 = txtResults.Text
    endvalue = "/"

    For i = 1 To 100
        inresult1 = InStr(unprocess0, "http://")
        If inresult1 = 0 Then Exit For                ' no more URLs
        unprocess0 = Mid(unprocess0, inresult1 + 7)   ' drop everything up to and including "http://"

        inresult2 = InStr(unprocess0, endvalue)
        If inresult2 = 0 Then Exit For                ' no terminator found
        result(i) = Left(unprocess0, inresult2 - 1)

        List1.AddItem result(i)
    Next i
End Sub

But there are some ugly URLs in the list. I think endvalue = "/" is the problem. What should I replace it with?
 
Richie_Simonetti (IT Operations) Commented:
I am not a good typist, I know ;)
 
glass_cookie Commented:
Could you post a sample of the file you're opening so that I can have a clearer picture?  I'm not very sure how those urls are 'embedded' in the file.  Thanks : )
 
leekf (Author) Commented:
Just like any HTML code, e.g.:

<a href="http://www.experts-exchange.com/jsp/cmtyHelpDesk.jsp" class="eeTopLink">Help Desk</a>
<a href="http://www.experts-exchange.com/jsp/cmtyHelpDeskKp.jsp" class="eeTopLink">KPro Help</a>

So how do I get the URLs out of this kind of file?
How do I determine the end of an address?
 
glass_cookie Commented:
Hi!

OK, simply look for "http:// (including the opening quote) to start with, and then use the InStr() function to look for the next ", which marks its end.  In VB, they are:

i = InStr(1, Text1.Text, """http://")

and

j = InStr(i + 1, Text1.Text, """")

The URL is then Mid(Text1.Text, i + 1, j - i - 1).

That's it!

glass cookie : )
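
Put together, the two InStr calls above can be wrapped in a loop that collects every match. The following is a sketch, not the poster's code: the function name ExtractQuoted and its parameters are hypothetical, and it assumes the values are enclosed in double quotes, as in the sample HTML posted earlier.

```vb
' Hypothetical helper: collects every double-quoted value that starts with
' strPrefix (for example "http://" or "mailto:").
Private Function ExtractQuoted(ByVal strHtml As String, ByVal strPrefix As String) As Collection
    Dim colFound As New Collection
    Dim lngStart As Long
    Dim lngEnd As Long

    ' Look for an opening quote immediately followed by the prefix
    lngStart = InStr(1, strHtml, """" & strPrefix, vbTextCompare)
    Do While lngStart > 0
        lngEnd = InStr(lngStart + 1, strHtml, """")     ' closing quote
        If lngEnd = 0 Then Exit Do                       ' unterminated value
        ' Skip the opening quote and stop just before the closing one
        colFound.Add Mid$(strHtml, lngStart + 1, lngEnd - lngStart - 1)
        lngStart = InStr(lngEnd + 1, strHtml, """" & strPrefix, vbTextCompare)
    Loop

    Set ExtractQuoted = colFound
End Function
```

Calling it with "http://" returns the link targets. Calling it with "mailto:" returns mail links with the prefix still attached, so it only finds email addresses that appear inside quoted mailto: links, not addresses written as plain text.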
 
Richie_Simonetti (IT Operations) Commented:
You could use the HTML Object Library and avoid parsing plain text. It has collections for almost everything.
If you need to enumerate hyperlinks for a given page, take a look at wwww.angelfire.com/realm/vb-shared/index.html under "IE DOM..." topic.
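
As a rough idea of what the object-library route can look like, here is a late-bound sketch. It is not taken from the page linked above. It assumes Internet Explorer's MSHTML component is installed (it provides the "htmlfile" ProgID), and the procedure name ListLinks is hypothetical.

```vb
' Late-bound sketch; "htmlfile" is the ProgID of the MSHTML document object
' that ships with Internet Explorer. strContents holds the HTML text.
Private Sub ListLinks(ByVal strContents As String)
    Dim doc As Object
    Dim lnk As Object

    Set doc = CreateObject("htmlfile")
    doc.write strContents           ' let MSHTML parse the text
    doc.Close

    For Each lnk In doc.links       ' <a> and <area> elements that have an href
        Debug.Print lnk.href
    Next lnk
End Sub
```

Because the parser does the work, this does not depend on how the href values are quoted in the source.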
 
glass_cookie Commented:
Thanks for the points  : )