brownkj1
Help using winsock to get html source code
I need to be able to get the HTML source code from a website and parse the code for the file names I need to download. Any help?
This is the code I have tried to use, but the Winsock1_Connect() procedure is never invoked.
**************************
(Global variables)
Private m_sPage As String
Private Const m_sDownloadSite As String = "https://telemarketing.donotcall.gov/Download/DnldFull.aspx"
**************************
**************************
Main code
**************************
sHost = Mid(m_sDownloadSite, InStr(m_sDownloadSite, "://") + 3)
If InStr(sHost, "/") > 0 Then
    m_sPage = Mid(sHost, InStr(sHost, "/"))
    sHost = Left(sHost, InStr(sHost, "/") - 1)
Else
    m_sPage = "/"
End If
If InStr(sHost, ":") > 0 Then
    lPort = Mid(sHost, InStr(sHost, ":") + 1)
    sHost = Left(sHost, InStr(sHost, ":") - 1)
Else
    lPort = 80
End If
With Winsock1
    If .State <> sckClosed Then .Close
    .RemoteHost = sHost
    .RemotePort = lPort
    .Connect
End With
Code used from this website:
http://www.ostrosoft.com/vb/projects/get_html_source.asp
For HTTPS the port is not 80. Do this:
Private Const m_sDownloadSite As String = "https://telemarketing.donotcall.gov:443/Download/DnldFull.aspx"
Also, since you are using HTTPS, you are going to need to use the INET control, because you are going to get encrypted data.
-Brian
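As an alternative to hard-coding ":443" into the URL, the port can be chosen from the scheme. The following is only a sketch in VB6: SplitUrl is a hypothetical helper name, not something from the original code, and it assumes the URL has the simple scheme://host[:port]/page shape used above. Picking the right port does not by itself make Winsock speak HTTPS; the data on port 443 is still encrypted.

```vb
' Hypothetical helper (not part of the original post): splits a URL into
' host, port, and page, defaulting the port from the scheme.
Private Sub SplitUrl(ByVal sUrl As String, ByRef sHost As String, _
                     ByRef lPort As Long, ByRef sPage As String)
    Dim sScheme As String
    Dim lPos As Long

    ' Separate the scheme ("http", "https") from the rest of the URL
    lPos = InStr(sUrl, "://")
    If lPos > 0 Then
        sScheme = LCase$(Left$(sUrl, lPos - 1))
        sHost = Mid$(sUrl, lPos + 3)
    Else
        sScheme = "http"
        sHost = sUrl
    End If

    ' Everything from the first "/" onward is the page
    lPos = InStr(sHost, "/")
    If lPos > 0 Then
        sPage = Mid$(sHost, lPos)
        sHost = Left$(sHost, lPos - 1)
    Else
        sPage = "/"
    End If

    ' An explicit ":port" wins; otherwise use 443 for https and 80 for http
    lPos = InStr(sHost, ":")
    If lPos > 0 Then
        lPort = CLng(Mid$(sHost, lPos + 1))
        sHost = Left$(sHost, lPos - 1)
    ElseIf sScheme = "https" Then
        lPort = 443
    Else
        lPort = 80
    End If
End Sub
```

Called as SplitUrl m_sDownloadSite, sHost, lPort, m_sPage, this would give the host telemarketing.donotcall.gov, port 443, and page /Download/DnldFull.aspx for the constant in the question.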
ASKER
Is the port I'm supposed to use 443 then?
Also, I changed the string to the above and still get nothing.
Any more help?
You may be better off using a WebBrowser control since it seems you need to automate logging-in as well.
ASKER
I already have automated that process and I do use the web browser. I was using this to download the files,
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
sDownloadFile = URLDownloadToFile(0, "http://download.donotcall.gov/Full/270_12_4_2004_z9b6y1g4ORDQAvErZdkdEA==.zip", "P:\Files\270.Zip", 0, 0)
but the file changes names every day so I need to capture these file names somehow. It was suggested to capture the HTML source code and parse it, but I don't know how to do that. Can you help?
Yes, the port you are supposed to use is 443.
You need to use the INET control; that is the only way you are going to be able, in the end, to get clear-text HTML.
-Brian
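A minimal sketch of the INET suggestion, assuming the Microsoft Internet Transfer Control has been added to the form under the hypothetical name Inet1. OpenURL is a blocking call that returns the response body as a string when icString is passed. Whether it can reach a page behind the log-in depends on how the site tracks the session, which this thread does not establish.

```vb
' Assumes a form with an Internet Transfer Control named Inet1
' (the control name and the function name are assumptions).
Private Function GetPageSource(ByVal sUrl As String) As String
    ' The control handles the HTTPS encryption, so the string
    ' returned here is the decrypted (clear-text) HTML.
    GetPageSource = Inet1.OpenURL(sUrl, icString)
End Function
```

For example, sHtml = GetPageSource(m_sDownloadSite) would place the page source in a string that can then be searched with InStr.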
If you automate IE to log-in, you can automate it to get the file-names to download. If you automate Winsock to log-in, you'll still get encrypted data (as mentioned).
ASKER
I guess I don't know how to automate the download of the file names. Could you elaborate?
Is it similar to doing something like this?
IE.Document.All("txtActNumber").Value = m_sOrgId
Does it automatically download after you log-in?
You can..
1.) Fill in the OrgID and Password (like you've shown above) and submit/click "Log-In".
2.) If it automatically downloads, skip to 3.), if not, you can automate clicking on a link to download.
3.) Watch for BeforeNavigate (for it to start downloading)... example:
Private Sub Form_Load()
    'This is just for testing.. so a DL can start
    WebBrowser1.Navigate "http://download.donotcall.gov/Full/270_12_4_2004_z9b6y1g4ORDQAvErZdkdEA==.zip"
End Sub
Private Sub WebBrowser1_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, Flags As Variant, TargetFrameName As Variant, PostData As Variant, Headers As Variant, Cancel As Boolean)
    'If it's navigating to a zip file...
    If Right(LCase(URL), 4) = ".zip" Then
        'Cancel it so IE won't download it
        Cancel = True
        'Use URLDownloadToFile() to download file here instead of MsgBox
        MsgBox URL
    End If
End Sub
If you want a proper working example, you can email a user/pass for me to test to log-in. If not, you'll have to get it working from what I've described.
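The "use URLDownloadToFile() here instead of MsgBox" step could look something like the sketch below. SaveZip is a hypothetical helper name and the target folder is an assumption; the Declare is the same one the asker posted. URLDownloadToFile returns 0 (S_OK) on success.

```vb
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _
    (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
     ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

' Hypothetical helper: saves sUrl into sFolder under the file name
' taken from the end of the URL. Returns True on success.
Private Function SaveZip(ByVal sUrl As String, ByVal sFolder As String) As Boolean
    Dim sName As String

    ' File name = everything after the last "/" in the URL
    sName = Mid$(sUrl, InStrRev(sUrl, "/") + 1)

    ' 0 (S_OK) means the download succeeded
    SaveZip = (URLDownloadToFile(0, sUrl, sFolder & "\" & sName, 0, 0) = 0)
End Function
```

Inside BeforeNavigate2 the MsgBox line would then become something like SaveZip CStr(URL), "P:\Files". Because the name is read from the URL at run time, it no longer matters that it changes every day.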
ASKER
Unfortunately I can't give you a login to test with because it's dealing with customer information and I only have one secure id that's been given to me (and it's not my id, it's the user's I'm writing the application for).
The files do not automatically download once I get to the site.
And I was incorrect in saying I use the webbrowser component. I am declaring an object ("IE") and setting up that object to be an Internet Explorer Application.
Here are the processes I take:
1) Login
2) Navigate to download website
3) Download files by using URLDownloadToFile()
The problem that I'm having is I don't know how to get the file name. You have to assume here:
WebBrowser1.Navigate "http://download.donotcall.gov/Full/270_12_4_2004_z9b6y1g4ORDQAvErZdkdEA==.zip"
that I don't know the name of that file (which is "270_12_4_2004_z9b6y1g4ORDQAvErZdkdEA==.zip").
I have been testing the downloads and application by getting file name from the website manually and then hard-coding it into the application. I can not do that any more, as I'm trying to push this app out to the user and it needs to be automated.
I have tried to automate the clicking of the link to download the file but it downloads an empty zip file.
Someone suggested getting the HTML source code and parsing it for the file name. Is there a way I can do that through the IE object?
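One way this might be done through the IE object is to read the links of the loaded page rather than the raw source. This is a sketch only: FindZipLinks is a hypothetical name, IE is assumed to be the already logged-in InternetExplorer.Application object sitting on the download page, and the download links are assumed to be ordinary anchors whose href ends in ".zip" (if the page uses script-driven links, this will find nothing).

```vb
' Hypothetical helper: returns a Collection of the .zip URLs linked
' from the page currently loaded in the IE object.
Private Function FindZipLinks(ByVal IE As Object) As Collection
    Dim oLink As Object
    Dim colUrls As New Collection

    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do While IE.Busy Or IE.ReadyState <> 4
        DoEvents
    Loop

    ' Document.links holds every link on the page that has an href
    For Each oLink In IE.Document.links
        If LCase$(Right$(oLink.href, 4)) = ".zip" Then
            colUrls.Add oLink.href
        End If
    Next oLink

    ' The full source, if it is still wanted for manual parsing, is
    ' available as IE.Document.documentElement.outerHTML
    Set FindZipLinks = colUrls
End Function
```

Each item in the returned collection is a full URL string that can be passed to URLDownloadToFile() in place of the hard-coded one.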
ASKER
I figured out a way to get the files downloaded, which is what I initially set out to do by grabbing the HTML source code. It turns out I can do it without getting the source code.
Thank you for the help.