Gillz
asked on
vbscript to pull data from html page
Hi All
I am looking for a script that helps me consolidate the status of three programs being carried out for various cities.
Currently how the things are:
There are three programs, say for example
1. Audit
2. Survey
3. Competency
the above programs are carried out and tracked for various cities and the status can be seen through a link (I guess HTML page)
the URL that are formed has common prefix with city as parameter at the end
Example:
1. for Audit the URL would form something like below
http://www.audit.com/abc.jsp?city=Pune
http://www.audit.com/abc.jsp?city=Mumbai
http://www.audit.com/abc.jsp?city=Delhi
2. for Survey the URL would form something like below
http://www.survey.com/abc.jsp?city=Pune
http://www.survey.com/abc.jsp?city=Mumbai
http://www.survey.com/abc.jsp?city=Delhi
3. for Competency the url would form something like below
http://www.Competency.com/abc.jsp?city=Pune
http://www.Competency.com/abc.jsp?city=Mumbai
http://www.Competency.com/abc.jsp?city=Delhi
Now, the page that loads on all the above urls would have some description about the program at the begning followed by the data in a fixed tabular format with two columns (1st column shows heading and 2nd column shows the value) that is same for every program.
in the table, the 5th Row shows the "Status" and its value.
What I am looking for:
Now, I am looking for a vb script that will pull the name of the cities from a text file, generate the urls in background and pull out the status value from each url and display the same in below format in html file or excel file
City | Audit | Survey| Competency
Pune | Completed| In Progress | Initiated
Mumbai | Completed | Completed | Completed
Delhi | In Progress | Completed | Completed
Thanks in advance
I am looking for a script that helps me consolidate the status of three programs being carried out for various cities.
Currently how the things are:
There are three programs, say for example
1. Audit
2. Survey
3. Competency
the above programs are carried out and tracked for various cities and the status can be seen through a link (I guess HTML page)
the URL that are formed has common prefix with city as parameter at the end
Example:
1. for Audit the URL would form something like below
http://www.audit.com/abc.jsp?city=Pune
http://www.audit.com/abc.jsp?city=Mumbai
http://www.audit.com/abc.jsp?city=Delhi
2. for Survey the URL would form something like below
http://www.survey.com/abc.jsp?city=Pune
http://www.survey.com/abc.jsp?city=Mumbai
http://www.survey.com/abc.jsp?city=Delhi
3. for Competency the url would form something like below
http://www.Competency.com/abc.jsp?city=Pune
http://www.Competency.com/abc.jsp?city=Mumbai
http://www.Competency.com/abc.jsp?city=Delhi
Now, the page that loads on all the above urls would have some description about the program at the begning followed by the data in a fixed tabular format with two columns (1st column shows heading and 2nd column shows the value) that is same for every program.
in the table, the 5th Row shows the "Status" and its value.
What I am looking for:
Now, I am looking for a vb script that will pull the name of the cities from a text file, generate the urls in background and pull out the status value from each url and display the same in below format in html file or excel file
City | Audit | Survey| Competency
Pune | Completed| In Progress | Initiated
Mumbai | Completed | Completed | Completed
Delhi | In Progress | Completed | Completed
Thanks in advance
Is the data in the web page by going to the URL and you want ot screen scrape? Then reformat the data to something else?
ASKER
@Qlemo: I added excel here coz I need the output in excel format (.xls)
and for your second question, Yes, powershell would do
@Padas: not sure if I understood your correctly. But currently, If I need to get status of 10 cities for all three programs, I have to visit 10 x3 = 30 url (Screens) and manually check the Status on each screen and note it down in an excel sheet. I need a script that would automate this process of checking the status row (row 5 in the html page) and capturing it in an excel shee.
and for your second question, Yes, powershell would do
@Padas: not sure if I understood your correctly. But currently, If I need to get status of 10 cities for all three programs, I have to visit 10 x3 = 30 url (Screens) and manually check the Status on each screen and note it down in an excel sheet. I need a script that would automate this process of checking the status row (row 5 in the html page) and capturing it in an excel shee.
Let's test something:
Open Excel,
go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.
You should now have something you can work with, including the Status field. Can you confirm that?
Open Excel,
go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.
You should now have something you can work with, including the Status field. Can you confirm that?
ASKER
Thank you Qlemo. two things I noticed
1. I gave me yellow arrow next to each row in column 1. assuming that it will fetch entire row (column 1 and column 2) I selected the arrow that then turned to green tick mark.
When I clicked Import, it just fetched the value in column 1 that is "Status" but it did not import the value of Status that is displayed in column 2
2. If I do this way, I will have to do it for each url. So, for example if I need to check status for 200 cities, I will have to follow the above steps 200 x 3 = 600 times to cover all 200 cities for the 3 programs?
1. I gave me yellow arrow next to each row in column 1. assuming that it will fetch entire row (column 1 and column 2) I selected the arrow that then turned to green tick mark.
When I clicked Import, it just fetched the value in column 1 that is "Status" but it did not import the value of Status that is displayed in column 2
2. If I do this way, I will have to do it for each url. So, for example if I need to check status for 200 cities, I will have to follow the above steps 200 x 3 = 600 times to cover all 200 cities for the 3 programs?
No, that was only for testing whether it would work at all; the URL generation and data scraping should be performed by VBA code then.
Are you able to select a yellow arrow for a table embedding the data you need (instead of the particular row)?
Are you able to select a yellow arrow for a table embedding the data you need (instead of the particular row)?
ASKER
I see. Btw I was able to select for entire table and it imported successfully in the spreadsheet.
so, after importing, the value in B9 cell is what I need along with the city name that I can see in cell B1
so, after importing, the value in B9 cell is what I need along with the city name that I can see in cell B1
Great - that gives us something we can work on. I do some scraping myself that way (from EE pages - Member Rank and such).
ASKER
cool - so whats the next step? How do I get that value for hunderds of cities in one report?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you Qlemo
Two things, when I first ran the script it fetched the web data into Scratch sheet and then pulled the B6 (Status) value in the Data sheet but just for one program, ie Audit and then nothing happened.
Then, when I reran the script, it keeps giving me below error
(I guess for line: Set qt = Worksheets("Scratch").Quer yTables(1) )
"Subscript out of range"
Two things, when I first ran the script it fetched the web data into Scratch sheet and then pulled the B6 (Status) value in the Data sheet but just for one program, ie Audit and then nothing happened.
Then, when I reran the script, it keeps giving me below error
(I guess for line: Set qt = Worksheets("Scratch").Quer
"Subscript out of range"
The error is the result of something having changed the columns the Web Query is assigned to. But I cannot debug that for you unless I have the workbook you used. If you post, make sure the Web Query works - if not, recreate it.
ASKER
I did not create any web query. Do i need to create it manually?
The Scratch one is where you've built the web query in, starting in A1.You need to do that once, together with inserting the code in VBA, in a template workbook. Then copy the workbook, and run the macro on the copy.
ASKER
"The Operation cannot be done because the data is refreshing in the backgroud"
thats the error I am getting now.
steps I followed:
Created a workbook with two sheets, Scratch and Data.
In Data, I filled A1 with value "City", B1 as "Audit", C1 = "Survey" and D1 = "Competency"
Then I filled column A with city names.
Then, in Scratch Sheet I followed your above steps:
go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.
Then saved the file as Template.xlsm. Then copied the file, new file name now is Template - Copy.xlsm and ran the macro.
thats the error I am getting now.
steps I followed:
Created a workbook with two sheets, Scratch and Data.
In Data, I filled A1 with value "City", B1 as "Audit", C1 = "Survey" and D1 = "Competency"
Then I filled column A with city names.
Then, in Scratch Sheet I followed your above steps:
go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.
Then saved the file as Template.xlsm. Then copied the file, new file name now is Template - Copy.xlsm and ran the macro.
ASKER
Also, before giving the error it filled the value for B2 with status of first city for Audit.
So seems that it worked to fetch first value, but then while running the loop it gets stuck
So seems that it worked to fetch first value, but then while running the loop it gets stuck
ASKER
just a quick update, when I did Step-In and kept pressing F8, it worked perfectly fine.
Is it something that the refresh is taking time and before the refresh is happening the code is trying to execute the next lines and causing error?
Is it something that the refresh is taking time and before the refresh is happening the code is trying to execute the next lines and causing error?
ASKER
I tried Application.Wait (Now() + CDate("00:00:03")) after the refresh but that too did not work. it still gives error when I run the application but runs perfectly fine when I do step in and keep pressing F8
ASKER
btw I changed qt.Refresh to qt.Refresh (False). It works great now, except that its too slow but then thats okay :)
Thanks a ton for all your help
Thanks a ton for all your help
b) Open to different approaches? Excel VBA (and Web Queries), Powershell, or the like?