Link to home
Start Free TrialLog in
Avatar of Gillz
Gillz

asked on

vbscript to pull data from html page

Hi All

I am looking for a script that helps me consolidate the status of three programs being carried out for various cities.

Currently how the things are:

There are three programs, say for example
1. Audit
2. Survey
3. Competency

the above programs are carried out and tracked for various cities and the status can be seen through a link (I guess HTML page)

the URL that are formed has common prefix with city as parameter at the end

Example:
1. for Audit the URL would form something like below
http://www.audit.com/abc.jsp?city=Pune
http://www.audit.com/abc.jsp?city=Mumbai
http://www.audit.com/abc.jsp?city=Delhi

2. for Survey the URL would form something like below
http://www.survey.com/abc.jsp?city=Pune
http://www.survey.com/abc.jsp?city=Mumbai
http://www.survey.com/abc.jsp?city=Delhi

3. for Competency the url would form something like below

http://www.Competency.com/abc.jsp?city=Pune
http://www.Competency.com/abc.jsp?city=Mumbai
http://www.Competency.com/abc.jsp?city=Delhi


Now, the page that loads on all the above urls would have some description about the program at the begning followed by the data in a fixed tabular format with two columns (1st column shows heading and 2nd column shows the value) that is same for every program.

in the table, the 5th Row shows the "Status" and its value.

What I am looking for:

Now, I am looking for a vb script that will pull the name of the cities from a text file, generate the urls in background and pull out the status value from each url and display the same in below format in html file or excel file

City | Audit | Survey| Competency
Pune | Completed| In Progress | Initiated  
Mumbai | Completed | Completed | Completed
Delhi | In Progress | Completed | Completed

Thanks in advance
Avatar of Qlemo
Qlemo
Flag of Germany image

a) What is the role of Excel here, as you have added that Topic Area?
b) Open to different approaches? Excel VBA (and Web Queries), Powershell, or the like?
Is the data in the web page by going to the URL and you want ot screen scrape?  Then reformat the data to something else?
Avatar of Gillz
Gillz

ASKER

@Qlemo: I added excel here coz I need the output in excel format (.xls)
and for your second question, Yes, powershell would do

@Padas: not sure if I understood your correctly. But currently, If I need to get status of 10 cities for all three programs, I have to visit 10 x3 = 30 url (Screens) and manually check the Status on each screen and note it down in an excel sheet. I need a script that would automate this process of checking the status row (row 5 in the html page) and capturing it in an excel shee.
Let's test something:
Open Excel,
go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.

You should now have something you can work with, including the Status field. Can you confirm that?
Avatar of Gillz

ASKER

Thank you Qlemo. two things I noticed
1. I gave me yellow arrow next to each row in column 1. assuming that it will fetch entire row (column 1 and column 2) I selected the arrow that then turned to green tick mark.
When I clicked Import, it just fetched the value in column 1 that is "Status" but it did not import the value of Status that is displayed in column 2

2. If I do this way, I will have to do it for each url. So, for example if I need to check status for 200 cities, I will have to follow the above steps 200 x 3 = 600 times to cover all 200 cities for the 3 programs?
No, that was only for testing whether it would work at all; the URL generation and data scraping should be performed by VBA code then.

Are you able to select a yellow arrow for a table embedding the data you need (instead of the particular row)?
Avatar of Gillz

ASKER

I see. Btw I was able to select for entire table and it imported successfully in the spreadsheet.
so, after importing, the value in B9 cell is what I need along with the city name that I can see in cell B1
Great - that gives us something we can work on. I do some scraping myself that way (from EE pages - Member Rank and such).
Avatar of Gillz

ASKER

cool - so whats the next step? How do I get that value for hunderds of cities in one report?
ASKER CERTIFIED SOLUTION
Avatar of Qlemo
Qlemo
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Gillz

ASKER

Thank you Qlemo

Two things, when I first ran the script it fetched the web data into Scratch sheet and then pulled the B6 (Status) value in the Data sheet but just for one program, ie Audit and then nothing happened.

Then, when I reran the script, it keeps giving me below error
(I guess for line: Set qt = Worksheets("Scratch").QueryTables(1) )

"Subscript out of range"
The error is the result of something having changed the columns the Web Query is assigned to. But I cannot debug that for you unless I have the workbook you used. If you post, make sure the Web Query works - if not, recreate it.
Avatar of Gillz

ASKER

I did not create any web query. Do i need to create it manually?
The Scratch one is where you've built the web query in, starting in A1.
You need to do that once, together with inserting the code in VBA, in a template workbook. Then copy the workbook, and run the macro on the copy.
Avatar of Gillz

ASKER

"The Operation cannot be done because the data is refreshing in the backgroud"

thats the error I am getting now.

steps I followed:

Created a workbook with two sheets, Scratch and Data.
In Data, I filled A1 with value "City", B1 as "Audit", C1 = "Survey" and D1 = "Competency"
Then I filled column A with city names.

Then, in Scratch Sheet I followed your above steps:

go into the "Data" or "Extra" menu to locate something regarding web queries.
Enter one of the URLs,
press OK,
and in the modified browser window spot an yellow array pointing to the table with the data you want to get.
Click on that error (should get green).
Then press Import.

Then saved the file as Template.xlsm. Then copied the file, new file name now is Template - Copy.xlsm and ran the macro.
Avatar of Gillz

ASKER

Also, before giving the error it filled the value for B2 with status of first city for Audit.
So seems that it worked to fetch first value, but then while running the loop it gets stuck
Avatar of Gillz

ASKER

just a quick update, when I did Step-In and kept pressing F8, it worked perfectly fine.
Is it something that the refresh is taking time and before the refresh is happening the code is trying to execute the next lines and causing error?
Avatar of Gillz

ASKER

I tried Application.Wait (Now() + CDate("00:00:03")) after the refresh but that too did not work. it still gives error when I run the application but runs perfectly fine when I do step in and keep pressing F8
Avatar of Gillz

ASKER

btw I changed qt.Refresh to qt.Refresh (False). It works great now, except that its too slow but then thats okay :)

Thanks a ton for all your help