richardandro

wget - Need original URL in output file

I am extracting over 50,000 census block/tract numbers from a census website, using wget for Windows. My problem is that I have the answers but I don't know which question each one belongs to. Example: I look up 53,369 records and end up with 53,344 output records. 25 of the lookups returned "no record found", which means they produce no output record. As a result, I have no way to match my results to my search criteria. If each output record could be preceded by the URL of its input, I could parse the results, extract the initial lat/lon, and pair it with the resulting FIPS code. Does anyone have a way to make this, or something similar, happen?

P.S. I think I can work around this issue by also creating a log file and then comparing the
results, but that is not nearly as clean and easy as my request above.

input file:http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568                            
                (followed by 53,000+ records with different lat/lons)
batch file: wget --input-file=census.in --output-document=census.out -nv
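
A minimal sketch of the behaviour asked for above, written in Python rather than as a wget batch file (that substitution is an assumption; the thread itself uses wget). It fetches each URL separately and writes the URL and its response on one tab-separated line, so a lookup that returns nothing still leaves a row behind. The names `http_get` and `fetch_with_urls` are hypothetical, and the tab-separated layout is an illustrative choice, not something taken from the original post.

```python
import urllib.request
from typing import Callable, Iterable, TextIO


def http_get(url: str) -> str:
    """Fetch one URL and return its body as text ('' if the request fails)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        # urllib.error.URLError and HTTPError are both subclasses of OSError
        return ""


def fetch_with_urls(urls: Iterable[str], out: TextIO,
                    fetch: Callable[[str], str] = http_get) -> int:
    """Write one 'URL<TAB>response' line per input URL; return the row count."""
    count = 0
    for raw in urls:
        url = raw.strip()              # input lines may carry trailing spaces
        if not url:
            continue                   # skip blank lines
        # Collapse all whitespace so each record stays on a single line
        body = " ".join(fetch(url).split())
        out.write(f"{url}\t{body}\n")
        count += 1
    return count
```

Under these assumptions, opening `census.in` for reading and `census.out` for writing and passing both to `fetch_with_urls` would produce one row per input URL, including the lookups that return no record.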
BillDL

Richard, I'm not clever enough to answer your question, but if you add the following Zone/Topic to it I'm confident that the batch file geniuses over there will be able to work out something for you:

https://www.experts-exchange.com/OS/Microsoft_Operating_Systems/MS_DOS/
Bill Prew

I think you will need to run a separate wget for each URL and write each URL to the output file yourself. That shouldn't be too bad, though. Here's what I'm thinking; see if this makes sense.

@echo off

REM Define input and output files
set InputFile=census.in
set OutputFile=census.out

REM Delete output file if it exists
if exist "%OutputFile%" del "%OutputFile%"

REM Read input file, line by line
for "usebackq tokens=*" %%U in ("%InputFile%") do (

  REM Write the URL being processed to the output file (appending)
  echo %%~U>>"%OutputFile%"

  REM Download the URL content to the output file (appending)
  wget --append-output="%OutputFile%" -nv "%%~U"

)

~bp

ASKER

I am unfamiliar with this type of code, but I created a batch file, pasted the above code into it, and ran it from the command window of my Win 7 machine. I received this error:

"usebackq tokens=*" was unexpected at this time
Sorry about that. Change line 11 from:

for "usebackq tokens=*" %%U in ("%InputFile%") do (

to

for /F "usebackq tokens=*" %%U in ("%InputFile%") do (

~bp
I now receive these errors in my output file:


http://data.fcc.gov/api/block/find?latitude=29.775466&longitude=-95.6036&format=json               
find?latitude=29.775466&longitude=-95.6036&format=json                : Invalid argument
Cannot write to `find?latitude=29.775466&longitude=-95.6036&format=json                ' (Invalid argument).
http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568&format=json               
find?latitude=29.860508&longitude=-95.2568&format=json                : Invalid argument
Cannot write to `find?latitude=29.860508&longitude=-95.2568&format=json                ' (Invalid argument).
SOLUTION
Bill Prew

This solution is only available to members.
Thank you, it works, but would you please help me clean it up a little? The output file is as follows:

http://data.fcc.gov/api/block/find?latitude=29.775466&longitude=-95.6036&format=json               
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response xmlns="http://data.fcc.gov/api" status="OK" executionTime="5"><Block FIPS="482014503001041"/><County FIPS="48201" name="Harris"/><State FIPS="48" code="TX" name="Texas"/></Response>http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568&format=json                
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response xmlns="http://data.fcc.gov/api" status="OK" executionTime="6"><Block FIPS="482012312001036"/><County FIPS="48201" name="Harris"/><State FIPS="48" code="TX" name="Texas"/></Response>http://data.fcc.gov/api/block/find?latitude=29.70542&longitude=-95.2021&format=json                

Basically the next record's URL is concatenated to the previous record's output and then a new line begins. It would help if the same record's URL and output were on the same line.
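
One way to tidy output shaped like the sample above after the fact is sketched here in Python (an assumption; the thread's own scripts are batch files). It treats each request URL as the start of a record, takes everything up to the next request URL as that record's response, and pulls the latitude, longitude, and block FIPS into one tuple. The function name `pair_records` and both regular expressions are hypothetical and based only on the sample lines shown; a response with no `<Block FIPS="...">` element simply yields an empty FIPS field.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Request URLs as they appear in the sample; stops at whitespace or '<'
URL_RE = re.compile(r"http://data\.fcc\.gov/api/block/find\?[^\s<]+")
# Block-level FIPS code inside the XML response
FIPS_RE = re.compile(r'<Block FIPS="(\d+)"')


def pair_records(text):
    """Return a list of (url, latitude, longitude, block_fips) tuples."""
    matches = list(URL_RE.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # The response runs from the end of this URL to the start of the next
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        response = text[m.end():end]
        query = parse_qs(urlsplit(m.group()).query)
        fips = FIPS_RE.search(response)
        records.append((
            m.group(),
            query.get("latitude", [""])[0],
            query.get("longitude", [""])[0],
            fips.group(1) if fips else "",   # empty when no block was returned
        ))
    return records
```

Joining each tuple with tabs and writing one per line would give the "URL and output on the same line" layout, with lookups that returned nothing still present as rows.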
Can you manually put together a 5- or 10-line example of what you want the output file to look like and upload it here, so I can adjust the script to match?

~bp
ASKER CERTIFIED SOLUTION
billprew had the solution 90% done. Just needed a tweak on the output.
Thank you Richard
Thanks, glad that was useful.

~bp