Solved

wget - Need original URL in output file

Posted on 2012-08-30
Last Modified: 2012-09-06
I am extracting more than 50,000 census block/tract numbers from a census website, using wget for Windows to pull the data. My problem is that I have the answers but cannot tell which question each one belongs to. Example: I look up 53,369 records and end up with 53,344 output records; 25 of the lookups returned "no record found", so they produced no output record. I therefore have no way to match my results to my search criteria. If each result in my output were preceded by the URL that produced it, I could parse the results, extract the initial lat/lon, and pair it with the resulting FIPS code. Does anyone know a way to make this, or something similar, happen?

p.s. I think I can work around this issue by also creating a log file and then comparing the results, but that is not nearly as clean and easy as my request above.

input file:http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568                            
                (followed by 53,000+ records with different lat/lons)
batch file: wget --input-file=census.in --output-document=census.out -nv
0
Question by:richardandro
13 Comments
 
LVL 39

Expert Comment

by:BillDL
ID: 38354953
Richard, I'm not clever enough to answer your question, but if you add the following Zone/Topic to it I'm confident that the batch file geniuses over there will be able to work out something for you:

http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/MS_DOS/
0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 38356467
I think you will need to run a separate wget for each URL and write each URL to the output file yourself. That's not too bad, though. Here's what I'm thinking; see if this makes sense.

@echo off

REM Define input and output files
set InputFile=census.in
set OutputFile=census.out

REM Delete output file if it exists
if exist "%OutputFile%" del "%OutputFile%"

REM Read input file, line by line
for "usebackq tokens=*" %%U in ("%InputFile%") do (

  REM Write the URL being processed to the output file (appending)
  echo %%~U>>"%OutputFile%"

  REM Download the URL content to the output file (appending)
  wget --append-output="%OutputFile%" -nv "%%~U"

)


~bp
0
 

Author Comment

by:richardandro
ID: 38357297
I am unfamiliar with this type of code, but I created a batch file, pasted the above code to it and then ran it from the command window of my Win 7 machine. I received this error:

"usebackq tokens=*" was unexpected at this time
0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 38357309
Sorry about that, change line 11 from:

for "usebackq tokens=*" %%U in ("%InputFile%") do (

to

for /F "usebackq tokens=*" %%U in ("%InputFile%") do (

~bp
0
 

Author Comment

by:richardandro
ID: 38357447
I now receive these errors in my output file:


http://data.fcc.gov/api/block/find?latitude=29.775466&longitude=-95.6036&format=json               
find?latitude=29.775466&longitude=-95.6036&format=json                : Invalid argument
Cannot write to `find?latitude=29.775466&longitude=-95.6036&format=json                ' (Invalid argument).
http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568&format=json               
find?latitude=29.860508&longitude=-95.2568&format=json                : Invalid argument
Cannot write to `find?latitude=29.860508&longitude=-95.2568&format=json                ' (Invalid argument).
0
 
LVL 59

Assisted Solution

by:Bill Prew
Bill Prew earned 1800 total points
ID: 38357476
What version of wget are you using? 1.11.4?

I couldn't reproduce that error here; it seemed to work. I did notice a problem with where the downloaded data went, though (--append-output controls where wget appends its log messages, not the downloaded document), so this version corrects that.

@echo off

REM Define input and output files
set InputFile=census.in
set OutputFile=census.out
set TempFile=_temp_.txt

REM Delete output file if it exists
if exist "%OutputFile%" del "%OutputFile%"

REM Read input file, line by line
for /F "usebackq tokens=*" %%U in ("%InputFile%") do (

  REM Write the URL being processed to the output file (appending)
  echo %%~U>>"%OutputFile%"

  REM Download the URL content to the temp file
  wget --output-document="%TempFile%" -nv "%%~U"

  REM Append this data to the output file
  type "%TempFile%">>"%OutputFile%"

  REM Delete temp file if it exists
  if exist "%TempFile%" del "%TempFile%"

)


~bp
0
 

Author Comment

by:richardandro
ID: 38357532
Thank you, it works, but would you please help me clean it up a little. The output file is as follows:

http://data.fcc.gov/api/block/find?latitude=29.775466&longitude=-95.6036&format=json               
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response xmlns="http://data.fcc.gov/api" status="OK" executionTime="5"><Block FIPS="482014503001041"/><County FIPS="48201" name="Harris"/><State FIPS="48" code="TX" name="Texas"/></Response>http://data.fcc.gov/api/block/find?latitude=29.860508&longitude=-95.2568&format=json                
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response xmlns="http://data.fcc.gov/api" status="OK" executionTime="6"><Block FIPS="482012312001036"/><County FIPS="48201" name="Harris"/><State FIPS="48" code="TX" name="Texas"/></Response>http://data.fcc.gov/api/block/find?latitude=29.70542&longitude=-95.2021&format=json                

Basically the next record's URL is concatenated to the previous record's output and then a new line begins. It would help if the same record's URL and output were on the same line.
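For readers who would rather get "URL and result on the same line" without batch at all, here is a minimal sketch in Python. Python is not used anywhere in this thread, and the names `fetch` and `write_tagged` are made up for illustration. It makes one request per input line, as the scripts above do, and writes the URL, a tab, and the response on a single line. A failed lookup still produces a line, with an `ERROR:` marker instead of a body.

```python
import urllib.request


def fetch(url):
    """Download one URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def write_tagged(urls, out, fetch=fetch):
    """Write 'URL<TAB>response' on one line per non-empty input line."""
    for raw in urls:
        url = raw.strip()  # the input lines in this thread carry trailing spaces
        if not url:
            continue
        try:
            body = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
            body = "ERROR: %s" % exc
        # Collapse any line breaks so each record stays on a single line.
        out.write(url + "\t" + " ".join(body.split()) + "\n")
```

To use it, open the input file for reading and the output file for writing and pass both to `write_tagged`. The `fetch` parameter exists only so the download step can be swapped out, for example when trying the function without network access.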
0
 
LVL 39

Assisted Solution

by:BillDL
BillDL earned 200 total points
ID: 38357555
You should be able to separate the records by just adding the following commands where required:

echo.>>"%OutputFile%"

All that does is write a blank line wherever you put the command.  Before you redirect the actual URL to the output file (line 15), just add two blank lines to separate the previous XML from the new URL, and then add another blank line so that the new XML code is separated from its URL above:

  REM Write the URL being processed to the output file (appending)
  echo.>>"%OutputFile%"
  echo.>>"%OutputFile%"
  echo %%~U>>"%OutputFile%"
  echo.>>"%OutputFile%"

Incidentally, here's my own version that I wrote after seeing Bill Prew's first batch file that didn't work for me.  I embarked on a different idea (used my own file names but it should be easy enough to compare), and it's coincidentally similar in functionality to Bill Prew's follow-up batch file.
(Using GNU Wget 1.11.4.3287 on Windows XP)
@echo off
SetLocal EnableDelayedExpansion

set InputFile=census_In.txt
set OutputFile=census_Out.txt

if exist "%OutputFile%" del "%OutputFile%"

for /f "tokens=2 delims=^?" %%A in ('type %InputFile%') do (
  set URL=http://data.fcc.gov/api/block/find?%%A
  set FileName=find@%%A
  wget !URL! -nv
  echo !URL!>>"%OutputFile%"
  echo.>>"%OutputFile%"
  type "!FileName!">>"%OutputFile%"
  echo.>>"%OutputFile%"
  echo.>>"%OutputFile%"
  if exist "!FileName!" del "!FileName!">nul
)
pause


0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 38357579
Can you manually edit up a 5 or 10 line example of what you want the output file to look like and upload it here, so I can try to adjust the script to match?

~bp
0
 

Accepted Solution

by:
richardandro earned 0 total points
ID: 38357666
I put BillDL's echo.>>"%OutputFile%" suggestion into your script and the result is easier to parse. It would be slightly easier if the URL were immediately followed by the result, all on the same line, but now I get the URL on line 1, the result on line 2, and so on. Thank you.

@echo off

REM Define input and output files
set InputFile=census_json.in
set OutputFile=census7.out
set TempFile=_temp_.txt

REM Delete output file if it exists
if exist "%OutputFile%" del "%OutputFile%"

REM Read input file, line by line
for /F "usebackq tokens=*" %%U in ("%InputFile%") do (

  REM Write the URL being processed to the output file (appending)
  echo %%~U>>"%OutputFile%"

  REM Download the URL content to the temp file
  wget --output-document="%TempFile%" -nv "%%~U"

  REM Append this data to the output file
  type "%TempFile%">>"%OutputFile%"

  REM End the line so the next URL starts on its own line
  echo.>>"%OutputFile%"

  REM Delete temp file if it exists
  if exist "%TempFile%" del "%TempFile%"

)
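
With the URL on one line and the result on the next, pairing each lat/lon with its FIPS code becomes a small parsing job. The following is a rough sketch in Python (not part of this thread's batch tooling; `pair_records` is a made-up name). It assumes the responses look like the XML shown earlier in this thread, with a `<Block FIPS="..."/>` element. If the service returns JSON instead, the pattern would need to change. A URL with no result after it is kept, with `None` in place of the FIPS code, so the failed lookups can be identified.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: responses contain <Block FIPS="..."/> as in the XML shown in this thread.
BLOCK_FIPS = re.compile(r'<Block FIPS="(\d+)"')
URL_PREFIX = "http://data.fcc.gov/api/block/find?"


def pair_records(lines):
    """Return a list of (latitude, longitude, block_fips_or_None) tuples."""
    records = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(URL_PREFIX):
            # A new URL starts a new record; store the previous one first.
            if current is not None:
                records.append(tuple(current))
            query = parse_qs(urlparse(line).query)
            current = [query["latitude"][0], query["longitude"][0], None]
        elif current is not None:
            match = BLOCK_FIPS.search(line)
            if match:
                current[2] = match.group(1)
    if current is not None:
        records.append(tuple(current))
    return records
```

Passing an open file such as census7.out to `pair_records` yields one tuple per URL. Entries whose third field is `None` are the lookups that returned nothing.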
0
 

Author Closing Comment

by:richardandro
ID: 38371461
billprew had the solution 90% done. Just needed a tweak on the output.
0
 
LVL 39

Expert Comment

by:BillDL
ID: 38371648
Thank you Richard
0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 38371716
Thanks, glad that was useful.

~bp
0


Question has a verified solution.
