Problems using wget in script

jpetter asked
3,171 Views
Last Modified: 2013-12-21
Hi,

I don't even know if this is possible, but after spending far too much time playing around with different switches, I thought I would check here before I went in a different direction.

We are using Blue Coat proxy servers, and also the Blue Coat Web Filter (BCWF) product so that we can block certain categories. What I need to do is take long lists of URLs (around 1000 per list) and find out how they are categorized by BCWF. There are a number of ways to do this, but all that I know of are manual.

The one way I thought this might be automated is to use wget in a script, but regardless of how I tweak the wget switches, I cannot get the correct response back. Instead, I get the response Unknown/unsupported protocol. The most recent version of what I have been trying (with no luck) is:
wget --proxy-user=user --proxy-passwd=user -S https://<ip_address>:8082/ContentFilter/TestUrl/www.llbean.com/

If this was working correctly, I should see the category returned by the query, which is Shopping. Instead though, I just get the Unknown/unsupported protocol response.

Does anyone know if this can be done, and spot where I've gone off the path?
I would greatly appreciate any and all help with this.

Thanks,
Jeff

Brian Utterback, Principal Software Engineer (Certified Expert)

Commented:
Perhaps you could re-run the wget with the -d command line flag and then post the whole session?

Author

Commented:
blu,
Here it is.
bash-2.05$ /usr/sfw/bin/wget --proxy-user=user --proxy-passwd=user -S -d https://192.176.12.63:8082/ContentFilter/TestUrl/www.llbean.com/
DEBUG output created by Wget 1.7 on solaris2.9.
 
parseurl ("https://192.176.12.63:8082/ContentFilter/TestUrl/www.llbean.com/") -> https://138.83.65.232:8082/ContentFilter/TestUrl/www.llbean.com/: Unknown/unsupported protocol.
Top Expert 2005

Commented:
It looks like your wget does not support https; recompile it with https support.
Or maybe your server is not https, and therefore responds with plain text? Have you tried with http://?
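
One hedged way to check a wget binary for https support before recompiling: recent wget builds print a feature list in their "wget --version" output that contains "+https" or "-https". The helper below, https_support, is a hypothetical name used only for this sketch. It classifies that text and reports "unknown" when no feature list is present, which is what an old build such as the Wget 1.7 in this thread is assumed to print.

```shell
#!/bin/bash
# Classify the output of "wget --version", read from stdin.
# Prints "yes", "no", or "unknown" (no feature list found, as with old builds).
https_support() {
  local text
  text=$(cat)
  # Flatten newlines to spaces so each feature token is space-delimited.
  case " $(printf '%s' "$text" | tr '\n' ' ') " in
    *" +https "*) echo yes ;;
    *" -https "*) echo no ;;
    *)            echo unknown ;;
  esac
}
```

Typical use would be: /usr/sfw/bin/wget --version | https_support. An "unknown" result says nothing either way, so the -d debug run shown above remains the more direct test on old versions.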

Author

Commented:
ravenpl,
I hadn't even thought of the https issue....I bet that is the problem and as you suggest will likely have to recompile the wget, and then port it to this box.

Unfortunately, the https is required as that is the only way to get to the Blue Coat proxy info.

I'll try to recompile, and will post with the results.
Thanks,
Jeff
Brian Utterback, Principal Software Engineer (Certified Expert)

Commented:
You should install patch 125326-01. This upgrades wget to the needed version.

Author

Commented:
Hi,

Okay, I did get a chance to get the correct options installed so that I can now use wget along with https. However, I'm still having a problem getting this to work in a script. From a command line, if I enter this:
/usr/bin/wget --quiet --output-document=tmp.txt --no-check-certificate --user=admin --password=admin https://192.176.12.63:8082/ContentFilter/TestUrl/www.cisco.com
and then check the file tmp.txt, it will contain "Computers/Internet". This is a good start, as it is the beginning of what I wanted. However, I am having two problems that I need to resolve.

First, if I run this from the command line multiple times using different URLs, the file tmp.txt is overwritten with the results from the last command executed. Second, if I place this command in a script it doesn't even write to the file tmp.txt. Below I have pasted the script I have been playing with. Even if this script did work as I had hoped, this doesn't do what I would like. Ideally, in the end, I would like a file with a list of the URL being checked, followed by the category. If I can't get that, I would at least like to be able to end up with a file of the categories provided by the wget command, and not just the category of the last URL checked.

#!/bin/bash
for i in `cat ./urllist75.txt`
do
  echo $i
  /usr/bin/wget --quiet --output-document=tmp.txt --no-check-certificate --user=admin --password=admin https://192.176.12.63:8082/ContentFilter/TestUrl/$i
continue
done

The two primary problems I would love some help with are why this command works from the command line but not from the script, and how to append to the output document rather than overwrite it. If anyone can spot something wrong, or suggest what I might try, I would greatly appreciate it.

Thanks,
Jeff
Top Expert 2005

Commented:
To append rather than overwrite:
wget --output-document=-  ..more options.. >> tmp.txt

As for the script issue: does it print the "echo $i" lines?

Author

Commented:
Yes, it will print the echo $i lines.

Looking over your solution for the overwriting issue, I want to make sure I understand it....the switch you're using --output-document=-  ...so, rather than specifying the name here, I just enter the "-", and then redirect to the file with the standard append syntax?

Top Expert 2005

Commented:
Yes, a file name of "-" means stdout.
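
The difference between overwriting and appending can be seen without the proxy at all. In this small sketch, lookup is a hypothetical stand-in for any command that writes its result to stdout (as wget does when the output document is "-"), and the file names are made up for the demonstration.

```shell
#!/bin/bash
# Stand-in for a command that prints one result on stdout.
lookup() { printf '%s\n' "category-for-$1"; }

demo_dir=$(mktemp -d)

# ">" truncates the file each time, so only the last result survives.
lookup first  > "$demo_dir/overwrite.txt"
lookup second > "$demo_dir/overwrite.txt"

# ">>" appends, so every result is kept in order.
lookup first  >> "$demo_dir/append.txt"
lookup second >> "$demo_dir/append.txt"
```

After this runs, overwrite.txt holds a single line and append.txt holds two, which is the behaviour wanted for a list of categories.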

Author

Commented:
Very strange....I used the new syntax you provided and again saw the echo output scroll by. When it was over, I noticed that the script had created the tmp.txt file, but it was empty. Also, if I copy the wget line out of the script, paste it on the command line, and replace the $i with a URL, it no longer populates the txt file with the output as it did previously.

I appreciate your help.
Thanks,
Jeff
Top Expert 2005

Commented:
Strange indeed. Are you sure the https-capable wget lives in /usr/bin?

Try the following; it should also work with entries that contain whitespace:
#!/bin/bash
: > tmp.txt # empty it
cat urllist75.txt | while read i; do
 echo "getting $i"
 /usr/bin/wget --quiet --output-document=- --no-check-certificate --user=admin --password=admin "https://192.176.12.63:8082/ContentFilter/TestUrl/$i" >> tmp.txt
done
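
A short illustration of why reading the list with "while read" handles whitespace where "for i in `cat ...`" does not: the for loop splits on every space and newline, while read takes one whole line at a time. The sample list contents are invented for the demonstration, and the loop reads from a redirect rather than a pipe only so the counters stay in the current shell.

```shell
#!/bin/bash
list=$(mktemp)
printf 'www.example.com/a b\nwww.example.org\n' > "$list"

# Word splitting: the entry containing a space becomes two items.
count_for=0
for i in $(cat "$list"); do
  count_for=$((count_for + 1))
done

# Line-by-line reading: each entry stays intact.
count_while=0
while IFS= read -r i; do
  count_while=$((count_while + 1))
done < "$list"

echo "for: $count_for items, while read: $count_while items"
# prints "for: 3 items, while read: 2 items"
```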

Author

Commented:
Hmm....well, I did get the echos as it was processing the list, but the file was empty. I also noticed that it created an empty file with the name -

Thanks again for all the help. Definitely a "head-scratcher"....especially for me.

Thanks,
Jeff
Top Expert 2005
Commented:

Author

Commented:
Thanks so much....it is now finally working. For some reason it likes the "-O -", but didn't like the "--output-document=-".

I really appreciate the help.

Thanks,
Jeff
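
For reference, a minimal sketch of how the pieces from this thread might fit together to produce the file originally asked for, with each URL followed by its category. It uses the "-O -" form reported to work above. The function names, the tab-separated layout, and the LOOKUP_FAILED marker are assumptions made for this sketch, not part of the hidden solution, and it assumes the response body is just the category text, as in the "Computers/Internet" example earlier.

```shell
#!/bin/bash
# Query the proxy for one URL and print the response on stdout.
# Host, port, and credentials are the ones quoted in the thread.
fetch_category() {
  /usr/bin/wget --quiet -O - --no-check-certificate \
    --user=admin --password=admin \
    "https://192.176.12.63:8082/ContentFilter/TestUrl/$1"
}

# Write one "URL<TAB>category" line per entry in the list file.
categorize_list() {
  local list_file=$1 out_file=$2 url category
  : > "$out_file"                      # start with an empty report
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue          # skip blank lines
    category=$(fetch_category "$url")
    # An empty response is recorded with a marker instead of a blank field.
    printf '%s\t%s\n' "$url" "${category:-LOOKUP_FAILED}" >> "$out_file"
  done < "$list_file"
}
```

It could be called as: categorize_list ./urllist75.txt categories.txt. Keeping the wget call in its own function means the loop can be tried out with a stub before pointing it at the real proxy.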