Google Keyword Research

Hello,

I was looking at Google's keyword tool earlier and noticed that if you input a website, the tool generates a list of keywords.

https://adwords.google.com/select/KeywordToolExternal

I have a large list of domains and thought it would be easier to accomplish this with 'wget' by passing a list of domains as an argument, then printing one word before "credit" and one word after it, to generate a list.

I came across this online:

wget -m -i input_file (domains)

Is there a better way of doing this without having to download each site locally? Also, what would be the command to grep for "credit" and output the surrounding words?
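
For a single page, wget can send the fetched HTML to stdout instead of saving it locally (a sketch; -q suppresses progress output, -O - writes the page to standard output, and the domain is just an example):

wget -q -O - http://www.site.com | grep -i credit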

Thanks
faithless1 asked:
 
farzanj commented:
Try this:
#!/bin/bash

# read the list of domains (one per line) from domains.txt
urls=$(<domains.txt)
for url in $urls
do
    # dump the rendered page text and keep the lines containing "credit"
    lynx -dump "$url" | grep -i credit >> domains_output.txt
done
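
To also capture just the word immediately before and after "credit", as the question asked, the grep can be tightened (a sketch, assuming a grep that supports -o and -E, such as GNU grep; matches split across line breaks in the dump will be missed):

    # inside the loop above, instead of the plain grep:
    lynx -dump "$url" | grep -oiE '[[:alnum:]]+[[:space:]]+credit[[:space:]]+[[:alnum:]]+' >> domains_output.txt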

 
mwochnick commented:
Here's a great article on grep with examples: http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
Still thinking about the other part of your question.
 
mwochnick commented:
You could attempt to use curl to post the form for each website, but I'm not sure you can get past the CAPTCHA part of the form.
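
For illustration, a curl form post generally looks like the sketch below; the field name "url" is hypothetical (not necessarily the keyword tool's real parameter), and as noted the CAPTCHA would likely still block an automated submission:

# -s silences progress output, -d sends the data as an HTTP POST
curl -s -d "url=http://www.site.com" https://adwords.google.com/select/KeywordToolExternal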
 
farzanj commented:
So you have one URL per line in your file, right?

This should do something close to what you want.
You will need lynx installed on your system.

urls=$(<input_file)
for url in $urls
do
    lynx -dump "$url" | grep -i credit >> output_file
done

 
faithless1 (Author) commented:
Thanks.

If possible, can you provide usage instructions, since I'm fairly new to this?

Do I place this into a file, name it with a .php extension, and then run 'php script.php'? Thank you


urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done
 
farzanj commented:
This is bash.
You can paste it or type it at the shell, if your default shell is bash.
Or you can paste it into a file, say script.sh.

 
#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump "$url" | grep -i credit >> domains_output.txt
done

 
faithless1 (Author) commented:
Thanks,

I'm getting an empty output file.



Here's the script I'm using (script.sh)

#!/bin/bash



urls=$(<domains.txt)
for url in $urls
do
    links -dump | grep -i credit >> domains_output.txt
done


domains.txt contains

http://www.site.com

command
sh script.sh

Can't seem to figure out why there is no output

Thanks
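
A couple of quick checks that might narrow this down (a sketch using standard shell tools):

# is a text browser actually installed?
command -v lynx || echo "lynx not found"
command -v links || echo "links not found"

# re-run the script with tracing so each command is printed as it executes
bash -x script.sh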
 
farzanj commented:
You were not supposed to run it like that.

You should have done this:

#make it executable--just once
chmod +x script.sh

#run it
./script.sh


Second, try a domain where you know the page should contain "credit" and run:
lynx -dump <url> | grep -i credit


Also, I am using lynx; you are using links.

If you don't have lynx, install it or use curl.
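
If curl is the fallback, the same loop would look roughly like this (a sketch; note that curl fetches raw HTML rather than rendered text, so grep will also match markup):

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    # -s silences curl's progress output
    curl -s "$url" | grep -i credit >> domains_output.txt
done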
 
faithless1 (Author) commented:
I just tried running it, but still no output.

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done

I did test lynx on the command line and it worked fine:
lynx -dump http://www.site.com | grep -i credit
 
faithless1 (Author) commented:
Thanks, that worked! One other question: how would I include the entire site and not just the home page? It's currently only covering the index page. Thank you.
 
farzanj commented:
Please read the following docs.

http://daniel.haxx.se/docs/curl-vs-wget.html
http://williamjxj.wordpress.com/2010/12/17/curl-vs-wget-vs-lynx/
http://linux.die.net/man/1/lynx

You can try wget instead of lynx. That would download recursively, except where the web host restricts wget.
I think the easiest way to do it would be to know your complete URLs.
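
A rough sketch of the wget route (assumptions: -r recurses, -l 2 caps the crawl at two levels, -P sets the download directory), mirroring each site into a temporary directory and grepping everything fetched:

#!/bin/bash

# mirror each site two levels deep into a temp directory,
# grep every fetched page for "credit", then clean up
while read -r url
do
    dir=$(mktemp -d)
    wget -q -r -l 2 -P "$dir" "$url"
    grep -r -i credit "$dir" >> domains_output.txt
    rm -rf "$dir"
done < domains.txt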