Solved

Google Keyword Research

Posted on 2011-03-08
434 Views
Last Modified: 2012-05-11
Hello,

I was looking at Google's keyword tool earlier and noticed that if you input a website, the tool generates a list of keywords.

https://adwords.google.com/select/KeywordToolExternal

I have a large list of domains and thought it would be easier to accomplish this with 'wget' by passing a list of domains as an argument and printing one word before "credit" and one word after it to generate a list.

I came across this online:

wget -m -i input_file (domains)

Is there a better way of doing this without having to download each site locally? Also, what would be the command to grep for "credit" and output the surrounding words?

Thanks
Question by:faithless1
11 Comments
 
LVL 12

Expert Comment

by:mwochnick
ID: 35078635
Here's a great article on grep with examples: http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
Still thinking about the other part of your question.
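For the "one word before and one after" part of your question, a minimal grep sketch (page.txt is just a placeholder for whatever page text you are searching; -o prints only the matched text, -i ignores case, -E enables extended regular expressions):

# page.txt is a placeholder file name, for illustration only
grep -oiE '[[:alnum:]]+ credit [[:alnum:]]+' page.txt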
 
LVL 12

Expert Comment

by:mwochnick
ID: 35078667
You could attempt to use curl to post the form for each website, but I'm not sure you can get past the CAPTCHA part of the form.
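As a rough sketch of the idea only (the form field name "website" is an assumption for illustration, not the keyword tool's actual field, and the CAPTCHA would still block automated use):

# "website" is a hypothetical form field name, for illustration only
curl -s -d "website=http://www.example.com" https://adwords.google.com/select/KeywordToolExternal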
 
LVL 31

Expert Comment

by:farzanj
ID: 35082382
So you have one URL per line in your file, right?

This should do something close to what you want.
You need lynx installed on your system.
 
urls=$(<input_file)
for url in $urls
do
    lynx -dump | grep -i credit >> output_file
done


Author Comment

by:faithless1
ID: 35088180
Thanks.

If possible, can you provide usage instructions, since I'm fairly new to this?

Do I place this into a file and name it .php and then run 'php script.php'? Thank you.


urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done
 
LVL 31

Expert Comment

by:farzanj
ID: 35088220
This is bash.
You can paste or type it at the shell, if your default shell is bash.
Or you can paste it into a file, script.sh.

 
#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done

 

Author Comment

by:faithless1
ID: 35089809
Thanks,

I'm getting an empty output file.

Here's the script I'm using (script.sh):

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    links -dump | grep -i credit >> domains_output.txt
done


domains.txt includes:

http://www.site.com

Command:
sh script.sh

I can't seem to figure out why there is no output.

Thanks
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 500 total points
ID: 35089973
You were not supposed to run it like this.

You should have done this:

#make it executable--just once
chmod +x script.sh

#run it
./script.sh


Second, try some domain where you know you should find "credit", and do this:
lynx -dump | grep -i credit


Also, I am using lynx; you are using links.

If you don't have lynx, install it, or use curl.
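A one-off check with curl would look something like this (example.com is a placeholder domain; curl prints the raw HTML rather than lynx's rendered text, which is still enough for a simple grep):

# example.com is a placeholder; substitute a site you expect to mention "credit"
curl -s http://www.example.com | grep -i credit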
 

Author Comment

by:faithless1
ID: 35090710
I just tried running it, but there's still no output.

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done

I did test lynx on the command line and it worked fine:
lynx -dump | grep -i credit
 
LVL 31

Accepted Solution

by:farzanj
farzanj earned 500 total points
ID: 35093779
Try this:
#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump $url | grep -i credit >> domains_output.txt
done
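A slightly more defensive sketch of the same loop (it reads domains.txt line by line, skips blank lines, and quotes the URL in case it contains shell-special characters):

#!/bin/bash
# Same logic as the accepted script above, just a more defensive sketch:
# read the file line by line, skip blanks, and quote the URL.
while read -r url
do
    [ -z "$url" ] && continue
    lynx -dump "$url" | grep -i credit >> domains_output.txt
done < domains.txt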

 

Author Comment

by:faithless1
ID: 35178451
Thanks, that worked! One other question I have is how I would include the entire site and not just the home page (it's currently only including the index page). Thank you.
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 500 total points
ID: 35180037
Please read the following docs.

http://daniel.haxx.se/docs/curl-vs-wget.html
http://williamjxj.wordpress.com/2010/12/17/curl-vs-wget-vs-lynx/
http://linux.die.net/man/1/lynx

You can try wget instead of lynx. That would download recursively, except where wget is restricted by the web host.
I think the easiest way to do it would be to know your complete URLs.
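A rough wget-based sketch of that approach (the depth of 2 and the crawl/ directory are arbitrary choices for illustration; recursive downloads can be slow and may be blocked by robots.txt):

#!/bin/bash
# Sketch: mirror each site to depth 2 into crawl/, then grep every downloaded file
while read -r url
do
    [ -z "$url" ] && continue
    wget -q -r -l 2 -P crawl "$url"
done < domains.txt

grep -ri credit crawl >> domains_output.txt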