Solved

Google Keyword Research

Posted on 2011-03-08
Medium Priority
435 Views
Last Modified: 2012-05-11
Hello,

I was looking at Google's keyword tool earlier and noticed that if you input a website, the tool generates a list of keywords.

https://adwords.google.com/select/KeywordToolExternal

I have a large list of domains and thought it would be easier to accomplish this with 'wget', passing a list of domains as an argument and then printing one word before "credit" and one word after it, to generate a list.

I came across this online:

wget -m -i input_file (domains)

Is there a better way of doing this without having to download each site locally? Also, what would be the command to grep for "credit" and output the surrounding words?

Thanks
Question by:faithless1
11 Comments
 
LVL 12

Expert Comment

by:mwochnick
ID: 35078635
Here's a great article on grep with examples: http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/ I'm still thinking about the other part of your question.
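For the "one word before and one after" part, something along these lines might work (just a sketch, assuming GNU grep and that plain word characters surround "credit"; http://www.example.com is a placeholder URL):

# dump a page to stdout quietly and print only the "word credit word" matches
wget -q -O - "http://www.example.com" | grep -oiE '[[:alnum:]]+ +credit +[[:alnum:]]+'

Here -o prints only the matching part, -i ignores case, and -E enables the extended regex.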
 
LVL 12

Expert Comment

by:mwochnick
ID: 35078667
You could try using curl to post the form for each website, but I'm not sure you can get past the CAPTCHA part of the form.
 
LVL 31

Expert Comment

by:farzanj
ID: 35082382
So you have one URL per line in your file, right?

This should do something close to what you want.
You need lynx installed on your system.
 
urls=$(<input_file)
for url in $urls
do
    lynx -dump | grep -i credit >> output_file
done


 

Author Comment

by:faithless1
ID: 35088180
Thanks.

If possible, can you provide usage instructions, since I'm fairly new to this?

Do I place this into a file with a .php extension and then run 'php script.php'? Thank you


urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done
 
LVL 31

Expert Comment

by:farzanj
ID: 35088220
This is bash.
You can paste or type it at the shell prompt if your default shell is bash.
Or you can paste it into a file, script.sh.

 
#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done

 

Author Comment

by:faithless1
ID: 35089809
Thanks,

I'm getting an empty output file.

Here's the script I'm using (script.sh):

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    links -dump | grep -i credit >> domains_output.txt
done


domains.txt includes:

http://www.site.com

Command:
sh script.sh

I can't seem to figure out why there is no output.

Thanks
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 2000 total points
ID: 35089973
You were not supposed to run it like this.

You should have done this:

#make it executable--just once
chmod +x script.sh

#run it
./script.sh


Second, try a domain where you know you should find "credit" and run this directly:
lynx -dump | grep -i credit

Also note that I am using lynx, while you are using links.

If you don't have lynx, install it, or use curl.
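If you end up going the curl route, a roughly equivalent loop might look like this (just a sketch on my part; -s silences progress output, -L follows redirects, and note that the URL has to be passed to curl explicitly):

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    # fetch each page quietly, follow redirects, keep lines mentioning "credit"
    curl -sL "$url" | grep -i credit >> domains_output.txt
done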
 

Author Comment

by:faithless1
ID: 35090710
I just tried running it, but there is still no output.

#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump | grep -i credit >> domains_output.txt
done

I did test lynx on the command line and it worked fine:
lynx -dump | grep -i credit
 
LVL 31

Accepted Solution

by:
farzanj earned 2000 total points
ID: 35093779
Try this:
#!/bin/bash

urls=$(<domains.txt)
for url in $urls
do
    lynx -dump $url | grep -i credit >> domains_output.txt
done

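If domains.txt might contain blank lines or stray whitespace, a slightly more defensive sketch of the same loop reads the file line by line and quotes the variable:

#!/bin/bash

while read -r url
do
    [ -z "$url" ] && continue   # skip empty lines
    lynx -dump "$url" | grep -i credit >> domains_output.txt
done < domains.txt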
 

Author Comment

by:faithless1
ID: 35178451
Thanks, that worked! One other question: how would I include the entire site and not just the home page (it's currently only checking the index page)? Thank you.
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 2000 total points
ID: 35180037
Please read the following docs.

http://daniel.haxx.se/docs/curl-vs-wget.html
http://williamjxj.wordpress.com/2010/12/17/curl-vs-wget-vs-lynx/
http://linux.die.net/man/1/lynx

You can try wget instead of lynx. It downloads recursively, except where the web host restricts it.
I think the easiest way to do it would be to know your complete URLs.
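As a rough sketch of the wget approach (the flags here are my own assumptions, not something given in this thread): mirror each site into a scratch directory with a depth limit, then grep through whatever was downloaded.

#!/bin/bash

while read -r url
do
    dir=$(mktemp -d)
    # -r recurse, -l 2 limit depth, -q quiet, -P save under the scratch directory
    wget -r -l 2 -q -P "$dir" "$url"
    # search every downloaded file, case-insensitively
    grep -r -i credit "$dir" >> domains_output.txt
    rm -rf "$dir"
done < domains.txt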
