Collecting medical symptoms from Google Groups via RSS

Hi, I have two tasks I am trying to accomplish. Both involve what I guess is referred to as scraping.
First, I want to collect data from Google Groups related to illnesses. There are many groups where people discuss their illnesses, and I want to build a database of the words used in these groups. By a database I only mean a spreadsheet where the columns are the words (every word appearing in a thread would be a column heading) and each row is a separate thread (or posting). I was told that using RSS would be a good idea because all Google Groups have RSS feeds.
Can anyone give me a roadmap for how to go about doing this?
Thanks so much.

Just use cURL to connect to the URL you want and read the response into a variable. Then you can search it for any keywords or information you want.

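            $url = "";
            $cookie_jar = "/path/to/cookie.txt";
            $ch = curl_init($url);
            // Include the response headers in the output; set to 0 to get only the body.
            curl_setopt($ch, CURLOPT_HEADER, 1);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
            curl_setopt($ch, CURLOPT_VERBOSE, 1);
            curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
            curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);  // cookies are saved here
            curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_jar); // and sent back from here
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
            curl_setopt($ch, CURLOPT_REFERER, "");
            // Run the request; $result holds the response as a string, or false on failure.
            $result = curl_exec($ch);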

Now the response from the URL you requested is stored in $result. Just use preg_match, strpos or any other method to search that variable for the information you want. Hope that helps :)
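
Since the goal is an RSS feed rather than an HTML page, the response can also be parsed as XML instead of searched as raw text. Here is a minimal sketch, assuming the feed is RSS 2.0 and that the string holds only the response body (CURLOPT_HEADER set to 0). The sample feed and the function name feed_items_to_text are made up for illustration; a real Google Groups feed may be Atom or laid out differently.

```php
<?php
// Hypothetical sample feed; a real one would be the body returned by curl_exec().
$sampleFeed = <<<XML
<rss version="2.0">
  <channel>
    <title>Example illness group</title>
    <item>
      <title>Migraine triggers</title>
      <description>&lt;p&gt;Bright light gives me a headache.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Sleep and headaches</title>
      <description>Poor sleep makes the headache worse.</description>
    </item>
  </channel>
</rss>
XML;

// Return one plain-text string (title plus description) per <item> of an RSS 2.0 feed.
function feed_items_to_text(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return []; // not well-formed XML
    }
    $posts = [];
    foreach ($feed->channel->item as $item) {
        $raw = (string) $item->title . ' ' . (string) $item->description;
        // Descriptions often carry HTML: strip the tags, then decode leftover entities.
        $posts[] = trim(html_entity_decode(strip_tags($raw), ENT_QUOTES, 'UTF-8'));
    }
    return $posts;
}

$posts = feed_items_to_text($sampleFeed);
print_r($posts);
```

Each element of $posts is then the text of one posting, which is the unit the spreadsheet rows are meant to represent.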
onyourmark (Author) commented:

Is that Python? Also, what if I am not looking for any particular word but want to collect the entire feed into a database, word for word? Could you say how to modify it, or would I just skip preg_match, strpos and the other search methods?
Thanks again.
That's cURL, used from PHP. When you drop that code in your PHP page, you basically get all the HTML from the page into the $result variable in that example. With cURL you can POST variables to a page, gather results and do just about anything a user with a browser could. For the cookie jar, just put a blank file in the web directory of your site or in a folder like /cookies/ and make the file readable and writable. Then if the server tries to set cookies, cURL will save them there and pass them back to the server on later requests. The follow-location option tells cURL whether to follow redirects or to return only the response from the exact page you requested.

For a group of words, you could put them in an array and then loop through it using the matching method of your choice. Maybe something like:

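foreach ($keywords as $keyword) {
    // strpos() takes the text first, then the word; use !== false because a match at position 0 is falsy
    $found = strpos($result, $keyword) !== false;
}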

Then just adapt what you do with the results to what you need :)
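
For the case where no particular keyword is wanted and every word should become a column, here is a rough sketch of the spreadsheet described in the question: one row per posting, one column per distinct word, each cell a count. The names tokenize and build_word_table and the two sample posts are invented for illustration, and the tokenizer only handles ASCII letters, which is a simplifying assumption.

```php
<?php
// Split text into lowercase words (ASCII letters only; a simplifying assumption).
function tokenize(string $text): array
{
    return preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Build a word-count table: a header row, then one row per post.
function build_word_table(array $posts): array
{
    $counts = [];
    $vocabulary = [];
    foreach (array_values($posts) as $i => $post) {
        $counts[$i] = array_count_values(tokenize($post));
        $vocabulary += $counts[$i]; // array union keeps every word seen so far as a key
    }
    $words = array_keys($vocabulary);
    sort($words);

    $table = [array_merge(['post'], $words)]; // header: "post", then every word
    foreach ($counts as $i => $postCounts) {
        $row = [$i + 1]; // 1-based post number
        foreach ($words as $word) {
            $row[] = $postCounts[$word] ?? 0; // 0 when the word is absent from this post
        }
        $table[] = $row;
    }
    return $table;
}

// Hypothetical sample posts; real ones would come from the feed.
$posts = [
    'Bright light gives me a headache.',
    'Poor sleep makes the headache worse, and light hurts.',
];
$table = build_word_table($posts);

// Write the table as CSV, which any spreadsheet program can open.
$out = fopen('php://output', 'w');
foreach ($table as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

With many threads the number of columns grows quickly, so it may be worth dropping very common words or very rare ones before writing the file.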
