Solved

Retrieving Information from another site

Posted on 2001-08-06
Last Modified: 2008-02-26
Ok, here is a unique situation.  I am not even sure which programming language would be best to use.

What I need to do is create a script that goes to a web page on another site and scans through the HTML for a specific block of information (all contained within <PRE> tags). However, there are multiple sets of <pre> tags and I need to pin down just one of them. (The logic is to search for an anchor tag named "lincoln", begin the cut at the start of the next <pre> tag, end the cut at the following </pre> tag, and then paste this code into its own file.)

After getting this code, I need it pasted into its own file, which will overwrite the previous file.

This script needs to run nightly.

The purpose of this is for player statistics for a local hockey team.  Right now I have to copy and paste the coding by hand and I want to eliminate this maintenance step.  

Any ideas?

ZacSod
Question by:zsoder
28 Comments
 
LVL 2

Expert Comment

by:curri
ID: 6358514
Probably perl would be more appropriate, but php would do :) Actually, maybe even just a shell script :) I'd recommend perl if you know it, but wget and sed would do just fine.

The idea would be to get the page (use fopen or file in php, LWP::Simple in perl, or wget from a shell script) and then use regular expressions to transform the file (ereg_replace in php, the s operator in perl, sed in a shell script).

The syntax for your regular expression varies depending on what you use, but the idea would be to do a regexp with:
<PRE> (not l)* lincoln (not<)* </PRE><PRE>(not<)*</PRE> , making sure that the stuff in the second <PRE> gets stored in a variable, and then just print that variable.
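
The regexp idea above could be sketched in PHP roughly as follows. This is only a sketch under a few assumptions: it uses preg_match (the PCRE functions) rather than the ereg family, it assumes PHP 7.1 or later for the type hints, and the function name extract_pre_after_anchor is made up for illustration. It looks for an anchor whose name attribute equals the given value and captures the contents of the first <pre> block that follows it.

```php
<?php
// Hypothetical helper (not from the thread): returns the text inside the
// first <pre>...</pre> block that follows <a name="$anchor">, or null when
// the anchor or the block cannot be found.
function extract_pre_after_anchor(string $html, string $anchor): ?string
{
    // "i" makes the match case-insensitive (<PRE> vs <pre>);
    // "s" lets "." match newlines so the block may span many lines.
    $pattern = '~<a\b[^>]*\bname\s*=\s*(["\']?)'
             . preg_quote($anchor, '~')
             . '\1(?=[\s>])[^>]*>.*?<pre\b[^>]*>(.*?)</pre>~is';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[2]; // group 1 is the optional quote, group 2 the block
    }
    return null;
}

// Usage, assuming $buffer already holds the downloaded page:
// $stats = extract_pre_after_anchor($buffer, 'lincoln');
```

Both quantifiers are lazy (.*?), so the match stops at the first <pre> after the anchor and at that block's own </pre>, which is what keeps a later playoffs block out of the result.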

Actually, probably the easiest would be to just get a sample, play with sed until you get what you need, and then make a script to call wget to get the file, then sed it redirecting to where you want it. Of course, this assumes unix :)

Orlando
0
 
LVL 2

Expert Comment

by:curri
ID: 6358520
Forgot to mention, after you have the script (in whatever language) you have to use cron or at (in unix) to make it run at a given time. I know there is a similar program in NT, but I can't remember its name.
0
 
LVL 3

Expert Comment

by:izwiz
ID: 6358532
In NT it's the scheduler service controlled with AT.

To be honest I agree, perl/shell script is what I'd use, unless of course you are running NT or something.
0
 
LVL 2

Author Comment

by:zsoder
ID: 6360478
I use Linux...(Microsoft Sucks!, Sorry had to get that off my chest *L*)

wget and sed...are those PHP, Perl, or Unix commands?

Unfortunately you have me completely lost...I am not familiar with the terms/commands you are using...could you clarify?

Thanks,

ZacSod

0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6360629
wget is a *nix app that pretty much allows you to grab a remote file. you could also prob just write a small php script like the following:

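<?php
   $sUrl = "http://www.domain.com/thefile.htm";
   $buffer = "";   // start empty so .= below does not hit an undefined variable
   $fp = fopen($sUrl, "r");   // remote URLs need allow_url_fopen enabled
   if (!$fp) {
      die("could not open $sUrl");
   }
   while (!feof($fp)) {
      $buffer .= fgets($fp, 4096);
   }
   fclose($fp);

   // Now $buffer contains your file you need to search
   // From here you would use curri's regexp suggestion
?>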

i have not tested this but it looks like it might work.

you could schedule cron to run this file from your webserver, or if you have php compiled as a cgi you could execute this as a shell script.

i agree as perl would prob be a better solution though. but you could def do it in php.

- rumblefiz

   
   
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361230
I want to try and keep the solution in php...

rumblefiz,  thank you, that worked to retrieve the file...now I need to trim it down to the text I need...I am looking into how to do this...any help?

Zack
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361263
do you have a sample of the file you are trying to trim and i will see if i can assist. you can either e-mail it to me at chris@mindpointe.com or post it here. i would post it here in case someone else can help you faster than i can, but if you don't want to make the file public you can e-mail me directly.

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361334
Ok, I am using the following code and got it to work,

$results = ereg_replace("<a name=\"lincoln\">", "test", $buffer);

BUT
how do I modify this to replace <a name="lincoln"> AND everything before it with "test"?

I am fairly new to programming and curri's regexp example confused the dickens out of me...*L*
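
One way to replace the anchor and everything before it, sketched without any regular expression: find where the anchor tag sits in the string and keep only what comes after it. The helper name cut_through_anchor and the sample $buffer are invented for illustration, and the type hints assume PHP 7 or later. Note that the ereg functions were removed in PHP 7.0, so on a current PHP the preg functions or plain string functions like these are the ones to reach for.

```php
<?php
// Hypothetical helper (not from the thread): drops $anchorTag and everything
// before it. Returns the input unchanged when the tag is not present.
function cut_through_anchor(string $buffer, string $anchorTag): string
{
    $pos = stripos($buffer, $anchorTag); // case-insensitive search
    if ($pos === false) {
        return $buffer;
    }
    return substr($buffer, $pos + strlen($anchorTag));
}

// Made-up sample input standing in for the downloaded page.
$buffer  = '<html><body>intro<a name="lincoln"></a><pre>stats</pre>';
$results = 'test' . cut_through_anchor($buffer, '<a name="lincoln">');
```

A regexp version of the same idea would be preg_replace('~^.*?<a name="lincoln">~is', 'test', $buffer), where the s modifier lets the dot run across line breaks.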
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361345
rumblefiz...the page I am working with is at

http://www.lincolnstarshockey.com/statstest.php3

There is a set of stats for the Lincoln Stars...the stats are enclosed in a single set of <pre> tags...basically I need to trim everything before and after the <pre></pre> block. I don't need the titles or anything...just the stats in the <pre></pre> tags.

I really appreciate your help
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361353
to clarify... I only need the stats listed in "Regular Season" not the playoffs

0
 
LVL 2

Author Comment

by:zsoder
ID: 6361358
also, that page displays the content of $buffer from your previous script
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361365
when i click the link above, i get page not found.

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361380
Hmmm...works on my end...have you tried typing it into your browser?

0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361398
ok.  when i click on the link it re-directs to:

http://www.lincolnstarshockey.com/ushl0001.css

so i opened with php code and now i see the data. let me see if i can help..

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361510
Thanks rumblefiz
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361614
do you need the stats for all of the teams? also, is the file format changing? every time i refresh, the page looks different. are you changing the format?

- rumblefiz
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361648
where is the original file located before you do any parsing to it?

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361698
It is probably changing because I am trimming it down...I actually got it down to what I need...

Now I am trying to determine how best to use the script...

Basically the concept is that I do not have to go get the new stats after every game; the script auto-updates the stats on my site every 24 hours (or so)...
I don't know if I should save the data to a file and set a cron job on the php script, or if I should just put the script in the page itself and let it run every time someone requests the page?
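
A possible middle ground between those two options, offered only as a sketch: keep the saved file, but let the page itself refresh it when it is older than a day. The helper name is_stale and the file name stats.txt are placeholders invented here, and the nullable type hint assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): true when $path is missing or
// was last modified more than $maxAgeSeconds ago. $now is injectable so the
// check can be exercised without waiting.
function is_stale(string $path, int $maxAgeSeconds, ?int $now = null): bool
{
    if (!is_file($path)) {
        return true;
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) > $maxAgeSeconds;
}

// Usage in the page that shows the stats (stats.txt is a placeholder name):
// if (is_stale('stats.txt', 24 * 60 * 60)) {
//     ... fetch the remote page, trim it, and rewrite stats.txt ...
// }
```

The trade-off is that the first visitor after the file goes stale waits for the remote fetch, whereas a cron job does that work when nobody is waiting.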
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361716
you could try to put the script on your webserver and then run it using cron. so say the path to your script is:

http://www.mydomain.com/myscript.php

you should have lynx, or wget, or curl or something on your system to get pages. i would try lynx because it is a text based web browser and should be on your system already. then you need to do something like:

$shell> lynx -dump http://www.mydomain.com/myscript.php > /dev/null

the -dump flag makes lynx fetch the page, print it and exit instead of opening its interactive screen, which is what you want when nobody is at the keyboard. that should run your script, so all you would need to do is put an entry in your cron file to run the above command. i think that will do the trick. let me know if you need any more help.

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361789
Sounds like that would work...but how do I save the variable into a text file so I can display the info?

I need to keep all spaces and line breaks

0
 
LVL 1

Accepted Solution

by:
rumblefiz earned 150 total points
ID: 6361834
let's assume that $buffer now contains the trimmed text you want to write to a file:

<?php
   $sFilename = "filetosave.txt";
   $fp = fopen($sFilename, "w");   // "w" truncates, so the old file is overwritten
   if (!$fp) {
      die("could not open $sFilename for writing");
   }
   fwrite($fp, $buffer);
   fclose($fp);

   // if you want to append to the file instead, use:
   // fopen($sFilename, "a");
?>

you will just need to chmod the file(s) or dir you want to write to. hope that helps. this will write the data exactly like you have it. so it will include the html tags if any. this way when you reload the textfile into a var and display it on an html page, everything will look the same. if you want it to look normal in a textfile (like through vi or something), you will need to search for <p> and replace with "\r\n\r\n" and replace <br> with the same thing. you would also want to remove all of the </p> tags.
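
For the save-then-display round trip, a shorter variant might look like the sketch below. It assumes the trimmed $buffer is plain text taken from inside the <pre> block (no markup you want rendered) and PHP 7 or later; the helper names save_stats and render_stats are made up for illustration.

```php
<?php
// Hypothetical helpers (not from the thread).

// Overwrites $path with $buffer; returns false when the write fails.
function save_stats(string $path, string $buffer): bool
{
    // LOCK_EX takes an exclusive lock on the file while it is being written.
    return file_put_contents($path, $buffer, LOCK_EX) !== false;
}

// Returns the saved text wrapped in <pre> so spaces and line breaks survive.
function render_stats(string $path): string
{
    if (!is_readable($path)) {
        return '<pre>Stats are not available.</pre>';
    }
    // htmlspecialchars() escapes any stray <, > or & so the saved text
    // is shown literally instead of being parsed as markup.
    return '<pre>' . htmlspecialchars(file_get_contents($path)) . '</pre>';
}

// Usage, with a placeholder file name:
// save_stats('filetosave.txt', $buffer);
// echo render_stats('filetosave.txt');
```

If the saved text is meant to keep its own HTML tags, the htmlspecialchars() call would have to go, since it shows tags as literal text.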

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6362271
rumblefiz,

It is working 100%! Thanks for all of your help. You are awesome bud!

Although when setting the crontab I accidentally wiped out the root crontab and needless to say the boss was a little pissed off...but he got over it! (I never messed with cron jobs till today, so it was a good learning experience)

I bumped up the points to my max allotment (although not much there, I will try and get you some more when I have them). Again, thank you very much...my hat's off to you!

ZacSod
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6362303
no problem. don't worry about the points, everyone is here to help each other. i am glad you got it working and let me know if i can be any more help.

- rumblefiz
0
