Solved

Retrieving Information from another site

Posted on 2001-08-06
Last Modified: 2008-02-26
Ok, here is a unique situation.  I am not even sure which programming language would be best to use.

What I need to do is create a script that goes to a web page on another site and scans the HTML for a specific block of information, all contained within <pre> tags. However, there are multiple sets of <pre> tags and I need to pin down just one of them. (The logic is to search for an anchor tag named "lincoln", begin cutting at the start of the next <pre> tag, and stop cutting at the end of the matching </pre> tag.)

After getting this code, I need it pasted into its own file, which will overwrite the previous file.

This script needs to run nightly.

The purpose of this is for player statistics for a local hockey team.  Right now I have to copy and paste the coding by hand and I want to eliminate this maintenance step.  

Any ideas?

ZacSod
Question by:zsoder
28 Comments
 
LVL 2

Expert Comment

by:curri
ID: 6358514
Probably perl would be more appropriate, but php would do :) Actually, maybe even just a shell script :) I'd recommend perl if you know it, but wget and sed would do just fine.

The idea would be to get the page (use fopen or file in php, LWP::Simple in perl, or wget from a shell script) and then use regular expressions to transform the file (ereg_replace in php, the s operator in perl, sed in a shell script).

The syntax for your regular expression varies depending on what you use, but the idea would be to do a regexp with:
<PRE> (not l)* lincoln (not<)* </PRE><PRE>(not<)*</PRE> , making sure that the stuff in the second <PRE> gets stored in a variable, and then just print that variable.
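As a rough illustration in PHP, the idea can be sketched as below. This is only a sketch under a few assumptions: it uses preg_match (the ereg functions were removed in PHP 7), the helper name extract_pre_after_anchor and the sample markup are made up for the example, and the real page may need a different pattern.

```php
<?php
// Hypothetical helper (not from the thread): returns the contents of the first
// <pre>...</pre> block that follows <a name="$anchor">, or null if nothing matches.
function extract_pre_after_anchor($html, $anchor)
{
    $pattern = '~<a\s+name="' . preg_quote($anchor, '~') . '"[^>]*>' // the named anchor
             . '.*?'                                                  // anything up to...
             . '<pre[^>]*>(.*?)</pre>~is';                            // ...the next <pre> block
    if (preg_match($pattern, $html, $matches)) {
        return $matches[1]; // text captured between <pre> and </pre>
    }
    return null;
}

// Made-up sample input standing in for the fetched page.
$sample = '<pre>other team</pre><a name="lincoln"></a><h2>Lincoln</h2>'
        . "<PRE>Player  GP  G\nSmith   10  4</PRE><pre>playoffs</pre>";

echo extract_pre_after_anchor($sample, 'lincoln');
```

The i flag makes the match case-insensitive (so <PRE> and <pre> both work) and the s flag lets the dot span line breaks, which keeps the spacing and newlines inside the block intact.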

Actually, probably the easiest would be to just get a sample, play with sed until you get what you need, and then make a script to call wget to get the file, then sed it redirecting to where you want it. Of course, this assumes unix :)

Orlando
0
 
LVL 2

Expert Comment

by:curri
ID: 6358520
Forgot to mention, after you have the script (in whatever language) you have to use cron or at (in unix) to make it run at a given time. I know there is a similar program in NT, but I can't remember its name.
0
 
LVL 3

Expert Comment

by:izwiz
ID: 6358532
In NT it's the scheduler service controlled with AT.

To be honest I agree, perl/shell script is what I'd use, unless of course you are running NT or something.
0
 
LVL 2

Author Comment

by:zsoder
ID: 6360478
I use Linux...(Microsoft Sucks!, Sorry had to get that off my chest *L*)

wget and sed...are those PHP, Perl, or Unix commands?

Unfortunately you have me completely lost...I am not familiar with the terms/commands you are using...could you clarify?

Thanks,

ZacSod

0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6360629
wget is a *ix app that pretty much allows you grab a remote file. you could also prob just write a small php script like the following:

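<?php
   $sUrl = "http://www.domain.com/thefile.htm";

   // Opening a URL with fopen() requires allow_url_fopen to be enabled in php.ini
   $fp = fopen($sUrl, "r");
   if (!$fp) {
      die("Could not open $sUrl");
   }

   // Read the remote page into $buffer, one line at a time
   $buffer = "";
   while (!feof($fp)) {
      $buffer .= fgets($fp, 4096);
   }
   fclose($fp);

   // Now $buffer contains your file you need to search
   // From here you would use curri's regexp suggestion
?>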

i have not tested this but it looks like it might work.

you could schedule cron to run this file from your webserver, or if you have php compiled as a cgi you could execute this as a shell script.

i agree as perl would prob be a better solution though. but you could def do it in php.

- rumblefiz

   
   
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361230
I want to try and keep the solution in php...

rumblefiz,  thank you, that worked to retrieve the file...now I need to trim it down to the text I need...I am looking into how to do this...any help?

Zack
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361263
do you have a sample of the file you are trying to trim and i will see if i can assist. you can either e-mail it to me at chris@mindpointe.com or post it here. i would post it here in case someone else can help you faster than i can, but if you don't want to make the file public you can e-mail me directly.

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361334
Ok, I am using the following code and got it to work,

$results = ereg_replace("<a name=\"lincoln\">", "test", $buffer);

BUT
how do I modify this to replace <a name="lincoln"> AND everything before it with "test"?

I am fairly new to programming and curri's regexp example confused the dickens out of me...*L*
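For reference, one way this kind of replacement could be written is sketched below. It is an illustration only: it uses preg_replace rather than ereg_replace (the ereg functions were removed in PHP 7), and the helper name replace_through_anchor and the sample string are invented for the example.

```php
<?php
// Hypothetical helper: replaces everything from the start of $html up to and
// including <a name="$anchor"> with $replacement. If the anchor is not found,
// $html is returned unchanged.
function replace_through_anchor($html, $anchor, $replacement)
{
    // ^.*? lazily consumes everything before the first matching anchor;
    // the s flag lets the dot match line breaks, the i flag ignores case.
    $pattern = '~^.*?<a\s+name="' . preg_quote($anchor, '~') . '"[^>]*>~is';

    // Note: "$1"-style sequences in $replacement would be treated as backreferences.
    return preg_replace($pattern, $replacement, $html, 1);
}

// Made-up sample input.
$page = '<html><pre>x</pre><a name="lincoln"></a><pre>stats</pre>';
echo replace_through_anchor($page, 'lincoln', 'test');
```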
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361345
rumblefiz...the page I am working with is at

http://www.lincolnstarshockey.com/statstest.php3

There is a set of stats for the Lincoln Stars...the stats are enclosed in a single set of <pre> tags...basically I need to trim everything before and after the <pre></pre> block. I don't need the titles or anything...just the stats in the <pre></pre> tags.

I really appreciate your help
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361353
to clarify... I only need the stats listed in "Regular Season" not the playoffs

0
 
LVL 2

Author Comment

by:zsoder
ID: 6361358
also, that page displays the content of $buffer from your previous script
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361365
when i click the link above, i get page not found.

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361380
Hmmm...works on my end...have you tried typing it into your browser?

0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361398
ok.  when i click on the link it re-directs to:

http://www.lincolnstarshockey.com/ushl0001.css

so i opened with php code and now i see the data. let me see if i can help..

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361510
Thanks rumblefiz
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361614
do you need the stats for all of the teams? also, is the file format changing? every time i refresh, the page looks different. are you changing the format?

- rumblefiz
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361648
where is the original file located before you do any parsing to it?

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361698
It is probably changing because I am trimming it down...I actually got it down to what I need...

Now I am trying to determine how best to use the script...

Basically the concept is that I do not have to go get the new stats after every game, and the stats on my site update automatically every 24 hours (or so)...
I don't know if I should save the data to a file and set a cron job on the php script, or if I should just put the script in the page itself and let it run every time someone requests the page?
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6361716
you could try to put the script on your webserver and then run it using cron. so say the path to your script is:

http://www.mydomain.com/myscript.php

you should have lynx, or wget, or curl or something on your system to get pages. i would try lynx because it is a text based web browser and should be on your system already. then you need to do something like:

$shell> lynx -dump http://www.mydomain.com/myscript.php > /dev/null

that should run your script without opening lynx's interactive screen (the -dump option prints the page and exits), so all you would need to do is put an entry in your cron file to run the above command. i think that will do the trick. let me know if you need any more help.
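if you would rather not use cron, the other option mentioned (running on page requests) can be limited to roughly one refresh per day by checking the age of the saved file first. the following is only a sketch of that idea: the helper name cached_stats, the cache file name and the stand-in fetch function are all made up for the example, and in real use the fetch function would do the download and trimming.

```php
<?php
// Hypothetical helper: returns the saved text, refreshing it through $fetch when
// the cache file is missing or older than $maxAge seconds.
function cached_stats($cacheFile, $maxAge, callable $fetch)
{
    $isFresh = file_exists($cacheFile)
            && (time() - filemtime($cacheFile)) < $maxAge;

    if (!$isFresh) {
        $text = $fetch();                              // download and trim the page
        file_put_contents($cacheFile, $text, LOCK_EX); // overwrite the old copy
        return $text;
    }
    return file_get_contents($cacheFile);              // reuse the saved copy
}

// Demo with a stand-in fetcher instead of a real download.
$cacheFile = sys_get_temp_dir() . '/stats_cache_demo.txt';
@unlink($cacheFile); // start the demo without an existing cache file
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return "Player  GP  G\nSmith   10  4\n";
};

$first  = cached_stats($cacheFile, 86400, $fetch); // cache missing: calls $fetch
$second = cached_stats($cacheFile, 86400, $fetch); // cache is fresh: read from disk
```

the trade-off is that the first visitor after the file goes stale waits for the remote page to load, which the cron approach avoids.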

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6361789
Sounds like that would work...but how do I save the variable into a text file so I can display the info?

I need to keep all spaces and line breaks

0
 
LVL 1

Accepted Solution

by:
rumblefiz earned 150 total points
ID: 6361834
let's assume that $buffer now contains the trimmed text you want to write to a file:

<?php
   $sFilename = "filetosave.txt";

   // "w" truncates the file, so this will overwrite it if it exists.
   // if you want to append to the file instead, use: fopen($sFilename, "a");
   $fp = fopen($sFilename, "w");
   if (!$fp) {
      die("Could not open $sFilename for writing");
   }
   fwrite($fp, $buffer);
   fclose($fp);
?>

you will just need to chmod the file(s) or dir you want to write to. hope that helps. this will write the data exactly like you have it. so it will include the html tags if any. this way when you reload the textfile into a var and display it on an html page, everything will look the same. if you want it to look normal in a textfile (like through vi or something), you will need to search for <p> and replace with "\r\n\r\n" and replace <br> with the same thing. you would also want to remove all of the </p> tags.
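the tag clean-up described above could be sketched roughly like this. it is only an illustration: the helper name html_to_plain_text and the sample string are made up, and it follows the steps literally (a single "\r\n" for <br> may look closer to what the browser shows).

```php
<?php
// Hypothetical helper following the steps described above: <p> and <br> become
// "\r\n\r\n" and closing </p> tags are removed. Other tags are left alone.
function html_to_plain_text($html)
{
    // Match <p> with or without attributes, but not other tags such as <pre>.
    $text = preg_replace('~<p(\s[^>]*)?>~i', "\r\n\r\n", $html);
    // Match <br>, <br/> and <br />.
    $text = preg_replace('~<br\s*/?>~i', "\r\n\r\n", $text);
    // Drop the closing paragraph tags.
    $text = preg_replace('~</p\s*>~i', '', $text);
    return $text;
}

// Made-up sample input.
$html = '<p>Line one<br>Line two</p><P class="x">Next</p>';
echo html_to_plain_text($html);
```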

- rumblefiz
0
 
LVL 2

Author Comment

by:zsoder
ID: 6362271
rumblefiz,

It is working 100%! Thanks for all of your help. You are awesome bud!

Although when setting the crontab I accidentally wiped out the root crontab and needless to say the boss was a little pissed off...but he got over it! (I never messed with cron jobs till today, so it was a good learning experience)

I bumped up the points to my max allotment (although not much there, I will try and get you some more when I have them). Again, thank you very much...my hat's off to you!

ZacSod
0
 
LVL 1

Expert Comment

by:rumblefiz
ID: 6362303
no problem. don't worry about the points, everyone is here to help each other. i am glad you got it working and let me know if i can be any more help.

- rumblefiz
0
