Solved

extract overture keywords and save to text file line by line

Posted on 2004-04-30
Last Modified: 2013-11-28
Hi there,

I'm looking for a script that can extract Overture keywords and save them line by line to a text file. If possible, it should also extract each of the keyword suggestions into the text file. Can the output be saved in alphanumeric order?

Best regards

Question by:playstat
LVL 9

Expert Comment

by:techtonik
ID: 10962533
I'm not a native English speaker, just a PHP programmer of sorts, so if you provide an example of this Overture, then perhaps I can understand what you need. Otherwise I can't help.
 
 

Author Comment

by:playstat
ID: 10964588
http://inventory.overture.com/d/searchinventory/suggestion/

This is the actual URL. Enter a keyword and the suggestions are displayed. I need something that can extract that information into a text file, line by line. If you look, you'll notice that each result links to a further set of suggestions for that keyword.

A script that can do this would be great!
 

Author Comment

by:playstat
ID: 10964601
If you can extract everything under that keyword, that would be ideal. Please also make sure there are no duplicates. Thanks.
 
LVL 9

Expert Comment

by:techtonik
ID: 10967998
Easy. Here is an example.

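<?php
$your_keyword = "mykey";

// urlencode() keeps keywords with spaces or symbols valid in the query string
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");
if ($htmlpage === false) {
    die("Could not fetch the page");
}

// drop the HTML tags and the &nbsp; entities, leaving plain text
$textpage = strip_tags( $htmlpage );
$textpage = str_replace("&nbsp;","", $textpage);

// keep everything from the results heading onwards
$texttosave = strstr($textpage, "Searches done in");
if ($texttosave === false) {
    die("Results table not found");
}

$f = fopen("file.txt", "w");
fwrite($f, $texttosave);
fclose($f);

?>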

I think you get the idea. Further refinement can be done with the string functions:
http://us2.php.net/manual/en/ref.strings.php
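As one possible refinement with the string functions, the tag-stripped text can be split into lines and the count column dropped, leaving one search term per line. This is only a sketch: `terms_from_text()` is a hypothetical helper, not part of the script above, and it assumes the stripped page puts each count and each term on its own line, as the sample string does. A search term made up of digits only would be dropped along with the counts.

```php
<?php
// Hypothetical helper: reduce the tag-stripped text to one search term per line.
function terms_from_text(string $text): array
{
    $terms = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and the count column (lines made of digits only).
        if ($line === '' || ctype_digit($line)) {
            continue;
        }
        // Skip the table heading lines.
        if ($line === 'Count' || $line === 'Search Term'
            || strpos($line, 'Searches done in') === 0) {
            continue;
        }
        $terms[] = $line;
    }
    return $terms;
}

// Assumed shape of $texttosave after strip_tags() and the &nbsp; removal.
$sample = "Searches done in March 2004\n  Count\n  Search Term\n"
        . "24306\noverture\n\n2733\n1812 overture\n";

// One term per line, without the numbers.
echo implode("\n", terms_from_text($sample)), "\n";
```

To save instead of print, the same joined string can be passed to fwrite() in place of $texttosave.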
 

Author Comment

by:playstat
ID: 10988778
Can you show me how to refine it further, and how to take out the numbers it writes to the file?

If possible, I'd also like the output line by line.

The other thing: it extracts the actual keywords, but what about going one level deeper for each href, extracting those keywords too, and then removing duplicates?
 
LVL 9

Expert Comment

by:techtonik
ID: 10989275
There are two ways to refine it: regular expressions, or PHP string functions. Regexps are more convenient in many cases but take more effort to learn.
Falling back on the EE rules http://www.experts-exchange.com/Web/Web_Languages/PHP/help.jsp#hi56 I'd rather not write all the code for you. =) I also don't know how much background you need. If you are unsure how this script works or can't see how to improve it, please show exactly what you do not understand.
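For comparison, here is roughly what the regular-expression route could look like. It is a sketch, not a tested scraper: `parse_rows_with_regex()` is a hypothetical name, and the pattern assumes each result row has a count cell of the form `&nbsp;NNN</td>` followed directly by the search-term cell in a plain `<td>`.

```php
<?php
// Hypothetical helper: pull "count" and "search term" pairs out of the table rows.
function parse_rows_with_regex(string $html): array
{
    $counts = array();
    // Group 1: digits after &nbsp; in the count cell. Group 2: content of the next cell.
    $pattern = '~&nbsp;(\d+)</td>\s*<td>(.*?)</td>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Remove the markup and the &nbsp; entity to get the bare term.
        $term = trim(str_replace('&nbsp;', '', strip_tags($m[2])));
        if ($term !== '') {
            $counts[$term] = (int) $m[1];
        }
    }
    return $counts;
}

// Assumed row markup for the results table.
$sample = <<<'HTML'
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
HTML;

print_r(parse_rows_with_regex($sample));
```

The trade-off is the usual one: the pattern is short, but it breaks silently if the page layout changes, so it is worth echoing the matches while developing it.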
 

Author Comment

by:playstat
ID: 11006771
I'm trying to understand how the script extracts the information from the page.

For example:

How does it know where to start extracting and where to stop?

How does the filtering take place, etc.?

If you can give me several examples of, say, HTML or PHP pages with the appropriate filter for each, maybe I can work out the rest. I'm after the general technique, so that I can use variations of it in other applications.

Could you make the text file output save line by line, without the numbers? I would be most grateful.

Best regards
 
LVL 9

Accepted Solution

by:
techtonik earned 500 total points
ID: 11016217
OK, here we go. While building the filters, echo your intermediate results to see what you have at each step.
<?php
// specify the keyword to substitute into the URL when fetching the html results page
$your_keyword = "mykey";

// now read the whole page, with all its html markup, into a string - note the
// use of $your_keyword defined above; urlencode() keeps spaces and symbols URL-safe
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");
if ($htmlpage === false) {
  die("Could not fetch the page");
}

// filter section
// this differs a bit from the previous example, where I just stripped the html tags
// here we crop the text so that it contains only the results table
$htmlpage = strstr($htmlpage, "Searches done in");
// from the manual: string strstr ( string haystack, string needle )
// strstr returns the part of the haystack string from the first occurrence of needle to
// the end of haystack, or false if needle is not found - see php.net/strstr

// the $htmlpage variable now contains the following fragment
/*
Searches done in March 2004</font></th>
  </tr>
  <tr align=left bgcolor=#999999>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Count</font></th>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Search Term</font></th>
  </tr>
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;4266</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=080101%20ctxtid%20ilgan%2Ejoins%2Ecom%20ilgan%2Eshtml%20overture%20sports&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>080101 ctxtid ilgan.joins.com ilgan.shtml overture sports</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;1898</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=international%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>international overture</a></td>
</tr>
*/

// now, if you strip all html markup and &nbsp; entities, you will end up with the output from
// the previous example, but here we go a little further and build a more sophisticated filter
// we extract the fields "count" and "search term" from the html table into an array, where
// the search term is the key and the count is the value associated with that key
// additionally we extract the suggestion links into a second array, to be
// able to parse those pages too

// now look at the html markup
// each html row begins with a <tr> tag, so we split the string on this tag to get an
// array of html rows for further processing, but first we need to strip the table header,
// that is, everything up to the first <tr bgcolor=#333333>
// since the value of bgcolor is not guaranteed to be #333333, we match only the first part
$htmlpage = strstr($htmlpage, "<tr bgcolor");
// the header rows begin with <tr align=left, so they do not match and hence are stripped
// you can check what you've got with the following construction
// echo $htmlpage; die();

// next, split the string with explode - see php.net/explode
$htmlrowsarr = explode("<tr", $htmlpage);
// test your result with print_r($htmlrowsarr);

// now fill the arrays
$overtures = array();
$overlinks = array();
for ($i = 1; $i < count($htmlrowsarr); $i++) {
// the count starts right after the first &nbsp; and runs to the next closing tag </td>
// crop the row so that it starts at &nbsp;
  $str = strstr($htmlrowsarr[$i], "&nbsp;");
// skip rows without &nbsp; (for example markup that follows the results table)
  if ($str === false) {
     continue;
  }
// determine the position of </td>
  $to = strpos($str, "</td>");
// get the value for the first array - the substring from the 7th character (skipping the
// 6 characters of &nbsp;) to the position of the closing </td> tag
// indexes are numbered from zero, so the 7th character has index 6
  $value = substr($str,6,$to-6);
// $to-6 is the number of characters we need to extract
// echo "$value.";

// the next &nbsp; precedes our search term or a link to another suggestion, so
// search the string for &nbsp; with a preceding > to match only the second &nbsp;
  $str = strstr($str, ">&nbsp;");
  if ($str === false) {
     continue;
  }
// determine where the cell ends
  $to = strpos($str, "</td>");
// get the cell content (the link)
  $link = substr($str,7,$to-7);
// strip the html markup to get the key
  $key = strip_tags( $link );
// extract the actual URL from the href attribute: the substring from the character
// after href's opening quote up to the next quote
// the first row is our own search term, so it has no link and gets an empty URL
  $from = strpos($link, "href=");
  if ($from !== false) {
     $from += 6;
     $url = substr($link, $from, strpos($link, "\"", $from) - $from);
  } else {
     $url = "";
  }

// now build the arrays
  $overtures[$key] = $value;
  $overlinks[$key] = $url;
}

print_r($overtures);
// now that you have this info - do what you want with it =)

// you do not actually need to extract the links - just supply $key as the parameter
// $your_keyword at the beginning of this script and it will fetch that page for parsing
?>

You can make a function from this example.
Download the manual and... good luck. =)
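For the original request (no numbers, no duplicates, sorted, one keyword per line), the keys of $overtures are all that is needed. The sketch below assumes that shape of array. `unique_sorted_terms()` and `save_terms()` are hypothetical helpers, "keywords.txt" is an arbitrary file name, and the second-level values are invented purely to illustrate the merge.

```php
<?php
// Hypothetical helper: trim, de-duplicate and naturally sort a list of terms.
function unique_sorted_terms(array $terms): array
{
    $clean = array();
    foreach ($terms as $term) {
        // PHP turns purely numeric array keys into ints, so cast back to string.
        $term = trim((string) $term);
        if ($term !== '') {
            $clean[] = $term;
        }
    }
    $clean = array_unique($clean);
    // Natural, case-insensitive order: "9 lives" sorts before "10 things".
    sort($clean, SORT_NATURAL | SORT_FLAG_CASE);
    return $clean;
}

// Hypothetical helper: write one term per line; returns false if the write fails.
function save_terms(string $path, array $terms): bool
{
    return file_put_contents($path, implode("\n", $terms) . "\n") !== false;
}

// Stand-in for the array the script above builds for the first-level page.
$overtures = array('overture' => '24306', '1812 overture' => '2733');
// Stand-in (made-up values) for the result of running the same parsing with
// one of the keys passed back in as $your_keyword.
$second_level = array('1812 overture' => '2733', '1812 overture mp3' => '120');

$terms = unique_sorted_terms(
    array_merge(array_keys($overtures), array_keys($second_level))
);
save_terms('keywords.txt', $terms);
```

Merging the keys from every fetched page before calling unique_sorted_terms() is what removes duplicates across levels. Note that array_unique() is case-sensitive, so lower-case the terms first if "Overture" and "overture" should count as the same keyword.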
