Solved

extract overture keywords and save to text file line by line

Posted on 2004-04-30
Last Modified: 2013-11-28
Hi there,

I'm looking for a script that can extract Overture keywords and save them line by line to a text file. If possible, it should also extract each of the suggested keyword variations into the text file. Can the output be saved in alphanumeric order?

best regards

Question by:playstat

Expert Comment

by:techtonik
ID: 10962535
I'm not a native English speaker, but I am a PHP programmer of sorts, so if you provide an example of this Overture, then perhaps I could understand you. Otherwise I just can't help.

Author Comment

by:playstat
ID: 10964588
http://inventory.overture.com/d/searchinventory/suggestion/

This is the actual URL. Enter a keyword and the possibilities are displayed. I need something that can extract that info into a text file line by line. You'll notice that the results themselves link to another set of possibilities for each keyword.

A script that can do this would be great!

Author Comment

by:playstat
ID: 10964601
If you can extract everything listed under that keyword, that would be ideal. Please make sure there are no duplicates. Thanks.

Expert Comment

by:techtonik
ID: 10967998
Easy. Here is an example.

<?php
$your_keyword = "mykey";

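// rawurlencode() encodes spaces as %20, so multi-word keywords stay valid in the URL
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".rawurlencode($your_keyword)."&mkt=us&lang=en_US");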

$textpage = strip_tags( $htmlpage );
$textpage = str_replace("&nbsp;","", $textpage);

$texttosave = strstr($textpage, "Searches done in");

$f = fopen("file.txt", "w");
fwrite($f, $texttosave);
fclose($f);

?>

I think you get the idea. Further refinement can be done with the string functions.
http://us2.php.net/manual/en/ref.strings.php
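
As a rough illustration of that kind of refinement, here is a minimal sketch that turns the stripped text into one keyword per line and drops the count lines. The function name text_to_keywords and the sample text are assumptions made up for this sketch; it also assumes that every count sits on its own line and that the column title "Search Term" marks the end of the header.

```php
<?php
// Hypothetical helper (not part of the script above): takes the text left
// after strip_tags() and the &nbsp; removal, returns the search terms only.
function text_to_keywords($textpage)
{
    // Drop the header: everything up to and including the "Search Term" column title.
    $marker = "Search Term";
    $pos = strpos($textpage, $marker);
    if ($pos !== false) {
        $textpage = substr($textpage, $pos + strlen($marker));
    }

    $keywords = array();
    foreach (explode("\n", $textpage) as $line) {
        $line = trim($line);
        // Skip blank lines and lines made only of digits (the counts).
        // Caveat: a keyword consisting only of digits would be dropped too.
        if ($line === '' || ctype_digit($line)) {
            continue;
        }
        $keywords[] = $line;
    }
    return $keywords;
}

// Made-up sample shaped like the stripped page text.
$sample = "Searches done in March 2004\n  Count\n  Search Term\n\n"
        . "24306\noverture\n\n2733\n1812 overture\n";

// One keyword per line; the same string could be passed to fwrite() instead.
echo implode("\n", text_to_keywords($sample)), "\n";
```

Any text that follows the result table on the real page would pass through this filter as well, so the input may need cropping at the end too.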

Author Comment

by:playstat
ID: 10988778
Can you show me how to refine it further, and how to take out the numbers it writes to the file?

If possible, I'd like the output line by line.

The other thing is that it extracts the actual keywords, but what about going one level deeper: following each href, extracting those keywords too, and then removing duplicates?

Expert Comment

by:techtonik
ID: 10989275
There are two ways to refine it: first, using regular expressions, and second, using PHP string functions. Regexps are more convenient in many cases but take more effort to learn.
Falling back on the EE rules http://www.experts-exchange.com/Web/Web_Languages/PHP/help.jsp#hi56 I feel too lazy to write the code for you =) since I don't know what level of knowledge you need. If you are in doubt about how this script works or can't see a way to improve it, please show exactly what you do not understand.
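
For comparison, a minimal sketch of the regular-expression route. The function name parse_rows_regex and the pattern are assumptions written for this sketch; they rely on each count cell ending in "&nbsp;" plus digits plus "</td>" and on the term cell following directly as a plain <td>, which may not hold if the page markup differs.

```php
<?php
// Hypothetical helper: pulls count, term and link out of the result-table HTML.
function parse_rows_regex($html)
{
    $rows = array();
    // "&nbsp;<digits></td>" closes the count cell; the next <td>...</td> is the term cell.
    $pattern = '~&nbsp;(\d+)</td>\s*<td>(.*?)</td>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Remove the markup and the &nbsp; entity to get the bare search term.
        $term = trim(str_replace('&nbsp;', '', strip_tags($m[2])));
        // The row for the searched keyword itself has no link, so the URL may be empty.
        $url = preg_match('~href="([^"]*)"~', $m[2], $h) ? $h[1] : '';
        $rows[] = array('count' => (int) $m[1], 'term' => $term, 'url' => $url);
    }
    return $rows;
}

// Sample rows in the shape of the result table.
$sample = '<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>';

print_r(parse_rows_regex($sample));
```

The trade-off is the usual one: the pattern is short, but it is tied to the exact cell layout, so it is worth echoing the matches while tuning it.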

Author Comment

by:playstat
ID: 11006771
I'm trying to understand how the script extracts the information from the page.

For example:

How does it know where to start extracting and where to stop?

How does the filtering take place, etc.?

If you can give me several examples, say from HTML/PHP pages, each with the appropriate filter, maybe I can work out the rest. I just need the general technique, so that I can use variations of it for other applications.

Could you make the text file output save line by line, without the numbers? I would be most grateful.

best regards

Accepted Solution

by:
techtonik earned 500 total points
ID: 11016217
OK, here we go. While building the filters, echo your intermediate results to see what you have at each step.
<?php
// here you specify the keyword to substitute into the URL to fetch the html page with results
$your_keyword = "mykey";

// now reading the whole page into a string with all html markup - note the
// use of $your_keyword defined above; rawurlencode() encodes spaces as %20
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".rawurlencode($your_keyword)."&mkt=us&lang=en_US");

// filter section
// i'll modify it a bit from previous example, where I just stripped html tags
// here we will crop the text to contain only result table
$htmlpage = strstr($htmlpage, "Searches done in");
// RTFM: string strstr ( string haystack, string needle )
// strstr returns part of haystack string from the first occurrence of needle to
// the end of haystack php.net/strstr

// now $htmlpage variable contains following fragment
/*
Searches done in March 2004</font></th>
  </tr>
  <tr align=left bgcolor=#999999>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Count</font></th>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Search Term</font></th>
  </tr>
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;4266</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=080101%20ctxtid%20ilgan%2Ejoins%2Ecom%20ilgan%2Eshtml%20overture%20sports&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>080101 ctxtid ilgan.joins.com ilgan.shtml overture sports</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;1898</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=international%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>international overture</a></td>
</tr>
*/

// now, if you strip all html markup and &nbsp; entities you will end up with the output from the
// previous example, but now we will go a little bit further to make a more sophisticated filter
// we will extract the fields "count" and "search term" from the html table into an array, where
// the search term will be the key and "count" will be the value associated with that key
// additionally we will extract all links with suggestions into a second array to be
// able to parse these also

// now look at the html markup
// each html row begins with a <tr> tag, so we should split string by this tag to get an
// array of html rows for further processing, but first we need to strip table header
// that is, all up to first <tr bgcolor=#333333>
// since value of bgcolor is not 100% guaranteed to be #333333, we will use only first part
$htmlpage = strstr($htmlpage, "<tr bgcolor");
// since header rows begin with a <tr align=left they will not match and hence will be stripped
// you can check what you've got with the following construction
// echo $htmlpage; die();

// next, split string with php.net/explode
$htmlrowsarr = explode("<tr", $htmlpage);
// test your result with print_r($htmlrowsarr);

// now filling arrays
$overtures = array();
$overlinks = array();
for ($i = 1; $i < count($htmlrowsarr); $i++) {
// the number begins right after the first &nbsp; and runs to the next closing tag </td>
// strip to &nbsp;
  $str = strstr($htmlrowsarr[$i], "&nbsp;");
// strstr returns false when a fragment has no &nbsp; at all, so skip such fragments
  if ($str === false) continue;
// determine position of </td>
  $to = strpos($str, "</td>");
// getting value for the first array - substring from the 7th symbol (skipping &nbsp;) to the position
// of the </td> closing tag. indexes are numbered from zero, so the 7th symbol has index 6
  $value = substr($str,6,$to-6);
// $to-6 is the number of symbols we need to extract
// echo "$value.";

// next &nbsp; will precede our search term or a link to other suggestion, so
// search string for &nbsp; with preceding > to match only second &nbsp;
  $str = strstr($str, ">&nbsp;");
// determine where the link ends
  $to = strpos($str, "</td>");
// getting link  
  $link = substr($str,7,$to-7);
// strip html markup to get the key
  $key = strip_tags( $link );
// extracting actual URL from href attribute
// it will be substring from symbol after href's quote and up to next quote
  $from = strpos($str,"href=") + 6;
  $to = strpos($str, "\"", $from);
// the first element is our own search term, so it doesn't have a link
  if ($i != 1) {
     $url = substr($str, $from, $to-$from);
  } else {
     $url = "";
  }

// now building arrays
$overtures[$key] = $value;
$overlinks[$key] = $url;
}

print_r($overtures);
// now that you've got this info - do what you want with it =)

// actually you do not need to extract the links - just supply $key as the value of
// $your_keyword at the beginning of this script and it will fetch that page for parsing
?>

You can turn this example into a function.
Download the manual and... good luck. =)
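
One possible shape for such a function, covering the remaining requests in this thread (one level deeper, no duplicates, sorted, one term per line). This is only a sketch: collect_terms, unique_sorted, save_lines and fake_fetch_terms are all made-up names, and the fetcher is a stub with canned data so the code runs without network access. In real use the fetcher would wrap the script above and return array_keys($overtures) for the given keyword. "Alphanumeric order" is assumed here to mean natural, case-insensitive ordering.

```php
<?php
// Collects the terms for $keyword plus the terms one level below each of them.
// $fetch_terms is the name of a function that returns an array of terms for a keyword.
function collect_terms($keyword, $fetch_terms)
{
    $all = array();
    foreach (call_user_func($fetch_terms, $keyword) as $term) {
        $all[] = $term;
        foreach (call_user_func($fetch_terms, $term) as $sub) {
            $all[] = $sub;
        }
    }
    return $all;
}

// Removes duplicates and sorts in natural, case-insensitive order.
function unique_sorted($terms)
{
    $terms = array_unique(array_map('trim', $terms));
    natcasesort($terms);
    return array_values($terms);
}

// Writes one term per line.
function save_lines($filename, $terms)
{
    $f = fopen($filename, "w");
    foreach ($terms as $term) {
        fwrite($f, $term . "\n");
    }
    fclose($f);
}

// Stub fetcher with canned data (an assumption for this sketch only).
function fake_fetch_terms($keyword)
{
    $canned = array(
        "overture" => array("overture", "1812 overture", "international overture"),
        "1812 overture" => array("1812 overture", "1812 overture cannon"),
    );
    return isset($canned[$keyword]) ? $canned[$keyword] : array();
}

$terms = unique_sorted(collect_terms("overture", "fake_fetch_terms"));
echo implode("\n", $terms), "\n";
// To write the file instead: save_lines("file.txt", $terms);
```

Passing the fetcher in as a parameter keeps the crawling logic separate from the page parsing, which makes it easy to echo intermediate results. With a real fetcher, each second-level term costs one extra page request.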