Solved

extract overture keywords and save to text file line by line

Posted on 2004-04-30
Last Modified: 2013-11-28
Hi there,

I'm looking for a script that can extract Overture keywords and save them line by line to a text file. If possible, it should also extract each of the suggested keyword variations into the text file. Can the output be saved in alphanumeric order?

best regards

Question by:playstat
 
LVL 9

Expert Comment

by:techtonik
ID: 10962533
I'm not a native English speaker, but I am something of a PHP programmer, so if you provide an example of this Overture, then perhaps I could understand you. Otherwise I just can't help.
 

Author Comment

by:playstat
ID: 10964588
http://inventory.overture.com/d/searchinventory/suggestion/

This is the actual URL. Enter a keyword and the possibilities are displayed. I need something that can extract that info into a text file line by line. Note also that each result links to a further set of possibilities for that keyword.

A script that can do this would be great!
 

Author Comment

by:playstat
ID: 10964601
If you can extract everything under that keyword, that would be ideal. Please make sure there are no duplicates. Thanks.
 
LVL 9

Expert Comment

by:techtonik
ID: 10967998
Easy. Here is an example.

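<?php
$your_keyword = "mykey";

// urlencode() keeps keywords with spaces or special characters valid in the URL
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");

// remove all html tags and the &nbsp; entities
$textpage = strip_tags( $htmlpage );
$textpage = str_replace("&nbsp;","", $textpage);

// keep everything from the results header to the end of the page
$texttosave = strstr($textpage, "Searches done in");

// write the remaining text to a file
$f = fopen("file.txt", "w");
fwrite($f, $texttosave);
fclose($f);

?>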

I think you get the idea. Further refinement can be done with the string functions.
http://us2.php.net/manual/en/ref.strings.php
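
As one possible refinement, here is a minimal sketch that turns the tag-stripped text into a list of search terms without the counts. It assumes that, after the "Search Term" header, the non-empty lines alternate between a count and a term, as in the output of the example above. The function name terms_from_text and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: turn the tag-stripped text into a list of search terms.
// Assumption: after the "Search Term" header, non-empty lines alternate
// between a count and a term.
function terms_from_text(string $text): array
{
    // Split on any line break, trim each line, and drop the empty ones.
    $lines = array_filter(array_map('trim', preg_split('/\R/', $text)), 'strlen');
    $lines = array_values($lines);

    // Skip everything up to and including the "Search Term" header.
    $start = array_search('Search Term', $lines, true);
    $rows = ($start === false) ? [] : array_slice($lines, $start + 1);

    // Each pair is (count, term); keep only the term of each complete pair.
    $terms = [];
    foreach (array_chunk($rows, 2) as $pair) {
        if (count($pair) === 2) {
            $terms[] = $pair[1];
        }
    }
    return $terms;
}

// Sample text shaped like the strip_tags() output with &nbsp; removed.
$sample = "Searches done in March 2004\n  Count\n  Search Term\n\n24306\noverture\n\n2733\n1812 overture\n";
$terms = terms_from_text($sample);

// One term per line, no counts.
echo implode("\n", $terms) . "\n";
```

The joined string can be passed to fwrite() in place of $texttosave. Pairing the lines, rather than dropping every numeric line, keeps a search term that happens to consist only of digits.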
 

Author Comment

by:playstat
ID: 10988778
Can you show me how to refine it further, and how to take out the numbers it writes to the file?

And, if possible, save the results line by line.

The other thing: it extracts the actual keywords, but what about going another level for each href, extracting those too, and then removing duplicates?
 
LVL 9

Expert Comment

by:techtonik
ID: 10989275
There are two ways to refine it: the first is regular expressions, the second is PHP string functions. Regexps are more convenient in many cases but take more effort to learn.
Falling back on the EE rules http://www.experts-exchange.com/Web/Web_Languages/PHP/help.jsp#hi56 I feel too lazy to write the code for you. =) I also don't know what kind of knowledge you need. If you are in doubt about how this script works or can't see how to improve it, please show exactly what you do not understand.
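
For comparison, the regular expression route could look roughly like the sketch below. It is only an illustration: the function name parse_rows_with_regex is made up, and the pattern assumes each count cell ends with "&nbsp;" plus digits and is followed directly by the cell holding the search term, as in the result table of that page.

```php
<?php
// Hypothetical helper: pull (term => count) pairs out of the result table
// with a single regular expression.
function parse_rows_with_regex(string $html): array
{
    $results = [];
    // A count cell ends with "&nbsp;<digits></td>"; the next cell holds the term.
    $pattern = '~&nbsp;(\d+)</td>\s*<td>(.*?)</td>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Strip the markup and the &nbsp; entity to get the plain search term.
        $term = trim(str_replace('&nbsp;', '', strip_tags($m[2])));
        $results[$term] = (int) $m[1];
    }
    return $results;
}

// Two sample rows in the same shape as the result table.
$sampleHtml = <<<'HTML'
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
HTML;

$parsed = parse_rows_with_regex($sampleHtml);
print_r($parsed);
```

The same result can be reached with strstr(), strpos() and substr(); the regexp is shorter but harder to read if you are new to the syntax.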
 

Author Comment

by:playstat
ID: 11006771
I'm trying to understand how the script extracts the information from the page.

For example:

How does it know where to start extracting and where to stop?

How do the filters work?

If you can give me several examples, say from HTML/PHP pages, each with the appropriate filter, maybe I can work out the rest. I just need the general approach, which I can then vary for other applications.

Could you also make the text file output save line by line, without the numbers? I would be most grateful.

best regards

 
LVL 9

Accepted Solution

by:
techtonik earned 500 total points
ID: 11016217
OK, here we go. While building the filters, echo your intermediate results to see what you have at each step.
<?php
// here you specify the keyword to substitute into the URL to fetch the html page with results
$your_keyword = "mykey";

// now read the whole page into a string with all html markup - note the
// use of $your_keyword defined above (urlencode keeps keywords with spaces valid in the URL)
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");

// filter section
// I'll modify it a bit from the previous example, where I just stripped html tags
// here we will crop the text so that it contains only the result table
$htmlpage = strstr($htmlpage, "Searches done in");
// RTFM: string strstr ( string haystack, string needle )
// strstr returns the part of the haystack string from the first occurrence of needle to
// the end of haystack, see php.net/strstr

// now the $htmlpage variable contains the following fragment
/*
Searches done in March 2004</font></th>
  </tr>
  <tr align=left bgcolor=#999999>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Count</font></th>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Search Term</font></th>
  </tr>
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;4266</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=080101%20ctxtid%20ilgan%2Ejoins%2Ecom%20ilgan%2Eshtml%20overture%20sports&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>080101 ctxtid ilgan.joins.com ilgan.shtml overture sports</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;1898</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=international%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>international overture</a></td>
</tr>
*/

// now, if you strip all html markup and &nbsp; entities you will end up with the output from the
// previous example, but now we will go a little bit further to make a more sophisticated filter
// we will extract the fields "count" and "search term" from the html table into an array, where
// the search term will be the key and "count" will be the value associated with that key
// additionally we will extract all links with suggestions into a second array to be
// able to parse these also

// now look at the html markup
// each html row begins with a <tr> tag, so we should split string by this tag to get an
// array of html rows for further processing, but first we need to strip table header
// that is, all up to first <tr bgcolor=#333333>
// since value of bgcolor is not 100% guaranteed to be #333333, we will use only first part
$htmlpage = strstr($htmlpage, "<tr bgcolor");
// since header rows begin with a <tr align=left they will not match and hence will be stripped
// you can check what you've got with the following construction
// echo $htmlpage; die();

// next, split string with php.net/explode
$htmlrowsarr = explode("<tr", $htmlpage);
// test your result with print_r($htmlrowsarr);

// now fill the arrays
for ($i = 1; $i < count($htmlrowsarr); $i++) {
// the number runs from right after the first &nbsp; to the next closing </td> tag
// cut the string so that it starts at &nbsp;
  $str = strstr($htmlrowsarr[$i], "&nbsp;");
// determine the position of </td>
  $to = strpos($str, "</td>");
// get the value for the first array: the substring from the 7th character (skipping &nbsp;)
// up to the closing </td> tag. indexes are counted from zero, so the 7th character has index 6
  $value = substr($str,6,$to-6);
// $to-6 is the number of characters we need to extract
// echo "$value.";

// the next &nbsp; precedes our search term or a link to another suggestion, so
// search the string for &nbsp; with a preceding > to match only the second &nbsp;
  $str = strstr($str, ">&nbsp;");
// determine where the link ends
  $to = strpos($str, "</td>");
// get the link
  $link = substr($str,7,$to-7);
// strip html markup to get the key
  $key = strip_tags( $link );
// extract the actual URL from the href attribute:
// the substring from the character after href's opening quote up to the next quote
// the first row is our own search term, which has no link, so check that href was found
  $hrefpos = strpos($str, "href=\"");
  if ($hrefpos !== false) {
     $from = $hrefpos + 6;
     $to = strpos($str, "\"", $from);
     $url = substr($str, $from, $to-$from);
  } else {
     $url = "";
  }

// now building arrays
$overtures[$key] = $value;
$overlinks[$key] = $url;
}

print_r($overtures);
// now when you've got this info - do what you want =)

// actually you do not need to extract links - just supply $key as a parameter
// $your_keyword at the beginning of this script and it will fetch a page for parsing
?>

You can make a function from this example.
Download the manual and.. good luck. =)
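
To get from the array to the text file that was asked for (one term per line, no counts, no duplicates, sorted), something along these lines should work. This is a sketch, not a finished tool: the function name save_terms, the output file name and the sample values are placeholders, and natural sort order is only one guess at what "numalphabetic" order means.

```php
<?php
// Hypothetical helper: save search terms one per line, without counts,
// de-duplicated and in natural, case-insensitive order.
function save_terms(array $terms, string $path): int
{
    // Trim, drop duplicates, and drop empty entries.
    $terms = array_unique(array_map('trim', $terms));
    $terms = array_filter($terms, 'strlen');
    // Natural order sorts "2 x" before "10 x"; plain sort($terms) is the alternative.
    sort($terms, SORT_NATURAL | SORT_FLAG_CASE);

    if (file_put_contents($path, implode("\n", $terms) . "\n") === false) {
        throw new RuntimeException("Could not write to $path");
    }
    return count($terms);
}

// $overtures as built by the script above: search term => count (sample values).
$overtures = [
    'overture' => '24306',
    '1812 overture' => '2733',
    'international overture' => '1898',
];
// Placeholder for terms collected from a second-level page; these repeat
// first-level terms to show that duplicates are removed.
$secondLevel = ['1812 overture', 'overture'];

$all = array_merge(array_keys($overtures), $secondLevel);
$path = sys_get_temp_dir() . '/overture_terms.txt'; // assumed location, "file.txt" works too
$count = save_terms($all, $path);

echo file_get_contents($path);
```

For the second level, run the parsing script once per key of $overtures, using the key as $your_keyword, and merge all collected terms before calling save_terms().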