Solved

Extract Overture keywords and save them to a text file, line by line

Posted on 2004-04-30
Last Modified: 2013-11-28
Hi there,

I'm looking for a script that can extract Overture keywords and save them line by line to a text file. If possible, it should also extract each of the suggested keyword possibilities into the text file. Can the output be saved in alphanumeric order?

best regards

Question by:playstat
 
LVL 9

Expert Comment

by:techtonik
ID: 10962533
I'm not a native English speaker, but I am a PHP programmer of sorts, so if you provide an example of this Overture, then perhaps I could understand you. Otherwise I just can't help.
0
 

Author Comment

by:playstat
ID: 10964588
http://inventory.overture.com/d/searchinventory/suggestion/

This is the actual URL. Enter a keyword and the possibilities are displayed. I need something that can extract that info into a text file, line by line. You'll also notice that each result links to another set of possibilities for that keyword.

A script that can do this would be great!
0
 

Author Comment

by:playstat
ID: 10964601
If you can extract everything under that keyword, that would be ideal. Please make sure there are no duplicates. Thanks.
0
 
LVL 9

Expert Comment

by:techtonik
ID: 10967998
Easy. Here is an example.

<?php
$your_keyword = "mykey";

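// urlencode() keeps spaces and other special characters in the keyword from breaking the URL
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");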

$textpage = strip_tags( $htmlpage );
$textpage = str_replace("&nbsp;","", $textpage);

$texttosave = strstr($textpage, "Searches done in");

$f = fopen("file.txt", "w");
fwrite($f, $texttosave);
fclose($f);

?>

I think you get the idea. Further refinement can be done with the string functions:
http://us2.php.net/manual/en/ref.strings.php
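
To see what each call in the example does, here is a minimal sketch run on a small inline string instead of the live page. The sample markup is made up for illustration and is only an assumption about what the page looks like; the real page will differ.

```php
<?php
// Hypothetical sample standing in for the downloaded page; the real markup may differ.
$htmlpage = '<html><body><h1>Results</h1>'
          . '<table><tr><th>Searches done in March 2004</th></tr>'
          . '<tr><td>&nbsp;2733</td><td>&nbsp;1812 overture</td></tr></table></body></html>';

// strip_tags() removes the markup and keeps only the text between tags.
$textpage = strip_tags($htmlpage);

// Unlike the example above, this substitutes a space for each &nbsp; entity
// so that the text of neighbouring cells does not run together.
$textpage = str_replace('&nbsp;', ' ', $textpage);

// strstr() returns the haystack from the first match of the needle to the end,
// or false when the needle is absent, so check before using the result.
$texttosave = strstr($textpage, 'Searches done in');
if ($texttosave === false) {
    $texttosave = '';
}

echo $texttosave, "\n";
```

Everything before "Searches done in" is dropped, which is how the script "knows" where to start. It never stops early: it keeps all text up to the end of the page.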
0
 

Author Comment

by:playstat
ID: 10988778
Can you show me how to refine it further, and how to remove the numbers it writes to the file?

If possible, one entry per line.

The other thing: it extracts the actual keywords, but what about going one level deeper for each href, extracting those too, and then removing duplicates?
0
 
LVL 9

Expert Comment

by:techtonik
ID: 10989275
There are two ways to refine it: regular expressions, or PHP string functions. Regexps are more convenient in many cases but take more effort to learn.
Falling back on the EE rules (http://www.experts-exchange.com/Web/Web_Languages/PHP/help.jsp#hi56), I feel too lazy to write the code for you =) since I don't know what level of knowledge you need. If you are in doubt about how this script works, or can't see how to improve it, please show exactly what you do not understand.
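
As a rough illustration of the regular expression route, here is a minimal sketch using preg_match_all(). The HTML fragment is a hypothetical sample modelled on the result rows, and the pattern is an assumption that would need adjusting to the real markup.

```php
<?php
// Hypothetical fragment modelled on the result rows of the page; real markup may differ.
$html = '<tr bgcolor="#F4F4F4">'
      . '<td><font size=1>&nbsp;2733</td>'
      . '<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US">'
      . '<font size=1>1812 overture</a></td></tr>'
      . '<tr><td><font size=1>&nbsp;1898</td>'
      . '<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=international%20overture&mkt=us&lang=en_US">'
      . '<font size=1>international overture</a></td></tr>';

// Capture 1: the count after &nbsp;. Capture 2: the href. Capture 3: the link contents.
$pattern = '~&nbsp;(\d+)</td>\s*<td>&nbsp;<a href="([^"]*)">(.*?)</a>~s';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$overtures = array();
foreach ($matches as $m) {
    // The link contents still hold a <font> tag, so strip it to get the plain term.
    $term = trim(strip_tags($m[3]));
    $overtures[$term] = (int) $m[1];
}

print_r($overtures);
```

One pattern replaces several strstr()/strpos()/substr() steps, at the cost of being harder to read and more sensitive to small markup changes.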
0
 

Author Comment

by:playstat
ID: 11006771
I'm trying to understand how the script extracts the information from the page.

For example:

How does it know where to start extracting and where to stop?

How do the filters work, etc.?

If you can give me several examples, say from HTML/PHP pages, each with the appropriate filter, then maybe I can work out the rest. I'm after a general method that I can then vary for other applications.

Could you also make the text file output save line by line, without the numbers? I would be most grateful.

best regards

0
 
LVL 9

Accepted Solution

by:
techtonik earned 500 total points
ID: 11016217
OK, here we go. While building filters, echo your intermediate results to see what you have got so far.
<?php
// here you specify the keyword to substitute into the URL to fetch the html page with results
$your_keyword = "mykey";

// now read the whole page into a string with all html markup - note the
// use of $your_keyword defined above; urlencode() keeps spaces and other
// special characters in the keyword from breaking the URL
$htmlpage = file_get_contents("http://inventory.overture.com/d/searchinventory/suggestion/?term=".urlencode($your_keyword)."&mkt=us&lang=en_US");

// filter section
// I'll modify it a bit from the previous example, where I just stripped html tags
// here we will crop the text to contain only the result table
$htmlpage = strstr($htmlpage, "Searches done in");
// RTFM: string strstr ( string haystack, string needle )
// strstr returns the part of the haystack string from the first occurrence of needle to
// the end of haystack (or false if needle is not found) - php.net/strstr

// now $htmlpage variable contains following fragment
/*
Searches done in March 2004</font></th>
  </tr>
  <tr align=left bgcolor=#999999>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Count</font></th>
    <th><font face="verdana,sans-serif" size=2 color=E8E8E8>Search Term</font></th>
  </tr>
<tr bgcolor=#333333>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;24306</td>
<td><font face="verdana,sans-serif" size=2 color=E8E8E8>&nbsp;overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;4266</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=080101%20ctxtid%20ilgan%2Ejoins%2Ecom%20ilgan%2Eshtml%20overture%20sports&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>080101 ctxtid ilgan.joins.com ilgan.shtml overture sports</a></td>
</tr>
<tr>
<td><font face="verdana,sans-serif" size=1>&nbsp;2733</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=1812%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>1812 overture</a></td>
</tr>
<tr bgcolor="#F4F4F4">
<td><font face="verdana,sans-serif" size=1>&nbsp;1898</td>
<td>&nbsp;<a href="/d/searchinventory/suggestion/?term=international%20overture&mkt=us&lang=en_US"><font face="verdana,sans-serif" size=1 color=#000000>international overture</a></td>
</tr>
*/

// now, if you strip all html markup and &nbsp; entities, you will end up with the output from the
// previous example, but now we will go a little bit further to make a more sophisticated filter
// we will extract the fields "count" and "search term" from the html table into an array, where
// the search term will be the key and "count" will be the value associated with that key
// additionally we will extract all links with suggestions into a second array to be
// able to parse these also

// now look at the html markup
// each html row begins with a <tr> tag, so we should split the string by this tag to get an
// array of html rows for further processing, but first we need to strip the table header
// that is, everything up to the first <tr bgcolor=#333333>
// since the value of bgcolor is not 100% guaranteed to be #333333, we will use only the first part
$htmlpage = strstr($htmlpage, "<tr bgcolor");
// since header rows begin with <tr align=left they will not match and hence will be stripped
// you can check what you've got with the following construction
// echo $htmlpage; die();

// next, split string with php.net/explode
$htmlrowsarr = explode("<tr", $htmlpage);
// test your result with print_r($htmlrowsarr);

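// now filling arrays; create them first so print_r() below works even when no rows are found
$overtures = array();
$overlinks = array();
// start at 1: element 0 is the empty string before the first <tr
for ($i = 1; $i < count($htmlrowsarr); $i++) {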
// number begins right after the first &nbsp; and to the next closing tag </td>
// strip to &nbsp;
  $str = strstr($htmlrowsarr[$i], "&nbsp;");
// determine position of </td>
  $to = strpos($str, "</td>");
// getting the value for the first array - the substring from the 7th character (skipping &nbsp;) up to
// the </td> closing tag. indexes are numbered from zero, so the 7th character has index 6
  $value = substr($str,6,$to-6);
// $to-6 is how many characters we need to extract
// echo "$value.";

// next &nbsp; will precede our search term or a link to other suggestion, so
// search string for &nbsp; with preceding > to match only second &nbsp;
  $str = strstr($str, ">&nbsp;");
// determine where the link ends
  $to = strpos($str, "</td>");
// getting link  
  $link = substr($str,7,$to-7);
// strip html markup to get the key
  $key = strip_tags( $link );
// extracting actual URL from href attribute
// it will be substring from symbol after href's quote and up to next quote
  $from = strpos($str,"href=") + 6;
  $to = strpos($str, "\"", $from);
// the first element is our own search term, so it doesn't have a link
  if ($i != 1) {
     $url = substr($str, $from, $to-$from);
  } else {
     $url = "";
  }

// now building arrays
  $overtures[$key] = $value;
  $overlinks[$key] = $url;
}

print_r($overtures);
// now when you've got this info - do what you want =)

// actually you do not need to extract links - just supply $key as a parameter
// $your_keyword at the beginning of this script and it will fetch a page for parsing
?>

You can make a function from this example.
Download the manual and.. good luck. =)
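
For the remaining part of the question (one term per line, no numbers, no duplicates, sorted), a minimal sketch could look like the following. The function name save_terms() and its parameters are hypothetical, and the sample array stands in for array_keys($overtures) from the script above.

```php
<?php
// Writes search terms one per line, without counts, duplicates removed, sorted.
// save_terms() and its parameters are hypothetical names used for illustration.
function save_terms(array $terms, $filename)
{
    $terms = array_map('trim', $terms);       // remove stray whitespace
    $terms = array_filter($terms, 'strlen');  // drop empty entries
    $terms = array_unique($terms);            // remove duplicates
    sort($terms, SORT_STRING);                // plain string order: digits before letters
    file_put_contents($filename, implode("\n", $terms) . "\n");
    return $terms;
}

// Stand-in for array_keys($overtures); merge in the keys from further pages as needed.
$terms = array('overture', '1812 overture', 'international overture', 'overture', '080101 ctxtid');

// A temporary file is used here; replace it with the path you want, e.g. "file.txt".
$filename = tempnam(sys_get_temp_dir(), 'ovt');
$saved = save_terms($terms, $filename);

echo file_get_contents($filename);
```

Because only the array keys (the search terms) are written, the counts never reach the file. To go one level deeper, the terms from each follow-up page can be merged into $terms before calling the function, and array_unique() then takes care of the duplicates.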
0
