
Where is my URL in a search engine?

Posted on 2003-11-17
Last Modified: 2008-03-06
Hi,
Here is a cool one. If I submit a search string to a search engine such as Google, how can I parse the results to find what page my URL appears on?

1.) Can it be done?
2.) How?

A really interesting one, this. It's open for debate, and if anyone knows, there are certainly some points available.

Stu
Question by:08718712060
LVL 14

Expert Comment

by:ThG
ID: 9763724
Funny, a customer asked me the same thing about a month ago. I'll tell you what I told him.

It can certainly be done, and the mechanics are quite simple: fopen() the composed search URL (including the search string) and parse the result, probably with a regexp, counting the results (they are usually generated from a fixed template, so you should be able to count the results that appear before yours).
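As a minimal sketch of the parse-and-count step, assuming each result begins with a fixed marker (the `<p class=g>` tag mentioned later in this thread, used here only as an example), the function below finds where your URL first appears and counts the markers before it using `preg_match_all()` with `PREG_OFFSET_CAPTURE`. The function name `rank_by_regex` and the sample HTML are hypothetical; the page is passed in as a string so the logic can be tried without querying a live engine.

```php
<?php
// Hypothetical helper: count result markers that appear before our URL.
// Assumes every result starts with a fixed marker matched by $marker_regex.
function rank_by_regex($html, $my_url, $marker_regex)
{
    $url_offset = strpos($html, $my_url);
    if ($url_offset === false) {
        return 0; // our URL is not on this page
    }
    // PREG_OFFSET_CAPTURE stores the byte offset of every marker match
    preg_match_all($marker_regex, $html, $m, PREG_OFFSET_CAPTURE);
    $count = 0;
    foreach ($m[0] as $match) {
        if ($match[1] < $url_offset) {
            $count++;
        }
    }
    return $count;
}

// Made-up sample page with two results
$page = '<p class=g><a href="http://one.example">1</a>'
      . '<p class=g><a href="http://www.example.com">2</a>';
echo rank_by_regex($page, 'http://www.example.com', '#<p class=g>#i'), "\n"; // 2
```

Because the marker that opens your own result also precedes your URL, the count is your 1-based position; 0 means the URL was not on the page.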

But you have to consider the following problems:
- Each search engine works in a different way, so you will probably need to change most of the code to adapt it to each search engine (and you'd end up with an ugly program).
- Search engines are always under development. I would expect changes on roughly a monthly basis, which means that about every 30 days you would need to update your scripts or they would stop working.
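One way to contain both problems is to keep everything engine-specific in data rather than in code. The sketch below assumes each engine can be described by a base URL, a query parameter name, any fixed extra parameters and a result marker; every value in `$engines`, including the second engine's domain and markup, is a made-up placeholder, and `engine_url` is a hypothetical helper built on PHP's `http_build_query()`.

```php
<?php
// Hypothetical per-engine settings: everything that differs between engines
// lives in one array. All URLs, parameter names and markers are placeholders.
$engines = array(
    'google' => array(
        'base'   => 'http://www.google.com/search',
        'query'  => 'q',
        'extra'  => array('num' => 100),
        'marker' => '<p class=g>',
    ),
    'other' => array(
        'base'   => 'http://search.example.com/find',
        'query'  => 'terms',
        'extra'  => array(),
        'marker' => '<li class="result">',
    ),
);

// Build the search URL for one engine; http_build_query() URL-encodes the
// values (spaces become "+").
function engine_url($engine, $keywords)
{
    $params = array($engine['query'] => $keywords) + $engine['extra'];
    return $engine['base'] . '?' . http_build_query($params);
}

echo engine_url($engines['google'], 'google php'), "\n";
// http://www.google.com/search?q=google+php&num=100
```

When an engine changes its layout, only its entry in `$engines` should need updating.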

It won't be VERY hard, but you need a really good reason to do it. And if you do end up doing it, it would be nice to publish the results in a free PHP scripts database :-)

Author Comment

by:08718712060
ID: 9763904
I would definitely publish the code; after all, that is the joy of open source and PHP. Could you provide a basic example, as I am new to PHP?

Cheers in advance
LVL 6

Expert Comment

by:aolXFT
ID: 9768204
Personally, I wouldn't recommend it. Google are sure not to like people leeching off their search engine, and are likely to change their HTML in ways that break your parser.

That said:

<?php

$html = file_get_contents("http://www.google.ie/search?q=google+php&ie=UTF-8&oe=UTF-8&hl=en&meta=");

$matches = array();

// Google stores the results in a <div>
preg_match("#<div>(.*?)</div>#is", $html, $matches);
$results = isset($matches[1]) ? $matches[1] : '';

// Google starts each result with a <p class=g> tag.
// (Note that Google's HTML isn't XHTML-compliant.)
// ...
// Use regular expressions to sort the rest out for you.
// ...

?>
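The elided "sort the rest out" step could look something like the following sketch, which assumes each result is a `<p class=g>` tag followed (apart from whitespace) directly by the result link. The pattern and the function name `result_urls` are illustrative assumptions; real result markup will differ and will change over time.

```php
<?php
// Hypothetical helper: pull the link that follows each result marker,
// in page order. Assumes results look like <p class=g><a href="...">.
function result_urls($html)
{
    preg_match_all('#<p class=g>\s*<a href="([^"]+)"#i', $html, $m);
    return $m[1]; // capture group 1 holds the href values, in order
}

// Made-up sample page with two results
$page = '<p class=g><a href="http://first.example/">First</a>'
      . '<p class=g> <a href="http://second.example/">Second</a>';
print_r(result_urls($page));
```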

Another approach would be to make Google's output XHTML-compliant using something like Tidy, and then parse it with something like DOM XML.

This kind of task would be simpler if you had a DOM to fall back on, as you do on the client side.
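On the server side, PHP 5's DOM extension (the successor to PHP 4's domxml, swapped in here) provides that kind of DOM. `DOMDocument::loadHTML()` uses libxml's forgiving HTML parser, so non-XHTML markup can often be loaded without a Tidy pass first. This is a sketch under the assumption that each result is a `<p class=g>` containing a link; the function name `result_urls_dom` and the sample page are hypothetical.

```php
<?php
// Sketch using PHP 5's DOM extension. loadHTML() accepts non-XHTML input.
function result_urls_dom($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $urls = array();
    // Assumes each result is a <p class="g"> element containing a link
    foreach ($xpath->query('//p[@class="g"]') as $p) {
        $links = $xpath->query('.//a[@href]', $p);
        if ($links->length > 0) {
            $urls[] = $links->item(0)->getAttribute('href');
        }
    }
    return $urls;
}

// Made-up sample page with unquoted attributes and unclosed <p> tags
$page = '<html><body><p class=g><a href="http://first.example/">First</a>'
      . '<p class=g><a href="http://second.example/">Second</a></body></html>';
print_r(result_urls_dom($page));
```

Querying by element and attribute this way does not depend on exact spacing or quoting in the markup, although it still breaks if the engine renames the class.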
LVL 14

Expert Comment

by:ThG
ID: 9768290
Nice function, file_get_contents(); I hadn't noticed it. I thought file() was the only one that did something like that. Nice to know.
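For reference, the difference between the two: `file()` returns an array of lines with their newline characters kept, while `file_get_contents()` (available since PHP 4.3.0) returns the whole contents as a single string. This small check writes a temporary file so it runs without network access; the helpers used to create that file (`file_put_contents()`, `sys_get_temp_dir()`) need PHP 5.2.1 or later.

```php
<?php
// Write a small temporary file to compare the two functions on
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, "line one\nline two\n");

$lines = file($path);               // array("line one\n", "line two\n")
$whole = file_get_contents($path);  // "line one\nline two\n"

// Joining file()'s lines gives the same string file_get_contents() returns
var_dump(implode('', $lines) === $whole); // bool(true)
unlink($path);
```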

Author Comment

by:08718712060
ID: 9769861
aolXFT,
Great stuff. As a novice I am struggling with what is probably a simple task, but I am eager to do this.

Your code will return the results matching the submitted keywords, but how can I then iterate through $matches to find my URL and show its listing number?

What I am trying to achieve is a single PHP script, with my own keywords in it, that shows something like

"Today I am ranked at number XYZ using keywords (X,Y,Z)"

Please can you help me further?
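Once the result URLs have been pulled out in page order (by whatever parsing method), walking them with `foreach` gives the listing number directly. A minimal sketch, assuming `$result_urls` is such a list; `rank_message` is a hypothetical helper, and treating any URL that starts with your address as a match is a design choice, not a requirement.

```php
<?php
// Hypothetical helper: find our 1-based rank in an ordered list of result
// URLs and build the status message.
function rank_message($result_urls, $my_url, $keywords)
{
    foreach (array_values($result_urls) as $i => $url) {
        // Prefix match, so "http://www.example.com/page.html" also counts
        if (strpos($url, $my_url) === 0) {
            return 'Today I am ranked at number ' . ($i + 1)
                 . ' using keywords (' . implode(',', $keywords) . ')';
        }
    }
    return 'Not in the results for keywords (' . implode(',', $keywords) . ')';
}

$urls = array('http://other.example/', 'http://www.example.com/', 'http://third.example/');
echo rank_message($urls, 'http://www.example.com', array('X', 'Y', 'Z')), "\n";
// Today I am ranked at number 2 using keywords (X,Y,Z)
```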
LVL 6

Accepted Solution

by:aolXFT
aolXFT earned 500 total points
ID: 9774317
Hmmm

I think I understand you better now: you only want to search for a few words and see where your entry is on that list. Okay, the following is some VERY UGLY CODE, but it might do the trick:


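################################
<?php

$words = "X Y Z";                    // your keywords
$my_url = "http://www.example.com";  // your site, exactly as it appears in the results

// Ask for 100 results on one page so there is a better chance of finding you
$html = file_get_contents('http://www.google.com/search?num=100&q=' . urlencode($words));

if (strpos($html, $my_url) !== false) {
    // Split the page at the first occurrence of your URL...
    list($above, $below) = explode($my_url, $html, 2);
    // ...and count the result markers before it: that count is your position
    $pos = substr_count($above, "<p class=g>");
} else {
    $pos = 0;
}

$str = $pos ? "We are at position: $pos" : "We are not in that list";
echo $str;

?>
#########################################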
I'm being kicked out of the computer suite now, so I haven't time to test this.
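Since the answer above could not be tested at the time, one option is to wrap the same explode-and-count logic in a function and try it on a saved copy of a results page before pointing it at a live query. The function name `position_in_page` and the sample string are hypothetical; in practice `$saved` might come from `file_get_contents()` on a locally saved HTML file.

```php
<?php
// The accepted answer's explode-and-count technique as a testable function.
function position_in_page($html, $my_url)
{
    if (strpos($html, $my_url) === false) {
        return 0;
    }
    // Everything before the first occurrence of our URL
    list($above) = explode($my_url, $html, 2);
    // Each result begins with "<p class=g>", so the count is our position
    return substr_count($above, '<p class=g>');
}

// Made-up stand-in for a saved results page
$saved = '<p class=g><a href="http://other.example/">x</a>'
       . '<p class=g><a href="http://www.example.com/">y</a>';
$pos = position_in_page($saved, 'http://www.example.com');
echo $pos ? "We are at position: $pos" : "We are not in that list", "\n";
```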

Author Comment

by:08718712060
ID: 9774745
aolXFT
Absolutely spot on! Thank you for your quick response, and a big thank you to ThG, who also dived in on this query.

I will continue to flick through my copy of "Beginning PHP4" and see how far I get with my project. As ThG suggested, if all works out well, the code will be released in free forums around the net (with obvious mention of you both in the headers!).

Thanks to you both!
S
LVL 14

Expert Comment

by:ThG
ID: 9775180
Hi 08718712060,
If you get serious about this project, keep us informed. If you just want to show us some code, open a question with a small number of points and make it clear in the subject that it's related to this thread.

The problem is that a script like this, however useful, would soon stop working for the reasons I explained above.

Good luck

Author Comment

by:08718712060
ID: 9775283
Hi ThG,
I agree completely with what you are saying, so I think I need an extra validation step: every time the script runs, it should check whether the start and end HTML tags are still the same, and if not, email me or the administrator to investigate and suspend that particular search engine. Ideally, it would be great to put some logic in there that looks for a pattern and amends itself. That could work for the results, but it wouldn't work for the search string itself.

So far so good, though. I code a great deal in Delphi, but PHP is completely new to me, so I am having fun not having to worry about screen designs for a change! lol

Thanks for your help; I will try to push it on and keep you informed.
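The validation step described above can start very simply: before trusting a parse, confirm that the markers the parser depends on still appear in the page. A minimal sketch; `missing_markers` and the marker list are hypothetical, and the alert itself is left as a comment, since sending mail depends on the server's mail setup.

```php
<?php
// Hypothetical sanity check: return the markers the parser relies on that
// no longer appear in the page. An empty array means "looks unchanged".
function missing_markers($html, $markers)
{
    $missing = array();
    foreach ($markers as $marker) {
        if (strpos($html, $marker) === false) {
            $missing[] = $marker;
        }
    }
    return $missing;
}

// Made-up sample page
$page = '<div><p class=g><a href="http://a.example/">a</a></div>';
$missing = missing_markers($page, array('<p class=g>', '<div>'));
if ($missing) {
    // This is where you would alert someone (for example with mail())
    // and skip this engine until its markers are updated.
    echo 'Template changed, missing: ', implode(', ', $missing), "\n";
} else {
    echo "Template looks unchanged\n";
}
```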

Cheers

S