Solved

Increasing my custom search engine's results?

Posted on 2013-06-30
Medium Priority
Last Modified: 2013-07-27
Hi,
I know that some Google searches can return millions of results. My Java custom search engine (CSE) currently returns only 10 URLs per call, as the documentation states. How do I get a much larger return from my calls? I'd like to process thousands!

Here is what works right now (with the appropriate setup done); it returns 10 URLs for the given search:

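// Imports needed at the top of the file:
// import java.util.List;
// import com.google.api.services.customsearch.model.Result;

String searchString = "Michael Jordan";

// customsearch is an already-configured Customsearch client (setup omitted)
List<Result> items = customsearch.cse().list(searchString).execute().getItems();

System.out.println("Search for " + searchString + ", size = " + items.size());
String[] stringArrayOfLinks = new String[items.size()];
int linkCount = 0;

// Print each result and store its URL
for (Result item : items) {
    System.out.println(item.getTitle() + " (" + item.getLink() + ")");
    stringArrayOfLinks[linkCount] = item.getLink();
    linkCount++;
}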

I'd like to be able to process far more than the 10 items returned by Google.
Right now, I get the same 10 links every time; I expected a different 10 on each call.
Ideas?
I'd like linkCount to exceed 1,000, for sure.
Thanks
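Because the code above sizes its array from a single page, linkCount can never exceed that page's size. As a hedged sketch (the class `LinkCollector` and its methods are hypothetical, not part of the Google client), links from several pages could be gathered into a growable, duplicate-free collection instead of a fixed-size array:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: collects links from several result pages without a fixed-size array.
public class LinkCollector {
    // LinkedHashSet keeps insertion order and silently ignores duplicate URLs.
    private final Set<String> links = new LinkedHashSet<>();

    /** Adds one page of links and returns how many of them were new. */
    public int addPage(List<String> pageLinks) {
        int before = links.size();
        if (pageLinks != null) { // a page with no results may come back as null
            links.addAll(pageLinks);
        }
        return links.size() - before;
    }

    public int linkCount() {
        return links.size();
    }

    public List<String> links() {
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        LinkCollector collector = new LinkCollector();
        collector.addPage(Arrays.asList("http://a.example.com/", "http://b.example.com/"));
        int added = collector.addPage(Arrays.asList("http://b.example.com/", "http://c.example.com/"));
        System.out.println(added + " new, " + collector.linkCount() + " total"); // prints "1 new, 3 total"
    }
}
```

Returning the count of new links makes it easy to notice when repeated calls keep handing back the same page.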
Question by:beavoid
13 Comments
 
LVL 18

Expert Comment

by:nap0leon
ID: 39290596
If you want 10 different links every time the request runs, then you need to build a bank of links for it to pick from. What it does now is run the search and return the top 10 items every time. You would see different results only if that term's search rankings changed.
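A minimal sketch of the "bank of links" idea, assuming the links have already been collected and stored somewhere (the class `LinkBank` and the placeholder URLs are hypothetical): shuffle a copy of the bank and take the first 10.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a "bank of links": store links once, then pick a random 10 per run.
public class LinkBank {
    /** Returns up to n links chosen at random; entries are distinct if the bank has no duplicates. */
    public static List<String> pickRandom(List<String> bank, int n, Random random) {
        List<String> copy = new ArrayList<>(bank); // shuffle a copy, not the caller's list
        Collections.shuffle(copy, random);
        int count = Math.max(0, Math.min(n, copy.size()));
        return new ArrayList<>(copy.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> bank = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            bank.add("http://example.com/page" + i); // placeholder links for illustration
        }
        System.out.println(pickRandom(bank, 10, new Random()));
    }
}
```

Passing the `Random` in makes the selection reproducible in tests (use a fixed seed) while staying random in normal use.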
 
LVL 36

Expert Comment

by:mccarl
ID: 39292046
some searches made in Google can return millions of results items
Well, not exactly! Google tells you that millions of results are potentially available, but it returns only ~10 results per page.

This brings me to another point. Probably the main reason Google limits the API to 100 results is... ads! Google is a business, and it needs to make money to keep providing its services. When you search via the normal web page, Google slips some ads in, and the advertisers pay Google for them. When you search via the API, there is no such mechanism for Google to get paid for its effort, so it limits the service and charges for higher usage.
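To make the limit concrete, here is a small sketch of the paging arithmetic, assuming the figures stated in this thread (10 results per call, 100 results per query) and 1-based start indexes; `PagingPlan` is a hypothetical name. Asking for 1,000 results still plans only 10 requests.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the paging arithmetic described in this thread (10 results per call, 100 per query).
public class PagingPlan {
    static final int RESULTS_PER_REQUEST = 10;
    static final int MAX_RESULTS_PER_QUERY = 100;

    /** 1-based start indexes needed to fetch up to `wanted` results for one query. */
    static List<Integer> startIndexes(int wanted) {
        int reachable = Math.min(wanted, MAX_RESULTS_PER_QUERY); // anything past 100 is unreachable
        List<Integer> starts = new ArrayList<>();
        for (int start = 1; start <= reachable; start += RESULTS_PER_REQUEST) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Asking for 1,000 results still yields only 10 requests covering results 1 to 100.
        System.out.println(startIndexes(1000)); // prints [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
    }
}
```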

One thing I will clarify, since you have touched on this a couple of times now: are you interested in Google's estimate of the total number of search results (say, for ranking the popularity of a subject)? That IS something the API returns. Break the line that executes the search and gets the results into two separate lines, and you then have access to the "total results" number, i.e.:
Search searchResult = customsearch.cse().list(searchString).execute();
System.out.println("About " + searchResult.getSearchInformation().getTotalResults() + " results available");
List<Result> items = searchResult.getItems();

 

Author Comment

by:beavoid
ID: 39292125
getTotalResults still returns 10.
I think that is the basic starter package's return count.
How does the retrieval count work? I read that thousands are possible!
Thanks

 
LVL 36

Expert Comment

by:mccarl
ID: 39292169
When I search for the word "test", I get the following output:
About 341000000 results available
1:    Create Tests for Organizational Training and Certification Programs ... (http://www.test.com/)
2:    Speedtest.net - The Global Broadband Speed Test (http://www.speedtest.net/)
3:    Personality test based on C. Jung and I. Briggs Myers type theory (http://www.humanmetrics.com/cgi-win/jtypes2.asp)
4:    Speakeasy Speed Test (http://www.speakeasy.net/speedtest/)
5:    Test your IPv6. (http://test-ipv6.com/)
6:    Test - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test)
7:    The HTML5 test - How well does your browser support HTML5? (http://html5test.com/)
8:    Test cricket - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test_cricket)
9:    The Acid3 Test (http://acid3.acidtests.org/)
10:    Test (wrestler) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test_(wrestler))

 

Author Comment

by:beavoid
ID: 39295169
Could you please attach the code that produced this output to a comment? Super.
Thanks
 
LVL 36

Expert Comment

by:mccarl
ID: 39295256
Here is the code. You will need to set your own API key and cx values:
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.List;

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.Customsearch.Builder;
import com.google.api.services.customsearch.CustomsearchRequest;
import com.google.api.services.customsearch.CustomsearchRequestInitializer;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;

public class TestCustomSearchAPI {
    
    public static void main(String[] args) throws GeneralSecurityException, IOException {
        List<Result> items = new ArrayList<Result>();
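        // start = 1 fetches only the first page of 10. The API stops at 100 results,
        // so raising the bound to 91 would fetch up to 10 pages (10 requests).
        for (long i = 1; i <= 10; i += 10) {
            List<Result> page = executeSearch("test", i);
            if (page != null) { // getItems() may be null when a page has no results
                items.addAll(page);
            }
        }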
        
        int i = 1;
        for (Result item : items) {
            System.out.println(i++ + ":    " + item.getTitle() + " (" + item.getLink() + ")");
        }
    }

    private static List<Result> executeSearch(String searchTerm, final Long start) throws GeneralSecurityException, IOException {
        Builder builder = new Customsearch.Builder(GoogleNetHttpTransport.newTrustedTransport(), new JacksonFactory(), null);
        builder.setApplicationName("Search Test");
        builder.setCustomsearchRequestInitializer(new CustomsearchRequestInitializer() {
            @Override
            protected void initializeCustomsearchRequest(CustomsearchRequest<?> request) throws IOException {
                request.setKey("###########");
                request.set("cx", "%%%%%%%%%%%%");
                request.set("start", start);
            }
        });
        Customsearch customsearch = builder.build();
        Search searchResult = customsearch.cse().list(searchTerm).execute();
        System.out.println("About " + searchResult.getSearchInformation().getTotalResults() + " results available");
        List<Result> items = searchResult.getItems();
        return items;
    }
}
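If you would rather see exactly what goes over the wire, the same request can be built by hand against the Custom Search JSON API endpoint using its `key`, `cx`, `q`, `num` and `start` query parameters. This is a sketch only: `SearchUrl` is a hypothetical class, the 1 to 91 range for `start` is an assumption derived from the 100-result ceiling discussed above, and the `URLEncoder.encode(String, Charset)` overload needs Java 10 or later. Successive calls return a different 10 links only when `start` changes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build a Custom Search JSON API request URL by hand (Java 10+ for this URLEncoder overload).
public class SearchUrl {
    private static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** URL for one page of 10 results; start is limited to 1..91 so the page ends by result 100. */
    static String pageUrl(String apiKey, String cx, String query, int start) {
        if (start < 1 || start > 91) {
            throw new IllegalArgumentException("start must be between 1 and 91, got " + start);
        }
        return ENDPOINT + "?key=" + enc(apiKey) + "&cx=" + enc(cx)
                + "&q=" + enc(query) + "&num=10&start=" + start;
    }

    public static void main(String[] args) {
        // Second page for the question's search; YOUR_API_KEY and YOUR_CX are placeholders.
        System.out.println(pageUrl("YOUR_API_KEY", "YOUR_CX", "Michael Jordan", 11));
    }
}
```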

 

Author Comment

by:beavoid
ID: 39296925
Thanks,

"About 340000000 results available"
Not too shabby.

Pity that I only get 10 results per search.

This page discusses bigger returns.
But is it always ten links returned, no matter what? That probably isn't the worst thing in the world, really. Just curious.

here

Thanks
 
LVL 36

Expert Comment

by:mccarl
ID: 39297014
This page discusses bigger returns.
That page refers to the number of search requests, not the number of results that are achievable.

The problem is that Google doesn't provide an API for its plain old vanilla "Search" service. This is a "Custom Search Engine", whose intended purpose is to provide search over a site (or a set of sites) with some number X of total pages. Returning more than 100 results is probably not that important when X might not be much more than 100. But the way we are setting up the CSE is non-standard, which is why it isn't really catered for returning large numbers of results.
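The distinction between requests and results can be put into numbers. In this sketch, `QuotaMath` is hypothetical and the daily quota of 100 requests is an assumed figure for illustration only; the 10-per-request and 100-per-query limits are the ones described in this thread. A bigger quota only helps if it is spread over many distinct queries.

```java
// Sketch of quota arithmetic: a quota counts requests, and each request returns at most 10 results.
public class QuotaMath {
    static final long RESULTS_PER_REQUEST = 10;
    static final long REQUESTS_PER_QUERY = 10; // 100-result ceiling / 10 results per request

    /** Upper bound on results per day for a daily request quota spread over distinct queries. */
    static long maxResultsPerDay(long dailyRequestQuota, long distinctQueries) {
        long usefulRequests = Math.min(dailyRequestQuota, distinctQueries * REQUESTS_PER_QUERY);
        return usefulRequests * RESULTS_PER_REQUEST;
    }

    public static void main(String[] args) {
        long quota = 100; // assumed daily quota, for illustration only
        System.out.println(maxResultsPerDay(quota, 1));  // one query: capped at 100 results
        System.out.println(maxResultsPerDay(quota, 20)); // 20 queries: the quota caps it at 1000
    }
}
```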
 

Author Comment

by:beavoid
ID: 39334498
Thanks,
I'm still seeing only 10
What else can we fiddle with?
 

Author Comment

by:beavoid
ID: 39338079
Thanks
I'm interested in getting as many links as possible returned by my code, even if they arrive in a great many batches of 10, to get close to the result counts Google claims to have found.

I don't see a way around this. I have signed up for Google's billable search service to have a look. They say they bill me only if I exceed a threshold, which I have not, but it still returns only 10 links. Is there a place where I can request massive returns? Might it return a new list of 10 links on successive calls, or is it always the best 10? I think I saw a result-count text field somewhere on the panel. Or did you mention a way to still get many different replies with the old system?
Thanks
 

Author Comment

by:beavoid
ID: 39356900
What is the final piece of the puzzle for getting large numbers?

thx
 
LVL 36

Accepted Solution

by:mccarl
ID: 39356903
I think I have well and truly answered this enough times already. Here's one more: it CAN'T be done! :) I thought you were getting somewhere with Bing, though?
 

Author Comment

by:beavoid
ID: 39357926
I think so, thanks for making me see that. I'll ask another question, just to test you all :)
Bing looks very promising. Their API is very straightforward; so is Google's, but I was impressed by Bing.
I have a question for them on Stack Exchange, because Bing had a control panel page where you could enter the required link count per response, and I can't find that panel. I entered 1 so I could get something working first; I want to try 100 or more. Not surprisingly, they expect payment for very large returns. I'm allowed 5,000 free searches a day, so 500 * 100 found pages a day will keep me happy.

Thanks