Solved

Increasing my custom search engine's results?

Posted on 2013-06-30
Last Modified: 2013-07-27
Hi
I know that some Google searches can return millions of result items. My Java custom search engine (CSE) currently returns only 10 URLs per call, as the documentation states. How do I get a much larger return from my calls? I'd like to process thousands!

Here is what works right now (appropriate setup done); it returns 10 URLs for the given search:

String searchString = "Michael Jordan";

// Run the search; a single call returns one page of results.
List<Result> items = customsearch.cse().list(searchString).execute().getItems();

System.out.println("Search for " + searchString + ", size = " + items.size());
// StringArrayOfLinks is assumed to be a String[] declared elsewhere.
StringArrayOfLinks = new String[items.size()];
int linkCount = 0;

// Print each result and store its link.
for (Result item : items) {
    System.out.println(item.getTitle() + " (" + item.getLink() + ")");
    StringArrayOfLinks[linkCount] = item.getLink();
    linkCount++;
}

I'd like to be able to process way more than the 10 items returned from Google.
Right now, I get the same 10 links returned.  It should be a different 10 every time.
Ideas?
I'd like linkCount to be >1000, for sure.
thanks
Question by:beavoid
13 Comments
 
LVL 18

Expert Comment

by:nap0leon
If you want 10 different links every time the request runs... then you need to somehow create a bank of links for it to pick from.  What it is doing now is running the search and returning the top 10 items every time.  The only time you would see a difference in the search results is if that term's search results rankings have changed.
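A minimal sketch of the "bank of links" idea, assuming the links from earlier searches have already been collected into a list. `LinkBank` and `pickRandom` are hypothetical names for illustration and are not part of the Custom Search client library; the URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: not part of the Google Custom Search client library.
class LinkBank {

    // Returns up to `count` distinct links chosen at random from the bank.
    static List<String> pickRandom(List<String> bank, int count, Random random) {
        // Shuffle a copy so the caller's bank keeps its original order.
        List<String> shuffled = new ArrayList<String>(bank);
        Collections.shuffle(shuffled, random);
        int size = Math.min(count, shuffled.size());
        return new ArrayList<String>(shuffled.subList(0, size));
    }

    public static void main(String[] args) {
        // Placeholder bank; in practice this would hold links saved from earlier searches.
        List<String> bank = new ArrayList<String>();
        for (int i = 1; i <= 100; i++) {
            bank.add("http://example.com/page" + i);
        }
        for (String link : pickRandom(bank, 10, new Random())) {
            System.out.println(link);
        }
    }
}
```

Each run prints a different selection of 10 links, because the selection comes from the shuffled bank rather than from the search ranking.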
 
LVL 35

Expert Comment

by:mccarl
"some searches made in Google can return millions of results items"
Well, not exactly!! They tell you that there are potentially millions of results available, but they only return ~10 results per page.

This brings me to another point: probably the main reason why Google limits the number of results to 100 via the API is ads! Google is a business, and as such it needs to make money to continue the service that it provides. When you search via the normal webpage, Google slips some ads in there and gets paid by the companies that the ads are for. When you search via the API, there is no such mechanism for Google to get paid for its effort, so it limits the service and charges for higher usage of it.

One thing that I want to clarify with you, since you have touched on this a couple of times now: are you interested in finding out Google's estimate of the total number of search results (say, for ranking the popularity of a subject)? Because that IS something that the API returns. Break out the line that executes the search and gets the results into two separate lines, and then you have access to the "total results" number, i.e.:
Search searchResult = customsearch.cse().list(searchString).execute();
System.out.println("About " + searchResult.getSearchInformation().getTotalResults() + " results available");
List<Result> items = searchResult.getItems();

 

Author Comment

by:beavoid
getTotalResults still returns 10.
I think that is the basic startup return count package
How does the retrieval count work? I read thousands are possible!
Thanks
 
LVL 35

Expert Comment

by:mccarl
When I search for the word "test" I get the following output...
About 341000000 results available
1:    Create Tests for Organizational Training and Certification Programs ... (http://www.test.com/)
2:    Speedtest.net - The Global Broadband Speed Test (http://www.speedtest.net/)
3:    Personality test based on C. Jung and I. Briggs Myers type theory (http://www.humanmetrics.com/cgi-win/jtypes2.asp)
4:    Speakeasy Speed Test (http://www.speakeasy.net/speedtest/)
5:    Test your IPv6. (http://test-ipv6.com/)
6:    Test - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test)
7:    The HTML5 test - How well does your browser support HTML5? (http://html5test.com/)
8:    Test cricket - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test_cricket)
9:    The Acid3 Test (http://acid3.acidtests.org/)
10:    Test (wrestler) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Test_(wrestler))

 

Author Comment

by:beavoid
Could you please attach this output's code file to a comment? Super
Thanks
 
LVL 35

Expert Comment

by:mccarl
Here is the code. You will need to set your own api key and cx values...
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.List;

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.Customsearch.Builder;
import com.google.api.services.customsearch.CustomsearchRequest;
import com.google.api.services.customsearch.CustomsearchRequestInitializer;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;

public class TestCustomSearchAPI {
    
    public static void main(String[] args) throws GeneralSecurityException, IOException {
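        List<Result> items = new ArrayList<Result>();
        // "start" is the 1-based index of the first result to fetch.
        // With these bounds the body runs once (start = 1), fetching one page of 10 results.
        // Raising the upper bound to 91 would request start = 1, 11, ..., 91 (ten pages).
        for (long i = 1; i <= 10; i += 10) {
            items.addAll(executeSearch("test", i));
        }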
        
        int i = 1;
        for (Result item : items) {
            System.out.println(i++ + ":    " + item.getTitle() + " (" + item.getLink() + ")");
        }
    }

    private static List<Result> executeSearch(String searchTerm, final Long start) throws GeneralSecurityException, IOException {
        Builder builder = new Customsearch.Builder(GoogleNetHttpTransport.newTrustedTransport(), new JacksonFactory(), null);
        builder.setApplicationName("Search Test");
        builder.setCustomsearchRequestInitializer(new CustomsearchRequestInitializer() {
            @Override
            protected void initializeCustomsearchRequest(CustomsearchRequest<?> request) throws IOException {
                request.setKey("###########");
                request.set("cx", "%%%%%%%%%%%%");
                request.set("start", start);
            }
        });
        Customsearch customsearch = builder.build();
        Search searchResult = customsearch.cse().list(searchTerm).execute();
        System.out.println("About " + searchResult.getSearchInformation().getTotalResults() + " results available");
        List<Result> items = searchResult.getItems();
        return items;
    }
}
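The loop in `main` above runs only once, because `i` starts at 1 and the bound is 10. A hedged sketch of how the `start` values for successive pages could be generated, assuming 10 results per page and the 100-result ceiling mentioned earlier in this thread. `PageOffsets` and `startIndexes` are hypothetical names, and no network call is made here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes paging offsets only, it does not call the search API.
class PageOffsets {

    // Returns the 1-based start index of each page needed to cover maxResults results.
    static List<Long> startIndexes(int pageSize, int maxResults) {
        List<Long> starts = new ArrayList<Long>();
        for (long start = 1; start <= maxResults; start += pageSize) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Assumed values: 10 results per page, at most 100 results reachable.
        for (Long start : startIndexes(10, 100)) {
            System.out.println("request page with start = " + start);
        }
    }
}
```

Each value would be passed as the `start` argument of `executeSearch`, so the ten calls together could return up to 100 results, and each call counts as a separate search request.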

 

Author Comment

by:beavoid
Thanks,

"About 340000000 results available"
Not too shabby.

Pity that I only get 10 results per search.

This page discusses bigger returns.
But it is always ten links returned, no matter what? That probably isn't the worst thing in the world, really. Just curious.

here

Thanks
 
LVL 35

Expert Comment

by:mccarl
"This page discusses bigger returns."
That page refers to the number of search requests, not the number of results that are achievable.

The problem is that Google doesn't provide an API for its plain old vanilla "Search" service. This is a "Custom Search Engine", whose intended purpose is to provide search facilities over a site (or a set of sites), and those sites would have X total pages. So returning more than 100 page results is probably not that important when X might not be much more than 100. The way that we are setting the CSE up is non-standard, which is why it isn't really catered for returning large numbers of results.
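To make the arithmetic concrete, here is a small sketch of how the ceiling plays out against a large estimated total. It assumes the figures quoted in this thread (10 results per request, 100 results reachable in total); `ResultCap` and its methods are hypothetical names, not part of any Google library.

```java
// Hypothetical helper: plain arithmetic, no API calls.
class ResultCap {

    static final int PAGE_SIZE = 10;      // results per request, as observed in this thread
    static final int MAX_REACHABLE = 100; // ceiling stated in this thread

    // How many results can actually be fetched, given the estimated total.
    static long reachable(long estimatedTotal) {
        return Math.min(estimatedTotal, MAX_REACHABLE);
    }

    // How many requests are needed to fetch every reachable result (rounded up).
    static long requestsNeeded(long estimatedTotal) {
        long results = reachable(estimatedTotal);
        return (results + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        long estimatedTotal = 341000000L; // the estimate reported for "test" in this thread
        System.out.println("Reachable results: " + reachable(estimatedTotal));
        System.out.println("Requests needed: " + requestsNeeded(estimatedTotal));
    }
}
```

Under these assumptions the estimate of hundreds of millions still yields only 100 retrievable links, spread over 10 requests.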
 

Author Comment

by:beavoid
Thanks,
I'm still seeing only 10
What else can we fiddle with?
 

Author Comment

by:beavoid
Thanks
I'm interested in getting the most links possible returned by my code, even if they come in many batches of 10, to get close to the result counts they claim to have found.

I don't see a way around this. I have signed up for Google's billable search service to have a look. They claim to bill me only if I exceed a threshold, which I have not, but it still returns only 10 links. Is there a place where I can specify massive returns? Might it return a new list of 10 links on successive calls, or is it always the best 10? I seem to recall seeing a results-count text entry somewhere on the panel. Or did you mention a way to still get many different replies in the old system?
Thanks
 

Author Comment

by:beavoid
What is the final piece of the puzzle to get large numbers?

thx
 
LVL 35

Accepted Solution

by:
mccarl earned 500 total points
I think I have well and truly answered this enough times already. Here's one more.. it CAN'T be done! :) I thought you were getting somewhere with using Bing though?
 

Author Comment

by:beavoid
I think so, thanks for making me see that. I'll ask another question, just to test you all :)
Bing looks very promising. Their API is very straightforward. So is Google's, but I was impressed by Bing.
I have a question for them at Stack Exchange, because Bing had a control panel page where you could enter the required link count per response, and I can't find that panel. I entered 1, so I could get something working first. I want to try 100 or more. They expect payment for very large returns, not surprisingly. I'm allowed 5,000 free searches a day, so 500 * 100 found pages a day will keep me happy.

Thanks