Link to home
Start Free TrialLog in
Avatar of Camillia
CamilliaFlag for United States of America

asked on

How do other people do "searches"?

I've designed an ASP.Net/VB.Net application that goes agianst both sql server and as400.

User can do name, address, phone number, etc searches.

*** I ONLY BRING BACK 20 ROWS of data *** this is because our database has too many indexes, wrong indexes or repeated indexes. Tables dont have Foreign Keys. I even broke my stored proc sql in 2 ...to first read from table1...dump into a temp table...join to table2 to cut back on # of resultsets and processing.

So, my coworker asked...how do other poeple ..for example...yellowpages.com do their searches. She said she can type in "jones" and get thousands of rows back and page thru them. My response was that their DB is designed correctly...

But she's right...how do sites like yellowpages.com can bring back so many rows....or even google...do they have a secret "search algorithm" ??
Avatar of Member_2_2484401
Member_2_2484401
Flag of United States of America image


Can you post the table structure and your query?

Basically, if the data is in the database, your query should return the correct results. If there are thousands of "jones" entries, then thousands of records should be returned.

You may want to check if you're searching in lowercase, uppercase, or mixed-case. Are you searches case-sensitive or case-insensitive.

There are LOTS of ways to go about this. Let's start with your table-structure and your query.

Regards,
DaveSlash
Avatar of Member_2_276102
Member_2_276102

Minor note... Google limits returned results to (IIRC) 1000 rows. Doesn't matter how many 'hits' it reports -- it doesn't return every 'hit'.

Tom
Avatar of Camillia

ASKER

dave - I go against 2 tables. One has like 700,000 something rows. The other has 800,000 something. This was the largest database we have for me to code against.

I use all "upper case" for my search. I use "like" and do it like this : like 'jones%'.

But return thousands of jones...my stored proc..both in sql server AND AS400 times out. Google returns 1000 rows. So even they have a limit. For the AS400 stored proc..i even broke the sql in 2 as i mentioned above...

post my table strucures...it's long..should I post anyway?
i think there has to be a limit on the number of rows returned. If yellowpages.com has 200,000 row of "jones"... i dont think it brings back 200,000 rows...
Farzadw:

Various ways to implement on your AS/400. You might issue a select with a FETCH FIRST n ROWS ONLY clause for example. But I _think_ MS SQL has something of a different syntax. (They don't follow SQL standards well.)

Best might be creating stored procs to return result sets. Your client code might then be able to call essentially the same stored procs on the two different platforms and get similar results. The stored procs would both return a result set with whatever limited number of rows you wanted. The procs could LOOP to fetch rows.

You could create the interfaces so that your client code would see the same looking result set either way and call the procs with the same parm definitions.

Tom
Tom, yes, that's what i'm doing . I use "fetch first 20 rows only". In sql server, i use "select top 20 ..."...

The code and the both stored procs in sql server and as400 are working...

BUT....my coworker and managers are asking my why I cant bring back THOUSANDS of rows instead of the 20 rows...they asked why yellowpages.com can bring back thousands of rows without the site timing out. How could google bring back so many rows without the site timing out...

My response was that our database is not designed correctly. I analyzed my sql in both sql-server and as400. Looked at joins and cost of joins...etc... if i dont limit it...query will take a long time.

*** I KNOW it's because of the DB design..the insane amount of indexes/logicals in both DBs ****

So, was just wondering about the same question as they did...how could yellowpages.com bring back thousands of rows...is it because they use an innovative search technique?? is it because their DB is designed right?? was looking for thoughts....
hi farzadw

1. if you sql is designed right , then you should be able to return 1000 rather quickly. unloess your timeout value is set too low, so i guess the problem is with you sql. i suggest you post here your sql, and post the indexes that are defined on each table
2. first of all. google don't use a regular database, they use something they developed on thier own that is optimize to do searches, but that is not the point
when yellowpage / google / any other search engine returns thounds of results, you never get them all at once. you get the first 10 to 20 in the first result page, then you need to go to the next page etc etc,
so even if the serach return 1000 results, each time they only get 10 so they can display the result page for you.
what you need is a similar mechanism. you need to develop a paging mechanism, that is, you stored procedure should get parameters that will tell it what was the last result it retuned, and the next time you invoke the stored procedure it will return the next 20 resluts
this way you support 1000 results but each time you get from the databse only 20

did you understand ? was i clear ?
Farzadw:

momi_sabag is right. Google only returns only a few at a time. And even then, it won't return any more after you scroll through the first 1000.

IOW, it works essentially the same way you're doing it -- except it fetches the NEXT 10 instead of fetching FIRST 10 ONLY.

Without a picture of the database, though, I'm not sure we can give much advice.

Tom
I totally understand it. I've done ASP.Net paging before (my other jobs), returning lots of data, display 10 per page and everyone was happy... But with this database...both in SQL Server and AS400...returning lots of data timesout. ...

With the SQL Server one, since i'm more familiar with that,.,, i turned on Execution Plan and I do see that all those unnecessary indexes is causing extra processing...With AS400...i actually had to break the sql in two...

One thing I have to mention: I join 2 tables in DB1. But I also have "exists" clause that looks at a table in ANOTHER database. So basically 3 tables in 2 databases...

I really think it's the design of the DB...Let me see if I can post the table structure here...it could be long.
Farzadw:

Start with just most relevant columns of (physical) tables. No need for everything (yet). For the tables, a general statement of average row length and number of rows in the tables will probably be useful. (I.e., will help guide next questions.)

Tom
I havent forgetten about this question. Got busy with other things. Now my managers are on my back again asking me why I cant bring backmore rows. I'll post the 3 tables I'm selecting from and my query. Hopfully someone can help out.
ok, I trimmed the tables and only got the columns i'm using. Because the query takes time to execute if I join the 3 tables, I created a table called WQTEMP and dump results into it and then join.

This is long . So, please if you dont have the time, it's ok. My managers just have to live with it.

********** Tables ************
CREATE TABLE library.DETAIL (

      D2DCODE CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D2SRC CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D2REF CHAR(30) CCSID 37 NOT NULL DEFAULT '' ,

      D2TOT NUMERIC(15, 2) NOT NULL DEFAULT 0 ,

      D2ASDT DATE NOT NULL DEFAULT CURRENT_DATE ,

      D2CRED CHAR(1) CCSID 37 NOT NULL DEFAULT '' ,

      D2STAT CHAR(3) CCSID 37 NOT NULL DEFAULT '' ,

      D2DEL CHAR(1) CCSID 37 NOT NULL DEFAULT '') ;


CREATE TABLE library.MASTER (



      D1DCOD CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1FNAME CHAR(15) CCSID 37 NOT NULL DEFAULT '' ,

      D1LNAME CHAR(20) CCSID 37 NOT NULL DEFAULT '' ,

      D1ADDR CHAR(35) CCSID 37 NOT NULL DEFAULT '' ,

      D1ZIP CHAR(9) CCSID 37 NOT NULL DEFAULT '' ,

      D1CITY CHAR(18) CCSID 37 NOT NULL DEFAULT '' ,

      D1ST CHAR(2) CCSID 37 NOT NULL DEFAULT '' ,

      D1PHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1SSNO CHAR(20) CCSID 37 NOT NULL DEFAULT '' ,

      D1SPO CHAR(11) CCSID 37 NOT NULL DEFAULT '' ,

      D1EPHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1TOT NUMERIC(15, 2) NOT NULL DEFAULT 0 ,

      D1DEL CHAR(1) CCSID 37 NOT NULL DEFAULT '' ,

      D1XPHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1CELL CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      PRIMARY KEY( D1DCOD ) ) ;


CREATE TABLE Library.USERSRC (



      SU4CLT# CHAR(3) CCSID 37 NOT NULL DEFAULT '' ,

      SU4PRF CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      SU4SRC CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      PRIMARY KEY( SU4CLT# , SU4PRF , SU4SRC ) ) ;

**********************************
*************** My Query ********************
 SET SQLStmt='INSERT into WQTEMP (COL1,COl5,COL6,COL7,COl8,COL9,COL10,COL11,COL12,COl13,COL14,ID)
                        Select  Rtrim(d1lname) || '','' || RTrim(d1fname) as COL1,
                                   D1COLL as COL5,
                                  d1ssno as COL6,
                                  d1phn as COL7,
                                  d1addr as COL8,
                                  D1City as COL9,
                                  D1ST as COL10,
                                  D1Zip as COL11,
                                  D1CELL as COL12,
                                  d1xphn as COL13,
                                  d1ephn as COL14,
                                  d1dcod as ID
     FROM Master
      WHERE
       
        D1DEL <> ''D''' concat Trim(SQLStatement)
                    Concat(' Fetch first 20 rows only') ;

   
     SET SQLStmtSelect ='Select COL1,
         Case D2Del When ''I'' Then 0 Else sum(d2tot) End as COL2,
         d2ref as COL3,
         d2stat as COL4,
         COL5,
         COL6,
         COL7,
         COL8,
         COL9,
         COL10,
         COL11,
         COL12,
         COL13,
         COL14,
         d2snam as COL15 ,
         d2cred as COL16,
          ID
  from wqtemp
       inner join Detail on ID= d2dcode
  where
    exists ( select SU4SRC from library.usersrc where  SU4PRF=''' || UserLogin || '''  AND SU4SRC=D2SRC AND SU4CLT# = Right(current schema,3) )
  AND d2del <> ''D''
   group by COL1, COL2, D2REF , d2stat , COL5, COL6,  COL7,  COL8,  COL9, COL10, COL11, COL12,  COL13, COL14, d2snam ,  d2cred ,  ID, D2Del
   order by Col1 Fetch First 20 rows only' ;


----------********************** Indexes ************
There are no indexes. There are Logicals that are keyed by some of the fields. If this is the same as indexes, i can post those as well.
SOLUTION
Avatar of momi_sabag
momi_sabag
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I will try your recommandations.

I have no choice with group by and order by. Cant remove them.

When you say add index...do you actually mean an index or a logical? In sql server, views(logicals) and indexes are 2 different things. So add an index?
With the "order by"...I will look into doing it in the .net code. I am using a dataset so maybe I can do the group by and order by in the code. Not sure tho.
i mean an index, not a logical
and it will be better to let the database do the group by and order by rather then doing it in code
thanks , let me try what you said.

Whatever I do in AS400, has to work in SQL Server tho. I hope i can speed this up and move on with my life :D

Kamila
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I made the changes yesterday both in SQL Server and AS400. All was fine. I did get all the results back. I even tried my sql server command on a client box that has a large database and it worked fine:

In SQL Server: I made the changes momi_sabag mentioned. There was already an index for D2Dcode , i turned on execution plan and it wasnt using that index, so I used "index hint".

In AS400, I did the same thing. I removed that WQTEMP, added an index for D2Dcode and D2SRC. Changed the query and it was so fast. I could bring 4000 something rows back.

I played around with the index...dropped, added it. And as Tom mentions, sometimes it picked up the index i created, sometimes it picked up the LF. I can see this in Navigator's query plan.

***First, not sure why AS400 sometimes picks up LF and sometimes my index. Is there an "index hint" in as400 I can use to force the query to use a specific index? ***

I do understand what Tom is explaining but i cant drop the LFs. We will be giving this to our clients so I dont want to play around with their database. Also, we have a vendor app that does checks behind the scene on the database so I dont want to do it. Also, I dont know how to drop an LF and recreate it :)

Second, yesterday afternoon seems like we got hit by a virus and we lost our network connection and internet. Today, I ran the same query and it takes forever to run. I get errors like internal sql error. Even bringing the first 5 rows takes forever. I've asked my manager to reboot the AS400 and will try this again tomorrow. The sql server version also seemed much faster yesterday with the changes I made. Today, it seems slower.

If all fails, I might bring back 50 rows to make my manager happy.
I said I'll bring back 50 rows but momi is correct...issue is not the # of rows.
"exists" is not faster than "inner join"?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
so no such thing as 'index hint" in as400? to force the query to use an index?
hi
when db2 see an exists predicate , it tries to convert it to a join, since there are 4 ways db2 can perform a join but only 1 way to perform an exists,
that is why it is generally recommended to avoid exists as possible and use joins instead
thanks, rebooted the as400 and will try my query again. It worked fine 2 days ago tho..

But again, no way to have "index hint " in as400?

Nope. There is no way to influence the query optimizer on AS/400 like there is in Oracle.

Sorry,
Dave
Farzadw:

Since I'm not familiar with "index hint", I can't comment directly. AFAIK, Dave is right that no such action exists on AS/400. But I'm not sure that one would make any sense.

As mentioned before, if a matching index exists that's appropriate, it _will_ be used. There's no need to tell SQL to use an index that's being used.

And even if both a SQL INDEX and a LF exist, as long as they're indexing the same way, there is only one "index". It is irrelevant which one actually gets reported as being used. Pointing to one will give the exact same result as pointing to the other -- they're two names for the same "thing".

I can't do more than guess why one might be reported some times, and the other reported other times, if the _same_ query is run. It might simply be a matter of which one responds to a lock request first or something else just as trivial.

Now, if selection criteria changes, if it's a different [UserLogin] or if it's a different D1DEL value or D2DEL value or some other change in input, then maybe it does influence how the choice is made between the objects. But it should make no difference in the final result. The selected rows should be the same either way and performance should be the same either way. The actual internal "index" remains the same either way.

Tom
Tom - it makes sense. Got busy with sql server stuff again but will try my query again on Monday.

I found this link and wanted to try this as well:
http://www.centerfieldtechnology.com/publications/archive/December%202005%20Newsletter.pdf

I did a 'select * from qusrsYS/QAQQINI '  And I got back:
 Parameter                      
                                 
 IGNORE_DERIVED_INDEX            
 ********  End of data  ********

-------- How would I know if it's a yes or no?

Scroll to the right.  -- DaveSlash
silly me, thanks. It's set to *YES
Thanks. all makes sense. I still have a long way to go to learn AS400 :)