[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

How do other people do "searches"?

Posted on 2007-07-20
32
Medium Priority
?
371 Views
Last Modified: 2010-04-21
I've designed an ASP.Net/VB.Net application that goes agianst both sql server and as400.

User can do name, address, phone number, etc searches.

*** I ONLY BRING BACK 20 ROWS of data *** this is because our database has too many indexes, wrong indexes or repeated indexes. Tables dont have Foreign Keys. I even broke my stored proc sql in 2 ...to first read from table1...dump into a temp table...join to table2 to cut back on # of resultsets and processing.

So, my coworker asked...how do other poeple ..for example...yellowpages.com do their searches. She said she can type in "jones" and get thousands of rows back and page thru them. My response was that their DB is designed correctly...

But she's right...how do sites like yellowpages.com can bring back so many rows....or even google...do they have a secret "search algorithm" ??
0
Comment
Question by:Camillia
  • 18
  • 6
  • 4
  • +1
32 Comments
 
LVL 18

Expert Comment

by:Dave Ford
ID: 19536493

Can you post the table structure and your query?

Basically, if the data is in the database, your query should return the correct results. If there are thousands of "jones" entries, then thousands of records should be returned.

You may want to check if you're searching in lowercase, uppercase, or mixed-case. Are you searches case-sensitive or case-insensitive.

There are LOTS of ways to go about this. Let's start with your table-structure and your query.

Regards,
DaveSlash
0
 
LVL 27

Expert Comment

by:tliotta
ID: 19536507
Minor note... Google limits returned results to (IIRC) 1000 rows. Doesn't matter how many 'hits' it reports -- it doesn't return every 'hit'.

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19536556
dave - I go against 2 tables. One has like 700,000 something rows. The other has 800,000 something. This was the largest database we have for me to code against.

I use all "upper case" for my search. I use "like" and do it like this : like 'jones%'.

But return thousands of jones...my stored proc..both in sql server AND AS400 times out. Google returns 1000 rows. So even they have a limit. For the AS400 stored proc..i even broke the sql in 2 as i mentioned above...

post my table strucures...it's long..should I post anyway?
0
Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

 
LVL 7

Author Comment

by:Camillia
ID: 19536568
i think there has to be a limit on the number of rows returned. If yellowpages.com has 200,000 row of "jones"... i dont think it brings back 200,000 rows...
0
 
LVL 27

Expert Comment

by:tliotta
ID: 19536793
Farzadw:

Various ways to implement on your AS/400. You might issue a select with a FETCH FIRST n ROWS ONLY clause for example. But I _think_ MS SQL has something of a different syntax. (They don't follow SQL standards well.)

Best might be creating stored procs to return result sets. Your client code might then be able to call essentially the same stored procs on the two different platforms and get similar results. The stored procs would both return a result set with whatever limited number of rows you wanted. The procs could LOOP to fetch rows.

You could create the interfaces so that your client code would see the same looking result set either way and call the procs with the same parm definitions.

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19536856
Tom, yes, that's what i'm doing . I use "fetch first 20 rows only". In sql server, i use "select top 20 ..."...

The code and the both stored procs in sql server and as400 are working...

BUT....my coworker and managers are asking my why I cant bring back THOUSANDS of rows instead of the 20 rows...they asked why yellowpages.com can bring back thousands of rows without the site timing out. How could google bring back so many rows without the site timing out...

My response was that our database is not designed correctly. I analyzed my sql in both sql-server and as400. Looked at joins and cost of joins...etc... if i dont limit it...query will take a long time.

*** I KNOW it's because of the DB design..the insane amount of indexes/logicals in both DBs ****

So, was just wondering about the same question as they did...how could yellowpages.com bring back thousands of rows...is it because they use an innovative search technique?? is it because their DB is designed right?? was looking for thoughts....
0
 
LVL 37

Expert Comment

by:momi_sabag
ID: 19537613
hi farzadw

1. if you sql is designed right , then you should be able to return 1000 rather quickly. unloess your timeout value is set too low, so i guess the problem is with you sql. i suggest you post here your sql, and post the indexes that are defined on each table
2. first of all. google don't use a regular database, they use something they developed on thier own that is optimize to do searches, but that is not the point
when yellowpage / google / any other search engine returns thounds of results, you never get them all at once. you get the first 10 to 20 in the first result page, then you need to go to the next page etc etc,
so even if the serach return 1000 results, each time they only get 10 so they can display the result page for you.
what you need is a similar mechanism. you need to develop a paging mechanism, that is, you stored procedure should get parameters that will tell it what was the last result it retuned, and the next time you invoke the stored procedure it will return the next 20 resluts
this way you support 1000 results but each time you get from the databse only 20

did you understand ? was i clear ?
0
 
LVL 27

Expert Comment

by:tliotta
ID: 19538179
Farzadw:

momi_sabag is right. Google only returns only a few at a time. And even then, it won't return any more after you scroll through the first 1000.

IOW, it works essentially the same way you're doing it -- except it fetches the NEXT 10 instead of fetching FIRST 10 ONLY.

Without a picture of the database, though, I'm not sure we can give much advice.

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19538982
I totally understand it. I've done ASP.Net paging before (my other jobs), returning lots of data, display 10 per page and everyone was happy... But with this database...both in SQL Server and AS400...returning lots of data timesout. ...

With the SQL Server one, since i'm more familiar with that,.,, i turned on Execution Plan and I do see that all those unnecessary indexes is causing extra processing...With AS400...i actually had to break the sql in two...

One thing I have to mention: I join 2 tables in DB1. But I also have "exists" clause that looks at a table in ANOTHER database. So basically 3 tables in 2 databases...

I really think it's the design of the DB...Let me see if I can post the table structure here...it could be long.
0
 
LVL 27

Expert Comment

by:tliotta
ID: 19540481
Farzadw:

Start with just most relevant columns of (physical) tables. No need for everything (yet). For the tables, a general statement of average row length and number of rows in the tables will probably be useful. (I.e., will help guide next questions.)

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19920295
I havent forgetten about this question. Got busy with other things. Now my managers are on my back again asking me why I cant bring backmore rows. I'll post the 3 tables I'm selecting from and my query. Hopfully someone can help out.
0
 
LVL 7

Author Comment

by:Camillia
ID: 19920796
ok, I trimmed the tables and only got the columns i'm using. Because the query takes time to execute if I join the 3 tables, I created a table called WQTEMP and dump results into it and then join.

This is long . So, please if you dont have the time, it's ok. My managers just have to live with it.

********** Tables ************
CREATE TABLE library.DETAIL (

      D2DCODE CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D2SRC CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D2REF CHAR(30) CCSID 37 NOT NULL DEFAULT '' ,

      D2TOT NUMERIC(15, 2) NOT NULL DEFAULT 0 ,

      D2ASDT DATE NOT NULL DEFAULT CURRENT_DATE ,

      D2CRED CHAR(1) CCSID 37 NOT NULL DEFAULT '' ,

      D2STAT CHAR(3) CCSID 37 NOT NULL DEFAULT '' ,

      D2DEL CHAR(1) CCSID 37 NOT NULL DEFAULT '') ;


CREATE TABLE library.MASTER (



      D1DCOD CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1FNAME CHAR(15) CCSID 37 NOT NULL DEFAULT '' ,

      D1LNAME CHAR(20) CCSID 37 NOT NULL DEFAULT '' ,

      D1ADDR CHAR(35) CCSID 37 NOT NULL DEFAULT '' ,

      D1ZIP CHAR(9) CCSID 37 NOT NULL DEFAULT '' ,

      D1CITY CHAR(18) CCSID 37 NOT NULL DEFAULT '' ,

      D1ST CHAR(2) CCSID 37 NOT NULL DEFAULT '' ,

      D1PHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1SSNO CHAR(20) CCSID 37 NOT NULL DEFAULT '' ,

      D1SPO CHAR(11) CCSID 37 NOT NULL DEFAULT '' ,

      D1EPHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1TOT NUMERIC(15, 2) NOT NULL DEFAULT 0 ,

      D1DEL CHAR(1) CCSID 37 NOT NULL DEFAULT '' ,

      D1XPHN CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      D1CELL CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      PRIMARY KEY( D1DCOD ) ) ;


CREATE TABLE Library.USERSRC (



      SU4CLT# CHAR(3) CCSID 37 NOT NULL DEFAULT '' ,

      SU4PRF CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      SU4SRC CHAR(10) CCSID 37 NOT NULL DEFAULT '' ,

      PRIMARY KEY( SU4CLT# , SU4PRF , SU4SRC ) ) ;

**********************************
*************** My Query ********************
 SET SQLStmt='INSERT into WQTEMP (COL1,COl5,COL6,COL7,COl8,COL9,COL10,COL11,COL12,COl13,COL14,ID)
                        Select  Rtrim(d1lname) || '','' || RTrim(d1fname) as COL1,
                                   D1COLL as COL5,
                                  d1ssno as COL6,
                                  d1phn as COL7,
                                  d1addr as COL8,
                                  D1City as COL9,
                                  D1ST as COL10,
                                  D1Zip as COL11,
                                  D1CELL as COL12,
                                  d1xphn as COL13,
                                  d1ephn as COL14,
                                  d1dcod as ID
     FROM Master
      WHERE
       
        D1DEL <> ''D''' concat Trim(SQLStatement)
                    Concat(' Fetch first 20 rows only') ;

   
     SET SQLStmtSelect ='Select COL1,
         Case D2Del When ''I'' Then 0 Else sum(d2tot) End as COL2,
         d2ref as COL3,
         d2stat as COL4,
         COL5,
         COL6,
         COL7,
         COL8,
         COL9,
         COL10,
         COL11,
         COL12,
         COL13,
         COL14,
         d2snam as COL15 ,
         d2cred as COL16,
          ID
  from wqtemp
       inner join Detail on ID= d2dcode
  where
    exists ( select SU4SRC from library.usersrc where  SU4PRF=''' || UserLogin || '''  AND SU4SRC=D2SRC AND SU4CLT# = Right(current schema,3) )
  AND d2del <> ''D''
   group by COL1, COL2, D2REF , d2stat , COL5, COL6,  COL7,  COL8,  COL9, COL10, COL11, COL12,  COL13, COL14, d2snam ,  d2cred ,  ID, D2Del
   order by Col1 Fetch First 20 rows only' ;


----------********************** Indexes ************
There are no indexes. There are Logicals that are keyed by some of the fields. If this is the same as indexes, i can post those as well.
0
 
LVL 37

Assisted Solution

by:momi_sabag
momi_sabag earned 600 total points
ID: 19921028
well
since you use a <> predicate, the first statement can not be optimized because indexes can not be used for such type of queries
if you know that most rows of the table has D1DEL = 'D' then you can define an index on that column and change your predicate to something else (for example if D1DEL can be D,I,U or S) for example
where D1DEL in ('U','I','S')

regarding the second query, i would add indexes on the following columns (this is an overkill but it's hard to know unless you specify more data about your columns such as cardinality)
i would also try to get rid of that exists and transform it into a join, like

     SET SQLStmtSelect ='Select COL1,
         Case D2Del When ''I'' Then 0 Else sum(d2tot) End as COL2,
         d2ref as COL3,d2stat as COL4,COL5,COL6,COL7,COL8,
         COL9,COL10,COL11,COL12,COL13,COL14,d2snam as COL15 ,
         d2cred as COL16,ID
  from wqtemp,  
         Detail,
        library.usersrc
  where ID= d2dcode
  and SU4PRF=''' || UserLogin || '''  
   AND SU4SRC=D2SRC
    AND SU4CLT# = Right(current schema,3) )
  -- AND d2del <> ''D''      this is redundant since you already used that when inserting into wqtemp
   group by COL1, COL2, D2REF , d2stat , COL5, COL6,  COL7,  COL8,  COL9, COL10, COL11, COL12,  COL13, COL14, d2snam ,  d2cred ,  ID, D2Del
   order by Col1 Fetch First 20 rows only' ;

the fetch first 20 rows only does not make your query go faster, since the group by and order by causes db2 to process all the rows either way, you just don't return all of them to the application
so now, the indexes i would add are :
id
d2dcode
d2stc

i will also try to merge it back into one big query,
spllitting into 2 queries in this case will not benefit you in any way since in order to populate the temp table you scan the entire master table
0
 
LVL 7

Author Comment

by:Camillia
ID: 19921150
I will try your recommandations.

I have no choice with group by and order by. Cant remove them.

When you say add index...do you actually mean an index or a logical? In sql server, views(logicals) and indexes are 2 different things. So add an index?
0
 
LVL 7

Author Comment

by:Camillia
ID: 19921242
With the "order by"...I will look into doing it in the .net code. I am using a dataset so maybe I can do the group by and order by in the code. Not sure tho.
0
 
LVL 37

Expert Comment

by:momi_sabag
ID: 19921298
i mean an index, not a logical
and it will be better to let the database do the group by and order by rather then doing it in code
0
 
LVL 7

Author Comment

by:Camillia
ID: 19921336
thanks , let me try what you said.

Whatever I do in AS400, has to work in SQL Server tho. I hope i can speed this up and move on with my life :D

Kamila
0
 
LVL 27

Accepted Solution

by:
tliotta earned 1000 total points
ID: 19922785
Farzadw:

>  When you say add index...do you actually mean an index or a logical? In sql server, views(logicals) and indexes are 2 different things. So add an index?...

Under OS/400 and i5/OS, indexes are implemented as *FILE objects with an 'LF' attribute. I.e., an SQL INDEX physically is a "Logical File" object. However, the SQL statement CREATE INDEX will be better simply because it can be implemented directly in SQL SERVER.

Also, if a LF already exists that implements a particular 'index', then creating an SQL INDEX will have no practical effect other than creating the INDEX object description. The 'index' part of that object will simply point to the previous LF index. I.e., the access path will be shared.

In short, there is no significant impact of creating an additional SQL INDEX when an appropriate LF index exists that does the work.

Ideally, if the access paths in fact are the same, your system will be better off if you first delete the LF, then create the SQL INDEX, then recreate the LF. That causes the LF index to use (share) the SQL INDEX which can be created for better performance. DB2 has been updated to accommodate SQL actions while DDS support has stabilized.

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19931409
I made the changes yesterday both in SQL Server and AS400. All was fine. I did get all the results back. I even tried my sql server command on a client box that has a large database and it worked fine:

In SQL Server: I made the changes momi_sabag mentioned. There was already an index for D2Dcode , i turned on execution plan and it wasnt using that index, so I used "index hint".

In AS400, I did the same thing. I removed that WQTEMP, added an index for D2Dcode and D2SRC. Changed the query and it was so fast. I could bring 4000 something rows back.

I played around with the index...dropped, added it. And as Tom mentions, sometimes it picked up the index i created, sometimes it picked up the LF. I can see this in Navigator's query plan.

***First, not sure why AS400 sometimes picks up LF and sometimes my index. Is there an "index hint" in as400 I can use to force the query to use a specific index? ***

I do understand what Tom is explaining but i cant drop the LFs. We will be giving this to our clients so I dont want to play around with their database. Also, we have a vendor app that does checks behind the scene on the database so I dont want to do it. Also, I dont know how to drop an LF and recreate it :)

Second, yesterday afternoon seems like we got hit by a virus and we lost our network connection and internet. Today, I ran the same query and it takes forever to run. I get errors like internal sql error. Even bringing the first 5 rows takes forever. I've asked my manager to reboot the AS400 and will try this again tomorrow. The sql server version also seemed much faster yesterday with the changes I made. Today, it seems slower.

If all fails, I might bring back 50 rows to make my manager happy.
0
 
LVL 7

Author Comment

by:Camillia
ID: 19932082
I said I'll bring back 50 rows but momi is correct...issue is not the # of rows.
0
 
LVL 7

Author Comment

by:Camillia
ID: 19932604
"exists" is not faster than "inner join"?
0
 
LVL 18

Assisted Solution

by:Dave Ford
Dave Ford earned 400 total points
ID: 19932648

There are no "hard and fast" rules, but GENERALLY "inner join" is faster than "exists" for large numbers of values (especially if the table referenced is properly indexed).

-- DaveSlash
0
 
LVL 7

Author Comment

by:Camillia
ID: 19934312
so no such thing as 'index hint" in as400? to force the query to use an index?
0
 
LVL 37

Expert Comment

by:momi_sabag
ID: 19934766
hi
when db2 see an exists predicate , it tries to convert it to a join, since there are 4 ways db2 can perform a join but only 1 way to perform an exists,
that is why it is generally recommended to avoid exists as possible and use joins instead
0
 
LVL 7

Author Comment

by:Camillia
ID: 19935046
thanks, rebooted the as400 and will try my query again. It worked fine 2 days ago tho..

But again, no way to have "index hint " in as400?
0
 
LVL 18

Expert Comment

by:Dave Ford
ID: 19936255

Nope. There is no way to influence the query optimizer on AS/400 like there is in Oracle.

Sorry,
Dave
0
 
LVL 27

Expert Comment

by:tliotta
ID: 19939354
Farzadw:

Since I'm not familiar with "index hint", I can't comment directly. AFAIK, Dave is right that no such action exists on AS/400. But I'm not sure that one would make any sense.

As mentioned before, if a matching index exists that's appropriate, it _will_ be used. There's no need to tell SQL to use an index that's being used.

And even if both a SQL INDEX and a LF exist, as long as they're indexing the same way, there is only one "index". It is irrelevant which one actually gets reported as being used. Pointing to one will give the exact same result as pointing to the other -- they're two names for the same "thing".

I can't do more than guess why one might be reported some times, and the other reported other times, if the _same_ query is run. It might simply be a matter of which one responds to a lock request first or something else just as trivial.

Now, if selection criteria changes, if it's a different [UserLogin] or if it's a different D1DEL value or D2DEL value or some other change in input, then maybe it does influence how the choice is made between the objects. But it should make no difference in the final result. The selected rows should be the same either way and performance should be the same either way. The actual internal "index" remains the same either way.

Tom
0
 
LVL 7

Author Comment

by:Camillia
ID: 19945762
Tom - it makes sense. Got busy with sql server stuff again but will try my query again on Monday.

0
 
LVL 7

Author Comment

by:Camillia
ID: 19948740
I found this link and wanted to try this as well:
http://www.centerfieldtechnology.com/publications/archive/December%202005%20Newsletter.pdf

I did a 'select * from qusrsYS/QAQQINI '  And I got back:
 Parameter                      
                                 
 IGNORE_DERIVED_INDEX            
 ********  End of data  ********

-------- How would I know if it's a yes or no?
0
 
LVL 18

Expert Comment

by:Dave Ford
ID: 19948784

Scroll to the right.  -- DaveSlash
0
 
LVL 7

Author Comment

by:Camillia
ID: 19948973
silly me, thanks. It's set to *YES
0
 
LVL 7

Author Closing Comment

by:Camillia
ID: 31407523
Thanks. all makes sense. I still have a long way to go to learn AS400 :)
0

Featured Post

Upgrade your Question Security!

Add Premium security features to your question to ensure its privacy or anonymity. Learn more about your ability to control Question Security today.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Today, the web development industry is booming, and many people consider it to be their vocation. The question you may be asking yourself is – how do I become a web developer?
If you are a mobile app developer and especially develop hybrid mobile apps then these 4 mistakes you must avoid for hybrid app development to be the more genuine app developer.
An introduction to basic programming syntax in Java by creating a simple program. Viewers can follow the tutorial as they create their first class in Java. Definitions and explanations about each element are given to help prepare viewers for future …
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question