Link to home
Start Free TrialLog in
Avatar of ITContractor
ITContractor

asked on

FoxPro 7.0 - Query takes 90+ seconds to execute.

Ok experts, I am working on a project that is pulling data from a FoxPro database. I have 2 versions of the same query below, one takes 90+ seconds to execute and the other takes  0.5 seconds (half a second)

I will try and explain it the best i can.  i have data i need from all three tables.  

If i join them together in series, i.e. Table1 join to Table2 and Table2 joined to Table3 (which is how it should be done) it takes 90+ seconds to execute.

However,  if i join them in paralel i.e. Table1 to Table2 and Table1 to Table3 it returns the same data as the other method, and the correct data, but returns it in under a second. (0.52 to be exact)


The only difference in the queries below is the alias in the last part of the where clause, using Tabl1 instead of Table2

This works ok, however as you probably have guessed the data is not normalized at all, and this only works because Table1 (INV_TRN) happens to have duplicated data, the join SHOULD happen usinf INV_MST and Code2.

Some other things to know: The query have valid, exact and correct indexes.  removing any one of the 3 tables, and keeping the where clause the same results in execution under 1 second.  the indexs are correct tested and valid and it only executes very slow when 3 tables are joined like this.  It has been tested in multiple locations, with multiple hardware configurations and varying amounts of database activity.  Has been tested with both Novell and Windows hosting the foxpro database.  

90 + seconds
----------------------
Select * from INV_TRN as it,INV_MST as im,code2 as c2 where 
(it.FBATCH='0822617') and  (it.ftrack = (im.fbatch + im.fsequence)) and 
((im.fcustcode+im.fprodgroup+im.fproduct+im.fowner+im.fsuplrprod) =(c2.fcustcode+c2.fprodgroup+c2.fproduct+c2.fowner+c2.fsuplrprod))
 
 
runs in under 1 second
------------------------------------------------
Select * from INV_TRN as it,INV_MST as im,code2 as c2 where 
(it.FBATCH='0822617') and (it.ftrack = (im.fbatch + im.fsequence)) and 
((it.fcustcode+it.fprodgroup+it.fproduct+it.fowner+it.fsuplrprod) = (c2.fcustcode+c2.fprodgroup+c2.fproduct+c2.fowner+c2.fsuplrprod))

Open in new window

Avatar of Cyril Joudieh
Cyril Joudieh
Flag of Lebanon image

It could be that one table does not have the key indexed or it's not the same length as the others.
Avatar of ITContractor
ITContractor

ASKER

That was my first thought, but i am 100% certain that the indexes are fully optimized. Removing any of the 3 tables results in fast execution, I have verified and reverified the indexes.  the result set does return around 200 fields, but only 4 records.  i am using select * for testing purposes only as listing all 30 fields would dilute the readability of my examples above.  i have used the same query and had the same results only returning 1 field.  

I have tried the query in multiple ways.
Table1 = INV_TRN
Table2 = INV_MST
Table3 = Code2

Table1 to Table2 = under 1 second
Table2 to Table3 = under 1 second
Table1 to Table3 = under 1 second

I am not very well versed in Foxpro, but a couple of the developers here have been using it for many years, and i have them stumped as well, we have spent about a week making sure we have the query fully optimized.  We have 36 locations across the counrty, with varrying amounts of data and database activity and we get the same results on all of them.
The query will run even much faster if you have the concatenated fields as on field.
The combination of the concatenation is not properly indexed.
the index is the concatination of all 5 fields, as i mentioned before, the indexes work great when only 2 tables are used, and i don't need to to run much faster than 0.5 seconds.

changing the database tables or indexes is not an option, I am just looking for an explanation of why the query works perfectly when I am only using 2 tables, or when i use all three and match both sub tables to the first table, but when i try and join in series, like it should be, it takes 90+ seconds...
It has to do with the order of logical expressions in the join. How about removing the parenthesis?
We have been adding "and .T." expression into the WHERE clause of certains SELECT command several times and the speed was doing similar singularities...

Did you try SYS(3054) ?
How many records are in tables?
How many records have it.FBATCH='0822617' ?
And what about INNER JOINs?
Select * 
  from INV_TRN as it 
  INNER JOIN INV_MST as im ON it.ftrack = im.fbatch + im.fsequence)
  INNER JOIN code2 as c2 ON im.fcustcode+im.fprodgroup+im.fproduct+im.fowner+im.fsuplrprod = c2.fcustcode+c2.fprodgroup+c2.fproduct+c2.fowner+c2.fsuplrprod
  where it.FBATCH='0822617' 
 
------------------------------------------------
Select * 
  from INV_TRN as it 
  INNER JOIN INV_MST as im ON it.ftrack = im.fbatch + im.fsequence)
  INNER JOIN code2 as c2 ON it.fcustcode+it.fprodgroup+it.fproduct+it.fowner+it.fsuplrprod = c2.fcustcode+c2.fprodgroup+c2.fproduct+c2.fowner+c2.fsuplrprod
  where it.FBATCH='0822617' 

Open in new window

Please remove the right parenthesis from the first join expression (and add semicolons).

Another thing not discussed here is VFP version.

I've been using FoxPro 1.0 which has been filling disk during the @ SAY output to screen, so the maximal application run was reduced to 1 day only... (solution I don't remember)
I've been using FoxPro 2.0 which has been passing the "FOR lnI = expr1 to expr2" loop four times whereas both expressions showed result 2... (solution was unexpected: INT(expr1) to INT(expr2) )

I am not saying your problem is not interesting but you should move forward.

SQL engine was extended in each VFP version and to stack on some odd behaviour in old version is probably not a good idea.

I would guess VFP9 will give you different results.
To analise what's happeing, what I'm missing is indexes expressions.

Even assumed all three tables are indexed on fcustcode+fprodgroup+fproduct+fowner+fsuplrprod I'm still not sure what rushmore will try to do, unless I'd have sample data and indexes.

What can be assumed is, that a join can never use the indexes of both joined tables, you can lookup values in an index, but you can't lookup one index in another, so the sql engine will optimise scanning the smaller table and looking up values in the index of the larger table for a join. If you do joins from one of the tables only, and not chaining joines like "it should be", then you don't allow foxpro to optimise the index lookup for the join just scanning one of the tables.

To inspect what's happening you could put in "AND logrows()" in the query with the following function. Try to change the order of joins to optimise the performance. Just make sure you have no other aliases open before doing the sql to minimize the work of logrows.

Create the logfile empty before doing the sql. the logrows() function will be called for each combination of recnos vfp needs to check and can't optimise by indexes. I would expect a huge difference in the two versions of your sql of course.

As a small example for a fully optimised query:

Local lnCount
Close Tables all

Create Cursor curTest (iID I, cChar C(1))
Index on iID tag xID

For lnCount = 1 to 9
    Insert into curTest values (lnCount, Transform(lnCount))
EndFor

StrToFile("","logrows.log")
Select * from curTest where iID = 5 and logrows()

This gives three rows in th logfile, even though you'd perhaps only expect one row, as the result only has one row. The first log row is the call that is made to determine the result structure. The second log row reflects the result, the third is generated, when the final result is done.

If a table of the sql is fully scanned and not index optimised, logrows() will be called for each record of that table of course, leading to much more log rows. In the end counting the rows of the log file you can see how many combination of rows of your three tables were processed. If you put in a set setp on in that routine you can inspect further what's happening rowwise.

My guess is, your double join would be ideally optimised even when not joinind indirectly, if you put the middle table first in the from clause, then join right and left from there towards the other two tables with JOIN clauses, not with an indirect join via where clause.

Bye, Olaf.
Procedure logrows()
   * log all recnos of each open alias
   Local lnCount, laAliases(1)
 
   For lnCount = 1 TO AUSED(laAliases,1)
      StrToFile(Dbf(laAliases(lnCount,1))+":"+Transform(Recno(laAliases(lnCount,1)))+". ","logrows.log",1) 
   EndFor
   StrToFile(Chr(13)+Chr(10),"logrows.log",1)
   
   Return .T.
EndProc

Open in new window

sys(3054) shows that when doing the Join between INV_MST and Code2 it is using an index from the INV_MST table. when i query just INV_MST and Code2 sys(3054) shows it using the index from Code2

So i am not sure exactly why it chooses a different index to use when 3 tables are present instead of just the 2 tables.

pcelba - I tried the re-written queries using the Inner join and get the exact same results, if i join code2 using IM, 90+ seconds, joining Code2 using IT, less than a second.

the only thing i havent tried yet is moving INV_MST to the start of the query and using the joins off of that, my guess is that will work.

I guess my ultimate question, and the purpose of this thread is, why would i need to take my joins out of the logical order to acomplish this, what is it about foxpro that makes this needed?

I do wish we could go to FP9, however the very large foxpro app is written in FP7, and i can't change that part of this equation, well, not yet anyway.

I am sorry, but the above query is an excellent example of how NOT to program and poor database design. If this is not a onetime query, it would be better if you had a calculated field in, inv_mst and code2 which as their default values added all those fields you need with indexes set on them (calculated fields) as opposed to multiple fields as you have.
The first query is taking long and will take forever as the table grow, as first VFP opens inv_tran with all the fields, As no fields have been specified for inv_mst, again all fields are selected for Inv_mast and fsequence is selected for the match with INV_TRN, (as all fields are open in INV_TRN, your second query is executed in 1 second) , now it has to open code2, again as no fields have been specified all fields are selected, searched, and fields in the Where clause are added together for both inv_mast and code2 then matched.... Kind of convoluted dont you think?
Try the following, it may or may not be faster than 5 seconds
Select INV_TRN.*,;
(INV_MST.fcustcode+INV_MST.fprodgroup+INV_MST.fproduct+INV_MST.fowner+INV_MST.fsuplrprod) As exp2,;
(code2.fcustcode+code2.fprodgroup+code2.fproduct+code2.fowner+code2.fsuplrprod) As exp3 ;
from INV_TRN ,INV_MST ,code2 ;
where (((INV_TRN.fbatch='0822617') And  (INV_TRN.ftrack = INV_MST.fsequence)) And ;
(exp2 = exp3))
Select INV_TRN.*,;
INV_MST.fsequence, ;(INV_MST.fcustcode+INV_MST.fprodgroup+INV_MST.fproduct+INV_MST.fowner+INV_MST.fsuplrprod) As exp2,;
(code2.fcustcode+code2.fprodgroup+code2.fproduct+code2.fowner+code2.fsuplrprod) As exp3 ;
from INV_TRN ,INV_MST ,code2 ;
where (((INV_TRN.fbatch='0822617') And  (INV_TRN.ftrack = INV_MST.fsequence)) And ;
(exp2 = exp3))
ASKER CERTIFIED SOLUTION
Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I agree with imaginecorp about the database design.

You should also be using JOIN to join the tables instead of using the WHERE clause

In case you are running the queries one after the other, the 2nd query will be very fast compared to the first as the data will be in RAM instead of having to get it over the network.



BTW, there is  the FORCE keyword which specifies that tables are joined in the order in which they appear in the FROM clause of SQL SELECT.

I've never used it.
""SQL: Column EXP2 is not found" error. ..."
Yes; this will fail... not thinking
i absolutley agree that the underlying database is designed very inefficently, and over the next few years i will be re-writing it using sql2008 and .NET, however, it is what it is for now and sadly it is all i have to work with
I guess the answer to this question is simply, there really isnt one.
Thanks again, I posted it thinking this would be the ultimate answer to the question.
Well, answering "why would i need to take my joins out of the logical order to acomplish this, what is it about foxpro that makes this needed?"

let's simplyfiy the situation and say there would be perfectly normal IDs for joining three tables, Those ID would be a primary key in one table, eg in INV_MST, and foreign keys in two other tables INV_TRN adn code2. It wouldn't change how vfp would optimise the query.

There's nobody to tell you the inner workings on how the vfp sql engine decides in what order to join and what indexes to use. But setting the FROM order of tables in a way the smallest table is the initial source and larger tables are joined is the natural order regarding joins only.

Your main where clause is (it.FBATCH='0822617'), a caluse limiting the INV_TRN table. Still I'd not join from there to INV_MST, just a feeling. There's much guessing involved without having tables structures, indexes and sample data, so it's merely guessing unless you don't give further info. How about running ATAGINFO() on the three tables to tell us the index expressions?

Have you tried my sample code? Having problems with it?
The point split would be more fair, the discussion showed we have some dark corners in VFP and everything we can do for it is to find some meander around...

Thanks ITContractor, you reminded it.
I'm quite sure the reson for this behavior can be determined, I just need more info. I also provided some means of investigating, but ITContractor didn't respond to it anyway.

Bye, Olaf.
Just to give an example: Rushmore does not use all indexes, eg indexes made with different collation sequences. But I can't tell that is the reason without knowing further information I asked for.

Bye, Olaf.
I agree, Olaf, you have not enough information to determine the reason of unoptimized results. I would say the 2-table queries showed us the optimization is possible, so no collation problems exist.

The logrows procedure is perfect but I would be afraid of its timing and result analyses for the query which takes 90 seconds now. Yes, it could show us something even if the query is not finished but the main reason WHY the query does not use the best index available will remain unknown. This information we have from SYS(3054) already.

And the most important output "How to write queries to be sure they are optimized (the best possible way)" will remain unknown, as well.

Do you think it is worth to generate tables and indexes and try to reproduce this problem? Maybe yes.
Yes, I think so. The log could also show what tables are not optimised. Last not least I think rushmore (SYS(3054)) could "lie". It's not the only thing to look at. I especially would like to experiment with order of tables in the FROM clause, comparing real joins vs. implicit joins via where clause.

Bye, Olaf.
In a join only one index is used, so one table is ordered by an index, the other one is scanned and evaluated expressions are seeked in the index used for joining. Therefore the order of joins matters. Rushmore might fail in doing joins in the optimal order, even if all joins are inner joins and join order doesn't matter theoretical. If one table has 100 rows, the other 100,000 rows, it does matter much, if you use the index on the larger table and lookup the 100 values of the smaller one or vice versa.

Bye, Olaf.
I never had this problem even with close to a million records. I always use single key join when the record count is too high and I have sometimes 7 tables in my join of which 3 are using the same key.
Hi Cyril,

well, it's fact that SYS(3054,11) or SYS(3054,12) reports using ONE index to optimise a join condition. So one side of the join condition is seeked in the index of the other table.

It doesn't matter how high the reccount is though. If there are conditions in the where limiting the valid rows anyway, that does not mean a full table scan in the smaller table, just a scan within the rows being candidates for the result by other (where) conditions. I wasn't expressing this right, you got me there, I was only talking about the rows, so you surely interpreted this as talking about reccounts() of the tables. No, rushmore first sees how it can optimise where clauses, this already can reduce what rows are examined.

Eg if you have a table of 10,000 recipes and 200,000 ingredients and select a recipe the following sql, then the small table (recipes) has a reccount of 10,000, but due to the where clause there's only 1 row in question to join FROM and about 20 or so ingredients will be looked up in the index of the ingredients table optimised by the foreign key index, tag name recipeid. So rushmore will lookup recipes.id in the index recipeid, but will only do so for recipes.id = 123, as the where clause is optimised by the id index of the recipes table, not for all 10,000 recipes.

See what I mean? I fear not.

I've written this join as left and right join and in this simple case both SQLs are optimised by the index of ingredients on the ingredients.recipeid field, because the recipes table can be optimised by the primary key index via the where clause, and so is the smaller table here in regard to partly results. In this case also by it's reccount(), but that's not what matters.

I think if you introduce a third table rushmore will not optimise all joins you could write - for getting the same result - the same way and might not choose the optimum indexes for the joins.
Close Databases All
Close Tables All
 
Cd GETENV("TEMP")
Erase recipes.*
Erase ingredients.*
 
Local lnCount, lnCount2
 
Create Database recipes.dbc
Create Table recipes (Id I, Primary Key Id Tag Id, Name C(20))
Create Table ingredients (Id I, recipeid I, Name C(40), amount B)
Index On recipeid Tag recipeid
 
For lnCount = 1 To 10000
   Insert Into recipes Values (lnCount,"recipe "+Transform(lnCount))
   For lnCount2 = 1 To 20
       INSERT INTO ingredients VALUES ((lnCount-1)*20+lnCount2,lnCount,"ingredient "+Transform(lnCount2), RAND()*100)
   Endfor
ENDFOR
 
Clear
Sys(3054,12)
Select * From recipes Left Join ingredients On ingredients.recipeid = recipes.Id  Where recipes.Id = 123 Into Cursor curResult1 nofilter
Select * From ingredients Right Join recipes On recipes.Id = ingredients.recipeid Where recipes.Id = 123 Into Cursor curResult2 nofilter

Open in new window