  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 473
  • Last Modified:

How do I speed up this function?

Hi, I wrote this program, which works okay on smaller tables, but the runtime grows far faster than the table size.

For instance, a table of 150,000 records takes about 6 minutes, a table of 550,000 records takes about 1 hour and a table of 3.5 million records takes over 1 day. Obviously, I am not doing something right here.

This is a table of price information. I am trying to populate the buybreak and sellbreak fields, which hold the number of days since a previous high or low price was last reached. As an example, a file with the correct results is attached.
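To make the definition concrete: for each row, buybreak is the number of rows back to the most recent row with a strictly higher abshigh, and sellbreak is the same for a strictly lower abslow (if no such row exists, the count runs past the start of the table). Here is a minimal reference sketch of that reading in Python — an illustration only, not part of the FoxPro program, and the function name is mine:

```python
def breaks_naive(abshigh, abslow):
    # Bars are oldest-first. For bar i, walk backwards until a strictly
    # higher high (buybreak) or strictly lower low (sellbreak) is found;
    # the answer is the number of steps taken. If no such bar exists,
    # the walk runs past bar 0, giving i + 1.
    n = len(abshigh)
    buybreak, sellbreak = [0] * n, [0] * n
    for i in range(n):
        j = i - 1
        while j >= 0 and abshigh[j] <= abshigh[i]:
            j -= 1
        buybreak[i] = i - j
        j = i - 1
        while j >= 0 and abslow[j] >= abslow[i]:
            j -= 1
        sellbreak[i] = i - j
    return buybreak, sellbreak

# Highs [5, 3, 4, 6] and lows [2, 4, 3, 1] give
# buybreak [1, 1, 2, 4] and sellbreak [1, 1, 2, 4].
```

This per-row backward walk is exactly why the runtime grows roughly quadratically with the number of rows per contract.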

NOTE: the file in the ZIP is a DBF file, even though it has a PNG extension. FoxPro will read it if you simply include the extension, for example:

USE file1.png IN 0


ALTER TABLE file1 ADD COLUMN buybreak F(16,8)
ALTER TABLE file1 ADD COLUMN sellbreak F(16,8)
SELECT DISTINCT contract FROM file1 INTO CURSOR contracts
SCAN
    m.tempcontract = contract
    SELECT date, contract, abshigh, abslow, RECNO() AS rec ;
        FROM file1 WHERE contract = m.tempcontract ;
        ORDER BY rec DESC INTO CURSOR breakout READWRITE
    ALTER TABLE breakout ADD COLUMN buybreak F(6,0)
    ALTER TABLE breakout ADD COLUMN sellbreak F(6,0)
    SELECT breakout
    SCAN
        m.temprec = rec
        m.temprecno1 = RECNO()
        m.abshigh = abshigh
        m.abslow = abslow
        LOCATE REST FOR abshigh > m.abshigh
        m.temprecno2 = RECNO() - m.temprecno1
        SELECT breakout
        REPLACE ALL buybreak WITH m.temprecno2 FOR rec = m.temprec
        GO m.temprecno1
        LOCATE REST FOR abslow < m.abslow
        m.temprecno3 = RECNO() - m.temprecno1
        REPLACE ALL sellbreak WITH m.temprecno3 FOR rec = m.temprec
        GO m.temprecno1
    ENDSCAN
    UPDATE file1 SET file1.buybreak = breakout.buybreak, file1.sellbreak = breakout.sellbreak ;
        FROM breakout ;
        WHERE ALLTRIM(file1.contract) = ALLTRIM(breakout.contract) AND file1.date = breakout.date
    USE IN breakout
ENDSCAN


file1.zip
jglapskiAsked:
1 Solution
 
wrmichaelCommented:
Have you reviewed the indexes to see whether adding new ones could speed it up?
 
pcelbaCommented:
Horrible program - it will take some time...
 
CaptainCyrilFounder, Software Engineer, Data ScientistCommented:
The program is very hard to understand. It's not even written in a structured form.

However, you can improve performance by creating the right indexes.
 
pcelbaCommented:
Indexes could optimize the execution, but since the comparisons are < or >, the optimization would probably be only partial. It also depends on the number of records processed for each contract.

I would start with this update:

SET NOTIFY OFF
SET TALK OFF

lnT1 = SECONDS()

IF TYPE('file1.buybreak') = 'U'
  ALTER TABLE file1 ADD COLUMN buybreak F(16,8)
ENDIF
IF TYPE('file1.sellbreak') = 'U'
  ALTER TABLE file1 ADD COLUMN sellbreak F(16,8)
ENDIF

SELECT DISTINCT contract ;
FROM file1 ;
INTO CURSOR contracts

LOCAL tempcontract, temprec, temprecno1, abshigh, abslow, temprecno2, temprecno3

SCAN
  
  tempcontract = contract
  
  SELECT DATE, contract, abshigh, abslow, RECNO() AS rec ;
    FROM file1 ;
   WHERE contract = m.tempcontract ;
   ORDER BY rec DESC ;
    INTO CURSOR breakout READWRITE
  
  SELECT breakout
  SCAN
    
    temprec = rec
    temprecno1 = RECNO()
    abshigh = abshigh
    abslow = abslow
    STORE -1 TO temprecno2, temprecno3
    
    LOCATE REST FOR abshigh > m.abshigh OR abslow < m.abslow
    IF FOUND()
      
      IF abshigh > m.abshigh
        temprecno2 = RECNO() - m.temprecno1
      ENDIF
      IF abslow < m.abslow
        temprecno3 = RECNO() - m.temprecno1
      ENDIF
      
      DO CASE
      CASE m.temprecno3 = -1
        LOCATE REST FOR abslow < m.abslow
        temprecno3 = RECNO() - m.temprecno1
      CASE m.temprecno2 = -1
        LOCATE REST FOR abshigh > m.abshigh
        temprecno2 = RECNO() - m.temprecno1
      ENDCASE
      
    ELSE
      temprecno2 = RECNO() - m.temprecno1
      temprecno3 = m.temprecno2
    ENDIF
    
    GO m.temprecno1
    
    SELECT file1
    GO m.temprec
    REPLACE buybreak WITH m.temprecno2, sellbreak WITH m.temprecno3
    
  ENDSCAN
  
  USE IN breakout
  
ENDSCAN

? ALLTRIM(STR(RECCOUNT('file1'))) + " records processed in " + ALLTRIM(STR(SECONDS()-lnT1, 10, 2)) + " seconds"


 
jglapskiAuthor Commented:
Thanks for the program. It sped things up, but it still takes 40 minutes to run through 500K records.

I tried the indexing, but that didn't seem to help much.
 
tusharkanvindeCommented:
Why do you need the temporary cursor? Create an index on contract + STR(RECNO()) in the original table and run the loop there.

Something like

USE file1
INDEX ON contract + STR(RECNO(), 10) TAG contractrec
DO WHILE NOT EOF()
    m.tempcontract = contract
    SCAN WHILE m.tempcontract = contract
        m.temprecno1 = RECNO()
        m.abshigh = abshigh
        m.abslow = abslow
        SET ORDER TO contractrec DESCENDING
        SKIP
        COUNT REST WHILE abshigh <= m.abshigh AND m.tempcontract = contract TO lnSkippedRecs
        GO m.temprecno1
        REPLACE buybreak WITH m.lnSkippedRecs + 1
        SKIP
        COUNT REST WHILE abslow >= m.abslow AND m.tempcontract = contract TO lnSkippedRecs
        GO m.temprecno1
        REPLACE sellbreak WITH m.lnSkippedRecs + 1
        SET ORDER TO contractrec ASCENDING
    ENDSCAN
ENDDO


The above code is totally untested.


If you want to add columns to a SQL cursor, add them in the SELECT statement, like:

SELECT date, contract, abshigh, abslow, RECNO() AS rec, CAST(0 AS F(6,0)) AS buybreak FROM file1 WHERE contract = m.tempcontract ORDER BY rec DESC INTO CURSOR breakout READWRITE

Also, your cursor operations would be faster if you indexed it on rec, abshigh, and abslow.
 
pcelbaCommented:
Post sample data (500000 records).
 
pcelbaCommented:
To reduce the data size, you may post just the necessary columns.
 
jglapskiAuthor Commented:
EXCELLENT. This provided a huge time saving.
 
jglapskiAuthor Commented:
Thanks so much - what once took hours is now taking seconds.
 
pcelbaCommented:
Just a question: How long does it take to process 500,000 records now, and what is your latest code that achieves a time in seconds?

Adding appropriate indexes to my solution can speed it up about 10 times, bringing the total time below 2 minutes (on a notebook), but that still does not mean seconds.

The solution from tusharkanvinde is very simple and straightforward, but I would guess it has the same problems as my solution without indexes... and indexes cannot help there. The time measured for 500,000 records was about 35 min.

This is purely professional interest.
TIA for the answer.
 
CaptainCyrilFounder, Software Engineer, Data ScientistCommented:
Once I was helping someone with hotel management software that handles hundreds of thousands of records. His query used to take anywhere between 8 and 10 minutes. He did not use the indexes properly. All I did was build the index properly, and the query took less than 1 second. All it needed was one constructed key:
ClientCode + DTOS(Date).
 
pcelbaCommented:
Yes, creating an appropriate index should always be the first step... It's just me who tries to optimize the code first and add indexes afterwards :-)
 
CaptainCyrilFounder, Software Engineer, Data ScientistCommented:
The code should have been written optimally in the first place ;-)

However, most newcomers to Fox don't really use the speed and power of its indexes.
 
pcelbaCommented:
One last clarification:

The artificial data I created yesterday happened to be an extreme sample, so tusharkan's algorithm was not efficient at all (see answer ID:26275387). The reason was minute-level data with similar values and a small number of distinct contracts.

When testing on real daily data of similar size but with a large number of short contracts, tusharkan's code is incredibly fast, and finishing the task in seconds is realistic.

The "cursor solution" (with indexes) is better for large data with a small number of contracts.
 
tusharkanvindeCommented:
If there are a lot of contracts, there will be the overhead of a lot of SQL SELECTs and the creation of a lot of cursors. That is what my code avoids. For a small number of contracts, I had actually thought my code would perform similarly to the other solutions. I guess the OP's data has a lot of contracts.
 
pcelbaCommented:
The problem with contracts that have many records is that the time consumption is not linear. In the worst case, your code must check (N-1) + (N-2) + (N-3) + ... records for each contract, and indexes cannot help much. Locating a record in an indexed cursor should be much faster, but I did not expect such a large cursor- and index-creation overhead...
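For what it's worth, that (N-1) + (N-2) + ... worst case is avoidable altogether: "rows since the last strictly higher high" is the classic stock-span problem, computable in O(N) per contract with a monotonic stack. None of the answers in this thread use this technique; the sketch below is Python purely to illustrate it (the function name and sample data are mine):

```python
def breaks_fast(abshigh, abslow):
    # O(N) stock-span computation of buybreak/sellbreak for one contract
    # (bars oldest-first). Each stack holds indices forming a strictly
    # decreasing run of highs / strictly increasing run of lows; every
    # index is pushed and popped at most once.
    n = len(abshigh)
    buybreak, sellbreak = [0] * n, [0] * n
    hi_stack, lo_stack = [], []   # indices of candidate previous highs/lows
    for i in range(n):
        # Discard bars whose high is not strictly above the current high.
        while hi_stack and abshigh[hi_stack[-1]] <= abshigh[i]:
            hi_stack.pop()
        buybreak[i] = i - hi_stack[-1] if hi_stack else i + 1
        hi_stack.append(i)
        # Discard bars whose low is not strictly below the current low.
        while lo_stack and abslow[lo_stack[-1]] >= abslow[i]:
            lo_stack.pop()
        sellbreak[i] = i - lo_stack[-1] if lo_stack else i + 1
        lo_stack.append(i)
    return buybreak, sellbreak

# Highs [5, 3, 4, 6] and lows [2, 4, 3, 1] give
# buybreak [1, 1, 2, 4] and sellbreak [1, 1, 2, 4].
```

Because each index is pushed and popped at most once, a single pass replaces the repeated backward scans regardless of how the prices are distributed, so the runtime stays linear even for contracts with many records.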
