Solved

? MS ACCESS 2010 sample to calculate median quickly?

Posted on 2014-02-15
6
419 Views
Last Modified: 2014-02-20
I have a data base with several payments for each participant.
Each payment has participant number and date-of-payment.

I'd like to easily and QUICKLY calculate the Median number of days between payments for each participant.

I found a method using a sub-query, but, it ran for over 12 hours on only about 30,000 rows.
I'd like to be able to run it on a half-million rows in less than an hour.

help?
0
Comment
Question by:philkryder
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
6 Comments
 
LVL 48

Expert Comment

by:Dale Fye (Access MVP)
ID: 39862121
What did your query look like?

To start with, you have to get the payment date and the previous payment date,  I generally do that something like:

SELECT P1.ParticipantID, P1.PaymentDate, Max(P2.PaymentDate) as PrevPayment
FROM yourTable as P1
LEFT JOIN yourTable as P2
ON P1.ParticipantID = P2.ParticipantID
AND P1.PaymentDate >P2.PaymentDate
GROUP By P1.ParticipantID, P1.PaymentDate

This should run relatively quickly, but is not editable in the query grid, only in the SQL view.  Once you have that working, you can either save it, or make it a subquery to get the difference in dates between the PaymentDate and PrevPayment.  You might want to make that a temp table or even make the above into a temp table, to speed up succeeding logic.

What logic are you using to compute the median for each participant, once you know the number of days between payments?
0
 
LVL 48

Expert Comment

by:Dale Fye (Access MVP)
ID: 39862123
BTW, you might want to check out this article by MatthewsPatrick that discusses Median, Mode, Skewness, and Kurtosis
0
 
LVL 92

Accepted Solution

by:
Patrick Matthews earned 500 total points
ID: 39863383
Have you had a chance to review the approach described in my article above?  (Thanks for referring it, Dale!)
0
Is Your Team Achieving Their Full Potential?

74% of employees feel they are not achieving their full potential. With Linux Academy, not only will you strengthen your team's core competencies but also their knowledge of of the newest IT topics.

With new material every week, we'll make sure that you stay ahead of the game.

 
LVL 1

Author Comment

by:philkryder
ID: 39863470
yes - I used the DMEDIAN on an INTERVAL value that I obtained using the code Dale showed for finding the interval between the two dates.

After finding the INTERVAL it would find the DMEDIAN.

I saved intermediate tables after each step.

I was running this on an I3 intel and it pegged 100% of one of the CPU engines for about 12 hours to do my 32k rows.


My typical grouping on Participant will have from 7 to 20 intervals.


Have you run this on similar data quantities?

I'll post the code when I get back to work after the holiday tomorrow.
Phil
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 39863542
A sample file would be very helpful
0
 
LVL 1

Author Comment

by:philkryder
ID: 39868810
I'd like to close this.
On closer examination, I found the major delay was in calculating the interval between two timestamps.

I subsequently calculated the interval by sorting in order by the key, and the timestamp, then assigning a sequential ID, then doing a self-join on the next row. This seemed much faster than the sub-query to find the interval.

Matthew -
regarding the median -
would you consider adding a couple of global variables such that if the CRITERIA was the same between calls to DMEDIAN, then the DMEDIAN returned would be the same as on the prior call.
Not as elegant,
But, It seems it would avoid creating and running the subquery much of the time  ...
something like:
' =================
Global DMedianSave As Long
Global CriteriaSave As String
...

    If Criteria <> "" Then
   
        If CriteriaSave = Criteria Then
            DMedian = DMedianSave
            GoTo Cleanup
        End If
    End If
    CriteriaSave = Criteria
...
 '        calculate DMEDIAN as before
...
   End If        
    DMedianSave = DMedian
    GoTo Cleanup
0

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It’s been over a month into 2017, and there is already a sophisticated Gmail phishing email making it rounds. New techniques and tactics, have given hackers a way to authentically impersonate your contacts.How it Works The attack works by targeti…
Preparing an email is something we should all take special care with – especially when the email is for somebody you may not know very well. The pressures of everyday working life stacked with a hectic office environment can make this a real challen…
Windows 8 came with a dramatically different user interface known as Metro. Notably missing from that interface was a Start button and Start Menu. Microsoft responded to negative user feedback of the Metro interface, bringing back the Start button a…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question