Solved

Benford's Law in SQL Server 2005/2008

Posted on 2009-05-06
25
1,754 Views
Last Modified: 2016-10-02
Hi,
I tried to create an SQL statement that calculates the frequency of appearance of the first digit of a population (column, like amount) and to compare to Benford's law (the expected number of occurences).

I came across this algorith in a different question:
Public Function BenfordsLaw(D As Integer) As Double
    BenfordsLaw = Log(1 + (1 / D)) / Log(10)
End Function

Do you have an idea whether this is an SQL statement, if not is there an sql statement to compare the occurence of the first digit of a column to the expected occurence according to Benford's Law. I also need to calculate the Z-Scores (Z-stat ratios).
0
Comment
Question by:TechNovation
  • 12
  • 8
  • 2
  • +3
25 Comments
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313703
use this.

declare @d int
 set @d=10
 select Log(1 + (1 / @d)) / Log(10)
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313719
the function you saw belong to VB but you could do the same task in SQL Server as I saw you above
0
 

Author Comment

by:TechNovation
ID: 24313787
It might be a bit stupid question, but how do I link the code above to a particular column in a table in a database?
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313796
or you can use float as well.
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313798
sorry forgot to paste the script.

declare @d float
 set @d=0.21
 select Log(1 + (1 / @d)) / Log(10)
 
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313808
>>It might be a bit stupid question, but how do I link the code above to a particular column in a table in a database?<<

that's is upto you, while inserting the data, you can calculate and insert it in your table, or can make calculation on fly while using SELECT statement.

0
 

Author Comment

by:TechNovation
ID: 24313849
What about telling the code which column it should look at in which table and in which database? do I need to add something else to the code?
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313866
if you can show me table structure and tell me which value should be used for calculation, I will make one SELECT statement for you as demonstration.
0
 

Author Comment

by:TechNovation
ID: 24313876
Sorry didn't refresh and posted the question again after your answer. But how would the select stataement look like? say I have two columns "Reference" and " Amount", in a table called "BLTABLE" in a database called BLDATABASE. How do I apply the code?
0
 

Author Comment

by:TechNovation
ID: 24313886
THe value for calculation is column "Amount" (followed from above)
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313912
I want to see your TABLE STRUCTURE.
0
 

Author Comment

by:TechNovation
ID: 24313921
Structure would be for example:
Reference Amount
A1             100
A2             333
C4             456
34             785
45             453
...              ...

with in total more than 100 million records
0
Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313946
well in that case you can do something like.

 select  Reference, Amount, Log(1 + (1 / amount)) / Log(10) as BenFordValue from yourTableName
where reference='A1'

this is just an example. Do you want to see value in SELECT only or want to store it in database permenently by adding one more column to your table?
0
 

Author Comment

by:TechNovation
ID: 24314018
Actually the final result of the query should look like this:
Digit       Actual Ocurrence (in %)      Expected Occurence (BL)       Z-Scores
1                   20                                                30                                     ...
2                   10                                                 20                                    ...
3                     5                                                 10                                    ...
4                   40                                                  5                                    ...
5                     1                                                  5                                    ...
6                     1                                                   5                                    ...
7                     3                                                   5                                    ...
8                   10                                                   5                                    ...
9                   10                                                   5                                    ...
END OF TABLE

BL occurence is made up I didn't have time to populate here the right figures. So the final result should be one newly created table with exactly 4 columns and 9 rows as above.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 24314027
maybe create a function for it

create function dbo.udf_Benfords
(
@amount money
)
returns float
as
begin
declare @x float
set @x = Log(1 + (1 / @amount)) / Log(10)
return(@x)
end
0
 

Author Comment

by:TechNovation
ID: 24314078
By the way isn't the code trying to calculate the expected values (formula based)? What I need is how to collect the actual occurence, let the SQL statement go through the column and grab the first digit, group it and count the percentage of actual occurence. And then calculate the Z-scores. So it is more the actual data that I need.
0
 
LVL 26

Accepted Solution

by:
Chris Luttrell earned 400 total points
ID: 24320322
OK, this put me back in Stats Class but this might work.  It gives results like the attached image.  I played with an existing table I had and was not sure you you would handle negative and < 1.0 amounts so you may have to adjust it some. Hope this is on the right track.

CREATE FUNCTION dbo.udfBenfordsLaw (@Digit INT)

RETURNS FLOAT

as

BEGIN

	RETURN (CASE @Digit WHEN 0 THEN 0.0 ELSE Log(1.0 + (1.0 / @Digit)) / Log(10) END)

END

GO
 

SELECT *, dbo.udfBenfordsLaw(FirstDigit) BenfordLaw, (ActualOcurrance-[AVG])/Std Z

FROM (

SELECT SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1) FirstDigit, COUNT(*) ActualOcurrance, SUM(COUNT(*)) OVER () Total, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () PCT, AVG(COUNT(*)) OVER () AVG, STDEV(COUNT(*)) OVER () Std

FROM LifecycleBudget LB

WHERE ABS(LB.Amount) > 1

GROUP BY SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1)

) x

ORDER BY FirstDigit

Open in new window

Zscores.png
0
 

Author Comment

by:TechNovation
ID: 24323394
I will try the code above of CGLuttrell but from the picture, it already looks very good, I will check
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 100 total points
ID: 24324090
Just one small enhancement to CGLuttrell 's solution...

Mainly because of the use of the scaler udf is not really needed and with 100 million rows, will take it's toll... See :  http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx from a performance perspective.


SELECT *, CASE FirstDigit WHEN 0 THEN 0.0 ELSE Log(1.0 + (1.0 / FirstDigit)) / Log(10) END BenfordLaw, (ActualOcurrance-[AVG])/Std Z
FROM (
SELECT SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1) FirstDigit, COUNT(*) ActualOcurrance, SUM(COUNT(*)) OVER () Total, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () PCT, AVG(COUNT(*)) OVER () AVG, STDEV(COUNT(*)) OVER () Std
FROM LifecycleBudget LB
WHERE ABS(LB.Amount) > 1
GROUP BY SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1)
) x
ORDER BY FirstDigit


CGLuttrell please drop me a line, email is on my bio...
0
 

Author Comment

by:TechNovation
ID: 24330137
Excellent. Many thanks to CGLuttrell, and Mark Wills as well. (increased point to 500 as it is a very concise code and valued added.
0
 

Author Closing Comment

by:TechNovation
ID: 31578418
Excellent
0
 

Author Comment

by:TechNovation
ID: 24330180
I don't know how the garde 8.6 was calculated, but this for me a 10 out of 10.
0
 

Author Comment

by:TechNovation
ID: 24330203
Both CGluttrell and Mark Wills you may want to help further with my question about Pareto Analysis in SQL Server:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_24389862.html
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333882
Hi technovation, thanks for the assist, unexpected, and most welcomed...

The grade is a scientific calculation based on the angle of the sun and the phase of the moon... It has to do with elapsed time and number of entries, not always a good indicator as you have pointed out.

Just had a quick look at the other question as well...

Cheers,
Mark Wills
0
 

Expert Comment

by:Darren Smerald
ID: 41825559
Gave me the absolute horn I had to take action on!!
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Naughty Me. While I was changing the database name from DB1 to DB_PROD1 (yep it's not real database name ^v^), I changed the database name and notified my application fellows that I did it. They turn on the application, and everything is working. A …
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now