[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

Benford's Law in SQL Server 2005/2008

Posted on 2009-05-06
25
Medium Priority
?
2,023 Views
Last Modified: 2016-10-02
Hi,
I tried to create an SQL statement that calculates the frequency of appearance of the first digit of a population (column, like amount) and to compare to Benford's law (the expected number of occurences).

I came across this algorith in a different question:
Public Function BenfordsLaw(D As Integer) As Double
    BenfordsLaw = Log(1 + (1 / D)) / Log(10)
End Function

Do you have an idea whether this is an SQL statement, if not is there an sql statement to compare the occurence of the first digit of a column to the expected occurence according to Benford's Law. I also need to calculate the Z-Scores (Z-stat ratios).
0
Comment
Question by:TechNovation
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 12
  • 8
  • 2
  • +3
25 Comments
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313703
use this.

declare @d int
 set @d=10
 select Log(1 + (1 / @d)) / Log(10)
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313719
the function you saw belong to VB but you could do the same task in SQL Server as I saw you above
0
 

Author Comment

by:TechNovation
ID: 24313787
It might be a bit stupid question, but how do I link the code above to a particular column in a table in a database?
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313796
or you can use float as well.
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313798
sorry forgot to paste the script.

declare @d float
 set @d=0.21
 select Log(1 + (1 / @d)) / Log(10)
 
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313808
>>It might be a bit stupid question, but how do I link the code above to a particular column in a table in a database?<<

that's is upto you, while inserting the data, you can calculate and insert it in your table, or can make calculation on fly while using SELECT statement.

0
 

Author Comment

by:TechNovation
ID: 24313849
What about telling the code which column it should look at in which table and in which database? do I need to add something else to the code?
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313866
if you can show me table structure and tell me which value should be used for calculation, I will make one SELECT statement for you as demonstration.
0
 

Author Comment

by:TechNovation
ID: 24313876
Sorry didn't refresh and posted the question again after your answer. But how would the select stataement look like? say I have two columns "Reference" and " Amount", in a table called "BLTABLE" in a database called BLDATABASE. How do I apply the code?
0
 

Author Comment

by:TechNovation
ID: 24313886
THe value for calculation is column "Amount" (followed from above)
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313912
I want to see your TABLE STRUCTURE.
0
 

Author Comment

by:TechNovation
ID: 24313921
Structure would be for example:
Reference Amount
A1             100
A2             333
C4             456
34             785
45             453
...              ...

with in total more than 100 million records
0
 
LVL 31

Expert Comment

by:RiteshShah
ID: 24313946
well in that case you can do something like.

 select  Reference, Amount, Log(1 + (1 / amount)) / Log(10) as BenFordValue from yourTableName
where reference='A1'

this is just an example. Do you want to see value in SELECT only or want to store it in database permenently by adding one more column to your table?
0
 

Author Comment

by:TechNovation
ID: 24314018
Actually the final result of the query should look like this:
Digit       Actual Ocurrence (in %)      Expected Occurence (BL)       Z-Scores
1                   20                                                30                                     ...
2                   10                                                 20                                    ...
3                     5                                                 10                                    ...
4                   40                                                  5                                    ...
5                     1                                                  5                                    ...
6                     1                                                   5                                    ...
7                     3                                                   5                                    ...
8                   10                                                   5                                    ...
9                   10                                                   5                                    ...
END OF TABLE

BL occurence is made up I didn't have time to populate here the right figures. So the final result should be one newly created table with exactly 4 columns and 9 rows as above.
0
 
LVL 60

Expert Comment

by:chapmandew
ID: 24314027
maybe create a function for it

create function dbo.udf_Benfords
(
@amount money
)
returns float
as
begin
declare @x float
set @x = Log(1 + (1 / @amount)) / Log(10)
return(@x)
end
0
 

Author Comment

by:TechNovation
ID: 24314078
By the way isn't the code trying to calculate the expected values (formula based)? What I need is how to collect the actual occurence, let the SQL statement go through the column and grab the first digit, group it and count the percentage of actual occurence. And then calculate the Z-scores. So it is more the actual data that I need.
0
 
LVL 27

Accepted Solution

by:
Chris Luttrell earned 1600 total points
ID: 24320322
OK, this put me back in Stats Class but this might work.  It gives results like the attached image.  I played with an existing table I had and was not sure you you would handle negative and < 1.0 amounts so you may have to adjust it some. Hope this is on the right track.

CREATE FUNCTION dbo.udfBenfordsLaw (@Digit INT)
RETURNS FLOAT
as
BEGIN
	RETURN (CASE @Digit WHEN 0 THEN 0.0 ELSE Log(1.0 + (1.0 / @Digit)) / Log(10) END)
END
GO
 
SELECT *, dbo.udfBenfordsLaw(FirstDigit) BenfordLaw, (ActualOcurrance-[AVG])/Std Z
FROM (
SELECT SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1) FirstDigit, COUNT(*) ActualOcurrance, SUM(COUNT(*)) OVER () Total, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () PCT, AVG(COUNT(*)) OVER () AVG, STDEV(COUNT(*)) OVER () Std
FROM LifecycleBudget LB
WHERE ABS(LB.Amount) > 1
GROUP BY SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1)
) x
ORDER BY FirstDigit

Open in new window

Zscores.png
0
 

Author Comment

by:TechNovation
ID: 24323394
I will try the code above of CGLuttrell but from the picture, it already looks very good, I will check
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 400 total points
ID: 24324090
Just one small enhancement to CGLuttrell 's solution...

Mainly because of the use of the scaler udf is not really needed and with 100 million rows, will take it's toll... See :  http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx from a performance perspective.


SELECT *, CASE FirstDigit WHEN 0 THEN 0.0 ELSE Log(1.0 + (1.0 / FirstDigit)) / Log(10) END BenfordLaw, (ActualOcurrance-[AVG])/Std Z
FROM (
SELECT SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1) FirstDigit, COUNT(*) ActualOcurrance, SUM(COUNT(*)) OVER () Total, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () PCT, AVG(COUNT(*)) OVER () AVG, STDEV(COUNT(*)) OVER () Std
FROM LifecycleBudget LB
WHERE ABS(LB.Amount) > 1
GROUP BY SUBSTRING(CONVERT(VARCHAR,ABS(Amount)),1,1)
) x
ORDER BY FirstDigit


CGLuttrell please drop me a line, email is on my bio...
0
 

Author Comment

by:TechNovation
ID: 24330137
Excellent. Many thanks to CGLuttrell, and Mark Wills as well. (increased point to 500 as it is a very concise code and valued added.
0
 

Author Closing Comment

by:TechNovation
ID: 31578418
Excellent
0
 

Author Comment

by:TechNovation
ID: 24330180
I don't know how the garde 8.6 was calculated, but this for me a 10 out of 10.
0
 

Author Comment

by:TechNovation
ID: 24330203
Both CGluttrell and Mark Wills you may want to help further with my question about Pareto Analysis in SQL Server:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/Q_24389862.html
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333882
Hi technovation, thanks for the assist, unexpected, and most welcomed...

The grade is a scientific calculation based on the angle of the sun and the phase of the moon... It has to do with elapsed time and number of entries, not always a good indicator as you have pointed out.

Just had a quick look at the other question as well...

Cheers,
Mark Wills
0
 

Expert Comment

by:Darren Smerald
ID: 41825559
Gave me the absolute horn I had to take action on!!
0

Featured Post

Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question