happy 1001
asked on
SQL Server - Query for Outlier
Hi Experts
I want to know, what are the various ways available in SQL Server by which I can find out the "Normal Value RANGE" and "outliers" from a data table.
Here is the script that will create the Sample Database named " TestDB1001 " and Table named " Table1001" used in this work.
Suppose I want to find out outliers for each separate stock symbol, from the above table, based on [Volume Percent ] column.
Suppose normally the [Volume Percent ] for a stock has values of +20 % to -20 %, and then on one day, suddenly it has a value of 280 %, then that is certainly an outlier value for this column for this stock.
Please suggest, which methods could be used to -
1.First define the "Normal Value RANGE" for each stock, over past x days.
2. Find out the outliers, which have values OUTSIDE of the "Normal Value RANGE" that we defined above.
Any ideas regarding how such an analysis could be done are welcome. I am not yet sure, which are the various options available for such work and whether SQL Server is capable of doing such work or if I will have to use Excel for doing this.
Thanks a lot for any help
PS: I guess we will need to use some concepts like -
https://en.wikipedia.org/wiki/Standard_deviation
https://www.mathsisfun.com/data/standard-normal-distribution.html
etc. for doing this work, but I am not sure, how exactly it could be done.
We might need to add a few columns which will show the 1,2 and 3 Standard Deviation Value / Range, and anything outside that range might qualify as an outlier.
If it is not possible to be done in SQL Server, then I am open to doing it in Excel as well.
I want to know, what are the various ways available in SQL Server by which I can find out the "Normal Value RANGE" and "outliers" from a data table.
Here is the script that will create the Sample Database named " TestDB1001 " and Table named " Table1001" used in this work.
CREATE DATABASE TestDB1001
GO
USE TestDB1001
GO
CREATE TABLE TestDB1001.dbo.Table1001 (
xdate DATETIME2(0) NULL,
sector NVARCHAR(255) NULL,
symbol NVARCHAR(255) NULL,
[Close Price ] FLOAT NULL,
Volume FLOAT NULL,
[Volume Percent ] FLOAT NULL,
OpenInterest FLOAT NULL,
[OpenInterest Percent] FLOAT NULL
) ON [PRIMARY]
GO
SET DATEFORMAT ymd
SET ARITHABORT, ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, NOCOUNT ON
SET NUMERIC_ROUNDABORT, IMPLICIT_TRANSACTIONS, XACT_ABORT OFF
GO
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-01 00:00:00.0000000', N'Realty', N'DLF', 171.45, 236.57, 16.5, 657.59007, 6.88)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-01 00:00:00.0000000', N'Realty', N'HDIL', 55.6, 89.4, 61.43, 152.6776, 17.62)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-01 00:00:00.0000000', N'Realty', N'UNITECH', 15.9, 120.13, 45, 282.43848, 6.4)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-02 00:00:00.0000000', N'Realty', N'DLF', 166.45, 420.65, 77.81, 658.68621999999993, 0.17)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-02 00:00:00.0000000', N'Realty', N'HDIL', 51.85, 124.27, 39, 147.08239999999998, -3.67)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-02 00:00:00.0000000', N'Realty', N'UNITECH', 15.35, 119.73, -0.33, 273.67355999999995, -3.1)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-03 00:00:00.0000000', N'Realty', N'DLF', 169, 369.57, -12.14, 707.74436999999989, 7.45)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-03 00:00:00.0000000', N'Realty', N'HDIL', 53.5, 87.35, -29.71, 158.88595999999998, 8.03)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-03 00:00:00.0000000', N'Realty', N'UNITECH', 15.35, 57.75, -51.77, 279.11279999999994, 1.99)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-06 00:00:00.0000000', N'Realty', N'DLF', 167.25, 217.96, -41.02, 722.15043999999989, 2.04)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-06 00:00:00.0000000', N'Realty', N'HDIL', 52.95, 50.37, -42.34, 157.18616000000006, -1.07)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-06 00:00:00.0000000', N'Realty', N'UNITECH', 15.3, 38.19, -33.87, 286.6113, 2.69)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-07 00:00:00.0000000', N'Realty', N'DLF', 164.6, 265.73, 21.92, 710.63243, -1.5899999999999999)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-07 00:00:00.0000000', N'Realty', N'HDIL', 52.15, 108.35, 115.11, 167.99551999999997, 6.87)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-07 00:00:00.0000000', N'Realty', N'UNITECH', 15, 70.69, 85.1, 290.49611999999991, 1.3599999999999999)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-08 00:00:00.0000000', N'Realty', N'DLF', 162.9, 258.36, -2.77, 697.20230999999978, -1.8900000000000001)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-08 00:00:00.0000000', N'Realty', N'HDIL', 53.3, 70.91, -34.55, 174.64184, 3.95)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-08 00:00:00.0000000', N'Realty', N'UNITECH', 14.95, 55.43, -21.59, 291.71525999999994, 0.42)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-09 00:00:00.0000000', N'Realty', N'DLF', 159.5, 186.23, -27.92, 699.0476, 0.27)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-09 00:00:00.0000000', N'Realty', N'HDIL', 50.2, 103.81, 46.4, 182.34456, 4.41)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-09 00:00:00.0000000', N'Realty', N'UNITECH', 14.7, 50.19, -9.45, 294.89627999999988, 1.09)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-10 00:00:00.0000000', N'Realty', N'DLF', 156.85, 200.27, 7.54, 699.8979999999998, 0.12)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-10 00:00:00.0000000', N'Realty', N'HDIL', 49.55, 91.61, -11.75, 180.54768, -0.98)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-10 00:00:00.0000000', N'Realty', N'UNITECH', 14.25, 56.77, 13.11, 296.6907599999999, 0.61)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-13 00:00:00.0000000', N'Realty', N'DLF', 161, 197.51, -1.38, 703.72603999999978, 0.55)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-13 00:00:00.0000000', N'Realty', N'HDIL', 49.75, 72.79, -20.54, 183.90999999999997, 1.8599999999999999)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-13 00:00:00.0000000', N'Realty', N'UNITECH', 13.8, 61.04, 7.52, 300.79445999999996, 1.38)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-14 00:00:00.0000000', N'Realty', N'DLF', 157.8, 188.98, -4.32, 728.44625999999982, 3.51)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-14 00:00:00.0000000', N'Realty', N'HDIL', 49.6, 79.6, 9.36, 203.92476000000008, 10.88)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-14 00:00:00.0000000', N'Realty', N'UNITECH', 13.5, 51.78, -15.17, 305.99279999999982, 1.73)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-15 00:00:00.0000000', N'Realty', N'DLF', 160.5, 197.29, 4.4, 728.82591000000036, 0.05)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-15 00:00:00.0000000', N'Realty', N'HDIL', 51.3, 73.09, -8.18, 208.26824, 2.13)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-15 00:00:00.0000000', N'Realty', N'UNITECH', 14.2, 125.87, 143.09, 317.5926, 3.79)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-16 00:00:00.0000000', N'Realty', N'DLF', 162.4, 208.88, 5.87, 735.66176, 0.94)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-16 00:00:00.0000000', N'Realty', N'HDIL', 50.65, 68.36, -6.47, 208.24432000000002, -0.01)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-16 00:00:00.0000000', N'Realty', N'UNITECH', 13.95, 70.59, -43.92, 314.64828000000006, -0.93)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-17 00:00:00.0000000', N'Realty', N'DLF', 155.6, 284.44, 36.17, 756.6858900000002, 2.86)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-17 00:00:00.0000000', N'Realty', N'HDIL', 48.6, 86.94, 27.18, 210.18943999999993, 0.94)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-17 00:00:00.0000000', N'Realty', N'UNITECH', 13.65, 55.82, -20.92, 313.36584, -0.41)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-20 00:00:00.0000000', N'Realty', N'DLF', 156.1, 238.3, -16.22, 784.67971000000011, 3.7)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-20 00:00:00.0000000', N'Realty', N'HDIL', 49.55, 73.1, -15.92, 213.88799999999998, 1.76)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-20 00:00:00.0000000', N'Realty', N'UNITECH', 13.8, 76.18, 36.47, 318.67127999999997, 1.69)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-21 00:00:00.0000000', N'Realty', N'DLF', 157.8, 201.98, -15.24, 788.57473000000016, 0.5)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-21 00:00:00.0000000', N'Realty', N'HDIL', 50.9, 71.67, -1.96, 218.67824000000007, 2.24)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-21 00:00:00.0000000', N'Realty', N'UNITECH', 13.75, 44.2, -41.98, 318.27546, -0.12)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-22 00:00:00.0000000', N'Realty', N'DLF', 159.1, 194.93, -3.49, 794.0979600000004, 0.7)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-22 00:00:00.0000000', N'Realty', N'HDIL', 50.8, 68.03, -5.08, 217.37912000000003, -0.59)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-22 00:00:00.0000000', N'Realty', N'UNITECH', 13.75, 23.48, -46.88, 323.47625999999991, 1.63)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-23 00:00:00.0000000', N'Realty', N'DLF', 159.05, 199.54, 2.36, 801.66750000000025, 0.95)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-23 00:00:00.0000000', N'Realty', N'HDIL', 50.45, 61.91, -9, 218.71092000000002, 0.61)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-23 00:00:00.0000000', N'Realty', N'UNITECH', 13.7, 31.98, 36.2, 322.67046000000005, -0.25)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-24 00:00:00.0000000', N'Realty', N'DLF', 152.8, 396.39, 98.65, 882.00591000000031, 10.02)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-24 00:00:00.0000000', N'Realty', N'HDIL', 48.1, 105.47, 70.36, 226.20192000000003, 3.43)
INSERT TestDB1001.dbo.Table1001(xdate, sector, symbol, [Close Price], Volume, [Volume Percent], OpenInterest, [OpenInterest Percent]) VALUES ('2014-01-24 00:00:00.0000000', N'Realty', N'UNITECH', 13.5, 89.6, 180.18, 337.96794000000006, 4.74)
GO
Suppose I want to find out outliers for each separate stock symbol, from the above table, based on [Volume Percent ] column.
Suppose normally the [Volume Percent ] for a stock has values of +20 % to -20 %, and then on one day, suddenly it has a value of 280 %, then that is certainly an outlier value for this column for this stock.
Please suggest, which methods could be used to -
1.First define the "Normal Value RANGE" for each stock, over past x days.
2. Find out the outliers, which have values OUTSIDE of the "Normal Value RANGE" that we defined above.
Any ideas regarding how such an analysis could be done are welcome. I am not yet sure, which are the various options available for such work and whether SQL Server is capable of doing such work or if I will have to use Excel for doing this.
Thanks a lot for any help
PS: I guess we will need to use some concepts like -
https://en.wikipedia.org/wiki/Standard_deviation
https://www.mathsisfun.com/data/standard-normal-distribution.html
etc. for doing this work, but I am not sure, how exactly it could be done.
We might need to add a few columns which will show the 1,2 and 3 Standard Deviation Value / Range, and anything outside that range might qualify as an outlier.
If it is not possible to be done in SQL Server, then I am open to doing it in Excel as well.
For example you can see some figures with the following query:
Bye, Olaf.
Select symbol, count(*), sum([volume percent]), avg([volume percent]), stdev([volume percent]) FROM TestDB1001.dbo.Table1001 group by symbol
Bye, Olaf.
It may be better to post what you're trying to achieve in the Math / Science topic to determine a general approach and then move on to how to achieve that via SQL Server or Excel.
ASKER
Thanks for the comments Experts.
What way can I move this topic to Math/Science section ?
Thanks and regards
@ Doug, that is an excellent suggestion. I should first become clear about which method suits my need, and then I should post it in the SQL Server Section to see how it could be done.
It may be better to post what you're trying to achieve in the Math / Science topic to determine a general approach and then move on to how to achieve that via SQL Server or Excel.
What way can I move this topic to Math/Science section ?
Thanks and regards
If you don't see a way to add this topic, request attention using the button at the bottom of the right side of your original post.
You could also just start a new question in that topic and include a link to this post.
ASKER
Thank you so much for your help @Paul
:)
:)
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
These mathematical concepts have their (more or less) simple formula and you can compute several things in SQL, eg SUM(field), AVG(field) and compute more complex things with these results. But even such coimplex things as STDEV(field). You're not the first one doing stock or math in databases ;)
Bye, Olaf.